Last year, I correctly (or luckily) predicted that Yoenis Cespedes would win the 2013 Home Run Derby. I’m back at it for 2014 with another prediction. Read on for an explanation of my method and my projected outcome for the 2014 Home Run Derby, which takes place Monday, July 14th at 8:00 p.m. ET at Target Field in Minneapolis, MN.
As in 2013, my formula is based on the correlation coefficients (weights) between the individual statistics examined and the players’ finishing positions in the Home Run Derby. Unlike last year, I now have two data sets to pull from (2012 and 2013), and the field of participants has grown from 8 to 10. I looked at the correlation coefficients from 2012 and 2013, then determined the change (trend) in each coefficient between the two data sets.
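To make the weight and trend idea concrete, here is a minimal Python sketch. It is not the actual spreadsheet: the `pearson` helper, the variable names, and every number in it are made up for illustration, and it assumes the trend is simply the 2013 coefficient minus the 2012 coefficient.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical data: finishing position (1 = winner) and ISO for four
# participants in each year. These are NOT real Derby numbers.
finish_2012 = [1, 2, 3, 4]
iso_2012 = [0.300, 0.250, 0.260, 0.200]
finish_2013 = [1, 2, 3, 4]
iso_2013 = [0.280, 0.290, 0.210, 0.190]

# The "weight" of a stat is its correlation with finishing position.
weight_2012 = pearson(iso_2012, finish_2012)
weight_2013 = pearson(iso_2013, finish_2013)

# Assumption: the trend is the year-over-year change in the weight.
trend = weight_2013 - weight_2012
```

Because finishing position 1 is the winner, a stat that helps a hitter shows up here as a negative correlation.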
I wanted to use the weights from 2012 and 2013 to predict reasonable 2014 weights, and to take advantage of the progression between the two years. I also wanted to differentiate between “good” and “bad” predictors. To do that, I took the average and standard deviation of the absolute values of the trends. One standard deviation above the average absolute trend was treated as the threshold between “good” and “bad” predictors. From this point on, all calculations used only the “good” stat categories, meaning those whose absolute trends fell below the threshold. I also made exceptions to avoid “double counting” of stats: HR is automatically excluded if HR/FB and FB% are chosen, GB/FB is excluded if GB% and FB% are chosen, and FB% is excluded if IFFB% and OFFB% are chosen. The image above shows the progression of the category weights from 2012 to 2013, and the various 2014 projections. It’s interesting to note the Flyball (IFFB%, OFFB%), Prototypical Power (HR, ISO, K%), and Batted Ball (GB%, GB/FB, LD%) clusters.
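The filtering step can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the spreadsheet itself: the function name `select_good_stats` is hypothetical, the sample standard deviation is assumed (the population version would shift the threshold slightly), and the three double-counting rules are all checked against the set of stats that passed the threshold, before any of them is removed.

```python
from statistics import mean, stdev


def select_good_stats(trends):
    """Keep stats whose absolute trend is below mean + 1 standard deviation,
    then drop stats that would be double counted."""
    abs_trends = {name: abs(t) for name, t in trends.items()}
    values = list(abs_trends.values())

    # Threshold: one standard deviation above the average absolute trend.
    # Assumption: sample standard deviation.
    threshold = mean(values) + stdev(values)
    chosen = {name for name, a in abs_trends.items() if a < threshold}

    # Double-counting rules from the text: drop the first stat when both
    # stats in the paired set were chosen.
    rules = [
        ("HR", {"HR/FB", "FB%"}),
        ("GB/FB", {"GB%", "FB%"}),
        ("FB%", {"IFFB%", "OFFB%"}),
    ]
    dropped = {stat for stat, components in rules if components <= chosen}
    return chosen - dropped


# Hypothetical trends, for illustration only.
example_trends = {"HR": 0.05, "HR/FB": 0.10, "FB%": -0.08, "K%": 0.90, "ISO": 0.02}
good_stats = select_good_stats(example_trends)
```

In this made-up example, K% is rejected because its trend swings far more than the others, and HR is dropped because HR/FB and FB% both survive the threshold.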
So I decided to create a “cone” with upper and lower bounds on each player’s potential finish. The idea is similar to how hurricane paths are tracked: run all of the models and take the average. One bound is the average of the 2012 and 2013 coefficients, with each year’s coefficients weighted by their average absolute value. The other bound is similar, but with the trends added into the weighted average. These weights are then multiplied by a player’s prominence (or lack thereof) in a category relative to the other competitors, and the sum of these intermediate scores creates a final score.
This creates the cone, and the average of the two score bounds is used to rank this year’s Home Run Derby participants.
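Here is one way the cone could be sketched in Python. Several pieces are my assumptions for the sake of a runnable example and may not match the spreadsheet: a player’s “prominence” is measured as a z-score against the rest of the field, the trend is treated as a third set of weights in the weighted average, and a lower score means a better predicted finish (since the weights are correlations with finishing position, where 1 is the winner). The helper names, player names, and all numbers are hypothetical.

```python
from statistics import mean, pstdev


def blend(weight_sets):
    """Average several sets of weights, each set weighted by its mean absolute value."""
    importance = [mean(abs(w) for w in ws.values()) for ws in weight_sets]
    total = sum(importance)
    return {
        stat: sum(imp * ws[stat] for imp, ws in zip(importance, weight_sets)) / total
        for stat in weight_sets[0]
    }


def z_scores(values):
    """Each player's distance from the field average, in standard deviations.
    Assumes the players do not all share the same value."""
    vals = list(values.values())
    mu = mean(vals)
    sigma = pstdev(vals)
    return {player: (v - mu) / sigma for player, v in values.items()}


def score(weights, stats_by_category):
    """Sum of weight * relative standing over all categories, per player."""
    totals = {}
    for stat, values in stats_by_category.items():
        for player, z in z_scores(values).items():
            totals[player] = totals.get(player, 0.0) + weights[stat] * z
    return totals


# Hypothetical weights and field, for illustration only.
w2012 = {"ISO": -0.40, "K%": 0.10}
w2013 = {"ISO": -0.60, "K%": 0.30}
trend = {stat: w2013[stat] - w2012[stat] for stat in w2012}
field = {
    "ISO": {"Player A": 0.280, "Player B": 0.220, "Player C": 0.190},
    "K%": {"Player A": 0.18, "Player B": 0.25, "Player C": 0.21},
}

# The two edges of the cone, then their average as the final score.
bound_without_trend = score(blend([w2012, w2013]), field)
bound_with_trend = score(blend([w2012, w2013, trend]), field)
final = {
    player: (bound_without_trend[player] + bound_with_trend[player]) / 2
    for player in bound_without_trend
}

# Assumption: lowest score = best predicted finish.
ranking = sorted(final, key=final.get)
```

The gap between the two bounds for a given player is the width of that player’s cone; a wide gap means the trend changes the picture quite a bit for that hitter.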
My model has last year’s HR Derby champion facing off against the player many (myself included) believe to be the most prolific power hitter in all of baseball. It’s going to be a show, for sure. An image of my final prediction and spreadsheet is below.
This spreadsheet will be posted in the “Downloads” section shortly.