S_ATTsd : Phase 2 (Formerly S_ATTws)

See: S_ATTws : Phase 1

Here is Phase 2 of my statistic comparing a team’s payroll to its attendance, adjusted by winning percentage.  New for this version is the method by which teams from different eras are set to an equal standard.  For Phase 1, the salary and attendance numbers were taken as a fraction of the league total, which led to skewed results.  In this version, the ratios are determined by standard deviations away from the mean.  Also, a set starting value of 10 was used to offset the additions and subtractions caused by the formula.  Without further ado, I present to you Phase 2.

Above are the top 25 teams ranked by S_ATTsd from 1990 to 2010.  The 1994 strike-shortened Montreal Expos top the list.  This is an extremely interesting team to look at.  The early ’90s were one of the most dominant periods in the Expos’ history.  Fan interest was high, and the team was on the verge of emerging as one of baseball’s best.  However, no team was hurt worse by the strike than the Expos.  The loss in revenue from ticket sales, TV and radio contracts, and the almost certain playoff games left the Expos franchise with little financial freedom.  The team was forced to send away centerpieces Larry Walker, John Wetteland, Ken Hill, and Marquis Grissom because of the revenue lost to the strike.  A decade later, the team relocated to Washington, D.C.

Also at the top of the list are Billy Beane’s famous Moneyball Oakland Athletics, the record-setting 2001 Seattle Mariners, and the 2008 “new look” Tampa Bay Rays.  With the exception of the Mariners, these teams were able to win a ton of games with a limited payroll while bringing out a good number of fans.  The Mariners are an example of the ideal goal of my statistic.  This isn’t just a list of teams who win while not spending; it shows teams who spend well.  The 2001 Seattle Mariners spent a ton; they had the league’s 11th highest payroll, which made up 3.81% of the league total.  However, the Mariners set the Major League record for wins that year, finishing at 116-46.  This legendary Mariners squad, as well as the 2004 St. Louis Cardinals, are examples of teams who spent a lot, but did so wisely.

The bottom 25 is about what you would expect.  Listed here are teams who lost a lot of games and spent a fortune in order to do so.  Some of these teams spent over $100 million in payroll, just to finish the season nowhere close to the playoff hunt.  The 2003 Tigers have a Z-score of 4.238.  To put that in perspective, CERN needed 5-sigma confidence to confirm the discovery of the Higgs boson.  The 2003 Tigers vary from the mean by 4.238 sigma.  That team was almost confirmed to be awful on a high-speed particle physics level.  That’s not good.
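To make the sigma comparison concrete, here is a rough sketch of how rare each level would be if the values followed a perfect normal distribution. That is an assumption: the distribution of S_ATTsd is only approximately normal, so the numbers are illustrative at best. The helper name `upper_tail` is made up for this example.

```python
import math

def upper_tail(z):
    """One-sided probability that a standard normal value exceeds z."""
    return 0.5 * math.erfc(z / math.sqrt(2))

tigers_z = 4.238    # Z-score quoted for the 2003 Tigers
discovery_z = 5.0   # the 5-sigma threshold

p_tigers = upper_tail(tigers_z)
p_discovery = upper_tail(discovery_z)

# Express each tail probability as "about 1 in N".
print(f"{tigers_z} sigma: about 1 in {1 / p_tigers:,.0f}")
print(f"{discovery_z} sigma: about 1 in {1 / p_discovery:,.0f}")
```

Under that assumption, a 5-sigma result is far rarer than a 4.238-sigma one, which is why the Tigers fall short of "confirmed."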

The above plot shows the distribution of S_ATTsd values.  They appear to be normally distributed with a slight skew to the right.  This skew means that it is more common for teams to spend efficiently than inefficiently.  However, the long right tail shows that there have been cases of teams with unnaturally high S_ATTsd values.

I’ve uploaded my spreadsheet in the downloads section.

S_ATTws : Phase 1


After spending some time looking through the database from Baseball-Databank, I started looking into something that I found interesting.  I began examining a team’s total payroll as it relates to its attendance.  The correlation between the two isn’t phenomenally strong, but it’s an interesting topic to examine.  Payroll is a component of a team’s expenses, and fans attending the games bring in revenue.  So this led to a thought: “Which teams are spending most efficiently?”  I’ve seen opinions where payroll is analyzed directly with wins.  This is a logical thing to look at, but having a winning team doesn’t guarantee that the organization is earning a return on its spending.  Teams make money by putting people in the seats.  A pure payroll / attendance figure shows which teams pay the least per fan in the stands, but it doesn’t take the quality of the teams into account.  So, I decided to look further into this, and try to come up with a statistic which measures a team’s spending efficiency while taking team performance into account.  It can also be thought of as efficiently spending for a successful team.


The image to the right explains the context of the table.  It is organized by S_ATTws in ascending order, showing the top 25 teams from 1990-2009.  The 2006 Florida Marlins top the list, featuring young superstars such as Dontrelle Willis, Miguel Cabrera, Josh Johnson, Hanley Ramirez, and Dan Uggla.  After dismantling the 2003 World Series Championship winning team in the off-season, the ’06 Marlins had a payroll of just $15 million, less than 1% of the total MLB payroll.  The Marlins have historically struggled to bring fans out to the ballpark, but they show how a team’s success with limited financial commitments reflects well in S_ATTws.

A polar opposite would be the 1993 expansion Colorado Rockies, who come in at 20th on the list.  Their payroll and attendance numbers are astounding.  In the scatter plot at the top of the page, they’re the dot in the upper-left.  They averaged around 55,000 fans per game, and almost doubled the average total attendance for the year.  They led the league in attendance while having the lowest payroll in MLB.  Although the team wasn’t very competitive, as is the norm for first-year expansion teams, their payroll to attendance ratio skews their S_ATTws value downwards.

At 18th are the famous 2002 Moneyballing Oakland Athletics, who won 100+ games while only making up 1.72% of the total MLB payroll.  However, the 2001 pre-Michael Lewis team is ranked 6 spots higher.  The trend that jumps out to me the most is how successful the Montreal Expos of the early-to-mid ’90s were.  They appear in the top 25 six times within an eight-year span of the data set.  The Expos are a prime example of the negative effects of the 1994 labor strike.  During the best season in team history, the season was canceled.  Due to the lost revenue from attendance and media contracts, one of the best organizations in MLB history (with the best logo) was forced to the cellar of the NL East, and had to pack up for a move to Washington, D.C.

The table above shows the worst 25 teams according to S_ATTws, listed in descending order.  The 2008 New York Yankees top the list.  They went 89-73 while making up 7.74% of the MLB payroll, and led the league in attendance.  The Yankees have been notorious for spending big-time money on big-time stars ever since the late George Steinbrenner bought the team in 1973.  This table is relevant because it brings up an irregularity in S_ATTws that I will be fixing for Phase 2.  The attendance and salary numbers are run through the formula as a fraction of the league totals.  This leads to a large difference between the rich and poor teams, due to the overall payroll numbers being spread far apart from each other.  Instead of using ratios, I’ll need to report the numbers as standard deviations away from the mean.  This way, the ratio will be standardized, and big-spending teams will not be penalized as harshly.  I’ve been struggling to calculate the correct standard deviations using SQL, but I’m making progress.  My code is a little bit of a mess right now; I’m sure I’ll look back on it one day and laugh at how sloppily and confusingly everything is put together.  Phase 2 will be released when I fix this problem.
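The difference between the two approaches can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the SQL behind S_ATTws: the payroll figures are made up, the function names `league_shares` and `z_scores` are hypothetical, and the use of the population standard deviation within each season is a choice made for the example.

```python
from statistics import mean, pstdev

# Made-up payrolls in $ millions for two hypothetical seasons (NOT real data).
payroll_by_season = {
    1990: {"Team A": 24.0, "Team B": 18.0, "Team C": 15.0, "Team D": 9.0},
    2010: {"Team A": 200.0, "Team B": 95.0, "Team C": 80.0, "Team D": 35.0},
}

def league_shares(values):
    """Phase 1 style: each team's value as a fraction of the league total."""
    total = sum(values.values())
    return {team: v / total for team, v in values.items()}

def z_scores(values):
    """Standardized style: standard deviations away from that season's mean."""
    mu = mean(values.values())
    sigma = pstdev(values.values())  # population std dev (an assumption)
    return {team: (v - mu) / sigma for team, v in values.items()}

for year, payrolls in payroll_by_season.items():
    shares = league_shares(payrolls)
    zs = z_scores(payrolls)
    for team in payrolls:
        print(f"{year} {team}: share={shares[team]:.3f}  z={zs[team]:+.2f}")
```

Because each season is standardized against its own mean and spread, a z-score from 1990 and one from 2010 sit on the same scale even though the raw dollar amounts are far apart.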

I exported the data from the SQL query into an Excel spreadsheet and have placed it in the downloads section.


P.S. In reviewing my post, I’ve realized that I need to standardize a team’s spending based on its market size.  You can’t fault the Yankees for spending all of this money when it’s there at their fingertips.  I could also adjust the attendance based on stadium capacity, but I’m leaning against it.  You make money from the pure number of tickets sold, not the percentage of seats filled.

And So It Begins…


So as my winter break ends, I’m embarking on a new task. Statistical databases are what make analytics and sabermetrics possible, so it’s time that I throw myself into that. Through researching on the internet, I’ve found that MySQL is the most useful free tool for databases and search queries. I went ahead and downloaded that along with the MySQL Workbench GUI and the Baseball-Databank database. I have no previous experience with databases, so this is a completely new venture for me. I’m learning SQL along with general information about how databases work. In the image above, I modified code from Hardball Times and Statistically Speaking to output the career home run leaders for the formerly named Florida Marlins (through 3/28/11).
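For anyone who wants to try a query of that kind without installing MySQL, here is a minimal sketch using Python's built-in sqlite3 module instead. It is not the query from the image. The toy table borrows the `playerID`, `yearID`, `teamID`, and `HR` column names from Baseball-Databank's Batting table, the team code `FLO` is assumed to stand for the Marlins, and every row is invented placeholder data.

```python
import sqlite3

# In-memory database with a toy table modeled on Baseball-Databank's Batting table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Batting (playerID TEXT, yearID INTEGER, teamID TEXT, HR INTEGER)"
)

# Placeholder rows (invented, not real statistics).
rows = [
    ("player_a", 2001, "FLO", 20),
    ("player_a", 2002, "FLO", 25),
    ("player_b", 2002, "FLO", 30),
    ("player_b", 2003, "NYN", 40),  # different team, so excluded by the filter
    ("player_c", 2003, "FLO", 10),
]
conn.executemany("INSERT INTO Batting VALUES (?, ?, ?, ?)", rows)

# Sum each player's home runs for one franchise and rank them.
query = """
SELECT playerID, SUM(HR) AS career_hr
FROM Batting
WHERE teamID = ?
GROUP BY playerID
ORDER BY career_hr DESC
LIMIT 10
"""
leaders = conn.execute(query, ("FLO",)).fetchall()
for player_id, career_hr in leaders:
    print(player_id, career_hr)
```

The same SELECT statement should carry over to MySQL largely unchanged, apart from the placeholder syntax used by the client library.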

As I explore SQL further and learn everything that I can do with it, I’ll start posting my own sabermetric analyses. I’m also looking into scripts for pulling data from MLB Gameday and Pitch f/x to have the most current and usable information.