The idea...
.through the creation of a computer
simulation, I established a means by which any two major league baseball
teams can be compared. For the purposes of this project, I used the
San Francisco Giants and the Los Angeles Dodgers. I chose the
San Francisco Giants because they are the team I follow.
I chose the Los Angeles Dodgers for a slightly different reason.
Not only are the Giants and Dodgers longtime rivals, but their statistics
are closely comparable, so neither team will always dominate the other.
.the comparisons are based on
the team batting statistics from the 2000 MLB season. These
batting statistics break each team's batting average down into
the exact number of singles, doubles, triples and home runs. The
teams have similar batting averages but markedly different statistical
breakdowns. Given this information, the results are quite interesting.
The exact numerical breakdown can be seen on the tree diagram, which
illustrates, numerically, exactly what happens when the program
is run.
...how it works...
.the program was created
entirely in Maple. The program uses random numbers to produce results
based on the inputs given to it. The program selects a random number
(throws a pitch), which determines a numerical output (in the strike zone
or out of the strike zone). If the pitch is out of the strike zone,
one of four things may happen: the pitch is a ball (four balls result
in a walk), the pitch hits the batter, the pitch is a strike (out of the
strike zone but swung at by the batter; three strikes result in an out),
or the pitch is hit into fair territory. If the pitch is in the
strike zone, one of two things may happen: the pitch is a strike (swung
on and missed by the batter, or watched; three strikes result in an out),
or the pitch is hit into fair territory.
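
The pitch logic described above can be sketched in a few lines of Python (the original program was written in Maple, which is not reproduced here). This is only an illustration: the probabilities below are hypothetical placeholders rather than the values from the tree diagram, and the names `throw_pitch`, `plate_appearance` and `pick` are invented for this sketch.

```python
import random

# Hypothetical probabilities, chosen only for illustration; the project's
# actual values come from its tree diagram and are not reproduced here.
P_IN_ZONE = 0.60
IN_ZONE = [("strike", 0.65), ("in_play", 0.35)]
OUT_OF_ZONE = [("ball", 0.60), ("hit_by_pitch", 0.02),
               ("strike", 0.23), ("in_play", 0.15)]

def pick(outcomes, rng):
    """Choose one labelled outcome according to its probability."""
    r = rng.random()
    cumulative = 0.0
    for label, p in outcomes:
        cumulative += p
        if r < cumulative:
            return label
    return outcomes[-1][0]  # guard against floating-point round-off

def throw_pitch(rng):
    """Return (in_zone, outcome) for a single simulated pitch."""
    in_zone = rng.random() < P_IN_ZONE
    return in_zone, pick(IN_ZONE if in_zone else OUT_OF_ZONE, rng)

def plate_appearance(rng):
    """Throw pitches until the batter walks, is hit, strikes out,
    or puts the ball in play. Returns (result, pitches_thrown)."""
    balls = strikes = pitches = 0
    while True:
        _, outcome = throw_pitch(rng)
        pitches += 1
        if outcome == "ball":
            balls += 1
            if balls == 4:
                return "walk", pitches
        elif outcome == "strike":
            strikes += 1
            if strikes == 3:
                return "strikeout", pitches
        else:  # "hit_by_pitch" or "in_play" ends the plate appearance at once
            return outcome, pitches

rng = random.Random(2000)  # fixed seed so the run is repeatable
for _ in range(5):
    print(plate_appearance(rng))
```

Because every pitch in this sketch is a ball, a strike, a hit batter or a ball in play, a plate appearance can never last more than six pitches.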
.once the pitch has been hit
into fair territory, a random number is again selected to decide whether the
result will be an out or a base hit. If the result is an out, the
program stores it as such. After three outs, the program breaks
its cycle and ends the inning. If the result is a base hit, a random
number will again be selected to decide whether the base hit is a single, double,
triple or home run. At the end of the inning, the program tallies all runs
scored as well as the number of pitches thrown. The program's
output shows runs scored, pitches thrown and the pitch-by-pitch result
of the inning.
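
A minimal Python sketch of the ball-in-play step and the three-out loop follows. It is not the original Maple code. The hit probability and hit counts are hypothetical placeholders, not the 2000 team statistics, and the rule that every runner advances exactly as far as the batter is an assumption made here, since the text does not say how runners move. For brevity the sketch treats every batter as putting the ball in play, so walks, strikeouts and pitch counts are left out.

```python
import random

# Hypothetical inputs, for illustration only (not the 2000 team statistics).
P_HIT_IN_PLAY = 0.30                           # chance a fair ball is a base hit
HIT_COUNTS = {1: 1000, 2: 300, 3: 30, 4: 170}  # bases -> number of such hits
HIT_NAMES = {1: "single", 2: "double", 3: "triple", 4: "home run"}

def ball_in_play(rng):
    """Return 0 for an out, or the number of bases (1-4) for a base hit."""
    if rng.random() >= P_HIT_IN_PLAY:
        return 0
    bases = list(HIT_COUNTS)
    weights = [HIT_COUNTS[b] for b in bases]
    return rng.choices(bases, weights=weights)[0]

def advance(runners, bases):
    """Move the batter and all runners `bases` bases.

    Assumption: every runner advances exactly as far as the batter.
    `runners` is a set of occupied bases drawn from {1, 2, 3}.
    Returns (new_runners, runs_scored).
    """
    moved = [r + bases for r in runners] + [bases]
    runs = sum(1 for r in moved if r >= 4)
    return {r for r in moved if r < 4}, runs

def half_inning(rng):
    """Play until three outs; return (runs, play_by_play_log)."""
    outs, runs, runners, log = 0, 0, set(), []
    while outs < 3:
        bases = ball_in_play(rng)
        if bases == 0:
            outs += 1
            log.append("out")
        else:
            runners, scored = advance(runners, bases)
            runs += scored
            log.append(HIT_NAMES[bases])
    return runs, log

runs, log = half_inning(random.Random(2000))
print(runs, log)
```

Weighting the hit type by raw hit counts is one simple way to let a team's statistical breakdown, rather than its batting average alone, shape the run total.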
.with this basic program established,
a second program was created. The second program runs
the basic program a given number of times, nine in most cases (extra-inning
games being the exception), to simulate a complete game. The
output yields the total number of runs and pitches for the game as well
as a pitch-by-pitch account of the game.
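
The idea of wrapping the one-inning routine in a game loop might look like the following Python sketch. The function `stub_half_inning` is a made-up stand-in for the one-inning program, with invented run and pitch distributions, and the rule of adding one extra inning at a time while the score is tied is an assumption about how extra-inning games could be handled.

```python
import random

def stub_half_inning(rng):
    """Hypothetical stand-in for the one-inning program: (runs, pitches)."""
    runs = rng.choice([0, 0, 0, 0, 1, 1, 2, 3])  # made-up run distribution
    pitches = rng.randint(8, 25)                  # made-up pitch counts
    return runs, pitches

def play_innings(rng, innings=9):
    """Run the one-inning routine `innings` times and total the results."""
    total_runs = total_pitches = 0
    for _ in range(innings):
        runs, pitches = stub_half_inning(rng)
        total_runs += runs
        total_pitches += pitches
    return total_runs, total_pitches

def play_game(rng):
    """Both teams bat nine innings; extra innings are added while tied."""
    innings = 9
    a_runs, a_pitches = play_innings(rng, innings)
    b_runs, b_pitches = play_innings(rng, innings)
    while a_runs == b_runs:  # assumption: one extra inning at a time
        extra_a = play_innings(rng, 1)
        extra_b = play_innings(rng, 1)
        a_runs, a_pitches = a_runs + extra_a[0], a_pitches + extra_a[1]
        b_runs, b_pitches = b_runs + extra_b[0], b_pitches + extra_b[1]
        innings += 1
    return {"innings": innings,
            "runs": (a_runs, b_runs),
            "pitches": (a_pitches, b_pitches)}

print(play_game(random.Random(2000)))
```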
.a third program was then created
to run the second program. The purpose of the third program was to
establish useful numerical results. The third program outputs the
average runs and average pitch count over 100 games.
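
Averaging over repeated games can be sketched in Python as below. The function `stub_game` is a hypothetical stand-in for the nine-inning program, with invented run and pitch distributions; only the averaging step is the point of the example.

```python
import random
from statistics import mean

def stub_game(rng):
    """Hypothetical stand-in for the nine-inning program: (runs, pitches)."""
    runs = sum(rng.choice([0, 0, 0, 0, 1, 1, 2, 3]) for _ in range(9))
    pitches = sum(rng.randint(8, 25) for _ in range(9))
    return runs, pitches

def average_over_games(rng, n_games=100):
    """Play n_games games and return (average runs, average pitch count)."""
    results = [stub_game(rng) for _ in range(n_games)]
    avg_runs = mean(runs for runs, _ in results)
    avg_pitches = mean(pitches for _, pitches in results)
    return avg_runs, avg_pitches

avg_runs, avg_pitches = average_over_games(random.Random(2000))
print(f"average runs: {avg_runs:.2f}, average pitches: {avg_pitches:.1f}")
```

Averaging over many games smooths out the game-to-game swings that random numbers produce, which is why the averages are more useful for comparing teams than any single simulated game.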


Minimum: 1.06
q1: 1.6, q2: 2.41, q3: 3.22
Maximum: 3.86
Line of best fit: y = -2.33+17.45x

Minimum: 1.31
q1: 1.71, q2: 2.51, q3: 3.19
Maximum: 3.94
Line of best fit: y = -2.31+17.63x



...National League Championship Series...
.to make my simulation exciting,
I set up a situation where the San Francisco Giants were playing the Los
Angeles Dodgers in the National League Championship Series (the NLCS).
The Giants carried a team batting average of 0.275 going into the series
while the Dodgers carried a 0.265. As predicted, the Giants, with the higher
batting average, won the series, doing so in six games. The
run production of the games can be seen in the graph below.

...free agency...
.the Giants' free agent acquisition
during the season produced large increases in run production. The
graph below represents the increase in run production from one season to
the next. The boxes represent the first season and the crosses the
second. To simplify the graphs, the runs are plotted against the numbers
1-16, representing the averages. The averages vary between .200 and
.350 before the acquisition and .217 and .350 after. The Giants'
acquisition increased the run production from the previous year by an average
of 0.65 runs, shown in the graph on the left. If the Dodgers had
been able to acquire the free agent, their increase in run production would
not have been as large as the Giants'. The Dodgers' run production
would have increased by an average of only 0.093 runs. This is seen
in the graph on the right. The blue boxes represent the Dodgers'
run production prior to the acquisition and the black crosses represent
the run production after.



...NLCS...again...
.during the off-season, the
San Francisco Giants, due to their lower payroll, were able to pick up
a free-agent all-star with a .350 batting average. This bolstered
their team average to 0.283. They again met the Dodgers in the National
League Championship Series. This time, however, despite the Giants'
new acquisition, the Dodgers beat the Giants in seven games. The run production
of the games can be seen in the graph below.

...evaluation of results...
.in the first NLCS the results
produced are exactly what I expected. The team with the higher batting
average and what I considered to be a better statistical breakdown won
the series. The numbers for the two teams were close enough to make
the results interesting, but not close enough to disprove my hypothesis.
However, in the second NLCS I was perplexed. The Giants' free agent
acquisition bolstered the team average to 0.283, 0.008 points higher than
the previous year, while the Dodgers remained at 0.265. The Dodgers,
however, were able to win the second NLCS in seven games. There are
two peculiar aspects to the second NLCS. The first is that both the
Giants' and the Dodgers' average run production decreased quite drastically
from one year to the next, which is particularly strange given the Giants' free
agent acquisition. The second is that the Dodgers managed to
win the second NLCS despite having a lower run production average than
the Giants for the seven-game series. Also peculiar is the Giants'
inability to win the second NLCS despite having an average 0.018 points
higher than the Dodgers. I believe the program's use of random numbers
is to blame for these peculiarities...
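
To see how much room randomness leaves for an upset in a seven-game series, consider the short Python calculation below. It assumes each game is independent and that the stronger team wins any single game with a fixed probability; the value 0.55 is purely hypothetical and is not derived from the simulation.

```python
from math import comb

def series_win_probability(p, wins_needed=4):
    """Probability that a team winning each game with probability p
    takes a series that ends when one side reaches wins_needed wins."""
    q = 1.0 - p
    return sum(
        comb(wins_needed - 1 + losses, losses) * p**wins_needed * q**losses
        for losses in range(wins_needed)
    )

p_favorite = 0.55  # hypothetical single-game win probability
p_series = series_win_probability(p_favorite)
print(f"favorite wins the series: {p_series:.3f}")
print(f"underdog wins the series: {1 - p_series:.3f}")
```

Under these assumptions the favorite wins the series only about 61% of the time, so a single series going to the team with the lower batting average is not especially surprising.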
...evaluation of program...
.there are some aspects that
are not included in the program. Some assumptions needed to be made
for the program given time constraints and a lack of in-depth programming
knowledge. The assumptions can be divided into four categories:
Offensive assumptions:
1) No mental errors
2) No base stealing

Batter assumptions:
1) Statistics based on 2000 S.F. Giants and L.A. Dodgers

Defensive assumptions:
1) No mental errors
2) No fielding errors
3) No special plays

Pitcher assumptions:
1) No wild pitches
2) No pitcher fatigue
3) Uniform pitcher

.some of these assumptions may produce results that do not coincide with realistic baseball results. For example, the pitch count produced by my program is extremely low compared to real-life results. Through experimentation, though, I have noted that slight variations in the 'in the strike zone' and 'out of the strike zone' percentages can cause the pitch count to vary up or down by as much as fifteen pitches. The program's use of random numbers as a means of producing results could also allow for unrealistically high or low outputs. However, in defense of the outputs produced by my programs, if I wanted the outputs to be identical to reality, I would have gone and bought a video game.
References
www.majorleaguebaseball.com
www.cnnsi.com
www.sfgiants.com
www.dodgers.com
Hogg and Tanis: Probability and Statistical Inference
Dr. Cynthia Wyels