
Computers Fall Short of Sports Reporters

The journalism industry is constantly labeled a dying profession, especially on the print side. Leaders in the field and observant onlookers frequently make statements to the effect that they do not know whether newspapers will be around in five years. That is a pretty harsh statement to make when papers such as the New York Times and Washington Post are each generating larger profit margins than Wal-Mart and Dell combined, according to forbes.com. The state of the business as a whole is too large a topic to address here, but one specific job in the profession has recently come up against a more pressing opponent.
Northwestern University’s Intelligent Information Laboratory, or InfoLab, recently had students create a software program called Stats Monkey that can generate an “average game day story.” The software takes information about a game (only baseball at this point), such as the play-by-play and box score, and generates a story based on two technologies:

“By analyzing changes in Win Probability and Game Scores, the system can pick out the key plays and players from any baseball game. Second, the system includes a library of narrative arcs that describe the main dynamics of baseball games: Was it a come-from-behind win? Back-and-forth the whole way? Did one team jump out in front at the beginning and then sit on its lead? The system uses a decision tree to select the appropriate narrative arc. This then determines the main components of the game story and enables the system to put them together in a cohesive and compelling manner.”
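To make that description a little more concrete, here is a rough sketch of how the two pieces might fit together. It is not Stats Monkey's actual code: the function names, the arc labels, and the thresholds (0.25, three flips) are all made up for illustration, and the input is assumed to be a list of the eventual winner's win probability after each play.

```python
from typing import List, Tuple


def key_play(win_prob: List[float]) -> Tuple[int, float]:
    """Return (index, swing) for the play with the largest win-probability change.

    Hypothetical helper: index i refers to the play between sample i and i + 1.
    """
    swings = [b - a for a, b in zip(win_prob, win_prob[1:])]
    idx = max(range(len(swings)), key=lambda i: abs(swings[i]))
    return idx, swings[idx]


def narrative_arc(win_prob: List[float]) -> str:
    """Choose a narrative arc with a tiny hand-written decision tree.

    The thresholds below are illustrative assumptions, not values from InfoLab.
    """
    # Count how often the favorite flips (win probability crosses 0.5).
    flips = sum(
        1 for a, b in zip(win_prob, win_prob[1:]) if (a - 0.5) * (b - 0.5) < 0
    )
    if flips >= 3:
        return "back-and-forth"
    if min(win_prob) < 0.25:
        return "come-from-behind"
    return "wire-to-wire"


if __name__ == "__main__":
    # Made-up win-probability curve for a late comeback.
    game = [0.5, 0.6, 0.4, 0.3, 0.15, 0.1, 0.9, 1.0]
    print(narrative_arc(game), key_play(game))
```

A real system would presumably have more arcs and learn or tune its cut-offs, but the shape is the same: find the biggest swings, then let a few yes/no questions pick the story template.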

The system sounds reasonable and easy enough. But the question on everyone’s mind when they hear about a robot (or a computer, if you want to be more technical about it) writing a story is, “Does it use correct grammar and sentence flow to create a piece people can and want to read?”

BOSTON — Things looked bleak for the Angels when they trailed by two runs in the ninth inning, but Los Angeles recovered thanks to a key single from Vladimir Guerrero to pull out a 7-6 victory over the Boston Red Sox at Fenway Park on Sunday.
Guerrero drove in two Angels runners. He went 2-4 at the plate.
“When it comes down to honoring Nick Adenhart, and what happened in April in Anaheim, yes, it probably was the biggest hit (of my career),” Guerrero said. “Because I’m dedicating that to a former teammate, a guy that passed away.”

Guerrero has been good at the plate all season, especially in day games.

During day games Guerrero has a .794 OPS. He has hit five home runs and driven in 13 runners in 26 games in day games.

After Chone Figgins walked, Bobby Abreu doubled and Torii Hunter was intentionally walked, the Angels were leading by one when Guerrero came to the plate against Jonathan Papelbon with two outs and the bases loaded in the ninth inning. He singled scoring Abreu from second and Figgins from third, which gave Angels the lead for good.

The Angels clinched the AL Division Series 3-0.

Angels starter Scott Kazmir struggled, allowing five runs in six innings, but the bullpen allowed only one runs and the offense banged out 11 hits to pick up the slack and secure the victory for the Angels.

J.D. Drew drove in two Red Sox runners. He went 1-4 at the plate.

Drew homered in the fourth inning scoring Mike Lowell.

“That felt like a big swing at the time,” said Drew. “I stayed inside the ball and put a good swing on it. I was definitely going to be ready to battle again tomorrow, but it didn’t work out.”

Drew has been excellent at the plate all season, especially in day games. During day games Drew has a .914 OPS. He has hit five home runs and driven in 17 runners in 36 games in day games.

Papelbon blew the game for Boston with a blown save. Papelbon allowed three runs on four hits in one inning.

Reliever Darren Oliver got the win for Los Angeles. He allowed no runs over one-third of an inning. The Los Angeles lefty struck out none, walked none and surrendered no hits.

Los Angeles closer Brian Fuentes got the final three outs to record the save.
Juan Rivera and Kendry Morales helped lead the Angels. They combined for three hits, three RBIs and one run scored.

Four relief pitchers finished off the game for Los Angeles. Jason Bulger faced four batters in relief out of the bullpen, while Kevin Jepsen managed to record two outs to aid the victory.

The story was generated from data on this year’s Angels-Red Sox October 11 playoff game. The story is nearly perfect. It hit on all of the major plays of the game, had quotes at appropriate times from appropriate players, and even had good syntax and grammar. However, the main story line of the game was that the Angels had clinched the division series and were moving on to the AL Championship Series against the Yankees, a fact the generated story mentions only in a single passing line.

While Stats Monkey is a good program, it should not, and will not, replace sports journalists, for one reason: so much of reporting is more than just the “average game day story,” just as congressional reporters have more work to do than report on the votes on bills. Ask any sports journalist and they will tell you the game story is the most boring part of the job, and the most challenging for the same reason. Each baseball team plays 162 games a year in the regular season alone. Writing a unique and interesting piece for every one of those games is not the high point of the job. Sports journalists, like congressional reporters, enjoy getting into conversations with the players and coaches, and getting on the field and in the clubhouse. Basically, they want to write the stories behind the story, or the game.

Alex Rodriguez is a great baseball player who has had a storied career. But despite that, one of the major story lines of this year’s postseason was that he was finally playing well in the playoffs, unlike in any year before. A computer looking at stats might not have picked up on the fact that while A-Rod was putting up huge numbers and was the player of the game in most of the games, the underlying story was that he could now shake the stigma of not being able to win in the postseason.

Rich Gordon was one of the four professors in charge of the program Stats Monkey emerged from. Days after Stats Monkey went public, Gordon said it is unreasonable for journalists to worry about their jobs because of this software. He says it should be a good tool for sports journalists. It can free a reporter to do interviews immediately after a game while the story gets written, write stories about games professional journalists do not cover consistently, cover individual players, and let Little League coaches get word to their growing fan bases.

Gordon also points out that the program has its shortfalls: “it cannot account for events that don’t show in the box score or play by play (for instance, the infamous play in a 2003 Chicago Cubs playoff game in which a fan caught a foul ball that might otherwise have been fielded for an out).”

Another play that comes to mind is from this season’s World Series. In Game 3, Alex Rodriguez hit a line drive that was initially ruled not to have left the park, making the play a double, but video replay showed the ball had hit a television camera and bounced back onto the field, resulting in a two-run homer. The Yankees went on to win the game 8-5.

As Gordon says, “If your game story CAN be generated by a computer, at some point it WILL be generated by a computer. Human journalists will do–and should do–the kind of reporting and storytelling that computers can’t.”
