Hard data

By Tim Heaton

April 5th 2012 at 10:02AM

Tim Heaton looks at the importance of statistics and analytics for developers

The bright new future is data driven. Managing big datasets, statistical analysis and visualisation of data are key skills for the future of business, and they will only become more valuable.

Many huge corporations are based upon data analysis at their core, and they take that in all kinds of interesting directions.

From Google.org’s flu tracking based upon search queries, through Sergey Brin’s part-personal search for the key to Parkinson’s Disease using data rather than traditional medical analysis, to Zynga and the real-time data feeds from its games, analysis of data is the new rock and roll.

We have a telemetry team here at CA, including a head of data for the studio, and although it may sound a very specific, specialised and perhaps dry role, it’s actually one of the widest and most progressive functions in the company.

Here are three examples of where relatively heavyweight data mining is making a difference for us.

THE DEVIL’S IN THE DATA

Firstly, within Total War: Shogun 2 we have a metrics system that has gathered 1.2TB of data so far. None of this data is linked to an individual user, but it is hugely valuable.

We have a large private Beta test team, so prior to launch we use the data they generate to help tweak and balance the game.

Post-launch we can see every element of how the game is played – from how long players stay engaged, where they drop out and which game modes they favour, down to individual battle, army and unit actions.
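As a sketch of the kind of anonymised record such a system might emit – the field names and values here are illustrative, not CA's actual schema – a per-session identifier can stand in for identity so nothing links back to a user:

```python
import json
import time
import uuid

def make_battle_event(mode, duration_s, units_fielded, units_lost):
    """Build an anonymised telemetry event. A random session id stands
    in for identity, so nothing in the record links back to a user."""
    return {
        "event": "battle_complete",
        "session": str(uuid.uuid4()),  # rotates per session, not per player
        "ts": int(time.time()),
        "mode": mode,
        "duration_s": duration_s,
        "units_fielded": units_fielded,
        "units_lost": units_lost,
    }

# Serialise for the wire; the backend simply aggregates these records.
print(json.dumps(make_battle_event("multiplayer_siege", 1240, 2400, 1130)))
```

The key design point is what is absent: no account name, no hardware id, nothing that identifies a player across sessions.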

There are so many combinations of gameplay in Total War that the only real way to understand how they all combine is to suck it and see.

For example, since Shogun 2’s launch there have been 200,388,936 battles involving 393,413,982,039 samurai, of which 159,280,995,410 were slain. Players have played for 1,754 combined years in total.

We feed the analysis of this kind of stuff back into the design and technical teams, and it’s then up to them to use it as they see fit.

Secondly, it used to be that you finished developing a game, put it in a box, and waited perhaps three months for a statement accounting for your sales numbers.

Certainly you’d never be totally clear week-to-week about the numbers you were selling worldwide – it takes retailers a significant amount of time to feed back sell-through figures.

On PC, those days are gone. Because of digital distribution, and Total War’s registration through Steam, we can see large amounts of data in time-frames totally unheard of five years ago.

It gives us a window into sales patterns that we can use to understand how people consume Total War. Again, none of this is linked to an individual user, but we can see hour-to-hour data of purchasing and play.

This means we can understand what effect promotions have, how our pre-orders are working, even down to individual reviews and how they affect online purchases over the next few hours and days.
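One hedged sketch of that kind of analysis: bucket purchase timestamps into hourly bins, then compare the window after an event (a promotion going live, a big review landing) against the same-sized window before it. The data, window size and event time below are invented purely for illustration:

```python
from collections import Counter
from datetime import datetime, timedelta

def hourly_counts(purchase_times):
    """Bucket purchase timestamps into hour-long bins."""
    return Counter(t.replace(minute=0, second=0, microsecond=0)
                   for t in purchase_times)

def uplift(purchase_times, event_time, window_hours=24):
    """Average purchases in the window after an event vs. the window before."""
    counts = hourly_counts(purchase_times)  # Counter: missing hours count as 0
    before = sum(counts[event_time - timedelta(hours=h)]
                 for h in range(1, window_hours + 1))
    after = sum(counts[event_time + timedelta(hours=h)]
                for h in range(window_hours))
    return after / max(before, 1)

# Synthetic data: 1 sale/hour before the event, 3 sales/hour after.
event = datetime(2012, 3, 1, 12)
sales = [event + timedelta(hours=h, minutes=30)
         for h in range(-24, 24)
         for _ in range(3 if h >= 0 else 1)]
print(uplift(sales, event))  # → 3.0
```

In practice you would also control for day-of-week and seasonal patterns before attributing an uplift to any single review or promotion.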

We can also see patterns in how people purchase across the different Total War titles, and how they engage with extra downloadable content – do they buy available downloadable content when they start playing a new Total War game, or do they wait a while and get into it first? And how long do they wait? We know, but we’re not telling.

And thirdly, the use of data within the production process is also vital. We have project management systems, used slightly differently between the Alien game and Total War, with bespoke data systems added onto a commercial package.

A TEST OF DATA

However, a good example of data analysis is the reporting from our bug database system during final testing phases. Total War is a highly complex game, and at Alpha there are potentially thousands of open issues being dealt with by 200 or more developers and testers.

With Total War: Empire we struggled to fix all those bugs in the time available. Since then we’ve added person-by-person analysis of progress combined with future forecasting. We analyse individual fix rates, fix success rates, and knock-on rates.

By adding in defect capture rates, regression rates and so on, we can manage build frequencies, triage certain bugs, and add help where it is needed.
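A toy version of that kind of forecast, with entirely invented rates: each day the team attempts a number of fixes, a fraction of those fail regression and reopen, and a further fraction introduce a new defect elsewhere. The question the model answers is how many days it takes the open-bug count to reach zero – or whether it ever does:

```python
def days_to_zero(open_bugs, fixes_per_day,
                 regression_rate=0.10, knock_on_rate=0.05, max_days=365):
    """Forecast days until the open-bug count effectively reaches zero.

    Each day the team attempts `fixes_per_day` fixes; a `regression_rate`
    fraction fail and reopen, and a `knock_on_rate` fraction of fixes
    introduce a new defect elsewhere. All rates here are illustrative.
    Returns None if the count never drops below one within `max_days`.
    """
    days = 0
    while open_bugs >= 1 and days < max_days:
        attempted = min(fixes_per_day, open_bugs)
        open_bugs -= attempted * (1 - regression_rate)  # fixes that stick
        open_bugs += attempted * knock_on_rate          # knock-on defects
        days += 1
    return days if open_bugs < 1 else None

print(days_to_zero(2000, 150))  # → 18
```

Even a crude model like this makes the trade-offs visible: raising the fix rate helps far less than cutting the regression and knock-on rates, which is one reason triage and build-frequency decisions matter so much at the end of a project.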

We never use this analysis as a stick. We never confront individuals with ‘you’re not fixing enough bugs’. It just allows us to see bottlenecks and add extra resource, and also to forecast ahead based on data that can’t be argued with.

I spoke to a friend who had created games for Facebook. He explained rule number one. ‘It’s not about the game. Don’t worry too much about that. It’s only about the data.’

We at CA will never feel like that. It’s all about the game, and about the individuals who make it and play it and their insights and passions and experience.

But, I have no doubt that we can make better games, more efficiently and with more focus when we supplement the craft of creating games with some hard data.