Mapping Contemporary Cinema

Short guide to Big Data

Everything you blog, tweet, search or post has an impact on what you will be watching in the cinema many months from now. Your use of the web has become a vital asset to producers, distributors and marketing agencies and is factored into their decision-making processes. The large pool of data created by billions of users worldwide, harvested and analysed in real time, is commonly referred to as big data. While technology-driven changes such as crowdfunding and video on demand are significant, it is in fact big data that is changing the industry the most.

At the time of writing there are 2.7 zettabytes of data in the digital universe. In 2006, the total amount of digital information created, captured and replicated worldwide came to roughly 160 exabytes (Gantz, 2007). For scale, one zettabyte equals 1,000 exabytes, so half a zettabyte is 500 exabytes. This amount of data, large as it is, is projected to grow fifty-fold by 2020. Big data has benefited many industries over the last few years. Google Flu Trends, for example, maps flu activity using the aggregated data of online searches for flu-related health information. Nate Silver took a comparable aggregate approach, combining large numbers of opinion polls, to call 49 of the 50 states correctly in the 2008 US presidential election. Other applications are still developing, such as the LAPD’s system for predicting future crimes by analysing the distribution, correlation and potential consequences of previous crimes.

How, then, does big data influence the film industry? In short, by predicting audience viewing patterns, trends and possible gaps in the market. To explain this, one first has to look at what big data is and what it does. Big data is generally defined through three V’s: velocity, volume and variety. Velocity refers to the fact that the data is processed in real time, which allows companies and manufacturers to access it quickly instead of waiting for a lengthy collation and analysis process. Volume stands for the wealth of information available, down to every detail of consumer behaviour. Netflix, for example, can see not only where you watch something, but also when you fast-forward or rewind, which day you watch it and what you are browsing alongside what you are viewing. Variety refers to the range of sources: data can be harvested from tweets, graphics, email addresses and browsing behaviour, to name a few.

The prime example of the successful use of big data is the previously mentioned Netflix. For one, Netflix uses a recommendation algorithm based on your viewing history to predict which films you might watch next. The aim is to increase your consumption, as the more people watch, the less likely they are to cancel their subscription. The accuracy of this algorithm is therefore crucial, which is where the velocity, volume and variety of big data are key. Unlike traditional TV channels, Netflix operates over the internet and thus has direct access to its users’ preferences and viewing habits. The data Netflix collects for each customer gives it a detailed, first-hand picture of that user’s preferences and habits. This direct access, together with the speed at which the data is transmitted, allows Netflix to chart audience trends more accurately and more quickly than traditional distributors.
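
Netflix’s actual recommender is proprietary and far more elaborate than anything shown here. As a rough illustration of the general idea of predicting what someone will watch from viewing histories, the following is a minimal sketch of item-based collaborative filtering. The user names, titles and function names are hypothetical, and the technique is a generic textbook one rather than Netflix’s own.

```python
from math import sqrt

# Hypothetical viewing histories (user -> titles watched); not real Netflix data.
HISTORIES = {
    "ana":   {"The Social Network", "Se7en", "Zodiac"},
    "ben":   {"The Social Network", "Zodiac", "American Beauty"},
    "chloe": {"Se7en", "American Beauty"},
    "dev":   {"The Social Network", "Se7en"},
}

def item_vectors(histories):
    """Map each title to the set of users who watched it."""
    vectors = {}
    for user, titles in histories.items():
        for title in titles:
            vectors.setdefault(title, set()).add(user)
    return vectors

def cosine(a, b):
    """Cosine similarity between two binary vectors stored as sets of users."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))

def recommend(user, histories, k=2):
    """Score each unseen title by its summed similarity to titles the user watched."""
    vectors = item_vectors(histories)
    seen = histories[user]
    scores = {}
    for candidate, viewers in vectors.items():
        if candidate in seen:
            continue
        scores[candidate] = sum(cosine(viewers, vectors[t]) for t in seen)
    # Highest score first; ties broken alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [title for title, _ in ranked[:k]]

print(recommend("dev", HISTORIES))  # ['Zodiac', 'American Beauty']
```

Zodiac ranks first for this user because most of the people who watched The Social Network also watched it. Comparing titles with titles, rather than users with users, is one common design choice in such systems.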

Furthermore, Netflix used big data in its decision to commission and produce the TV series House of Cards (2013-). Its data revealed three vital things: firstly, that a lot of users watched David Fincher’s The Social Network (2010); secondly, that the British television series House of Cards (1990) was popular amongst these same users; and thirdly, that people who watched the British series also watched films starring Kevin Spacey and/or directed by David Fincher. After the series was produced, Netflix created ten different trailers to promote it to users based on their previously registered interests and tastes. Whilst other rental companies base their service on ratings, Netflix recognises “that many of the ratings are aspirational rather than reflecting your daily activity” (Amatriain qtd in Vanderbilt). Instead of asking users to actively provide feedback, Netflix analyses the data generated by user activity and acts accordingly. People might claim they enjoy foreign films while their activity shows that they are in fact watching Will Smith films. Big data thus allowed Netflix to gear its trailers towards the right audiences, offering them what they really wanted rather than what they said they wanted. Kevin Spacey fans were shown a trailer featuring him, whereas audiences interested in films with female leads saw a trailer predominantly featuring the female cast.
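
The contrast between what users say and what they do can be sketched in a few lines. The example below picks one of several trailer variants for a user purely from implicit behaviour (minutes actually watched per title) and deliberately ignores self-reported ratings. The tags, trailer names and scoring rule are hypothetical assumptions for illustration, not a description of how Netflix actually matched its trailers to users.

```python
from collections import Counter

# Hypothetical catalogue tags; not real Netflix metadata.
TITLE_TAGS = {
    "American Beauty": {"kevin_spacey"},
    "The Usual Suspects": {"kevin_spacey"},
    "Zodiac": {"david_fincher"},
    "Thelma & Louise": {"female_lead"},
    "The Hours": {"female_lead"},
}

# Hypothetical trailer variants, keyed by the audience tag each one targets.
TRAILERS = {
    "kevin_spacey": "trailer_spacey",
    "david_fincher": "trailer_fincher",
    "female_lead": "trailer_female_cast",
}
DEFAULT_TRAILER = "trailer_general"

def pick_trailer(minutes_watched):
    """Choose a trailer from implicit behaviour (minutes watched per title).

    Stated preferences such as ratings are intentionally not an input:
    only what the user actually watched counts.
    """
    weights = Counter()
    for title, minutes in minutes_watched.items():
        for tag in TITLE_TAGS.get(title, ()):
            weights[tag] += minutes
    # Take the most-watched tag that has a matching trailer variant.
    for tag, _ in weights.most_common():
        if tag in TRAILERS:
            return TRAILERS[tag]
    return DEFAULT_TRAILER

print(pick_trailer({"Thelma & Louise": 120, "The Hours": 110, "American Beauty": 30}))
# trailer_female_cast
```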

The success of this approach is evident not only in Netflix’s growth in subscribers but also in the fact that its self-produced content had a 70 per cent likelihood of renewal, against an average of 35 per cent for studios. It is not surprising, then, that Netflix also uses big data to pick the films it streams. Licensing costs limit how much the site can offer, but by analysing data from its users, as well as viewing data from piracy sites, Netflix strives to acquire films and shows that are popular. The show Prison Break (2005-2009), for example, was added to Netflix because it had proved very popular on piracy sites.

However, online streaming is not the only area of the film industry where big data is used. IBM, for example, uses big data to predict and analyse opening-weekend box office earnings. This information, if fine-tuned, can be used to market films more effectively. IBM collected a wide range of data, such as film characteristics (genre, studio, rating, etc.) and online presence (Twitter popularity, Facebook Likes, shares, trailers, reviews), and then built algorithms based on data from 200 previous films. IBM also noticed a relationship between social media activity and box office sales: factors such as Twitter volume and negative reviews are strongly linked to box office outcomes. IBM’s approach was to analyse audience behaviour in detail and to organise the data into tables for analysis. It then identified the most important variables, such as Facebook Likes, Wikipedia entries or website clicks, in order to build a predictive model. The model also maps sentiment by geographical region, allowing studios to tailor their marketing to each location. IBM then predicted the success of ten summer blockbuster openings and was right about seven of them.
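
IBM’s model drew on many variables and its exact form is not described here. A minimal sketch of the underlying idea, fitting a line that relates a single social signal to opening-weekend takings by ordinary least squares, might look like this. The single predictor, the toy figures and the 20 per cent tolerance used to decide whether a prediction counts as “right” are all hypothetical assumptions.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x (one predictor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def predict(intercept, slope, x):
    """Predicted opening weekend for a given signal value."""
    return intercept + slope * x

def hit_rate(predicted, actual, tolerance=0.2):
    """Share of predictions within `tolerance` (hypothetical 20%) of the actual value."""
    hits = sum(1 for p, a in zip(predicted, actual) if abs(p - a) <= tolerance * a)
    return hits / len(actual)

# Toy data (hypothetical): thousands of pre-release tweets vs opening weekend in $m.
tweets = [10, 20, 30, 40]
openings = [15, 25, 35, 45]
a, b = fit_line(tweets, openings)
print(predict(a, b, 50))  # 55.0
```

A real model of this kind would combine many predictors (Facebook Likes, Wikipedia entries, website clicks and so on) in a multiple regression and would be judged on films it was not fitted to, much as IBM scored its model on ten new openings.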

Savvy entrepreneurs are also using big data at the development stage of a film to optimise its potential financial success. One example is Worldwide Motion Picture Group, a company led by Vinny Bruzzese. Bruzzese charges $20,000 to advise studios on how they can improve their scripts to maximise future profit. The company uses statistical data from previous box office results and from a database of focus group results and surveys to analyse a script’s story structure in terms of its financial potential. Bruzzese looked at films featuring bowling alleys and concluded that they statistically tend to be box office flops, and that it is therefore financially unwise to include one (Barnes).
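
Bruzzese’s actual method is not public, but the basic comparison behind a claim like the bowling-alley finding can be sketched as follows: split a set of films by whether a script feature is present and compare their average grosses. The film records, field names and figures are hypothetical. Reporting the size of each group matters, because an average based on a handful of films says very little.

```python
from statistics import mean

def compare_feature(films, feature):
    """Compare average gross for films with and without a script feature.

    Returns (mean_with, count_with, mean_without, count_without);
    a mean is None when its group is empty.
    """
    with_feature = [f["gross"] for f in films if feature in f["features"]]
    without_feature = [f["gross"] for f in films if feature not in f["features"]]
    return (
        mean(with_feature) if with_feature else None,
        len(with_feature),
        mean(without_feature) if without_feature else None,
        len(without_feature),
    )

# Hypothetical records: gross in $m and a set of script features.
films = [
    {"gross": 10, "features": {"bowling_alley"}},
    {"gross": 20, "features": {"bowling_alley", "car_chase"}},
    {"gross": 60, "features": set()},
    {"gross": 40, "features": {"car_chase"}},
]
print(compare_feature(films, "bowling_alley"))
```

Here the two bowling-alley films average 15 against 50 for the rest, but with only two films in each group the gap is just as likely to be chance as a genuine pattern.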

There is, however, an apparent flaw in using big data. Because it relies on past statistics and data, it can only steer film producers towards creating things similar to those that have already been successful, as was the case with House of Cards. In other words, films such as The Best Exotic Marigold Hotel (2012) or Fight Club (1999) might never have been made on the basis of big data, as there were no previous trends or signs that films of this type would be successful. Yet both films went on to do exceptionally well, more so than many films driven by big data analysis. The concern for the film industry, then, is that studios will limit creativity and be governed primarily by data and statistics.

Furthermore, statistics are often biased. This is because “in large data sets, large deviations are vastly more attributable to variance (or noise) than to information (or signal)” (Taleb). Taleb’s point is that the more variables a data set contains, the more spurious patterns it throws up, so an algorithm based on user data can easily mistake noise for meaningful signal. Such an algorithm also registers the quantity of the input more readily than its quality. Online attention is not automatically positive: 1,000 people could be tweeting about The Avengers (2012) without any of them praising it. Moreover, data is “not simply collected; it is manufactured” (Poole), so there is always an inherent bias in the process of choosing which data to use. One target group, for example, might be particularly inclined to search for and post information on Wikipedia, whereas another demographic might prefer journals and newspapers as a source of information. In the case of IBM’s model, certain audience groups, such as children and the elderly, have little vocal presence online yet make up a large part of the cinema audience. The algorithm would thus base its conclusions on a sample of frequent internet users, and that sample does not necessarily represent the cinemagoers with the greatest spending power or disposable income.
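
Taleb’s warning can be illustrated with a small simulation, which is an illustration of his argument rather than an analysis taken from him. The sketch below generates a hypothetical box office series and many “social signals” that are pure random noise, then reports the strongest correlation found between any signal and box office. Since every signal is noise, every correlation is spurious, yet the strongest one can only stay the same or grow as more signals are tested. The function names, film count and seed are assumptions.

```python
import random
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)

def max_spurious_correlation(n_films, n_signals, seed=0):
    """Strongest |correlation| between a random box office series and
    n_signals random noise series (all hypothetical).

    With a fixed seed the first k signals are identical across calls,
    so testing more signals can only keep or raise the maximum.
    """
    rng = random.Random(seed)
    box_office = [rng.gauss(0, 1) for _ in range(n_films)]
    best = 0.0
    for _ in range(n_signals):
        signal = [rng.gauss(0, 1) for _ in range(n_films)]
        best = max(best, abs(pearson(signal, box_office)))
    return best

for k in (10, 100, 1000):
    print(k, round(max_spurious_correlation(30, k), 2))
```

With only 30 films, testing hundreds of meaningless signals will typically turn up at least one that looks clearly related to box office, which is the kind of large deviation Taleb attributes to noise rather than information.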

The question then arises as to how much power can be attributed to statistics and arithmetic in a creative industry. Popularity has never been a foolproof marker of quality, and films that have been lauded creatively have often been unsuccessful financially. The Rocky Horror Picture Show (1975) was unsuccessful when first released, but later developed into a highly profitable cult hit. Similarly, if big data had existed in the 1940s, RKO might have refused to fund Citizen Kane (1941), as its complex narrative structure, debut director and commitment to political critique bore all the statistical signs of an unpopular film. While it did not fare well at the box office, today’s cinematic landscape would be impoverished without it. Citizen Kane is now appreciated for its artistic and narrative value and is among the most highly regarded films in the world. Ultimately, studios, writers, distributors and filmmakers still face the question of whether to follow the statistics or their creative intuition.

Big data might, however, also lead to positive change. For example, studio executives might realise that 51 per cent of the audience is female and try to cater more to that market (Orange Is the New Black (2013-) may be partly driven by this aim). Big data might also help lesser-known talents with a strong media presence gain opportunities in cinema. Still, statistics have clear limitations: sudden fluctuations and smaller audience subdivisions may be difficult for an algorithm to track. The question therefore remains how far an algorithm built on the past can predict the future.


Barnes, Brooks. ‘Solving Equation of a Hit Film Script, With Data’. 5 May 2013. Web. 22 Mar. 2014.

Bulygo, Zach. ‘How Netflix Uses Analytics’. n.d. Kissmetrics. Web. 22 Mar. 2014.

Gantz, John F. ‘IDC White Paper: The Expanding Digital Universe’. 2007. Web. 22 Mar. 2014.

McKinsey & Company. ‘Big Data: The Next Frontier for Innovation’. May 2011. Web. 12 Mar. 2014.

Noseworthy, Graeme. ‘Predicting Relationships between Social Signals and Box Office Sales’. 1 Nov. 2013. IBM Big Data Hub. Web. 12 Mar. 2014.

Poole, Steven. ‘Are You Read Era Big Data’. 29 May 2013. Web. 4 Mar. 2014.

Roettgers, Janko. ‘For House of Cards and Arrested Development, Netflix favours big data over big ratings’. 2 Dec. 2013. Gigaom. Web. 4 Mar. 2014.

Taleb, Nassim Nicholas. ‘Big Data Means Big Errors People’. 1 Feb. 2013. Web. 4 Mar. 2014.

Vanderbilt, Tom. ‘Netflix Algorithm’. 8 Jul. 2013. Web. 4 Mar. 2014.

Written by Carla Steinberg, 2014; Queen Mary, University of London.

This article may be used free of charge. Please obtain permission before redistributing. Selling without prior written consent is prohibited. In all cases this notice must remain intact.

Copyright © 2014 Carla Steinberg/ Mapping Contemporary Cinema

