They got it wrong – again. Despite most opinion polls and forecasts stating that Hillary Clinton would beat Donald Trump in the US presidential election, the reverse happened. Of course, you could argue that the pollsters were dead-on correct: polls called a tight race with Clinton shading it, and that’s exactly what happened – Clinton won the popular vote, after all – but Trump routed her in terms of electoral votes.

But in-depth polls were also done state-by-state, not least by pollster guru Nate Silver at FiveThirtyEight, who calculated that Trump had just a 29% chance of winning. Conservative voters were hugely underestimated, but how?

So did ‘shy’ Trump voters lie to pollsters? Are forecasts based on the wrong data? And can new technology – some of it from a shocked Silicon Valley – help breathe new life into an industry that’s now in severe danger of being discredited?

How do opinion polls work?

Opinion polls are all about extrapolating trends from a relatively small data sample. The pollster asks people how they intend to vote, or how they did just vote, and algorithms are applied to create a demographically balanced national picture.
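As a rough illustration of that balancing step, here is a minimal post-stratification sketch in Python. The sample, the age groups and the population shares are all hypothetical, and real pollsters weight on many more variables at once. The point is only to show how respondents from under-represented groups end up counting for more.

```python
from collections import Counter

# Hypothetical raw sample: (age_group, intends_to_vote_for_candidate_a)
sample = [
    ("18-44", True), ("18-44", True), ("18-44", False),
    ("45+", False), ("45+", False), ("45+", True),
    ("45+", False), ("45+", True),
]

# Assumed population shares for each group (illustrative, not census data)
population_share = {"18-44": 0.5, "45+": 0.5}

def weighted_support(sample, population_share):
    """Post-stratification: weight each respondent so that every group's
    share of the weighted sample matches its share of the population."""
    n = len(sample)
    sample_share = {g: c / n for g, c in Counter(g for g, _ in sample).items()}
    weights = [population_share[g] / sample_share[g] for g, _ in sample]
    support = sum(w for w, (_, yes) in zip(weights, sample) if yes)
    return support / sum(weights)

print(round(weighted_support(sample, population_share), 3))  # 0.533
```

The raw sample splits 50/50, but because younger respondents are under-represented (3 of 8, against an assumed half of the population), weighting lifts candidate A to roughly 53%.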

In a country of 231 million potential voters – although around 100 million don’t actually vote – any national picture is always going to be based as much on assumptions as on actual data. Key to this is voter turnout, which is very hard to predict: there’s simply no data on it until after election day.
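One common way to fold turnout in is to weight each respondent by an estimated probability of voting. The sketch below simply assumes those probabilities are already known, which is exactly the hard part described above. The respondents and numbers are made up.

```python
from typing import List, Tuple

def turnout_adjusted_share(respondents: List[Tuple[bool, float]]) -> float:
    """Share of expected votes going to candidate A.

    Each respondent is (supports_a, probability_of_voting); the probabilities
    are assumptions supplied by the modeller, not observed data.
    """
    expected_votes = sum(p for _, p in respondents)
    expected_a = sum(p for supports_a, p in respondents if supports_a)
    return expected_a / expected_votes

# Hypothetical respondents: 3 of 5 back candidate A
respondents = [(True, 0.9), (True, 0.4), (False, 0.8), (False, 0.7), (True, 0.5)]
print(round(turnout_adjusted_share(respondents), 3))  # 0.545
```

A raw headcount gives A 60%, but the turnout assumptions pull that down to about 54.5%. Raising the second respondent’s probability from 0.4 to 0.9 pushes it back up to about 60.5%, which shows how sensitive the result is to guesses about who will turn up.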

“The challenge of making any prediction from data is to make sure that the data is representative,” says Matt Jones, Analytics Strategist at data science consultancy Tessella. “Traditional statistical analysis of polling data and surveys will only be representative of those that bothered to take part, and that section of the voting population is not representative.”

The media give polls such gravitas that they can be decisive in whether people bother to vote at all – and so they can swing an election.

Limited data

Machine learning is already used in election prediction as part of standard statistical analysis. “As for any statistical analysis the single most critical factor is the amount of data available on which to run your algorithms, base your predictions,” says Claus Jepson, Chief Architect at Unit4. “As of [now], the data set available is simply too limited to offer precise predictions, making it necessary to include human interpretations – hence making the predictions biased.”

For example, pollsters must decide how many historical election results to include and how much statistical weight to give them. “At some point in time the data available will be large enough for algorithms to effectively predict, less biased, outcomes based on polls,” thinks Jepson.
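The article doesn’t say how pollsters combine history with fresh polling, but a simple linear blend makes Jepson’s point about human judgement concrete. `blended_forecast` and its `history_weight` parameter are hypothetical, chosen purely for illustration.

```python
from typing import List

def blended_forecast(poll_share: float, historical_shares: List[float],
                     history_weight: float) -> float:
    """Mix a current poll with the average of past results (hypothetical model).

    history_weight (0..1) is a human judgement call, which is where bias enters.
    """
    if not 0.0 <= history_weight <= 1.0:
        raise ValueError("history_weight must be between 0 and 1")
    baseline = sum(historical_shares) / len(historical_shares)
    return history_weight * baseline + (1 - history_weight) * poll_share

# Same poll, same history, two different judgement calls
print(round(blended_forecast(0.52, [0.48, 0.50], 0.25), 4))  # 0.5125
print(round(blended_forecast(0.52, [0.48, 0.50], 0.75), 4))  # 0.4975
```

Identical inputs produce forecasts on either side of 50%, depending only on the weight someone chose.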

Social media and sentiment analysis

Some of that ‘new’ data is from social media, which looks set to become a fresh tool for pollsters looking to track changing opinions. “The use of ‘social listening’ of social media conversations and behaviour may have been an early warning of possible contradictions from official polls,” says Mark Skilton, Professor of Practice in the Information Systems & Management Group at Warwick Business School.

This is the science of sentiment analysis: when people write tweets and Facebook posts, it’s possible to extract positive, negative or neutral attitudes from them. No one is suggesting that pollsters use social media alone to predict elections, but it can improve a purely statistical model by adding a vital dynamic dimension.
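The simplest form of sentiment analysis is lexicon-based: count words from hand-built positive and negative lists. Production systems typically use much larger lexicons or trained models. The word lists and posts below are invented for illustration.

```python
import re

# Tiny illustrative lexicons; real systems use far larger ones or trained models
POSITIVE = {"great", "love", "win", "best", "support"}
NEGATIVE = {"bad", "hate", "lose", "worst", "corrupt"}

def classify(post: str) -> str:
    """Label a post positive, negative or neutral by counting lexicon hits."""
    words = re.findall(r"[a-z']+", post.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

posts = [
    "Love this candidate, best speech yet!",
    "Worst debate performance I have seen",
    "Watching the rally tonight",
]
labels = [classify(p) for p in posts]
print(labels)  # ['positive', 'negative', 'neutral']
```

Word counting like this misreads negation and sarcasm (“not great” scores as positive), which is one reason serious sentiment work relies on trained models and large volumes of text.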

For example, BJSS SPARCK analysed 14 million tweets before the election and correctly predicted the outcome, uncovering that seven out of every ten tweets sent in the last four weeks of the campaign were in favour of Trump.

“When they use social media, people become less guarded about their true social and political affiliations,” says Simon Sear, Practice Leader of BJSS SPARCK. “Their language becomes unfiltered, they ‘like’ content that appeals to them and follow people and organisations which represent their values … contrast that with having to admit embarrassing sentiment and intentions to a potentially judgemental human pollster.” 

Machine learning and AI

Such sentiment analysis, however, comes with a heavy workload and also requires mathematical models. “There are three ways to make improved predictions – a better model, better data, and more data,” says Jeremy Perlman, VP Europe for Trifacta, which helps RBS, Santander and PepsiCo analyse data. “The problem is that data created on social media and the web is expanding at a ridiculous rate, so machine learning will be critical to making better predictions at massive scale.”


Since computing power keeps growing, and the cloud now puts supercomputer-scale processing within reach, the need to analyse more and more data shouldn’t be a major hurdle. “Computational devices can very effectively, with high precision and rapidly, gather millions of tweets, posts or similar and run sentiment analysis – to understand likes and dislikes,” says Jepson. “This, together with data from polls, will increase the precision of predictions.”
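Jepson doesn’t say how poll and social-media signals would be combined. One standard technique for merging independent estimates is inverse-variance weighting, sketched below. It assumes both sources are unbiased, which the EU Referendum example below suggests is a big assumption for Twitter data. The shares and variances are hypothetical.

```python
from typing import List, Tuple

def combine_estimates(estimates: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Inverse-variance weighted mean of independent (share, variance) estimates.

    Returns the combined share and its variance. Assumes every source is
    unbiased and independent of the others.
    """
    weights = [1.0 / variance for _, variance in estimates]
    total = sum(weights)
    combined = sum(w * share for w, (share, _) in zip(weights, estimates)) / total
    return combined, 1.0 / total

poll = (0.48, 0.0004)       # hypothetical poll: 48%, standard error 2 points
sentiment = (0.52, 0.0012)  # hypothetical social-media estimate, noisier
share, variance = combine_estimates([poll, sentiment])
print(round(share, 3), round(variance, 5))  # 0.49 0.0003
```

The combined variance is smaller than either input’s, which is the sense in which adding social data “increases precision”, but only if neither source is systematically skewed.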

During the EU Referendum vote in the UK, cognitive technology company Expert System and the University of Aberdeen analysed a sample of 5,000 tweets collected on June 20-21 to uncover voting intentions. It found that 64.75% of tweets from the UK were inclined to leave the EU. That overstated the final result by more than 10 percentage points, which suggests an obvious explanation: voters who preferred to leave the EU may simply have been more active on Twitter.
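A quick calculation shows why sample size alone can’t explain that gap. Treating the 5,000 tweets as if they were a simple random sample of voters (they aren’t, which is the point), the 95% margin of error is only about 1.3 percentage points, far smaller than the miss against the official Leave share of 51.9%.

```python
import math

def margin_of_error(share: float, n: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a proportion from a simple random sample."""
    return z * math.sqrt(share * (1 - share) / n)

tweet_share = 0.6475   # share of sampled UK tweets leaning Leave (Expert System / Aberdeen)
actual_leave = 0.5189  # official Leave share of the 2016 referendum vote
moe = margin_of_error(tweet_share, 5000)
gap = tweet_share - actual_leave
print(f"margin of error: {moe:.3f}, gap: {gap:.3f}")  # margin of error: 0.013, gap: 0.129
```

A miss roughly ten times the margin of error points to a skewed sample (who tweets) rather than bad luck.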

“For it to be possible for AI to predict an election outcome, one would need to analyse every available data variable that could affect a conscious human decision – including the weather forecast and historical data – in order to predict what the crowd will do,” says Dmitry Bagrov, MD UK of DataArt. “Any predictive platform would require access to every possible data variable and the capability to process this mountain of information.”

'Shy voter' problem

What if people lie to pollsters, or decide to vote when they usually don’t bother? The former feeds false data into models, and the latter leaves nothing in the data at all. “Pollsters have to contend with an increasing phenomenon known as ‘shy voters’; these are voters who don’t want to say who they’ll be voting for out of fear and/or embarrassment,” says Andrew Cameron-Webb, founder of AI-based social media management platform WeLikeIt.
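Even a modest refusal rate among one side’s supporters skews a poll. The toy model below is an assumption for illustration, not how any named pollster handles the problem: a fraction of candidate A’s supporters decline to answer, while everyone else responds honestly.

```python
def observed_share(true_share: float, shy_fraction: float) -> float:
    """Candidate A's share among people who actually answer the poll.

    Hypothetical model: `shy_fraction` of A's supporters decline to answer
    and simply drop out of the sample; everyone else answers truthfully.
    """
    answering_a = true_share * (1 - shy_fraction)
    answering_others = 1 - true_share
    return answering_a / (answering_a + answering_others)

for shy in (0.0, 0.05, 0.10):
    print(shy, round(observed_share(0.50, shy), 3))
```

In a true 50/50 race, 10% shy supporters make A look about 2.6 points weaker than they really are. Voters who actively lie, naming the other candidate instead of refusing, would widen the gap further.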

He thinks shy voters were a big factor in the Scottish independence and EU referendums as well as in Trump’s success. “Pollsters need a more complete solution, a system that can track millions of separate data points on Twitter, Facebook, Google and YouTube, constantly monitoring public engagement and excitement about the candidates.” He calls it a ‘system that never sleeps’.
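Cameron-Webb doesn’t describe how such a system would work internally. As one hypothetical building block, a continuously updated exponential moving average can smooth an endless stream of sentiment scores so that a genuine shift in mood stands out from day-to-day noise. The scores and the `alpha` value are illustrative.

```python
from typing import Iterable, Iterator

def smoothed_sentiment(scores: Iterable[float], alpha: float = 0.3) -> Iterator[float]:
    """Yield an exponential moving average of a (possibly endless) score stream.

    Higher `alpha` reacts faster to new scores; lower `alpha` smooths more.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    average = None
    for score in scores:
        average = score if average is None else alpha * score + (1 - alpha) * average
        yield average

daily_scores = [0.1, 0.2, -0.4, 0.3, 0.5]  # hypothetical net sentiment per day
print([round(x, 3) for x in smoothed_sentiment(daily_scores, alpha=0.5)])
```

Because it is a generator, the same function can consume a live feed indefinitely, emitting an updated reading as each new score arrives.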

Small margins

Although he was one of many pollsters that got it wrong, Nate Silver underlined just how difficult it is to forecast elections in his post-election blog by investigating what would have happened if just one out of every 100 voters shifted from Trump to Clinton. “That would have produced a net shift of 2 percentage points in Clinton’s direction,” he writes, “giving her a total of 307 electoral votes … and she’d have won the popular vote by 3 to 4 percentage points, right where the final national polls had the race and in line with Obama’s margin of victory in 2012.” 
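The arithmetic behind Silver’s “net shift of 2 percentage points” is simple: every voter who switches takes one vote away from one candidate and adds it to the other, so moving 1% of voters changes the margin by 2 points. A minimal sketch, using hypothetical vote shares rather than the actual 2016 results:

```python
def shift_voters(share_a: float, share_b: float, fraction: float):
    """Move `fraction` of all voters from candidate B to candidate A.

    Shares are fractions of the total vote. Returns A's new share, B's new
    share and A's margin over B.
    """
    if fraction > share_b:
        raise ValueError("cannot move more voters than B has")
    new_a = share_a + fraction
    new_b = share_b - fraction
    return new_a, new_b, new_a - new_b

before_margin = 0.47 - 0.46  # hypothetical 1-point lead for A
_, _, after_margin = shift_voters(0.47, 0.46, 0.01)
print(round(after_margin - before_margin, 4))  # 0.02
```

This covers the popular vote only. Turning the shift into Silver’s 307 electoral votes depends on applying it in each state, where it only matters if the state was close.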

The lines are very fine in voter forecasting, but the use of sentiment analysis to constantly monitor the change in public mood could result in terrifying tech-led constitutional change. For if social media can be used to predict elections, why not use it to automatically elect a presidential candidate, or decide a referendum? Or even to justify contentious decisions when in office, using it as an ongoing and refreshed mandate?

“We’ll soon be capable of determining the result of an election with extremely high precision and accuracy,” says Jepson, who wonders if an election result could be produced just by sentiment analysis of social media. We already let algorithms choose what news we read and what music we listen to, so why not elect politicians by algorithm? “Who knows, maybe the election process itself will become irrelevant in the future.”
