The Twitter data in the week up to the referendum, including polling day, clearly showed a bias towards leave, contrary to most traditional polls.
The base tool was configured within three days to analyse the data and present the findings live. It scaled to over 150 tweets/second.
Interact with the graphs on the right to see how sentiment changed over time, and how turbulent the Referendum night itself was.
[Graphs: average sentiment on polling day; sentiment over time; count of tweets]
How did we do it?
In the run-up to the EU Referendum, we wanted to use public opinion from across the globe to see if it was possible to create a low-cost, real-time analytics engine. We took the concept from idea to live in less than 10 working days.
The tool is a reusable solution that organisations can customize to their needs: it is low cost, easy to use and quick to deliver.
Organisations can leverage the power of Big Data and social media by themselves without the need to engage expensive digital agencies. We set out to see if we could quickly and cheaply disrupt this market.
In under two weeks, and for less than £10K, we built a reusable solution that can be set up within a few hours to monitor and display the results of a real-time event.
- Planned and agreed the desired outcome. Agreed on feature prioritisation and set deadlines for our minimum viable product (MVP).
- Commenced UX Research to create initial wireframes.
- Set up the core social analytics engine.
- Analysed Twitter, searching for specific keywords (around 15 keywords related to the target topic).
- Ran sentiment analysis on the Twitter results, based on a sentiment lexicon. Sentiment was calculated on a per-tweet basis, immediately after each tweet was collected.
- Saved data into a database (AWS Aurora DB) including: date of tweet, language, author, content and, of course, sentiment.
- To link the front-end web page and the database, we made use of two Amazon Web Services tools: API Gateway and Lambda.
- In API Gateway we created a URL which, when requested, triggers a Lambda function written in Python. The Lambda function queries the database, does some light calculations, and returns the results as JSON.
- These results make up the body of the URL's response, which the front-end code on the web page then reads and displays.
- Created interactive front-end UI from UX wireframes.
- Integrated front-end code into SPARCK Live site.
- Linked front-end code to the live system, and tested it against a live stream of >300 tweets per second to verify integrity under load.
- Launched SPARCK LAB to the public through http://sparck.io/lab
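The lexicon-based, per-tweet scoring step in the list above can be illustrated with a minimal sketch. The lexicon entries, the tokeniser and the averaging rule below are all assumptions made for illustration. The article does not say which lexicon or scoring formula was used.

```python
import re

# Hypothetical miniature lexicon: word -> sentiment weight.
# A real sentiment lexicon would contain thousands of entries.
LEXICON = {
    "good": 1.0,
    "great": 2.0,
    "win": 1.0,
    "bad": -1.0,
    "terrible": -2.0,
    "lose": -1.0,
}


def tokenize(text):
    """Lower-case the tweet and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def score_tweet(text):
    """Return the mean weight of the lexicon words found in the tweet.

    Tweets containing no lexicon words score 0.0 (treated as neutral).
    """
    hits = [LEXICON[word] for word in tokenize(text) if word in LEXICON]
    return sum(hits) / len(hits) if hits else 0.0
```

Each incoming tweet would be passed through a function like `score_tweet` as soon as it is collected, and the resulting score stored alongside the tweet's date, language, author and content.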
What technologies did we use, and why?
Amazon Web Services (AWS) was the obvious choice for an agile, scalable, secure and cost-effective environment. Before deploying any solution, we put a great deal of planning and evaluation into the architecture, the selection of technology components and all related aspects of integration. Every design was carefully reviewed and evaluated against AWS best practices and the AWS Well-Architected Framework and its pillars: Security, Reliability, Performance Efficiency and Cost Optimisation. While a full ISO 27001 ISMS (Information Security Management System) would be too much for this simple project, several advanced AWS checklists were used to ensure the AWS environment is configured in an agile, standard, secure and consistent manner, so that good security and monitoring practices can be built upon regardless of future changes. Examples of these best practices include how the AWS account is configured (separate from the billing account), monitored (AWS CloudTrail/CloudWatch and external), documented (Confluence), how access is governed (complex passwords, MFA, central break-glass) and how critical data is stored (separate AWS account/off-site in a secure global repository).
Lambda offers a cheap, often free, stateless way to run a large volume of low-intensity calculations. In our specific use case, we use it to poll the Aurora database and work out the number of positive, neutral and negative tweets we have collected. The service allows up to 1 million requests a month for free, and charges $0.20 per million thereafter. As long as a job doesn't exceed a certain RAM limit or 300 seconds, it is an incredibly cost-effective way to run calculations very quickly.
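As a rough sketch of what such a Lambda function could look like, the handler below counts positive, neutral and negative tweets and returns the totals as JSON. It is not the actual function. The thresholds, the `fetch_scores` placeholder (which stands in for the Aurora query) and the response shape (which assumes API Gateway's Lambda proxy integration) are all assumptions.

```python
import json
from collections import Counter

# Hypothetical thresholds separating positive, neutral and negative scores.
POSITIVE_THRESHOLD = 0.05
NEGATIVE_THRESHOLD = -0.05


def classify(score):
    """Map a numeric sentiment score to a label."""
    if score > POSITIVE_THRESHOLD:
        return "positive"
    if score < NEGATIVE_THRESHOLD:
        return "negative"
    return "neutral"


def fetch_scores():
    """Placeholder for the database query (an assumption).

    A real deployment would run a SELECT against the Aurora database;
    fixed sample values are returned here so the sketch runs on its own.
    """
    return [0.6, -0.4, 0.0, 0.2, -0.1]


def lambda_handler(event, context):
    """Entry point with the (event, context) signature Lambda uses for Python."""
    counts = Counter(classify(score) for score in fetch_scores())
    body = {
        "positive": counts["positive"],
        "neutral": counts["neutral"],
        "negative": counts["negative"],
        "total": sum(counts.values()),
    }
    # Response shape assumed here: API Gateway Lambda proxy integration.
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(body),
    }
```

Keeping the handler stateless, with all data coming from the database on each request, is what lets Lambda run as many copies in parallel as the traffic requires.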
Aurora RDS was chosen because it is an AWS database engine that combines speed and high reliability with simplicity and cost-effectiveness. It delivers up to five times the throughput of a standard MySQL database running on the same hardware (source: Amazon AWS).
For out-of-the-box monitoring, CloudTrail was enabled at a global level (so that any services used in other regions in future would be covered, avoiding potential gaps in logging), and the logs are kept securely in an AWS S3 bucket with MFA delete protection and life cycle archiving. Each API call is logged and each log file is validated through the use of digest files, making it possible to detect whether any log file has been changed, deleted or modified since its delivery.
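A trail with these properties could be set up along the following lines. This is a sketch only: the article does not say how the trail was created, and the trail and bucket names in the usage note are hypothetical. The parameter names are those of CloudTrail's CreateTrail API as exposed by boto3. MFA delete and life cycle archiving are settings on the S3 bucket itself and are not covered here.

```python
def trail_settings(name, bucket):
    """Build the keyword arguments for CloudTrail's CreateTrail call."""
    return {
        "Name": name,
        "S3BucketName": bucket,
        # Record events from every region, including ones not yet in use.
        "IsMultiRegionTrail": True,
        # Include events from global services such as IAM.
        "IncludeGlobalServiceEvents": True,
        # Produce digest files so tampering with log files can be detected.
        "EnableLogFileValidation": True,
    }


def enable_trail(cloudtrail_client, name, bucket):
    """Create the trail and start logging via a boto3 CloudTrail client."""
    cloudtrail_client.create_trail(**trail_settings(name, bucket))
    cloudtrail_client.start_logging(Name=name)
```

With boto3 installed and credentials configured, it could be called as `enable_trail(boto3.client("cloudtrail"), "example-trail", "example-log-bucket")`.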
CloudWatch was used for alarms and notifications, as it allows alerting through alarms and the built-in Simple Notification Service.
In short, the chosen technologies allow for an agile environment that is secure, high-performing, elastic and extremely cost-effective.