2017 UK GENERAL ELECTION
The base tool was configured within three days to analyse the data and present the findings live. It scaled to over 150 tweets/second.
Interact with the graphs on the right to see how sentiment changed over time, and how turbulent the Referendum night itself was.
Sentiment analysis for UK Parties
How did we do it?
As with the US Presidential Election and the EU Referendum before it, we wanted to analyse mass public opinion on the four main parties. We took the concept from idea to live in 10 working days, with some modifications to the system. The tool is a reusable solution that organisations can customise to their needs: it is low cost, easy to use and quick to deliver.
- Planned and agreed the desired outcome; agreed feature prioritisation and set deadlines for our minimum viable product (MVP).
- Commenced UX Research to create initial wireframes.
- Set up the core social analytics engine.
- Analysed Twitter, searching for specific keywords (around 15 keywords related to the target topic).
- Wrote a Python script for offline classification of tweets to build a corpus.
- Saved data into a database (AWS Aurora DB), including: date of tweet, language, author, content and, of course, sentiment.
- To link the front end web page and the database, we made use of two Amazon Web Services tools: API Gateway and Lambda.
- From API Gateway we created a URL which, upon request, triggers a Lambda function written in Python. The Lambda function queries the database, performs some light calculations, and returns the results as JSON.
- These results form the body of the URL's response, which the front-end then works with and displays on the webpage.
- Tweets are re-loaded into the database with the new sentiment analysis to ensure sentiment classifications are relevant for the current election.
- Created interactive front-end UI from UX wireframes.
- Integrated front-end code into SPARCK Live site.
- Made a couple of minor production changes for performance.
- Linked the front-end code to the live system, and tested it against a live stream of over 300 tweets per second to verify integrity under load.
- Launched SPARCK LAB to the public through http://sparck.io/lab
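The offline classification step above can be sketched as a simple keyword-based scorer. This is a minimal illustration only: the keyword lists, function names and sample tweets are placeholder assumptions, not the production lexicon or model.

```python
# Minimal sketch of offline tweet sentiment classification.
# The keyword lists below are illustrative placeholders, not the real lexicon.

POSITIVE = {"great", "win", "strong", "hope", "support"}
NEGATIVE = {"weak", "lose", "chaos", "fear", "fail"}

def classify(tweet_text: str) -> str:
    """Label a tweet positive/negative/neutral by counting keyword hits."""
    words = {w.strip(".,!?#@").lower() for w in tweet_text.split()}
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

# Building a labelled corpus from raw tweet text (sample data for illustration).
corpus = [
    "Great turnout today, real hope for change!",
    "Total chaos at the count tonight",
]
labelled = [(text, classify(text)) for text in corpus]
```

In practice a scorer like this would only bootstrap the corpus; the labelled data can then train a proper classifier.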
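The API Gateway to Lambda to database flow described in the steps above can be sketched as follows. Here sqlite3 stands in for Aurora MySQL so the example is self-contained, and the table and column names are assumptions for illustration, not the real schema.

```python
import json
import sqlite3

# Stand-in for the Aurora database; in production the Lambda function
# would connect to Aurora via a MySQL client library instead.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tweets (party TEXT, sentiment TEXT)")
conn.executemany("INSERT INTO tweets VALUES (?, ?)", [
    ("labour", "positive"), ("labour", "negative"),
    ("conservative", "positive"), ("conservative", "positive"),
])

def lambda_handler(event, context):
    """Triggered by an API Gateway request: query the database,
    aggregate sentiment counts, and return the results as JSON."""
    rows = conn.execute(
        "SELECT party, sentiment, COUNT(*) FROM tweets "
        "GROUP BY party, sentiment"
    ).fetchall()
    results = {}
    for party, sentiment, count in rows:
        results.setdefault(party, {})[sentiment] = count
    # API Gateway proxy integrations expect this response shape;
    # the JSON body becomes the body of the URL's response.
    return {"statusCode": 200, "body": json.dumps(results)}
```

The front-end simply requests the API Gateway URL and renders the returned JSON in the interactive graphs.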
What technologies did we use, and why?
Amazon Web Services (AWS) was the obvious choice for an agile, scalable, secure and cost-effective environment. Prior to deploying any solution, a great deal of planning and evaluation took place around the architecture, the selection of technology components and all related aspects of integration. Every design was carefully reviewed against AWS best practices and the AWS Well-Architected Framework and its pillars: Security, Reliability, Performance Efficiency and Cost Optimisation. While a full ISO 27001 ISMS (Information Security Management System) would be excessive for this simple project, several advanced AWS checklists were used to ensure the AWS environment is configured in a standard, secure and consistent manner, allowing us to build on good security and monitoring practices regardless of future changes. Examples of these best practices include how the AWS account is configured (separate from the billing account), monitored (AWS CloudTrail/CloudWatch and external tooling), documented (Confluence), how access is governed (complex passwords, MFA, central break-glass accounts) and how critical data is stored (in a separate AWS account, off-site in a secure global repository).
Aurora RDS was chosen as it is an AWS database engine that combines speed and high reliability with simplicity and cost-effectiveness. It delivers up to five times the throughput of a standard MySQL database running on the same hardware (source: Amazon AWS).
For out-of-the-box monitoring, CloudTrail was enabled at a global level (in case any future services are used in other regions, avoiding potential gaps in logging), with logs kept securely in an AWS S3 bucket with MFA delete protection and lifecycle archiving. As such, each API call is logged and each log entry is validated through digest files, allowing us to detect whether any log files were changed, deleted or modified after delivery.
CloudWatch was used for alarms and notifications, as it allows alerting through alarms and integrates with the Simple Notification Service (SNS).
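As an illustration of this alerting setup, a CloudWatch alarm wired to an SNS topic can be created with boto3 along these lines. This is a configuration sketch: the alarm name, metric, thresholds and topic ARN are placeholder assumptions, not the project's actual values.

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# Placeholder values -- real alarm names, metrics and the SNS topic ARN
# belong to the project's own AWS account.
cloudwatch.put_metric_alarm(
    AlarmName="aurora-high-cpu",
    Namespace="AWS/RDS",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "DBClusterIdentifier", "Value": "sparck-lab"}],
    Statistic="Average",
    Period=300,               # evaluate over 5-minute windows
    EvaluationPeriods=2,      # two consecutive breaches before alerting
    Threshold=80.0,
    ComparisonOperator="GreaterThanThreshold",
    # Notify the team through the SNS integration when the alarm fires.
    AlarmActions=["arn:aws:sns:eu-west-1:123456789012:sparck-alerts"],
)
```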
In short, the chosen technologies allow for an agile environment that is secure, highly performant, elastic and extremely cost-effective.