Project Overview
My Google Summer of Code project aimed to create a Twitter bot that can take part in debates on Twitter, find tweets and debates that are overly angry, and try to calm them down. A second task was to take the existing Seed application, turn it into a Node module and add some capabilities to it. This secondary aim tied in nicely with the main focus, as Seed is used for response generation. All of the aims of the project were fulfilled. However, there is still potential for improvement and for future work extending the bot's capabilities.
Twitter Bot
The Twitter bot combines a Node.js frontend with a Python backend. The backend analyses tweets using pretrained neural network models. It is implemented as a microservice: a Flask server routes requests to the relevant analysers. Apart from sending tweet analysis requests, the frontend connects to Twitter to read and send tweets, and to Seed to generate responses. A detailed description of all of its parts can be found in the official GSoC repo.
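The routing idea behind the backend could be sketched roughly as follows. This is a minimal illustration in plain Python (no Flask) so that it runs standalone. The route names `/analyse/anger` and `/analyse/topic`, the function names and the keyword-based placeholder analysers are all assumptions made for this example; the real backend uses pretrained neural networks behind a Flask server, and its actual code is in the GSoC repo.

```python
import json
from typing import Callable, Dict, Tuple


def analyse_anger(text: str) -> dict:
    # Placeholder only: the real analyser is a pretrained neural network.
    angry_words = {"hate", "stupid", "idiot", "furious"}
    words = [w.strip(".,!?").lower() for w in text.split()]
    hits = sum(1 for w in words if w in angry_words)
    return {"anger": min(1.0, hits / 2)}


def analyse_topic(text: str) -> dict:
    # Placeholder only: the real analyser is a pretrained neural network.
    topic = "politics" if "election" in text.lower() else "other"
    return {"topic": topic}


# Hypothetical route table: one entry per analysis task.
ANALYSERS: Dict[str, Callable[[str], dict]] = {
    "/analyse/anger": analyse_anger,
    "/analyse/topic": analyse_topic,
}


def handle_request(path: str, body: str) -> Tuple[int, dict]:
    """Dispatch a JSON request body to the analyser registered for `path`."""
    analyser = ANALYSERS.get(path)
    if analyser is None:
        return 404, {"error": f"unknown route {path}"}
    try:
        text = json.loads(body)["text"]
    except (json.JSONDecodeError, KeyError, TypeError):
        return 400, {"error": "expected a JSON body with a 'text' field"}
    if not isinstance(text, str):
        return 400, {"error": "'text' must be a string"}
    return 200, analyser(text)


if __name__ == "__main__":
    request_body = json.dumps({"text": "I hate this stupid idea!"})
    print(handle_request("/analyse/anger", request_body))
```

In a Flask version, each entry of the route table would typically become a route handler, and the frontend would send the tweet text as a JSON POST body.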
Dataset
Since the task at hand is quite specific, custom-made datasets were needed. There are two datasets in total, one for each analysis task. final_dataset.csv contains about 7000 tweets, downloaded, cleaned and annotated by me, with columns for text, topic and anger level. anger_dataset.csv contains 5000+ tweets, split 50/50 into angry and non-angry ones. These are partially outsourced and partially taken from final_dataset.csv. The code for creating the datasets is in the GSoC repository, along with both datasets.
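As an illustration of how a balanced angry/non-angry set might be derived from annotated tweets, here is a small sketch using only the standard library. It is not the dataset-creation code from the repository. The column names (`text`, `topic`, `anger`), the numeric anger scale, the 0.5 threshold and the sample rows are all assumptions made for this example.

```python
import csv
import io
import random

# Made-up sample rows; the column names are assumed, not taken from the real file.
SAMPLE_CSV = """text,topic,anger
I am absolutely furious about this decision,politics,0.9
Lovely weather today,weather,0.0
This referee is a disgrace,sports,0.8
Looking forward to the match,sports,0.1
Interesting article on tax reform,politics,0.2
"""


def load_rows(csv_text):
    """Parse CSV text into dicts with a numeric anger level."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        {"text": r["text"], "topic": r["topic"], "anger": float(r["anger"])}
        for r in reader
    ]


def balance_by_anger(rows, threshold=0.5, seed=0):
    """Downsample the larger class so angry and non-angry rows are split 50/50."""
    angry = [r for r in rows if r["anger"] >= threshold]
    calm = [r for r in rows if r["anger"] < threshold]
    n = min(len(angry), len(calm))
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    balanced = rng.sample(angry, n) + rng.sample(calm, n)
    rng.shuffle(balanced)
    return [
        {"text": r["text"], "angry": int(r["anger"] >= threshold)}
        for r in balanced
    ]


if __name__ == "__main__":
    for row in balance_by_anger(load_rows(SAMPLE_CSV)):
        print(row)
```

Downsampling the larger class is only one way to reach an even split; adding more examples of the smaller class from other sources is another.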
Context
There are many approaches to de-escalation in online settings. In this project we decided to follow the rules of nonviolent communication and conflict resolution. This type of communication aims for a calm discourse with the other party, regardless of their manner of communication. The aim is to shift the focus from the sentiment (anger, hate and so on) to the real source of the sentiment. It is characterized by honest attempts to understand what the other party feels and why. This can be achieved, for example, by asking questions about the topic, giving the other party space to express themselves, and working towards the root of the actual problem. The ability to analyse the topic of a tweet, together with access to the historic anger levels of the conversation, makes this Twitter bot quite adept at carrying out such tasks. With the use of Seed, the generated text is random and human-like enough that the 'botness' of the bot is less of a problem.
Pretotyping
Pretotyping is a type of pre-release product testing that focuses on making sure the final product is worth making, rather than on how to make it. Before releasing the bot into the wild waters of the internet, I ran a number of sessions on a Twitter account called OpenDiscussion, during which I acted like a bot myself: looking for angry tweets to respond to, taking detailed notes of each step and 'generating' responses. This let us see how people would actually react to our bot, which sorts of interaction make them even angrier and which seem to help. It also made us aware of many pitfalls that the bot would have to deal with. The notes can be found in the pretotyping subfolder of the official repo.
All in all, the pretotyping sessions were a success. Although the response rates were not that great (people mostly did not respond), this is not an issue for a bot. When people did respond, however, in the majority of cases the anger levels dropped or disappeared completely. Being a bot for a while, trying to act without any preconceived notions or opinions of my own, I could see that the apparent flaws of a bot in discussions with humans (no preconceived notions or opinions, does not experience anger, does not get offended) are also its biggest strengths. By not getting angry itself, the bot automatically lowers the anger level of the discussion, making it more civil and calm.
Seed Improvements
Over the course of this project, the original Seed application was turned into a Node module, which is now available as seedtext on the npm registry; its source is in the seedtext GitHub repository. It has also been extended with conditional generation, mentioned in a GitHub issue of the original Seed app, and with the option to either define custom methods for importing Seed sketches or use the built-in ones.
By default, Seed generates totally random variations of the text. Conditional generation lets this randomness be controlled. The Twitter bot uses it to vary its wording and tone depending on the anger level range (using words like 'upset' when the anger level is between 0.5 and 0.6, and words like 'enraged' when it is above 0.9). This makes the generated text much more subtle and human-like.
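The idea of conditioning word choice on the anger range can be illustrated with a short Python sketch. This is not seedtext syntax or its API (seedtext is a Node module with its own sketch format). Only the 'upset' and 'enraged' ranges come from the description above; the middle band, the extra words, the reply template and the function names are assumptions made for this example.

```python
import random
from typing import Optional

# Word pools per anger band. Only 'upset' and 'enraged' are from the report;
# the other words and the whole 'medium' band are illustrative assumptions.
WORDS = {
    "low": ["upset", "bothered"],
    "medium": ["angry", "frustrated"],
    "high": ["enraged", "furious"],
}


def anger_band(anger: float) -> Optional[str]:
    """Map an anger level in [0, 1] to a band, or None if no reply is needed."""
    if anger > 0.9:
        return "high"
    if anger >= 0.6:  # assumed band filling the gap between the two known ranges
        return "medium"
    if anger > 0.5:
        return "low"
    return None


def generate_reply(anger: float, topic: str, rng: random.Random) -> Optional[str]:
    """Pick a random word from the pool that matches the anger band."""
    band = anger_band(anger)
    if band is None:
        return None
    word = rng.choice(WORDS[band])
    return f"It sounds like you are {word} about {topic}. What is behind that?"


if __name__ == "__main__":
    rng = random.Random()
    for level in (0.55, 0.75, 0.95):
        print(level, generate_reply(level, "the new policy", rng))
```

The choice within a band stays random, so replies still vary, while the band itself keeps the tone matched to the measured anger.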
The changes to sketch importing and the transformation into a Node module make Seed available to a much wider audience. Previously, the capabilities of the application were tied to the web environment; now it can be used virtually anywhere Node.js runs.
Future Work
There are several ways in which my summer work on this GSoC project could be extended. One is to extend the capabilities of the Twitter bot: currently, the bot lacks a comprehensive database solution for participating in true debates. Another is to improve the Seed source files, making the generated text better. The analysis backend and its neural networks could also be improved. At the moment the test set accuracy for topic analysis is around 81%. With a bigger and better dataset, this could likely be improved.
I have already talked with my supervisor about ways in which I could collaborate on the project in the future. We talked about rewriting the original Seed repository so that it uses the new seedtext module, and about adding the possibility of creating bots with the push of a button to the web application that the original Seed currently has. I am looking forward to helping make these plans a reality.
Acknowledgements
At the end of this report, I would like to express my thanks to the whole team of people from CLiPS and beyond whom I had the honour to meet this summer. A special shoutout goes to my supervisor, Frederik, who has been really great over the whole duration of the internship. I am also happy that I got the chance to meet other GSoC students this year and I hope we'll stay in touch in the future. Last, but not least, I would like to thank Google for making GSoC possible. It was a wonderful experience.