Improving political discourse analysis on Twitter with context analysis


In this study, we propose a new approach to political discourse analysis on social media platforms, based on a categorisation schema widely used in political science: the Comparative Manifestos Project’s category schema. This schema has traditionally been used for content analysis of political manifestos, assigning each phrase in a manifesto a code that indicates its domain or category. In this work we apply the same technique to Twitter, using as training data 100 publicly available annotated political manifestos in English, comprising around 85,000 annotated sentences. We also measure how much 5,000 annotated tweets improve the performance of a political discourse classifier already trained on political manifestos. Our main finding is that the two datasets (political manifestos and annotated tweets) can be combined to achieve better results, with improvements in the F-Measure of more than 15 points. Moreover, we analyse whether contextual information, such as the previous tweet or the political affiliation of the tweet’s author, improves the classifier’s performance, as has already been shown for manifesto classification. To this end we introduce a novel method for representing political parties, and we find that adding the previous tweet or the political leaning as contextual data does improve performance. Finally, we apply the proposed approach to analyse the 2016 United States presidential elections on Twitter.
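As an illustration of the data setup described above, the following minimal sketch shows one way manifesto sentences and annotated tweets could be pooled into a single training set, with the previous text and the party attached as context. It is not the method used in this study: the `Example` class, the `[PARTY=...]` and `[PREV]` markers, the function names and the sample sentences are all hypothetical, and the party is encoded as a plain text tag rather than with the party representation proposed in the paper.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Example:
    """One annotated unit: a manifesto sentence or a tweet (hypothetical type)."""
    text: str
    label: str                      # category code or domain name
    party: Optional[str] = None     # political affiliation of the author
    previous: Optional[str] = None  # preceding sentence or tweet, if any


def build_input(ex: Example, use_party: bool = True, use_previous: bool = True) -> str:
    """Prefix the text with optional context (assumed encoding, for illustration)."""
    parts = []
    if use_party and ex.party:
        parts.append(f"[PARTY={ex.party}]")
    if use_previous and ex.previous:
        parts.append(f"{ex.previous} [PREV]")
    parts.append(ex.text)
    return " ".join(parts)


def combine(manifestos: List[Example], tweets: List[Example]) -> List[Tuple[str, str]]:
    """Pool both sources into (input text, label) pairs for a single classifier."""
    return [(build_input(ex), ex.label) for ex in manifestos + tweets]


# Invented sample data, used only to show the shape of the inputs.
manifestos = [
    Example("We will cut taxes.", "Economy", party="PartyA",
            previous="Our plan is simple."),
]
tweets = [
    Example("Strong alliances keep us safe.", "External Relations", party="PartyB"),
]
training_pairs = combine(manifestos, tweets)
```

The resulting pairs can be passed to any text classifier. Setting `use_party` or `use_previous` to `False` in `build_input` gives a context-free baseline against which the effect of each kind of context can be compared.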