Analyzing the Existence of Organization Specific Languages on Twitter

Abstract

The presence of organisations in Online Social Networks (OSNs) has motivated malicious users to look for attack vectors that increase their chances of carrying out successful attacks and obtaining either private information or access to the organisation. This article hypothesises that organisations have specific languages that their members use in OSNs, which malicious users could potentially exploit to carry out impersonation attacks. To test for the existence of these organisation-specific languages, we propose two tasks: classifying tweets in isolation by their author’s organisation, and classifying users’ entire timelines by organisation. To accomplish both tasks, we generate a dataset of over 15 million tweets from five organisations and apply language-dependent models to test our hypothesis. Our results and ablation study show that it is possible to classify tweets and users by organisation with more than three times the performance achieved by a traditional ML algorithm, indicating substantial potential for predicting the linguistic style of tweets.