Identifying Palestinian Political Content from Arabic Tweets

  • Hussam ElKurd -----> Dr. Rebhi Baraka

News agencies and media outlets now use Twitter widely, and a considerable proportion of
tweets reflects social perspectives in the real world. Moreover, many people follow the
news on Twitter, which motivates news agencies to analyze what is happening on the
platform. Press and media agencies are looking for efficient tools to analyze and classify
tweets because manual approaches are difficult and costly. Various works have proposed
effective solutions for processing tweets into machine-readable formats for classification
and analysis. While such research has received much attention for languages such as
English, other languages such as Arabic have received little research attention despite
the wide spread of Twitter in the Arab world in general and in Palestine in particular.
In this thesis we propose a machine learning approach to automatically classify Arabic
tweets related to Palestinian political topics. We focus on Palestinian political topics
because Palestine receives great attention in Arab news and social media. The approach
starts by collecting tweets using an application that we develop based on TwitterPalPol.
It collects tweets from different Twitter APIs according to specific factors such as
keywords, region, and language. We then process the collected tweets and manually label
part of them as Palestinian political or not Palestinian political, so that they can serve
as the training data set for the selected machine learning algorithm, which is then used
to classify new tweets automatically. In addition, we create two data sets for learning:
the first includes all the collected tweets prepared for learning, and the second includes
only tweets that pass a credibility filter, which evaluates the credibility of each tweet
and discards fake tweets. The filter depends on many factors related to tweet properties.
We compare the classification results on both data sets to determine the importance of
the filter. The results were satisfactory, ranging between 80% and 97% on the main
classification measures such as recall and precision.
Keywords: Machine Learning, Text Processing, Text Classification, Twitter.
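
The pipeline summarized above (manual labelling, an optional credibility filter, training,
and evaluation with precision and recall) can be illustrated with a minimal sketch. It is
not the thesis implementation. The abstract does not name the learning algorithm or the
filter factors, so the multinomial Naive Bayes classifier, the `followers` and `verified`
fields, the `min_followers` threshold, and the four toy tweets are all assumptions chosen
only for illustration.

```python
import math
from collections import Counter

def tokenize(text):
    # Whitespace tokenization only; real Arabic preprocessing would do more.
    return text.split()

class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (assumed algorithm)."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set().union(*self.word_counts.values())
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best, best_score = None, -math.inf
        for c in self.classes:
            score = math.log(self.doc_counts[c] / total_docs)
            total_words = sum(self.word_counts[c].values())
            for w in tokenize(text):
                if w in self.vocab:  # words never seen in training are skipped
                    score += math.log((self.word_counts[c][w] + 1)
                                      / (total_words + len(self.vocab)))
            if score > best_score:
                best, best_score = c, score
        return best

def is_credible(tweet, min_followers=50):
    # Hypothetical rule: the abstract does not list the actual filter factors.
    return tweet["verified"] or tweet["followers"] >= min_followers

def precision_recall(y_true, y_pred, positive):
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Toy, manually labelled tweets (invented for this sketch).
labelled = [
    {"text": "غزة القدس الاحتلال", "label": "political", "followers": 900, "verified": False},
    {"text": "فلسطين القدس مفاوضات", "label": "political", "followers": 10, "verified": True},
    {"text": "مباراة كرة القدم اليوم", "label": "other", "followers": 300, "verified": False},
    {"text": "طقس اليوم مشمس", "label": "other", "followers": 5, "verified": False},
]
# Second data set: only tweets that pass the credibility filter.
filtered = [t for t in labelled if is_credible(t)]

def train(tweets):
    return NaiveBayes().fit([t["text"] for t in tweets],
                            [t["label"] for t in tweets])

model_all = train(labelled)
model_filtered = train(filtered)
```

Training one model per data set mirrors the comparison described in the abstract: the same
held-out tweets can be passed to `model_all.predict` and `model_filtered.predict`, and the
two sets of predictions scored with `precision_recall` to see whether the filter helps.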