Guillaume SAUSSAY's Portfolio | Projects - Data Analyses


Data Analyses

In this section, you can find the analysis projects I carried out during my studies as well as my personal ones.
For the moment, only one of them is listed, but more are coming! You can browse them by selecting the section for each project.
Do not hesitate to contact me if you have any questions about them!

Political Sentiment Analysis

During my studies at the Illinois Institute of Technology, I carried out an analysis project on the potential correlations between the mood of the American people and its political implications. My study was divided into several parts:

Collection of the sentiments expressed by Twitter users according to the time of year;

Aggregation of political tweets reflecting users' opinion of a particular party;

Analysis of the possible correlations between the two previous data sets;

Examination of whether the results of the previous analyses can be related to the known results of the local and midterm elections in the United States.

For that purpose, I used Amazon Web Services, in particular Amazon Elastic Compute Cloud (EC2), a web service that provides scalable computing capacity in the cloud, and Amazon Elastic MapReduce (EMR), a Hadoop-based framework for processing vast amounts of data with the MapReduce programming model.
The data I used consisted of more than 60 million tweets collected from 150,000 users over 10 months, from November 2010 to August 2011. I measured the overall sentiment of the tweets according to the time of year they were collected, as well as the sentiment expressed toward the major Democratic and Republican figures and toward the President of the United States at the time, President Obama.
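To give an idea of what lexicon-based sentiment measurement by time of year can look like in the MapReduce style, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions: the `LEXICON` dictionary, the tweet fields (`created_at`, `text`), and the function names `map_tweet` and `reduce_scores` are hypothetical, and the actual project ran on Hadoop via Amazon EMR with its own lexicon and preprocessing.

```python
from collections import defaultdict

# Hypothetical toy lexicon (word -> polarity); not the lexicon used in the project.
LEXICON = {"good": 1, "great": 2, "happy": 1, "bad": -1, "awful": -2, "sad": -1}


def map_tweet(tweet):
    """Map step: emit a (month, sentiment score) pair for one tweet."""
    month = tweet["created_at"][:7]  # "YYYY-MM" prefix of an ISO date string
    words = tweet["text"].lower().split()
    # Strip simple punctuation, then sum the polarity of every known word.
    score = sum(LEXICON.get(w.strip(".,!?#@"), 0) for w in words)
    return month, score


def reduce_scores(pairs):
    """Reduce step: average the scores emitted for each month."""
    totals = defaultdict(lambda: [0, 0])  # month -> [score sum, tweet count]
    for month, score in pairs:
        totals[month][0] += score
        totals[month][1] += 1
    return {month: s / n for month, (s, n) in totals.items()}


# Made-up example tweets, for illustration only.
tweets = [
    {"created_at": "2010-11-02", "text": "Great day, happy about the vote!"},
    {"created_at": "2010-11-15", "text": "Awful news today"},
    {"created_at": "2011-01-03", "text": "Feeling sad and bad."},
]
monthly_mood = reduce_scores(map_tweet(t) for t in tweets)
```

On a real cluster, the map and reduce steps would run in parallel over partitions of the tweet collection; here they run sequentially to keep the example self-contained.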

This project was really an excellent experience. I learned how to use Amazon Web Services and the Twitter Streaming API, how to apply machine learning and lexicon-based semantic analysis algorithms, and also how to implement these algorithms myself. All in all, I learned a lot in terms of applications, but I also gained experience in the approaches and mental agility needed to conduct a data mining project successfully. I personally retain the importance of the preprocessing steps, which often take a lot of time, in particular with such a big dataset, and the extreme importance of questioning every step: What are we seeking in doing this? Are the results what we expected? If not, why? Where could we have made a mistake? How can we make our analyses more reliable? And so on.

The results allowed me, in particular, to show a (small) decorrelation between the Twitter mood toward Republicans and the mood toward Democrats, and a strong correlation between these results and those of the US elections, which was an unexpectedly good result!
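As an illustration of how a correlation between two sentiment series can be quantified, here is a minimal sketch of the Pearson correlation coefficient in plain Python. The function name `pearson` and the two series are hypothetical placeholders, not results from the project, and the report may have relied on a different measure.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means (unnormalised covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if dev_x == 0 or dev_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (dev_x * dev_y)


# Placeholder monthly mood scores toward two parties (made-up numbers).
mood_party_a = [0.10, 0.25, 0.05, 0.30]
mood_party_b = [0.20, 0.15, 0.22, 0.12]
r = pearson(mood_party_a, mood_party_b)
```

A value of `r` close to 1 or -1 indicates a strong linear relationship between the two series, while a value close to 0 indicates little or none.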

You can find an extract of my final report here; do not hesitate to contact me if you have any questions about the methods I used, or my study in general.