Across the social sciences, there is increasing interest in automatically analysing texts to derive computable data. This course focuses on research that draws on internet-based data sources such as social media, online news, large digital archives, and public comments on news and products. This emerging field is called Computational Social Science, and it is rapidly becoming a sought-after specialization within the data and social sciences (Lazer et al., 2009).
This workshop provides insights into the basic concepts, challenges, and opportunities associated with data so large that traditional research methods (like manual coding) can no longer be applied. Instead, participants are introduced to strategies and techniques for analysing large quantities of text, and are then taught how to work with concrete examples and templates that can be shared and adapted for their own research projects in this field. Additionally, the class gives an introduction to the programming language Python, which can be used to apply those strategies in data analysis. Python is a versatile, general-purpose language that is regularly ranked among the most widely used programming languages today. Students can expect to come away with a clear understanding of Python’s basics, which can advance their academic and professional opportunities.
The beginning of the course will include an overview of the Python programming language, how it differs from statistical packages such as SPSS or Stata, and a look at the basic data types and structures Python uses. Common tools for working with Python (such as Jupyter notebooks) are introduced, and productive and convenient practices for using those tools are demonstrated to help students advance their skill sets and make their knowledge more marketable.
In the next segment, the basics of programming in Python will be demonstrated and discussed, followed by a session on how to conduct a sentiment analysis. After this, an introduction to the techniques needed for automated content analysis (reading, cleaning, and processing natural language) is given, followed by examples of how to apply supervised and unsupervised machine learning (which can, for example, be used to identify specific topics within texts and their values).
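To give a flavour of this segment, the following is a minimal sketch of a dictionary-based sentiment analysis, including a simple cleaning step. It is an illustration only and not necessarily the approach taught in the workshop: the word lists `POSITIVE` and `NEGATIVE` and the function names `tokenize` and `sentiment_score` are hypothetical, and a real project would rely on a validated sentiment lexicon.

```python
import re
from collections import Counter

# Hypothetical toy word lists (assumption); real analyses use validated lexicons.
POSITIVE = {"good", "great", "excellent", "love"}
NEGATIVE = {"bad", "terrible", "awful", "hate"}


def tokenize(text):
    """Clean the text: lowercase it and keep only alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def sentiment_score(text):
    """Return (positive hits - negative hits) / number of tokens.

    Empty texts get a score of 0.0.
    """
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    pos = sum(counts[word] for word in POSITIVE)
    neg = sum(counts[word] for word in NEGATIVE)
    return (pos - neg) / len(tokens)


if __name__ == "__main__":
    for comment in ["Great product, I love it!", "Terrible service. Awful food."]:
        print(comment, "->", sentiment_score(comment))
```

Dividing by the number of tokens keeps scores comparable between short and long texts; whether that normalisation suits a given research question is a design choice.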
In the last part of the class, web scraping and parsing for retrieving online content will be introduced and examined. At the end of the class, participants will be able to demonstrate these skills in a project of their choice, based on their own data or on content retrieved from the web.
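As an illustration of the parsing step, the sketch below extracts headlines from a small HTML snippet using only Python's standard library. It makes several assumptions: the class `HeadlineParser`, the function `extract_headlines`, and the `SAMPLE` page are hypothetical, headlines are assumed to sit in `<h2>` elements, and the download step is left out so that the example runs without network access. The workshop itself may use other tools.

```python
from html.parser import HTMLParser


class HeadlineParser(HTMLParser):
    """Collect the text found inside every <h2> element."""

    def __init__(self):
        super().__init__()
        self._in_h2 = False
        self.headlines = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self._in_h2 = True
            self.headlines.append("")  # start a new headline

    def handle_endtag(self, tag):
        if tag == "h2":
            self._in_h2 = False

    def handle_data(self, data):
        # Text inside nested tags (e.g. <em>) is appended to the open headline.
        if self._in_h2:
            self.headlines[-1] += data


def extract_headlines(html):
    """Parse an HTML string and return its <h2> texts, stripped of whitespace."""
    parser = HeadlineParser()
    parser.feed(html)
    parser.close()
    return [headline.strip() for headline in parser.headlines]


# Hypothetical stand-in for a downloaded news page (assumption).
SAMPLE = (
    "<html><body><h2>First story</h2><p>Text</p>"
    "<h2> Second <em>story</em></h2></body></html>"
)

if __name__ == "__main__":
    print(extract_headlines(SAMPLE))
```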
The workshop is designed to be practical and applied, with most of the teaching taking a hands-on approach. Alongside the lectures, students will be given time to engage with the code and apply it to specific projects and fields of interest.