Automated Content Analysis using Python

This workshop introduces Python, a valuable programming language that can be used to apply strategies for automated content analysis to large quantities of text. Python is an extremely versatile language, ranked among the most useful programming languages today. Students can expect to gain a clear understanding of Python’s basics, advancing their academic and professional opportunities.

Course level: Master, PhD candidates and professionals from all disciplines
Session 2: 11 January to 18 January 2020
Coordinating lecturer: Felicia Loecherbach
Form(s) of instruction: Workshops (lectures and learning sessions)
Form(s) of assessment: Completion of final project
ECTS: 3 credits
Contact hours: 30 hours
Tuition fee

€800 - non-VU students and staff

€500 - VU students and staff

This course is aimed at Master students and PhD candidates who are interested in learning Python, especially for analysing texts with methods of automated content analysis. The course covers the basics of Python programming and gives insights into sentiment analysis, opinion mining, (un)supervised machine learning, and data extraction through web scraping. It will especially interest students who want to engage fully with Python programming, and who want to learn how to overcome the challenges of organising data that cannot be handled using traditional methods. If you have doubts about your eligibility for the course, please contact us: graduatewinterschool@vu.nl.
No special requirements are needed.

Within the social sciences, there is an increasing interest in automatically analysing texts to derive computable data. This course, in particular, looks at research drawing on internet-based data sources such as social media, online news, large digital archives, and public comments on news and products. This emerging field of study is also called Computational Social Science, and it is rapidly becoming an extremely sought-after specialization within data and social sciences (Lazer et al., 2009).
This workshop will provide insights into the basic concepts, challenges, and opportunities associated with data so large that traditional research methods (like manual coding) can no longer be applied. Instead, participants are introduced to strategies and techniques for analysing large quantities of text, and are then taught how to work with concrete examples and templates that can be shared and modified for their own research projects. Additionally, the class gives an introduction to the valuable programming language Python, which can be used to apply those strategies in data analysis. Python is an extremely versatile language, ranked among the most useful programming languages today. Students can expect to gain a clear understanding of Python’s basics, advancing their academic and professional opportunities.

The course begins with an overview of Python, how it differs from statistical packages such as SPSS or Stata, and a look at the basic data types and structures Python uses. Common tools for working with Python (such as Jupyter notebooks) are introduced, and the most productive and convenient practices for using those tools are demonstrated to help students advance their skill sets and make their knowledge more marketable.
In the next segment, the basics of programming in Python will be demonstrated and discussed, followed by a session on how to conduct a sentiment analysis. After this, an introduction to the techniques needed for automated content analysis (reading, cleaning, and processing natural language) is given, followed by examples of how to apply supervised and unsupervised machine learning (which can, for example, be used to identify specific topics within texts and their values).
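To give a flavour of what cleaning text and scoring sentiment can look like, here is a minimal sketch of a dictionary-based approach using only the Python standard library. It is not taken from the course materials: the word lists, the function names `tokenize` and `sentiment_score`, and the example comments are all invented for illustration, and real projects typically rely on much larger, validated lexicons.

```python
import re
from collections import Counter

# Hypothetical mini-lexicon, invented for this illustration only.
POSITIVE = {"good", "great", "excellent", "love"}
NEGATIVE = {"bad", "poor", "terrible", "hate"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def sentiment_score(text):
    """Return (positive words - negative words) / total tokens.

    Empty texts get a score of 0.0.
    """
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    positive = sum(counts[word] for word in POSITIVE)
    negative = sum(counts[word] for word in NEGATIVE)
    return (positive - negative) / len(tokens)


# Made-up example comments standing in for real data.
comments = [
    "Great article, I love the analysis!",
    "Terrible reporting. Bad sources, poor writing.",
]

for comment in comments:
    print(round(sentiment_score(comment), 2), comment)
```

A score above zero suggests a mostly positive text and a score below zero a mostly negative one. This simple counting approach ignores negation and context, which is one reason machine learning methods are often used instead.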

In the last part of the class, web scraping and parsing for retrieving online content will be introduced and examined. At the end of the class, participants will be able to demonstrate these skills through a chosen project based on their own data or on content retrieved from the web. 
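As a rough illustration of the parsing step, the sketch below pulls headlines out of a small HTML snippet with the standard library's `html.parser` module. The HTML, the class name `HeadlineParser`, the helper `extract_headlines`, and the assumption that headlines sit in `<h2>` tags are all hypothetical; the course itself may use other tools, and a real page would first have to be downloaded.

```python
from html.parser import HTMLParser


class HeadlineParser(HTMLParser):
    """Collect the text found inside <h2> elements."""

    def __init__(self):
        super().__init__()
        self.in_headline = False
        self.headlines = []

    def handle_starttag(self, tag, attrs):
        # Start a new, empty headline whenever an <h2> opens.
        if tag == "h2":
            self.in_headline = True
            self.headlines.append("")

    def handle_endtag(self, tag):
        if tag == "h2":
            self.in_headline = False

    def handle_data(self, data):
        # Text inside nested tags (such as <em>) is appended as well.
        if self.in_headline:
            self.headlines[-1] += data


# Made-up page standing in for content retrieved from the web.
SAMPLE_HTML = """
<html><body>
  <h2>Election results announced</h2>
  <p>Some article text.</p>
  <h2>Local team wins <em>again</em></h2>
</body></html>
"""


def extract_headlines(html):
    """Return the stripped text of every <h2> element in the HTML string."""
    parser = HeadlineParser()
    parser.feed(html)
    parser.close()
    return [headline.strip() for headline in parser.headlines]


print(extract_headlines(SAMPLE_HTML))
```

The extracted headlines form a plain list of strings, so they can be passed straight on to cleaning and analysis steps.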
The workshop is constructed to be very practical and applicable, with most of the teaching using a hands-on approach. Apart from lectures, time will be given to students to engage with the code and apply it to specific projects and fields of interest.

Upon completion of this course, students will be able to:
•    Identify research methods from computer science and computational linguistics which can be used for research in the domain of the social sciences, explain the principles of these methods, and apply them to the analysis of texts;
•    Understand the basics of the Python programming language and work within Jupyter Notebooks;
•    Implement techniques from commonly used Python modules.

Trilling, D. (2017). Doing computational social science with Python: An introduction. Version 1.0. SSRN. Retrieved from papers.ssrn.com

Felicia Löcherbach is a PhD Candidate within the department of Communication Science. She is studying the diversity of issues and perspectives in (online) news and how it is affected by recommender algorithms and selective exposure. She is supervised by Wouter van Atteveldt (Vrije Universiteit Amsterdam), Damian Trilling, and Judith Möller (University of Amsterdam). Her research is part of the project “Inside the filter bubble: A framework for deep semantic analysis of mobile news consumption traces” funded by a JEDS grant from NWO.

Her other research interests include furthering the usage of computational methods in communication science. She contributes to the open-source project INCA, which facilitates the collection, processing and analysis of online (textual) data for social scientists, and she serves as editorial assistant of the journal Computational Communication Research.

Felicia obtained a Research Master in Communication Science at the University of Amsterdam (2018), and a Bachelor's in Communication Science and Philosophy from the University of Erfurt (2016).

“First, solve the problem. Then, write the code.” (John Johnson)