Automatic Sampling and Analysis of YouTube Data (Online-Workshop)

Workshop Details

Date: February 24-25, 2021 (2-day workshop)

Time: 10 a.m. (German local time)

Venue: Online via Zoom

Workshop Language: English

Instructors: Dr. Johannes Breuer, Julian Kohne, Dr. Rohangis Mohseni

Schedule

Wednesday, 24.02.
10:00-11:00	Introduction: Why is YouTube data interesting for research?
11:00-11:30	Coffee Break
11:30-12:30	The YouTube API
12:30-13:30	Lunch
13:30-14:30	Tools for the automatic sampling of YouTube data
14:30-15:30	Collecting data with the tuber package for R
15:30-16:00	Coffee Break
16:00-17:30	Processing and cleaning user comments (in R)
Thursday, 25.02.
09:00-10:30	Basic text analysis of user comments
10:30-11:00	Coffee Break
11:00-12:00	Sentiment analysis of user comments
12:00-13:00	Lunch
13:00-14:00	Excursus: Retrieving video subtitles
14:00-14:30	Coffee Break
14:30-16:00	Practice session, questions, and outlook

Course description

YouTube is the largest and most popular video platform on the internet. The producers and users of YouTube content generate huge amounts of data. These data are also of interest to researchers (in the social sciences as well as other disciplines) for studying different aspects of online media use and communication. Accessing and working with these data, however, can be challenging. In this workshop, we will first discuss the potential of YouTube data for research in the social sciences and then introduce participants to different tools and methods for sampling and analyzing data from YouTube. Next, we will demonstrate and compare several tools for collecting YouTube data. The main part of the workshop will focus on using the tuber package for R to collect data via the YouTube API and on wrangling and analyzing these data in R (using various packages). Regarding the type of data, we will focus on user comments but will also (briefly) look into other YouTube data, such as video statistics and subtitles. For the comments, we will show how to clean/process them in R, how to deal with emojis, and how to do some basic forms of automated text analysis (e.g., word frequencies, sentiment analysis). While we believe that YouTube data have great potential for research in the social sciences (and other disciplines), we will also discuss the unique challenges and limitations of using these data.

Target group

The workshop is aimed at people who are interested in using YouTube data for their research.

Learning objectives

Participants will learn how they can use YouTube data for their research. They will get to know several tools for collecting YouTube data and learn about their advantages and disadvantages. At the end of the workshop, participants should be able to automatically collect YouTube data, process/clean them, and do some basic (exploratory) analyses of user comments.

Prerequisites

Participants should have at least some basic knowledge of R and, ideally, also of the tidyverse. Basic R knowledge can, for example, be acquired through the swirl (Learn R, in R) course “R Programming” (see https://swirlstats.com/), the DataCamp online course “Introduction to R” (https://www.datacamp.com/courses/free-introduction-to-r), or the RStudio Primer “Programming basics” (https://rstudio.cloud/learn/primers/1.2), all of which are available for free. There are also many free online introductions to the tidyverse, for example, this blog post by Martin Frigaard (http://www.storybench.org/getting-started-with-tidyverse-in-r/) or this webinar by Thomas Mock (https://resources.rstudio.com/webinars/a-gentle-introduction-to-tidy-statistics-in-r).
