Welcome to the official web page of the Social ComQuant Project!

(Workshop #10) Automatic Data Collection

 

CLOSED

Workshop Details

Date: 2-4 March 2022 (3-day workshop)

Time: Starts at 10:00 on Days 1 and 2 and at 9:00 on Day 3 (Central European Time, CET)

Venue: Online via Zoom

Workshop Language: English

Instructors: Dr. Jakob Jünger and Chantal Gärtner

Schedule

Day 1: Wednesday, 02.03.2022 (CET)
10:00 – 11:30 Introduction to automated data collection. Techniques (webscraping, APIs, databases), data formats (HTML, CSV, JSON) and endpoints (assembling URLs, access restrictions).
11:30 – 11:45 Break
11:45 – 13:00 Practical session: Reading API references and using APIs with Facepager.
13:00 – 14:00 Lunch Break
14:00 – 15:15 Practical session: Using APIs and data wrangling with Python.

Day 2: Thursday, 03.03.2022 (CET)
10:00 – 11:30 Introduction to webscraping. Techniques (HTML extraction, browser automation), extraction methods (CSS selectors, regular expressions) and access restrictions.
11:30 – 11:45 Break
11:45 – 13:00 Practical session: Webscraping and data extraction with Python.
13:00 – 14:00 Lunch Break
14:00 – 15:15 Practical session: Webscraping and data wrangling with Python.

Day 3: Friday, 04.03.2022 (CET)
9:00 – 10:30 Advanced techniques: Automated pipelines, error handling, boilerplate removal.
10:30 – 11:00 Break
11:00 – 12:30 Advanced techniques: Browser automation with Selenium.
12:30 – 13:30 Lunch Break
13:30 – 14:30 Recap and open questions.

Course description

The workshop gives an overview of automatic data collection via webscraping and application programming interfaces (APIs). We introduce basic skills with research software (Facepager) and Python, and work through practical examples.
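
To give a rough idea of the kind of task covered, here is a minimal sketch in Python using only the standard library. It is not taken from the workshop materials: the endpoint `https://api.example.org/v1/posts`, the helper names `build_url` and `extract_links`, and the sample JSON and HTML strings are all made up for illustration. It shows assembling a request URL, reading a JSON response, and extracting links and dates from an HTML snippet.

```python
import json
import re
from html.parser import HTMLParser
from urllib.parse import urlencode

# Hypothetical endpoint, used only to show how a request URL is assembled.
BASE_URL = "https://api.example.org/v1/posts"


def build_url(base, **params):
    """Append URL-encoded query parameters to a base endpoint."""
    return f"{base}?{urlencode(params)}"


class LinkExtractor(HTMLParser):
    """Collect the href attribute of every <a> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def extract_links(html):
    """Return all link targets found in an HTML string."""
    parser = LinkExtractor()
    parser.feed(html)
    return parser.links


# Made-up sample data standing in for a real API response and a real page.
SAMPLE_JSON = '{"data": [{"id": 1, "title": "First"}, {"id": 2, "title": "Second"}]}'
SAMPLE_HTML = '<p><a href="/a">A</a> and <a href="/b">B</a>, published 2022-03-02</p>'

url = build_url(BASE_URL, q="data collection", limit=10)
titles = [item["title"] for item in json.loads(SAMPLE_JSON)["data"]]
links = extract_links(SAMPLE_HTML)
dates = re.findall(r"\d{4}-\d{2}-\d{2}", SAMPLE_HTML)

print(url)
print(titles, links, dates)
```

In practice the JSON and HTML would come from real requests, and dedicated libraries are often used for HTML extraction; the standard-library parser is used here only to keep the sketch self-contained.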

Target group

Participants should be interested in automatic data collection in the social sciences.

Undergraduate and Master's students, graduates (not doctoral candidates), doctoral candidates, postdoctoral researchers, experienced researchers, and interested non-scientists are welcome to apply.

Learning objectives

After the workshop, participants will have an overview of different methods of automatic data collection and data preparation. They will be able to scrape web pages using the practical blueprints provided in the workshop.

Requirements

Participants should have a basic understanding of Python and should install a programming environment in advance (e.g., JupyterLab via Anaconda). We will provide information about the necessary software before the workshop.