CLOSED
Automatic Data Collection
Workshop Details
Date: 2-4 March 2022 (3-day workshop)
Time: Starts at 10:00 on Days 1 and 2 and at 9:00 on Day 3 (Central European Time, CET)
Venue: Online via Zoom
Workshop Language: English
Instructors: Dr. Jakob Jünger and Chantal Gärtner
Schedule
Day 1: Wednesday, 02.03.2022 (CET)
10:00 – 11:30 | Introduction to automated data collection. Techniques (webscraping, APIs, databases), data formats (HTML, CSV, JSON) and endpoints (assembling URLs, access restrictions). |
11:30 – 11:45 | Break |
11:45 – 13:00 | Practical session: Reading API references and using APIs with Facepager. |
13:00 – 14:00 | Lunch Break |
14:00 – 15:15 | Practical session: Using APIs and data wrangling with Python. |
Day 2: Thursday, 03.03.2022 (CET)
10:00 – 11:30 | Introduction to webscraping. Techniques (HTML extraction, browser automation), extraction methods (CSS selectors, regular expressions) and access restrictions. |
11:30 – 11:45 | Break |
11:45 – 13:00 | Practical session: Webscraping and data extraction with Python |
13:00 – 14:00 | Lunch Break |
14:00 – 15:15 | Practical session: Webscraping and data wrangling with Python |
Day 3: Friday, 04.03.2022 (CET)
9:00 – 10:30 | Advanced techniques: Automated pipelines, error handling, boilerplate removal |
10:30 – 11:00 | Break |
11:00 – 12:30 | Advanced techniques: Browser automation with Selenium |
12:30 – 13:30 | Lunch Break |
13:30 – 14:30 | Recap and open questions |
Course description
The workshop gives an overview of automatic data collection via webscraping and application programming interfaces (APIs). We introduce basic skills with the research software Facepager and with Python, and work through practical examples.
Target group
Participants should be interested in automatic data collection in the social sciences.
Undergraduate students, Master's students, graduates who are not doctoral candidates, doctoral candidates, postdoctoral researchers, experienced researchers, and interested non-scientists are welcome to apply.