New Social ComQuant Workshop Series: Introduction to Computational Social Science methods with Python
11-12 April 2023
March 12, 2023, 23:59 (Turkey time).
Venue: Koç University, Rumelifeneri Campus
(NOTE: Please be aware that the event may move to an ONLINE format depending on regulatory changes announced by the Council of Higher Education (YÖK).)
Undergraduate and master's students, doctoral candidates, and experienced researchers who want an introduction to the practice of Computational Social Science.
Participants are expected to know the basics of Python and to have at least some experience using it. For the workshops, participants should bring a computer on which they can run Jupyter Notebooks. We will be using Python 3.9 and several standard libraries that are part of the Anaconda 2022.10 distribution or can be installed on top of it. A list of the libraries and library versions that participants should install will be circulated before the workshops. We recommend that participants install Anaconda 2022.10. Participants may also work in a cloud environment such as Google Colab. Consult this link for more detailed instructions on how to set up your computing environment.
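To confirm that a setup is ready, a short check like the following can be run in a notebook cell. It is a minimal sketch, not part of the official setup instructions: the package list is an assumption (the official list will be circulated before the workshops), and the script only reports whether the interpreter is Python 3.9 and which packages can be imported.

```python
import importlib
import sys

# Assumption: an illustrative package list; the official list of libraries
# and versions will be circulated before the workshops.
WORKSHOP_PACKAGES = ["numpy", "pandas", "networkx", "sklearn", "spacy", "gensim"]


def check_environment(packages=WORKSHOP_PACKAGES, python_version=(3, 9)):
    """Return whether the interpreter matches `python_version` and a
    {package: version} mapping, where None means the import failed."""
    matches_python = sys.version_info[:2] == python_version
    versions = {}
    for name in packages:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
            continue
        # Most scientific packages expose __version__; fall back otherwise.
        versions[name] = getattr(module, "__version__", "unknown")
    return matches_python, versions


if __name__ == "__main__":
    ok, versions = check_environment()
    print(f"Python {sys.version.split()[0]} (matches 3.9: {ok})")
    for name, version in versions.items():
        print(f"  {name}: {version or 'NOT INSTALLED'}")
```

Any package reported as NOT INSTALLED can usually be added on top of Anaconda with `conda install` or `pip install`.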
Dr. N. Gizem Bacaksizlar Turbic is a postdoctoral researcher in the Computational Social Science departments at RWTH Aachen University and GESIS - Leibniz Institute for the Social Sciences. Her research areas include complex adaptive systems and social and political networks.
Dr. Arnim Bleier is a senior researcher in the Computational Social Science department at GESIS - Leibniz Institute for the Social Sciences. His research interests are in the field of Natural Language Processing and reproducibility. In collaboration with social scientists, he develops Bayesian models for the content, structure and dynamics of social phenomena.
Dr. Haiko Lietz is a postdoctoral researcher in the Computational Social Science department at GESIS - Leibniz Institute for the Social Sciences. His research interests are in computational sociology, network science, and complexity science.
Workshop schedule
11 April 2023, Tuesday
09:30 – 12:30 | Workshop 1: Introduction to Open Science tools and online self-training materials
(Dr. Arnim Bleier, Dr. Haiko Lietz, & Dr. N. Gizem Bacaksizlar Turbic)
LECTURE ROOM: CASE 127
Open Science comprises the principle of making data and methods openly available in order to increase the reproducibility of research. In Computational Social Science (CSS), as in all computational sciences, reproducibility means that a piece of research can be executed repeatedly, by its producer or by other researchers, with identical results. Full reproducibility requires documenting all data and code. One reason reproducibility matters is that workflows in CSS are often quite complicated: documentation helps researchers keep track of all research steps and, ultimately, ensures research quality. Another reason is that documentation makes it easy for researchers to collaborate or to build upon existing data, methods, or both, which serves the general knowledge-production interest of science. The workshop will consist of two parts. In the first part, we will introduce Open Science software tools that allow researchers to collaborate, document their work, and present it to the outside world. In particular, we will introduce the version control system Git, the interactive computing environment Jupyter Notebook, and the code execution service Binder. Participants will learn how to share their code on GitHub, develop and document it in Jupyter Notebooks, and execute those notebooks in the cloud via Binder, without installing anything locally. In the second part of the workshop, we will introduce a set of teaching materials that together form a coherent “Introduction to Computational Social Science methods with Python”. These materials, developed in the Social ComQuant project, include sessions on data collection, preprocessing, and analysis, make use of the Open Science tools introduced in the first part, and are fully self-explanatory to enable self-training.
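As a small illustration of what “identical results” means in practice, the sketch below runs a hypothetical randomized analysis with a fixed, documented seed and stores the seed and Python version alongside the result. The function names and the toy analysis are illustrative assumptions, not part of the Social ComQuant materials.

```python
import json
import platform
import random


def mean_toy_rating(n_respondents, seed):
    """Hypothetical randomized analysis: mean of simulated 1-5 ratings."""
    rng = random.Random(seed)  # local generator: the seed fully determines the draws
    ratings = [rng.randint(1, 5) for _ in range(n_respondents)]
    return sum(ratings) / n_respondents


def documented_run(n_respondents, seed):
    """Return the result together with the information needed to reproduce it."""
    return {
        "python_version": platform.python_version(),
        "seed": seed,
        "n_respondents": n_respondents,
        "result": mean_toy_rating(n_respondents, seed),
    }


record = documented_run(1000, seed=2023)
print(json.dumps(record, indent=2))

# Re-running with the documented seed reproduces the result exactly.
assert documented_run(1000, seed=2023)["result"] == record["result"]
```

Python does not guarantee that most of its random-number methods produce the same draws across versions, which is one reason to record the interpreter version next to the seed.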
12:30 – 14:00 | Lunch break
14:00 – 17:00 | Workshop 2: Introduction to network analysis with Python
(Dr. Haiko Lietz & Dr. N. Gizem Bacaksizlar Turbic)
LECTURE ROOM: CASE 127
Computational Social Science is often concerned with the traces of human behavior, such as those left by the use of social media, messaging services, or cell phones. Such digital behavioral data are inherently relational and can, therefore, be studied with the formal techniques of network analysis. The basic units of networks, called nodes, can be actors (e.g., users), communicative symbols (e.g., hashtags), or even transactions (e.g., tweets). By focusing on the edges (relations) among nodes, network analysis can produce insights that are not possible from statistics on the nodes and their attributes alone. In the workshop, we will introduce how network data should be organized, how networks can be created in Python, and how they can be analyzed on three levels. On the micro level, we will introduce centrality analysis, which yields numerical descriptions of individual nodes. On the meso level, we will introduce community detection, which yields sets of nodes that form groups or clusters. On the macro level, we will introduce measures that describe the inequality and cohesion of the network as a whole. We will use network data from the Copenhagen Networks Study, which describes four different types of social relations among students over time. The workshop will alternate between live-coding demonstrations and periods in which participants apply that knowledge in context, both using Jupyter Notebooks. We will use NetworkX, a standard Python library that is easy to understand, offers a broad range of functionality, and has a large user community.
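The three levels of analysis can be sketched in NetworkX on a hypothetical six-person edge list (not Copenhagen Networks Study data). Degree and betweenness centrality stand in for the micro level, greedy modularity maximization for community detection, and density, average clustering, and a hand-written Gini coefficient of the degree distribution for macro-level cohesion and inequality. These particular measures are assumptions chosen for illustration, not necessarily the ones taught in the workshop.

```python
import networkx as nx
from networkx.algorithms import community

# Hypothetical edge list, one row per tie: (node, node, relation type).
# The labels loosely mimic multi-relational data but are NOT real study data.
edge_list = [
    ("anna", "ben", "call"),
    ("anna", "cem", "sms"),
    ("ben", "cem", "call"),
    ("cem", "deniz", "facebook"),
    ("deniz", "emre", "call"),
    ("deniz", "fatma", "sms"),
    ("emre", "fatma", "call"),
]

# Build an undirected graph and keep the relation type as an edge attribute.
G = nx.Graph()
for u, v, relation in edge_list:
    G.add_edge(u, v, relation=relation)

# Micro level: centrality scores describe individual nodes.
degree_centrality = nx.degree_centrality(G)
betweenness = nx.betweenness_centrality(G)

# Meso level: greedy modularity maximization finds groups of densely tied nodes.
communities = [sorted(c) for c in community.greedy_modularity_communities(G)]


# Macro level: inequality of the degree distribution (Gini) and cohesion.
def gini(values):
    """Gini coefficient of non-negative values (0 = perfect equality)."""
    values = sorted(values)
    n, total = len(values), sum(values)
    weighted = sum((i + 1) * x for i, x in enumerate(values))
    return 2 * weighted / (n * total) - (n + 1) / n


degree_gini = gini([d for _, d in G.degree()])
density = nx.density(G)
avg_clustering = nx.average_clustering(G)

print("Betweenness:", betweenness)
print("Communities:", communities)
print(f"Gini(degree)={degree_gini:.3f}, density={density:.3f}, clustering={avg_clustering:.3f}")
```

Keeping one row per tie, with the relation type as an attribute, makes it easy to filter the same edge list into one network per relation type later.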
12 April 2023, Wednesday
09:30 – 12:30 | Workshop 3: Introduction to machine learning with Python
(Dr. Arnim Bleier & Dr. Haiko Lietz)
LECTURE ROOM: SOS 238
Like Quantitative Social Science, Computational Social Science (CSS) is often concerned with explaining correlations in observational data. Beyond that, however, CSS is also concerned with predicting the numerical properties of observations or the categories they belong to. While explanation in CSS is still done with conventional statistical models (such as the Generalized Linear Model), prediction is the domain of machine learning (ML). In the workshop, we will provide a basic understanding of ML, how predictions are made, and to what extent explanations are possible. We will cover the basics of supervised and unsupervised ML. Within supervised ML, regression is about predicting numbers, and classification is about predicting categories. Within unsupervised ML, clustering is about grouping observations, and dimensionality reduction is about condensing correlated variables (in ML called features) into a smaller number of derived ones. In many cases, ML is performed on tables of observations (in rows) and features (in columns). We will use a non-social toy dataset in this tabular format to demonstrate the methods and a social dataset to learn about the practice of CSS. The workshop will alternate between live-coding demonstrations and periods in which participants apply that knowledge in context, both using Jupyter Notebooks. We will use scikit-learn, a standard Python library that is easy to understand, offers a broad range of functionality, and has a large user community. At the end of the workshop, participants will have an intuition for what ML can and cannot do. We will close with an outlook on how ML relates to Artificial Intelligence.
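The four task types named above can be sketched with scikit-learn on the Iris dataset, which ships with the library and serves here as an assumed stand-in for the workshop's non-social toy dataset. Logistic regression, linear regression, k-means, and principal component analysis (PCA) are common representatives of each task, not necessarily the models covered in the session.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.metrics import accuracy_score, r2_score
from sklearn.model_selection import train_test_split

# Observations in rows, features in columns: 150 flowers x 4 measurements.
X, y = load_iris(return_X_y=True)

# Supervised, classification: predict the species (a category).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)
classifier = LogisticRegression(max_iter=1000).fit(X_train, y_train)
accuracy = accuracy_score(y_test, classifier.predict(X_test))

# Supervised, regression: predict petal width (a number) from the other three features.
features, target = X[:, :3], X[:, 3]
f_train, f_test, t_train, t_test = train_test_split(
    features, target, test_size=0.3, random_state=0
)
regressor = LinearRegression().fit(f_train, t_train)
r2 = r2_score(t_test, regressor.predict(f_test))

# Unsupervised, clustering: group observations without using the labels y.
cluster_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Unsupervised, dimensionality reduction: condense 4 features into 2 components.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)

print(f"classification accuracy: {accuracy:.2f}")
print(f"regression R^2: {r2:.2f}")
print(f"cluster sizes: {sorted(int((cluster_labels == k).sum()) for k in range(3))}")
print(f"variance explained by 2 components: {pca.explained_variance_ratio_.sum():.2f}")
```

The k-means cluster numbers are arbitrary labels and need not line up with the species codes in `y`, which is one concrete sense in which unsupervised results still require interpretation.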
12:30 – 14:00 | Lunch break
14:00 – 17:00 | Workshop 4: Introduction to natural language processing and topic modeling with Python
(Dr. Nicolò Gozzi & Dr. N. Gizem Bacaksizlar Turbic)
LECTURE ROOM: SOS 238
Documents and full texts as data have a long history in the social sciences. In addition, Computational Social Science is concerned with new forms of text data that can be collected from digital platforms and the web. All such data are expressions of natural language and bring methods from computational linguistics and machine learning, such as Natural Language Processing (NLP) and automated content analysis, to center stage. In the workshop, we will introduce how text data can be preprocessed and analyzed in Python. In particular, we will discuss how information can be extracted from raw texts using regular expressions, how words can be reduced to their base forms, what language models are and how they allow us to extract meaningful units of symbolic communication such as n-grams, how grammatical parts of speech (e.g., nouns, verbs) can be identified, and how all of these steps combine into a text preprocessing pipeline. The end product of such a pipeline is a document-term matrix that is ready for analysis. For the analysis, we will introduce Latent Dirichlet Allocation (LDA), a widely used topic model: a fully automated content analysis method that reduces the dimensionality of the document-term matrix. It assumes that each document is a mixture of topics and each topic is a distribution over words, and it infers these topics as groups of co-occurring words. As data, we will use a popular text corpus that is still to be determined. The workshop will alternate between live-coding demonstrations and periods in which participants apply that knowledge in context, both using Jupyter Notebooks. We will use spaCy and Gensim, two standard Python libraries for NLP and topic modeling.