
(Workshop #6) Speech and General Audio Analysis for Social Computing

 

Registration: CLOSED

 

Workshop Details

Date: 6–8 October 2021 (3-day workshop)

Time: 10 am CET (11 am Turkey local time)

Venue: Online via Zoom

Workshop Language: English

Instructors: Prof. Dr. Björn Schuller, Dr. Shahin Amiriparian

Schedule

Day 1 (6.10.2021)

10.00 – 11.30 (CET)    Session 1: AI-based social signal processing (Prof. Dr. Björn Schuller & Dr. Shahin Amiriparian)

11.30 – 11.45 (CET)    Break

11.45 – 13.00 (CET)    Session 2: Computational paralinguistics in the deep learning era (Prof. Dr. Björn Schuller & Dr. Shahin Amiriparian)

13.00 – 14.00 (CET)    Lunch break

14.00 – 15.30 (CET)    Session 3: Hands-on machine learning for (social) signal processing, including feature extraction, feature analysis, and supervised learning (Dr. Shahin Amiriparian & Prof. Dr. Björn Schuller)

15.30 (CET)            End of the day

 

Day 2 (7.10.2021)

10.00 – 11.30 (CET)    Session 4: Create your own deep neural network (DNN)-based Android application for affect recognition (Dr. Shahin Amiriparian & Prof. Dr. Björn Schuller)

11.30 – 11.45 (CET)    Break

11.45 – 13.00 (CET)    Session 5: Fine-tune your trained DNN-based affect recogniser (Dr. Shahin Amiriparian & Prof. Dr. Björn Schuller)

13.00 – 14.00 (CET)    Lunch break

14.00 – 15.30 (CET)    Session 6: Finalise your DNN-based Android application and test it on your smartphone or embedded devices (Dr. Shahin Amiriparian & Prof. Dr. Björn Schuller)

15.30 (CET)            End of the day

Day 3 (8.10.2021)

10.00 – 11.30 (CET)    Session 7: Why self-learning is crucial for multimodal signal processing (Dr. Shahin Amiriparian & Prof. Dr. Björn Schuller)

11.30 – 11.45 (CET)    Break

11.45 – 13.00 (CET)    Session 8: Hands-on machine learning for (social) signal processing, including unsupervised learning and dimensionality reduction (Dr. Shahin Amiriparian & Prof. Dr. Björn Schuller)

13.00 – 14.00 (CET)    Lunch break

14.00 – 15.30 (CET)    Session 9: Model evaluation, conclusions, and discussion (Dr. Shahin Amiriparian & Prof. Dr. Björn Schuller)

15.30 (CET)            End of the day

Course description

During this workshop, the participants will be introduced to state-of-the-art machine learning techniques in the fields of social computing, dyadic human-human and human-robot interaction, and computational paralinguistics. Deep learning methods such as recurrent attention-based autoencoders and convolutional neural networks for emotion recognition, sentiment analysis, and general audio and speech processing tasks will be presented. In particular, three machine learning toolkits will be used during this workshop: auDeep (https://github.com/auDeep/auDeep), DeepSpectrum (https://github.com/DeepSpectrum/DeepSpectrum), and openSMILE (https://github.com/audeering/opensmile). The participants will learn to create their own deep neural network-based Android applications for real-time signal processing (e.g., emotion recognition).
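As a first taste of the hands-on sessions, the following minimal sketch extracts openSMILE's expert-designed ComParE 2016 features with the opensmile Python package (pip install opensmile); the file name is a placeholder for your own recording.

import opensmile

# Configure openSMILE for the ComParE 2016 feature set at the functionals
# level, i.e. one fixed-length feature vector per audio file.
smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.ComParE_2016,
    feature_level=opensmile.FeatureLevel.Functionals,
)

# "speech.wav" is a placeholder; the call returns a pandas DataFrame with
# one row and 6,373 acoustic feature columns.
features = smile.process_file("speech.wav")
print(features.shape)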

Following the research paper “Synchronization in interpersonal speech” [1], the participants will have the chance to conduct their own experiments analysing the synchronisation between communication partners by extracting speech features from each individual. Furthermore, the participants will, among other things, make use of unsupervised representation learning techniques for various paralinguistic tasks and evaluate the performance of their trained neural networks.
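To give a flavour of such an experiment, the sketch below (a simplification, not the method of [1]) extracts frame-wise eGeMAPS descriptors for each of two speakers with the opensmile package and correlates their pitch trajectories; both file names are placeholders.

import numpy as np
import opensmile

# Frame-wise (low-level descriptor) features for each communication partner;
# "speaker_a.wav" and "speaker_b.wav" are placeholders for one speaker each.
smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.eGeMAPSv02,
    feature_level=opensmile.FeatureLevel.LowLevelDescriptors,
)
lld_a = smile.process_file("speaker_a.wav")
lld_b = smile.process_file("speaker_b.wav")

# Correlate the pitch (F0) trajectories over the shared duration as a crude
# proxy for prosodic synchronisation between the two partners.
n = min(len(lld_a), len(lld_b))
f0_a = lld_a["F0semitoneFrom27.5Hz_sma3nz"].to_numpy()[:n]
f0_b = lld_b["F0semitoneFrom27.5Hz_sma3nz"].to_numpy()[:n]
print(np.corrcoef(f0_a, f0_b)[0, 1])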

[1] Amiriparian, Shahin, et al. “Synchronization in interpersonal speech.” Frontiers in Robotics and AI 6 (2019): 116.

Keywords

Computational paralinguistics, social signal processing, representation learning, autoencoders, convolutional neural networks

Learning Objectives

The participants can expect to get to know state-of-the-art machine learning techniques for (social) signal processing, and by the end of the workshop they should be able to run their own machine learning experiments and evaluate their AI models.
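A minimal sketch of such an experiment with scikit-learn is shown below; the synthetic data stands in for acoustic feature vectors and emotion labels, and unweighted average recall (UAR), the customary metric in computational paralinguistics, is reported.

from sklearn.datasets import make_classification
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Synthetic stand-in data; in the workshop, X would hold acoustic feature
# vectors (e.g., openSMILE functionals) and y the emotion labels.
X, y = make_classification(n_samples=200, n_features=100, n_informative=10,
                           random_state=0)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# A common baseline in computational paralinguistics: feature scaling
# followed by a linear support vector machine.
clf = make_pipeline(StandardScaler(), LinearSVC(C=1.0))
clf.fit(X_train, y_train)

# Macro-averaged recall equals the unweighted average recall (UAR).
print(recall_score(y_test, clf.predict(X_test), average="macro"))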

Requirements

Basic AI knowledge.

Introductory literature:

1. Schuller, Björn, et al. “The INTERSPEECH 2013 computational paralinguistics challenge: Social signals, conflict, emotion, autism.” Proceedings of INTERSPEECH 2013, 14th Annual Conference of the International Speech Communication Association, Lyon, France, 2013.

Download link: https://opus.bibliothek.uni-augsburg.de/opus4/files/44174/0051.pdf

 

2. Amiriparian, Shahin, et al. “Synchronization in interpersonal speech.” Frontiers in Robotics and AI 6 (2019): 116.

Download link: https://www.frontiersin.org/articles/10.3389/frobt.2019.00116/full

 

3. Amiriparian, Shahin. Deep representation learning techniques for audio signal processing. Diss. Technische Universität München, 2019.

Download link: https://mediatum.ub.tum.de/doc/1463108/1463108.pdf

 

4. Recommended GitHub repositories for this workshop:

https://github.com/auDeep/auDeep for unsupervised representation learning (the underlying idea is sketched after this list)

https://github.com/DeepSpectrum/DeepSpectrum for transfer learning and deep feature extraction

https://github.com/DeepSpectrum/DeepSpectrumLite for light-weight mobile applications

https://github.com/audeering/opensmile for extraction of expert-designed features
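To illustrate the unsupervised representation learning idea behind auDeep, here is a minimal Keras autoencoder sketch; for brevity it uses dense layers on random stand-in data rather than auDeep's recurrent sequence-to-sequence architecture on real spectrograms.

import numpy as np
import tensorflow as tf

# Stand-in data: 200 "spectrogram-like" vectors with 128 bins each; real
# inputs would be mel-spectrograms extracted from the workshop audio.
x = np.random.rand(200, 128).astype("float32")

# A small dense autoencoder: the 16-dimensional bottleneck is the learnt
# unsupervised representation of each input.
encoder = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(128,)),
    tf.keras.layers.Dense(16, activation="relu"),
])
decoder = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(16,)),
    tf.keras.layers.Dense(128, activation="sigmoid"),
])
autoencoder = tf.keras.Sequential([encoder, decoder])

# Train to reconstruct the inputs; no labels are involved.
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(x, x, epochs=5, batch_size=32, verbose=0)

# The bottleneck activations can then be fed to a downstream classifier.
representations = encoder.predict(x)
print(representations.shape)  # (200, 16)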