- Trainer: Kevin Tang
Quantitative Methods for Linguistic Data: An Introduction to Statistics using R (Prof. Tang, Summer 2022, Wednesdays: 10:30--12:00)
**Audience:** Students who would like to take English linguistics courses with a quantitative component in the future. The course can also benefit those who are more literature-oriented but would like to do more digital humanities.
**Keywords:**
statistics, quantitative analysis, R, phonetics, phonology, language, linguistics
**Description:**
It is as necessary to be numerate as it is to be literate, but students in the humanities are often less numerate than they are literate. They will need to evaluate evidence based on probabilistic models or statistical results in many of their university courses, when they consider the efficacy of vaccination and the severity of the pandemic, when they begin to vote in local and national elections, when they search for employment after graduating, and so on. In an increasingly digital world filled with big data, a command of statistical reasoning is more important than ever. In this course, we will learn numeracy through linguistics, specifically through phonetics and phonology, by learning to analyse the sounds of languages quantitatively.
How do we analyse the sounds of languages quantitatively? Following the main textbook, Analyzing the sounds of languages, this course covers the basics of quantitative methods using real data from phonetics and phonology. We will provide a gentle introduction to the statistical program R (www.r-project.org), which is used by data scientists in the tech industry and by academic researchers. The course will combine lectures with plenty of hands-on exercises. We use research questions such as “Do Southerners in the US really talk more slowly?” or “Why do we expect scholarly words to be longer than familiar words?” as a framework for introducing the numerical concepts needed to answer them. Statistical methods are always introduced alongside a research question and a solid understanding of the data, which is why we use real data and questions that are relevant to anyone who speaks a language. Considerable time is also devoted to formulating and answering research questions, and to developing and testing hypotheses.
**Textbook:**
To get a sense of what we will do in this course, take a look at the main textbook, which is freely available at https://kb.osu.edu/handle/1811/77848. I look forward to numerating with you on phonetics and phonology.
Smith, Bridget J., Beckman, Mary E., and Foltz, Anouschka (2016). Analyzing the sounds of languages. Ohio State University. http://hdl.handle.net/1811/77848
- Lecturer: Marie Engemann
Academic Writing and Research - Linguistics
The methods course Academic Writing (Linguistics) can be taken as part of the methods module (“Methodenmodul”) or to improve and deepen academic writing skills (e.g. in preparation for the bachelor's thesis). It provides a comprehensive introduction to academic work in English linguistics. Students learn and practise the basic methods of academic work through concrete exercises and writing tasks, with particular attention to final theses in linguistics.
To receive a certificate of participation, students must participate actively in the seminar, i.e. complete the tasks set during the seminar. The assignments will be made available on Moodle at the beginning of the semester.
- Trainer: Akhilesh Kakolu Ramarao
- Trainer: Kevin Tang
Programming for Linguists (Prof. Tang, Summer 2022, Mondays: 12:30--14:00)
**Audience:** Students who would like to improve their employability by learning a highly sought-after skill; students who would like to take English linguistics courses with a quantitative component in the future; and students interested in Artificial Intelligence. The course can also benefit those who are more literature-oriented but would like to do more digital humanities.
**Description**: This class is an introduction to computer programming in the high-level programming language Python. To make the course relevant to linguists, you will learn how Python can be used to solve some fun linguistic problems, such as “What is William Shakespeare’s most frequently used word?” and “Who has a bigger vocabulary: Jay-Z or Helene Fischer?”, as well as some fundamental linguistic tasks: part-of-speech tagging, syllabification, discovery of morphemes and phrases, and cryptography/author identification. We will learn to carry out basic language processing, such as compiling frequency lists of segments, syllables and words, and using regular expressions. The class is suitable for students with little or no prior experience in computing or programming.
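To give a flavour of the basic language processing described above, here is a minimal Python sketch that tokenizes a text with a regular expression and compiles a word frequency list with `collections.Counter`. The tokenization pattern and the sample sentence are assumptions chosen for illustration, not material from the course; real texts (e.g. a Shakespeare corpus) need more careful tokenization rules.

```python
import re
from collections import Counter

# Assumed, simplified tokenizer: a "word" is a run of letters, optionally
# containing one internal apostrophe (e.g. "o'er").
WORD_PATTERN = re.compile(r"[a-z]+(?:'[a-z]+)?")


def word_frequencies(text: str) -> Counter:
    """Lower-case the text, extract word tokens, and count each one."""
    tokens = WORD_PATTERN.findall(text.lower())
    return Counter(tokens)


sample = "To be, or not to be, that is the question."
freqs = word_frequencies(sample)
print(freqs.most_common(2))  # [('to', 2), ('be', 2)]
```

`Counter.most_common` returns the most frequent items first (ties in order of first appearance), so the same function answers “what is the most frequently used word?” for any text you load into `sample`.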
**Literature**:
Textbook (tentative): Horstmann & Necaise, Python for Everyone, 3rd ed. (eBook recommended)
You will need a computer for every class.
- Trainer: Kevin Tang
Biased Language Technology in a Social World (Prof. Tang, Summer 2022, Wednesdays: 12:30--14:00)
**Description:**
Is technology really as innocent and objective as it is said to be? As machine learning (ML) and Artificial Intelligence (AI) become more prominent in our lives, from speech and voice recognition by Alexa to automatic fake-news warnings on social media posts, issues of social bias and fairness in language technology are more pertinent than ever. Biased ML and AI can have negative impacts on people across social identities such as race, gender and culture.
We first introduce the concept of bias in language technology and the different types of bias, such as racial, gender and cultural bias. To understand the causes of these biases, we will cover the basic underlying structure of technologies such as Automatic Speech Recognition, hate speech detection and word association. To evaluate these biases, we will learn to generate test cases for evaluating trained systems, as well as the metrics used to measure bias and fairness. Finally, we will cover the basics of bias mitigation techniques.
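As one illustration of a bias metric for Automatic Speech Recognition, a simple approach compares word error rates (WER) between speaker groups. The minimal Python sketch below assumes WER is computed as a word-level Levenshtein distance divided by the length of the reference transcript, and reports the gap between the worst and best group averages. The group names and transcripts are hypothetical placeholders, not data from any study or system.

```python
from statistics import mean


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words,
    computed with a word-level Levenshtein (edit distance) table."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # dist[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution or match
            )
    return dist[len(ref)][len(hyp)] / len(ref)


def wer_gap(results: dict[str, list[tuple[str, str]]]) -> tuple[dict[str, float], float]:
    """Average WER per speaker group, plus the gap between worst and best group."""
    per_group = {
        group: mean(word_error_rate(ref, hyp) for ref, hyp in pairs)
        for group, pairs in results.items()
    }
    return per_group, max(per_group.values()) - min(per_group.values())


# Hypothetical (reference transcript, ASR output) pairs for two made-up groups.
results = {
    "group_a": [("turn on the lights", "turn on the lights")],
    "group_b": [("turn on the lights", "turn of the light")],
}
per_group, gap = wer_gap(results)
print(per_group, gap)
```

A gap close to zero suggests the system performs similarly for both groups; a large gap is one signal of bias worth investigating with more data and significance testing.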
**Audience:** Those interested in social factors (e.g., sociolinguistics, accents), digital humanities, computational ethics, and challenges in AI.
**Literature**
Given the rapidly developing nature of this topic, there is no single textbook; instead, we will draw on existing research papers and handbook chapters.
e.g., Barocas, S., Hardt, M., & Narayanan, A. (2019). Fairness and Machine Learning. http://www.fairmlbook.org
Feng, S., Kudina, O., Halpern, B. M., & Scharenborg, O. (2021). Quantifying bias in automatic speech recognition. arXiv preprint arXiv:2103.15122.
Garg, N., Schiebinger, L., Jurafsky, D., & Zou, J. (2018). Word embeddings quantify 100 years of gender and ethnic stereotypes. Proceedings of the National Academy of Sciences, 115(16), E3635-E3644.
**Tentative topics:**
- Can machines be biased? What is bias? What are the different types of bias?
- Basics of language technologies
  - Language models
  - Sentiment analysis
  - Vector semantics
  - Automatic Speech Recognition
- How to measure bias/fairness
  - Automatic Speech Recognition (e.g., speech misperception)
  - Classification systems (e.g., hate speech detection)
  - Analogical association (e.g., gender bias (male-doctor, female-nurse), racial bias (white-doctor, black-janitor))
- How to mitigate bias and improve fairness
  - Data representation
  - Algorithmic solutions
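Analogical associations in vector semantics are commonly probed with vector arithmetic: to complete “a is to b as c is to ?”, find the word whose vector is most cosine-similar to b - a + c (often called 3CosAdd). The minimal Python sketch below uses tiny hand-made 2-dimensional vectors that are purely hypothetical; real word embeddings are learned from corpora and have hundreds of dimensions.

```python
import math

Vector = tuple[float, ...]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity: dot product divided by the product of vector lengths."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def analogy(embeddings: dict[str, Vector], a: str, b: str, c: str) -> str:
    """Solve 'a is to b as c is to ?' by returning the word closest to b - a + c.
    The three query words are excluded from the candidates, as is common practice."""
    target = tuple(
        vb - va + vc
        for va, vb, vc in zip(embeddings[a], embeddings[b], embeddings[c])
    )
    candidates = (w for w in embeddings if w not in {a, b, c})
    return max(candidates, key=lambda w: cosine(embeddings[w], target))


# Hypothetical toy vectors: dimension 0 loosely encodes "royalty",
# dimension 1 loosely encodes "gender".
toy = {
    "man": (0.1, 1.0),
    "woman": (0.1, -1.0),
    "king": (1.0, 1.0),
    "queen": (1.0, -1.0),
    "apple": (-1.0, 0.0),
}
print(analogy(toy, "man", "king", "woman"))  # queen
```

Running the same kind of query on real trained embeddings with occupation words (e.g. “man is to doctor as woman is to ?”) is one way to surface the stereotyped associations listed above.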