Python for Linguists
**Audience:** Students who would like to improve their employability by learning a highly desirable skill; students who would like to take English Linguistics courses with a quantitative component in the future; students with a literature focus who would like to do more work in digital humanities; and students who are interested in Artificial Intelligence.
**Description**: This class is an introduction to computer programming in the high-level programming language Python. To make the course relevant to linguists, you will learn how Python can be used to solve some fun linguistic problems, such as ‘What is the word William Shakespeare used most?’ and ‘Who has a bigger vocabulary: Jay-Z or Helene Fischer?’, and to tackle some fundamental linguistic tasks: part-of-speech tagging, syllabification, discovery of morphemes and phrases, and cryptography/author identification. We will learn to conduct some basic language processing, such as compiling frequency lists for segments, syllables and words, and using regular expressions. The class is suitable for students with little to no prior experience in computing or programming.
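To give a flavour of the frequency-list task mentioned above, here is a minimal sketch of how a word frequency list might be compiled in Python. It is only an illustration, not course material: the function name `word_frequencies`, the sample sentence and the simple regular expression used to find words are all assumptions made for this example.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Return a Counter mapping each lowercased word to its number of occurrences."""
    # Assumption: a "word" is a run of letters, optionally containing
    # internal apostrophes (e.g. "what's"). Real texts need more care.
    words = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    return Counter(words)


# Hypothetical sample text; a real study would read a full corpus from a file.
sample = "To be, or not to be, that is the question."
freqs = word_frequencies(sample)

# The two most frequent words, each with its count.
print(freqs.most_common(2))

# Vocabulary size: the number of distinct words in the sample.
print(len(freqs))
```

Applied to the complete works of an author, the same idea yields that author's most used word, and comparing vocabulary sizes across two texts of similar length gives a rough answer to the 'bigger vocabulary' question.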
**Literature**:
Textbook (tentative): Horstmann & Necaise, Python for Everyone, 3rd ed. (eBook recommended)
You will need a computer for every class.
Colloquium
This colloquium is for all students who want to discuss their project for a Bachelor's, Master's or doctoral thesis and who wish to receive feedback and support. The colloquium takes place in person every second week. In the other weeks, you will be required to work as a group. We will use the first session to decide on the presentation topics, which will then become part of the colloquium's program. In the in-person weeks, we will also cover research-related skills, such as time management, hypothesis generation, critical reading and more.
Requirements for 2 CPs are a set of assignments plus active participation in all in-class activities. Requirements for 3 CPs are similar to those for 2 CPs but include more assignments. All requirements will be described in the course syllabus, which will be provided and discussed in the first session.
If you miss more than 2 sessions, you will have to compensate for the missed participation by handing in extra written work.
Quantitative Methods for Linguistic Data: An Introduction to Statistics using R (Tang, Wed: 14:30--16:00)
**Audience:** Students who would like to improve their employability by learning a highly desirable skill; students who would like to take English Linguistics courses with a quantitative component in the future; students with a literature focus who would like to do more work in digital humanities; and students who are interested in Artificial Intelligence.
**Keywords:**
statistics, quantitative analysis, R, phonetics, phonology, language, linguistics
**Description:**
It is as necessary to be numerate as it is to be literate, but students in the humanities are often not as numerate as they are literate. They will need to evaluate evidence that is based on probability-based models or statistical results in many of the courses that they take at university, as they consider the efficacy of vaccination and the severity of the pandemic, as they begin to vote in local and national elections, as they search for employment after graduating, and so on. In an increasingly digital world filled with big data, a command of statistical reasoning is more important than ever. In this course, we will learn numeracy through linguistics, specifically through phonetics and phonology, by learning to analyse the sounds of languages quantitatively.
How do we analyse the sounds of languages quantitatively? This course covers the basics of quantitative methods using real data taken from the field of phonetics and phonology. We will provide a gentle introduction to the statistical program R (www.r-project.org), which is used by data scientists in the tech industry and by academic researchers. The course will consist of a combination of lectures and plenty of hands-on exercises. We use research questions, such as “Do Southerners in the US really talk more slowly?” or “Why do we expect scholarly words to be longer than familiar words?”, as a framework for introducing the numerical concepts required to answer them. Statistical methods are introduced together with a research question and a solid understanding of the data, which is why we use real data and questions that are relevant to anyone who commands a spoken language. A good amount of space is also devoted to illustrating how to formulate and answer a research question, and how to develop and test hypotheses.
**Textbook:**
To get a sense of what we will do on this course, do check out the main textbook that we will be using (freely available at https://kb.osu.edu/handle/1811/77848). I look forward to numerating with you on phonetics and phonology.
Smith, Bridget J., Beckman, Mary E., and Foltz, Anouschka (2016). Analyzing the sounds of languages. Ohio State University. http://hdl.handle.net/1811/77848
**Requirements**
Requirements for 2 CPs are a set of assignments plus active participation in all in-class activities. Requirements for 3 CPs are similar to those for 2 CPs but include more assignments. All requirements will be described in the course syllabus, which will be provided and discussed in the first session.
If you miss more than 2 sessions, you will have to compensate for the missed participation by handing in extra written work.
Ethics, Bias and Natural Language Processing (Tang, SS 2025, Thurs: 14:30--16:00)
**Description:**
Is technology really as innocent and as objective as it is said to be? As machine learning (ML) and Artificial Intelligence (AI) become more prominent in our lives, from speech and voice recognition by Alexa to automatic fake news warnings on social media posts, issues of social bias and fairness in language technology become more pertinent than ever before. This course examines the negative impacts that biased ML and AI could have on various social identities, such as race, gender and culture.
We first introduce the concept of bias in language technology and the different types of bias, such as racial, gender and cultural bias. To begin to understand the causes of these biases, we will cover the basic underlying structure of some of the technologies involved, such as Automatic Speech Recognition, hate speech detection and word association. To evaluate these biases, we will learn to generate test cases that can be used to evaluate trained systems, as well as the metrics that are used for measuring bias and fairness. Finally, we will cover the basics of bias mitigation techniques.
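As a rough illustration of what a bias metric can look like, the sketch below compares how often a system's predictions are correct for two speaker groups and reports the gap between them. It is a minimal sketch under stated assumptions, not necessarily a metric used in the course: the function names `accuracy_by_group` and `accuracy_gap`, the group labels and the toy records are all hypothetical.

```python
from collections import defaultdict


def accuracy_by_group(records):
    """Compute accuracy separately for each group.

    records: iterable of (group, gold_label, predicted_label) tuples.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, gold, predicted in records:
        total[group] += 1
        if gold == predicted:
            correct[group] += 1
    return {group: correct[group] / total[group] for group in total}


def accuracy_gap(records):
    """Difference between the best and worst per-group accuracy (0 means no gap)."""
    scores = accuracy_by_group(records)
    return max(scores.values()) - min(scores.values())


# Hypothetical toy test cases: (group, gold label, system prediction).
toy_records = [
    ("group_a", 1, 1), ("group_a", 0, 0), ("group_a", 1, 1), ("group_a", 0, 1),
    ("group_b", 1, 0), ("group_b", 0, 0), ("group_b", 1, 0), ("group_b", 0, 1),
]

print(accuracy_by_group(toy_records))
print(accuracy_gap(toy_records))
```

A large gap on carefully constructed test cases suggests that the system serves one group worse than another, although a single number like this cannot show where the disparity comes from.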
**Audience:** Students interested in social factors (e.g., sociolinguistics), digital humanities, computational ethics, challenges in AI, and Artificial Intelligence more generally.
**Literature:** Given the rapidly developing nature of this topic, there is no single textbook; instead, we will sample from existing research papers and handbook chapters, e.g.:
Barocas, S., Hardt, M., & Narayanan, A. (2019). Fairness and Machine Learning. URL: http://www.fairmlbook.org.
Feng, S., Kudina, O., Halpern, B. M., & Scharenborg, O. (2021). Quantifying bias in automatic speech recognition. arXiv preprint arXiv:2103.15122.
Garg, N., Schiebinger, L., Jurafsky, D., & Zou, J. (2018). Word embeddings quantify 100 years of gender and ethnic stereotypes. Proceedings of the National Academy of Sciences, 115(16), E3635-E3644.
**Requirements**
Requirements for 2 CPs are a set of assignments plus active participation in all in-class activities. Requirements for 3 CPs are similar to those for 2 CPs but include more assignments. All requirements will be described in the course syllabus, which will be provided and discussed in the first session.
If you miss more than 2 sessions, you will have to compensate for the missed participation by handing in extra written work.