Complete List of Files

Click here to see a complete list of all the files included in the corpus and their associated information.

Explanation of Variables

This page explains the operationalization of the individual variables that all corpus files were coded for.

Variable Explanation

Under construction

Complete explanation of variables will be added here.


The 'Education' variable reflects the highest academic diploma or qualification a speaker has obtained. The information is taken from publicly available sources. The level reflects the speaker's education at the time of the recording, so the same speaker may be coded differently in files recorded at different times. The variable has five ordinal levels, ranked from 1 (lowest) to 5 (highest). The criteria used to determine the levels are shown in the table below.

Education level | Criteria | Example
1_VeryLow | No completed formal education, no diploma | High school drop-out
2_Low | High school diploma | 18-year-old high school graduate, freshman
3_Middle | Bachelor's degree, Master's degree in humanities or with an applied focus, other degrees from unaccredited institutions | Bachelor of Science in Geography, Master's in Accounting, Doctor of Ministry
4_High | Ph.D., J.D., Master's degree in science or engineering | Ph.D. in linguistics
5_VeryHigh | Ph.D. in any field and continuing involvement in research | Professor at a university
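
Because the levels are ordinal, their labels can be compared and sorted once the numeric prefix is read as the rank. The following is a minimal Python sketch of that idea and is not part of the corpus tooling. The names `Education` and `parse_education`, as well as the sample codings, are hypothetical and only assume that labels follow the `rank_Name` pattern shown in the table.

```python
from enum import IntEnum


class Education(IntEnum):
    """Ordinal education levels, from 1 (lowest) to 5 (highest)."""
    VeryLow = 1
    Low = 2
    Middle = 3
    High = 4
    VeryHigh = 5


def parse_education(label: str) -> Education:
    """Turn a label such as '3_Middle' into an Education member."""
    rank_text, _, name = label.partition("_")
    level = Education(int(rank_text))  # ValueError if the rank is not 1-5
    if level.name != name:
        raise ValueError(f"rank and name disagree in {label!r}")
    return level


# Hypothetical codings of one speaker in recordings from different years.
codings = ["2_Low", "3_Middle", "4_High"]
levels = [parse_education(c) for c in codings]

# Ordinal levels support comparison, so the highest coding is well defined.
highest = max(levels)
```

Using an integer-backed enumeration keeps the rank order explicit, so comparisons such as `Education.Low < Education.High` behave as the ordinal scale intends.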



The 'Gender' variable records the gender of the speaker. Classification is based on immediate perception of the speaker: gendered name, facial appearance, pitch of the voice, etc. For the vast majority of speakers, the level is either 'Male' or 'Female'. In a small number of cases, however, the value may be different, e.g., for speakers who are transsexual or who otherwise self-identify in meaningful ways as another gender.
