Documentation

This page documents The Student-Transcribed Corpus of Spoken American English. It covers both metadata on the files and the social and situational variables for the speakers. Every text file is fully coded for all of this information. Choose below to see either a list of all text files currently included in the corpus or an explanation of each coded variable.


“Documentation is like sex: when it is good, it is very, very good; and when it is bad, it is better than nothing.” - Dick H. Brandon

Convenience-sampled data, like that used in this corpus, is difficult to document consistently and comprehensively. The proposed classification attempts to make sense of the many parameters along which speech may vary. However, because of speakers' mobility, the limited availability of exact data, fluctuations in lifestyle, and other difficulties, the information provided may not always be fully accurate.