Computational Linguistics and Language Learning: Error identification in advanced learners of Mandarin Chinese

Dates: 2022–24
Funding body: His Majesty's Government (HMG), The Secretary of State for Foreign and Commonwealth Affairs (RG.MODL.127288)
Value: £5,583
Principal investigator: Alison May (English)
Co-investigators: Binhua Wang (Centre for Translation and Interpreting Studies, School of Languages, Cultures and Societies)
Language at Leeds satellites: Interpreting Studies
Language at Leeds themes: Language learning and teaching

Focusing on written Mandarin Chinese, the principal aims of the project are:

to work with large, pre-existing, online or downloadable learner datasets, so that there is no need to create new datasets for the research;
to use a combination of computational, corpus, and qualitative methods to identify and categorise sets of errors made by native (L1) speakers of (British) English writing in Mandarin Chinese as an L2, with learners stratified by proficiency level (intermediate and advanced);
to explore the datasets to create an error set based on frequency, persistence, and severity (non-native vs. non-standard) at the different proficiency levels (intermediate and advanced);
to present annotated examples across a range of categories (orthographical, lexical, morphological, syntactic, discoursal, punctuation) for pedagogical use, with the aim of improving advanced learner proficiency.
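
As a rough illustration of the frequency and persistence criteria in the aims above, the following minimal Python sketch counts error categories per proficiency level and lists the categories attested at both levels. Everything in it is an assumption made for illustration: the record format, the toy data, and the function names are hypothetical and are not taken from the project or its datasets. Severity is left out because the project description does not say how it would be measured.

```python
from collections import Counter

# Hypothetical annotated error records: (proficiency level, error category).
# These toy rows are invented for illustration and are not project data.
RECORDS = [
    ("intermediate", "lexical"),
    ("intermediate", "lexical"),
    ("intermediate", "orthographical"),
    ("intermediate", "syntactic"),
    ("advanced", "lexical"),
    ("advanced", "syntactic"),
    ("advanced", "syntactic"),
    ("advanced", "discoursal"),
]

LEVELS = ("intermediate", "advanced")


def frequency_by_level(records):
    """Count how often each error category occurs at each proficiency level."""
    counts = {level: Counter() for level in LEVELS}
    for level, category in records:
        counts[level][category] += 1
    return counts


def persistent_categories(counts):
    """Return the categories attested at every level (a simple proxy for persistence)."""
    per_level = [set(counts[level]) for level in LEVELS]
    return sorted(set.intersection(*per_level))


counts = frequency_by_level(RECORDS)
persistent = persistent_categories(counts)
```

Treating "attested at every level" as persistence is only one possible operationalisation; a real analysis would more likely normalise counts by text length and compare rates between levels.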