Description
Register variation is a crucial aspect of language production. Corpus-based explorations of language have raised awareness of register variation and have yielded valuable insights into linguistic patterns associated with different registers (Biber et al. 1999). While register has important implications for any type of language production, it is particularly relevant to learner language because second language learners, as novice users of the target language, may not show the same register awareness as native or expert writers/speakers (Gilquin & Paquot 2008). However, studies comparing learner language registers are still relatively rare, with a few exceptions such as Fuchs et al. (2016) and Larsson et al. (2021).
One reason for the lack of register studies in learner corpus research is that, until recently, the most widely used learner corpora have covered a limited range of registers, most notably argumentative essays for writing (as in ICLE, Granger et al. 2020) and interviews for speech (as in LINDSEI, Gilquin et al. 2010). In addition, when different registers have been compared, the analysis has mainly been based on texts produced by different groups of learners (e.g. argumentative essays produced by one group of students and interviews produced by another group). By contrast, collecting texts produced by the same learners across registers offers the opportunity to investigate how they adapt their language use to different communicative situations (e.g. Kerz et al. 2022).
This paper describes the compilation of the STudent speech and writing Across Registers (STAR) corpus, a new corpus of student language productions that brings together texts from multiple registers produced by the same L2 English or L1 English students. The focus is on the L2 English component of the STAR corpus, which contains data collected at UCLouvain from French-speaking learners of English who are students in their second year of English major studies.
The paper details the written and spoken registers included in the L2 English component of the corpus (e.g. career readiness essays, diary entries, a debate and an informal conversation between two students) as well as the steps that were taken to ensure comparability across the dataset. Rich metadata were collected about the L2 learners and the pedagogical tasks used to elicit language production across registers, relying on Paquot et al.’s (2024) Core Metadata Schema for Learner Corpora.
Once completed, the STAR corpus will be released in open access. It will make it possible for researchers to compare student language productions across registers while controlling for individual variables and styles, and to explore the effect of register on the linguistic features of the language produced by novice writers and speakers of English.
References
Biber, D., Johansson, S., Leech, G., Conrad, S., & Finegan, E. (1999). Longman Grammar of Spoken and Written English. Harlow: Longman.
Fuchs, R., Götz, S., & Werner, V. (2016). The present perfect in learner Englishes: A corpus-based case study on L1 German intermediate and advanced speech and writing. In V. Werner, E. Seoane, & C. Suárez-Gómez (eds) Re-assessing the present perfect, pp. 297-338. De Gruyter.
Gilquin, G., De Cock, S., & Granger, S. (2010). The Louvain International Database of Spoken English Interlanguage. Handbook and CD-ROM. Presses universitaires de Louvain.
Gilquin, G., & Paquot, M. (2008). Too chatty: Learner academic writing and register variation. English Text Construction, 1(1), 41-61.
Granger, S., Dupont, M., Meunier, F., Naets, H., & Paquot, M. (2020). The International Corpus of Learner English. Version 3. Presses universitaires de Louvain.
Kerz, E., Neumann, S., & Niemietz, P. (2022). Assessing linguistic complexity and register flexibility in advanced second language learners: Evidence from group- and individual-level analyses. Register Studies, 4(1), 55-90. https://doi.org/10.1075/rs.20014.ker
Larsson, T., Paquot, M., & Biber, D. (2021). On the importance of register in learner writing: A multi-dimensional approach. In E. Seoane & D. Biber (eds) Corpus-based approaches to register variation, pp. 235-258. John Benjamins.
Paquot, M., König, A., Stemle, E. W., & Frey, J.-C. (2024). The Core Metadata Schema for Learner Corpora (LC-meta): Collaborative efforts to advance data discoverability, metadata quality and study comparability in L2 research. International Journal of Learner Corpus Research, 10(2), 280-300.
| Principal domain of study | English linguistics and applied linguistics |
|---|---|