Language Preservation and Semantization: Prototyping Automated Glossing of an Endangered Mixed Language Corpus

Karim Tharani


This article discusses the prototyping of an online vocabulary learning tool for the oral language of the ginans, a corpus of gnostic hymn-like poems of the Ismaili community. The language of the ginans is mixed and borrows vocabulary from various Indo-Aryan and Perso-Arabic dialects. The teachings encoded in the oral language of the ginans therefore remain foreign to English-speaking community members living in the Western diaspora. This study is based on the premise that for the tradition and teachings of the ginans to be preserved in the diaspora, successive English-speaking generations of the Ismaili community must learn and understand the vocabulary of the ginans. The process through which humans learn and understand the vocabulary of a language is called semantization. Glossing foreign language (L2) materials with meanings in the learners' native language (L1) has proven to be an effective enabler of semantization. The prototype glossed ginans using lexical resources, including a concordance and an English glossary, to facilitate semantization of the ginan vocabulary. The prototype was implemented over two iterative design cycles following the design-based research (DBR) methodology. During the evaluation of the prototype by target learners, over 90% of the participants indicated that they would use the prototype once it was made publicly available.

