
Background & Goals

Background: Polysynthetic languages are characterized by “words” composed of multiple morphemes, often to the extent that one long word can express the meaning contained in a multi-word sentence in a language like English. To illustrate, consider the following example from Inuktitut, one of the official languages of the Territory of Nunavut in Canada. The morpheme -tusaa- (the first morpheme in example (1) below) is the root, and all the other morphemes are synthetically combined with it in one unit.

(1) tusaa-tsia-runna-nngit-tu-alu-u-junga
      hear-well-be.able-NEG-DOER-very-BE-PART.1.S
       ‘I can't hear very well.’

Kabardian (Circassian), from the Northwest Caucasus, also shows this phenomenon, the root being -še- :

(2)  wə-q’ə-d-ej-z-γe-še-ž’e-f-a-te-q’əm
      2SG.OBJ-DIR-LOC-3SG.OBJ-1SG.SUBJ-CAUS-lead-COMPL-POTENTIAL-PAST-PERF-NEG 
      ‘I would not let you bring him right back here.’

Many polysynthetic languages are among the world’s most endangered languages[1], with fragmented dialects and communities struggling to preserve their linguistic heritage. This workshop will enable discussion on existing, ongoing and recent work in Linguistics and Computational Linguistics. It will also reveal common problems and difficulties, especially as seen by language practitioners. The purpose is to establish an ongoing discussion of possible collaborations and hopefully a roadmap.

Polysynthetic languages are of interest both for research and for practical goals. On the research side, these languages offer insights into human cognition and language capabilities. From a theoretical perspective, polysynthetic phenomena have long challenged many linguistic theories, from syntax to phonology[2]. Polysynthetic languages have always created analysis challenges, both for traditional linguists (Greenberg 1960, Comrie 1981) and for traditional computational systems (Byrd et al. 1986). More recently, complex linguistic phenomena have been explored, and an entire book dedicated to polysynthesis pushed the field forward (Baker 1996). Computational contributions have moved the field even further, especially in the areas of corpus collection and annotation, and in parsing tools.

Since these languages are generally not of commercial value, the research community needs to address fundamental language complexity issues; research on these languages could have unanticipated benefits on many levels. While collections of annotated corpora (spoken and written) exist for major isolating, agglutinative and inflectional languages, polysynthetic languages pose significant additional complexities:

  • tokenization (where are the boundaries between units of meaning?) and delimiting morphology from syntax
  • lemmatization (where is the root? where are the affixes? what about clitics?)
  • part-of-speech tagging
  • glossing (translation into other languages)
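
To make the tokenization and glossing problem concrete, the following is a minimal sketch in Python that pairs each morpheme of example (1) with its gloss. It assumes the word has already been segmented by hand and that hyphens mark morpheme boundaries, which is exactly the step that is hard to automate for these languages. The function name align_gloss is hypothetical and is used here only for illustration.

```python
from typing import List, Tuple


def align_gloss(segmented: str, gloss: str) -> List[Tuple[str, str]]:
    """Pair each morpheme with its gloss label.

    Assumes both lines use '-' as the morpheme boundary, as in the
    interlinear examples (1) and (2). This is an illustrative helper,
    not part of any existing toolkit.
    """
    # Strip stray hyphens at the edges (e.g. a citation form like "-tusaa-").
    morphemes = segmented.strip("-").split("-")
    labels = gloss.strip("-").split("-")
    if len(morphemes) != len(labels):
        raise ValueError(
            f"{len(morphemes)} morphemes but {len(labels)} gloss labels"
        )
    return list(zip(morphemes, labels))


# Example (1), Inuktitut: 'I can't hear very well.'
pairs = align_gloss(
    "tusaa-tsia-runna-nngit-tu-alu-u-junga",
    "hear-well-be.able-NEG-DOER-very-BE-PART.1.S",
)

for morpheme, label in pairs:
    print(f"{morpheme:<8}{label}")
```

The length check matters in practice: a mismatch between the morpheme line and the gloss line is a common annotation error, and it is cheap to detect once the segmentation is explicit.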

Linguistic data, be it text or audio, are scarce, which has created challenges for language analysis as well as for revitalization efforts. Only recently have researchers started collecting well-designed corpora of polysynthetic languages (Arkhangelskiy et al. 2016 on Circassian; Kazeminejad et al. 2017 on Arapaho).

 

Goals: The goal of this workshop is to explore practical applications of recent developments in linguistics and computational linguistics to the preservation and revitalization of North American indigenous languages, and to build on the long history of research on polysynthesis combined with the more current computational interest in processing morphologically complex languages. Accordingly, the program committee consists of theoretical linguists, computational linguists, anthropological linguists and experts in language revitalization.

In the future, we aim to formulate a shared task that meets the goals outlined in Levow, Bender et al. (2016), namely to “align the interests of the speech and language processing communities with those of endangered language documentation communities.” In addition to coordinating with the NSF-funded EL-STEC project, we will consult with the SIGMORPHON organizers (https://sites.google.com/view/conll-sigmorphon2017/home?authuser=0 and http://www.aclweb.org/old_anthology/W/W16/W16-20.pdf#page=22) and the Morpho Challenge project.

 

We welcome several kinds of presentation in order to create a well-rounded workshop that addresses the many issues involved in language revitalization for these language families, including:

  • Research in linguistic theory relevant to polysynthesis
  • Research in computational linguistics on polysynthetic languages
  • Data collection and annotation projects (especially focused on language challenges)
  • Cultural aspects of handling polysynthetic languages
  • Teaching experiences and case studies at any level from K-adult

The program committee draws on a wide range of experts so that every submission receives a fair review.

More Information:  http://languagescience.umd.edu/poly

 



[1] In fact, the majority of the languages spoken in the world today are endangered and disappearing fast (see Bird 2009). Estimates are that, of the approximately 7000 languages in the world today, at least one disappears every day (https://www.ethnologue.com).

[2] In particular, an interesting question is whether “syntactic morphology” has any special properties as compared to syntax (besides possible restrictions on complexity).