Research Focus

The ability to gather and analyze “Big Data” has had important impacts in many fields, including language. Companies like Google and Apple use the billions of words and sentences available in languages such as English, French, and Chinese to build translation software and speech recognition systems. But big data isn’t always available, or even necessary. Human children typically learn language with ease from only modest amounts of data. And for the vast majority of the world’s roughly 7,000 languages, the data available for building language technologies is limited. Understanding how human learners make such economical use of language input could lead to “smarter” methods in language technology. At the same time, it has important implications for understanding the effects of reduced or impoverished language input on children, and for addressing language disorders across the lifespan.

The research components of UMD’s NRT program address how language data at varying scales can be used efficiently by humans and machines. This overarching theme brings together UMD’s strengths in computational linguistics, machine translation, technology for low-resource languages, typical and atypical language and literacy development, the neuroscience of language, language disorders, and second language learning.

Within this theme, interdisciplinary research teams work on different strands of research relating to data scale and quality. Teams of students and faculty engage in seminars and working groups during the academic year, and also take part in periodic, intensive research-only workshops. These workshops bring Maryland researchers together with invited faculty experts and student participants from other universities to advance shared research goals and to promote integrated, team-based approaches to complex problems spanning multiple fields.

Some of the research areas within the data-scale theme include:

  • Low-resource languages and field linguistics
  • ‘Language poverty’ and learner differences
  • Flexible automatic speech recognition
  • Prediction and millisecond-scale information management
  • Intelligent tutoring systems (computation & learning)
  • Flexible clinical diagnostic tools