How children learn their native language

Ridyansh
Nov 9, 2024

Updated: Apr 19

I am delighted to feature Sarah Payne, a third-year Ph.D. student in Linguistics at Stony Brook University and an NSF graduate research fellow. Under the guidance of Dr. Jordan Kodner and Dr. Jeff Heinz, Sarah is researching how children learn their native language, drawing both on experimental findings and on computational theory.

Sarah’s love for writing and math was the start of their fascinating journey into linguistics. They studied Latin in school, but it was only after they took Spanish to meet their high school language requirement that they were struck by the similarities and differences between the two languages. With this newfound interest in language, Sarah decided to double major in computer science and linguistics.

During their undergraduate degree at the University of Pennsylvania, they took a course on language acquisition with Dr. Charles Yang, who specializes in computational modeling of language acquisition. Sarah’s research today is largely influenced by Dr. Yang, with whom they still work. They wouldn’t be the researcher they are if they hadn’t met him. Sarah’s current advisor, Dr. Jordan Kodner, was also a Ph.D. student of Dr. Yang. Sarah’s research interest was further fueled by the resources and support that UPenn provided during their undergraduate years.

To oversimplify Sarah’s very complex work, they compare how humans learn language with how machines do. Sarah is intrigued by the question of how human beings learn from sparse data, while machines need vast swaths of it (the entire internet, for example). When they worked in natural language processing (NLP), they’d ask how humans learn and produce language, and the answer was often, “we have no idea”. So they have leaned more towards human language acquisition for their research. Their research topic sits at the intersection of long-standing questions and the latest progress in a very rapidly evolving field. There are debates about whether large language models are cognitively plausible. Sarah doubts that they are, and is investigating claims of cognitive plausibility with an eye to refuting them. Sarah says that it’s riveting to study a field that is progressing so fast. They are constantly reading papers and wondering how the findings fit with their models.

As for their computational methodology, because humans learn from small amounts of data, Sarah avoids complex methods. They mainly use data wrangling (the process of transforming raw data into more usable formats for analysis) and parse databases such as CHILDES, a collection of transcripts of child language data from various languages and contexts. Such data, available in approximately 20 languages, often comes from transcriptions of experimentalists’ recordings of parent-child interactions, sometimes from the 1980s and 1990s. Some of the most studied languages include English, Spanish, and French. They use this data as a proxy for vocabulary growth. Parsing databases for comparisons requires high-performance computing, but for their research question, they can get by with simpler math. The hardest part, however, is going from experimental findings to modeling what’s actually happening in the child’s mind. It’s about connecting vocabulary growth with observed phenomena such as developmental regression. One has to think about memory access and rule application. Modeling isn’t as simple as applying a principle; there are a lot of nuances involved, they add.
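To make the idea of using transcripts as a proxy for vocabulary growth a little more concrete, here is a minimal sketch in Python. It is not Sarah’s actual pipeline. It assumes transcripts laid out in the CHAT style used by CHILDES, where each child utterance starts with `*CHI:`, and it simply counts how many distinct word forms the child has produced by each recording session. The sample sessions, the ages, and the function names are all made up for illustration.

```python
import re

# Hypothetical miniature sessions: (child's age as years;months, transcript text).
# Real CHILDES files carry much more annotation; this keeps only speaker lines.
SESSIONS = [
    ("2;00", "*MOT:\tlook at the dog .\n*CHI:\tdog .\n*CHI:\tmore dog ."),
    ("2;03", "*CHI:\twant more juice .\n*MOT:\there is your juice ."),
    ("2;06", "*CHI:\tthe dog want juice ."),
]


def child_tokens(transcript):
    """Return the lowercase word tokens from the child's lines only."""
    tokens = []
    for line in transcript.splitlines():
        if line.startswith("*CHI:"):
            utterance = line[len("*CHI:"):]
            # Keep alphabetic word forms; drop punctuation markers like "."
            tokens.extend(re.findall(r"[a-z']+", utterance.lower()))
    return tokens


def cumulative_vocabulary(sessions):
    """For each session, report how many distinct words the child has used so far."""
    seen = set()
    growth = []
    for age, transcript in sessions:
        seen.update(child_tokens(transcript))
        growth.append((age, len(seen)))
    return growth


if __name__ == "__main__":
    for age, size in cumulative_vocabulary(SESSIONS):
        print(age, size)
```

Counting distinct word forms is only one rough way to estimate vocabulary size; a real analysis would also have to decide how to treat inflected forms, unintelligible speech, and the fact that a transcript samples only a sliver of what a child says.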

Sarah’s research, among its many real-world applications, has huge implications for low-resource NLP. Right now, models like ChatGPT need massive amounts of data, and only a few languages have that much. Humans learn from far less, so if we can model how they do it, we could build better algorithms for smaller or even rare languages. Training huge models also produces high carbon emissions, so scaling back would help the environment too. Machine learning has so far been about getting bigger, but it needs to learn efficiently, as humans do.

Building models that explain experimental findings is the most rewarding aspect of their journey. When a model not only mimics a pattern but also offers an explanation for it, it feels like a breakthrough. We gain insights into the learning mechanisms children may be using. That’s exciting both for Sarah and for our understanding of language acquisition and the burgeoning field of generative models. Sarah’s research is one to watch for its exciting implications!
