We store 1.5 megabytes of information to master our native language, study finds.

Learning a language is a remarkable feat. The cognitive effort it takes for a newborn to learn one is tremendous, though we often don’t realize it because children seem to do it rapidly and effortlessly. Appearances are misleading. New research suggests that language acquisition demands an enormous amount of learning, rather than resting mainly on something encoded in our genes, as previously thought.

The study is co-authored by UC Berkeley assistant professor Steven Piantadosi and University of Rochester doctoral candidate Francis Mollica, and was published March 27 in the journal Royal Society Open Science.

The study found that between birth and the age of 18, learners store about 12.5 million bits (roughly 1.5 megabytes) of linguistic information in order to fully grasp how their native language works. Expressed as binary data, that is about the capacity of a floppy disk, according to the study. The authors comment that “it may seem surprising but, in terms of digital media storage, our knowledge of language almost fits compactly on a floppy disk.”

A bit is a basic computational unit in the form of a binary digit (1 or 0) that computers use to store and process information. Eight bits make up one byte. The study found that, for the first 18 years of life, learners take in on average up to 2000 bits of information every day, about 2 bits per minute, about how their native language works. The authors write that this is a “remarkable feat of cognition.”
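The figures quoted above can be checked with some quick arithmetic. The short Python sketch below is only a back-of-the-envelope illustration, not the study's own calculation: the 365-day year and the 16 waking hours per day are assumptions made here to see how a per-minute figure of about 2 bits could arise.

```python
# Figures quoted in the article.
TOTAL_BITS = 12_500_000   # linguistic information stored by age 18
BITS_PER_BYTE = 8         # standard definition
YEARS = 18

# Assumptions made for this illustration only (not taken from the study).
DAYS_PER_YEAR = 365
WAKING_HOURS_PER_DAY = 16

# Convert bits to bytes, then to decimal megabytes (1 MB = 1,000,000 bytes).
total_bytes = TOTAL_BITS / BITS_PER_BYTE
total_megabytes = total_bytes / 1_000_000

# Spread the total evenly over 18 years.
bits_per_day = TOTAL_BITS / (YEARS * DAYS_PER_YEAR)
bits_per_minute_all_day = bits_per_day / (24 * 60)
bits_per_waking_minute = bits_per_day / (WAKING_HOURS_PER_DAY * 60)

print(f"Total: {total_megabytes:.2f} MB")
print(f"Per day: {bits_per_day:.0f} bits")
print(f"Per minute (24 h): {bits_per_minute_all_day:.2f} bits")
print(f"Per waking minute: {bits_per_waking_minute:.2f} bits")
```

Under these assumptions the total comes to a little over 1.5 MB and the daily rate lands just under 2000 bits. The per-minute rate is close to 2 bits only when counted over waking hours.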

Most of the stored information relates to the lexical properties of words, chiefly their meaning, rather than to syntactic properties such as how words are arranged in a sentence. For example, when presented with the word “turkey,” young learners would typically gather bits of information about it by asking “Is a turkey a bird? Yes or no? Does a turkey fly? Yes or no?” and so on, until they grasp the full meaning of the word.
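The yes-or-no framing ties directly to what a bit measures: each answer carries at most one bit, so singling out one option among N equally likely candidates takes about log2(N) questions. The sketch below illustrates that general relationship only. It is not the study's model, and the function name and the candidate counts are hypothetical.

```python
def yes_no_questions_needed(num_alternatives: int) -> int:
    """Smallest number of yes/no answers that can single out one of
    `num_alternatives` equally likely options (the ceiling of log2)."""
    if num_alternatives < 1:
        raise ValueError("need at least one alternative")
    # For n >= 1, (n - 1).bit_length() equals ceil(log2(n)) exactly,
    # with no floating-point rounding involved.
    return (num_alternatives - 1).bit_length()


# Hypothetical numbers of candidate meanings, chosen only for illustration.
for candidates in (2, 8, 1000):
    questions = yes_no_questions_needed(candidates)
    print(f"{candidates} candidates: {questions} yes/no questions")
```

The count grows slowly: doubling the number of candidates adds only one more question.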

The researchers arrived at their results by running calculations about language semantics and syntax through computational models. Notably, the study found that linguistic knowledge consists mostly of the meaning of words, as opposed to the grammar of the language.

Researcher Piantadosi said that “a lot of research on language learning focuses on syntax, like word order. But our study shows that syntax represents just a tiny piece of language learning, and that the main difficulty has got to be in learning what so many words mean.”

The focus on meaning versus syntax is key to distinguishing human language from the communication systems found in robots and machines such as Siri and Google Assistant. In this regard, Piantadosi comments that “this really highlights a difference between machine learners and human learners. Machines know what words go together and where they go in sentences, but know very little about the meaning of words.”

Reference: 

Francis Mollica, Steven T. Piantadosi. Humans store about 1.5 megabytes of information during language acquisition. Royal Society Open Science, 2019; 6 (3): 181393. DOI: 10.1098/rsos.181393