Comprehensive Summary
Wang et al. studied contextual embeddings, a primary feature of large language models, and whether the distances between contextual embeddings of related linguistic units parallel the strength of neural activity while the brain comprehends text during reading. The authors compiled a large-scale dataset (ChineseEEG) while participants read Chinese translations of The Little Prince and Garnett Dream, and examined the relationship between the computed embedding distances and EEG power in different frequency bands. Participants' EEG and eye movements were recorded, and the EEG data were processed with MNE-Python. The English ZuCo1.0 dataset was also included to broaden the scope of the analysis.
Results varied across frequency bands. No significant effects appeared in the delta band, which is consistent with the view that delta and theta oscillations support lower-level brain functions. In contrast, gamma-band power correlated with the language-model measures in ChineseEEG, although no gamma effect was seen in ZuCo1.0. Effects were also found in some other frequency bands in both ChineseEEG and ZuCo1.0.
Because the sample was very small, replications should use larger and more diverse samples. The study also focused largely on Chinese text, so the findings cannot be generalized to other languages. Overall, the study suggests that contextual embeddings are related to the power of some EEG frequency bands, particularly gamma, during reading comprehension.
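The summary does not say exactly how embedding distances or band power were computed, so the following is only a minimal sketch of the general idea under stated assumptions: cosine distance between consecutive words' embeddings, conventional EEG band limits (the values in the hypothetical BANDS table), one-second epochs per word, and Pearson correlation. The embeddings and EEG signals are random, synthetic stand-ins, not ChineseEEG or ZuCo1.0 data, and every function name here is hypothetical. A naive DFT keeps the sketch dependency-free; a real analysis would typically preprocess recordings with MNE-Python and estimate power with a method such as Welch's.

```python
import math
import random

# Assumed conventional EEG bands in Hz; the study's exact cut-offs may differ.
BANDS = {"delta": (1, 4), "theta": (4, 8), "alpha": (8, 13),
         "beta": (13, 30), "gamma": (30, 45)}

def cosine_distance(u, v):
    """1 - cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def band_power(signal, fs, low, high):
    """Mean DFT power of `signal` over frequency bins in [low, high) Hz (naive DFT)."""
    n = len(signal)
    powers = []
    for k in range(1, n // 2 + 1):
        if low <= k * fs / n < high:
            re = sum(x * math.cos(2 * math.pi * k * t / n) for t, x in enumerate(signal))
            im = sum(x * math.sin(2 * math.pi * k * t / n) for t, x in enumerate(signal))
            powers.append((re * re + im * im) / n)
    return sum(powers) / len(powers)

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def simulate(n_words=40, dim=16, fs=256, seed=0):
    """Synthetic demo: gamma amplitude is built to track embedding distance; delta is not."""
    rng = random.Random(seed)
    # Hypothetical stand-ins for a language model's contextual embeddings.
    emb = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_words)]
    # Distance of each word's embedding from the preceding word's embedding.
    dists = [cosine_distance(emb[i - 1], emb[i]) for i in range(1, n_words)]
    powers = {"delta": [], "gamma": []}
    for d in dists:
        # One synthetic 1-second EEG epoch (fs samples) per word.
        epoch = [2.0 * d * math.sin(2 * math.pi * 40 * t / fs)  # 40 Hz, scales with d
                 + 0.5 * math.sin(2 * math.pi * 2 * t / fs)     # 2 Hz, constant
                 + rng.gauss(0, 0.1)                             # additive noise
                 for t in range(fs)]
        for band in powers:
            powers[band].append(band_power(epoch, fs, *BANDS[band]))
    return {band: pearson_r(dists, p) for band, p in powers.items()}

if __name__ == "__main__":
    for band, r in simulate().items():
        print(f"{band}: r = {r:+.2f}")
```

Because the synthetic 40 Hz component is constructed to scale with embedding distance, the gamma correlation comes out high while the delta correlation reflects only noise. This mirrors the shape of the reported finding but demonstrates nothing about real neural data.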
Outcomes and Implications
The study could improve understanding of how language models may mirror the neural systems the human brain uses to understand language. Further research in this area could also advance linguistics and help physicians better understand disorders that affect language comprehension and communication.