The industry standard for transcriptions of children’s speech and dialogues.
A collection of prompted and spontaneous speech from 1,100 children (K-10), including word-level transcriptions.
5. Educational Platforms
This research explores the hypothesis that children’s texts (such as stories) explicitly state "commonsense" facts that adult texts often omit because adults assume the reader already knows them. It introduces childBERT, a model fine-tuned on children’s corpora to improve AI reasoning.
2. Advancing Language Models for Kids
Paper: "KidLM: Advancing Language Models for Children"
A dataset containing over 60,000 poems written by children in grades 1 through 12, often used for age classification and sentiment analysis.
"Do children texts hold the key to commonsense knowledge?"
LLMs often struggle to limit their vocabulary to age-appropriate levels. This research develops a dataset and pipeline for fine-tuning models specifically to simplify and generate stories for younger age groups.
4. Notable Children's Text & Speech Datasets
Depending on your focus, here are the most relevant academic papers and datasets involving children's text and AI: