Giving Kirundi a Voice in the AI Era
A fellowship narrative by Jules Cesar Junior Ndayisenga
I grew up speaking Kirundi, a language shared by over 12 million people across Burundi, the Democratic Republic of Congo, and a growing diaspora. It is the language of my family, my community, my identity. When I transitioned from a nursing career into AI engineering, I expected to find my mother tongue somewhere in the vast ecosystem of datasets, models, and platforms that power modern AI. It wasn't there. Not in OpenAI. Not in Google. Not in Meta. Not anywhere.
This absence isn't just a technical gap. It is a form of digital exclusion. When Siri or Google Assistant cannot understand a question asked in Kirundi, millions of Burundians are silently shut out of the technological revolution. Banks cannot deploy voice-based customer service for their Kirundi-speaking clients. NGOs cannot use AI to deliver health or agricultural information by voice to rural, non-literate communities. The language itself risks becoming invisible in the digital world.
I decided to change that. In November 2025, together with my co-founder, I launched Ijwi ry'Ikirundi AI, an open-source initiative to build the first comprehensive AI-ready datasets for the Kirundi language. The name means "The Voice of Kirundi," because that is exactly what we are building: a voice for a language that the AI world has never heard.
Starting from zero, I designed and built the entire technical infrastructure: a serverless contribution platform where anyone can submit Kirundi sentences and audio recordings without installing anything (built on GitHub Pages and Google Apps Script, with zero hosting cost); an automated Python ETL pipeline that cleans, validates, and deduplicates contributions; and a versioned, publicly accessible dataset hosted on Hugging Face Hub via Git LFS, structured for speech recognition (ASR), text-to-speech (TTS), and natural language processing (NLP) research.
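To make the clean, validate, and deduplicate steps concrete, here is a minimal Python sketch of how such a pass over submitted sentences might look. It is an illustration under stated assumptions, not the project's actual pipeline: the function names (clean, is_valid, dedup_key, run_pipeline) and the word-count thresholds are hypothetical.

```python
import re
import unicodedata

# Hypothetical thresholds, chosen only for illustration.
MIN_WORDS = 2
MAX_WORDS = 40


def clean(sentence: str) -> str:
    """Normalize Unicode, unify apostrophes, and collapse whitespace."""
    text = unicodedata.normalize("NFC", sentence)
    text = text.replace("\u2019", "'")  # curly apostrophe -> straight
    return re.sub(r"\s+", " ", text).strip()


def is_valid(sentence: str) -> bool:
    """Accept sentences within the word limits that contain a letter."""
    words = sentence.split()
    if not (MIN_WORDS <= len(words) <= MAX_WORDS):
        return False
    return any(ch.isalpha() for ch in sentence)


def dedup_key(sentence: str) -> str:
    """Case- and punctuation-insensitive key used to spot duplicates."""
    lowered = sentence.casefold()
    return re.sub(r"[^\w\s']", "", lowered).strip()


def run_pipeline(raw_sentences):
    """Clean, validate, and deduplicate, keeping first occurrences."""
    seen = set()
    kept = []
    for raw in raw_sentences:
        sentence = clean(raw)
        if not is_valid(sentence):
            continue
        key = dedup_key(sentence)
        if key in seen:
            continue
        seen.add(key)
        kept.append(sentence)
    return kept


if __name__ == "__main__":
    submissions = [
        "  Ijwi   ry\u2019Ikirundi  ",
        "ijwi ry'ikirundi!",
        "12345",
        "Ikirundi cacu, Ijwi ryacu",
    ]
    for sentence in run_pipeline(submissions):
        print(sentence)
```

Keeping the first occurrence of each duplicate preserves the contributor's original casing and punctuation, while the normalized key catches near-identical resubmissions.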
In just a few months, our team has collected more than 4,700 validated sentences, processed more than 32,900 words, and attracted researchers and contributors, all at zero infrastructure cost. Every component is open-source and documented. The architecture is deliberately designed to be replicated for other low-resource African languages: Kinyarwanda, Swahili, and beyond.
Our ambition extends beyond data collection. The medium-term vision is to become the linguistic data hub for East Africa, targeting 250,000 sentences and 500 hours of audio within 18 months. The long-term goal is to enable working speech recognition, machine translation, and AI assistants that speak Kirundi. To make the project self-sustaining, we aim to monetize access through a B2B API serving banks, telecoms, and NGOs that want to automate services in Kirundi.
This work is deeply personal. I come from a non-traditional tech background: I trained as a nurse before teaching myself software engineering and AI. That journey taught me that the most impactful technology isn't built in Silicon Valley labs; it's built by people who understand the problems firsthand. My co-founder and I bring complementary skills: I lead the technical architecture while we jointly drive community growth and strategic direction.
With Ijwi ry'Ikirundi AI, we are not just building a dataset. We are building digital sovereignty infrastructure, ensuring that the Kirundi language and the people who speak it have a place in the AI-powered future. Every sentence we collect is a step toward that future. Ikirundi cacu, Ijwi ryacu: Our Kirundi, our voice.