Project Information

Kirundi AI Ecosystem 🇧🇮

An end-to-end infrastructure for digitizing the Kirundi language for AI applications. The project pairs a Gold Standard Dataset hosted on Hugging Face with a gamified web application for community-driven data collection.

Key Features

  • Gold Standard Dataset: More than 40,000 sentences and audio clips, automatically cleaned and validated
  • Serverless Architecture: A zero-cost backend that uses Google Apps Script as its API layer
  • Gamified Contribution: A user-friendly web app for submitting translations and audio recordings
  • Automated Pipeline: Python scripts for ETL, data cleaning (pandas and regular expressions), and deployment to Hugging Face
  • CI/CD Workflows: Automated data validation and quality checks
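The exact cleaning and validation rules used by the pipeline are not described here. The following is a minimal sketch, assuming a simple rule set: Unicode NFC normalization, URL stripping, whitespace collapsing, a word-count bound, a character whitelist, and case-insensitive de-duplication. The function names and thresholds (`clean_sentence`, `is_valid`, `clean_corpus`, 3 to 50 words) are hypothetical, and the sketch uses only the Python standard library instead of pandas so that it runs on its own.

```python
import re
import unicodedata

# Hypothetical rules; the project's real cleaning rules may differ.
_URL = re.compile(r"https?://\S+")
_WHITESPACE = re.compile(r"\s+")
_ALLOWED_PUNCT = set(" .,;:!?'\"()-")


def clean_sentence(text: str) -> str:
    """Normalize Unicode, remove URLs, unify apostrophes, collapse whitespace."""
    text = unicodedata.normalize("NFC", text)
    text = _URL.sub("", text)
    text = text.replace("\u2019", "'")  # curly apostrophe -> straight apostrophe
    return _WHITESPACE.sub(" ", text).strip()


def is_valid(text: str, min_words: int = 3, max_words: int = 50) -> bool:
    """Hypothetical quality gate: word-count bounds plus a character whitelist.

    Letters are checked with str.isalpha(), so accented letters are accepted.
    """
    n_words = len(text.split())
    if not (min_words <= n_words <= max_words):
        return False
    return all(ch.isalpha() or ch.isdigit() or ch in _ALLOWED_PUNCT for ch in text)


def clean_corpus(sentences):
    """Clean and validate sentences, dropping case-insensitive duplicates.

    The first occurrence of each sentence is kept, preserving input order.
    """
    seen = set()
    kept = []
    for raw in sentences:
        sentence = clean_sentence(raw)
        key = sentence.lower()
        if is_valid(sentence) and key not in seen:
            seen.add(key)
            kept.append(sentence)
    return kept


if __name__ == "__main__":
    # Illustrative input strings only.
    raw = [
        "Amahoro   y'Imana abane namwe.",
        "amahoro y'Imana abane namwe.",          # case-insensitive duplicate
        "Ndagukunda",                            # too short for the gate
        "Reba https://example.com ubu nyene!",   # URL is stripped
    ]
    for sentence in clean_corpus(raw):
        print(sentence)
```

In a pandas-based script, the same steps could be expressed with `Series.str.replace(..., regex=True)`, a boolean mask built from `Series.apply(is_valid)`, and `drop_duplicates` on a lower-cased key column. A CI job could run checks like `is_valid` on every commit and fail when invalid rows appear.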

Impact

This initiative helps preserve Kirundi for the AI era by making the language usable for natural language processing, machine translation, and speech recognition. By building a comprehensive dataset and fostering a community of contributors, the project helps ensure that low-resource languages like Kirundi are not left behind as AI advances.