Bhashini.ai Platform Architecture
System Context, Core Components, and Technology Stack
As a platform that enables critical human–AI interactions across languages and use cases, Bhashini.ai considers it important to clearly explain how its systems are designed, how data flows through them, and how AI capabilities are delivered at scale.
The following architectural diagrams and technology disclosures are presented with the intent of demystifying our platform, establishing technical credibility, and enabling users, partners, and stakeholders to make informed decisions. By sharing our system context and internal container-level architecture, we aim to provide clarity on how different components interact, how security and scalability are addressed, and how services are responsibly operated.
Equally important, Bhashini.ai openly acknowledges the extraordinary contributions of the global open-source community that underpin many modern advances in language AI. Our platform integrates selected open-source models, datasets, and libraries across Text-to-Speech, Automatic Speech Recognition, Translation, and OCR, and layers on years of proprietary engineering, system integration, product development, and operational expertise to deliver secure, scalable, production-grade services. We treat open source not as simple technology reuse but as a set of foundational building blocks, used with transparency, proper attribution, and responsible innovation, and strengthened by significant in-house engineering that transforms those foundations into real-world, deployable language technology solutions.
The sections that follow reflect this philosophy through transparent system architecture, clear attribution of open-source components, and a responsible presentation of how Bhashini.ai integrates these into its own production-grade platform.
This section explains who uses the Bhashini AI Platform, what systems exist, and how they interact, in a step-by-step narrative form.
1. People Involved
User
A person who uses Bhashini.ai services either from a web browser or a mobile application.
The user consumes AI services such as speech, text, translation, and e-books.
2. Bhashini Internal Systems
The Bhashini platform consists of three main internal systems:
2.1 Bhashini AI Services
This is the core AI backend system.
It provides the following capabilities:
Text-to-Speech (TTS)
Speech-to-Text (STT)
Language Translation
Optical Character Recognition (OCR)
These services are exposed using REST APIs and WebSocket APIs.
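As an illustration of how such a REST call might be assembled, the sketch below builds a request for a hypothetical TTS endpoint. The base URL, the `/v1/tts` path, the `Authorization` header scheme, and the payload field names are all assumptions for illustration, not Bhashini.ai's published API contract.

```python
import json

API_BASE = "https://api.bhashini.ai"   # hypothetical base URL
API_KEY = "your-api-key"               # issued per client

def build_tts_request(text: str, language: str, voice: str) -> dict:
    """Assemble an HTTP request description for a hypothetical TTS endpoint."""
    return {
        "method": "POST",
        "url": f"{API_BASE}/v1/tts",
        "headers": {
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        "body": json.dumps({"text": text, "language": language, "voice": voice}),
    }

req = build_tts_request("नमस्ते", "hi", "female-1")
print(req["url"])  # → https://api.bhashini.ai/v1/tts
```

The same payload shape could be sent over a WebSocket connection for streaming use cases, with interim results arriving as separate messages.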
2.2 Bhashini.ai eBook Management System
This system is responsible for:
Managing e-books
Delivering DRM-protected e-books
It uses Readium LCP for digital rights management and content protection.
2.3 Bhashini.ai ERP System
This system manages:
User subscriptions
API credits
E-book purchases
Payments and billing
It is implemented using ERPNext / Frappe.
3. External Systems
The platform also interacts with external systems outside Bhashini:
3.1 B2B Client Applications or Agents
These are external applications built by Bhashini’s business customers.
They integrate with Bhashini AI Services using:
REST APIs
WebSocket APIs
Authentication is done using an API Key.
3.2 Sign in with Google
This is used for user authentication.
Google Sign-In works using the OpenID Connect protocol.
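In OpenID Connect, the proof of identity is an ID token, a signed JWT whose payload carries claims such as `iss`, `aud`, and `email`. The sketch below only decodes the payload to show those claims; real verification must validate the RS256 signature against Google's published JWKS keys and check `iss`, `aud`, and `exp` before trusting anything. The client ID and email values are placeholders.

```python
import base64
import json

def b64url_decode(segment: str) -> bytes:
    # JWT segments use base64url without padding; restore padding first
    padding = "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment + padding)

def read_id_token_claims(id_token: str) -> dict:
    """Decode (NOT verify) the payload of an OIDC ID token.
    Production code must verify the signature and the iss/aud/exp claims."""
    header_b64, payload_b64, _signature = id_token.split(".")
    return json.loads(b64url_decode(payload_b64))

# Craft a structurally valid (unsigned) token purely for illustration:
payload = {"iss": "https://accounts.google.com",
           "aud": "bhashini-client-id", "email": "user@example.com"}
fake_token = ".".join([
    base64.urlsafe_b64encode(json.dumps({"alg": "RS256"}).encode()).decode().rstrip("="),
    base64.urlsafe_b64encode(json.dumps(payload).encode()).decode().rstrip("="),
    "sig",
])
claims = read_id_token_claims(fake_token)
print(claims["email"])  # → user@example.com
```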
3.3 Razorpay Payment Gateway
This external system processes payments for:
Subscriptions
API credits
E-book purchases
It communicates with Bhashini.ai systems using secure HTTPS and webhooks.
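Webhook integrity is typically checked by recomputing an HMAC-SHA256 digest of the raw request body with a shared webhook secret and comparing it against the signature header Razorpay sends. The sketch below shows that pattern with a placeholder secret; consult Razorpay's webhook documentation for the exact header name and signing details.

```python
import hashlib
import hmac

def verify_webhook(body: bytes, signature: str, secret: str) -> bool:
    """Verify a Razorpay-style webhook: an HMAC-SHA256 hex digest of the
    raw request body, computed with the shared webhook secret."""
    expected = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    # constant-time comparison avoids timing side channels
    return hmac.compare_digest(expected, signature)

secret = "whsec_demo"  # placeholder secret, set when configuring the webhook
body = b'{"event": "payment.captured", "payload": {}}'
sig = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
print(verify_webhook(body, sig, secret))  # → True
```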
4. Authentication Flow
The User authenticates using Google Sign-In.
Google verifies the user’s identity using OpenID Connect.
Once authenticated, the user can access Bhashini services.
5. AI Service Usage Flow
There are two main ways AI services are consumed:
5.1 Direct User Access
The user uses a web application or a mobile application to consume Bhashini.ai language technology services.
These web and mobile applications directly communicate with Bhashini AI Services using REST APIs and WebSocket APIs.
5.2 B2B Client Access
External B2B applications or agents also call Bhashini AI Services.
These calls are authenticated using API Keys.
The services provided are the same: TTS, STT, Translation, and OCR.
6. E-Book Access Flow
The user reads e-books using the Bhashini.ai eBook reading application on a web browser or a mobile device.
The eBook reading application communicates with the Bhashini.ai eBook Management System in the background.
This communication happens securely using HTTPS.
The e-books delivered to the user are DRM-protected using Readium LCP.
7. Payment and Subscription Flow
The payment flow happens in the following sequence:
The User purchases or renews subscriptions, credits, or e-books through the ERP system.
The ERP system initiates a payment request to Razorpay.
The User completes the payment on Razorpay’s platform.
Razorpay sends payment status updates back to the ERP system using webhooks.
After successful payment:
The ERP system sends subscription or credit information to Bhashini AI Services.
The ERP system sends e-book purchase information to the eBook Management System.
8. Overall Summary
The Bhashini AI Platform allows users and external business applications to consume AI language services such as speech, translation, OCR, and DRM-protected e-books. Users authenticate using Google Sign-In, make payments through Razorpay, and access services through secure APIs. An internal ERP system manages subscriptions, credits, and purchases, while the AI Services system delivers language intelligence and the eBook Management System handles protected digital book delivery.
This section explains what runs inside the Bhashini AI Platform, what each part does, and how data flows between them.
1. Person Using the System
User
A user consumes Bhashini.ai language technology services such as speech, translation, OCR, and transliteration using:
A web browser, or
A mobile application
2. Overall System Boundary
All the components described below exist inside a single system called:
Bhashini AI Platform
This platform is internally divided into logical layers for clarity.
3. Edge Layer (Entry Point into the Platform)
The Edge Layer is responsible for handling user requests, security, and routing.
3.1 Reverse Proxy
Reverse Proxy
Technology: nginx (running directly on the operating system)
Responsibilities:
Serves static content such as:
Web application files
Encrypted e-books
Terminates TLS/SSL, handling HTTPS encryption for all incoming traffic
Load-balances incoming traffic
Routes API calls to the backend services
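These responsibilities could be captured in an nginx configuration along the following lines. This is a minimal sketch: the hostname, ports, file paths, and the `/api/` route prefix are all assumptions, not the platform's actual configuration.

```nginx
# Illustrative only: names, paths, and ports are placeholders.
upstream ai_backend {
    server 127.0.0.1:9080;          # backend container
}

server {
    listen 443 ssl;
    server_name example.bhashini.ai;            # hypothetical hostname
    ssl_certificate     /etc/nginx/tls/fullchain.pem;
    ssl_certificate_key /etc/nginx/tls/privkey.pem;

    # Static content: web app bundle and encrypted e-books
    location / {
        root /var/www/webapp;
        try_files $uri $uri/ /index.html;
    }

    # API calls are proxied to the backend; the Upgrade/Connection
    # headers allow WebSocket connections for streaming STT/TTS
    location /api/ {
        proxy_pass http://ai_backend;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
    }
}
```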
3.2 Web Application
Web App
Technology: Vue.js with Bootstrap CSS
Responsibilities:
Runs in the user’s web browser
Provides user interfaces for:
Text-to-Speech
Speech-to-Text
Translation
OCR
Transliteration
All backend communication is done securely through APIs over HTTPS
4. Mobile Application
Mobile App
Technology: Android with Kotlin
Responsibilities:
Runs natively on the user’s mobile phone
Allows users to:
Scan Indic documents
Convert text into speech
Consume language services in their native language
Sends all service requests through the same backend APIs as the web app
5. Core Backend Layer
Bhashini AI Backend
Technology:
Java
OpenLiberty
Docker containers
Responsibilities:
Implements all REST and WebSocket APIs for:
TTS
STT
Translation
OCR
Transliteration
Handles:
Authentication
Authorization
Acts as the central coordinator between applications, AI models, and data stores
6. AI and Data Layer
This layer contains all systems required for AI inference and data storage.
6.1 AI Inference Engine
Triton Inference Server
Technology: NVIDIA Triton Inference Server running in Docker
Responsibilities:
Executes AI and machine learning models
Supports both:
Proprietary models
Open-source models
Receives inference requests from the AI Backend using gRPC
6.2 Caching Layer
Redis Cache
Technology: Redis (a fast, in-memory data store) running in Docker
Responsibilities:
Caches frequently used inference results
Uses an LRU (Least Recently Used) eviction policy
Improves performance and reduces repeated computation
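The caching idea is the classic cache-aside pattern: look up a result keyed by the request, and only run inference on a miss. The toy class below stands in for Redis configured with an `allkeys-lru` eviction policy; it is an illustrative sketch, not the platform's actual code.

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU cache standing in for Redis with LRU eviction."""
    def __init__(self, capacity: int):
        self.capacity = capacity
        self._store: OrderedDict = OrderedDict()

    def get(self, key: str):
        if key not in self._store:
            return None
        self._store.move_to_end(key)          # mark as recently used
        return self._store[key]

    def put(self, key: str, value: str):
        self._store[key] = value
        self._store.move_to_end(key)
        if len(self._store) > self.capacity:
            self._store.popitem(last=False)   # evict least recently used

def cached_inference(cache, request_key, run_model):
    """Cache-aside: return a cached result, else compute and cache it."""
    result = cache.get(request_key)
    if result is None:
        result = run_model(request_key)
        cache.put(request_key, result)
    return result

cache = LRUCache(capacity=2)
cached_inference(cache, "tts:hi:नमस्ते", lambda key: "audio-bytes-1")
print(cache.get("tts:hi:नमस्ते"))  # → audio-bytes-1
```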
6.3 Persistent Data Store
CouchDB
Technology: CouchDB (a document-oriented NoSQL database) running on the host operating system
Responsibilities:
Stores:
Subscription information
API credit balances
Records:
Metered API usage
Billing-related logs
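A metered-usage record in a document database might look like the sketch below. The field names and `_id` convention here are illustrative assumptions, not the platform's actual CouchDB schema.

```python
import json
from datetime import datetime, timezone

def usage_record(user_id: str, service: str, units: int) -> dict:
    """Illustrative shape of a metered API-usage document."""
    now = datetime.now(timezone.utc).isoformat()
    return {
        "_id": f"usage:{user_id}:{now}",   # hypothetical key convention
        "type": "api_usage",
        "user_id": user_id,
        "service": service,                # e.g. "tts", "stt", "translate", "ocr"
        "units": units,                    # metered units consumed by this call
        "timestamp": now,
    }

doc = usage_record("u123", "tts", 1)
print(json.dumps(doc, indent=2))
```

Documents of this shape can then be aggregated per user and billing period to reconcile credit balances.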
7. Request and Data Flow (Step-by-Step)
The user uses either:
A web browser, or
A mobile app
For web users:
The browser loads the web application and static content from the Reverse Proxy using HTTPS.
For mobile users:
The mobile app communicates with backend APIs via the Reverse Proxy.
The Reverse Proxy forwards API requests to the Bhashini AI Backend using REST or WebSocket protocols.
The AI Backend:
Validates user access
Checks subscription and credit details from CouchDB
Uses Redis to retrieve cached results if available
If no cached result exists:
The AI Backend sends an inference request to Triton Inference Server using gRPC.
The inference result is:
Returned to the AI Backend
Optionally cached in Redis
Logged for usage tracking in CouchDB
The final response is sent back to:
The web application or mobile app
And then presented to the user
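The steps above can be condensed into a single request-handling function. The sketch below uses Python stubs for brevity; all class and function names are illustrative stand-ins, not the platform's actual Java backend code.

```python
class StubDB:
    """Stand-in for the CouchDB data store; illustrative only."""
    def __init__(self):
        self.usage = []
    def get_account(self, user_id):
        return {"credits": 10}
    def log_usage(self, user_id, service, units):
        self.usage.append((user_id, service, units))

class StubCache:
    """Stand-in for the Redis cache."""
    def __init__(self):
        self.data = {}
    def get(self, key):
        return self.data.get(key)
    def put(self, key, value):
        self.data[key] = value

def handle_request(request, db, cache, infer):
    # 1. Validate access and check subscription/credits in the data store
    account = db.get_account(request["user_id"])
    if account is None or account["credits"] <= 0:
        return {"error": "insufficient credits"}
    # 2. Try the cache first
    key = f'{request["service"]}:{request["payload"]}'
    result = cache.get(key)
    # 3. On a miss, run inference (in production: a gRPC call to Triton)
    if result is None:
        result = infer(request["service"], request["payload"])
        cache.put(key, result)
    # 4. Log metered usage for billing
    db.log_usage(request["user_id"], request["service"], units=1)
    return {"result": result}

resp = handle_request({"user_id": "u1", "service": "tts", "payload": "hello"},
                      StubDB(), StubCache(), lambda s, p: f"{s}-result")
print(resp["result"])  # → tts-result
```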
8. Overall Summary
The Bhashini AI Platform consists of a reverse proxy, web and mobile applications, a Java-based backend, AI inference services, caching, and a database. Users access services through web or mobile apps. Requests pass through the reverse proxy to the backend, which manages security, billing, caching, and AI inference. AI models run on Triton Inference Server, results are cached in Redis, and usage data is stored in CouchDB.
This description explains who is involved, what internal components exist, and how eBooks are published, purchased, licensed, and read within the Bhashini.ai eBook ecosystem.
1. People and External Systems Involved
1.1 Author or Publisher
An Author or Publisher creates and publishes eBooks on the Bhashini.ai platform.
They are later paid royalties for purchased books.
1.2 User
A User reads eBooks using the Bhashini.ai eBook Reader on:
A mobile device, or
A web browser.
1.3 Bhashini.ai ERP System
The ERP system manages:
eBook purchases
Payments
Royalties
It is implemented using ERPNext / Frappe.
2. Overall System Boundary
All components described below exist inside a single system called:
Bhashini.ai eBook Management System
This system is responsible for publishing, protecting, licensing, delivering, and tracking eBooks.
3. User-Facing Application
3.1 eBook Reader Application
Mobile App / Web App
Technology: Android (Kotlin) and Web
Responsibilities:
Allows users to read Indic language books
Supports books enriched with AI features
Uses DRM licenses to securely open encrypted books
The app does not store decryption keys permanently.
4. Core Backend Services
4.1 Bhashini.ai eBook Backend
Technology:
Java
OpenLiberty
Docker
Responsibilities:
Allows authors to publish eBooks
Manages:
eBook catalog
Users
Purchases
Coordinates license generation and delivery
Acts as the central controller of the eBook system
5. DRM and Encryption Components
5.1 LCP Encryption Tool
Technology: Go-based command-line tool
Responsibilities:
Encrypts newly published eBooks
Runs during the publishing workflow
Produces DRM-protected content
5.2 LCP License Server
Technology: Go, cryptography libraries, Docker
Responsibilities:
Generates LCP licenses for purchased eBooks
Each license defines:
Who can read the book
On which devices
Under what conditions
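For a sense of what such a license carries, the sketch below follows the general shape of a Readium LCP license document, heavily trimmed and filled with placeholder values. Exact field names and semantics should be checked against the LCP specification.

```python
import json

# Trimmed, illustrative subset of an LCP license; values are placeholders.
license_doc = {
    "provider": "https://bhashini.ai",     # hypothetical provider URL
    "id": "license-0001",
    "issued": "2024-01-01T00:00:00Z",
    "encryption": {
        "profile": "http://readium.org/lcp/basic-profile",
        "content_key": {"encrypted_value": "<base64-encrypted-key>"},
        "user_key": {"text_hint": "Your account passphrase"},
    },
    "rights": {
        "print": 10,                       # pages the user may print
        "copy": 2048,                      # characters the user may copy
        "end": "2025-01-01T00:00:00Z",     # license expiry
    },
    "links": [
        {"rel": "status", "href": "https://example.org/status/license-0001"},
    ],
}
print(json.dumps(license_doc["rights"], indent=2))
```

The `rights` object is what enforces the "under what conditions" part, while the `status` link connects the reading app to the LCP Status Server.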
5.3 LCP Status Server
Technology: Go, cryptography libraries, Docker
Responsibilities:
Manages:
Device registration
Device activation
License revocation
Generates license status documents
Communicates with reading applications during book usage
6. Data Storage Components
6.1 Encrypted Content Repository
Technology: Network File System (NFS)
Responsibilities:
Stores encrypted eBook files
Contains no readable plaintext content
Books can only be decrypted using valid licenses
6.2 CouchDB Database
Technology: CouchDB (a document-oriented NoSQL database) running on the host system
Responsibilities:
Stores:
User purchase records
User-specific LCP license information
7. Publishing Flow (Author Perspective)
The Author publishes an eBook using the eBook Backend.
The backend invokes the LCP Encryption Tool to encrypt the eBook.
The encrypted eBook is sent to the LCP License Server.
The encrypted content is stored in the Encrypted Content Repository.
8. Purchase and Reading Flow (User Perspective)
The User purchases an eBook through the ERP system.
The ERP system sends purchase information to the eBook Backend.
The eBook Backend requests an LCP license from the LCP License Server.
The License Server notifies the LCP Status Server about the new license.
The eBook Backend stores the user’s license details in CouchDB.
The eBook Backend sends the LCP license to the user’s reading app.
The reading app downloads the encrypted eBook content from the content repository.
The reading app:
Retrieves license status documents
Registers the user’s device by communicating with the LCP Status Server.
9. Royalties Flow
The ERP system pays royalties to the Author or Publisher using UPI, based on eBook sales.
10. Overall Summary
The Bhashini.ai eBook Management System allows authors to publish encrypted eBooks and users to securely purchase and read them. Books are encrypted using LCP, licenses are generated and managed by dedicated license and status servers, and encrypted content is stored separately from licenses. Users read books through a mobile or web app using DRM licenses, while purchases, payments, and royalties are handled by the ERP system.
Yes — today’s open-source ecosystem makes it possible to assemble ASR, TTS, and translation pipelines.
But building a demo is easy. Building a production-grade, real-time, multilingual voice AI platform is not.
Here’s what teams typically underestimate:
Real systems are not a single model call.
You need multiple components working together:
ASR stack: speech recognition, VAD, language identification, speaker diarization
TTS stack: text normalization, spectrogram encoder, decoder, speed control
Each component has:
different latency profiles
different failure modes
different hardware requirements
Most DIY systems become fragile very quickly.
Bhashini.ai provides a unified orchestration layer that seamlessly manages multiple models in real time.
Handling live speech isn’t just running ASR.
You need:
accurate speech start/stop detection
pause/resume handling
interim vs final transcripts
seamless handoff to LLMs and TTS
Most teams underestimate this and end up with:
laggy interactions
broken conversational flow
poor user experience
Bhashini.ai provides a battle-tested voice orchestration layer (VAD + FSM) that makes real-time voice interactions feel natural and responsive.
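To make the VAD + FSM idea concrete, here is a toy finite-state machine driven by per-frame voice-activity decisions. It is a sketch of the orchestration concept under simplified assumptions (fixed silence threshold, boolean VAD output), not Bhashini.ai's implementation.

```python
from enum import Enum, auto

class State(Enum):
    IDLE = auto()        # waiting for the user to start speaking
    LISTENING = auto()   # VAD reports active speech; ASR is streaming
    RESPONDING = auto()  # utterance ended; hand transcript to LLM/TTS

class TurnTakingFSM:
    """Toy turn-taking FSM driven by VAD events."""
    def __init__(self, silence_frames_to_end: int = 3):
        self.state = State.IDLE
        self.silence_frames_to_end = silence_frames_to_end
        self._silence = 0

    def on_frame(self, speech_detected: bool) -> State:
        if self.state == State.IDLE and speech_detected:
            self.state = State.LISTENING           # speech start detected
        elif self.state == State.LISTENING:
            if speech_detected:
                self._silence = 0                  # user is still talking
            else:
                self._silence += 1
                if self._silence >= self.silence_frames_to_end:
                    self.state = State.RESPONDING  # end of utterance
        return self.state

fsm = TurnTakingFSM()
for frame in [True, True, False, False, False]:    # speech, then silence
    fsm.on_frame(frame)
print(fsm.state.name)  # → RESPONDING
```

A production orchestrator adds many more states and events (barge-in, pause/resume, interim vs final transcripts), which is exactly where DIY systems tend to break down.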
Real-world applications require:
low-latency streaming
high concurrency
GPU-efficient inference
DIY systems often:
break under scale
become too expensive
fail to meet real-time expectations
Bhashini.ai is built for real-time production workloads using optimized runtimes (ONNX, Triton, LibTorch, etc.) for each model.
Text-to-speech is not just intelligibility:
naturalness, emotion, and consistency are critical
voice cloning and cross-lingual voice preservation are non-trivial
Without proper processing:
numbers, abbreviations, and formats sound wrong
speech sounds robotic or inconsistent
Bhashini.ai provides 100+ high-quality, culturally relevant voices with advanced text normalization and speech control.
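Text normalization is easy to underestimate. The toy normalizer below expands a few abbreviations and spells out single digits; the abbreviation table and rules are invented for illustration, and real TTS front-ends handle dates, currency, units, and much more, per language.

```python
import re

ABBREVIATIONS = {"Dr.": "Doctor", "St.": "Street", "Rs.": "Rupees"}
ONES = ["zero", "one", "two", "three", "four",
        "five", "six", "seven", "eight", "nine"]

def normalize(text: str) -> str:
    """Toy normalizer: expand abbreviations, spell out single digits."""
    for abbr, full in ABBREVIATIONS.items():
        text = text.replace(abbr, full)
    return re.sub(r"\b(\d)\b", lambda m: ONES[int(m.group(1))], text)

print(normalize("Dr. Rao lives at 5 Main St."))
# → Doctor Rao lives at five Main Street
```

Skipping this step is why naive pipelines read "Dr." as "dee-ar" and "5" as a bare symbol, which immediately sounds robotic.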
Production systems require:
authentication & authorization
rate limiting and usage control
persistent sessions
reliability under load
Building and maintaining this stack (APIs + infra + scaling) is significant effort.
Bhashini.ai offers ready-to-use REST and WebSocket APIs built for enterprise-grade reliability and responsible AI usage.
Keeping up with:
new models
better datasets
evolving benchmarks
requires ongoing investment and engineering effort.
Bhashini.ai continuously improves the stack — without additional effort from your team.
Because your goal is not to build language infrastructure — it’s to build your product.
With Bhashini.ai, you get:
Production-ready multilingual voice AI platform
Real-time orchestration across multiple models
Low-latency, scalable infrastructure
High-quality voices optimized for Indian languages
Enterprise-grade APIs with built-in reliability
Continuous upgrades without maintenance overhead
You can build it yourself.
But you’ll spend months solving infrastructure, orchestration, and quality problems that Bhashini.ai has already solved.
Use Bhashini.ai — and focus on what makes your product unique.
BHASHINI (by the Government of India) is a powerful initiative to make Indian languages digitally accessible.
Bhashini.ai builds on similar goals — but is designed for a very different use case:
building real-time, production-grade applications.
BHASHINI.gov.in
Focused on national-scale language infrastructure
Provides access to datasets, models, and APIs
Designed as a digital public good
Bhashini.ai
Built as a production-ready platform
Provides orchestration, optimization, and real-time capabilities
Designed for application developers and enterprises
Think of it as:
BHASHINI.gov.in = foundation
Bhashini.ai = production layer on top
BHASHINI.gov.in gives you access to:
ASR, TTS, translation models
datasets and benchmarks
But you still need to:
integrate multiple models
handle edge cases
build pipelines
Bhashini.ai gives you a complete system:
multi-model orchestration
real-time voice handling (VAD + FSM)
unified APIs
You don’t assemble — you integrate.
Voice AI is not just model accuracy — it’s interaction quality.
BHASHINI.gov.in focuses on:
model access and language enablement
Bhashini.ai focuses on:
real-time streaming
pause/resume handling
interim vs final transcripts
seamless conversational flow
This is critical for:
voice assistants
call automation
conversational AI
Production applications require:
low latency
high concurrency
efficient resource utilization
Bhashini.ai:
uses optimized runtimes (ONNX, Triton, LibTorch, etc.)
ensures consistent real-time performance
This level of optimization is essential for customer-facing systems.
BHASHINI.gov.in provides:
strong foundational models and datasets
Bhashini.ai provides:
100+ curated voices
text normalization (numbers, abbreviations, etc.)
control over speech speed and delivery
consistent, production-quality output
Built for brand identity and real user experience
Production deployments require:
authentication & authorization
usage control and rate limiting
reliability under load
Bhashini.ai provides enterprise-grade REST and WebSocket APIs ready for deployment.
BHASHINI.gov.in evolves as:
a national language ecosystem
Bhashini.ai evolves as:
a product platform
continuously improving orchestration, performance, and quality
So your application improves — without rework.
Use BHASHINI.gov.in if you want:
access to datasets and base models
research and experimentation
language ecosystem participation
Use Bhashini.ai if you want:
to build and ship real-world applications
real-time voice experiences
scalable, production-ready systems