Vyapar AI - Voice-First Business Assistant for Small Enterprises

Darshan Nitin Bhanushali; Yash Amit Bhavsar; Janvi Kishor Pandav; Ganesh Wadmare

doi:10.5281/zenodo.21126772

Volume 15, Issue 06 (June 2026)


Open Access
Authors : Darshan Nitin Bhanushali, Yash Amit Bhavsar, Janvi Kishor Pandav, Ganesh Wadmare
Paper ID : IJERTV15IS061078
Volume & Issue : Volume 15, Issue 06 , June – 2026
Published (First Online): 02-07-2026
ISSN (Online) : 2278-0181
Publisher Name : IJERT
License: This work is licensed under a Creative Commons Attribution 4.0 International License


*Darshan Nitin Bhanushali, *Yash Amit Bhavsar, *Janvi Kishor Pandav

Ganesh Wadmare -Faculty Guide

Department of Artificial Intelligence and Data Science

K.J. Somaiya Institute of Technology Sion (E), Mumbai – 400 022, India

Abstract: India has over 63 million micro, small, and medium enterprises (MSMEs), almost all of which rely on manual processes and paper records for their day-to-day activities. Technology adoption is hindered by low digital literacy, cumbersome user interfaces, inadequate support for regional languages, and dependence on internet access. Vyapar AI addresses these problems with a voice-first, offline-capable Android application driven by natural voice commands in Hinglish (Hindi + English), allowing owners of small businesses to manage inventory, invoicing, customer data, payment history, and business intelligence. The software combines a React Native CLI frontend, a Python FastAPI backend, and on-device speech recognition with the Vosk engine, removing the dependence on internet connectivity for core features. Speech is analyzed with a deterministic rule-based NLP system, enabling prompt and predictable interactions without resorting to cloud-based APIs. Invoices can be shared and payment reminders sent directly from the application through WhatsApp deep links, avoiding external API services altogether. All business intelligence, such as sales summaries, best-selling products, and stock shortages, is computed offline. This paper presents the design considerations, architecture, mathematical models, implementation details, and feature evaluation of Vyapar AI, along with a comparison against closely related applications and directions for further research.

Index Terms: Voice-First Mobile Applications, Offline Speech Recognition, MSME, Rule-Based NLP, Hinglish, Android, FastAPI, WhatsApp Integration, Business Intelligence, Vosk.

Keywords: MSME Digitization, Voice-First Applications, Offline Speech Recognition, Hinglish NLP, Rule-Based Natural Language Processing, Android Application Development, React Native, FastAPI, Vosk Speech Engine, Offline AI Systems, Business Intelligence, Inventory Management, Invoice Automation, WhatsApp Integration, Edge Computing, Small Business Automation, Regional Language Support, Mobile Computing, AI for MSMEs, Offline-Capable Applications

Introduction

There are over 63 million MSMEs in India, which contribute a total of about 30% to the country’s GDP and provide employment to over 110 million people. While the economic impact is significant, over 95% of these businesses do not use any form of digital management technology. The MSME owner, usually a shopkeeper, trader, or small-scale manufacturer, uses a manual register, manual calculations, and other rudimentary forms of accounting that are prone to errors and inefficiencies.

The reasons for the lack of penetration of management software among these businesses are clear. Factors such as low digital literacy levels, complex UI/UX design, insufficient support for Hindi and regional languages, and dependence on consistent internet connectivity hinder the adoption of business management software by this demographic. Current solutions targeting larger businesses assume a level of technological proficiency and internet availability that is unavailable in MSMEs.

This paper proposes Vyapar AI, a voice-based mobile application allowing small business owners to manage their operations end-to-end using natural spoken Hindi and English commands without requiring the internet for its core functionalities.

The proposed application is built as a full-stack system with a React Native CLI front-end (Android) and a Python FastAPI back-end. On-device speech-to-text conversion is performed by the Vosk offline speech-to-text engine, eliminating the need for the internet for speech recognition. Recognized text is parsed by a rule-based NLP engine to identify user intent and extract entities such as customer names, product quantities, and monetary amounts. The back-end executes the business logic for tasks such as managing inventory, creating invoices, managing customers, tracking payments, generating AI-driven insights, and integrating with WhatsApp using deep links.

The following are the contributions made by this paper:
1. Voice-first MSME management mobile application that supports Hinglish commands;
2. Deterministic rule-based NLP engine for recognizing nine categories of business intent;
3. WhatsApp integration using deep links and no external API calls;
4. End-to-end solution with feature validation on Android.

BACKGROUND AND RELATED WORK

The relevant literature centres on voice-first mobile assistants for small businesses operating where connectivity is poor and literacy levels are low. Industry reviews of voice-first technology in India additionally point to significant untapped potential for voice in the MSME segment.

Research on voice-driven transaction systems has proven that voice can successfully log monetary figures and client codes.

Literature Review and Comparative Analysis

Table I presents a structured comparison of Vyapar AI against the most closely related systems identified in the literature. The comparison evaluates each system across voice modality, offline capability, backend architecture, and key distinguishing strengths and gaps. Vyapar AI is the only system in this comparison that combines full offline voice interaction in Hinglish, self-contained Android APK deployment, WhatsApp deep-link communication, and local AI-driven business insights within a single application.

TABLE I

Comparative Analysis of Related Systems

System / Reference	Voice Modality	Offline Support	Backend / NLP	Key Strengths	Gaps vs Vyapar AI
Dukawalla [13]	Voice-first	Limited; cloud-based	Cloud generative	Field-tested in Africa SMBs	Not fully offline; no local analytics
Visual + Voice Inventory [8]	Voice + CV	Hybrid	Custom backend + STT	87% less manual entry	No invoicing or WhatsApp
Offline Android Assistant [4]	Voice-first	Fully offline	Pattern-match rules	Full offline; lightweight	No business domain; no invoices
Offline AI Voice Asst. [5]	Voice-first	Mainly offline	Rule-based NLP	Privacy-focused; low-resource	Not tailored to SMB operations
Vyapar App (commercial) [20]	GUI-first	Online-oriented	Cloud-based	Rich feature set; WhatsApp	Not voice-first; not fully offline
Vyapar AI (this work)	Fully voice-first (Hinglish)	Core features fully offline	Rule-based NLP + local analytics	Full-stack APK; no external APIs	User-study benchmarks ongoing

RESEARCH GAP ANALYSIS
1. Absence of an Integrated Voice-First MSME Solution
  
  There have been many efforts toward voice assistants in the context of small businesses; however, none of them integrate the requirements of voice-first, offline capabilities, Hinglish language, inventory management, invoicing, customer management, payments tracking, and business insights into one integrated Android application. Vyapar AI solves that problem.
2. Rule-Based NLP for Deterministic Business Commands
  
  Current solutions utilize intent recognition models based on machine learning, which require significant computational power and rely on cloud APIs. The rule-based NLP model used in this study offers a much faster response time, as it returns a deterministic result from a predefined set of command patterns. It requires neither GPU hardware nor an Internet connection, which is a distinct advantage for this use case.
3. WhatsApp as Default Communication Channel
  
  The current literature does not take into consideration the fact that WhatsApp messenger plays a prominent role in MSMEs’ daily communication routines in India. This project incorporates invoice sending and payment reminders through WhatsApp deep linking. It leverages an existing communication channel without requiring any third-party API integration, which is a new addition to the field of research.
4. Offline-First Architecture for Low-Connectivity Markets
Although cloud-based solutions exist for managing small businesses, their operation relies on stable internet connectivity, which cannot be guaranteed in certain regions. An offline-first approach adopted by Vyapar AI sends transcriptions only to a local backend server, thus minimizing dependence on network infrastructure.

Proposed System: Vyapar AI

System Overview

Vyapar AI uses a layered and offline-first client-server system architecture designed to reduce dependency on networks and optimize local computation. The entire system functionality is expressed as follows:

F(audio) = TTS(BL( NLP( STT(audio) ) ) )

Here, STT converts the input audio signal into text using the pretrained offline acoustic model; NLP then translates the transcription into {intent, entities}; BL applies business logic to the input and generates a corresponding response; finally, TTS produces speech output based on the response locally.
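The composition above can be sketched in a few lines of Python. This is only an illustration of the data flow: make_pipeline and the four fake_* stages are hypothetical stand-ins introduced here, not the Vosk engine, the NLP rules, or the backend described in this paper.

```python
from typing import Any, Callable, Dict

def make_pipeline(
    stt: Callable[[bytes], str],
    nlp: Callable[[str], Dict[str, Any]],
    bl: Callable[[Dict[str, Any]], str],
    tts: Callable[[str], str],
) -> Callable[[bytes], str]:
    """Compose the four stages as F(audio) = TTS(BL(NLP(STT(audio))))."""
    def run(audio: bytes) -> str:
        return tts(bl(nlp(stt(audio))))
    return run

# Hypothetical stand-in stages, for illustration only.
def fake_stt(audio: bytes) -> str:
    # Pretend the audio buffer already carries its transcript.
    return audio.decode("utf-8")

def fake_nlp(text: str) -> Dict[str, Any]:
    intent = "get_sales" if "sales" in text.lower() else "unknown"
    return {"intent": intent, "entities": {}}

def fake_bl(parsed: Dict[str, Any]) -> str:
    if parsed["intent"] == "get_sales":
        return "Total sales: 0"
    return "Command not understood"

def fake_tts(reply: str) -> str:
    # A real TTS stage would speak the reply; here it is only tagged.
    return f"[spoken] {reply}"

assistant = make_pipeline(fake_stt, fake_nlp, fake_bl, fake_tts)
print(assistant(b"Total sales kitna hai"))
```

Keeping each stage behind a plain callable means any one of them can be replaced (for example, a different STT engine) without touching the others.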
Layer 1: On-Device Speech Recognition

Speech recognition runs on the device itself using the Vosk offline engine. Only the transcribed text string is sent to the server. STT latency stays under 500 ms, and because voice capture needs no network round-trip, recognition works without connectivity.

Layer 2: Rule-Based NLP Engine

The NLP engine uses keyword pattern matching with no machine learning dependencies, enabling O(n) intent detection relative to command length with response times under 100 ms. Nine intent categories are supported: create_invoice, check_stock, add_product, update_stock, add_customer, send_reminder, get_sales, get_top_products, and get_low_stock. Table II illustrates example Hinglish voice commands for each intent.

TABLE II

NLP Intent Categories and Example Commands

Intent	Example Voice Command
create_invoice	“Ramesh ko 500 ka invoice bana do”
check_stock	“Maggi ka stock kitna hai”
add_product	“Rice add karo 20 quantity”
update_stock	“Tata Tea ka stock 50 karo”
add_customer	“Naya customer add karo Ramesh”
send_reminder	“Ramesh ko payment reminder bhejo”
get_sales	“Total sales kitna hai”
get_top_products	“Top product kaunsa hai”
get_low_stock	“Kaunse products ka stock kam hai”
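A minimal sketch of keyword-based intent detection over the nine intents is given below. The keyword lists, their ordering, and the entity patterns are assumptions derived only from the example commands in Table II; they are not the rule set actually shipped in Vyapar AI, and the names RULES, detect_intent, extract_entities, and parse are hypothetical.

```python
import re
from typing import Any, Dict, List, Optional, Tuple

# Ordered rules: the first rule whose keywords all occur as tokens wins.
# More specific rules come first (e.g. add_customer before add_product).
RULES: List[Tuple[str, Tuple[str, ...]]] = [
    ("send_reminder", ("reminder",)),
    ("create_invoice", ("invoice",)),
    ("add_customer", ("customer", "add")),
    ("get_low_stock", ("stock", "kam")),
    ("check_stock", ("stock", "kitna")),
    ("update_stock", ("stock", "karo")),
    ("get_top_products", ("top", "product")),
    ("get_sales", ("sales",)),
    ("add_product", ("add", "karo")),
]

def detect_intent(text: str) -> Optional[str]:
    """Tokenise once (linear in command length), then scan the fixed rule list."""
    tokens = set(re.findall(r"\w+", text.lower()))
    for intent, keywords in RULES:
        if all(keyword in tokens for keyword in keywords):
            return intent
    return None

def extract_entities(text: str) -> Dict[str, Any]:
    """Pull out a number, a customer ('<name> ko'), and a product ('<name> ka stock')."""
    entities: Dict[str, Any] = {}
    number = re.search(r"\d+", text)
    if number:
        entities["number"] = int(number.group())
    customer = re.search(r"(\w+)\s+ko\b", text, flags=re.IGNORECASE)
    if customer:
        entities["customer"] = customer.group(1)
    product = re.search(r"^(.+?)\s+ka\s+stock\b", text, flags=re.IGNORECASE)
    if product:
        entities["product"] = product.group(1)
    return entities

def parse(text: str) -> Dict[str, Any]:
    return {"intent": detect_intent(text), "entities": extract_entities(text)}

print(parse("Ramesh ko 500 ka invoice bana do"))
print(parse("Tata Tea ka stock 50 karo"))
```

Because the rules are a fixed ordered list, the same transcript always yields the same intent. The cost of this simplicity is that phrasings outside the keyword patterns return no intent and would need a reprompt.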

Layer 3: FastAPI Backend and Business Logic

The business logic is implemented in the Python framework FastAPI in a modular fashion, with a dedicated layer per functionality. The backend comprises five business-logic layers: Inventory Management, Invoice Creation & PDF Generation, Customer Management, Payment Tracking, and Sales Analytics. FastAPI's asynchronous design allows efficient processing of concurrent requests through non-blocking I/O. The backend can run on a local device as well as on a local network server.

Total invoice amount: $\mathrm{Total}(I) = \sum_{i=1}^{n} Q_i \times P_i$, where $Q_i$ is the quantity and $P_i$ the unit price of line item $i$.

Inventory is decreased automatically upon generating an invoice. An alert is triggered once Stock < threshold.

Monthly sales: $\mathrm{Sales}(\text{month}) = \sum_{j} \mathrm{Total}(I_j)$ over all invoices $I_j$ in that month.

Pending payment exposure: $\mathrm{Pending} = \sum_{j} \mathrm{Total}(I_j)$ over all invoices $I_j$ with payment_status = 'unpaid'.
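The calculations above can be illustrated with a short Python sketch. The LineItem and Invoice classes, the function names, the "YYYY-MM" month key, and the sample figures are hypothetical and chosen only for illustration; the paper does not specify the backend's data model.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class LineItem:
    product: str
    quantity: int      # Q_i
    unit_price: float  # P_i

@dataclass
class Invoice:
    customer: str
    month: str  # assumed "YYYY-MM" key
    items: List[LineItem]
    payment_status: str = "unpaid"

    def total(self) -> float:
        # Total(I) = sum over line items of Q_i * P_i
        return sum(item.quantity * item.unit_price for item in self.items)

def apply_invoice(stock: Dict[str, int], invoice: Invoice, threshold: int) -> List[str]:
    """Decrease stock for each line item; return products now below the threshold."""
    alerts = []
    for item in invoice.items:
        stock[item.product] = stock.get(item.product, 0) - item.quantity
        if stock[item.product] < threshold:
            alerts.append(item.product)
    return alerts

def monthly_sales(invoices: List[Invoice], month: str) -> float:
    return sum(inv.total() for inv in invoices if inv.month == month)

def pending_amount(invoices: List[Invoice]) -> float:
    return sum(inv.total() for inv in invoices if inv.payment_status == "unpaid")

stock = {"Maggi": 12, "Rice": 50}
inv1 = Invoice("Ramesh", "2026-06", [LineItem("Maggi", 10, 14.0), LineItem("Rice", 5, 60.0)])
inv2 = Invoice("Suresh", "2026-06", [LineItem("Rice", 2, 60.0)], payment_status="paid")

print(inv1.total())                              # 440.0
print(apply_invoice(stock, inv1, threshold=5))   # ['Maggi']
print(monthly_sales([inv1, inv2], "2026-06"))    # 560.0
print(pending_amount([inv1, inv2]))              # 440.0
```

The sketch does not guard against stock going negative; a real backend would presumably reject or flag an invoice that exceeds available stock.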
Layer 4: React Native CLI Frontend

The Android frontend is implemented in React Native CLI (no Expo) and delivers eight core screens: Dashboard, Voice Input, Inventory, Invoice, Customer, Payment Tracking, Analytics, and Settings. The Voice Input screen provides a microphone button with real-time waveform visualisation, displays transcribed text and detected intent, and speaks the backend response via Android Native TTS. The Dashboard presents summary cards for total sales, pending payments, product count, and low-stock alerts.
Layer 5: WhatsApp Integration

Vyapar AI integrates WhatsApp via wa.me deep links without requiring the WhatsApp Business API. When a user requests a payment reminder or invoice share, the backend generates a pre-filled message string and the app opens WhatsApp with the customer’s phone number and message pre-populated. This approach leverages the most widely used communication channel in the Indian MSME market at zero API cost.
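As an illustration of the deep-link approach, the sketch below builds a wa.me URL with a pre-filled message. The function name whatsapp_link and the sample number and message are hypothetical; the link format assumed is the public click-to-chat form, https://wa.me/<number>?text=<url-encoded message>, where the number is given in international format as digits only.

```python
import re
from urllib.parse import quote

def whatsapp_link(phone: str, message: str) -> str:
    """Build a wa.me click-to-chat link with a pre-filled, URL-encoded message."""
    digits = re.sub(r"\D", "", phone)  # drop '+', spaces, and dashes
    return f"https://wa.me/{digits}?text={quote(message, safe='')}"

print(whatsapp_link("+91 98765-43210", "Hi Ramesh"))
# https://wa.me/919876543210?text=Hi%20Ramesh
```

The backend only has to return this string; the app opens it, and WhatsApp shows the chat with the message ready for the user to send. Delivering the message still requires WhatsApp to be installed and online.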

Tech Stack Summary

Table III summarises the complete technology stack deployed in Vyapar AI.

TABLE III

Technology Stack

Component	Technology
Frontend	React Native CLI (Android only, no Expo)
Backend	FastAPI (Python, async, modular)
Database	SQLite (development) / PostgreSQL (production)
Speech-to-Text	Vosk (offline, on-device, ~40 MB model)
Text-to-Speech	Android Native TTS (on-device)
NLP Engine	Rule-based intent detection + entity extraction
PDF Generation	ReportLab / WeasyPrint (backend)
WhatsApp Integration	wa.me deep links (no API required)
Deployment	Self-contained Android APK

RESULTS AND FEATURE VALIDATION

The application was implemented and tested on Android hardware. All twelve core features were successfully implemented and validated. Speech processing runs entirely on-device with STT latency under 500 ms. Rule-based NLP operates under 100 ms for all intent categories. Table IV summarises the feature validation status for the current implementation.

TABLE IV

Feature Validation Summary

Feature	Status	Offline Capable
Voice-controlled inventory management	Implemented	Yes
Invoice creation and PDF generation	Implemented	Yes
Customer management	Implemented	Yes
Payment tracking (paid / unpaid)	Implemented	Yes
WhatsApp invoice sharing (deep link)	Implemented	Requires WhatsApp
WhatsApp payment reminders (deep link)	Implemented	Requires WhatsApp
AI business insights: total sales	Implemented	Yes
AI business insights: top products	Implemented	Yes
Low-stock alerts	Implemented	Yes
Hinglish voice commands	Implemented	Yes
On-device STT (Vosk)	Implemented	Yes
Android TTS response output	Implemented	Yes

CHALLENGES AND OPEN ISSUES
1. NLP Generalisation
  
  While the NLP rule-based system performs quickly and deterministically, it is limited to a predetermined number of intents and lexical patterns. The ambiguity in some phrases or those requiring contextual interpretation could result in a classification error, requiring the use of workarounds or reprompts from the user. Using a simple on-device sequence classifier model is seen as an improvement within reach.
2. Scalability for Multi-Device Deployments
  
  While the commands in Hinglish (Hindi + English) are covered completely, India is known to have a multitude of languages. Hence, a significant share of the MSMEs that constitute our user base use Marathi, Gujarati, Bengali, Tamil, or Telugu as their mother tongue. Training an acoustic model that covers such languages is no small feat.
3. Language Coverage
  
  While Hinglish (Hindi + English) commands are fully supported, India’s linguistic diversity means that a large share of the MSME target audience speaks Marathi, Gujarati, Bengali, Tamil, or Telugu as their primary language. Extending acoustic model support to these languages is a significant undertaking requiring region-specific training data.
4. GST and Regulatory Compliance
  
  There is currently no functionality for computing GST, preparing GSTIN-based invoices, or producing summary statements for monthly returns. Compliance with India’s Goods and Services Tax policy is a must for formal adoption in business. This feature is a high priority for the next phase of development.
5. Formal User Evaluation
User studies with relevant participants and latency benchmarks are underway. Until a formal usability test is completed, the extent to which the voice interface reduces workload for MSME owners compared to current GUI systems remains uncertain.
CONCLUSION

The Vyapar AI system provides a proof-of-concept that voice is a viable primary means of interacting with business tools in informal markets. Through offline speech recognition, deterministic and rule-based natural language processing (NLP) techniques, and a fully featured business management backend system, this project has made cutting-edge business management solutions powered by artificial intelligence available to less digitally literate individuals who do not have reliable internet connectivity and prefer to interact in Hinglish.

On-device offline speech recognition removes the dependence on internet connectivity, while the deterministic rule-based NLP gives fast, explainable, and predictable responses without needing cloud AI services [4][5][6]. Integration with WhatsApp through deep links allows invoices and payment reminders to be sent without paying any third-party API fees [13][15].

FastAPI microservices allow for incorporation of more business features such as GST filing, expense tracking, and supplier management in subsequent iterations, thereby making this solution more robust than existing offerings. Overall, this project has shown that a thoughtful application of artificial intelligence technology and engineering design can go a long way in improving business efficiency through voice interfaces.
FUTURE SCOPE

Future improvements to the Vyapar AI include:
LIST OF ABBREVIATIONS
1. MSME: Micro, Small and Medium Enterprise
2. NLP: Natural Language Processing
3. STT: Speech-to-Text
4. TTS: Text-to-Speech
5. API: Application Programming Interface
6. APK: Android Package Kit
7. GDP: Gross Domestic Product
8. GST: Goods and Services Tax
9. UPI: Unified Payments Interface
10. UI/UX: User Interface / User Experience
CONFLICT OF INTEREST

The authors declare no conflict of interest. No external funding was received for this study.

ACKNOWLEDGEMENT

The authors gratefully acknowledge the guidance and support of the faculty at the Department of Artificial Intelligence and Data Science, K.J. Somaiya Institute of Technology, Mumbai. All three authors contributed substantially to the design, implementation, analysis, and reporting of the system presented in this paper.

REFERENCES

[1] IRJET. Voice Based Billing System. International Research Journal of Engineering and Technology, Vol. 6, Issue 3. Available: https://www.irjet.net/archives/V6/i3/IRJET-V6I31136.pdf
[2] IJARSCT. Voice Based Billing System: Multilingual, Voice-Activated Billing for Shopkeepers. Available: https://ijarsct.co.in/Paper25159.pdf
[3] AI Accountant. Voice-Based Accounting Entry: The Game-Changer for Indian Finance Teams. Available: https://www.aiaccountant.com/blog/voice-based-accounting-entry-india
[4] Academia.edu / IJRASET. Design and Development of a Privacy-Preserving Offline AI Assistant for MSMEs. Available: https://www.ijraset.com/best-journal/design-and-development-of-a-privacypreserving-offline-ai-assistant
[5] IJRASET. Building an Offline Virtual Voice Assistant using AI and NLP. Available: https://www.ijraset.com/best-journal/building-an-offline-virtual-voice-assistant-using-ai-and-nlp
[6] IJSAT. Enhancing User Autonomy and Privacy: An Offline Virtual Voice Assistant. Available: https://www.ijsat.org/research-paper.php?id=9830
[7] SciTePress. Offline Speech Recognition Development Systematic Review. Available: https://www.scitepress.org/papers/2018/67880/67880.pdf
[8] IJIRT. An AI-Based Visual and Voice-Controlled Inventory System. Available: https://ijirt.org/Article?manuscript=180458
[9] IJIRT. Visual and Voice-Assisted Inventory Automation. Available: https://ijirt.org/publishedpaper/IJIRT188367_Paper.pdf
[10] Polsri e-prints. Inventory Management System using AI Voice. Available: http://eprints.polsri.ac.id/22452/
[11] IJERT. Voice-Based Indexing System in Warehouse Management. Available: https://www.ijert.org/voice-based-indexing-system-in-warehouse-management
[12] IJRASET. An Intelligent Smart Retail System for Voice-Guided Stock Monitoring. Available: https://www.ijraset.com/best-journal/an-intelligent-smart-retail-system
[13] arXiv. Dukawalla: Voice Interfaces for Small Businesses in Africa. Available: https://arxiv.org/pdf/2505.05170.pdf
[14] Engineering for Change. AI Voice Assistant for a Textile Marketplace App in Rural India. Available: https://www.engineeringforchange.org/projects/voice-recognition-feature-for-rural-textile-producers/
[15] PwC India. Voice First Research Insights Report. Available: https://www.pwc.in/assets/pdfs/research-insights/2019/voice-first.pdf
[16] IJRPR. Voice-Recognition-based Money Transaction System Survey, Vol. 4, Issue 4. Available: https://ijrpr.com/uploads/V4ISSUE4/IJRPR11454.pdf
[17] ThinkDebug. From Swiping to Speaking: Why Voice-First Apps Are the Future. Available: https://thinkdebug.com/from-swiping-to-speaking-why-voice-first-apps-are-the-future-in-2025/
[18] Vosk. Offline Speech Recognition API for Python, Android, and More. Available: https://alphacephei.com/vosk/
[19] FastAPI. Modern, Fast Web Framework for Building APIs with Python. Available: https://fastapi.tiangolo.com/
[20] Vyapar App. Business Billing and Accounting Software for SMBs. Available: https://vyaparapp.in/
[21] VoiceInvoicer. Voice-Driven Invoice Creation App. Available: https://apps.apple.com/
[22] B. Henkels, C. D. Schultz, A. De Keyser, and D. Mahr, “The sound of progress: AI voice agents in service,” J. Service Manage., vol. 37, no. 1, pp. 132, Feb. 2026, doi: 10.1108/JOSM-06-2025-0269.
[23] “Does your voice assistant remember? Analyzing conversational context recall and utilization in voice interaction models,” arXiv preprint arXiv:2502.19759, 2025.
[24] Y. Mei, Y. Zheng, D. Xu, and Y. Long, “SHNU multilingual conversational speech recognition system for INTERSPEECH 2025 MLC-SLM Challenge,” in Proc. Interspeech 2025, pp. 15, Jul. 2025, arXiv preprint arXiv:2507.03343.
[25] “NTU Speechlab LLM-based multilingual ASR system for Interspeech MLC-SLM Challenge 2025,” arXiv preprint arXiv:2506.13339, Jun. 2025.
[26] T. Alumäe and A. Fedorchenko, “TalTech systems for the Interspeech 2025 ML-SUPERB 2.0 Challenge,” in Proc. Interspeech 2025, pp. 1-4, Jun. 2025, arXiv preprint arXiv:2506.01458.
[27] “The multimodal information based speech processing (MISP) 2025 Challenge: Audio-visual diarization and recognition,” in Proc. Interspeech 2025, pp. 15, May 2025, arXiv preprint arXiv:2505.13971.
[28] “CompanionCast: A multi-agent conversational AI framework with spatial audio for social co-viewing experiences,” arXiv preprint arXiv:2512.10918, Dec. 2025.
[29] “Enhancing speech emotion recognition with graph-based multimodal fusion and prosodic features for the Speech Emotion Recognition in Naturalistic Conditions Challenge at Interspeech 2025,” in Proc. Interspeech 2025, pp. 15, Jun. 2025, arXiv preprint arXiv:2506.02088.
[30] “TriageSim: A conversational emergency triage simulation framework from structured electronic health records,” arXiv preprint arXiv:2603.10035, 2025.
[31] “RelayS2S: A dual-path speculative generation for real-time dialogue,” arXiv preprint arXiv:2603.23346, 2025.