šŸ†
Global Research Authority
Serving Researchers Since 2012

AVA: AI Veterinary Assistance – An NLP and Semantic Vector-Based Clinical Decision Support System for Animal Healthcare

DOI : https://doi.org/10.5281/zenodo.19369783


Mr. Alen Denny

Department of Computer Science & Engineering Federal Institute of Science and Technology Angamaly, India

Mr. Christin V S

Department of Computer Science & Engineering Federal Institute of Science and Technology Angamaly, India

Mr. Basil Paul

Department of Computer Science & Engineering Federal Institute of Science and Technology Angamaly, India

Ms. Sheelu Susan Mathews

Assistant Professor (Guide) Department of Computer Science & Engineering Federal Institute of Science and Technology Angamaly, India

Mr. Christo Tomy Joseph

Department of Computer Science & Engineering Federal Institute of Science and Technology Angamaly, India

Abstract: Quick and correct veterinary assessment is a major problem for pet owners, who often find it hard to tell the difference between minor health issues and dangerous emergencies. Traditional symptom checking methods depend on hospital-based clinical visits, which take a lot of time, need specific institutions, and are not available outside working hours. This paper presents AVA (AI Veterinary Assistance), a smart, NLP-based clinical decision support system for basic veterinary assessment. The system handles unstructured natural language symptom descriptions using a two-part prediction design that includes a Lexical Heuristic Matcher and a Semantic Vector Engine built on the all-MiniLM-L6-v2 SentenceTransformer model. AVA extracts structured patient profiles from free-form text, links symptoms to a carefully collected MongoDB disease database of over 205 conditions, creates relevant follow-up questions, and provides ranked possible diagnoses with confidence scores and urgency levels. Testing results show a macro-average AUC of 0.988 and strong disease classification performance across multiple veterinary categories. The system is built as an interactive Streamlit web application with multi-language support, voice input through Whisper ASR, and optional skin lesion image analysis. AVA offers a scalable, easy-to-use, and transparent AI-powered framework for helping pet owners and veterinary professionals in basic clinical assessment.

Keywords: Artificial Intelligence; Natural Language Processing; Veterinary Decision Support; Semantic Embeddings; Clinical Triage; Disease Prediction; SentenceTransformers; Streamlit.

  1. INTRODUCTION

    Veterinary care depends heavily on quick and correct identification of clinical signs. Pet owners are increasingly the first to assess animal health, yet they lack the clinical training to tell the difference between conditions that need immediate emergency attention and those that can be handled with home care. Delays or misinterpretation in this initial assessment phase can seriously worsen patient outcomes [1].

    The fast growth of AI and natural language processing (NLP) in medical and clinical fields has created new opportunities for intelligent decision support systems. While big improvements have been made in human healthcare applications [2, 7], veterinary medicine has received much less attention in clinical AI research. Existing tools for pet owners either depend on very basic keyword matching or provide general, often alarming information drawn from common search engines, neither of which is adequate for organized clinical assessment.

    This paper introduces AVA (AI Veterinary Assistance), a specialized clinical decision support system made to fill this gap. AVA takes unstructured text or voice descriptions of animal symptoms, extracts structured clinical profiles using a multi-step NLP pipeline, and provides ranked possible diagnoses with urgency levels and specific follow-up questions. The system combines a rule-based Lexical Heuristic Engine with a dense Semantic Vector Engine, achieving strong performance even when user descriptions use informal or unclear medical language.

    The main contributions of this work are:

    1. A two-part disease prediction design combining heuristic and semantic approaches;
    2. A structured NLP extraction pipeline for veterinary clinical attributes from free-form text;
    3. An adaptive follow-up question generation module that improves diagnostic confidence;
    4. A carefully collected MongoDB-backed veterinary disease database covering 205+ conditions;
    5. A ready-to-use, multi-language web interface with voice and image input support;
    6. Experimental testing showing macro-average AUC of 0.988 on test data.

      The rest of this paper is organized as follows: Section II gives background and related work; Section III describes the system methodology; Section IV details the system architecture and modules; Section V presents algorithms; Section VI covers implementation; Section VII reports experimental results; Section VIII discusses challenges and limitations; Section IX concludes with future research directions.

  2. BACKGROUND AND RELATED WORK
      1. AI in Clinical Decision Support

        Clinical decision support systems (CDSS) have changed substantially with improvements in machine learning and NLP. Early rule-based expert systems have been replaced by data-driven models that can handle unstructured clinical text. Large language models (LLMs) such as BERT [7] and GPT-4o [6] have shown strong performance on clinical information extraction tasks, and few-shot prompting methods have further improved structured extraction from limited medical records [2].

        Image-based diagnostic AI has also grown, with deep learning models achieving high accuracy in radiology [3] and histopathology tasks. Audio-based diagnostic tools, such as lung sound classifiers using one-dimensional convolutional neural networks (1D-CNNs), have achieved over 90% accuracy in respiratory condition detection [4]. Disease onset prediction models using electronic health records have achieved around 85% reliability in population health management settings [5].

      2. Veterinary AI

        Veterinary AI research has focused mainly on image-based diagnostics, with deep learning frameworks achieving F1 scores of up to 0.88 in multi-class animal disease classification tasks [1]. However, NLP-based veterinary assessment systems that accept free-form natural language input remain rare. The challenges specific to veterinary NLP include the variety of species-specific symptom vocabulary, informal owner descriptions, and the lack of large labelled datasets similar to those available in human medicine.

      3. Semantic Embedding-Based Retrieval

    Dense vector retrieval using sentence-level embeddings has proven useful for clinical information retrieval tasks. The all-MiniLM-L6-v2 model from the SentenceTransformers library produces 384-dimensional embeddings that balance computational efficiency with strong semantic similarity performance [8]. Cosine similarity over pre-computed disease embeddings allows real-time matching even on basic hardware, making it suitable for deployment in settings with limited computational resources.
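The retrieval step described above reduces to a normalized dot product over stored vectors. The sketch below illustrates the ranking logic with random NumPy vectors standing in for all-MiniLM-L6-v2 embeddings (loading the actual model is omitted to keep the example self-contained):

```python
import numpy as np

def cosine_rank(query_vec, disease_vecs):
    """Rank stored disease vectors by cosine similarity to a query embedding."""
    q = query_vec / np.linalg.norm(query_vec)
    d = disease_vecs / np.linalg.norm(disease_vecs, axis=1, keepdims=True)
    sims = d @ q                   # one cosine similarity per disease
    order = np.argsort(-sims)      # indices, highest similarity first
    return order, sims[order]

# Random 384-dimensional stand-ins for all-MiniLM-L6-v2 embeddings
rng = np.random.default_rng(0)
disease_vecs = rng.normal(size=(5, 384))
query = disease_vecs[2] + 0.1 * rng.normal(size=384)   # query near disease 2
order, scores = cosine_rank(query, disease_vecs)
print(order[0])   # disease 2 ranks first
```

Because the disease vectors are pre-computed once, each live query costs only one matrix-vector product, which is what makes real-time matching feasible on basic hardware.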

  3. METHODOLOGY

    AVA uses a modular, pipeline-based methodology designed to ensure clear separation between input processing, clinical reasoning, and output generation.

    1. Literature and Knowledge Base Construction

      Veterinary disease profiles were collected from peer-reviewed veterinary references, clinical databases, and domain expert consultation. Each disease record includes species compatibility flags, symptom lists, severity classification, treatment recommendations, and prevention guidelines. The resulting MongoDB collection contains 205+ disease entries covering gastrointestinal, respiratory, dermatological, neurological, urinary, and systemic categories across dogs, cats, and bovine species.
    2. NLP Extraction Pipeline

      Raw user input (text or transcribed audio) is processed through a multi-step NLP pipeline built using spaCy and NLTK. The pipeline performs tokenization, lemmatization, and pattern-based extraction across 30+ veterinary symptom categories. Patient demographic attributes (species, age, breed, weight) are extracted using specific regular expressions. Negation detection and contextual filtering are applied to reduce wrong symptom assignments.
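A toy illustration of the kind of regex-based attribute extraction and negation filtering described above; the patterns, symptom vocabulary, and 20-character negation window here are illustrative assumptions, not AVA's production pattern library:

```python
import re

SYMPTOM_TERMS = {"vomiting", "diarrhea", "coughing", "lethargy"}  # tiny illustrative subset

def extract_profile(text):
    """Hypothetical sketch of demographic + symptom extraction with negation filtering."""
    text = text.lower()
    profile = {}
    # Demographics via simple patterns (species, age in years)
    species = re.search(r"\b(dog|cat|cow|puppy|kitten)\b", text)
    age = re.search(r"(\d+(?:\.\d+)?)\s*(?:year|yr)s?[- ]old", text)
    if species:
        profile["species"] = {"puppy": "dog", "kitten": "cat"}.get(species.group(1), species.group(1))
    if age:
        profile["age_years"] = float(age.group(1))
    # Symptom matching with naive window-based negation filtering
    symptoms = []
    for term in SYMPTOM_TERMS:
        for m in re.finditer(rf"\b{term}\b", text):
            window = text[max(0, m.start() - 20):m.start()]
            if not re.search(r"\b(no|not|without|denies)\b", window):
                symptoms.append(term)
    profile["symptoms"] = sorted(set(symptoms))
    return profile

print(extract_profile("My 3 year old dog has vomiting and lethargy but no diarrhea"))
```

Note how "no diarrhea" is dropped by the negation window while "vomiting" and "lethargy" survive; the production pipeline layers spaCy lemmatization on top of patterns like these.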

    3. Dual-Engine Prediction

      AVA uses two complementary prediction engines working in parallel:

      Lexical Heuristic Engine: Calculates a baseline confidence score for each candidate disease by computing the ratio of matched symptoms to total known disease symptoms. Categorical boosters (+0.03 per matched category) and species filters are applied to adjust scores.

      Semantic Vector Engine: Encodes the patient symptom profile as a 384-dimensional embedding using all-MiniLM-L6-v2. Cosine similarity is calculated against pre-embedded disease vectors stored in MongoDB. A combined score mixing semantic similarity (75% weight) and lexical overlap (25% weight) is calculated and normalized.
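The fusion step can be sketched directly from the stated weights (75% semantic, 25% lexical, +0.03 per matched category); everything else below, including the capping at 1.0, is an assumption:

```python
def fused_score(semantic_sim, patient_symptoms, disease_symptoms,
                matched_categories=0, w_sem=0.75, w_lex=0.25, booster=0.03):
    """Sketch of AVA's dual-engine fusion using the weights given in the paper."""
    matched = set(patient_symptoms) & set(disease_symptoms)
    lexical = len(matched) / len(disease_symptoms) if disease_symptoms else 0.0
    score = w_sem * semantic_sim + w_lex * lexical + booster * matched_categories
    return min(score, 1.0)   # capping is an assumption, not stated in the paper

s = fused_score(0.8, ["vomiting", "lethargy"],
                ["vomiting", "diarrhea", "lethargy", "fever"], matched_categories=1)
print(round(s, 4))   # 0.75*0.8 + 0.25*0.5 + 0.03 = 0.755
```

The design intent is that the lexical term rewards exact symptom overlap while the semantic term generalizes over informal phrasing; the 3:1 weighting favors the semantic engine.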

    4. Adaptive Follow-Up Generation

      The top-ranked candidate diseases are passed to the Follow-Up Question Generator, which finds missing information in the initial patient description and creates 6–8 targeted clarifying questions organized across three categories: Symptom Details, Medical History, and Lifestyle. Questions are prioritized based on the severity of the leading possible diagnosis.
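A minimal sketch of severity-aware question selection; the question bank, the rule of leading with Symptom Details for severe cases, and the question limit are all illustrative assumptions rather than AVA's exact logic:

```python
# Hypothetical question bank keyed by the paper's three categories;
# the real AVA content lives in MongoDB
QUESTION_BANK = {
    "Symptom Details": ["How long has the symptom lasted?", "Is it getting worse?"],
    "Medical History": ["Any prior illnesses?", "Is the animal vaccinated?"],
    "Lifestyle": ["Any recent diet change?", "Indoor or outdoor animal?"],
}

def generate_followups(top_severity, answered, limit=6):
    """Pick unanswered questions, leading with Symptom Details for severe cases."""
    categories = list(QUESTION_BANK)
    if top_severity == "severe":
        categories.sort(key=lambda c: c != "Symptom Details")  # stable: SD first
    out = []
    for cat in categories:
        for q in QUESTION_BANK[cat]:
            if q not in answered and len(out) < limit:
                out.append((cat, q))
    return out

qs = generate_followups("severe", answered={"Any prior illnesses?"})
print(qs[0])
```

Answers collected this way feed back into the confidence update cycle described in Section 5.4.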

    5. Evaluation Protocol

      The system was tested on a held-out dataset of veterinary case descriptions labeled with ground-truth disease categories and severity levels. Standard classification metrics were calculated: accuracy, weighted F1-score, and multiclass area under the ROC curve (AUC). Bayesian hyperparameter optimization was applied to calibrate the confidence threshold (best C = 0.003594).
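The reported metrics can be computed with scikit-learn's standard API. The labels and scores below are toy stand-ins chosen for illustration, not AVA's actual outputs:

```python
import numpy as np
from sklearn.metrics import roc_auc_score, f1_score, accuracy_score

# Toy stand-in data: 3 disease categories, per-class probabilities (rows sum to 1)
y_true = np.array([0, 0, 1, 1, 2, 2])
y_score = np.array([
    [0.8, 0.1, 0.1],
    [0.6, 0.3, 0.1],
    [0.2, 0.7, 0.1],
    [0.3, 0.5, 0.2],
    [0.1, 0.2, 0.7],
    [0.2, 0.2, 0.6],
])
# One-vs-rest macro AUC, as in the paper's evaluation protocol
macro_auc = roc_auc_score(y_true, y_score, multi_class="ovr", average="macro")
y_pred = y_score.argmax(axis=1)
acc = accuracy_score(y_true, y_pred)
wf1 = f1_score(y_true, y_pred, average="weighted")
print(macro_auc, acc, wf1)
```

On this toy data every positive outranks every negative in each one-vs-rest split, so all three metrics are 1.0; real held-out data yields the values reported in Section 8.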

  4. SYSTEM ARCHITECTURE

    Figure 1: Overall System Architecture of AVA

    The AVA system architecture is organized into three main processing stages, as shown in Figure 1.

      1. Preprocessing Stage

        The preprocessing stage takes raw multi-modal input (text, audio, or image). Text is cleaned and normalized; audio is transcribed using the Whisper ASR model; uploaded skin images undergo quality validation. Demographic attributes (species, age, breed, weight) are extracted and structured for further processing.

      2. Analysis and Processing Stage

        The main processing stage includes five sequential components:

        • VeterinaryNLPAnalyzer: Rule-based symptom extraction over 30+ categories.
        • MongoDiseaseRepository: Disease matching and ranking against the collected database.
        • Lexical Prediction: Heuristic scoring and categorical confidence boosters.
        • Semantic Vector Engine: SentenceTransformer-based embedding similarity computation.
        • Confidence Calculation: Bayesian score fusion and urgency classification.
      3. Validation and Output Stage

    The validation stage applies score filtering, creates the adaptive follow-up question set, and puts together the final structured JSON clinical report containing ranked disease candidates, confidence scores, urgency assessment, and treatment recommendations for display on the Streamlit interface.
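The report assembly step might look like the sketch below; the field names and the rule of taking urgency from the top-ranked candidate are illustrative assumptions, not the exact production schema:

```python
import json

def build_report(candidates, followups, patient):
    """Sketch of the structured clinical JSON report returned to the Streamlit UI."""
    ranked = sorted(candidates, key=lambda c: c["confidence"], reverse=True)
    urgency = ranked[0]["severity"] if ranked else "unknown"  # assumption
    return json.dumps({
        "patient": patient,
        "diagnoses": ranked,
        "urgency": urgency,
        "follow_up_questions": followups,
    }, indent=2)

report = build_report(
    [{"disease": "Gastroenteritis", "confidence": 0.72, "severity": "moderate"},
     {"disease": "Parvovirus", "confidence": 0.55, "severity": "severe"}],
    ["How long has vomiting lasted?"],
    {"species": "dog", "age_years": 3},
)
print(json.loads(report)["urgency"])   # moderate: top-ranked candidate's severity
```

Emitting a single JSON document keeps the output stage decoupled from the UI layer, so the same report could feed a different frontend without touching the pipeline.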

  5. MODULE DESCRIPTIONS
    1. NLP Patient Analyzer Module

      The NLP Patient Analyzer handles unstructured text describing animal health issues. It extracts demographic variables (animal type, age, breed, weight) using named-entity recognition patterns and regular expressions. Symptom extraction uses deep pattern matching across 30+ veterinary symptom categories including gastrointestinal, respiratory, dermatological, neurological, and urinary signs. Complex symptom attributes (duration, severity, and frequency) are captured through contextual phrase analysis, and surrounding context is filtered to resolve unclear clinical signs.

    2. Lexical Prediction Module

      The Lexical Heuristic Matcher serves as the main fast-scoring engine. It queries the MongoDB disease collection filtered by the target species and calculates a baseline confidence score as the ratio of matched to total known disease symptoms. Categorical confidence boosters are applied for symptom-category alignment (e.g., +0.03 when a urination abnormality maps to the urinary category). Conditions falling below a pruning threshold are removed, and the remaining candidates are returned with calculated heuristic confidence values.
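The matcher's logic can be sketched over an in-memory list standing in for the MongoDB query result. The symptom-category map, the disease records, and the 0.2 pruning threshold are illustrative assumptions; only the ratio baseline, the +0.03 booster, and the species filter come from the paper:

```python
SYMPTOM_CATEGORY = {"vomiting": "gastrointestinal", "diarrhea": "gastrointestinal",
                    "coughing": "respiratory"}  # illustrative subset

def lexical_scores(species, symptoms, diseases, booster=0.03, threshold=0.2):
    """Sketch of the Lexical Heuristic Matcher: species filter, matched-symptom
    ratio baseline, categorical booster, and pruning."""
    ranked = []
    for d in diseases:
        if species not in d["species"]:
            continue  # species filter
        matched = set(symptoms) & set(d["symptoms"])
        score = len(matched) / len(d["symptoms"])  # baseline ratio
        if any(SYMPTOM_CATEGORY.get(s) == d["category"] for s in matched):
            score += booster  # categorical confidence booster
        if score >= threshold:
            ranked.append((d["name"], round(score, 3)))
    return sorted(ranked, key=lambda x: -x[1])

diseases = [
    {"name": "Gastroenteritis", "species": ["dog", "cat"],
     "symptoms": ["vomiting", "diarrhea"], "category": "gastrointestinal"},
    {"name": "Kennel cough", "species": ["dog"],
     "symptoms": ["coughing"], "category": "respiratory"},
]
print(lexical_scores("dog", ["vomiting", "diarrhea"], diseases))
```

Here Kennel cough matches no symptoms and falls below the threshold, so only Gastroenteritis survives pruning.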

    3. Semantic Vector Engine Module

      The Semantic Vector Engine works as a separate micro-service. During offline pre-computation, disease text profiles (name, description, symptom list) are encoded into 384-dimensional dense vectors using all-MiniLM-L6-v2 and stored in the MongoDB diseases vector collection. During inference, the patient symptom text is encoded using the same model, cosine similarity is calculated against all stored disease vectors, scores are adjusted by a 25% lexical overlap component, and a normalized semantic ranking matrix is returned. A smooth fallback to the lexical engine is provided if the vector service is unavailable.

    4. Follow-Up Question Generator Module

    The Follow-Up Question Generator finds missing clinical details from the initial patient description and intelligently produces 6–8 contextual follow-up questions. Questions are organized into three categories (Symptom Details, Medical History, and Lifestyle) and are prioritized based on the severity of the top-ranked possible diagnosis. Answers to follow-up questions are fed into a refined confidence update cycle, greatly improving overall clinical report accuracy.

  6. ALGORITHMS
    1. Main System Workflow
      1. Accept raw unstructured text or audio input.
      2. Pass input to the NLP Analyzer to clean and extract patient attributes and symptoms.
      3. Run base heuristic scoring through the Lexical Prediction Engine.
      4. Perform semantic matching via the Semantic Vector Engine.
      5. Receive confidence scores and ranked candidate diseases.
      6. Send leading disease candidates to the Follow-Up Question Generator.
      7. Compile and output the structured clinical JSON report to the Streamlit UI.
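The seven steps above compose as a single pipeline function. The sketch below uses hard-coded stubs for each module (the scores, disease names, and extraction logic are all placeholders) purely to show the control flow and the 75/25 fusion at step 5:

```python
def run_pipeline(raw_text):
    """Stub orchestration of the workflow; each inline stub stands in for a module."""
    profile = {"species": "dog",
               "symptoms": raw_text.lower().split(", ")}        # step 2 stub (NLP Analyzer)
    lexical = {"Gastroenteritis": 0.5}                          # step 3 stub (Lexical Engine)
    semantic = {"Gastroenteritis": 0.8}                         # step 4 stub (Vector Engine)
    fused = {d: 0.75 * semantic.get(d, 0.0) + 0.25 * s          # step 5: score fusion
             for d, s in lexical.items()}
    top = max(fused, key=fused.get)
    followups = [f"How long has the {profile['symptoms'][0]} lasted?"]  # step 6 stub
    return {"top_disease": top,                                 # step 7: structured report
            "confidence": round(fused[top], 3),
            "follow_up": followups}

print(run_pipeline("vomiting, lethargy"))
```

The fused confidence here is 0.75·0.8 + 0.25·0.5 = 0.725; in the real system each stub is one of the modules described in Sections 4 and 5.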
    2. Lexical Heuristic Scoring
      1. Get the target species and the NLP-extracted symptom set.
      2. Query the MongoDB diseases collection for the matching animal profile.
      3. Calculate the baseline score: (matched symptoms) / (total disease symptoms).
      4. Apply categorical confidence boosters per symptom-category alignment.
      5. Remove conditions that fail the confidence threshold.
      6. Return a ranked array of conditions with heuristic confidence values.
    3. Semantic Vector Engine

      Offline pre-computation:

      1. Extract text profiles (name, description, symptoms) for each disease.
      2. Encode into 384-dimensional vectors via all-MiniLM-L6-v2.
      3. Store vector arrays into the diseases vector MongoDB collection.

      Real-time prediction:

      1. Receive live patient symptom text.
      2. Transform text using the same SentenceTransformer model.
      3. Calculate cosine similarity against all stored disease vectors.
      4. Adjust scores via localized lexical overlap (25% weight).
      5. Return normalized semantic ranking matrix.
  7. IMPLEMENTATION DETAILS

    A. Technology Stack

    Table 1: AVA Technology Stack

    Component | Details
    Language | Python 3
    Frontend | Streamlit, HTML/CSS
    NLP Frameworks | spaCy, NLTK
    Embedding Model | SentenceTransformers (all-MiniLM-L6-v2)
    Vector Database | MongoDB (diseases vector collection)
    Auth Database | SQLite
    Speech Input | OpenAI Whisper ASR
    Image Analysis | CNN-based skin lesion classifier

      1. Backend Architecture

        The backend is built in Python 3 using object-oriented principles that maintain strict separation between NLP extraction, database interaction, and prediction logic. The VeterinaryNLPAnalyzer, MongoDiseaseRepository, LexicalPredictor, and SemanticVectorEngine modules are independently created and combined within the main analysis pipeline.

      2. Data Management

        A mixed database strategy is used. MongoDB Atlas stores unstructured disease JSON documents, multi-dimensional vector arrays, and analysis history records. SQLite manages structured user authentication and session data. Disease embeddings are pre-computed offline and stored as native BSON arrays, allowing sub-second similarity retrieval during inference.

      3. Frontend Interface

        The Streamlit web application provides a responsive, stateful interface supporting English and Malayalam language modes. Users can enter symptoms as free-form text, record voice input (processed via Whisper ASR), and optionally upload skin images for dermatological analysis. Diagnostic results are shown as ranked disease cards with confidence meters, urgency badges, and structured treatment recommendations. Analysis history is saved per user account across sessions.

      4. Implementation Challenges

        Several practical challenges were faced during development. Variation in natural language symptom descriptions required extensive regex pattern libraries and negation-handling logic. Real-time semantic similarity computation was optimized by pre-computing and caching disease embeddings in MongoDB rather than recalculating them during inference. Multi-language support required integration of a translation layer to normalize non-English inputs before NLP processing. Whisper ASR integration added latency for voice inputs, partially fixed through chunked audio processing. Regulatory and privacy considerations led to the design decision to process all data locally without sending patient information to external APIs.

  8. PERFORMANCE EVALUATION
    1. Evaluation Metrics

      System performance was measured using standard multi-class classification metrics calculated on the held-out test set. Metrics include per-class precision, recall, weighted F1-score, accuracy, and multiclass ROC AUC (one-vs-rest). Confidence score calibration was done via Bayesian hyperparameter search over the regularization parameter C.

    2. Quantitative Results

      Table 2: Classification Performance on Test Set

      Metric | Value
      Macro-Average AUC | 0.988
      Micro-Average AUC | 0.984
      Weighted F1-Score (validation) | 0.75
      Accuracy (validation) | 0.75
      Best Regularization (C) | 0.003594
      Disease Database Coverage | 205+ conditions
      Symptom Categories | 30+

      Figure 2: Multiclass ROC Curve for AVA Disease Classification (Macro-avg AUC = 0.988, Micro-avg AUC = 0.984)

      The ROC curve analysis (Figure 2) shows strong per-category discrimination. Category-specific AUCs include: species-specific (1.000), bacterial (1.000), respiratory (1.000), viral (0.963), and skin (0.964). The macro-average AUC of 0.988 confirms strong discriminative power across all disease categories.

      The row-normalized confusion matrix (Figure 3) shows that the semantic vector engine achieves strongest performance on severe-category conditions (45% correctly classified as severe), with moderate conditions showing 43% correct classification. Mild conditions show distributed classification between moderate (50%) and severe (50%) bins, reflecting the natural difficulty of telling mild from moderate presentations in low-symptom descriptions. These results indicate that severity classification is the primary area for further improvement.

      Figure 3: Row-Normalized Confusion Matrix for Severity Classification

    3. Comparison with Baseline Approaches

    Table 3: Comparison of Prediction Approaches

    Approach | Description | AUC
    Keyword search | Simple term matching | –
    Lexical Heuristic only | Rule-based scoring | 0.85
    Semantic Vector only | Embedding similarity | 0.97
    AVA Dual-Engine | Hybrid fusion | 0.988

    Table 3 shows the advantage of the dual-engine hybrid approach over either component alone. The combination of heuristic precision with semantic generalization consistently outperforms single-engine baselines, confirming the core architectural decision.

  9. COMPARISON STUDY

    Table 4 summarizes related works in veterinary and clinical AI, showing the gap that AVA addresses through its NLP-based, dual-engine, multimodal approach.

  10. SOCIAL RELEVANCE AND SDGS

    AVA directly supports United Nations Sustainable Development Goal 3 (Good Health and Well-being) by encouraging timely animal health assessment and reducing the risk of condition worsening due to delayed treatment. The system makes preliminary veterinary guidance more accessible, particularly in rural and underserved areas where qualified veterinarians may not be easily available.

    The system also contributes to SDG 9 (Industry, Innovation and Infrastructure) by improving veterinary assessment through the integration of NLP and dense vector embeddings into a scalable, production-ready digital health infrastructure. The modular architecture supports extension to additional animal species, languages, and clinical domains.

    From a clinical workflow perspective, AVA provides veterinarians with structured, pre-organized patient histories that reduce consultation time and cognitive load, improving overall care quality and efficiency.

  11. CONCLUSION

This paper presented AVA, an AI-powered veterinary clinical decision support system that addresses the important need for accessible, accurate, and transparent preliminary assessment tools in veterinary medicine. The system combines a Lexical Heuristic Engine and a Semantic Vector Engine in a dual-prediction design, achieving a macro-average AUC of 0.988 on a collected veterinary disease dataset. The modular pipeline, adaptive follow-up question generation, multi-language interface, and voice/image input capabilities together provide a comprehensive and production-ready solution.

The study reviewed key design decisions and implementation challenges, showing important trade-offs between lexical precision and semantic generalization. While disease classification performance is strong, severity grading remains the main area for further improvement, indicating the need for larger labeled veterinary datasets and fine-tuned domain-specific language models.

Future research should focus on: (i) fine-tuning domain-specific veterinary language models on larger datasets; (ii) integrating telemedicine capabilities for real-time veterinarian consultation; (iii) expanding species coverage to include avian, equine, and aquatic animals; (iv) applying zero-knowledge or federated learning approaches for privacy-preserving multi-clinic deployment; and (v) conducting prospective clinical validation studies in real veterinary practice environments.

ACKNOWLEDGMENT

The authors express sincere gratitude to the Department of Computer Science and Engineering, Federal Institute of Science and Technology (FISAT), Angamaly, for providing the infrastructure and academic support necessary for this research. Special thanks to the faculty and peer reviewers whose constructive feedback strengthened the quality of this work.

REFERENCES

  1. Y.-G. Jin, G. Wu, J.-W. Seo, S.-J. Park, S.-H. Hur, D. Aliyeva, J.-H. Park, and K.-M. Kim, "AI Veterinary Assistance: Enhancing Clinical Decision-Making in Animal Healthcare," IEEE Access, 2025.
  2. S. Agrawal et al., "Large Language Models are Few-Shot Clinical Information Extractors," in Proc. EMNLP, 2022.

    Table 4: Comparison of Related Works

    Reference | Methodology | Advantage | Limitation
    Agrawal et al. [2] | Few-shot LLM clinical IE | High precision extraction | Low recall in sparse descriptions
    Mayats-Alpay [3] | Image-based deep learning | High diagnostic accuracy | No text/voice input
    Ali et al. [4] | 1D-CNN lung sound classification | 90% accuracy, lightweight | Single condition, single modality
    Jin et al. [1] | Deep learning veterinary framework | F1 = 0.88 | No NLP or symptom text input
    Grout et al. [5] | AI-based disease onset prediction | 85% reliability | Requires EHR data; not real-time
    AVA (Ours) | NLP + Semantic Vector dual-engine | AUC = 0.988, multilingual, voice/image | Severity classification requires improvement

  3. L. Mayats-Alpay, "Artificial Intelligence for Automatic Detection and Classification of Disease on X-Ray Images," arXiv preprint, 2022.
  4. A. S. W. Ali, M. M. Rashid, M. U. Yousuf, S. Shams et al., "Towards Clinical Decision Support via Lung Sound Classification Using 1D-CNN," Sensors, 2024.
  5. R. Grout, R. Gupta, R. Bryant, M. A. Elmahgoub et al., "Predicting Disease Onset from Electronic Health Records for Population Health Management," Frontiers in Artificial Intelligence, 2024.
  6. OpenAI, "GPT-4o Technical Overview and Large Language Model Applications," 2024.
  7. J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding," in Proc. NAACL-HLT, 2019.
  8. A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit et al., "Attention Is All You Need," in Advances in Neural Information Processing Systems (NeurIPS), 2017.
  9. S. Sharma et al., "Deep Learning-Based Diagnosis and Prognosis of Alzheimer's Disease: A Review," Medical AI Survey Study, 2022.
  10. Kim et al., "Deep Learning-Based Lung Cancer Diagnosis Using Respiratory Cytology Images," Clinical AI Study, 2023.