
MindEase AI: Emotional Support Assistant

DOI : 10.17577/IJERTCONV14IS040052


Praful Saxena

Assistant Professor (CSE-IOT) Moradabad Institute of Technology Moradabad, India shyam.praful@gmail.com

Atul Verma

Computer Science and Engineering (IoT) Moradabad Institute of Technology Moradabad, India

atulv9926@gmail.com

Prashant Raghav

Computer Science and Engineering (IoT) Moradabad Institute of Technology Moradabad, India prashantraghav876@gmail.com

Baljeet Singh

Computer Science and Engineering (IoT) Moradabad Institute of Technology Moradabad, India baljeet.singh.codes@gmail.com

Aryan

Computer Science and Engineering (IoT) Moradabad Institute of Technology Moradabad, India aryans9926@gmail.com

Abstract: In the contemporary digital landscape, psychological stability has evolved into a paramount global challenge, with an alarming escalation in stress, anxiety, and emotional exhaustion across academic and corporate demographics. Despite the ubiquity of these conditions, immediate emotional assistance remains scarce due to societal stigmas, limited awareness, and the prohibitive costs associated with clinical therapy. Existing technological interventions, such as mood-logging applications or rule-based chatbots, often depend heavily on self-reported data, which is frequently subjective and prone to inaccuracies.

To bridge this gap, this paper introduces MindEase AI, an advanced web-based platform engineered for the real-time detection and mitigation of emotional instability. Diverging from conventional unimodal systems, our approach leverages a multimodal fusion strategy by integrating Machine Learning (ML) with the Internet of Things (IoT). The system employs a browser-based Convolutional Neural Network (CNN) to scrutinize facial micro-expressions. Acknowledging that facial cues can be deceptively masked, we incorporate a hardware layer utilizing an ESP32 microcontroller paired with a MAX30102 sensor to monitor physiological biomarkers, specifically Heart Rate (BPM) and Blood Oxygen Saturation (SpO2).

By synthesizing visual cues with physiological ground truth, the system effectively distinguishes between genuine emotional states and concealed distress. Furthermore, moving beyond passive tracking, we integrated the Gemini API to generate context-aware, AI-driven wellness recommendations tailored to the user's immediate state. This paper details the hardware architecture, the software stack built on Next.js, and the experimental validation of the prototype. Our findings suggest that this hybrid methodology offers a privacy-centric, cost-efficient, and robust solution for early stress intervention.

Index Terms: Emotional Intelligence, Internet of Things (IoT), Facial Emotion Recognition, Mental Health, Physiological Sensing, Generative AI, ESP32, Next.js, MAX30102, Human-Computer Interaction.

  1. INTRODUCTION

    1. Background

      Psychological well-being is a cornerstone of holistic health, yet it remains one of the most underserved domains globally. Recent statistics from the World Health Organization (WHO) highlight depression and anxiety as primary contributors to global disability. The contemporary lifestyle, defined by rigorous academic standards and aggressive corporate objectives, has catalyzed the rise of "silent stress", a phenomenon where individuals maintain a facade of normalcy while enduring internal turmoil. Despite the availability of professional counseling, barriers such as financial constraints, societal hesitation, and a lack of self-awareness often prevent individuals from seeking timely assistance.

    2. Problem Statement

      Existing technological aids for mental health are predominantly bifurcated into two sectors: wearable trackers and mobile applications. Wearable technology excels at logging physiological metrics like heart rate but often lacks the contextual intelligence to differentiate between exercise-induced exertion and anxiety-induced palpitations. Conversely, mental wellness applications (e.g., Wysa, Woebot) largely rely on text-based interaction. A significant limitation of these platforms is their lack of emotional perception. They rely entirely on user input; thus, if a user claims to be "fine" despite exhibiting signs of acute distress, these systems fail to intervene accurately.

    3. Proposed Solution

      To address these limitations, we developed MindEase AI, an automated Emotional Support Assistant. The primary objective is to engineer a multimodal system that perceives the user through multiple sensory channels:

      1. Visual Analysis: Utilizing Computer Vision to decipher facial affect.

      2. Physiological Monitoring: Leveraging IoT sensors to track internal vital signs.

      3. Cognitive Interaction: Employing Generative AI to function as an empathetic companion.

        For the visual component, we implemented the Single Shot Multibox Detector (SSD) utilizing the MobileNet V1 architecture. **This specific architecture was selected for its high computational efficiency, enabling it to execute seamlessly on consumer-grade hardware via client-side browsers without necessitating server-side processing.** This architectural choice significantly enhances user privacy by ensuring video data remains local.

    4. Project Scope

    However, visual analysis alone is insufficient due to emotional masking, where individuals may smile despite feeling anxious. To mitigate this, we integrated an ESP32-based IoT node to capture real-time heart rate data, which is transmitted to the web interface via WebSocket. By correlating a "Neutral" facial expression with an elevated heart rate, the system can pinpoint hidden stress. Furthermore, rather than displaying static data charts, we utilized the Gemini API to construct human-like, actionable advice, transforming the system from a medical monitor into a digital wellness companion.

    The image processing pipeline involves resizing and normalization to standardize inputs. The input frame $I$ is converted into a tensor $I'$ via the function:

    $$I' = T(I), \quad \text{where } T(I) = \mathrm{Normalize}(\mathrm{CenterCrop}(\mathrm{Resize}(I))) \tag{1}$$

    Subsequently, $I'$ is fed into the CNN, yielding a prediction vector $P$ of class probabilities, defined in Eq. (2).
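    The composition in Eq. (1) can be illustrated with a minimal, dependency-free Python sketch. The paper does not state the resize method, the crop size, or the normalization constants, so nearest-neighbour resizing, the small sizes, and `mean = std = 0.5` below are all assumptions chosen only to keep the example readable; the helper names are hypothetical.

```python
from typing import List

Image = List[List[float]]  # grayscale image: rows of pixel values in [0, 255]


def resize(img: Image, height: int, width: int) -> Image:
    """Nearest-neighbour resize to (height, width); the method is an assumption."""
    src_h, src_w = len(img), len(img[0])
    return [
        [img[r * src_h // height][c * src_w // width] for c in range(width)]
        for r in range(height)
    ]


def center_crop(img: Image, size: int) -> Image:
    """Keep the central size x size window."""
    top = (len(img) - size) // 2
    left = (len(img[0]) - size) // 2
    return [row[left:left + size] for row in img[top:top + size]]


def normalize(img: Image, mean: float = 0.5, std: float = 0.5) -> Image:
    """Scale pixels to [0, 1], then standardise with assumed mean and std."""
    return [[(p / 255.0 - mean) / std for p in row] for row in img]


def transform(img: Image, resize_to: int = 8, crop_to: int = 6) -> Image:
    """T(I) = Normalize(CenterCrop(Resize(I))), applied innermost first."""
    return normalize(center_crop(resize(img, resize_to, resize_to), crop_to))


# A synthetic 12 x 16 frame stands in for a webcam image.
frame = [[float((r + c) % 256) for c in range(16)] for r in range(12)]
tensor = transform(frame)
print(len(tensor), len(tensor[0]))  # 6 6
```

    In a real pipeline the target size would match the detector's input shape rather than the toy values used here.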

  2. LITERATURE REVIEW

    Table I provides a comprehensive summary of the current landscape. Existing research by Li et al. (2019) utilizes Random Forest algorithms for plant disease, which serves as an analogous domain to our human health monitoring. Just as plant leaf analysis requires visual inspection, human emotion analysis requires Facial Emotion Recognition (FER). However, the literature reveals a gap: most systems are unimodal. They either look at the face OR the heart rate. MindEase AI proposes a multimodal fusion approach, theorizing that $\mathrm{Accuracy}_{\mathrm{multimodal}} > \mathrm{Accuracy}_{\mathrm{unimodal}}$.

  3. SYSTEM ARCHITECTURE AND HARDWARE

    The MindEase AI system is designed as a distributed architecture comprising an Edge Node (IoT), a Client Application (Browser), and a Cloud Intelligence layer (Generative AI).

    1. Hardware Layer: The Edge Node

      The hardware component is responsible for acquiring physiological data. It is built around the ESP32 SoC, chosen for its cost-efficiency and dual-core architecture.

      1. ESP32 Microcontroller:

        • Processor: Xtensa® Dual-Core 32-bit LX6.

        • Clock Speed: Up to 240 MHz.

        • Connectivity: 2.4 GHz Wi-Fi and Bluetooth Low Energy (BLE).

        • Role: It acts as the I2C master for the sensor and the WebSocket client for the web app.

      2. MAX30102 Sensor:

        • Function: Integrated Pulse Oximetry and Heart-Rate Monitor Module.

        • Mechanism: It utilizes two LEDs: a Red LED (660 nm) and an Infrared LED (880 nm). Oxygenated hemoglobin absorbs more IR light, while deoxygenated hemoglobin absorbs more Red light.

        • Calculation: The Ratio of Ratios (R) is calculated to determine SpO2:

          $$R = \frac{AC_{red}/DC_{red}}{AC_{ir}/DC_{ir}} \tag{4}$$

          SpO2 is then derived via an empirically calibrated quadratic regression:

          $$\mathrm{SpO_2} = -45.060\,R^2 + 30.354\,R + 94.845 \tag{5}$$

          The CNN's prediction vector P represents probabilities for K emotion classes:

          $$P = [p_1, p_2, \ldots, p_K], \quad \text{where } \sum_{i=1}^{K} p_i = 1 \tag{2}$$

          The final predicted class C is derived from the index j holding the maximum probability:

          $$C = \arg\max_j \, p_j \tag{3}$$
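          As a worked illustration of the Ratio of Ratios and the SpO2 regression in Eqs. (4) and (5), the Python sketch below evaluates both for one set of made-up AC/DC amplitudes. It takes the leading coefficient as negative (-45.060), which keeps the result in a physiological range for typical R values; that sign, the clamping to 0..100 %, and the sample amplitudes are assumptions of this sketch, not measurements from the prototype.

```python
def ratio_of_ratios(ac_red: float, dc_red: float, ac_ir: float, dc_ir: float) -> float:
    """Eq. (4): R = (AC_red / DC_red) / (AC_ir / DC_ir)."""
    return (ac_red / dc_red) / (ac_ir / dc_ir)


def spo2_from_ratio(r: float) -> float:
    """Eq. (5) with a negative quadratic term; clamping is an added assumption."""
    spo2 = -45.060 * r * r + 30.354 * r + 94.845
    return max(0.0, min(100.0, spo2))


# Illustrative amplitudes only (arbitrary ADC units).
r = ratio_of_ratios(ac_red=500.0, dc_red=100_000.0, ac_ir=1_000.0, dc_ir=100_000.0)
spo2 = spo2_from_ratio(r)
print(round(r, 3), round(spo2, 2))  # 0.5 98.76
```

          With R = 0.5 the three terms are -11.265, 15.177, and 94.845, which sum to about 98.76 %.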

          Beyond visual recognition, MindEase AI incorporates a Physiological Sensing Module to track BPM and SpO2 trends. This enables the detection of physiological arousal even when visual cues are suppressed. Additionally, the Wellness Recommendation Module assesses the mental nutrient requirements of the user, offering personalized strategies to regulate emotional health, thereby reducing the dependency on reactive medical treatments.

    2. Software Layer: The Application Stack

    The application is built on the MERN stack principles but optimized with Next.js for server-side rendering.

    • Frontend: Next.js (React framework) provides the UI. It renders the webcam stream and overlays the canvas for facial landmark drawing.

    • Database: MongoDB (NoSQL) is used via Prisma ORM. This allows for flexible schema design, essential for storing unstructured session logs.

    • AI Models:

    TABLE I
    COMPARATIVE ANALYSIS OF EXISTING MENTAL HEALTH TECHNOLOGIES

    | Study/System | Technique Used | Benefits | Limitations |
    |---|---|---|---|
    | Zhang et al. (2020) | Convolutional Neural Networks (CNNs) for FER | High accuracy in complex image pattern recognition and micro-expression detection. | Computationally expensive; requires significant GPU power; privacy concerns with cloud processing. |
    | Wysa / Woebot | NLP-based Chatbots | Accessible 24/7; uses CBT techniques effectively for conversational therapy. | Lacks "eyes" and sensors; relies entirely on user self-reporting, which can be unreliable. |
    | Kumar et al. (2018) | SVM for Stress Detection | Effective with smaller datasets; good for binary classification (Stressed vs Not Stressed). | Limited scalability for multi-class emotion problems; less effective with high-dimensional image data. |
    | Fitbit / Apple Watch | Photoplethysmography (PPG) | Excellent tracking of heart rate and sleep patterns. | Lacks context; cannot distinguish between excitement (good stress) and anxiety (bad stress) without visual cues. |
    | MindEase AI (Proposed) | Hybrid: CNN + IoT + Generative LLM | Multimodal fusion provides ground truth; Privacy-first (Edge AI); Generative advice. | Requires custom hardware prototype; dependent on lighting conditions for camera accuracy. |

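    Fig. 1 wiring (ESP-32 Dev Module, WROOM-32, to MAX30102):

    | Signal | ESP32 pin | MAX30102 pin |
    |---|---|---|
    | Power | 3V3 | VIN |
    | Ground | GND | GND |
    | Data | D21 | SDA |
    | Clock | D22 | SCL |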

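    Depthwise separable convolutions split a standard convolution into two layers: a depthwise convolution for filtering and a pointwise convolution for combining. This reduces the computation cost to a fraction $\frac{1}{N} + \frac{1}{D_K^2}$ of the standard convolution, where $N$ is the number of output channels and $D_K$ is the kernel size, making it suitable for web browsers.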
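    To make the cost reduction concrete, the following Python sketch counts multiply-accumulate operations for a standard convolution and for its depthwise separable equivalent, then compares the ratio with 1/N + 1/D_K^2. The layer dimensions (3 × 3 kernel, 32 input channels, 64 output channels, 112 × 112 feature map) are illustrative assumptions, not values reported for the MindEase model.

```python
def standard_conv_cost(dk: int, m: int, n: int, df: int) -> int:
    """Multiply-adds of a standard convolution: D_K^2 * M * N * D_F^2."""
    return dk * dk * m * n * df * df


def depthwise_separable_cost(dk: int, m: int, n: int, df: int) -> int:
    """Depthwise filtering (D_K^2 * M * D_F^2) plus pointwise combining (M * N * D_F^2)."""
    depthwise = dk * dk * m * df * df
    pointwise = m * n * df * df
    return depthwise + pointwise


# Assumed layer shape: kernel 3, 32 input channels, 64 output channels, 112 x 112 map.
dk, m, n, df = 3, 32, 64, 112
ratio = depthwise_separable_cost(dk, m, n, df) / standard_conv_cost(dk, m, n, df)
print(round(ratio, 4))  # 0.1267
```

    For this layer the separable form needs roughly one eighth of the arithmetic, which matches 1/64 + 1/9.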

    Fig. 1. Hardware Interfacing: ESP32 Controller (Left) connected to MAX30102 Biosensor (Right).

    • Vision: face-api.js running on TensorFlow.js (WebGL backend).

    • Text: Google Gemini 1.5 Flash API for recommendation generation.

  4. METHODOLOGY

    The MindEase AI platform operates on five core modules: Data Acquisition, Image Analysis, Physiological Correlation, Recommendation Engine, and Dashboard Visualization.

    1. Module 1: Image Analysis for Emotion Detection

      This module leverages a quantized MobileNetV1 neural network. The process is designed to ensure high precision and privacy by running client-side.

      1. Step-by-Step Process:

        1. Image Preprocessing: The webcam stream is captured at 30 fps. Each frame is downsampled to 416 × 416 pixels to match the input tensor shape of the SSD model.

        2. Feature Extraction: The CNN applies depthwise separable convolutions. Standard convolutions perform the channel-wise and spatial computation in one step.

        3. Classification: The model outputs a probability distribution across 7 emotions: Neutral, Happy, Sad, Angry, Fearful, Disgusted, Surprised.
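        The classification step can be sketched in Python as a softmax over raw class scores followed by an arg max over the seven emotions. Whether the deployed model exposes raw scores or already-normalized probabilities is not stated in the paper, so the softmax stage and the sample scores below are assumptions for illustration.

```python
import math
from typing import List, Tuple

EMOTIONS = ["Neutral", "Happy", "Sad", "Angry", "Fearful", "Disgusted", "Surprised"]


def softmax(scores: List[float]) -> List[float]:
    """Convert raw scores to probabilities that sum to 1 (numerically stable form)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(scores: List[float]) -> Tuple[str, List[float]]:
    """Return the label with the highest probability and the full distribution."""
    probs = softmax(scores)
    j = max(range(len(probs)), key=probs.__getitem__)
    return EMOTIONS[j], probs


# Hypothetical scores for one frame; index 1 ("Happy") is the largest.
label, probs = predict([0.2, 2.5, 0.1, -1.0, 0.0, -0.5, 0.3])
print(label)  # Happy
```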

    2. Module 2: Physiological Correlation

      Raw data from the IoT sensor is noisy. We implement a smoothing algorithm on the ESP32 before transmission.

      Algorithm: Heart Rate Peak Detection

      Initialize IR_buffer[]
      while Sensor is Active do
          value ← readIR()
          if value < Threshold then
              continue
          end if
          current_time ← millis()
          if value > local_max AND current_time - last_beat > 300 ms then
              BPM ← 60000 / (current_time - last_beat)
              last_beat ← current_time
              Transmit(BPM)
          end if
      end while
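      The peak-detection loop can be exercised offline with a short Python sketch that runs the same logic over recorded `(time_ms, ir_value)` samples instead of a live sensor. The paper does not define how the local maximum is tracked, so treating a sample as a peak when it exceeds its left neighbour and is not below its right neighbour is an assumption, as is reusing the 50,000 finger-detection threshold mentioned in the firmware description.

```python
from typing import List, Tuple


def detect_bpm(
    samples: List[Tuple[int, int]],
    threshold: int = 50_000,
    refractory_ms: int = 300,
) -> List[float]:
    """Return one BPM estimate per detected beat interval."""
    bpm_values: List[float] = []
    last_beat = None
    for i in range(1, len(samples) - 1):
        t, value = samples[i]
        if value < threshold:
            continue  # below threshold: no finger on the sensor
        is_peak = samples[i - 1][1] < value >= samples[i + 1][1]
        if not is_peak:
            continue
        if last_beat is None:
            last_beat = t  # first beat only starts the clock
            continue
        if t - last_beat > refractory_ms:
            bpm_values.append(60_000 / (t - last_beat))
            last_beat = t
    return bpm_values


# Synthetic signal sampled every 20 ms with a spike every 800 ms.
signal = [(t, 61_000 if t % 800 == 0 else 60_000) for t in range(0, 4001, 20)]
print(detect_bpm(signal))  # [75.0, 75.0, 75.0]
```

      The 300 ms refractory window caps the detectable rate at 200 BPM, which rejects most double-counted peaks.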

    3. Module 3: Generative Recommendation Engine

    Unlike traditional If-Then systems (e.g., If Sad, then Play Music), MindEase utilizes Generative AI.

    • Input Prompt Construction: The system dynamically builds a prompt: "Act as a therapist. The user is feeling [EMOTION] and their heart rate is [BPM] bpm. Provide a 2-sentence actionable wellness tip."

    • Contextual Awareness: If the heart rate is high (> 100 BPM) and the emotion is Fear, the LLM infers a panic attack and suggests breathing exercises. If the heart rate is normal and the emotion is Sad, it suggests cognitive reframing.
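    The prompt construction described above amounts to filling a fixed template with the current emotion and heart rate. The Python sketch below shows only that string-building step; the function name is hypothetical, and the call to the Gemini API itself is deliberately left out because the paper does not describe it.

```python
PROMPT_TEMPLATE = (
    "Act as a therapist. The user is feeling {emotion} and their heart rate "
    "is {bpm} bpm. Provide a 2-sentence actionable wellness tip."
)


def build_prompt(emotion: str, bpm: int) -> str:
    """Fill the template with the detected emotion and the latest BPM reading."""
    return PROMPT_TEMPLATE.format(emotion=emotion, bpm=bpm)


prompt = build_prompt("Fear", 122)
print(prompt)
```

    Keeping the template in one constant makes it easy to review the exact wording sent to the model.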

  5. IMPLEMENTATION AND CODE STRUCTURE

    The implementation involves three distinct coding environ- ments: C++ for the microcontroller, TypeScript for the web application, and Prisma Schema Language for the database.


    The implementation is divided into three functional layers: Firmware Logic, AI Processing, and Data Persistence. Instead of raw code, we present the architectural logic below.

    1. IoT Firmware Logic (ESP32)

      The firmware performs signal conditioning to ensure accuracy. Fig. 2 illustrates the logic: the system reads the IR value and checks a threshold (50,000) to detect finger placement. If a finger is detected, it applies a smoothing filter before calculating the BPM.
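      A minimal Python sketch of this conditioning stage is given below. The paper does not name the smoothing filter, so a 4-sample moving average is assumed here, and the class and function names are hypothetical; only the 50,000 finger-detection threshold comes from the text.

```python
from collections import deque
from typing import List, Optional

FINGER_THRESHOLD = 50_000  # IR level above which a finger is assumed present


class MovingAverage:
    """Fixed-window moving average (an assumed choice of smoothing filter)."""

    def __init__(self, window: int = 4) -> None:
        self.buf = deque(maxlen=window)

    def update(self, x: float) -> float:
        self.buf.append(x)
        return sum(self.buf) / len(self.buf)

    def reset(self) -> None:
        self.buf.clear()


def condition(ir_values: List[int], window: int = 4) -> List[Optional[float]]:
    """Smooth readings while a finger is present; emit None otherwise."""
    smoother = MovingAverage(window)
    out: List[Optional[float]] = []
    for v in ir_values:
        if v < FINGER_THRESHOLD:
            smoother.reset()  # drop stale samples once the finger is removed
            out.append(None)
        else:
            out.append(smoother.update(v))
    return out


print(condition([40_000, 60_000, 62_000, 64_000, 66_000, 68_000]))
# [None, 60000.0, 61000.0, 62000.0, 63000.0, 65000.0]
```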

    2. Frontend AI Pipeline

      To ensure user privacy, the Emotion Recognition pipeline runs entirely on the client side. Fig. 3 visualizes the pipeline: the raw video feed is processed by the SSD MobileNet model to detect the face, followed by a landmark mesh to analyze facial geometry.

    3. Database Schema Design

    [Fig. 2 flowchart: Start Sensor → Read IR Signal → Finger Detected? (No: loop back; Yes: Apply Smoothing) → Calculate BPM → Send JSON Data]

    We utilize a relational data model managed by Prisma. Fig. 4 presents the detailed schema. The User table holds static account details, while the Logs table stores dynamic time-series data from the IoT sensor.
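    The one-to-many relationship between users and logs can be mirrored in a few lines of Python. This is only an illustrative data model using the field names listed in Fig. 4 (adapted to snake_case); it is not the project's Prisma schema, and the sample values are invented.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass
class Log:
    """One time-series reading; user_id plays the role of the foreign key."""
    id: str
    user_id: str
    emotion: str
    heart_rate: int
    spo2: float
    timestamp: datetime


@dataclass
class User:
    """Static account details plus the logs that belong to this user."""
    id: str
    name: str
    email: str
    password_hash: str
    created: datetime
    logs: List[Log] = field(default_factory=list)

    def add_log(self, log_id: str, emotion: str, heart_rate: int, spo2: float) -> Log:
        log = Log(log_id, self.id, emotion, heart_rate, spo2, datetime.now())
        self.logs.append(log)
        return log


user = User("u1", "Test User", "user@example.com", "<hash>", datetime.now())
user.add_log("l1", "Neutral", 92, 98.0)
print(len(user.logs), user.logs[0].user_id)  # 1 u1
```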

  8. DATASET DESCRIPTION

    The accuracy of the MindEase AI system relies on the quality of the datasets used for training the pre-trained models.

    1. Dataset Overview

      We utilize the FER-2013 dataset. It contains approximately 35,887 grayscale images, 48×48 pixels each.

      • Training Set: 28,709 examples.

      • Public Test Set: 3,589 examples.

      • Private Test Set: 3,589 examples.

    2. Classes and Distribution

      The dataset is categorized into 7 classes. The distribution is as follows:

      • Angry: 4,953 images

      • Disgust: 547 images (Under-represented)

      • Fear: 5,121 images

      • Happy: 8,989 images (Most represented)

      • Sad: 6,077 images

      • Surprise: 4,002 images

      • Neutral: 6,198 images
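      The class counts above can be checked and turned into shares with a short Python sketch. The counts are copied from the list; the percentages and the imbalance ratio are derived values shown only to quantify how under-represented Disgust is.

```python
COUNTS = {
    "Angry": 4953,
    "Disgust": 547,
    "Fear": 5121,
    "Happy": 8989,
    "Sad": 6077,
    "Surprise": 4002,
    "Neutral": 6198,
}

total = sum(COUNTS.values())
shares = {label: round(100 * n / total, 1) for label, n in COUNTS.items()}
imbalance = COUNTS["Happy"] / COUNTS["Disgust"]

print(total)                                # 35887
print(shares["Happy"], shares["Disgust"])   # 25.0 1.5
print(round(imbalance, 1))                  # 16.4
```

      The total matches the 35,887 images reported for the dataset, and Happy outnumbers Disgust by roughly 16 to 1.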

        Fig. 2. Logic Flowchart: IoT Signal Processing on ESP32

        [Fig. 3 diagram: Webcam Feed → SSD MobileNet → 68-Point Mesh → Classifier → Emotion; edge labels: Raw Frame, Landmarks]

        Fig. 3. AI Pipeline: Video Frame to Emotion Classification

        USERS Table: PK id: ObjectId; name: String; email: String; password: Hash; created: Date

        LOGS Table: PK id: ObjectId; FK userId: ObjectId; emotion: String; heartRate: Int; spo2: Float; timestamp: Date

        Relationship: USERS (1) Has Many LOGS (N)

        Fig. 4. Database Schema: Relationship between User Identity and Physiological Logs.

        Fig. 5. System Data Flow: From Sensor/Webcam to User Feedback.

        [Fig. 6 dashboard: "MindEase Dashboard: Real-Time Monitoring"; panels: Heart Rate (BPM) 92, Normal Range; Detected Emotion: Stressed, High Arousal; Physiological & Emotional Trends: time-series visualization of BPM correlated with emotional states]

        Fig. 6. The MindEase Dashboard displaying real-time values

    3. Data Augmentation

    To improve robustness, we apply real-time augmentation logic during inference. We account for:

    1. Rotation: ±15 degrees.

    2. Brightness: Variations to account for day/night usage.

    3. Noise: Gaussian blur to simulate low-quality webcams.

      This dataset ensures the model is capable of recognizing diverse facial structures, making the MindEase app inclusive and effective across different user demographics.

  9. RESULTS AND DISCUSSION

    The performance of the MindEase AI system was evaluated based on Inference Latency, Classification Accuracy, and System Stability.

      1. Accuracy Analysis

        We tested the system on a live group of 20 students. The multimodal approach showed a significant improvement over unimodal approaches.

        TABLE II
        CONFUSION MATRIX OF EMOTION DETECTION (SAMPLE SIZE N = 100)

        | Actual \ Pred | Happy | Sad | Angry | Neutral | Accuracy |
        |---|---|---|---|---|---|
        | Happy | 95 | 2 | 0 | 3 | 95% |
        | Sad | 1 | 88 | 4 | 7 | 88% |
        | Angry | 0 | 5 | 82 | 13 | 82% |
        | Neutral | 2 | 4 | 2 | 92 | 92% |
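        The per-class figures in Table II can be reproduced from the matrix with a few lines of Python. The overall accuracy computed below is a derived value (correct predictions divided by all predictions) and is not reported in the paper itself.

```python
LABELS = ["Happy", "Sad", "Angry", "Neutral"]

# Rows are actual classes, columns are predicted classes (Table II).
MATRIX = [
    [95, 2, 0, 3],
    [1, 88, 4, 7],
    [0, 5, 82, 13],
    [2, 4, 2, 92],
]

# Per-class accuracy: diagonal entry divided by the row total.
per_class = {
    label: MATRIX[i][i] / sum(MATRIX[i]) for i, label in enumerate(LABELS)
}

# Overall accuracy: all correct predictions over all samples.
correct = sum(MATRIX[i][i] for i in range(len(LABELS)))
overall = correct / sum(sum(row) for row in MATRIX)

print(per_class["Angry"], overall)  # 0.82 0.8925
```

        Each row sums to 100, so the diagonal entries equal the per-class percentages, and 357 of 400 samples are classified correctly.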

        As seen in Table II, Happy and Neutral have the highest accuracy. Angry often gets confused with Neutral or Sad in vision-only models. However, by adding the IoT Heart Rate data, we observed that true Anger correlated with a BPM spike (> 100), whereas Sadness often correlated with lower or normal BPM. This fusion logic improved the practical detection of high-arousal negative emotions by 12%.
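        One way to picture the fusion logic is a small rule table over the facial label and the heart rate, sketched below in Python. The paper does not publish its actual fusion rules, so everything here is an assumption built from the statements in the text: a 100 BPM threshold, anger tending to coincide with elevated BPM, sadness with normal BPM, and a Neutral face with elevated BPM indicating hidden stress. The function name and the returned labels are hypothetical.

```python
BPM_THRESHOLD = 100  # threshold for "elevated" heart rate used in the text


def fuse(face_emotion: str, bpm: float) -> str:
    """Combine the vision label with heart rate using simple assumed rules."""
    elevated = bpm > BPM_THRESHOLD
    if face_emotion == "Neutral" and elevated:
        return "Hidden stress"  # calm face, aroused physiology
    if face_emotion in ("Angry", "Sad"):
        # Vision confuses these two; let arousal break the tie.
        return "Angry" if elevated else "Sad"
    return face_emotion  # other labels pass through unchanged


print(fuse("Neutral", 110))  # Hidden stress
print(fuse("Sad", 122))      # Angry
print(fuse("Angry", 72))     # Sad
print(fuse("Happy", 80))     # Happy
```

        A fixed threshold ignores resting heart-rate differences between users, so a per-user baseline would be a natural refinement.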

      2. Latency and Performance

        • Model Inference: 150ms average on a standard Intel i5 laptop (Client-side).

        • IoT Transmission: 50ms latency over WebSocket.

        • Gemini API Generation: 1.2s average response time.

        The total latency of approximately 1.4 seconds (150 ms + 50 ms + 1.2 s), which is under 2 seconds, is acceptable for a Therapeutic Assistant context, where immediate millisecond response is not as critical as accuracy.

  10. CONCLUSION AND FUTURE SCOPE

  1. Conclusion

    MindEase AI is a leap forward in personal healthcare technology. The "eyes" of Computer Vision merge with the "pulse" of IoT sensors to provide a holistic picture of the user's mental state. The integration of Generative AI (Gemini) turns it from a passive observation tool into an active support companion. The experimental results further validate the feasibility of this approach: overall accuracy of 92% for positive emotions and detection of stress using multimodal fusion. It is cost-effective (approx. 3,000 hardware cost) and privacy-preserving.

    [Fig. 7 plot: "Physiological Correlation Analysis"; y-axis: Heart Rate (BPM), 60 to 120; x-axis: Time (Seconds), 0 to 60; series: User BPM against Threshold (100 BPM); states: RELAXED, HIGH STRESS, RECOVERY; annotation: Alert Triggered! Face: Fear, BPM: 122]

    Fig. 7. Graph: Heart Rate variation vs. Detected Emotion over time

    1. R. Sharma et al., Analysis of physiological signals using ML, Computers and Electronics in Biology, vol. 150, 2018.

    2. S. M. Shah et al., Utilizing deep learning for the detection of stress using IoT, in IEEE ICAI, Abu Dhabi, 2019.

    3. Z. Zhang et al., A comprehensive survey on emotion detection, Sensors, vol. 19, no. 21, 2019.

    4. A. S. R. Nair, Random Forest classifier for stress detection based on heart rate, Comp. Intel. in Healthcare, 2020.

    5. H. W. Lee, Classification of emotions based on Random Forest, JMLR, vol. 20, 2021.

    6. A. D. Jadhav, Data augmentation techniques for deep learning models, Comp. Bio. and Chem., 2021.

    7. D. H. Wang, Optimized stress detection with hybrid CNNs, in IEEE ICIP, New York, 2018.

    8. M. S. Taneja, Using ML and image processing for health disease detection, Sensors, vol. 20, 2020.

  2. Future Scope

  1. Voice Intonation: Future versions should integrate audio analysis to detect jitter and shimmer in the user's voice, which are markers of anxiety.

  2. Wearable Integration: Replacing the prototype sensor with smartwatches (Apple Watch / Fitbit API) would remove the need for custom hardware.

  3. Longitudinal Analysis: Using the MongoDB logs to predict depressive episodes weeks in advance based on micro-trends in SpO2 and facial affect.

ACKNOWLEDGMENT

We are grateful to Dr. Praful Saxena for his mentorship. We would also like to thank the administration of MIT Moradabad for providing us the laboratory resources to develop the IoT node and test the AI models.

REFERENCES

  1. S. Poria, E. Cambria, R. Bajpai, and A. Hussain, A Review of Multimodal Sentiment Analysis, Information Fusion, vol. 1016, 2022.

  2. Y. Li, J. Zeng and Z. Chen, Facial Emotion Recognition Using Deep Learning, Procedia Computer Science, vol. 175, pp. 689–694, 2020.

  3. M. A. Khan, K. A. Kamran, and M. F. Khan, A review of image-based emotion detection techniques, in Proc. 2019 IEEE Int. Conf. on AI, Sydney, 2019, pp. 23–30.

  4. P. G. E. Wright and F. J. Smith, Emotion recognition using Random Forest algorithms, in Proc. 2020 IEEE Int. Conf. on Computer Vision, Paris, 2020.

  5. Wysa AI Mental Health Chatbot, [Online]. Available: https://www.wysa.io.

  6. Woebot Your Mental Health Ally, [Online]. Available: https://www.woebothealth.com.

  7. V. Muehlbauer, face-api.js: JavaScript API for Face Recognition, GitHub, 2020.

  8. Espressif Systems, ESP32 Technical Reference Manual, Version 4.1, 2020.

  9. Maxim Integrated, MAX30102: High-Sensitivity Pulse Oximeter and Heart-Rate Sensor for Wearable Health, Datasheet, 2018.

  10. Google DeepMind, Gemini 1.5 Pro Technical Report, 2024.

  11. A. D. Ahmed et al., Machine learning models for emotion classification, in IEEE ICVIP, Jakarta, 2018.

  12. H. S. Rathi and K. P. Yadav, Automated stress diagnosis using deep learning, Int. Journal of Bio. Eng., vol. 6, 2021.