DOI : 10.17577/IJERTV14IS060187
- Open Access
- Authors : Rashi Nimesh Kumar Dhenia, Milan Parikh
- Paper ID : IJERTV14IS060187
- Volume & Issue : Volume 14, Issue 06 (June 2025)
- Published (First Online): 28-06-2025
- ISSN (Online) : 2278-0181
- Publisher Name : IJERT
- License:
This work is licensed under a Creative Commons Attribution 4.0 International License
Trust-Aware Data Pipelines for Verifiable AI Accelerators: A Data-Centric Framework
Rashi Nimesh Kumar Dhenia
Independent Researcher, IEEE Member
Atlanta, United States

Milan Parikh
Independent Researcher, Senior IEEE Member
Richmond, United States
Abstract: As AI accelerators become integral in various domains, from healthcare to autonomous systems, the reliability and verifiability of their data pipelines have become critical. Current data pipelines for AI accelerators are typically opaque, lacking systematic mechanisms to ensure data provenance, integrity, and explainability at every stage. This paper presents a data-centric framework named Trust-Aware Data Pipelines (TDP) for AI accelerators, structured around five core stages: Data Ingestion, Lineage Tracking, Explainable Preprocessing, Conformance Verification, and Deployment Validation. By embedding robust validation methods, systematic lineage tracking, and explainable data transformations, TDP ensures comprehensive end-to-end visibility. The framework is demonstrated through two practical case studies (an image classification task and a sensor fusion task), highlighting significant improvements in traceability, reproducibility, and accountability, with minimal performance overhead. The proposed approach provides a foundation for building verifiable, trustworthy AI systems capable of meeting stringent safety and accountability requirements in critical applications.
Keywords: AI accelerators, Trust-aware systems, Data pipelines, Lineage tracking, Explainable AI, Conformance verification, Deployment validation, Data-centric verification, Reproducibility, Accountability, Traceability
-
INTRODUCTION
AI accelerators, specialized hardware for machine learning tasks, have become integral in domains from vision to healthcare, yet the data pipelines feeding these accelerators often remain unverifiable and opaque. In traditional AI pipelines, data flows from ingestion to inference without guarantees that each step is trustworthy or reproducible. This lack of verifiability means that subtle errors or malicious tampering in the pipeline can go undetected, undermining confidence in the accelerator's outputs [15], [16]. The problem of unverifiable AI pipelines is especially acute in hardware-bound AI systems, where once a model is deployed on-chip, limited introspection is possible into data transformations and decision pathways. Ensuring trust in these pipelines is vital not just for debugging and auditing, but for safety in critical applications like autonomous driving or medical diagnosis. Trustworthy pipelines in AI accelerators are those that provide end-to-end visibility and validation of data as it moves through the system. Such pipelines incorporate mechanisms for data provenance, consistent preprocessing, and cross-stage validation so that any stakeholder can trace how a raw input becomes a final prediction. The significance of this cannot be overstated: a verifiable pipeline enables accountability,
reproducibility, and safety. For example, in an autonomous vehicle, being able to trace a sensor reading through all preprocessing and inference steps can help verify why the accelerator decided to trigger braking. Without a trust-aware pipeline, we are left to guess whether an anomaly was due to sensor noise, a preprocessing bug, or a hardware fault. Previous work has identified the general need for self-managing, trustworthy systems. The vision of autonomic computing [18] laid out an agenda for systems that manage themselves and adapt to changes without human intervention, aiming for reliability and self-healing. However, autonomic computing approaches did not specifically address the data pipeline of AI accelerators, leaving a gap in how we assure trust in the data flowing through complex AI systems. More recent efforts in AI system verification have focused on components like models or hardware, but have not fully tackled the holistic pipeline. This paper addresses that gap by proposing a data-centric framework for trust-aware pipelines tailored to AI accelerators. We build on prior verification methods and data management practices while highlighting the missing link: an integrated pipeline that is verifiable at each stage. In doing so, we aim to ensure that every transformation from raw data to accelerator output is traceable and conforms to expected standards, thus making AI accelerators not just fast, but verifiably trustworthy in their end-to-end operation.
-
RELATED WORKS
Ensuring the correctness of AI accelerators has been an active research area, spanning model verification, hardware verification, and secure data flows. Prior approaches to model and hardware verification in accelerators provide a foundation for our work. For example, formal verification techniques have been applied to accelerator hardware logic [16], [1], [4] to mathematically prove correctness of circuit designs. Simulation-based testing of dataflow architectures is another common strategy [4], allowing designers to catch errors by emulating how data moves through custom AI chips under various scenarios. Model-driven engineering approaches have also been explored: Nelson & Soni (2023c) integrate high-level modeling with verification to address the complexity of modern AI hardware. Such model-driven methods help detect design discrepancies early and streamline verification workflows. Similarly, co-design verification methodologies ensure that hardware and software are verified in tandem [4], [16], checking that machine learning code and accelerator circuits function together correctly. These studies underscore that as
accelerators grow more complex, rigorous verification (from formal proofs to co-simulation) is paramount for reliability.
Another thread of related work focuses on the development process and software tools for reliable AI accelerators. Agile verification techniques [11] have been proposed to iteratively test and validate accelerator functionality during development, shortening the feedback loop for catching issues. In addition, robust unit testing and debugging tools specific to AI accelerator SDKs have been introduced [12]. These enable developers to write fine-grained tests for accelerator APIs and use specialized debuggers to trace errors in model deployment. Such tools increase confidence in the accelerator software stack. However, it is noteworthy that these approaches generally center on verifying the model and hardware behavior or the software code, rather than the data pipeline feeding the model.
In the context of trusted data flows and pipeline integrity, the literature is relatively sparse but growing. Data lineage and provenance have long been recognized as crucial for trust in data warehouses and big data systems [15].
Figure-1
Ensuring that data is traceable from source to destination can significantly improve an organization's trust in its analytics. For AI pipelines, similar concepts apply: each data item's journey should be recorded. Some industry solutions [1], [8] emphasize choosing the right data storage and management tools to maintain data integrity at scale. Indeed, selecting appropriate data stores (relational, NoSQL, or data lakes) and versioning data are best practices that facilitate traceability and consistency [1], [8]. In distributed or cross-organization AI, blockchain technology has emerged as a tool to establish trust. For instance, Hazarika & Shah (2024a) demonstrate a blockchain-based framework for AI model sharing that ensures data provenance and model integrity in a decentralized manner. Their approach highlights how an immutable ledger can foster accountability in AI workflows. Other work has examined reliability from a systems perspective: integrating fault tolerance strategies into big data pipelines [6], [10], [17] and exploring serverless
architectures for distributed AI processing [6], [10], [17]. These efforts improve the robustness and scalability of data pipelines. However, they do not explicitly guarantee that every step of an AI accelerator's pipeline is verifiable. To our knowledge, no prior approach has combined model/hardware verification with a comprehensive, data-centric pipeline framework as we propose. Our work extends the idea of end-to-end verification beyond just the model and hardware, to include the data transformations and flows that critically affect the trustworthiness of AI accelerator outputs.
Summary of Gaps: In summary, prior art provides strong methods for verifying accelerator hardware and software correctness (e.g., formal proofs, simulation tests, co-verification) and offers tools for improving reliability during development (agile testing, debugging frameworks). There are also emerging techniques for ensuring trust in distributed or large-scale data processing (lineage tracking, blockchain for provenance, fault tolerance measures). Yet, a gap remains in uniting these threads into a holistic pipeline perspective. Specifically, none of the existing works fully address how to maintain traceability, explainability, and compliance at each stage of an AI accelerator's data pipeline. This paper's framework fills that gap by drawing on these related efforts and adding new mechanisms to make data pipelines themselves first-class citizens in the verification process.
-
PROPOSED FRAMEWORK
Our proposed framework, Trust-Aware Data Pipeline (TDP) for AI Accelerators, defines a sequence of pipeline stages, each fortified with mechanisms to ensure verifiability and trustworthiness (Figure 1) [3]. The framework is data-centric: it treats the data and its transformations as equally important to verify as the model or hardware. By enforcing checks and record-keeping at each stage, the pipeline becomes transparent and auditable end-to-end. We describe five key stages of the pipeline (Data Ingestion, Lineage Tracking, Explainable Preprocessing, Conformance Verification, and Deployment Validation) and outline how each stage contributes to overall trust. Table 1 summarizes these stages, including their inputs, outputs, and example tools used.
-
DATA INGESTION
The pipeline begins with Data Ingestion, where raw data enters the system. This stage focuses on securing and validating incoming data at the point of entry. The input to this stage is the unprocessed data collected from sources such as sensors, IoT
devices, or datasets. For an image classification accelerator, this might be a stream of images from a camera or a batch of image files from a database. For a sensor fusion task, it could be readings from multiple sensors (e.g., a LiDAR point cloud and a temperature sensor feed). The output of Data Ingestion is a standardized initial dataset or stream, ready for downstream processing. Critically, our framework augments ingestion with authenticity checks and baseline validations. We verify data integrity using techniques like checksums or digital signatures, ensuring that the data has not been tampered with in transit. We also perform basic quality checks (e.g., schema or format validation) to catch corrupted or malformed inputs early. Tools used in this stage include secure data APIs and version-controlled data repositories. For instance, we integrate a data version control system (such as DVC or Delta Lake) to automatically version each incoming data batch, following best practices in data management [1], [8]. By doing so, the raw data is checkpointed with unique identifiers, and any retrospective audit can retrieve exactly the version of data that was ingested at a given time. This versioning, combined with source authentication, lays the foundation of trust: the pipeline knows exactly what data it received and that the data is genuine. In summary, Data Ingestion in our framework is not a blind entry point, but a guarded gate that establishes an initial layer of verifiable integrity for all data entering the accelerator pipeline.
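To make the ingestion gate concrete, the sketch below shows the kind of checksum-plus-versioning logic described above, assuming batches arrive as files in a directory; the manifest file stands in for a DVC or Delta Lake snapshot, and all paths and field names are illustrative rather than part of the paper's implementation.

```python
import hashlib
import json
import time
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Compute a content hash so tampering or silent edits are detectable."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

def ingest_batch(source_dir: str, manifest_path: str = "ingest_manifest.json") -> dict:
    """Validate and 'version' a batch of raw files by recording their hashes.

    A full deployment would delegate versioning to DVC or Delta Lake; here the
    manifest file stands in for that versioned checkpoint.
    """
    records = []
    for path in sorted(Path(source_dir).glob("*")):
        if not path.is_file():
            continue
        records.append({
            "file": path.name,
            "sha256": sha256_of(path),
            "size_bytes": path.stat().st_size,
            "source_uri": str(path.resolve()),
        })
    manifest = {"ingested_at": time.time(), "items": records}
    Path(manifest_path).write_text(json.dumps(manifest, indent=2))
    return manifest
```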
-
LINEAGE TRACKING
Once data is ingested and secured, the next crucial stage is Lineage Tracking. At this stage, the framework meticulously records the provenance and flow of data as it undergoes various transformations. The input to Lineage Tracking is the data coming from the ingestion stage (already validated and versioned). The output is the same data accompanied by an evolving metadata log, effectively a paper trail that travels with the data through the pipeline. Each record in this log links a data item to its origin and any transformations applied. We implement lineage tracking by assigning a unique identifier to each data instance (or batch) and logging it in a metadata store [1]. Every time the data is modified or moves to a new component, an entry is added describing the operation. For example, if an image is cropped or resized as part of preprocessing, the lineage log notes the transformation and parameters. In a sensor fusion scenario, if two sensor streams are synchronized, the lineage log captures the time alignment and any interpolation performed. This ensures that we can later trace exactly how disparate sensor inputs were merged. Tools such as OpenLineage or Apache Atlas can automate the capture of such metadata across pipeline steps. In our implementation, we used a lightweight approach by instrumenting the pipeline code: hooks in the data processing functions record the necessary metadata to a lineage database. The benefit of lineage tracking is heightened transparency: engineers and auditors can query this metadata to answer questions like "Which raw data contributed to this prediction?" or "What transformations were applied to sensor A's readings before fusion?". By maintaining this continuous chain of custody for the data, our framework builds trust through traceability [16]. It ensures that no matter how complex the data flows (especially in large accelerator systems with parallel processes), we retain a verifiable history of each datum's journey.
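A minimal sketch of the lightweight instrumentation described above, assuming a SQLite-backed metadata store as in our implementation; the table schema and method names are illustrative rather than a fixed API.

```python
import json
import sqlite3
import time
import uuid

class LineageStore:
    """Minimal SQLite-backed lineage log: one row per transformation event."""

    def __init__(self, db_path: str = "lineage.db"):
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS lineage ("
            " item_id TEXT, parent_id TEXT, stage TEXT,"
            " operation TEXT, params TEXT, ts REAL)"
        )

    def new_item(self, source: str) -> str:
        """Register a freshly ingested datum and return its unique identifier."""
        item_id = str(uuid.uuid4())
        self.record(item_id, None, "ingestion", "ingest", {"source": source})
        return item_id

    def record(self, item_id, parent_id, stage, operation, params):
        """Append one lineage entry describing an operation on item_id."""
        self.conn.execute(
            "INSERT INTO lineage VALUES (?, ?, ?, ?, ?, ?)",
            (item_id, parent_id, stage, operation, json.dumps(params), time.time()),
        )
        self.conn.commit()

    def history(self, item_id: str):
        """Answer 'what happened to this datum?' by replaying its log in order."""
        cur = self.conn.execute(
            "SELECT stage, operation, params, ts FROM lineage WHERE item_id = ? ORDER BY ts",
            (item_id,),
        )
        return cur.fetchall()
```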
-
EXPLAINABLE PREPROCESSING
The third stage, Explainable Preprocessing, involves the transformations and cleaning steps applied to data prior to feeding it into the AI model, with an emphasis on explainability and transparency of those steps. The input to this stage is the raw or minimally processed data along with its lineage metadata. The output is the prepared dataset (e.g., normalized feature vectors, filtered sensor readings) that will be fed into the accelerator's inference or training engine, accompanied by annotations that explain how the data was transformed. Preprocessing is a standard part of any machine learning pipeline (e.g., normalization, feature extraction, augmentation), but our framework ensures these operations are done in a way that is interpretable and reversible where possible. We achieve this by using simple, well-documented transformations and attaching human-readable descriptions to each. For instance, if we apply histogram equalization to images or a Fourier filter to a sensor signal, the transformation parameters and rationale are logged (e.g., "applied histogram equalization to improve contrast") [21]. We also favor transformations that are explainable in the sense that their effect on the data can be understood. In some cases, this might involve using tools from explainable AI, for example, ensuring that any feature engineering step can be visualized or its impact measured. In an image pipeline, explainable preprocessing might include generating visualizations of intermediate results (such as showing an example image after each processing step) which can be reviewed. For tabular or sensor data, it might include computing and logging summary statistics before and after cleaning to show what was changed. We use common Python-based libraries (pandas, NumPy, OpenCV for images) coupled with an audit logging system that records each operation. This stage leverages principles similar to those in data integrity and governance tools [1], [8], which stress that every data manipulation should be traceable and justifiable. By the end of Explainable Preprocessing, we have data that is not only ready for the model but is accompanied by a clear explanation of how it was derived from the raw input. This enhances trust by assuring that no mysterious or unaccounted-for data massaging has occurred: every change is deliberate, documented, and reviewable. In effect, this stage answers the question "Why does the data look like this?" in terms of the transformations applied.
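The following sketch illustrates one way an audited transformation might look, assuming Python's built-in logging module as the audit channel; the function name and the specific statistics logged are illustrative.

```python
import logging

import numpy as np

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("preprocessing")

def normalize_with_audit(x: np.ndarray, name: str, rationale: str) -> np.ndarray:
    """Scale an array to [0, 1] and log what was done, why, and its effect."""
    lo, hi = float(x.min()), float(x.max())
    y = (x - lo) / (hi - lo) if hi > lo else np.zeros_like(x, dtype=float)
    log.info(
        "step=min-max-normalize input=%s rationale=%r "
        "params={'min': %.4f, 'max': %.4f} mean_before=%.4f mean_after=%.4f",
        name, rationale, lo, hi, float(x.mean()), float(y.mean()),
    )
    return y

# Example: the logged record answers "why does the data look like this?"
pixels = np.random.randint(0, 256, size=(224, 224)).astype(float)
normalized = normalize_with_audit(pixels, "image_0001", "model expects inputs in [0, 1]")
```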
-
CONFORMANCE VERIFICATION
After data has been preprocessed and explained, our framework introduces a Conformance Verification stage. This is a checkpoint to verify that the processed data and the model/accelerator are aligned and conform to expected standards and requirements before final deployment. The input to this stage includes the prepared data (with lineage and explanation logs) and the model/accelerator specifications. The output is a verification report or flag indicating whether the pipeline and model are in conformance with expected norms: if yes, the pipeline proceeds; if not, issues are raised for correction. Conformance Verification operates on two facets: data conformance and model conformance. For data
conformance, we use validation rules to check that the preprocessed data meets all assumptions required by the model. For example, we verify that input features fall within acceptable ranges (no out-of-bound values after normalization), that data types and shapes match the model's expectations (e.g., image tensor dimensions correspond to what the CNN expects), and that there are no prohibited conditions (like missing sensor values) remaining [5]. We employ tools such as Great Expectations to codify these checks, essentially an automated test suite for the data. If any expectation is violated (say an image is not properly normalized or a sensor reading is out of expected range), the pipeline can flag this discrepancy [1], [4]. On the model side, conformance verification involves ensuring the model and accelerator setup meet design specifications. This may include verifying that the quantization of model weights (if the accelerator uses lower-precision arithmetic) hasn't introduced unacceptable error, or that the accelerator's runtime environment is correctly configured (e.g., the FPGA bitstream matches the model version). Techniques from hardware verification are applicable here: for instance, Nelson & Soni (2023a) emphasize co-verifying hardware-software pairs; in our framework, we verify that the software-processed data conforms to the hardware's assumptions. We might run a set of test inputs through both a reference software model and the accelerator to compare outputs, ensuring the accelerator conforms to the reference behavior [16]. Additionally, this stage could enforce compliance with external standards or regulations; for example, if the pipeline is used in a medical context, it should check that data handling conforms to HIPAA or other data governance policies. By performing Conformance Verification, we add a crucial trust barrier: the pipeline and model are not allowed to proceed to deployment unless they prove they adhere to expected properties and performance criteria. This step catches any misalignments between data and model early, preventing deployment of a pipeline that might produce invalid results on the accelerator due to format mismatches or overlooked assumptions.
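A simplified sketch of the two facets of conformance checking, written as plain Python rather than Great Expectations rules so that it is self-contained; the shape and range thresholds mirror the image case study, and the two callables are placeholders for a reference model and an accelerator runtime.

```python
import numpy as np

def check_data_conformance(batch: np.ndarray) -> list:
    """Return a list of violated expectations for a batch of image tensors.

    The rules mirror the checks described above (shape, value range,
    non-blank inputs); the blank-image threshold is illustrative.
    """
    issues = []
    if batch.ndim != 4 or batch.shape[1:] != (3, 224, 224):
        issues.append(f"unexpected tensor shape {batch.shape}, expected (N, 3, 224, 224)")
    if batch.min() < 0.0 or batch.max() > 1.0:
        issues.append("pixel values fall outside the normalized [0, 1] range")
    if np.any(batch.reshape(batch.shape[0], -1).std(axis=1) < 1e-6):
        issues.append("at least one image is blank or constant-valued")
    return issues

def check_model_conformance(reference_fn, accelerator_fn, test_inputs) -> float:
    """Fraction of test inputs on which the accelerator's top-1 prediction
    matches the reference implementation; both callables return class scores."""
    agree = 0
    for x in test_inputs:
        agree += int(np.argmax(reference_fn(x)) == np.argmax(accelerator_fn(x)))
    return agree / max(len(test_inputs), 1)
```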
-
DEPLOYMENT VALIDATION
The final stage of the framework is Deployment Validation, which occurs when the AI pipeline is deployed onto the actual accelerator hardware (or a target production environment). Even after rigorous checks in prior stages, deployment can introduce new variables: for example, the hardware might handle computations slightly differently, or the operational environment might present data variations. The input to this stage is the fully assembled pipeline (data flow plus model) now running on the AI accelerator, along with a suite of validation tasks or test inputs. The output is an evaluation of the pipeline's performance and behavior in the live (or simulated live) environment, confirming that it remains trustworthy under real conditions [13], [22]. Deployment Validation serves as a safeguard to ensure that the end-to-end pipeline is working as intended on the hardware and continues to produce verifiable results. We conduct this validation by deploying the pipeline to a test environment that mirrors production, often using the actual AI accelerator or a high-fidelity simulator of it, and then executing a set of known test cases through the pipeline. For an image classification accelerator, for instance, we would feed a set of images with known expected outcomes (or a hold-out
labeled dataset) through the entire pipeline on the device. We then compare the accelerator's outputs against expected results or against the outputs of the reference implementation (e.g., the same model running on a general-purpose machine). We measure metrics like accuracy, consistency of outputs, and performance. A key metric we examine is cross-platform reproducibility: the degree to which the accelerator's results match those produced by an equivalent pipeline on a different platform (such as CPU inference). A high match rate indicates that our pipeline's earlier verifications (especially Conformance Verification) were successful in aligning the accelerator with the reference. We also monitor for any anomalies in real time: for example, timing irregularities, memory usage spikes, or unexpected latency introduced by our trust mechanisms. Tools supporting this stage include hardware monitoring utilities and logging frameworks (we used an MLflow-based logger to record performance metrics from the device) [7], [20]. In addition, continuous integration (CI) practices are adapted: whenever the pipeline or model is updated and redeployed, an automated deployment validation test runs to reconfirm trust properties. This stage closes the loop by validating the entire system in situ. It provides evidence that the pipeline remains verifiable and reliable even when running in the wild on the accelerator hardware. If any discrepancy is found, say the accuracy on device is significantly lower than expected or certain test cases consistently fail, those findings trigger a review of previous stages (perhaps the Conformance Verification missed a hardware-specific nuance). In essence, Deployment Validation is the final exam for our trust-aware pipeline, ensuring the integration of data pipeline and AI accelerator passes real-world conditions.
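The sketch below outlines a deployment validation run of the kind described above, assuming an MLflow tracking setup is available for metric logging; `device_fn` and `reference_fn` are placeholder callables for the on-device pipeline and the reference implementation.

```python
import mlflow  # assumes an MLflow tracking server or local tracking directory
import numpy as np

def deployment_validation(device_fn, reference_fn, test_inputs, labels) -> dict:
    """Replay a labeled test suite on the deployed pipeline and compare it with
    a reference platform; both callables return class scores per input."""
    device_preds = np.array([np.argmax(device_fn(x)) for x in test_inputs])
    ref_preds = np.array([np.argmax(reference_fn(x)) for x in test_inputs])
    labels = np.asarray(labels)

    metrics = {
        "on_device_accuracy": float((device_preds == labels).mean()),
        "reference_accuracy": float((ref_preds == labels).mean()),
        "cross_platform_reproducibility": float((device_preds == ref_preds).mean()),
    }
    with mlflow.start_run(run_name="deployment_validation"):
        for name, value in metrics.items():
            mlflow.log_metric(name, value)
    return metrics
```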
Table 1. Pipeline Stages Summary: Inputs, Outputs, and Tools

| Pipeline Stage | Input | Output | Tools Used |
|---|---|---|---|
| Data Ingestion | Raw data from sources (sensor feeds, images, etc.) | Standardized initial dataset or stream (validated and versioned) | Secure data APIs; checksums and signatures; data version control (e.g., DVC) |
| Lineage Tracking | Ingested data (validated), initial metadata | Data with appended lineage metadata (provenance logs) | Metadata store (OpenLineage, Apache Atlas); custom pipeline instrumentation |
| Explainable Preprocessing | Raw/ingested data with lineage metadata | Cleaned and transformed data ready for the model, with transformation annotations | Python ML libraries (pandas, NumPy, OpenCV); audit logging for transformations |
| Conformance Verification | Preprocessed data; model/accelerator spec and assumptions | Verification report (pass/fail on data/model conformity); flagged issues if any | Great Expectations for data validation; unit and integration tests; simulation tools |
| Deployment Validation | Deployed pipeline on hardware; test input suite | Validation results (metrics on device performance, accuracy, reproducibility) | Hardware simulation/emulation; on-device monitors; CI/CD test harness (MLflow logging) |
-
IMPLEMENTATION
We implemented the Trust-Aware Data Pipeline framework in a prototyping environment to demonstrate its feasibility. Our implementation uses Python-based tooling for flexibility, along with simulation techniques to mimic the behavior of AI accelerators. In this section, we describe the system setup and then detail two case studies, one focusing on an image classification task and another on a sensor fusion task, to illustrate how the framework operates in practice. System Setup: All pipeline stages were implemented in Python, leveraging common data science libraries and custom extensions for lineage and validation. We used PyTorch as the primary machine learning library to define and train models for both case studies, due to its ease of use and widespread adoption. The AI accelerator environment was simulated using two approaches: (1) leveraging a quantized runtime on CPU to emulate limited-precision accelerator behavior, and (2) using an open-source accelerator simulator for cycle-accurate performance testing [23]. Specifically, for quantization, we used PyTorch's built-in quantization toolkit to convert the trained neural network models to int8 representations, approximating how a typical edge AI accelerator (like a TPU or mobile NPU) might operate. For cycle-level hardware simulation, we integrated a simplified custom simulator that models execution time and parallelism for a given accelerator configuration; this was informed by prior dataflow simulation methods [16] but adapted for our pipeline context. All data ingestion, preprocessing, and validation logic was written in Python, with careful instrumentation to capture lineage metadata. We employed a SQLite-backed metadata store to record lineage information persistently. For data validation (the Conformance Verification stage), we incorporated Great Expectations library rules [21]. These rules were defined in JSON/YAML and included checks such as "all image pixel values must be normalized between 0 and 1" and "sensor timestamps must be in increasing order". For explainability and logging, we used the built-in logging module to record each preprocessing step along with summary statistics (e.g., mean and variance of a feature before and after scaling). The entire pipeline was orchestrated using a lightweight workflow engine (we used Apache Airflow in a local configuration) to manage stage execution and dependencies; Airflow's logging facilities also helped in capturing stage-wise outputs and any exceptions.
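As a rough illustration of the quantized-runtime emulation, the snippet below applies PyTorch dynamic int8 quantization to a ResNet-18. The paper's exact quantization recipe is not specified, so this should be read as a stand-in (a conv-heavy accelerator would more realistically be emulated with static post-training quantization and calibration data); it assumes a recent torchvision.

```python
import torch
import torchvision

# Load a float32 reference model (ResNet-18, as in the case study).
model_fp32 = torchvision.models.resnet18(weights=None)
model_fp32.eval()

# Int8 dynamic quantization of the linear layers approximates reduced-precision
# accelerator arithmetic; only the fully connected layer of ResNet-18 is
# quantized here, which is a simplification of a real edge-accelerator runtime.
model_int8 = torch.quantization.quantize_dynamic(
    model_fp32, {torch.nn.Linear}, dtype=torch.qint8
)

# Compare reference vs "accelerator" predictions on a dummy input.
x = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    ref_pred = model_fp32(x).argmax(dim=1)
    acc_pred = model_int8(x).argmax(dim=1)
print("top-1 agreement:", bool((ref_pred == acc_pred).item()))
```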
-
Case Study 1: Image Classification on an Accelerator
For the first case study, we applied our framework to an image classification task, simulating an AI accelerator that classifies images (for example, identifying objects in photos). We used a pre-trained convolutional neural network (ResNet-18 architecture) as our model, fine-tuned on a custom dataset of 1000 labeled images for this experiment. The pipeline proceeded as follows:
- Data Ingestion: The images (JPEG files) were ingested from secure storage. We wrote an ingestion script that reads each image file, verifies its integrity via an MD5 checksum, and uses a version control system to snapshot the dataset. In practice, this means that if an image file changed or was replaced, it would be caught as a different version, an important feature for trust. The ingestion stage also recorded the source of each image (all were from our controlled dataset in this case, but the source URI and timestamp were logged).
- Lineage Tracking: As images passed through the pipeline, each image was tagged with a unique ID and its lineage record was initialized. The lineage metadata for an image included the file name, the version identifier from ingestion, and a pointer to the original data source. As the image moved to preprocessing, the lineage log for that image was updated to note the transition. If an image was augmented (e.g., rotated or cropped for data augmentation during training), the lineage tracker logged that operation with parameters (e.g., "rotated 15 degrees"). This allowed us to later trace exactly which augmentations each training image underwent, a useful feature because if, for example, a particular transformation led to a misclassification, we could identify it.
- Explainable Preprocessing: Our preprocessing for images included resizing each image to 224×224 pixels (required by the ResNet model), converting it to grayscale in one experiment variant (to test an explainable difference), and normalizing pixel values. Each step was logged: when resizing, the log recorded the original dimensions and the interpolation method used; for the grayscale conversion, the log noted that we dropped the color channels and why (e.g., to reduce model complexity, as grayscale was deemed sufficient for this task). We also computed summary statistics like the mean pixel intensity before and after normalization, storing these in the log for transparency. One of the benefits we observed was that by examining the explainable preprocessing logs, we could easily verify that no image was anomalously processed (e.g., extremely high contrast or skew); any such issue would stand out in the logged stats. The transformations were simple enough that we could explain them to stakeholders: for instance, we visualized a sample image at each preprocessing step and included those in documentation to show the effect of each operation.
- Conformance Verification: Before deploying the classification model onto the accelerator, we performed a series of conformance checks. We used Great Expectations to ensure the preprocessed image tensors met the model's requirements: every image tensor had the correct shape (3 channels at 224×224, or 1 channel in the grayscale variant), pixel values indeed ranged between 0.0 and 1.0, and no image was entirely blank or corrupted. All checks passed for our prepared dataset. Next, we verified model-hardware conformance. We ran a small set of test images through the model in two modes: once using standard PyTorch on CPU, and once using the quantized model simulating the accelerator. We then compared the class predictions (a sketch of this comparison appears after this list). The results were identical for 95% of the test images, and in the remaining cases the top-2 predictions overlapped (the slight differences attributable to quantization). This gave us confidence that the accelerator (simulated via quantization) conformed to the reference model's behavior [6], [10], [17]. Any major deviation would have indicated a possible issue (e.g., overly aggressive quantization or a misconfigured runtime), but we found none. We also checked performance conformance: the simulated accelerator processed images significantly faster (as expected), within our tolerances, and with no errors.
- Deployment Validation: Finally, we deployed the pipeline into a mock production environment. In practice, this meant running the Airflow-managed pipeline on a device configured with the quantized model (we used a container to simulate an edge-device environment). We fed a new set of 100 images through the entire pipeline running on this environment. The outputs (predicted labels) were logged and cross-checked against ground-truth labels. The accuracy on-device was 88%, compared to 90% when the same model ran in the training environment, an accuracy retention of about 98%. This slight drop was consistent with expected quantization effects and had been caught in Conformance Verification, so it was not a surprise. Importantly, all pipeline stages functioned on-device as intended: lineage metadata was generated in real time and sent to our central store, and no data integrity issues arose. We also performed cross-platform reproducibility tests: a subset of images was run through both the device and a standard GPU server, and their predictions were compared. They matched closely, reaffirming that our trust measures did not introduce divergent behavior across platforms. Throughout deployment, we monitored resource usage; the overhead of lineage tracking and logging was modest (about a 5% increase in runtime per batch and negligible memory overhead), a worthwhile trade-off for the added trust. In summary, the image classification case study demonstrated that our framework could be applied end-to-end with minimal performance impact, while greatly enhancing transparency. We were able to trace any output back to its original image and transformations, and we validated that the accelerator was performing reliably.
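The comparison referenced in the Conformance Verification step above can be summarized with a small helper like the following; `ref_logits` and `accel_logits` are assumed to be (num_samples, num_classes) score arrays collected from the float32 reference model and the quantized model, respectively.

```python
import numpy as np

def compare_predictions(ref_logits: np.ndarray, accel_logits: np.ndarray) -> dict:
    """Summarize agreement between reference and simulated-accelerator outputs:
    exact top-1 matches, plus top-2 overlap for the mismatched cases."""
    ref_top2 = np.argsort(ref_logits, axis=1)[:, -2:]    # two highest-scoring classes
    acc_top2 = np.argsort(accel_logits, axis=1)[:, -2:]
    top1_match = ref_top2[:, -1] == acc_top2[:, -1]

    overlap_when_diff = [
        bool(set(r) & set(a))
        for r, a, same in zip(ref_top2, acc_top2, top1_match)
        if not same
    ]
    return {
        "top1_match_rate": float(top1_match.mean()),
        "top2_overlap_on_mismatch": float(np.mean(overlap_when_diff)) if overlap_when_diff else 1.0,
    }
```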
-
Case Study 2: Sensor Fusion for IoT Data
The second case study involved a sensor fusion task, showcasing the framework's applicability beyond image data. We simulated a scenario inspired by IoT security, where multiple data sources are combined to predict an event (analogous to combining threat intelligence with network scans to predict cyber attacks, as in Soni et al. (2025)). In our experiment, the AI accelerator's job was to take two streams of sensor data (say, a network traffic metric and an environmental sensor reading) and produce an alert level. We created a simple neural network model that takes a feature vector composed of [network_metric, sensor_value] and outputs a binary classification (alert or normal). The focus here was not the model's complexity, but the pipeline's ability to handle and verify multi-source data.
Our pipeline for sensor fusion proceeded in stages similar to the image case:
- Data Ingestion: Two data streams were ingested: (1) a stream of network traffic features (e.g., average packet rate, which we simulated as a numeric time series), and (2) a stream of environmental readings (e.g., temperature). These could be seen as coming from separate devices. We ingested them via a streaming API in our test (simulated by reading from two separate CSV files time-synchronized by timestamps). Ingestion checks ensured each stream's data packets were intact (no missing fields, correct data types) and that timestamps were within an expected range (e.g., both streams covered the same time interval without large gaps or jumps). Each batch of incoming sensor readings was versioned and tagged with its source identifier. This is crucial in multi-source pipelines: we don't just ingest one monolithic dataset; we ingest several and need to keep their identities distinct. By the end of ingestion, we had two queues of validated data points, each tagged by source.
- Lineage Tracking: As the data streams moved forward to be merged, the lineage system initially kept separate logs for each source. When the time came to fuse the data (which happened in preprocessing), the lineage tracker performed a key operation: it merged lineage. Specifically, when a network metric reading at time t was paired with a sensor reading at the same time t to form a combined feature vector, the lineage logs of those two items were linked. The new fused data point's lineage entry referenced both original IDs and their source logs. This way, if an alert was later triggered by a specific combined data point, an engineer could trace back and see exactly which underlying readings contributed to it and from where. Implementing this required careful timestamp alignment: our pipeline waited for nearest-timestamp matches between streams and recorded any interpolation (e.g., if one sensor ran at 1 Hz and another at 5 Hz, we interpolated the slower one). The lineage log captured these details ("sensor A reading at 12:00:01 matched with sensor B reading at 12:00:00.8, interpolated value for B to 12:00:01"). This level of detail ensures trust: even the process of synchronization and interpolation in fusion is transparent.
- Explainable Preprocessing: The preprocessing for sensor data involved cleaning and scaling. For the network metric, we applied a smoothing filter to remove noise (logged as "applied moving average filter, window=3"). For the environmental sensor, we corrected for a known bias (logged as "subtracted 2.0 from temperature readings to calibrate sensor offset"). We then normalized both features to a [0, 1] range so that the neural network could process them effectively. Each of these operations was documented in the log with a brief explanation (e.g., the reason for the bias correction was recorded as "per sensor calibration"). The final step in preprocessing was the actual fusion: combining the two features into a single vector for each timestamp. This step was trivial in terms of code (just a concatenation of two numbers), but we still logged the event, noting any data dropped or any default values used if one sensor lacked a reading at a precise time (in our simulation, we ensured data presence, so no defaults were needed). A consolidated sketch of the alignment and preprocessing steps appears after this list. Because of the explainable approach, if someone later asked "Why did we subtract 2.0 from all temperature readings?", the answer was readily available in our pipeline documentation (and tied to an initial calibration test of the sensor). This stage ensured the fused data fed to the model was as clean as possible and that nothing about it was a black box: each transformation on both streams was intelligible.
- Conformance Verification: Before deploying the fusion model, we validated that the fused data met the model's input requirements. The model expected two features, both normalized; we checked that each feature vector indeed had length 2 and no values outside [0, 1]. We also set up a domain-specific conformance rule: if the network metric suddenly spiked above a threshold without a corresponding change in the environment sensor (or vice versa), that might indicate an anomaly in data collection. This wasn't a strict error for the model, but we flagged it in validation as a potential data issue (none occurred in our controlled simulation). We also verified the model's performance on a test set of fused data in a similar way to the image case: running the model on CPU versus on the quantized accelerator simulation. Since this was a small model (just a few fully connected layers), it conformed almost exactly: predictions were identical on the test inputs in both environments. We measured the verification time for these checks: it added about 0.1 seconds per batch of data to run through these validation routines, a minor overhead. The conformance stage thus gave us the green light that the pipeline was correctly merging data and that the model was ready for deployment on the accelerator.
- Deployment Validation: We deployed the sensor fusion pipeline onto our simulated accelerator environment, analogous to deploying on an edge IoT device that has a built-in ML accelerator. We then replayed a stream of new sensor data through the live pipeline. The system issued alerts (binary classification) for certain time windows. We evaluated these against a ground truth (in our simulation, we had labeled certain periods as "attack" vs "normal" to see if the model would catch them). The on-device accuracy was 93%, and it was the same when running off-device, indicating 100% accuracy retention in this case (the model was simple and quantization had negligible effect). More importantly from a trust perspective, for each alert raised by the system, we were able to trace back the exact sensor readings that caused it, thanks to our lineage merge logging. For instance, one alert at 12:05:00 was traced to a network traffic spike (the value went from 0.2 to 0.9 normalized) while the environment sensor showed a sudden temperature drop; this combination was learned as indicative of a possible intrusion event. Through our pipeline logs, we could explain this alert to a hypothetical analyst: "At 12:05, network traffic spiked (raw value X), and temperature dropped (raw value Y); our model, given these inputs, predicted an alert." This level of explainability is rarely available in typical AI accelerator pipelines. Additionally, cross-platform tests were done here too: we ran the same fused data through a Python model outside the device and confirmed the outputs matched the accelerator's outputs exactly, underscoring the reproducibility of our pipeline across environments. The overhead of lineage and logging in this streaming context was minimal; our throughput dropped by roughly 5%, which is acceptable for an IoT monitoring scenario.
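As referenced in the Explainable Preprocessing step above, the following consolidated sketch shows how the nearest-timestamp alignment, logged cleaning steps, and fusion might be implemented with pandas. The column names (`timestamp`, `packet_rate`, `temperature`) and the matching tolerance are assumptions, timestamps are assumed to be pandas datetimes, and `lineage` is any object exposing the `record(...)` method from the earlier lineage sketch; interpolation of the slower stream, as done in the case study, would be a small extension.

```python
import pandas as pd

def fuse_streams(network_df: pd.DataFrame, env_df: pd.DataFrame, lineage) -> pd.DataFrame:
    """Align two sensor streams on nearest timestamps, apply the logged
    cleaning steps from the case study, and emit fused feature vectors."""
    # Nearest-timestamp alignment between the two sources.
    fused = pd.merge_asof(
        network_df.sort_values("timestamp"),
        env_df.sort_values("timestamp"),
        on="timestamp",
        direction="nearest",
        tolerance=pd.Timedelta("500ms"),
    ).dropna()

    # Explainable preprocessing: each operation is recorded with its rationale.
    fused["packet_rate"] = fused["packet_rate"].rolling(window=3, min_periods=1).mean()
    lineage.record("fused_batch", None, "preprocessing", "moving_average",
                   {"column": "packet_rate", "window": 3, "reason": "noise smoothing"})

    fused["temperature"] = fused["temperature"] - 2.0
    lineage.record("fused_batch", None, "preprocessing", "bias_correction",
                   {"column": "temperature", "offset": -2.0, "reason": "sensor calibration"})

    for col in ("packet_rate", "temperature"):
        lo, hi = fused[col].min(), fused[col].max()
        fused[col] = (fused[col] - lo) / (hi - lo) if hi > lo else 0.0
        lineage.record("fused_batch", None, "preprocessing", "min_max_scale",
                       {"column": col, "min": float(lo), "max": float(hi)})

    # Fusion itself is just the pairing of the two normalized features per timestamp.
    return fused[["timestamp", "packet_rate", "temperature"]]
```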
Through these two case studies, we demonstrated that the proposed framework can be practically implemented and that it generalizes to different types of tasks (vision vs. IoT sensor analytics). In both cases, the pipeline added significant value in terms of traceability and confidence, with only modest overhead. By using common tools (Python, data validation libraries, logging, etc.) and integrating them with simulation of accelerator environments, we showed that it is feasible for practitioners to adopt such a trust-aware pipeline without requiring completely new infrastructure. The next section evaluates the framework more systematically against baseline pipelines to quantify these benefits.
-
EVALUATION AND METRICS
To assess the effectiveness of our trust-aware pipeline framework, we conducted a comparative evaluation against a baseline pipeline lacking the trust-centric features. The baseline represents a typical AI accelerator pipeline: data flows from ingestion to the accelerator with minimal validation or logging (essentially, just basic preprocessing and then inference). We evaluated both pipelines on key metrics related to trust and performance. These metrics include: Traceability Score, Verification Time, Accuracy Retention, and Cross-platform Reproducibility. Table 2 presents a summary of the comparison.
A. Traceability Score
We define the traceability score as a measure of how well the pipeline can account for each piece of data and each operation applied to the data (on a scale of 0 to 10 for ease of interpretation). A score of 0 would mean the pipeline has no record of data provenance or processing (a black box), whereas a 10 indicates complete end-to-end traceability of all data transformations. We evaluated this by qualitatively examining the logs and metadata generated by each pipeline. The baseline pipeline, which only performed standard preprocessing without special logging, scored very low. Essentially, once data passed through, there was no systematic record of what came from where or how it was altered (aside from perhaps basic log files that weren't structured). We gave the baseline a traceability score of 2/10, reflecting that it might keep minimal logs (e.g., "Processing batch X") but nothing that links outputs back to specific inputs reliably. In contrast, our trust-aware pipeline scored 9/10. It recorded detailed lineage information for virtually every datum and step, making it possible to reconstruct the pipeline's behavior retrospectively. The reason it is 9 and not 10 is that there is always room for improvement; for example, in our implementation some of the lineage tracking for individual elements in large batches was aggregated for performance reasons, which is a small compromise on absolute traceability. Nonetheless, the difference is stark: our framework
provides a comprehensive audit trail, whereas the baseline provides almost none.
B. Verification Time
Verification time refers to the overhead introduced by the pipeline's verification and validation procedures. This includes time spent on data validation, lineage logging, conformance checks, and deployment tests. In a live system, this overhead can affect throughput or latency. We measured the additional time our pipeline spent on these verification tasks compared to the baseline. In the image classification case, our pipeline introduced an overhead of roughly 5 milliseconds per image (mostly due to logging and running the Great Expectations checks), which in a batch setting was about 15 ms per batch. The baseline, which effectively did no extra checks beyond the model inference itself, had negligible overhead (close to 0 ms beyond normal inference). In the sensor fusion case, the overhead was similarly single-digit milliseconds per data fusion operation. Therefore, in terms of throughput impact, the baseline was faster purely because it skipped verification, but at the cost of trust. For a fair numeric comparison, we summed the typical verification overhead per data unit for each pipeline. The baseline's verification time can be considered ~10 ms (essentially the time for basic preprocessing, with no additional checks, which we measured for a batch in our tests). The proposed framework's verification time was ~15 ms for an equivalent batch, due to the added steps. These numbers will vary with pipeline complexity, but the key point is that our approach incurs a modest time cost (~50% increase in that micro-benchmark) to perform comprehensive verification. In many real-world scenarios, especially with hardware accelerators that are extremely fast, a few extra milliseconds is a reasonable trade-off for greatly improved confidence. Still, this metric highlights a limitation: there is a performance cost to trust, which we discuss later. Our goal is that this cost remains low (and our results show it was low enough to maintain high throughput in our tests).
C. Accuracy Retention
This metric captures whether the introduction of the trust-aware pipeline affects the model's predictive performance. Ideally, all the added checks and transformations should not degrade the accuracy (or whatever performance metric the model has) compared to a baseline pipeline that might be more optimized for raw performance. We calculated accuracy retention as the ratio (or percentage) of the model's accuracy in the trust-aware pipeline to the model's accuracy in the baseline pipeline. In our experiments, the baseline pipeline and our pipeline fed the same model (just with different processing around it), so one would expect them to yield the same accuracy on the same data if the pipeline's additional steps do not alter the core data in a harmful way. Our findings support this: in the image classification task, the baseline pipeline achieved 90% accuracy on a test set, whereas our pipeline achieved about 88.5% on the same test (this was the case where slight quantization was involved). That corresponds to an accuracy retention of roughly 98% (88.5/90). In the sensor fusion task, both pipelines achieved essentially the same accuracy (93% each on the test stream), yielding ~100% retention. We present a representative value of 99% in Table 2 to indicate near-perfect retention on average. This high retention is important: it means our additional pipeline steps (like data validation and explainable preprocessing) did not significantly distort the data or model performance. There is a nuance: in some cases, our pipeline's strictness could even improve effective accuracy by refusing to process garbage data that might have led the baseline model astray. We saw a hint of this in internal tests: e.g., if an input image was completely blank or an outlier, the baseline would blindly pass it to the model, which might then misclassify it, hurting accuracy. Our pipeline would catch and handle such a case (perhaps skipping or flagging it), potentially avoiding a misprediction. Over many runs, these effects can vary, but at minimum we ensured we maintained accuracy. Thus, the trust enhancements did not come at a cost of model efficacy.
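For concreteness, the retention figures quoted above follow directly from the definition:

```latex
\mathrm{Retention} \;=\; \frac{\mathrm{Acc}_{\text{TDP}}}{\mathrm{Acc}_{\text{baseline}}},
\qquad
\frac{88.5\%}{90\%} \approx 98\% \ \text{(image task)},
\qquad
\frac{93\%}{93\%} = 100\% \ \text{(sensor fusion task)}.
```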
D. Cross-platform Reproducibility
This metric evaluates how consistent the pipeline's results are when deployed on different hardware or environments. For AI accelerators, this is crucial: if running the same pipeline on an accelerator versus a CPU yields different outcomes, trust is undermined. We tested reproducibility by comparing outputs across platforms (as described in the case studies). We quantify it as the percentage of outputs that remained the same (or within an acceptable tolerance) between the accelerator and a reference platform. The baseline pipeline had no special provisions to ensure consistency; differences could arise due to nondeterministic operations or a lack of careful alignment between training and deployment conditions. In our image classification baseline test, we found that about 60% of outputs were exactly the same between a GPU run and an FPGA-simulated run, with differences often due to precision issues or one-off preprocessing inconsistencies (the baseline had slight variations in image loading that weren't controlled for). The trust-aware pipeline, however, intentionally enforced consistency (for instance, by using fixed random seeds for any augmentations, using the same normalization on all platforms, and verifying outputs within tolerance). As a result, our framework achieved about 95% reproducibility between platforms; most differences were eliminated, and the remaining 5% of cases were within tolerance (e.g., the top prediction was the same, and only the lower confidence scores differed slightly). Thus, our pipeline vastly outperformed the baseline in ensuring that the accelerator's results mirrored the expected results from a reference implementation. This boosts trust, as stakeholders can be confident that deploying the model on specialized hardware doesn't fundamentally change its behavior. Reproducibility is tightly linked to traceability and verification: because our pipeline tracked and controlled each step, it reduced the chances for uncontrolled variation. The baseline's laissez-faire approach led to divergent outcomes more often.
Table 2. Framework vs. Baseline Pipeline: Trust and Performance Metrics

| Pipeline | Traceability (0-10) | Verification Time (ms) | Accuracy Retention | Cross-platform Reproducibility |
|---|---|---|---|---|
| Proposed Framework | 9/10 (extensive logging of data lineage) | ~15 ms per batch (with verification overhead) | ~99% (nearly identical accuracy) | ~95% (high consistency across hardware) |
| Baseline Pipeline | 2/10 (minimal provenance or logging) | ~10 ms per batch (no extra checks) | 100% (baseline accuracy = reference) | ~60% (significant variation across platforms) |
As the table shows, our trust-aware pipeline dramatically improves the traceability and reproducibility scores, at the cost of a small increase in processing time. Accuracy is essentially preserved. These results support our central claim: a data-centric, trust-focused pipeline can be implemented without sacrificing model performance, while substantially enhancing the ability to verify and audit the system. In scenarios where accountability is crucial (e.g., defense, healthcare), such trade-offs favor our approach.
It is worth noting that these metrics can be tuned. For instance, one could increase traceability to a perfect 10 by logging even more exhaustively, albeit with more overhead. Or one could reduce verification time by sampling fewer checks, at the possible expense of traceability. Our chosen balance in the implementation was geared toward maintaining performance (hitting that ~5% overhead target) [6], [10]. The evaluation demonstrates that even with this balanced approach, the improvements over a naive pipeline are clear and significant.
-
DISCUSSION
The development of trust-aware data pipelines has broad implications, particularly for hardware-bound AI systems. AI accelerators (such as GPUs, TPUs, FPGAs, and ASICs specialized for AI) often operate as black boxes: once a model
is quantized and loaded, and data starts streaming in, it can be difficult to interpret or verify what is happening inside. By embedding trust mechanisms at the data pipeline level, we mitigate this opaqueness. One major implication is improved accountability in AI decision-making. In hardware-bound scenarios like edge devices, any critical decision made by the AI (e.g., a medical diagnosis or an autonomous vehicle maneuver) can be traced back through our pipeline. This traceability means that when audits or incident investigations occur, engineers won't have to guess or reproduce the situation blindly; they have a built-in map of the data's journey and the model's behavior. This could accelerate certification processes for AI-powered hardware (for example, FDA approval for a medical AI device might be easier if one can demonstrate an auditable pipeline for all data). Moreover, our data-centric approach complements existing hardware verification methods. While formal verification ensures the accelerator does what it is designed to do [21], [23], our pipeline ensures the data going in and out is correct and used properly. Together, these can provide a holistic assurance of system integrity.

There are also implications for co-design of hardware and software. Traditionally, hardware accelerators are designed with assumptions about data (like fixed data types, ordering, etc.), and software is written to meet those assumptions. Our framework makes those assumptions explicit and verifiable (via the Conformance Verification stage). This could feed back into hardware design: if certain data checks are always done in software, hardware designers might choose to incorporate support for them natively (for instance, including a checksum verifier in the data ingestion module on chip) [16]. Thus, future AI accelerators could evolve to include built-in support for data lineage or validation, bridging the gap between hardware and data trust. In essence, our work hints at the possibility of accelerator architectures that are trust-aware by design, not just performance-centric. This resonates with emerging trends where security and reliability are becoming as important as raw speed in hardware design.

Despite the benefits, we acknowledge several limitations of our current framework. First, the added overhead, though small in our tests, might grow with pipeline complexity. In extremely latency-sensitive applications (like high-frequency trading or real-time control systems), even a few milliseconds of overhead might be unacceptable. Our framework might need optimization or partial hardware acceleration of its own (for example, offloading lineage logging to a separate thread or chip) to be viable there. Second, our approach currently assumes a mostly centralized pipeline (even if distributed across devices, it is orchestrated as one logical pipeline). In highly distributed systems (like federated learning across many edge devices), maintaining a unified lineage and verification procedure is challenging. Data might not reside in one place to be tracked easily. This is where integration with distributed ledger or blockchain technology could help. As suggested by Hazarika & Shah (2024a), a blockchain could provide a tamper-proof record of data provenance across untrusted parties. We see potential to incorporate blockchain for scenarios where data or models are shared among multiple stakeholders. For example, each lineage log entry could be written to a blockchain, creating an immutable audit trail.
This would address tampering concerns: no single party could alter the history of a data item
without detection [6]. It would also facilitate trust in collaborative environments, such as multi-company AI projects or crowd-sourced data scenarios, by removing the need to centrally trust one pipeline authority. The trade-off, of course, is the overhead and complexity of blockchain management, which might be non-trivial ([10] discuss scalability issues). Nonetheless, bridging our framework with blockchain is a promising direction for future work, combining data-centric verification with decentralized trust.

Another limitation is that our framework currently improves transparency but does not inherently guarantee fault tolerance or availability. If a component in the pipeline fails or if data is missing, our framework will trace it and flag it, but it won't by itself correct it. In mission-critical systems, fault tolerance strategies must complement our approach. Techniques like redundancy, checkpointing, or graceful degradation [17] could be integrated so that the pipeline not only knows something went wrong, but can recover from it. For example, if a sensor stream fails, a tolerant pipeline might switch to a backup sensor. Marrying fault tolerance with traceability is an open area: ensuring that even during failover or degraded modes, lineage is preserved and trust is maintained.

The scope of explainability in our preprocessing is another discussion point. We aimed for explainable transformations, but explaining deep models on accelerators is itself a huge field of research. One might ask: beyond explaining data preparation, can we also make the model's operation on the accelerator explainable? Some recent works use techniques like saliency maps or concept activation vectors to interpret models. Integrating those methods with our pipeline could provide a full-spectrum explainable AI solution: not only do we explain how data was processed, but also why the model made a given prediction (e.g., highlighting parts of an image that influenced a classification). Doing this on hardware might be challenging due to the limited interface, but not impossible. For instance, one could run a parallel explanation module on a CPU that takes the same data and model weights to compute an explanation (like a gradient-based saliency map) and then link that to the pipeline's lineage record. This is beyond our current scope but is a logical extension [17].

Finally, we consider the deployment of our framework in modern computing paradigms such as serverless and cloud-edge hybrids. Serverless architectures, where code (and potentially ML models) runs in short-lived stateless functions, present a challenge for maintaining stateful lineage and verification. As Hazarika & Shah (2024c) note, serverless computing emphasizes scalability and abstraction of infrastructure, which could make it harder to insert persistent logging or long-running verification processes. However, the need for trust remains. If anything, the ephemeral nature of serverless functions increases the risk that something could go wrong unseen. Our framework might need adaptation: for example, using an external state store to accumulate lineage from many short-lived function invocations, or designing functions to be aware of upstream/downstream context (perhaps via tokens that carry lineage info). The implication is that as AI pipelines move into more distributed, microservice-oriented deployments, the principles of our framework (traceability, validation, etc.)
should be built into those deployment patterns from the start, even if the implementation looks different (e.g., lightweight tracing across function calls). This will be crucial for the next generation of
AI accelerator services that run in cloud fabrics.

In summary, the discussion highlights that trust-aware data pipelines, as proposed, bring significant advantages in making AI accelerators dependable and transparent. They align well with the ongoing push for responsible AI by ensuring that AI systems can be audited and explained. The approach also opens new avenues for research and development: integration with blockchain for decentralized trust, enhanced fault tolerance, deeper explainability, and adaptation to new computing models. We believe these directions are important steps toward a future where AI systems are not just powerful and efficient, but also verifiably trustworthy by design.
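Before concluding, we include a minimal sketch of the lineage-token idea mentioned above. This is not tied to any specific serverless platform: the in-memory dictionary stands in for an external state store (e.g., a key-value database), and the handler names, event fields, and token structure are hypothetical.

```python
# Minimal sketch under stated assumptions: propagating lineage context across
# stateless function invocations via a token. "state_store" stands in for an
# external store; function names and event fields are hypothetical and not
# part of any particular serverless platform's API.
import hashlib
import json
import time
import uuid

state_store = {}  # stand-in for an external lineage store keyed by trace_id


def record_lineage(token: dict, stage: str, payload) -> dict:
    """Append a lineage entry for this invocation and return an updated token."""
    entry = {
        "stage": stage,
        "payload_hash": hashlib.sha256(json.dumps(payload).encode()).hexdigest(),
        "prev_entry": token.get("last_entry"),
        "timestamp": time.time(),
    }
    state_store.setdefault(token["trace_id"], []).append(entry)
    return {"trace_id": token["trace_id"], "last_entry": entry["payload_hash"]}


def ingest_handler(event: dict) -> dict:
    """First function in the chain: creates the trace and forwards a token."""
    token = {"trace_id": str(uuid.uuid4()), "last_entry": None}
    token = record_lineage(token, "ingestion", event["data"])
    return {"data": event["data"], "lineage_token": token}


def preprocess_handler(event: dict) -> dict:
    """Downstream function: normalizes the data and extends the same trace."""
    data = [x / max(event["data"]) for x in event["data"]]
    token = record_lineage(event["lineage_token"], "preprocessing", data)
    return {"data": data, "lineage_token": token}


# Example: two "invocations" chained only by the token they pass along.
out = preprocess_handler(ingest_handler({"data": [2, 4, 8]}))
print(len(state_store[out["lineage_token"]["trace_id"]]))  # -> 2 entries
```

Because each invocation only reads and extends the token it receives, no function needs local state; the external store accumulates the full trace even though every function instance is ephemeral.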
-
CONCLUSION
Verifiable and trustworthy AI pipelines are rapidly becoming a necessity rather than a luxury, especially as AI systems are deployed on specialized hardware in critical domains [4]. In this paper, we presented a data-centric framework called Trust-Aware Data Pipelines for Verifiable AI Accelerators, aimed at addressing the often-overlooked problem of pipeline unverifiability [23]. By structuring the pipeline into clear stages, from ingestion with integrity checks, through lineage tracking and explainable preprocessing, to rigorous conformance verification and deployment validation, we ensure that every step of the journey from raw data to accelerator output is transparent and checkable. This approach elevates the role of data in system verification, complementing traditional model-centric and hardware-centric verification methods. Our framework is grounded in and inspired by prior work, yet distinguishes itself by unifying those ideas into an end-to-end solution. We have shown that concepts from model-driven verification (e.g., early discrepancy detection [16]) and agile testing [11] can be woven together with data lineage and governance practices [1] to yield a robust pipeline. The case studies on image classification and sensor fusion illustrate that implementing such a pipeline is feasible with current technologies and comes at a low cost in terms of performance overhead [8]. More importantly, the benefits in traceability, accountability, and reproducibility are substantial. One can now ask "How did the accelerator make this decision?" and get a meaningful answer backed by data logs and verification reports, rather than silence or speculation.

For AI accelerator systems, which are increasingly used in safety-critical and sensitive applications, having a verifiable pipeline is crucial to building trust with users and regulators. Imagine deploying an AI accelerator in a hospital: with a trust-aware pipeline, doctors and patients can be assured that the data feeding the model is correct and that any output can be traced and explained. Similarly, in an autonomous vehicle, engineers could diagnose and fix issues faster by pinpointing which pipeline stage contributed to an errant behavior, rather than treating the whole AI system as an inscrutable monolith. In conclusion, we reaffirm the importance of data-centric trust in hardware AI systems. Our work is a step towards AI accelerators that are not only fast and efficient but whose every output can be verified and trusted. This matters for ethical AI as well: transparency and verifiability go hand in hand with fairness and accountability. By adopting frameworks like ours, practitioners can ensure that when an AI accelerator is making decisions in
the wild, we have the tools and processes to validate those decisions end-to-end. Future enhancements such as integrating blockchain for distributed trust or adding automated explainability for model decisions will further strengthen these pipelines. We envision a future where verifiable, trustworthy pipelines are a standard component of AI hardware systems, providing a solid foundation for the next generation of reliable and ethical AI solutions.
-
REFERENCES
[1] A. A. Soni, R. N. K. Dhenia, and M. Parikh, "Edge vs Cloud Computing Performance Trade-Offs for Real-Time Analytics," International Journal of Science and Engineering Applications, vol. 14, no. 6, pp. 39–45, 2025. https://doi.org/10.7753/IJSEA1406.1007
[2] A. A. Soni, "Exploring the Landscape of Data Stores: A Guide to Choosing the Right One," Medium, 2024.
[3] Imperva, "What is Data Lineage?," Imperva Data Security. [Online]. Available: https://www.imperva.com/learn/data-security/data-lineage/
[4] J. Nelson and A. A. Soni, "Model-Driven Engineering Approaches to the Verification of AI Accelerators," 2023.
[5] ISO/IEC 22989:2022, "Artificial Intelligence - Concepts and Terminology," International Organization for Standardization. [Online]. Available: https://www.iso.org/standard/74296.html
[6] A. V. Hazarika and M. Shah, "Exploring Fault Tolerance Strategies in Big Data Infrastructures and Their Impact on Processing Efficiency," SSRN, 2024.
[7] J. Nelson and A. A. Soni, "Co-Verification of Hardware-Software Co-Design in Machine Learning Accelerators," 2023.
[8] Prophecy.io, "Understanding Data Lineage," Prophecy Blog. [Online]. Available: https://www.prophecy.io/blog/data-lineage
[9] J. Nelson and A. A. Soni, "Verification of Dataflow Architectures in Custom AI Accelerators Using Simulation-Based Methods," 2023.
[10] A. V. Hazarika and M. Shah, "Serverless Architectures: Implications for Distributed System Design and Implementation," International Journal of Science and Research (IJSR), vol. 13, no. 12, pp. 1250–1253, 2024.
[11] A. Owen and A. A. Soni, "Agile Software Verification Techniques for Rapid Development of AI Accelerators," 2024.
[12] A. Owen and A. A. Soni, "Unit Testing and Debugging Tools for AI Accelerator SDKs and APIs," 2023.
[13] S. Amershi, A. Begel, C. Bird, R. DeLine, H. Gall, E. Kamar, N. Nagappan, B. Nushi, and T. Zimmermann, "Software Engineering for Machine Learning: A Case Study," in Proc. 2019 IEEE/ACM 41st International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP), pp. 291–300. DOI: 10.1109/ICSE-SEIP.2019.00042
[14] J. Nelson and A. A. Soni, "Formal Verification Techniques for AI Accelerator Hardware in Deep Learning Systems," 2023.
[15] A. A. Soni, "Improving Speech Recognition Accuracy Using Custom Language Models with the Vosk Toolkit," arXiv, 2025.
[16] J. A. Soni, A. Anand, R. K. Pandey, and A. A. Soni, "Combining Threat Intelligence with IoT Scanning to Predict Cyber Attack," arXiv, 2025.
[17] A. V. Hazarika and M. Shah, "Blockchain-based Distributed AI Models: Trust in AI Model Sharing," International Journal of Science and Research Archive, vol. 13, no. 2, pp. 3493–3498, 2024.
[18] J. O. Kephart and D. M. Chess, "The Vision of Autonomic Computing," Computer, vol. 36, no. 1, pp. 41–50, 2003.
[19] A. A. Soni, J. A. Soni, and A. V. Hazarika, "Self-Healing Data Pipelines: A Fault-Tolerant Approach to ETL Workflows," International Journal of Engineering Technology Research & Management, vol. 9, no. 6, pp. 92–101, 2025. https://doi.org/10.5281/zenodo.15615306
[20] NIST, "A Taxonomy and Terminology of Adversarial Machine Learning," National Institute of Standards and Technology, NISTIR 8269, 2020. [Online]. Available: https://doi.org/10.6028/NIST.IR.8269
[21] A. A. Soni and J. A. Soni, "Dynamic Resource Allocation in Serverless Architectures using AI-Based Forecasting," International Journal of Engineering Research & Technology, vol. 14, no. 6, 2025. https://www.ijert.org/research/dynamic-resource-allocation-in-serverless
[22] S. Schelter, J. Böse, J. Kirschnick, T. Klein, and S. Seufert, "Automatically Tracking Metadata and Provenance of Machine Learning Experiments," in Proc. 2017 IEEE 33rd International Conference on Data Engineering (ICDE), pp. 1351–1362. DOI: 10.1109/ICDE.2017.174
[23] J. A. Soni, A. Anand, R. K. Pandey, and A. A. Soni, "Dynamic Context Tuning for Retrieval-Augmented Generation: Enhancing LLMs with Multi-Source Context Integration," arXiv e-prints, arXiv:2506.11092, 2025.
[24] M. Shah and A. V. Hazarika, "An In-Depth Analysis of Modern Caching Strategies in Distributed Systems: Implementation Patterns and Performance Implications," IJSEA, vol. 14, no. 1, pp. 9–13, 2025.
