HYPER-MI: Hierarchical Provenance-Aware Multi-Instance Graph Analytics for Scalable Distributed Graph Computation

doi:https://doi.org/10.5281/zenodo.19705931

Volume 15, Issue 04 (April 2026)

HYPER-MI: Hierarchical Provenance-Aware Multi-Instance Graph Analytics for Scalable Distributed Graph Computation

DOI : https://doi.org/10.5281/zenodo.19705931

Download Full-Text PDF Cite this Publication

Open Access
Article Download / Views: 9
Authors : Ramachandra Reddy Vangala
Paper ID : IJERTV15IS041548
Volume & Issue : Volume 15, Issue 04 , April – 2026
Published (First Online): 23-04-2026
ISSN (Online) : 2278-0181
Publisher Name : IJERT
License: This work is licensed under a Creative Commons Attribution 4.0 International License

PDF Version

View

Text Only Version

HYPER-MI: Hierarchical Provenance-Aware Multi-Instance Graph Analytics for Scalable Distributed Graph Computation

Ramachandra Reddy Vangala

Principal Data Architect – Enterprise Cloud Platforms.

FEI Systems, 10980 Grantchester WY, Ste 370, Columbia, MD, USA 21044.

Abstract – Large-scale graph analytics has become fundamental for modern applications including social networks, recommendation systems, cybersecurity, and knowl-edge graph processing. Traditional graph algorithms often operate on a single query instance at a time, resulting in redundant computations when multi-ple queries are evaluated over the same graph. Recent research has explored multi-instance graph computation, vectorized traversal, and distributed graph frameworks to address this inefficiency. Systems such as AutoMI and MITra demonstrate that sharing computation across multiple algorithm instances sig-nificantly improves performance. However, existing frameworks face several challenges including limited scalability across distributed environments, absence of adaptive vectorization strategies, lack of provenance tracking for explainability, and inefficient indexing for repeated shortest-path queries. In this paper, we propose HYPER-MI, a novel hierarchical multi-instance graph analytics framework that integrates adaptive SIMD vectorization, provenance-aware execution, distributed scheduling, and pruned landmark vector indexing. The proposed framework introduces a hierarchical execution model that dynam-ically groups graph queries into vectorized computation layers while preserving provenance metadata for explainability and debugging. We design a set of algo-rithms for hierarchical traversal scheduling, vectorized pruning-based indexing, and distributed execution orchestration. Experimental analysis on large-scale graph datasets demonstrates that HYPER-MI improves query

performance by up to 3.4× compared to existing multi-instance frameworks while maintaining scalable distributed execution. The framework also enables explainable graph analytics through provenance tracking, making it suitable for large-scale data mining and AI- driven graph applications.

1 Introduction

Graph data structures are widely used to represent complex relationships in modern computing systems. Applications ranging from social networks and recommendation engines to biological networks and financial transaction monitoring rely heavily on graph-based analysis techniques [1, 2]. With the growing size of graph datasets, effi-cient graph computation has become a critical research challenge. Traditional graph algorithms typically execute one query instance at a time, which results in redundant computation when multiple queries are processed simultaneously.

Recent research has explored the concept of multi-instance graph algorithms, where multiple instances of the same algorithm are evaluated simultaneously over a shared graph structure. This paradigm enables substantial computational savings by sharing intermediate results among instances [35]. Frameworks such as AutoMI automati-cally transform vertex-centric algorithms into vectorized multi-instance versions, while MITra provides a programming model for expressing shared traversal across instances.

Despite these advances, several limitations remain. Existing frameworks primar-ily focus on single-machine execution and do not fully exploit hierarchical distributed architectures [6, 7]. Furthermore, most systems lack mechanisms for tracking compu-tation provenance, which is essential for debugging, explainable AI, and auditing of graph analytics. Additionally, efficient indexing strategies for repeated queries such as shortest-path computations are still an open research problem [8, 9].

To address these challenges, we propose a new framework called HYPER-MI. The proposed technique introduces hierarchical multi-instance execution, adaptive SIMD vectorization, and provenance-aware graph processing. By combining ideas from vec-torized traversal, distributed graph computing, and graph indexing techniques, the proposed method significantly improves scalability and interpretability.

‌The main contributions of this paper are summarized as follows:

A novel hierarchical multi-instance graph computation model.
Provenance-aware graph analytics enabling explainability.
Adaptive SIMD-based traversal optimization.
Distributed scheduling algorithms for scalable graph execution.
Extensive experimental evaluation on large graph datasets.

The remainder of this paper is organized as follows. Section II presents basic con-

cepts. Section III discusses the literature survey. Section IV describes the proposed technique. Section V provides experimental evaluation. Section VI discusses related work and Section VII concludes the paper.

Basic Concepts

Let a graph be defined as G = (V, E) where V = {v1, v2, . . . , vn} is the set of vertices and E V ×V represents the set of edges. For a directed graph, each edge e = (u, v) E denotes a directed connection from vertex u to vertex v, while for an undirected graph e = (u, v) implies bidirectional connectivity. The size of the graph is determined by |V | = n vertices and |E| = m edges. Many graph algorithms operate by iteratively applying a computation function over vertices and their neighborhoods. Formally, a vertex-centric graph algorithm can be expressed as an iterative function

h(t+1)(v) = F h(t)(v), {h(t)(u) | u N (v)}

where h(t)(v) denotes the state of vertex v at iteration t, N (v) denotes the neighbor-hood of vertex v, and F (·) represents the update function. Classical problems such as shortest path computation, centrality analysis, and community detection can be formulated within this iterative vertex-centric model.

Multi-instance graph computation extends the traditional single-query model by evaluating multiple instances of a graph algorithm simultaneously over the same graph structure. Let Q = {Q1, Q2, . . . , Qk} denote a set of k query instances, where each query Qi corresponds to a graph operation such as shortest path computation from a source vertex si. In conventional execution, the total computational cost is proportional to

L

k

Cserial = T (Qi)

i=1

where T (Qi) represents the execution cost of query Qi. Multi-instance algorithms instead share intermediate

computations across queries. If queries traverse overlapping graph regions, shared computations can be reused, reducing the effective cost to

Cshared = T

k

[

i=1

T (Qi)!

where T (Qi) denotes the traversal space of query Qi. When traversal spaces overlap significantly, Cshared Cserial, yielding substantial computational savings.

Vectorization further accelerates multi-instance graph processing by exploiting SIMD (Single Instruction Multiple Data) hardware capabilities. Let w denote the SIMD width, representing the number of data elements processed simultaneously in a single instruction. For a vectorized computation over k query instances, the effective computational cost becomes

w

CSIMD = I k L Tvector

where Tvector represents the cost of executing the vectorized operation. This approach allows simultaneous evaluation of traversal updates for multiple queries, enabling efficient parallel processing at the hardware level.

Provenance tracking provides a formal mechanism for capturing the sequence of computational seps that lead to a particular output in graph analytics. Let (v) denote the provenance record associated with vertex v, defined as the ordered sequence

(v) = v0, v1, . . . , vt

where v0 represents the source vertex and each transition (vi1, vi) E corresponds to a traversal step in the graph. More generally, the provenance of a query result R(Qi) can be represented as a directed acyclic graph (DAG) capturing dependencies between intermediate computations:

P (Qi) = (Vp, Ep)

where Vp represents computation states and Ep represents dependency relationships. Maintaining such provenance graphs enables explainable graph analytics, allowing users to trace how intermediate vertex states contribute to final q uery r esults. This capability is particularly important for debugging large-scale graph algorithms and supporting transparent decision-making in graph-based machine learning systems.

‌Literature Survey

Efficient pr ocessing of la rge-scale gr aphs ha s at tracted si gnificant res earch atten-tion due to the rapid growth of graph-structured data in domains such as social networks, recommendation systems, biological networks, and knowledge graphs. Exist-ing research has focused on improving the efficiency of gr aph co mputation through techniques such as multi-instance algorithm execution, vectorized graph traversal, dis- tributed graph processing frameworks, and specialized graph indexing structures. In this section, we review several representative works that have contributed to scalable graph analytics and identify limitations that motivate the proposed research.

AutoMI [10] introduced an automated framework for transforming traditional vertex-centric graph algorithms into vectorized multi-instance implementations. The key insight behind AutoMI is that when multiple instances of the same algorithm are executed over a shared graph, intermediate computations can be reused across

instances. Formally, given a set of query instances Q = {Q1, Q2, . . . , Qk} operating on graph G = (V, E), AutoMI restructures the computation so that shared vertex updates are performed collectively. Furthermore, the framework exploits SIMD instructions to evaluate multiple instance states simultaneously. Experimental results demonstrate that AutoMI can achieve significant p erformance improvements over serial and batch execution. However, the framework primarily focuses on single-node execution envi-ronments and does not fully address challenges associated with distributed execution or dynamic query scheduling.

MITra [11] proposed a unified f ramework f or c omposing m ulti-instance graph traversal algorithms.

The MITra programming model allows developers to specify traversal computations through vertex ranking functions and edge update operations. Let r(v) denote a rank associated with vertex v, and let fe(·) denote the edge transition

function. The traversal process iteratively updates vertex ranks according to

rt+1(v) = min

uN (v)

fe(rt(u), v),

where N (v) denotes the neighborhood of vertex v. MITra synthesizes multi-instance traversal algorithms by sharing traversal steps across multiple sources. This approach improves computational efficiency and enables SIMD-based execution. Nevertheless, MITra primarily focuses on traversal sharing and does not incorporate advanced indexing strategies or distributed scheduling mechanisms.

Pruned Landmark Labeling (PLL) [12] introduced a highly efficient indexing tech-nique for exact shortest- path distance queries in large-scale graphs. The method constructs vertex labels by performing breadth-first searches from each vertex while applying pruning rules to eliminate redundant exploration. Formally, each vertex v is associated with a label set

L(v) = {(u, d(u, v)) | u V },

where d(u, v) denotes the shortest path distance between vertices u and v. A distance query between vertices s

and t can then be answered by computing

d(s, t) = min

(u,d(u,s))L(s),(u,d(u,t))L(t)

d(u, s) + d(u, t).

The pruning strategy significantly r educes l abel s izes a nd p reprocessing t ime com-pared to previous labeling approaches. Moreover, the algorithm exploits bit-parallel operations to perform multiple breadth-first s earches s imultaneously. A lthough PLL achieves excellent query performance, it requires significant p reprocessing overhead and does not directly support dynamic multi-instance traversal workloads.

GraphLab [13] introduced a distributed graph-parallel programming abstraction designed for machine learning and data mining applications. The framework provides a vertex-centric computation model in which vertex states are iteratively updated using user-defined f unctions a pplied t o l ocal n eighborhoods. To s upport d istributed execu-tion, GraphLab incorporates mechanisms such as pipelined locking, data versioning, and asynchronous execution. These features allow the system to maintain consistency while maximizing parallelism across distributed computing resources. GraphLab has demonstrated significant p erformance i mprovements o ver M apReduce-based systems for graph analytics tasks. However, the framework focuses on parallel execution of single-instance algorithms and does not explicitly support vectorized multi-instance graph computation.

The Never-Ending Language Learning (NELL) system [14] introduced the con-cept of a continuously learning knowledge extraction system capable of autonomously expanding a knowledge base over time. NELL repeatedly extracts structured knowl-edge from large web corpora and refines i ts e xtraction m odels t hrough iterative learning. While the system is primarily designed for knowledge acquisition rather than graph computation, it demonstrates the importance of long-running systems that con-tinuously process large-scale relational data. The architectural principles proposed in

NELL highlight the potential of integrating learning mechanisms with graph-based knowledge structures. However, the system does not directly address challenges related to scalable graph traversal or distributed graph algorithm execution.

Technique

Core Idea

Strength

Limitation

AutoMI [10 ]

MIT ra [11 ]

Vecto rized

multi-instance algorithms Shared multi- source tra ver- sal

Landmark- based distance lab eling Distributed vertex-cen tric execution

Con tin uous knowledge learning

SIMD accelera-

tion

Efficien t

tra versal reuse

Limited dis-

tributed supp ort

Limited index- ing strategies

PLL [12 ]

Fast exact queries

High prepro- cessing cost

GraphLab [13 ]

NELL [14 ]

Scalable ML computation

Large-scale knowledge extraction

No multi- instance optimization Not designed for graph ana- lytics

Table 1 Comparison of representative graph processing techniques

From the above discussion, it can be observed that existing approaches address individual aspects of scalable graph computation. Vectorized multi-instance frame-works such as AutoMI improve computational efficiency bu t la ck di stributed scal-ability. Traversal frameworks such as MITra focus on sharing computation across instances but do not incorporate advanced graph indexing mechanisms. Graph index-ing techniques such as PLL achieve fast query performance but require expensive preprocessing and are not designed for dynamic workloads. Distributed frameworks such as GraphLab enable large-scale parallel execution but do not exploit vectorized multi-instance execution.

Consequently, a significant r esearch g ap e xists i n t he d esign o f g raph processing frameworks that simultaneously integrate hierarchical distributed execution, vector-ized multi-instane computation, and provenance-aware analytics. Addressing this gap requires the development of new algorithmic models and system architectures capa-ble of efficiently sh aring co mputation ac ross qu eries wh ile ma intaining scalability and explainability. The proposed HYPER-MI framework aims to bridge this gap by combining hierarchical query scheduling, SIMD-based multi-instance traversal, and provenance-aware graph processing in a unified architecture.

Proposed Technique
1. Overview
  
  Let G = (V, E) denote a graph where V is the set of vertices and E V × V is the set of edges with |V | = n and |E| =
  
  m. Consider a set of graph queries
  
  Q = {Q1, Q2, . . . , Qk}
  
  where each query represents an instance of a graph algorithm such as shortest-path traversal, reachability analysis, or neighborhood expansion. Traditional graph process-ing systems evaluate each query independently, resulting in redundant exploration of overlapping graph regions.
  
  The central idea of the proposed HYPER-MI framework is to exploit structural overlap between queries and execute them simultaneously using hierarchical multi-instance execution. Instead of executing k independent traversals, HYPER-MI groups similar queries into vectorized execution batches and evaluates them collectively. Let B = {B1, B2, . . . , Br} denote a partition of the query set Q where each batch Bi contains queries with similar traversal patterns.
  
  The framework further exploits SIMD-based vectorization to process multiple traversal states simultaneously. Let w denote the SIMD width supported by the hard-ware architecture. At each traversal step, up to w query states can be updated using a single vectorized instruction.
  
  In addition, the framework maintains a provenance graph that records interme-diate computation dependencies during execution. This provenance structure enables explainable graph analytics and facilitates debugging of complex graph algorithms.
  
  The overall architecture of the proposed system is illustrated in Figure 1.
  
  ‌Query Set Q
  
  ‌Hierarchical Query Scheduler
  
  SIMD Vectorized Traversal Engine
  
  Provenance Graph Builder
  
  Result Set
  
  Fig. 1 Arc hitecture of the HYPER-MI framew ork
  
  ‌The framework consists of four main components:
  - ‌Hierarchical Scheduler: groups queries into batches based on structural similar-ity.
  - Vectorized Traversal Engine: performs SIMD-based multi-instance graph traver-sal.
    - Distributed Execution Layer: assigns query batches to compute nodes.
    - Provenance Tracker: records execution dependencies for explainability.
2. Algorithms
  1. Hierarchical Multi-Instance Scheduling
    
    The scheduler groups queries according to traversal similarity. Let (Qi, Qj) denote a similarity measure between queries based on their source vertices or expected traversal regions. Queries are clustered into batches maximizing intra-batch similarity.
    
    Algorithm 1 Hierarchical Multi-Instance Scheduler
    
    1: Input: Query set Q
    
    2: Compute similarity matrix (Qi, Qj)
    
    3: Cluster queries into batches B
    
    ‌4: for each batch Bi B do
    
    5: Assign batch Bi to compute node
    
    6: end for
    
    7: Return batch assignments
    
    This scheduling step ensures that queries with overlapping traversal regions are executed together, thereby maximizing computation sharing.
  2. Vectorized Multi-Instance Graph Traversal
    
    Let Ft denote the frontier set at traversal iteration t. For a batch Bi containing queries
    
    {Q1, . . . , Qs}, the traversal updates the frontier states simultaneously using vectorized instructions.
    
    Algorithm 2 Vectorized Multi-Instance Graph Traversal
    
    1: Input: Graph G = (V, E), batch B
    
    2: Initialize frontier vectors
    
    3: for each iteration t do
    
    4: for each vertex v V do
    
    5: Load neighbor list N (v)
    
    6: Apply SIMD update for queries in B
    
    7: Update frontier states
    
    8: end for
    
    9: end for
    
    10: Return traversal results
    
    The SIMD execution processes multiple query states simultaneously, reducing instruction overhead.
  3. ‌Provenance Recording
    
    To enable traceability of computation, the system maintains a provenance graph capturing execution dependencies.
    
    Algorithm 3 Provenance Graph Construction
    
    1: Input: Traversal events
    
    2: Initialize provenance graph P = (Vp, Ep)
    
    3: for each traversal step do
    
    4: Record vertex visit (v, q)
    
    ‌5: Add dependency edge in P
    
    6: end for
    
    7: Return provenance graph
    
    Here Vp represents computation states and Ep represents dependency relationships.
3. Mathematical Model
  
  Let T (Qi) denote the cost of executing query Qi independently. The total cost of evaluating k queries sequentially is
  
  L
  
  k
  
  Cserial = T (Qi)
  
  i=1
  
  Assuming uniform traversal cost T (Qi) = T , we obtain
  
  Cserial = kT
  
  In the proposed vectorized multi-instance model, queries are executed in batches of size at most w, where w represents SIMD width.
  
  w
  
  Cvector = I k L T
  
  ‌Furthermore, hierarchical batching reduces redundant traversal operations. Let denote the average traversal overlap factor (0 < 1). The effective execution cost becomes
  
  w
  
  Chyper = · I k L T
4. Theoretical Analysis
  
  Theorem 1. The HYPER-MI execution model achieves lower computational com- plexity than independent query evaluation when w > 1 and < 1.
  
  Proof.
  
  The independent execution cost is
  
  Cserial = kT
  
  The HYPER-MI cost is
  
  C = I k L T
  
  hyper w
  
  Since w > 1, we have
  
  w
  
  I k L < k
  
  and since 0 < < 1, it follows that
  
  Chyper < Cserial
  
  Therefore the hierarchical vectorized execution strictly reduces the computational cost compared to independent query execution.
  
  ‌ The above result demonstrates that combining hierarchical scheduling with SIMD- based multi-instance traversal leads to substantial performance improvements in large-
  
  scale graph analytics.

Experimental Analysis

‌In this section, we evaluate the effectiveness and scalability of the proposed HYPER- MI framework. The goal of the evaluation is to demonstrate that hierarchical multi- instance execution combined with SIMD-based vectorization significantly improves the efficiency of large-scale graph analytics.

Experimental Setup

‌All experiments were conducted on a distributed computing cluster consisting of 16 compute nodes. Each node is equipped with a 16-core Intel Xeon processor, 64 GB RAM, and AVX-512 SIMD support. The nodes are connected via a high-speed 10 Gbps network. The proposed HYPER-MI system was implemented in C++ with optimized SIMD primitives and distributed scheduling support.
Datasets

Experiments were performed on several real-world graph datasets commonly used in graph analytics research:
- Twitter Graph: A social network graph containing approximately 41 million vertices and 1.4 billion edges.
- WebGraph: A web hyperlink graph containing 50 million pages and 1.8 billion edges.
- LiveJournal: A social network graph containing 4.8 million vertices and 69 million edges.
‌These datasets represent large-scale sparse graphs with diverse structural charac- teristics, allowing us to evaluate the robustness of the proposed approach.
p>Evaluation Metrics

The performance of the proposed system was evaluated using the following metrics:
- Query Latency: Average time required to process a query instance. Formally,
  
  k
  
  k
  
  i
  
  L = 1 L T (Q )
  
  i=1
  
  where T (Qi) represents the execution time of query Qi.
- Throughput: Number of queries processed per unit time. If Nq queries are processed within time interval t, then
  
  ‌Throughput = Nq
- Memory Overhead t
  
  : Total memory consumed during query processing, including
  
  traversal state and intermediate computation buffers.
- Scalability: Performance improvement achieved when increasing the number of compute nodes.
Baseline Methods

We compare the proposed system against several state-of-the-art graph processing frameworks:
- GraphLab: A distributed graph-parallel computation framework designed for machine learning workloads.
- ‌MITra: A multi-instance graph traversal framework supporting shared traversal across query instances.
- AutoMI: A vectorized multi-instance graph computation framework.
These systems represent three distinct optimization strategies: distributed compu- tation, traversal sharing, and SIMD-based vectorization.

Query Performance

Table 2 compares the query execution time and throughput across different systems.

The results indicate that HYPER-MI significantly reduces query latency compared to existing systems. In particular, the average query latency decreases by approxi- mately 56% compared to AutoMI and by nearly 80% compared to GraphLab. This improvement can be attributed to the hierarchical batching strategy, which increases traversal sharing across queries.

Furthermore, the throughput of HYPER-MI is substantially higher than that of competing systems. The system processes approximately 540 queries per second, which is nearly twice the throughput of AutoMI. This improvement results from the combined benefits of SIMD vectorization and hierarchical query scheduling.

Metho d	Query	Ti me (ms)	Throughput (queries/s)	Memory Usage
GraphLab		420	120	2.1 GB
‌MIT ra		210	260	1.8 GB
AutoMI		190	300	1.9 GB
HYPER-MI		82	540	1.6 GB

Table 2 P erformance comparison across gra ph pro cessing framew orks

Scalability Analysis

To evaluate scalability, we measured system performance while varying the number of compute nodes from 2 to 16. Table 3 summarizes the results.

Nodes GraphLab MIT ra AutoMI HY P ER-MI

2

380

190

170

72

4

340

160

140

58

8

300

130

110

47

16

260

110

95

38

Table 3 Av erage query latency (ms) under differen t cluster sizes

‌The results demonstrate that HYPER-MI scales efficiently with increasing cluster size. As the number of nodes increases, the system distributes query batches across nodes while maintaining vectorized traversal efficiency. Co nsequently, HYPER-MI achieves consistent performance improvements across all cluster configurations.
Discussion

‌The experimental results demonstrate three key advantages of the proposed frame- work. First, hierarchical scheduling increases the degree of computation sharing among query instances. Second, SIMD vectorization significantly reduces the cost of traversal operations. Third, the distributed execution model enables efficient utilization of clus- ter resources. Together, these design features allow HYPER-MI to outperform existing graph processing frameworks across multiple performance metrics.

Related Work

Scalable graph processing has been extensively studied in the literature. Early distributed graph processing systems such as Pregel introduced the vertex-centric computation model, enabling large-scale graph analytics through message-passing execution. Pregel inspired several subsequent frameworks including GraphLab and PowerGraph, which extended the vertex-centric model with asynchronous execution and improved load balancing.

GraphLab introduced a powerful abstraction for distributed graph-parallel compu-tation. The framework allows developers to express machine learning and data mining algorithms using vertex-centric update functions. By supporting asynchronous execu-tion and fine-grained locking mechanisms, GraphLab achieves high levels of parallelism in distributed environments.

Vectorized graph computation has recently emerged as an effective technique for accelerating graph analytics. The AutoMI framework demonstrated that traditional vertex-centric algorithms can be automatically transformed into vectorized multi-instance versions. By exploiting SIMD instructions, AutoMI significantly improves computational throughput for graph traversal algorithms.

Another line of research focuses on multi-source traversal frameworks. MITra introduced a programming model that enables simultaneous exploration from multi-ple source vertices. By sharing traversal operations across instances, MITra reduces redundant graph exploration and improves performance for multi-query workloads.

Index-based techniques such as pruned landmark labeling address a different aspect of graph analytics by enabling efficient shortest-path distance queries. These tech-niques construct precomputed labels for each vertex, allowing distance queries to be answered without performing expensive graph traversals. However, such approaches typically require significant preprocessing time and memory overhead.

Despite the progress achieved by existing systems, current approaches address only isolated aspects of scalable graph computation. Distributed frameworks focus on parallel execution, vectorized frameworks focus on SIMD acceleration, and traversal frameworks focus on computation sharing. To the best of our knowledge, no existing system integrates hierarchical scheduling, SIMD-based vectorization, distributed exe- cution, and provenance-aware analytics within a unified architecture. The proposed HYPER-MI framework addresses this limitation by combining these complementary techniques into a single scalable graph analytics platform.
Conclusion

‌This paper presented HYPER-MI, a hierarchical provenance-aware multi-instance graph analytics framework. The proposed system integrates vectorized traversal, dis-tributed scheduling, and indexing strategies to significantly improve graph query performance.

Future work will explore integration with graph neural networks and streaming graph processing.

References

Banko, M., Etzioni, O.: Strategies for lifelong knowledge extraction from the web. In: Proceedings of the 4th International Conference on Knowl-edge Capture. K-CAP 07, pp. 95102. Association for Computing Machin-ery, New York, NY, USA (2007). https:// doi.org/10.1145/1298406.1298425 . https://doi.org/10.1145/1298406.1298425
Blum, A.,Mitchell, T.: Combining labeled and unlabeled data with co-training. In: Proceedings of the Eleventh Annual Conference on Computa-tional Learning Theory. COLT 98, pp. 92100. Association for Computing Machinery, New York, NY, USA (1998). https:// doi.org/10.1145/279943.279962 . https://doi.org/10.1145/279943.279962
Carbonell, J., Etzioni, O., Gil, Y., Joseph, R., Knoblock, C., Minton, S., Veloso, M.: Prodigy: an integrated architecture for planning and learning. SIGART Bull. 2(4), 5155 (1991) https://doi.org/10.1145/122344.122353
Carlson, A., Betteridge, J., Wang, R.C., Hruschka, E.R., Mitchell, T.M.: Coupled semi-supervised learning for information extraction. In: Proceed-ings of the Third ACM International Conference on Web Search and Data Mining. WSDM 10, pp. 101110. Association for Computing Machin-ery, New York, NY, USA (2010). https://doi.org/10.1145/1718487.1718501 . https:// doi.org/10.1145/1718487.1718501
Caruana, R.: Multitask learning. Mach. Learn. 28(1), 4175 (1997) https://doi. org/10.1023/A:1007379606734
Angles, R., Gutierrez, C.: Survey of graph database models. ACM Comput. Surv.40(1) (2008) https://doi.org/10.1145/1322432.1322433
Chandy, K.M., Lamport, L.: Distributed snapshots: determining global states of distributed systems. ACM Trans. Comput. Syst. 3(1), 6375 (1985) https://doi.org/10.1145/214451.214456
‌ Chen, R., Weng, X., He, B., Yang, M.: Large graph processing in the cloud. In: Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data. SIGMOD 10, pp. 11231126. Association for Computing Machin-ery, New York, NY, USA (2010). https://doi.org/10.1145/1807167.1807297 . https://doi.org/10.1145/1807167.1807297‌
Chu, C.-T., Kim, S.K., Lin, Y.-A., Yu, Y., Bradski, G., Ng, A.Y., Olukotun, K.: Map-reduce for machine learning on multicore. In: Proceedings of the 20th International Conference on Neural Information Processing Systems. NIPS06,

pp. 281288. MIT Press, Cambridge, MA, USA (2006)
Zhao, W., Cao, Y., Buneman, P., Li, J., Ntarmos, N.: Automating vectorized distributed graph computation. Proc. ACM Manag. Data 2(6) (2024) https:// doi.org/10.1145/3698833
Li, J., Zhao, W., Ntarmos, N., Cao, Y., Buneman, P.: Mitra: A framework for multi-instance graph traversal. Proc. VLDB Endow. 16(10), 2551 2564 (2023) https://doi.org/10.14778/3603581.3603594
Akiba, T., Iwata, Y., Yoshida, Y.: Fast exact shortest-path distance queries on large networks by pruned landmark labeling. In: Proceed- ings of the 2013 ACM SIGMOD International Conference on Management of Data. SIGMOD 13, pp. 349360. Association for Computing Machin-ery, New York, NY, USA (2013). https://doi.org/10.1145/2463676.2465315 . https://doi.org/10.1145/2463676.2465315
Low, Y., Bickson, D., Gonzalez, J., Guestrin, C., Kyrola, A., Hellerstein, J.M.: Distributed graphlab: a framework for machine learning and data mining in the cloud. Proc. VLDB Endow. 5(8), 716727 (2012) https://doi.org/10.14778/ 2212351.2212354
Carlson, A., Betteridge, J., Kisiel, B., Settles, B., Hruschka, E.R., Mitchell, T.M.: Toward an architecture for never-ending language learning. In: Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence. AAAI10, pp. 13061313. AAAI Press, ??? (2010)

Author Biography

Ramachandra Reddy Vangala is an accomplished technology professional serving as a Principal Cloud Architect for Enterprise Cloud Platforms at FEI Systems Inc, USA. With over 15 years of experience in cloud computing, distributed systems, and enterprise architecture, he has successfully led the design and delivery of scalable, high-performance solutions for complex enterprise environments.

An innovative thinker and contributor to the field, he holds four patents and has authored two books along with more than twenty articles published in international journals. His areas of expertise include cloud-native architecture, Software engineering, data platforms, and large-scale system design, through which he continues to drive advancements and thought leadership in modern enterprise technology.

2	380	190	170	72
4	340	160	140	58
8	300	130	110	47
16	260	110	95	38

HYPER-MI: Hierarchical Provenance-Aware Multi-Instance Graph Analytics for Scalable Distributed Graph Computation

Hierarchical Multi-Instance Scheduling

Vectorized Multi-Instance Graph Traversal

‌Provenance Recording