AutoML-Driven Flow-Level Classification of Encrypted Instant Messaging Applications

doi:https://doi.org/10.5281/zenodo.18095825

Volume 14, Issue 12 (December 2025)


Open Access
Authors : Ritwick Mondal
Paper ID : IJERTV14IS120562
Volume & Issue : Volume 14, Issue 12 , December – 2025
DOI : 10.17577/IJERTV14IS120562
Published (First Online): 30-12-2025
ISSN (Online) : 2278-0181
Publisher Name : IJERT
License: This work is licensed under a Creative Commons Attribution 4.0 International License


Ritwick Mondal

Department of Computer Science and Engineering National Institute of Technology Durgapur Durgapur, India

Abstract: The adoption of end-to-end encryption in Instant Messaging Applications (IMAs) has rendered payload-based traffic inspection ineffective, creating challenges for network monitoring, security enforcement, and quality-of-service management. Flow-level traffic classification has emerged as a practical alternative, but achieving high performance requires careful feature engineering and robust model optimization. This study presents an AutoML-driven framework for multiclass classification of encrypted IMA traffic using flow-level statistical features extracted with Tranalyzer2. A domain-driven feature filtering strategy removes protocol-specific and location-dependent attributes, yielding a refined set of 194 application-agnostic features, including the target class. We evaluate two state-of-the-art AutoML frameworks, AutoGluon and FLAML, on a dataset comprising six encrypted IMAs (Microsoft Teams, Discord, Facebook Messenger, Signal, Telegram, and WhatsApp) and four non-IMA traffic categories. AutoGluon constructs an ensemble model achieving 99.96% accuracy, whereas FLAML optimizes a LightGBM model within a limited time budget, achieving 99.94% accuracy with lower computational overhead. The comparative analysis highlights the trade-offs between ensemble-driven robustness and lightweight optimization efficiency. These results demonstrate that flow-level representations combined with modern AutoML frameworks enable highly accurate, scalable, and reproducible classification of encrypted IMA traffic while eliminating the need for manual model selection and hyperparameter tuning, making the approach suitable for deployment in real-world, resource-constrained network environments.

Index Terms: Encrypted Traffic Classification; Instant Messaging Applications; AutoML; Flow-Level Features

Introduction

The proliferation of end-to-end encryption in Instant Messaging Applications (IMAs) such as WhatsApp, Signal, Telegram, and Microsoft Teams has significantly enhanced user privacy but simultaneously challenged traditional network monitoring and traffic analysis methods. Payload-based inspection is rendered ineffective under encryption, limiting the ability of network administrators and security systems to accurately identify and manage application traffic. Consequently, flow-level traffic classification has emerged as a viable alternative, leveraging statistical and temporal characteristics of network flows rather than payload contents to distinguish between encrypted applications.

Accurate classification of encrypted IMA traffic remains a challenging task due to the high dimensionality and heterogeneity of flow-level features, as well as the presence of non-linear correlations between traffic characteristics and application behavior. Traditional machine learning approaches often require extensive manual feature engineering, model selection, and hyperparameter tuning, which can be labor-intensive, time-consuming, and prone to overfitting, particularly when datasets are large and complex. Moreover, certain features, such as protocol identifiers or five-tuple attributes, may trivially reveal application categories without capturing intrinsic traffic patterns, necessitating careful domain-driven preprocessing to ensure robust generalization.

Automated Machine Learning (AutoML) frameworks provide an effective solution to these challenges by streamlining model selection, feature preprocessing, and hyperparameter optimization in a reproducible and scalable manner. This study investigates the application of two state-of-the-art AutoML frameworks, AutoGluon [2] and FLAML [3], for multiclass classification of encrypted IMA traffic. Flow-level features are extracted using Tranalyzer2, a high-performance and extensible traffic analysis tool, and refined through domain-driven feature filtering to produce an application-agnostic feature set. AutoGluon leverages an ensemble-driven approach to achieve high robustness, while FLAML employs time-budgeted optimization for lightweight yet accurate model selection.

The contributions of this work are threefold: (i) the development of a robust AutoML-based framework for encrypted IMA traffic classification using flow-level features, (ii) a systematic evaluation of ensemble-driven versus lightweight AutoML strategies, and (iii) an empirical demonstration of near-perfect classification performance across multiple encrypted IMAs with minimal manual intervention, highlighting the suitability of the proposed approach for resource-constrained, real-world network environments.
Related Work

Machine learning has been widely adopted for network traffic analysis tasks such as application identification, encrypted traffic classification, malware detection, and intrusion detection. Early approaches relied heavily on manually engineered features and carefully tuned models, making them labor-intensive and difficult to scale. With the widespread adoption of encryption, recent research has shifted toward payload-agnostic representations and automated learning pipelines. Representative efforts include nPrintML [8], which integrates packet-level representations with AutoML to reduce manual feature engineering, and GGFAST [9], which constructs interpretable classifiers using packet size sequences and flow-level n-grams. While these methods demonstrate strong performance in payload-independent traffic analysis, they primarily operate on packet-level representations or task-specific classifiers and do not systematically evaluate general-purpose AutoML frameworks for application-level traffic classification.

More recent studies have explored AutoML and deep learning for encrypted traffic analysis. AutoML4ETC [10] applies neural architecture search to encrypted traffic classification using packet header information, while Isingizwe et al. [11] employ AutoML pipelines for encrypted malware traffic detection with automated model selection and ensemble learning. Although these works highlight the potential of AutoML in reducing tuning complexity and improving performance, their focus is largely limited to malware detection or packet-level classification, leaving encrypted Instant Messaging Application (IMA) traffic and flow-level statistical modeling relatively unexplored.

In contrast, this work addresses this gap by presenting a comprehensive AutoML-based framework for encrypted IMA traffic classification using flow-level statistical features extracted via Tranalyzer2 [1]. Unlike prior studies, we systematically evaluate both ensemble-driven and lightweight, time-budgeted AutoML frameworks (AutoGluon and FLAML) under identical experimental conditions, providing practical insights into performance-efficiency trade-offs. Section III details the proposed methodology and feature extraction process, followed by the rationale for selecting Tranalyzer2 in Section IV, AutoGluon-based experimental analysis in Section V, FLAML-based experimental analysis in Section VI, a comparative evaluation of AutoML frameworks in Section VII, and concluding insights in Section VIII.
Methodology

This section presents the complete experimental methodology adopted for encrypted Instant Messaging Application (IMA) traffic classification using AutoML frameworks. The proposed workflow is designed to ensure robustness, scalability, and reproducibility while avoiding information leakage and dataset bias. An overview of the end-to-end pipeline is illustrated in Figure 1.

The raw network traffic used in this study was obtained from the publicly available Encrypted Mobile Instant Messaging Traffic Dataset hosted on IEEE DataPort [4]. The dataset consists of ten packet capture (PCAP) files collected under controlled experimental conditions. Six PCAP files correspond to encrypted IMA traffic generated by Microsoft Teams, Discord, Facebook Messenger, Signal, Telegram, and WhatsApp, while the remaining four PCAP files represent non-IMA traffic, including web browsing, YouTube streaming, Gmail usage, and background system services. This composition enables the formulation of a realistic multi-class classification problem that reflects both application-specific encrypted communication and heterogeneous background traffic.

Fig. 1: End-to-end workflow for encrypted IMA traffic classification, illustrating the processing pipeline from raw PCAP collection and flow-level feature extraction to AutoML-based model training and evaluation.

Each PCAP file was processed independently using Tranalyzer2, a high-performance flow-based traffic analysis framework. Tranalyzer2 aggregates packets into bidirectional flows and extracts a comprehensive set of statistical, temporal, transport-layer, entropy-based, and frequency-domain features via its modular plugin architecture. The output of each PCAP processing stage is a machine-learning-ready CSV file. In total, an initial feature set of 235 flow-level attributes, including the target label, was obtained. The extracted dataset comprises 106,284 flow records, providing sufficient scale and diversity for reliable supervised learning.

Following feature extraction, all CSV files were merged into a single integrated dataset. A domain-informed labeling strategy was applied to better reflect realistic deployment scenarios. Specifically, the four non-IMA traffic CSVs were consolidated into a single Non-IMA class, while each encrypted IMA application was assigned an individual class label. This resulted in a seven-class classification problem consisting of six encrypted IMA classes and one aggregated non-IMA class.
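The labeling step described above can be sketched roughly as follows. This is an illustration only: the file names, the `label` column name, and the toy columns are hypothetical, and the standard-library `csv` module stands in for whatever tooling the study actually used.

```python
import csv
import io

# Hypothetical mapping from per-capture CSV name to class label.
# The four non-IMA captures collapse into a single "Non-IMA" class.
FILE_TO_LABEL = {
    "teams.csv": "Teams",
    "discord.csv": "Discord",
    "messenger.csv": "Messenger",
    "signal.csv": "Signal",
    "telegram.csv": "Telegram",
    "whatsapp.csv": "WhatsApp",
    "web.csv": "Non-IMA",
    "youtube.csv": "Non-IMA",
    "gmail.csv": "Non-IMA",
    "background.csv": "Non-IMA",
}


def merge_labeled(sources):
    """Merge per-capture CSV texts into one list of rows with a 'label' field.

    `sources` maps a file name to its CSV text (header row first).
    """
    merged = []
    for name, text in sources.items():
        label = FILE_TO_LABEL[name]
        for row in csv.DictReader(io.StringIO(text)):
            row["label"] = label
            merged.append(row)
    return merged


# Toy input with made-up columns, just to exercise the function.
demo = {
    "signal.csv": "duration,numPkts\n1.5,10\n",
    "web.csv": "duration,numPkts\n0.2,3\n",
    "gmail.csv": "duration,numPkts\n0.9,7\n",
}
rows = merge_labeled(demo)
classes = sorted(set(FILE_TO_LABEL.values()))
print(len(classes), [r["label"] for r in rows])
```

Ten input files map onto seven distinct labels, which matches the seven-class formulation in the text.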

To enhance generalization performance and prevent information leakage, a domain-driven feature filtering process was conducted prior to model training. Features that could trivially reveal traffic identity without capturing intrinsic behavioral characteristics were removed. These include protocol identifiers, five-tuple attributes (source IP address, destination IP address, source port, destination port, and transport-layer protocol), as well as location-dependent features. Such attributes may artificially inflate classification performance while reducing robustness in real-world scenarios. After filtering, a refined feature set of 194 flow-level attributes, including the target label, was retained. These features collectively provide an application-agnostic representation of encrypted traffic suitable for automated learning.
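A minimal sketch of this kind of name-based filtering is shown below. The paper does not list the exact columns that were removed, so the names in `LEAKY_COLUMNS` are illustrative assumptions, as are the toy rows.

```python
# Illustrative (assumed) names of identity-revealing columns.
LEAKY_COLUMNS = {
    "srcIP", "dstIP", "srcPort", "dstPort", "l4Proto",  # five-tuple
    "srcIPCC", "dstIPCC",                               # location-dependent
}


def drop_leaky(rows, leaky=LEAKY_COLUMNS):
    """Return copies of the rows without the identity-revealing columns."""
    return [{k: v for k, v in row.items() if k not in leaky} for row in rows]


# Toy flow record with made-up values.
flows = [
    {"srcIP": "10.0.0.2", "dstPort": "443", "duration": "1.5", "label": "Signal"},
]
filtered = drop_leaky(flows)
print(sorted(filtered[0]))
```

Filtering by an explicit deny-list keeps the decision auditable: every removed attribute is named in one place.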

The refined dataset was randomly split into training and testing subsets using an 80:20 ratio, resulting in 85,027 training instances and 21,257 testing instances. The classification task was formulated as a multi-class supervised learning problem. Two state-of-the-art AutoML frameworks, AutoGluon and FLAML, were employed to evaluate automated model selection, feature handling, and hyperparameter optimization under identical experimental conditions.
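The reported subset sizes can be reproduced with a simple shuffled index split. The sketch below assumes the test size is rounded up and uses an arbitrary seed; the paper does not state which tool or seed performed the split.

```python
import math
import random


def split_80_20(n_rows, test_fraction=0.2, seed=0):
    """Shuffle row indices and return (train_indices, test_indices)."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_test = math.ceil(n_rows * test_fraction)  # assumption: round test size up
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = split_80_20(106_284)
print(len(train_idx), len(test_idx))  # 85027 21257
```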

AutoGluon-Tabular was applied directly to the filtered dataset without manual preprocessing. The framework automatically inferred feature types, handled missing values, encoded categorical attributes, and removed non-informative features. During training, AutoGluon evaluated a diverse pool of learners, including gradient boosting models, ensemble tree methods, and neural networks, and combined them using a weighted ensemble strategy. The default medium preset was used to balance computational efficiency and predictive performance, leading to the automatic selection of the WeightedEnsemble_L2 model as the final predictor.
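A configuration along these lines might look like the sketch below. It is not the authors' script: the target column name `label` is hypothetical, the accuracy metric is taken from the experimental section, and `medium_quality` is AutoGluon's name for its default preset. The import is deferred so the settings can be inspected without AutoGluon installed.

```python
AUTOGLUON_SETTINGS = {
    "label": "label",              # hypothetical name of the target column
    "problem_type": "multiclass",
    "eval_metric": "accuracy",
    "presets": "medium_quality",   # AutoGluon's default preset
}


def train_autogluon(train_df, test_df, settings=AUTOGLUON_SETTINGS):
    """Fit AutoGluon-Tabular and return (predictor, leaderboard on test data).

    `train_df` and `test_df` are pandas DataFrames that include the label column.
    """
    from autogluon.tabular import TabularPredictor  # imported lazily

    predictor = TabularPredictor(
        label=settings["label"],
        problem_type=settings["problem_type"],
        eval_metric=settings["eval_metric"],
    ).fit(train_df, presets=settings["presets"])
    return predictor, predictor.leaderboard(test_df)
```

The leaderboard returned by the second call lists every trained model with its test score, which is the kind of per-model comparison reported later in Table I.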

To further assess the effectiveness of lightweight AutoML under strict time constraints, FLAML was evaluated on the same dataset and train-test split. FLAML was configured for multiclass classification with a fixed time budget of 120 seconds and optimized the log-loss objective. During optimization, FLAML dynamically allocated computational resources across candidate learners, including LightGBM, Random Forest, Extra Trees, XGBoost, and linear models, ultimately converging to an optimized LightGBM classifier.

Finally, the performance of AutoGluon and FLAML was systematically compared using standard evaluation metrics, including accuracy, balanced accuracy, precision, recall, and F1-score. This comparative analysis enables a rigorous assessment of the trade-offs between predictive performance, automation capability, and computational efficiency in the context of encrypted IMA traffic classification.
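For reference, the metrics named above can all be derived from a multiclass confusion matrix. The sketch below uses macro averaging and a toy three-class matrix with made-up counts; it illustrates the definitions and is not the evaluation code used in the study.

```python
def macro_metrics(confusion):
    """Compute metrics from confusion[i][j] = class-i samples predicted as class j."""
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(n))
    precisions, recalls, f1s = [], [], []
    for i in range(n):
        tp = confusion[i][i]
        predicted = sum(confusion[r][i] for r in range(n))  # column sum
        actual = sum(confusion[i])                          # row sum
        p = tp / predicted if predicted else 0.0
        r = tp / actual if actual else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)
    return {
        "accuracy": correct / total,
        "balanced_accuracy": sum(recalls) / n,  # mean per-class recall
        "macro_precision": sum(precisions) / n,
        "macro_recall": sum(recalls) / n,
        "macro_f1": sum(f1s) / n,
    }


# Toy 3-class confusion matrix (made-up counts).
toy = [
    [8, 2, 0],
    [1, 9, 0],
    [0, 0, 10],
]
m = macro_metrics(toy)
print(round(m["accuracy"], 3), round(m["balanced_accuracy"], 3))
```

Balanced accuracy is the mean of per-class recall, so it coincides with macro recall; it differs from plain accuracy only when class sizes are uneven.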
Rationale for Selecting Tranalyzer2

Based on a systematic comparison of widely used open-source network traffic analysis frameworks, Tranalyzer2 was selected as the flow-level feature extraction tool for this study due to its strong alignment with the requirements of encrypted Instant Messaging Application (IMA) traffic classification. Unlike packet-centric analysis tools such as Scapy [6], or flow generators with comparatively limited statistical expressiveness such as CICFlowMeter [5], Tranalyzer2 natively produces machine-learning-ready CSV outputs enriched with a comprehensive and extensible set of flow-level attributes through its modular plugin architecture.

In contrast to Zeek [7], which primarily focuses on semantic and event-driven logging and often requires substantial post-processing to adapt outputs for machine learning workflows, Tranalyzer2 directly exposes a wide spectrum of discriminative features encompassing statistical, temporal, transport-layer, and TLS-related characteristics. This enables fine-grained behavioral modeling of encrypted traffic without reliance on payload inspection. Its support for multiple protocols, including TCP, UDP, SSL/TLS, and application-aware deep packet inspection (DPI) plugins, is particularly well suited for modern IMA traffic, where encryption renders content-based analysis infeasible.

From a scalability perspective, Tranalyzer2 is optimized for high-throughput PCAP processing and demonstrates strong performance when handling large-scale traffic traces, a critical requirement for flow-based learning tasks. Furthermore, its lightweight and modular design allows controlled extensibility via plugins without introducing scripting overhead or complex configurations, in contrast to frameworks that depend heavily on custom scripts. Collectively, these characteristics motivate the choice of Tranalyzer2 for flow-level feature extraction, as it provides a robust, scalable, and encryption-resilient representation of network traffic that integrates seamlessly with the automated machine learning pipelines employed in this work.

AutoGluon-Based Experimental Analysis

This section presents a detailed experimental analysis of the proposed encrypted Instant Messaging Application (IMA) traffic classification framework using AutoGluon-Tabular. Following an 80:20 train-test split, the resulting training and testing datasets comprised 85,027 and 21,257 instances, respectively, each represented by 194 flow-level features including the target class. AutoGluon was employed for multiclass classification with accuracy as the primary optimization metric, automatically formulating a seven-class prediction problem corresponding to six encrypted IMA applications and a consolidated non-IMA class.

During preprocessing, AutoGluon's automated feature engineering pipeline performed data type inference, missing value handling, categorical encoding, and feature transformation, while identifying and excluding non-informative attributes. Specifically, 16 features were removed due to constant values, and an additional 25 features were ignored owing to limited informational contribution. As a result, 152 original features were retained and transformed into 161 processed features, reducing memory usage while preserving discriminative capability. A summary of the feature composition before and after preprocessing is provided in Table II.

TABLE I: Model-wise Performance Metrics (Rounded to 3 Decimal Places)

Model	Acc.	Bal. Acc.	Prec. (macro)	Prec. (micro)	Prec. (weighted)	Rec. (macro)	Rec. (micro)	Rec. (weighted)	F1 (macro)	F1 (micro)	F1 (weighted)
XGBoost	0.999	0.999	0.999	0.999	0.999	0.999	0.999	0.999	0.999	0.999	0.999
RandomForest (Entropy)	0.999	0.999	0.999	0.999	0.999	0.999	0.999	0.999	0.999	0.999	0.999
RandomForest (Gini)	0.999	0.999	0.999	0.999	0.999	0.999	0.999	0.999	0.999	0.999	0.999
ExtraTrees (Gini)	0.999	0.999	0.999	0.999	0.999	0.999	0.999	0.999	0.999	0.999	0.999
ExtraTrees (Entropy)	0.999	0.999	0.999	0.999	0.999	0.999	0.999	0.999	0.999	0.999	0.999
NeuralNetFastAI	0.999	0.998	0.998	0.999	0.999	0.998	0.999	0.999	0.998	0.999	0.999
WeightedEnsemble_L2	0.999	0.998	0.998	0.999	0.999	0.998	0.999	0.999	0.998	0.999	0.999
LightGBM-XT	0.999	0.998	0.998	0.999	0.999	0.998	0.999	0.999	0.998	0.999	0.999
NeuralNetTorch	0.998	0.998	0.998	0.998	0.998	0.998	0.998	0.998	0.998	0.998	0.998
LightGBM	0.998	0.998	0.998	0.998	0.998	0.998	0.998	0.998	0.998	0.998	0.998
LightGBM-Large	0.998	0.997	0.999	0.998	0.998	0.997	0.998	0.998	0.998	0.998	0.998
CatBoost	0.996	0.995	0.994	0.996	0.996	0.995	0.996	0.996	0.995	0.996	0.996

TABLE II: Summary of Feature Types Used by AutoGluon

Feature Category	Count
Total Original Features	193
Useless Features (Constant)	16
Unused Features (Ignored)	25
Original Features Considered	152
Processed Features Generated	161
Integer Features	62
Boolean Features	7
Float Features	53
Categorical Features	27
Datetime-derived Features	12
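The counts in Table II are internally consistent, as the short check below shows. The reading that 193 corresponds to the 194 attributes minus the target label is an inference, not something the paper states explicitly.

```python
# Counts as reported in Table II.
total_original = 193      # presumably the 194 attributes minus the target label
dropped_constant = 16
dropped_ignored = 25
considered = total_original - dropped_constant - dropped_ignored

processed_by_type = {
    "integer": 62,
    "boolean": 7,
    "float": 53,
    "categorical": 27,
    "datetime-derived": 12,
}
processed_total = sum(processed_by_type.values())
print(considered, processed_total)  # 152 161
```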

During the AutoML training phase, a diverse pool of base learners was evaluated, including gradient boosting models (LightGBM, LightGBMXT, LightGBMLarge, and XGBoost), ensemble tree-based methods (Random Forest and Extra Trees with both Gini and Entropy criteria), neural network architectures (NeuralNetFastAI and NeuralNetTorch), and CatBoost. Several models achieved validation accuracies exceeding 99.8%. Based on validation performance, the WeightedEnsemble_L2 model was automatically selected as the final predictor, with the NeuralNetFastAI model contributing most significantly to the ensemble. As no explicit preset was specified, AutoGluon defaulted to the medium preset, offering a balanced trade-off between computational efficiency and predictive performance.

A detailed comparison of model-wise performance metrics on the held-out test set is reported in Table I. The consistently high accuracy, balanced accuracy, precision, recall, and F1-scores across all evaluated models indicate strong separability of encrypted IMA traffic when represented using flow-level statistical features.

To enhance interpretability, feature importance scores were extracted from the trained AutoGluon ensemble. Figure 2 highlights the most influential flow-level attributes contributing to encrypted IMA traffic discrimination, demonstrating that the classifier relies primarily on intrinsic traffic dynamics rather than protocol identifiers or payload-dependent characteristics.

The feature importance analysis indicates that AutoGluon primarily leverages temporal and flow-structural characteristics for encrypted IMA traffic classification. Highly ranked features such as nDPIclass, timeFirst, and timeLast highlight the significance of coarse protocol cues and flow timing information. Connection-level attributes including connSip, connDip, and connSipDprt further emphasize the role of source-destination interaction patterns. Overall, the dominance of payload-independent statistical features confirms that robust encrypted traffic discrimination can be achieved using intrinsic traffic dynamics without relying on payload inspection.

Fig. 2: Top-ranked flow-level feature importances obtained from the AutoGluon ensemble model.
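AutoGluon's `feature_importance` method is based on permutation importance: a feature's values are shuffled and the resulting drop in the evaluation score is recorded. The idea can be sketched in plain Python as follows; the toy data, the toy model, and the seed are assumptions for illustration and have nothing to do with the paper's dataset.

```python
import random


def permutation_importance(predict, X, y, n_features, seed=0):
    """Return the accuracy drop caused by shuffling each feature column in turn."""
    rng = random.Random(seed)

    def accuracy(rows):
        return sum(predict(r) == t for r, t in zip(rows, y)) / len(y)

    baseline = accuracy(X)
    drops = []
    for j in range(n_features):
        column = [row[j] for row in X]
        rng.shuffle(column)
        # Rebuild the rows with only column j replaced by its shuffled values.
        shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
        drops.append(baseline - accuracy(shuffled))
    return drops


# Toy setup: the "model" looks only at feature 0, so feature 1 is irrelevant.
X = [[i % 2, (i * 7) % 5] for i in range(200)]
y = [row[0] for row in X]
scores = permutation_importance(lambda r: r[0], X, y, n_features=2)
print(scores[0] > 0, scores[1])
```

A feature the model never consults gets an importance of exactly zero, while shuffling a feature the model depends on lowers accuracy.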

Fig. 3: Performance metrics of the optimized FLAML model on the encrypted IMA traffic classification task.

FLAML-Based Experimental Analysis

To further investigate the effectiveness of lightweight AutoML frameworks for encrypted Instant Messaging Application (IMA) traffic classification, additional experiments were conducted using FLAML (Fast Lightweight AutoML). The experiments were performed on the same dataset and refined feature space used in the AutoGluon analysis to ensure a fair and controlled comparison. Specifically, the complete set of 194 flow-level features including the target class, obtained after domain-driven filtering, was employed along with an identical 80:20 train-test split.

FLAML was configured for multiclass classification with a fixed time budget of 120 seconds and log-loss as the optimization objective. Under this setting, FLAML dynamically allocated computational resources across a diverse pool of candidate learners, including LightGBM, Random Forest, Extra Trees, XGBoost, depth-limited XGBoost, stochastic gradient descent (SGD), and L1-regularized logistic regression. The optimization process followed a holdout validation strategy, progressively refining hyperparameters while accounting for heterogeneous evaluation costs across models.
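The configuration described above maps fairly directly onto FLAML's `AutoML.fit` arguments. The following is a sketch under stated assumptions rather than the authors' script: the seed is arbitrary, the estimator identifiers are FLAML's short names for the learners listed in the text, and the `sgd` learner may not be available in older FLAML releases. The import is deferred so the settings can be inspected without FLAML installed.

```python
FLAML_SETTINGS = {
    "task": "classification",
    "metric": "log_loss",
    "time_budget": 120,        # seconds
    "eval_method": "holdout",
    "estimator_list": [
        "lgbm", "rf", "extra_tree", "xgboost",
        "xgb_limitdepth", "sgd", "lrl1",
    ],
    "seed": 0,                 # assumption: the paper does not report a seed
}


def train_flaml(X_train, y_train, settings=FLAML_SETTINGS):
    """Run FLAML's search and return (automl, best learner name, best config)."""
    from flaml import AutoML  # imported lazily

    automl = AutoML()
    automl.fit(X_train=X_train, y_train=y_train, **settings)
    return automl, automl.best_estimator, automl.best_config
```

With these settings the search stops after the time budget and retrains the best configuration, so `best_estimator` would be expected to report `lgbm` in a run that matches the paper's outcome.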

Throughout the optimization process, LightGBM consistently emerged as the most competitive learner. As the available time budget increased, FLAML explored more expressive configurations of LightGBM, ultimately converging to an optimized classifier that achieved the lowest validation error among all evaluated models. The final retrained LightGBM model obtained under the 120-second budget delivered near-perfect classification performance on the held-out test set.

The quantitative performance of the optimized FLAML model is illustrated in Figure 3. The model achieved an accuracy of 99.94%, balanced accuracy of 99.92%, precision of 99.92%, recall of 99.92%, and an F1-score of 99.92%. These results demonstrate that FLAML is able to extract strong predictive performance from flow-level encrypted traffic features while maintaining significantly lower computational overhead compared to ensemble-heavy AutoML frameworks.

To enhance model interpretability, feature importance scores were extracted from the optimized LightGBM classifier produced by FLAML. Figure 4 presents the top-ranked flow-level attributes contributing to the classification decision. The results indicate that connection-level, temporal, and statistical flow descriptors dominate the decision-making process, confirming that discriminative information is primarily derived from intrinsic traffic dynamics rather than payload content or explicit protocol identifiers.

Fig. 4: Top-ranked flow-level feature importances obtained from the FLAML-optimized LightGBM model.
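A ranking like the one in Figure 4 can be obtained by pairing feature names with the fitted model's importance scores and sorting. In the sketch below, `flaml_top_features` assumes that the best learner is a LightGBM model reachable as `automl.model.estimator`; the demo names and scores are invented for illustration.

```python
def top_k_features(names, importances, k=10):
    """Return the k (name, importance) pairs with the highest importance."""
    ranked = sorted(zip(names, importances), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


def flaml_top_features(automl, k=10):
    """Rank features of a fitted FLAML AutoML object.

    Assumes the selected learner is LightGBM, whose scikit-learn wrapper exposes
    `feature_name_` and `feature_importances_`.
    """
    model = automl.model.estimator
    return top_k_features(model.feature_name_, model.feature_importances_, k)


# Made-up scores, just to show the ranking behaviour.
demo = top_k_features(["featA", "featB", "featC"], [1, 5, 3], k=2)
print(demo)  # [('featB', 5), ('featC', 3)]
```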

Features such as srcMac_dstMac_numP, connG, and connSipDprt emerge as the top contributors, highlighting the importance of communication patterns, connection grouping behavior, and destination port dynamics in distinguishing encrypted IMA traffic. These features capture how endpoints interact over time and how frequently specific communication pairs exchange packets, which are strong indicators of application-specific behavior even under encryption. Similarly, the prominence of dstPortClassN, connSip, and flowInd suggests that port class abstraction and flow indexing information play a crucial role in characterizing traffic without exposing sensitive payload information.

Temporal attributes such as timeLast, timeFirst, and inter-arrival-time-related metrics (e.g., avgIAT, maxIAT) further reinforce the observation that timing behavior is a key discriminative factor. These features capture session duration, burstiness, and packet spacing patterns, which tend to differ significantly across instant messaging applications due to their underlying communication protocols and user interaction models. In addition, statistical flow descriptors, including byte and packet asymmetry metrics (e.g., bytAsm, l3BytesRcvd), provide insight into data exchange balance between sender and receiver, further strengthening class separability.

Lower-ranked features, such as TCP window statistics and payload entropy-related attributes, contribute marginally to the final decision. This gradual decline in importance demonstrates that while these features add contextual information, the core predictive power is concentrated in a relatively compact subset of flow-level characteristics. Importantly, the absence of heavy reliance on payload entropy or protocol identifiers confirms the robustness of the approach in encrypted settings, where deep packet inspection is infeasible.

Overall, this detailed feature-importance analysis substantiates that FLAML not only achieves near-perfect classification performance under strict time constraints but also yields interpretable and behavior-driven models. The dominance of connection-level, temporal, and statistical features validates the effectiveness of flow-based analysis for encrypted IMA traffic classification. Consequently, FLAML proves to be an efficient, scalable, and practical AutoML framework for real-time and resource-constrained network monitoring environments, where rapid optimization, high accuracy, and transparent decision-making are essential.

Comparative Analysis of AutoGluon and FLAML AutoML Frameworks

While both AutoGluon and FLAML are designed to automate model selection and hyperparameter optimization, they differ substantially in design philosophy, computational strategy, and practical deployment objectives. To provide a clearer methodological perspective, Table III presents a structured comparison between the two frameworks as used in this study.

FLAML is explicitly optimized for efficiency under strict time and resource constraints. Its budget-aware optimization strategy dynamically allocates computational effort toward promising configurations, enabling rapid convergence to high-performing models with minimal overhead. This makes FLAML particularly suitable for time-sensitive and resource-constrained environments, such as real-time encrypted network traffic analysis. Moreover, its lightweight design ensures that it can be easily integrated into automated pipelines without requiring specialized hardware, allowing for scalable deployment across multiple devices or network nodes. Its ability to quickly adapt to changing data patterns further reinforces its utility in dynamic traffic scenarios where rapid decision-making is essential.

In contrast, AutoGluon prioritizes predictive performance through extensive ensembling, stacking, and bagging strategies. By leveraging a diverse set of base learners and multi-level ensemble construction, AutoGluon is able to achieve strong generalization performance, albeit at the cost of increased training time and memory usage. This design makes AutoGluon well suited for scenarios where computational resources are less constrained and maximizing classification accuracy is the primary objective. Additionally, its built-in support for multi-modal data and automated feature engineering further enhances its versatility in complex traffic classification tasks. Its robustness to noisy or incomplete datasets also ensures consistent performance across diverse network environments.

TABLE III: Comparison between FLAML and AutoGluon

Feature / Aspect	FLAML	AutoGluon
Purpose	Lightweight AutoML for fast hyperparameter optimization	Comprehensive AutoML framework for high-performance modeling
Primary Focus	Time- and resource-efficient model selection	Maximizing predictive performance via ensembles
Supported Tasks	Classification, regression, custom function tuning	Classification, regression, and multimodal learning
Base Learners	LightGBM, XGBoost, Random Forest, Extra Trees, linear models	LightGBM, CatBoost, XGBoost, neural networks
Hyperparameter Tuning	Budget-aware, cost-frugal optimization with early stopping	Bayesian optimization, grid search, and meta-learning
Ensembling Strategy	Limited; focuses on selecting the best model	Extensive stacking and bagging
Ease of Use	Simple, scikit-learn-style interface	Feature-rich with moderate complexity
Speed / Resource Usage	Very fast with low memory footprint	Slower with higher memory and compute requirements
Zero-Shot AutoML	Supported	Not supported
Best Use Case	Rapid experimentation and resource-limited deployment	Achieving near state-of-the-art accuracy

Overall, the comparison highlights a fundamental trade-off between computational efficiency and ensemble-driven performance. While both frameworks demonstrate strong suitability for encrypted IMA traffic classification, FLAML offers faster optimization and lower resource consumption, whereas AutoGluon provides a more comprehensive and performance-oriented AutoML pipeline. The complementary strengths of these frameworks justify their joint evaluation in this work. Understanding these differences also provides valuable guidance for practitioners when selecting an AutoML tool based on specific project constraints, operational environments, and desired outcomes. By carefully considering the trade-offs between speed, resource usage, and predictive accuracy, researchers and engineers can make informed decisions that align with both experimental and real-world deployment goals.

Conclusion

This study demonstrates that encrypted Instant Messaging Application (IMA) traffic can be accurately classified using flow-level statistical features without reliance on payload inspection or protocol-specific identifiers. By employing Tranalyzer2 for feature extraction, the proposed framework effectively captures intrinsic traffic dynamics that remain robust under encryption.

A systematic evaluation of two AutoML frameworks highlights their complementary strengths. AutoGluon achieves near-optimal classification performance through ensemble-driven optimization, making it well suited for scenarios where maximum predictive performance and generalization are required. In contrast, FLAML delivers comparable performance with significantly lower computational cost, emphasizing its suitability for time-critical and resource-constrained environments.

Overall, the results confirm that AutoML provides a scalable and reproducible solution for encrypted traffic classification, enabling informed trade-offs between accuracy and efficiency. These findings offer practical guidance for deploying automated learning systems in real-world, privacy-preserving network monitoring applications.

References

[1] Tranalyzer2 Documentation. Available at: https://tranalyzer.com/documentation
[2] AutoGluon: AutoML for Tabular, Text, and Image Data. Available at: https://auto.gluon.ai/stable/index.html
[3] Microsoft FLAML: Fast Lightweight AutoML. Available at: https://microsoft.github.io/FLAML/
[4] Encrypted Mobile Instant Messaging Traffic Dataset. IEEE DataPort. Available at: https://ieee-dataport.org/documents/encrypted-mobile-instant-messaging-traffic-dataset
[5] Ahlashkari, A., et al.: CICFlowMeter: A Network Traffic Flow Generator. Available at: https://github.com/ahlashkari/CICFlowMeter
[6] Scapy: Packet Manipulation Program. Available at: https://scapy.net/
[7] Zeek Network Security Monitor. Available at: https://zeek.org/
[8] Holland, J., Schmitt, P., Feamster, N., Mittal, P.: New Directions in Automated Traffic Analysis. In: Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security, pp. 3366-3383. ACM (2021). DOI: https://doi.org/10.1145/3460120.3484758
[9] Piet, J., Nwoji, D., Paxson, V.: GGFAST: Automating Generation of Flexible Network Traffic Classifiers. In: Proceedings of the ACM SIGCOMM 2023 Conference, pp. 850-866 (2023). DOI: https://doi.org/10.1145/3603269.3604840
[10] Malekghaini, N., et al.: AutoML4ETC: Automated Neural Architecture Search for Real-World Encrypted Traffic Classification. IEEE Transactions on Network and Service Management, 21(3), 2715-2730 (2024). DOI: https://doi.org/10.1109/TNSM.2023.3324936
[11] Isingizwe, D.F., Wang, M., Liu, W., Wang, D., Wu, T., Li, J.: Analyzing Learning-based Encrypted Malware Traffic Classification with AutoML. In: 2021 IEEE 21st International Conference on Communication Technology (ICCT), pp. 313-322. IEEE (2021). DOI: https://doi.org/10.1109/ICCT52962.2021.9658106