Enhancing Neural Style To Multi-Style and Semantic Preservation

Shikha Dwivedi; Om Shinde; Gaurav Sunil Taralkar; Rohit Thube; Tanmay Sinare

doi:10.5281/zenodo.20846567

Volume 15, Issue 06 (June 2026)

Open Access
Paper ID : IJERTV15IS061018
Volume & Issue : Volume 15, Issue 06 , June – 2026
Published (First Online): 25-06-2026
ISSN (Online) : 2278-0181
Publisher Name : IJERT
License: This work is licensed under a Creative Commons Attribution 4.0 International License

Shikha Dwivedi

Department of Computer Engineering, JSPM's Jayawantrao Sawant College of Engineering, Hadapsar, Pune

Gaurav Sunil Taralkar

Department of Computer Engineering, JSPM's Jayawantrao Sawant College of Engineering, Hadapsar, Pune

Om Shinde

Department of Computer Engineering, JSPM's Jayawantrao Sawant College of Engineering, Hadapsar, Pune

Rohit Thube

Department of Computer Engineering, JSPM's Jayawantrao Sawant College of Engineering, Hadapsar, Pune

Tanmay Sinare

Department of Computer Engineering, JSPM's Jayawantrao Sawant College of Engineering, Hadapsar, Pune

ABSTRACT

Neural Style Transfer (NST) is a deep learning technique that combines the content of one image with the artistic style of another to generate visually appealing results. Traditional NST methods focus primarily on single-style transfer and often fail to preserve important semantic information, which distorts object structures and image details. This paper presents an enhanced Neural Style Transfer framework that supports multi-style blending while maintaining semantic consistency. The proposed approach integrates semantic segmentation and attention mechanisms to apply different artistic styles to semantically relevant regions of an image. A multi-style fusion module combines multiple style representations in a balanced and coherent manner. The framework uses a pre-trained VGG19 network for feature extraction and for optimization of content and style features. Experimental results demonstrate that the proposed system produces high-quality stylized images with improved content preservation, smoother style transitions, and better semantic integrity. The proposed method offers a flexible and effective solution for artistic image generation and can be applied in digital art, image editing, multimedia design, and creative content generation.

INTRODUCTION

Neural Style Transfer (NST) is a deep learning technique that combines the content of one image with the artistic style of another image to generate visually appealing results. The concept was first introduced by Gatys et al. using deep convolutional neural networks to extract content and style representations from images [1]. Although traditional NST methods produce impressive stylized images, they are generally limited to single-style transfer and often fail to preserve important semantic information. Recent studies have focused on semantic-aware style transfer and multi-style blending to improve content preservation and visual consistency [3], [13]. To address these limitations, this work proposes an enhanced neural style transfer framework that integrates multi-style fusion and semantic preservation. By utilizing semantic segmentation and attention mechanisms, the proposed system applies multiple artistic styles effectively while maintaining the structural integrity of objects and improving the overall visual quality of the generated images [3].
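To make the idea of region-wise multi-style application concrete, the following is a minimal, dependency-free sketch of one plausible reading: each style produces its own stylized image, and soft semantic masks decide how much of each stylized image is used at every pixel. The function names (`normalize_masks`, `blend_styles`), the per-pixel weighted sum, and the toy grayscale grids are assumptions made for illustration; the fusion module of the proposed framework is not specified at this level of detail and may work differently.

```python
def normalize_masks(masks, eps=1e-8):
    """Rescale soft region masks so the weights at every pixel sum to (almost) 1.

    `masks` is a list of H x W nested lists, one per style. `eps` avoids a
    division by zero where no mask covers a pixel.
    """
    height, width = len(masks[0]), len(masks[0][0])
    totals = [[sum(m[y][x] for m in masks) + eps for x in range(width)]
              for y in range(height)]
    return [[[m[y][x] / totals[y][x] for x in range(width)]
             for y in range(height)] for m in masks]


def blend_styles(stylized, masks):
    """Per-pixel weighted sum of stylized images using the normalised masks."""
    weights = normalize_masks(masks)
    height, width = len(stylized[0]), len(stylized[0][0])
    return [[sum(w[y][x] * s[y][x] for w, s in zip(weights, stylized))
             for x in range(width)] for y in range(height)]


# Toy 1 x 3 grayscale example (hypothetical data): style A renders everything
# as 0.0, style B as 1.0. The masks hand over from A to B, left to right.
stylized_a = [[0.0, 0.0, 0.0]]
stylized_b = [[1.0, 1.0, 1.0]]
mask_a = [[1.0, 0.5, 0.0]]
mask_b = [[0.0, 0.5, 1.0]]

blended = blend_styles([stylized_a, stylized_b], [mask_a, mask_b])
print(blended)
```

In this sketch the middle pixel, where both masks are 0.5, receives an even mix of the two styles, which is the kind of smooth transition between regions that soft masks are meant to provide. In practice the masks would come from a semantic segmentation model rather than being written by hand.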

Table 1: Traditional Techniques Used for Neural Style Transfer

Technique	Application in Neural Style Transfer	Limitations
Background Subtraction	Pre-processing step to isolate foreground content for style application	Ineffective with complex or dynamic backgrounds; sensitive to lighting variations and shadows
Optical Flow	Aids in temporal consistency for video style transfer	Computationally expensive; sensitive to noise and rapid object movements
Kalman Filter	Predicts object position across frames for consistent style tracking	Ineffective for non-linear or abrupt style changes; assumes linear motion and Gaussian noise
Mean-Shift / CAMShift	Supports region-based tracking for localized style application	Struggles with background color similarity; prone to failure under occlusion
Deep Learning (CNN-based)	Extracts hierarchical features for detailed style and semantic understanding	Requires large datasets; computationally intensive; may not support real-time processing
YOLO / SSD (Real-Time Detectors)	Detects semantic regions for targeted multi-style transfer	Limited precision at object boundaries; focuses on bounding boxes rather than detailed contours
Siamese Network Trackers	Matches semantic features across frames for style continuity	Performs well for short durations; less effective under long occlusions or appearance changes
Transformer-Based Models	Ensures long-range semantic consistency and multi-style blending	High computational cost; real-time application remains challenging
Multi-Object Tracking (MOT)	Manages semantic regions when multiple objects are present for consistent style application	Identity switching in dense scenes; highly dependent on initial detection accuracy

LITERATURE REVIEW

Several researchers have proposed different approaches to improve Neural Style Transfer (NST) by enhancing style quality and preserving content information. AdaAttN introduced Adaptive Attention Normalization to establish fine-grained correspondence between content and style features, resulting in improved stylization quality and content preservation [1]. StyTr² employed a transformer-based architecture to capture long-range dependencies and enhance semantic consistency during style transfer [2]. Style Mixer proposed a semantic-aware multi-style transfer framework that automatically applies different artistic styles to semantically meaningful regions, achieving better style diversity and smoother transitions [3].

Recent studies have focused on semantic preservation and attention mechanisms. MAST aligned content and style features based on semantic manifolds to improve structural fidelity [9]. CAST utilized contrastive learning to learn richer style representations and reduce visual artifacts [8]. Zhao et al. introduced semantic masks to guide the style transfer process, preserving semantic boundaries and reducing distortions [13]. Although these methods improve stylization quality, challenges such as efficient multi-style fusion, semantic consistency, and real-time performance still remain. Therefore, this work proposes an enhanced neural style transfer framework that integrates multi-style blending and semantic preservation to generate visually coherent and semantically meaningful stylized images.

Architecture Diagram

Input Layer (Image Loading and Preprocessing)

The system takes a content image and a style image as input. Both images are resized, converted into tensors, and normalized to make them compatible with the VGG19 network.
Feature Extraction Layer

A pre-trained VGG19 model is used to extract content and style features from different convolutional layers. These features capture the structure and texture information of the images.
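As a rough illustration of how named convolutional layers map onto a sequential VGG19 feature stack, the sketch below rebuilds the layer ordering from the standard VGG19 configuration (16 convolutions in five blocks) and looks up the positions of the layers commonly used for style and content. It assumes the stack is laid out as convolution, ReLU, and a max-pooling layer at the end of each block, without batch normalization; the helper name `vgg19_layer_indices` is hypothetical, and the resulting indices should be checked against whichever VGG19 implementation is actually used.

```python
# Standard VGG19 configuration: numbers are conv output channels, "M" is max pooling.
VGG19_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, 256, "M",
             512, 512, 512, 512, "M", 512, 512, 512, 512, "M"]

# Layer choices taken from the hyperparameter set of this work.
STYLE_LAYERS = ["conv1_1", "conv2_1", "conv3_1", "conv4_1", "conv5_1"]
CONTENT_LAYER = "conv4_2"


def vgg19_layer_indices(cfg=VGG19_CFG):
    """Map names such as 'conv4_2' to positions in a conv/ReLU/pool sequence."""
    indices = {}
    position = 0          # index of the next module in the sequential stack
    block, conv = 1, 1    # current block number and conv number within it
    for entry in cfg:
        if entry == "M":
            indices[f"pool{block}"] = position
            position += 1
            block += 1
            conv = 1
        else:
            # Each convolution is assumed to be followed directly by a ReLU.
            indices[f"conv{block}_{conv}"] = position
            indices[f"relu{block}_{conv}"] = position + 1
            position += 2
            conv += 1
    return indices


layer_index = vgg19_layer_indices()
style_indices = [layer_index[name] for name in STYLE_LAYERS]
content_index = layer_index[CONTENT_LAYER]
print(style_indices, content_index)
```

With these indices, a forward pass can collect the activation after each listed position: shallow layers mainly respond to fine texture, while deeper layers respond to larger structures, which is why style is read from several depths and content from a single deeper layer.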
Style and Content Loss Computation Layer

Content loss measures how well the generated image preserves the original content, while style loss measures the similarity to the style image. Both losses are combined, scaled by the content and style weights in Table 2, to give the total loss.
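The loss computation can be sketched on toy data as follows. This is a minimal, dependency-free illustration: feature maps are plain nested lists of shape C x N (channels by spatial positions), style is compared through Gram matrices, and both losses use a mean squared error. The normalisation of the Gram matrix by C * N, the mean reduction, and all function names are assumptions; the default weights are the content and style weights from Table 2.

```python
def gram_matrix(features):
    """Gram matrix of a C x N feature map, normalised by C * N (an assumption)."""
    channels = len(features)
    positions = len(features[0])
    scale = 1.0 / (channels * positions)
    return [[scale * sum(a * b for a, b in zip(fi, fj)) for fj in features]
            for fi in features]


def mse(a, b):
    """Mean squared error between two equally shaped 2-D nested lists."""
    diffs = [(x - y) ** 2
             for row_a, row_b in zip(a, b)
             for x, y in zip(row_a, row_b)]
    return sum(diffs) / len(diffs)


def content_loss(generated, content):
    """Distance between generated and content features at the content layer."""
    return mse(generated, content)


def style_loss(generated_layers, style_layers):
    """Sum of Gram-matrix distances over the chosen style layers."""
    return sum(mse(gram_matrix(g), gram_matrix(s))
               for g, s in zip(generated_layers, style_layers))


def total_loss(c_loss, s_loss, content_weight=1.0, style_weight=1e5):
    """Weighted sum of the two losses."""
    return content_weight * c_loss + style_weight * s_loss


# Toy 2-channel, 4-position feature maps standing in for VGG19 activations.
content_feat = [[1.0, 0.0, 2.0, 1.0], [0.0, 1.0, 1.0, 0.0]]
style_feat = [[0.5, 0.5, 0.5, 0.5], [1.0, 0.0, 1.0, 0.0]]
generated_feat = [row[:] for row in content_feat]  # start from the content image

c = content_loss(generated_feat, content_feat)
s = style_loss([generated_feat], [style_feat])
print(c, s, total_loss(c, s))
```

Because the generated features start as a copy of the content features, the content loss is zero at initialisation and the total loss is driven entirely by the style term. The Gram matrix discards spatial layout and keeps only channel co-activation statistics, which is what lets it act as a texture descriptor.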
Optimization Layer

The generated image is optimized using the Adam optimizer. Pixel values are updated iteratively to minimize the total loss and produce the final stylized image.
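The update rule behind this step can be illustrated with a small, self-contained Adam loop acting directly on pixel values. The learning rate and step count follow Table 2; the moment decay rates and epsilon are the commonly used Adam defaults and are an assumption here, as is the stand-in quadratic objective, which replaces the gradient that would normally be obtained by backpropagating the total loss through VGG19.

```python
import math


def adam_minimize(pixels, grad_fn, lr=0.01, steps=300,
                  beta1=0.9, beta2=0.999, eps=1e-8):
    """Update `pixels` in place with Adam, using gradients from `grad_fn`."""
    m = [0.0] * len(pixels)   # first-moment (mean) estimates
    v = [0.0] * len(pixels)   # second-moment (uncentred variance) estimates
    for t in range(1, steps + 1):
        grads = grad_fn(pixels)
        for i, g in enumerate(grads):
            m[i] = beta1 * m[i] + (1 - beta1) * g
            v[i] = beta2 * v[i] + (1 - beta2) * g * g
            m_hat = m[i] / (1 - beta1 ** t)   # bias-corrected first moment
            v_hat = v[i] / (1 - beta2 ** t)   # bias-corrected second moment
            pixels[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return pixels


# Stand-in objective (hypothetical): squared distance to a fixed target image.
target = [0.2, 0.8, 0.5]


def loss_fn(pixels):
    return sum((p - t) ** 2 for p, t in zip(pixels, target))


def grad_fn(pixels):
    return [2.0 * (p - t) for p, t in zip(pixels, target)]


image = [0.0, 0.0, 0.0]
initial = loss_fn(image)
adam_minimize(image, grad_fn)
final = loss_fn(image)
print(initial, final)
```

The important point is that the optimised variables are the pixels of the generated image, not network weights: the feature extractor stays fixed and only the image changes from step to step.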

Table 2: Hyperparameter Set

Hyperparameter	Value/Range	Purpose	Consideration
Image Size	512 × 512 pixels	Controls input resolution and processing speed	Larger images increase computational cost
Learning Rate	0.01	Step size for Adam optimizer	Can be adjusted for faster or smoother convergence
Total Steps	300	Number of optimization iterations	Higher steps improve quality but slow the process
Style Weight	1e5	Balances the influence of style loss	Higher values prioritize style over content
Content Weight	1	Balances the influence of content loss	Lower values preserve less content structure
Optimizer	Adam	Optimization algorithm	Suitable for direct image optimization
Style Layers	conv1_1, conv2_1, conv3_1, conv4_1, conv5_1	Capture style at different scales	Multi-scale style extraction
Content Layer	conv4_2	Captures detailed content structure	Best preserves object integrity
Normalization Mean	[0.485, 0.456, 0.406]	ImageNet mean for input normalization	Standard for pre-trained VGG models
Normalization Std	[0.229, 0.224, 0.225]	ImageNet std for input normalization	Matches the VGG model's training distribution
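For reference, the settings in the table can be gathered into a single configuration object, and the normalization step can be written out explicitly. The sketch below is illustrative only: the class name `StyleTransferConfig` and the helper functions are hypothetical, and it assumes pixels arrive as 0-255 integers that are first scaled to [0, 1] before the ImageNet mean and standard deviation are applied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StyleTransferConfig:
    """Hyperparameters as listed in the table above."""
    image_size: int = 512
    learning_rate: float = 0.01
    total_steps: int = 300
    style_weight: float = 1e5
    content_weight: float = 1.0
    style_layers: tuple = ("conv1_1", "conv2_1", "conv3_1", "conv4_1", "conv5_1")
    content_layer: str = "conv4_2"
    norm_mean: tuple = (0.485, 0.456, 0.406)
    norm_std: tuple = (0.229, 0.224, 0.225)


def normalize_pixel(rgb, cfg):
    """Scale one 0-255 RGB pixel to [0, 1], then standardise each channel."""
    return tuple((value / 255.0 - mean) / std
                 for value, mean, std in zip(rgb, cfg.norm_mean, cfg.norm_std))


def denormalize_pixel(normalized, cfg):
    """Invert normalize_pixel and clamp to the displayable 0-255 range."""
    out = []
    for value, mean, std in zip(normalized, cfg.norm_mean, cfg.norm_std):
        level = round((value * std + mean) * 255.0)
        out.append(min(255, max(0, level)))
    return tuple(out)


cfg = StyleTransferConfig()
pixel = (128, 64, 255)
normalized = normalize_pixel(pixel, cfg)
restored = denormalize_pixel(normalized, cfg)
print(normalized, restored)
```

The inverse step matters in practice because the optimised image lives in the normalised space expected by VGG19 and has to be mapped back, and clamped, before it can be saved or displayed.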

ACKNOWLEDGMENT

We would like to express our sincere gratitude to our project guide for their valuable guidance, continuous support, and encouragement throughout the development of this project titled “Enhancing Neural Style to Multi-Style and Semantic Preservation.” Their expert advice, constructive suggestions, and technical insights helped us understand the concepts of Neural Style Transfer, semantic preservation, and multi-style blending in depth. We are also thankful to the Head of the Department and all faculty members of the Department of Computer Engineering for their constant motivation and academic support during the course of this work.

We extend our heartfelt appreciation to our institution for providing the necessary infrastructure, laboratory facilities, and resources required for the successful completion of this project. We also acknowledge the contributions of researchers whose work in the fields of Deep Learning, Computer Vision, Neural Style Transfer, and Semantic Segmentation served as valuable references for our study. Finally, we thank our friends, classmates, and family members for their encouragement, cooperation, and support throughout the project, which greatly contributed to its successful completion.

CONCLUSION

The implemented Neural Style Transfer system effectively transfers artistic styles onto content images while preserving structural integrity. By using the VGG19 network as a fixed feature extractor, the framework captures both content and style representations. The combination of content and style loss functions ensures balanced stylization, and the iterative optimization process refines the generated image over multiple steps. Hyperparameters such as style weight, content weight, and learning rate significantly influence the stylization quality. The system is flexible across various styles and can be adapted for real-time applications with proper tuning. Additionally, the approach maintains semantic details of objects during the style transfer process, and the Gram matrix-based style representation provides reliable texture matching. The architecture can be further extended to multi-style blending and video style transfer. Overall, this work lays a strong foundation for future enhancements in semantic-aware and region-specific style transfer systems.

REFERENCES

Gatys, L. A., Ecker, A. S., & Bethge, M. (2016). Image style transfer using convolutional neural networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2414-2423.
Chen, D., Liao, J., Yuan, L., Yu, N., & Hua, G. (2017). Coherent online video style transfer. Proceedings of the IEEE International Conference on Computer Vision (ICCV), 1105-1114.
Huang, H., Zhang, H., Zhang, Y., Li, P., & Lin, W. (2019). Style mixer: Semantic-aware multistyle transfer network. IEEE Transactions on Image Processing, 28(12), 5960-5971.
Sechenov, A., Chebotar, Y., & Lempitsky, V. (2021). Depth-aware neural style transfers for videos. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 1660-1669.
Hua, Z., & Zhang, Y. (2021). Multi-attention network for arbitrary style transfer. International Conference on Neural Information Processing (ICONIP), 147-158.
Zhang, M., Yang, X., Xu, Y., Yan, L., & Gao, X. (2022). CAST: Contrastive arbitrary style transfer. ACM Transactions on Graphics (TOG), 41(4), 1-10.
Guo, X., & Hao, X. (2021). 3S-Net: Arbitrary semantic-aware style transfer with controllable ROI. 2021 IEEE International Conference on Image Processing (ICIP), 2783-2787.
Hu, R., Zhao, L., Zhou, T., Wu, D., & Zhang, L. (2022). Latent style: Multi-style image transfer via latent style coding and skip connections. Signal, Image and Video Processing, 16(6), 1467-1475.
Huo, Y., Gao, M., Li, W., & Qiao, Y. (2020). MAST: Manifold alignment for semantically aligned style transfer. Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 4462-4471.
Zhao, S., Gallo, O., Frosio, I., & Kautz, J. (2017). Automatic semantic style transfer using deep convolutional neural networks and soft masks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 4160-4168.
Huo, Y., Gao, M., Li, W., & Qiao, Y. (2020). MAST: Manifold alignment for semantically aligned style transfer. Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 4462-4471.
Yao, Y., Hu, Y., Zhang, C., Wang, Z., Wang, B., & Fu, Y. (2019). Attention-aware multistroke style transfer. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 1467-1475.
Xiang, T., Feng, W., Liu, D., & Zhang, W. (2023). NCCNet: Arbitrary neural style transfer with multi-channel conversion. International Conference on Image and Graphics (ICIG), 140-151.