DOI : 10.5281/zenodo.20846567
- Open Access

- Authors : Shikha Dwivedi, Om Shinde, Gaurav Sunil Taralkar, Rohit Thube, Tanmay Sinare
- Paper ID : IJERTV15IS061018
- Volume & Issue : Volume 15, Issue 06, June – 2026
- Published (First Online): 25-06-2026
- ISSN (Online) : 2278-0181
- Publisher Name : IJERT
- License:
This work is licensed under a Creative Commons Attribution 4.0 International License
Enhancing Neural Style To Multi-Style and Semantic Preservation
Shikha Dwivedi
Department of Computer Engineering, JSPM's Jaywantrao Sawant College of Engineering, Hadapsar, Pune
Gaurav Sunil Taralkar
Department of Computer Engineering, JSPM's Jaywantrao Sawant College of Engineering, Hadapsar, Pune
Om Shinde
Department of Computer Engineering, JSPM's Jaywantrao Sawant College of Engineering, Hadapsar, Pune
Rohit Thube
Department of Computer Engineering, JSPM's Jaywantrao Sawant College of Engineering, Hadapsar, Pune
Tanmay Sinare
Department of Computer Engineering, JSPM's Jaywantrao Sawant College of Engineering, Hadapsar, Pune
ABSTRACT
Neural Style Transfer (NST) is a deep learning technique that combines the content of one image with the artistic style of another to generate visually appealing results. Traditional NST methods focus primarily on single-style transfer and often fail to preserve important semantic information, which distorts object structures and image details. This paper presents an enhanced Neural Style Transfer framework that supports multi-style blending while maintaining semantic consistency. The proposed approach integrates semantic segmentation and attention mechanisms to apply different artistic styles to semantically relevant regions of an image. A multi-style fusion module combines multiple style representations in a balanced and coherent manner. The framework uses a pre-trained VGG19 network for feature extraction and for optimization of content and style features. Experimental results demonstrate that the proposed system produces high-quality stylized images with improved content preservation, smoother style transitions, and better semantic integrity. The proposed method offers a flexible and effective solution for artistic image generation and can be applied in digital art, image editing, multimedia design, and creative content generation.
INTRODUCTION
Neural Style Transfer (NST) is a deep learning technique that combines the content of one image with the artistic style of another image to generate visually appealing results. The concept was first introduced by Gatys et al. using deep convolutional neural networks to extract content and style representations from images [1]. Although traditional NST methods produce impressive stylized images, they are generally limited to single-style transfer and often fail to preserve important semantic information. Recent studies have focused on semantic-aware style transfer and multi-style blending to improve content preservation and visual consistency [3], [13]. To address these limitations, this work proposes an enhanced neural style transfer framework that integrates multi-style fusion and semantic preservation. By utilizing semantic segmentation and attention mechanisms, the proposed system applies multiple artistic styles effectively while maintaining the structural integrity of objects and improving the overall visual quality of the generated images [3].
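The idea of applying different styles to different semantic regions can be pictured as a per-pixel weighted blend of several stylized outputs. The sketch below is illustrative only: the paper does not state its fusion rule, so the convex combination with soft masks, the grayscale nested-list image representation, and the names `normalize_masks` and `blend_styles` are all assumptions made for this example.

```python
from typing import List, Sequence

# Assumption: a grayscale image is a nested list [row][col] of floats in [0, 1].
Image = List[List[float]]


def normalize_masks(masks: Sequence[Image], eps: float = 1e-8) -> List[Image]:
    """Rescale K soft masks so the weights at every pixel sum to (almost) 1."""
    height, width = len(masks[0]), len(masks[0][0])
    normalized = [[[0.0] * width for _ in range(height)] for _ in masks]
    for y in range(height):
        for x in range(width):
            total = sum(mask[y][x] for mask in masks) + eps  # eps avoids 0/0
            for k, mask in enumerate(masks):
                normalized[k][y][x] = mask[y][x] / total
    return normalized


def blend_styles(stylized: Sequence[Image], masks: Sequence[Image]) -> Image:
    """Blend K stylized images with K soft semantic masks, pixel by pixel."""
    if len(stylized) != len(masks):
        raise ValueError("expected exactly one mask per stylized image")
    weights = normalize_masks(masks)
    height, width = len(stylized[0]), len(stylized[0][0])
    return [
        [
            sum(w[y][x] * img[y][x] for w, img in zip(weights, stylized))
            for x in range(width)
        ]
        for y in range(height)
    ]


# Toy 1x3 example: style A dominates on the left, style B on the right,
# and the middle pixel is an even mix of the two.
style_a = [[0.2, 0.2, 0.2]]
style_b = [[0.8, 0.8, 0.8]]
mask_a = [[1.0, 0.5, 0.0]]
mask_b = [[0.0, 0.5, 1.0]]
blended = blend_styles([style_a, style_b], [mask_a, mask_b])
```

In a full system the masks would come from a segmentation or attention module and the blend would more likely be applied to feature maps than to final pixels; soft mask values near region boundaries are what give gradual transitions between styles.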
Table 1: Traditional Techniques Used for Neural Style Transfer
| Technique | Application in Neural Style Transfer | Limitations |
|---|---|---|
| Background Subtraction | Pre-processing step to isolate foreground content for style application | Ineffective with complex or dynamic backgrounds; sensitive to lighting variations and shadows |
| Optical Flow | Assists in temporal consistency for video style transfer | Computationally expensive; sensitive to noise and rapid object movements |
| Kalman Filter | Predicts object position across frames for consistent style tracking | Ineffective for non-linear or abrupt style changes; assumes linear motion and Gaussian noise |
| Mean-Shift / CAMShift | Supports region-based tracking for localized style application | Struggles with background color similarity; prone to failure under occlusion |
| Deep Learning (CNN-based) | Extracts hierarchical features for detailed style and semantic understanding | Requires large datasets; computationally intensive; may not support real-time processing |
| YOLO / SSD (Real-Time Detectors) | Detects semantic regions for targeted multi-style transfer | Limited precision at object boundaries; focuses on bounding boxes rather than detailed contours |
| Siamese Network Trackers | Matches semantic features across frames for style continuity | Performs well for short durations; less effective under long occlusions or appearance changes |
| Transformer-Based Models | Ensures long-range semantic consistency and multi-style blending | High computational cost; real-time application remains challenging |
| Multi-Object Tracking (MOT) | Manages semantic regions when multiple objects are present for consistent style application | Identity switching in dense scenes; highly dependent on initial detection accuracy |
LITERATURE REVIEW
Several researchers have proposed different approaches to improve Neural Style Transfer (NST) by enhancing style quality and preserving content information. AdaAttN introduced Adaptive Attention Normalization to establish fine-grained correspondence between content and style features, resulting in improved stylization quality and content preservation [1]. StyTr² employed a transformer-based architecture to capture long-range dependencies and enhance semantic consistency during style transfer [2]. Style Mixer proposed a semantic-aware multi-style transfer framework that automatically applies different artistic styles to semantically meaningful regions, achieving better style diversity and smoother transitions [3].
Recent studies have focused on semantic preservation and attention mechanisms. MAST aligned content and style features based on semantic manifolds to improve structural fidelity [9]. CAST utilized contrastive learning to learn richer style representations and reduce visual artifacts [8]. Zhao et al. introduced semantic masks to guide the style transfer process, preserving semantic boundaries and reducing distortions [13]. Although these methods improve stylization quality, challenges such as efficient multi-style fusion, semantic consistency, and real-time performance still remain. Therefore, this work proposes an enhanced neural style transfer framework that integrates multi-style blending and semantic preservation to generate visually coherent and semantically meaningful stylized images.
Architecture Diagram
-
Input Layer (Image Loading and Preprocessing)
The system takes a content image and a style image as input. Both images are resized, converted into tensors, and normalized to make them compatible with the VGG19 network.
-
Feature Extraction Layer
A pre-trained VGG19 model is used to extract content and style features from different convolutional layers. These features capture the structure and texture information of the images.
-
Style and Content Loss Computation Layer
Content loss measures how well the generated image preserves the original content, while style loss measures its similarity to the style image. The two losses are combined as a weighted sum, using the content and style weights, to give the total loss.
-
Optimization Layer
The generated image is optimized using the Adam optimizer. Pixel values are updated iteratively to minimize the total loss and produce the final stylized image.
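The loss computation described in the layers above can be sketched without any deep learning framework. This is a minimal illustration, not the authors' implementation: feature maps are assumed to be already extracted and flattened into plain Python lists, the Gram matrix scaling of 1 / (C * H*W) is one common convention chosen here as an assumption, and the function names are hypothetical. The default weights of 1 and 1e5 follow the content and style weights reported in the paper.

```python
from typing import List, Sequence

# Assumption: a feature map is C channels, each flattened to H*W activations.
FeatureMap = List[List[float]]


def gram_matrix(features: FeatureMap) -> List[List[float]]:
    """Channel-by-channel inner products, scaled by 1 / (C * H*W)."""
    channels, positions = len(features), len(features[0])
    scale = 1.0 / (channels * positions)
    return [
        [scale * sum(a * b for a, b in zip(row_i, row_j)) for row_j in features]
        for row_i in features
    ]


def mse(a: Sequence[Sequence[float]], b: Sequence[Sequence[float]]) -> float:
    """Mean squared error between two equally shaped 2-D arrays."""
    diffs = [(x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return sum(diffs) / len(diffs)


def content_loss(generated: FeatureMap, content: FeatureMap) -> float:
    """Distance between generated and content features at the content layer."""
    return mse(generated, content)


def style_loss(
    generated_layers: Sequence[FeatureMap], style_layers: Sequence[FeatureMap]
) -> float:
    """Sum over style layers of the distance between Gram matrices."""
    return sum(
        mse(gram_matrix(g), gram_matrix(s))
        for g, s in zip(generated_layers, style_layers)
    )


def total_loss(
    generated_content: FeatureMap,
    content: FeatureMap,
    generated_styles: Sequence[FeatureMap],
    styles: Sequence[FeatureMap],
    content_weight: float = 1.0,
    style_weight: float = 1e5,
) -> float:
    """Weighted sum of the content and style terms."""
    return content_weight * content_loss(
        generated_content, content
    ) + style_weight * style_loss(generated_styles, styles)


# Toy 2-channel, 2-position feature map used as a smoke check.
identity = [[1.0, 0.0], [0.0, 1.0]]
identity_gram = gram_matrix(identity)
zero_when_identical = total_loss(identity, identity, [identity], [identity])
```

Because the Gram matrix discards spatial positions and keeps only channel correlations, the style term responds to texture statistics rather than object layout, which is why the content term is computed on the raw features instead.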
Table 2: Hyperparameter Set
| Hyperparameter | Value/Range | Purpose | Consideration |
|---|---|---|---|
| Image Size | 512 × 512 pixels | Controls input resolution and processing speed | Larger images increase computational cost |
| Learning Rate | 0.01 | Step size for Adam optimizer | Can be adjusted for faster or smoother convergence |
| Total Steps | 300 | Number of optimization iterations | Higher steps improve quality but slow the process |
| Style Weight | 1e5 | Balances the influence of style loss | Higher values prioritize style over content |
| Content Weight | 1 | Balances the influence of content loss | Lower values preserve less content structure |
| Optimizer | Adam | Optimization algorithm | Suitable for direct image optimization |
| Style Layers | conv1_1, conv2_1, conv3_1, conv4_1, conv5_1 | Capture style at different scales | Multi-scale style extraction |
| Content Layer | conv4_2 | Captures detailed content structure | Best preserves object integrity |
| Normalization Mean | [0.485, 0.456, 0.406] | ImageNet mean for input normalization | Standard for pre-trained VGG models |
| Normalization Std | [0.229, 0.224, 0.225] | ImageNet std for input normalization | Matches the VGG model's training distribution |
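The hyperparameters in Table 2 can be gathered into a single configuration object and exercised on a toy problem. The sketch below is a hedged illustration rather than the paper's code: the class name `NSTConfig`, the helper names, and the toy objective (mean squared error to a fixed target, standing in for the VGG19-based loss) are assumptions, as are the Adam defaults of 0.9, 0.999, and 1e-8, which the paper does not report.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class NSTConfig:
    """Values taken from Table 2; the field names are hypothetical."""
    image_size: int = 512
    learning_rate: float = 0.01
    total_steps: int = 300
    style_weight: float = 1e5
    content_weight: float = 1.0
    style_layers: Tuple[str, ...] = (
        "conv1_1", "conv2_1", "conv3_1", "conv4_1", "conv5_1",
    )
    content_layer: str = "conv4_2"
    norm_mean: Tuple[float, float, float] = (0.485, 0.456, 0.406)
    norm_std: Tuple[float, float, float] = (0.229, 0.224, 0.225)


def normalize_pixel(rgb: Sequence[float], cfg: NSTConfig) -> Tuple[float, ...]:
    """Apply per-channel ImageNet normalization to one RGB pixel in [0, 1]."""
    return tuple((v - m) / s for v, m, s in zip(rgb, cfg.norm_mean, cfg.norm_std))


def mean_squared_error(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def adam_minimize(
    pixels: Sequence[float],
    target: Sequence[float],
    cfg: NSTConfig,
    beta1: float = 0.9,   # assumed default, not stated in the paper
    beta2: float = 0.999,  # assumed default, not stated in the paper
    eps: float = 1e-8,
) -> List[float]:
    """Update pixel values directly with Adam to minimize MSE to `target`."""
    x = list(pixels)
    m = [0.0] * len(x)  # first-moment (mean) estimates
    v = [0.0] * len(x)  # second-moment (uncentered variance) estimates
    for step in range(1, cfg.total_steps + 1):
        grads = [2.0 * (xi - ti) / len(x) for xi, ti in zip(x, target)]
        for i, g in enumerate(grads):
            m[i] = beta1 * m[i] + (1.0 - beta1) * g
            v[i] = beta2 * v[i] + (1.0 - beta2) * g * g
            m_hat = m[i] / (1.0 - beta1 ** step)  # bias correction
            v_hat = v[i] / (1.0 - beta2 ** step)
            x[i] -= cfg.learning_rate * m_hat / (math.sqrt(v_hat) + eps)
    return x


config = NSTConfig()
start = [0.0, 0.0, 0.0, 0.0]
target = [0.5, 0.5, 0.5, 0.5]
result = adam_minimize(start, target, config)
initial_loss = mean_squared_error(start, target)
final_loss = mean_squared_error(result, target)
```

Keeping the settings in one frozen dataclass makes it easy to vary a single value, such as the style weight or step count, and compare runs; in a real pipeline the gradient would come from backpropagating the total loss through VGG19 instead of the closed-form expression used here.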
ACKNOWLEDGMENT
We would like to express our sincere gratitude to our project guide for their valuable guidance, continuous support, and encouragement throughout the development of this project titled “Enhancing Neural Style to Multi-Style and Semantic Preservation.” Their expert advice, constructive suggestions, and technical insights helped us understand the concepts of Neural Style Transfer, semantic preservation, and multi-style blending in depth. We are also thankful to the Head of the Department and all faculty members of the Department of Computer Engineering for their constant motivation and academic support during the course of this work.
We extend our heartfelt appreciation to our institution for providing the necessary infrastructure, laboratory facilities, and resources required for the successful completion of this project. We also acknowledge the contributions of researchers whose work in the fields of Deep Learning, Computer Vision, Neural Style Transfer, and Semantic Segmentation served as valuable references for our study. Finally, we thank our friends, classmates, and family members for their encouragement, cooperation, and support throughout the project, which greatly contributed to its successful completion.
CONCLUSION
The implemented Neural Style Transfer system effectively transfers artistic styles onto content images while preserving structural integrity. By using the VGG19 network as a fixed feature extractor, the framework accurately captures both content and style representations. The combination of content and style loss functions ensures balanced stylization. The iterative optimization process successfully refines the generated image over multiple steps. Hyperparameters such as style weight, content weight, and learning rate significantly influence the stylization quality. The system shows flexibility for various styles and can be adapted for real-time applications with proper tuning. Additionally, the approach maintains semantic details of objects during the style transfer process. The Gram matrix-based style representation provides reliable texture matching. The architecture can be further extended to multi-style blending and video style transfer. Overall, this work lays a strong foundation for future enhancements in semantic-aware and region-specific style transfer systems.
REFERENCES
- Gatys, L. A., Ecker, A. S., & Bethge, M. (2016). Image style transfer using convolutional neural networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2414–2423.
- Chen, D., Liao, J., Yuan, L., Yu, N., & Hua, G. (2017). Coherent online video style transfer. Proceedings of the IEEE International Conference on Computer Vision (ICCV), 1105–1114.
- Huang, H., Zhang, H., Zhang, Y., Li, P., & Lin, W. (2019). Style mixer: Semantic-aware multistyle transfer network. IEEE Transactions on Image Processing, 28(12), 5960–5971.
- Sechenov, A., Chebotar, Y., & Lempitsky, V. (2021). Depth-aware neural style transfers for videos. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 1660–1669.
- Hua, Z., & Zhang, Y. (2021). Multi-attention network for arbitrary style transfer. International Conference on Neural Information Processing (ICONIP), 147–158.
- Zhang, M., Yang, X., Xu, Y., Yan, L., & Gao, X. (2022). CAST: Contrastive arbitrary style transfer. ACM Transactions on Graphics (TOG), 41(4), 1–10.
- Guo, X., & Hao, X. (2021). 3S-Net: Arbitrary semantic-aware style transfer with controllable ROI. 2021 IEEE International Conference on Image Processing (ICIP), 2783–2787.
- Hu, R., Zhao, L., Zhou, T., Wu, D., & Zhang, L. (2022). Latent style: Multi-style image transfer via latent style coding and skip connections. Signal, Image and Video Processing, 16(6), 1467–1475.
- Huo, Y., Gao, M., Li, W., & Qiao, Y. (2020). MAST: Manifold alignment for semantically aligned style transfer. Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 4462–4471.
- Zhao, S., Gallo, O., Frosio, I., & Kautz, J. (2017). Automatic semantic style transfer using deep convolutional neural networks and soft masks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 4160–4168.
- Huo, Y., Gao, M., Li, W., & Qiao, Y. (2020). MAST: Manifold alignment for semantically aligned style transfer. Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 4462–4471.
- Yao, Y., Hu, Y., Zhang, C., Wang, Z., Wang, B., & Fu, Y. (2019). Attention-aware multistroke style transfer. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 1467–1475.
- Xiang, T., Feng, W., Liu, D., & Zhang, W. (2023). NCCNet: Arbitrary neural style transfer with multi-channel conversion. International Conference on Image and Graphics (ICIG), 140–151.
