- DOI: https://doi.org/10.5281/zenodo.19788684
- Open Access
- Authors: Harsh Mahesh Antarkar, Pranav Karve, Ankit Yadav, Krishna Patil
- Paper ID: IJERTV15IS042180
- Volume & Issue: Volume 15, Issue 04, April 2026
- Published (First Online): 26-04-2026
- ISSN (Online): 2278-0181
- Publisher Name: IJERT
- License: This work is licensed under a Creative Commons Attribution 4.0 International License
AdMeme: An Agentic AI Framework for Automated Meme Advertisement Generation and Virality Prediction using Vision-Language Models
Harsh Mahesh Antarkar, Pranav Karve, Ankit Yadav, Krishna Patil
Department of Artificial Intelligence and Data Science, Vivekanand Education Society's Institute of Technology, Mumbai, India
Abstract: This paper presents AdMeme, an agentic artificial intelligence framework designed for the automated generation and evaluation of meme-based advertisements. The system integrates retrieval-augmented generation, workflow orchestration, and multimodal reasoning to enable scalable and context-aware meme creation.
The framework operates on a curated dataset of approximately 10,000 meme images collected from publicly available sources. To evaluate the effectiveness of generated memes, a hybrid virality prediction model is proposed, combining psychological engagement principles, visual memorability features, and content diffusion characteristics.
Unlike conventional approaches, the proposed system provides end-to-end automation, including caption generation, template rendering, semantic evaluation, and virality scoring. Qualitative evaluation demonstrates that the system is capable of producing coherent, contextually relevant, and marketing-oriented meme content. The results highlight the potential of agentic multimodal AI systems in digital marketing applications.
Introduction
The rise of social media platforms has significantly transformed the way information is created and shared. Among various forms of digital content, memes have emerged as a highly effective medium due to their simplicity, humor, and ability to convey messages rapidly.
In the context of marketing, memes serve as a powerful tool for engaging audiences, increasing brand visibility, and promoting viral dissemination of content. However, creating effective memes is a complex process that requires creativity, cultural awareness, and an understanding of evolving trends.
Traditional meme generation relies heavily on human intervention, making it difficult to scale content production for large-scale marketing campaigns. Additionally, evaluating the effectiveness of memes in terms of engagement and virality remains a challenging task.
To address these limitations, this paper introduces AdMeme, an agentic AI framework that automates the process of meme generation, evaluation, and ranking. The system leverages advancements in large language models and vision-language models to understand and generate multimodal content.
A key innovation of this work is the integration of a structured virality prediction model that captures multiple dimensions of meme effectiveness, including emotional impact, visual distinctiveness, and audience alignment. By combining these elements into a unified pipeline, the proposed system aims to provide a scalable solution for meme-based advertising.
Contributions
The primary contributions of this work are summarized as follows:
- Design and implementation of an end-to-end automated meme advertisement generation system.
- Integration of multimodal reasoning using a fine-tuned Llama 3.2 Vision-Instruct model.
- Development of a hybrid virality prediction model incorporating 13 engagement-related parameters.
- Deployment of a real-world pipeline using n8n workflow automation, Ollama-based LLMs, and external APIs.
- Qualitative evaluation demonstrating the feasibility of automated meme creation for marketing applications.
Related Work
Early research in meme analysis primarily focused on unimodal approaches. Text-based models such as BERT demonstrated strong capabilities in sentiment analysis and language understanding, while image-based models such as convolutional neural networks were effective in object recognition tasks.
However, these approaches fail to capture the interplay between textual and visual elements, which is essential for understanding meme semantics. Memes often rely on the interaction between image context and textual overlays to convey meaning, making unimodal approaches insufficient.
To address this limitation, multimodal models such as CLIP and VisualBERT were introduced. These models learn joint representations of image and text, enabling improved cross-modal understanding. Despite these advancements, early-fusion architectures often struggle with complex semantic relationships, particularly in cases involving sarcasm or contextual ambiguity.
Recent advances in vision-language models have introduced more sophisticated mechanisms for integrating visual and textual information. Instruction-tuned models allow for contextual reasoning and interpretation of multimodal inputs, making them suitable for tasks such as meme generation and evaluation.
The AdMeme framework builds on these developments by combining multimodal reasoning with workflow automation and structured virality modeling.
Methodology
Dataset
A multimodal dataset consisting of approximately 10,000 meme images was constructed from publicly available sources. The dataset includes a wide variety of meme templates and captions, covering different humor styles and contextual scenarios.
To ensure consistency, all images were resized to a fixed resolution and annotations were standardized. The dataset was split into training and testing subsets using an 80:20 ratio, enabling evaluation on unseen samples.
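As a minimal illustration of this preprocessing (the paper does not specify the tooling), the following Python sketch resizes images to a fixed resolution and performs the 80:20 split. The directory layout, JPEG-only glob, and 512x512 target size are assumptions.

```python
import random
from pathlib import Path
from PIL import Image

def preprocess_and_split(src_dir: str, out_dir: str, size=(512, 512),
                         train_ratio=0.8, seed=42) -> None:
    """Resize all memes to a fixed resolution and split 80:20 into train/test."""
    paths = sorted(Path(src_dir).glob("*.jpg"))      # assumed: JPEG images in one folder
    random.Random(seed).shuffle(paths)               # deterministic shuffle before splitting
    cut = int(len(paths) * train_ratio)
    for split, subset in (("train", paths[:cut]), ("test", paths[cut:])):
        dest = Path(out_dir) / split
        dest.mkdir(parents=True, exist_ok=True)
        for p in subset:
            Image.open(p).convert("RGB").resize(size).save(dest / p.name)
```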
System Implementation Pipeline
The AdMeme framework is implemented as a modular pipeline that integrates language models, workflow orchestration, and external APIs.
Conceptualization Phase: The system uses a large language model to generate context-aware meme captions from input prompts such as product descriptions and campaign objectives. Multiple candidate captions are generated to provide diversity in the outputs.
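The paper names Ollama-hosted LLMs as the generation backend but does not publish the prompt or model choice; the sketch below is a minimal illustration using Ollama's public REST endpoint (/api/generate), with the model name and prompt wording as assumptions.

```python
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def generate_captions(product: str, objective: str,
                      n: int = 5, model: str = "llama3.2") -> list[str]:
    """Ask a locally served LLM for several candidate meme captions."""
    prompt = (
        f"Write {n} short, funny meme captions advertising: {product}. "
        f"Campaign objective: {objective}. Return one caption per line."
    )
    resp = requests.post(
        OLLAMA_URL,
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=120,
    )
    resp.raise_for_status()
    text = resp.json()["response"]
    # Strip bullet characters the model may prepend to each line.
    return [line.strip("-• ").strip() for line in text.splitlines() if line.strip()]
```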
Data Transformation Phase: The generated captions are processed to ensure compatibility with the meme templates. This involves mapping textual content to template-specific regions and validating the input structure.
Rendering Phase: The processed captions are passed to an image rendering API, which embeds the text into predefined meme templates to produce the final images.
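The specific rendering API is not identified in the paper. As one concrete possibility, the sketch below uses Imgflip's public caption_image endpoint; the template ID and account credentials are placeholders.

```python
import requests

IMGFLIP_URL = "https://api.imgflip.com/caption_image"

def render_meme(template_id: str, top_text: str, bottom_text: str,
                username: str, password: str) -> str:
    """Embed caption text into a predefined template; return the rendered image URL."""
    payload = {
        "template_id": template_id,
        "username": username,   # Imgflip account credentials (placeholders)
        "password": password,
        "text0": top_text,      # text placed in the template's top region
        "text1": bottom_text,   # text placed in the bottom region
    }
    resp = requests.post(IMGFLIP_URL, data=payload, timeout=30)
    resp.raise_for_status()
    body = resp.json()
    if not body.get("success"):
        raise RuntimeError(body.get("error_message", "render failed"))
    return body["data"]["url"]
```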
Post-Processing and Enrichment: Generated meme outputs are filtered to remove unsuccessful or low-quality results. Valid outputs are prepared for further evaluation using multimodal models.
Virality Scoring: Each generated meme is evaluated using a structured scoring mechanism that considers contextual relevance, readability, and template effectiveness:
$V_s = (C_w \cdot R) + (T_{\text{score}} \cdot 0.4) + (F_{\text{match}} \cdot 0.2) \quad (1)$
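The paper does not define the symbols in Eq. (1) beyond naming contextual relevance, readability, and template effectiveness as the scored dimensions. A minimal sketch follows, assuming C_w is a contextual-relevance weight, R a readability score, T_score the template score, and F_match a format-match score, each normalized to [0, 1]; this mapping is an interpretation, not taken verbatim from the paper.

```python
def structured_score(c_w: float, readability: float,
                     t_score: float, f_match: float) -> float:
    """Eq. (1): structured score from contextual, readability, and template terms.

    All inputs are assumed normalized to [0, 1]; the symbol-to-feature
    mapping is an assumption, since the paper leaves the symbols undefined.
    """
    return (c_w * readability) + (t_score * 0.4) + (f_match * 0.2)
```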
Virality Prediction Model
To estimate meme virality, a hybrid model is proposed that integrates psychological engagement principles with visual and contextual features.
$V = w_1 SC + w_2 TR + w_3 EM + w_4 PB + w_5 PV + w_6 ST + w_7 IM + w_8 VD + w_9 OP + w_{10} SU + w_{11} AA + w_{12} CN + w_{13} SP \quad (2)$
This formulation captures multiple dimensions influencing meme propagation, including emotional impact, visual distinctiveness, and audience alignment.
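The 13 parameters appear only as abbreviations in Eq. (2). The sketch below shows the weighted-sum form; the feature-name expansions in the comment are illustrative guesses, and the uniform weights are placeholders, since the paper does not publish the weight values.

```python
# Abbreviations follow Eq. (2); the expansions are illustrative guesses,
# e.g. EM = emotional impact, VD = visual distinctiveness, AA = audience alignment.
FEATURES = ["SC", "TR", "EM", "PB", "PV", "ST", "IM",
            "VD", "OP", "SU", "AA", "CN", "SP"]

def predict_virality(scores: dict[str, float],
                     weights: dict[str, float] | None = None) -> float:
    """Eq. (2): V = sum_i w_i * x_i over the 13 engagement parameters."""
    if weights is None:
        # Placeholder: uniform weights; the paper's actual weights are not reported.
        weights = {f: 1.0 / len(FEATURES) for f in FEATURES}
    return sum(weights[f] * scores.get(f, 0.0) for f in FEATURES)
```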
Training Configuration
The multimodal model is fine-tuned using a parameter-efficient LoRA approach applied to the Llama 3.2 Vision-Instruct model. LoRA enables efficient adaptation of large models without modifying all parameters.
Training is conducted with a learning rate of 2 × 10⁻⁴, a batch size of 4, and 600 training steps. Gradient accumulation is used to improve training stability, while mixed-precision training is employed to optimize computational efficiency.
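The paper reports only these headline hyperparameters. Below is a minimal sketch of an equivalent configuration using the Hugging Face peft and transformers libraries; the LoRA rank, alpha, target modules, and accumulation steps are assumptions not stated in the paper.

```python
from peft import LoraConfig, get_peft_model
from transformers import TrainingArguments

lora_config = LoraConfig(
    r=16,                                 # rank: assumed, not reported in the paper
    lora_alpha=32,                        # scaling factor: assumed
    target_modules=["q_proj", "v_proj"],  # assumed attention projections
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="admeme-lora",
    learning_rate=2e-4,                   # reported
    per_device_train_batch_size=4,        # reported
    max_steps=600,                        # reported
    gradient_accumulation_steps=4,        # assumed value
    fp16=True,                            # mixed-precision training
)

# model = get_peft_model(base_model, lora_config)
# where base_model is the loaded Llama 3.2 Vision-Instruct checkpoint.
```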
Experimental Results
System-Level Evaluation
The system was evaluated using real-world input scenarios to assess its ability to generate relevant and coherent meme advertisements. In these scenarios, the pipeline successfully produced multiple outputs aligned with the input context.
Qualitative Results
The generated memes demonstrate strong alignment between visual templates and textual captions. This indicates that the model effectively captures the multimodal relationships required for meme understanding.
Fig. 1. Sample generated meme demonstrating contextual relevance and alignment between visual and textual elements.
Discussion of Results
The results highlight the capability of the system to generate context-aware meme content. The integration of virality scoring provides additional interpretability by enabling ranking of the generated outputs.
Discussion and Limitations
The AdMeme framework demonstrates the feasibility of automated meme generation using multimodal AI systems. However, meme interpretation is influenced by cultural context and rapidly evolving trends, which remain challenging for automated models.
Additionally, the virality prediction model relies on heuristic parameters and does not incorporate real-world engagement data, which may affect prediction accuracy.
Conclusion and Future Scope
This paper presented AdMeme, an agentic AI framework for automated meme advertisement generation and evaluation. The system integrates multimodal reasoning, workflow automation, and structured virality modeling to enable scalable meme creation.
The qualitative evaluation demonstrates the effectiveness of the proposed approach in generating relevant and engaging meme content. Future work will focus on incorporating real-world engagement data, expanding the dataset, and extending the framework to support video-based memes.
References
[1] A. Radford et al., "Learning Transferable Visual Models From Natural Language Supervision," ICML, 2021.
[2] D. Kiela et al., "The Hateful Memes Challenge: Detecting Hate Speech in Multimodal Memes," NeurIPS, 2020.
[3] E. J. Hu et al., "LoRA: Low-Rank Adaptation of Large Language Models," ICLR, 2022.
[4] J. Devlin et al., "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding," NAACL, 2019.
[5] K. He et al., "Deep Residual Learning for Image Recognition," CVPR, 2016.
[6] J. Berger, Contagious: Why Things Catch On, 2013.
[7] A. Khosla et al., "Understanding and Predicting Image Memorability at a Large Scale," ICCV, 2015.
[8] D. Watts, "Global Cascades," 2008.
