DOI : https://doi.org/10.5281/zenodo.20084728
- Open Access
- Authors : Ms. Gouthami, K. Saanvi, M. Divya Bharathi, Arekanti Mercy, Chavan Supriya, Nenavath Sreelatha
- Paper ID : IJERTV15IS050480
- Volume & Issue : Volume 15, Issue 05 , May – 2026
- Published (First Online): 08-05-2026
- ISSN (Online) : 2278-0181
- Publisher Name : IJERT
- License:
This work is licensed under a Creative Commons Attribution 4.0 International License
Ad Genie: A Multimodal Generative AI Framework for Automated Marketing Campaign Creation Using Product Images, Textual Prompts, and Web Intelligence
Ms. Gouthami
Project Guide, Assistant Professor
Dept. of Computer Science and Engineering
Keshav Memorial Institute of Technology, Hyderabad, Telangana, India
K. Saanvi
Author, Student (UG Scholar)
Dept. of Computer Science and Engineering
Keshav Memorial Institute of Technology, Hyderabad, Telangana, India
M. Divya Bharathi
Co-Author, Student (UG Scholar)
Dept. of Computer Science and Engineering
Keshav Memorial Institute of Technology, Hyderabad, Telangana, India
Arekanti Mercy
Co-Author, Student (UG Scholar)
Dept. of Computer Science and Engineering
Keshav Memorial Institute of Technology, Hyderabad, Telangana, India
Chavan Supriya
Co-Author, Student (UG Scholar)
Dept. of Computer Science and Engineering
Keshav Memorial Institute of Technology, Hyderabad, Telangana, India
Nenavath Sreelatha
Co-Author, Student (UG Scholar)
Dept. of Computer Science and Engineering
Keshav Memorial Institute of Technology, Hyderabad, Telangana, India
Abstract
Digital marketing requires product understanding, customer insight, competitive awareness, creative writing, and platform-specific communication. For small businesses, independent sellers, student entrepreneurs, freelancers, and influencers, producing effective campaigns is difficult because it demands both creativity and continuous market research. Existing AI copywriting tools can generate promotional text, but many depend mainly on text prompts, produce generic outputs, and do not fully incorporate visual product cues or real-time market context. This paper presents Ad Genie, a multimodal generative AI framework for automated marketing campaign creation. The proposed system accepts a product image and a campaign or product description as input, extracts visual and semantic features using vision-language models, generates search queries for market intelligence, retrieves trend and review-oriented information from online sources, and produces structured campaign assets using a large language model. The generated outputs include social media posts, a blog concept, a short promotional video script, a target audience persona, market trends, a sentiment summary, and structured intermediate results. The prototype demonstrates how multimodal AI, natural language processing, computer vision, retrieval-augmented generation, and web intelligence can be integrated into a unified workflow for context-aware marketing assistance. The work contributes a practical architecture for AI-driven campaign automation and identifies future directions such as multilingual generation, brand voice learning, automatic publishing, analytics, and AI-assisted video production.
Keywords: Multimodal AI, Generative AI, Digital Marketing, Web Intelligence, Vision-Language Models, Retrieval-Augmented Generation, Campaign Automation, Customer Persona, Content Strategy.
Introduction
Digital marketing has become a central component of modern commerce. Online sellers, social media creators, small businesses, and startups depend on product visibility across e-commerce marketplaces, short-video platforms, social networks, blogs, and search engines. A product's success is influenced not only by quality but also by how effectively it is presented to the intended audience. Product images, captions, customer reviews, hashtags, blog narratives, short videos, and influencer-style messages collectively shape purchase decisions.
Creating high-quality marketing content is a multidisciplinary task. It requires product interpretation, knowledge of customer psychology, awareness of market trends, competitor analysis, platform-specific writing ability, and creative storytelling. Large companies often rely on dedicated marketing teams, analytics tools, and creative agencies. In contrast, small sellers and independent creators frequently lack the resources and expertise needed to conduct systematic market research and produce polished campaign material. Their promotional content may therefore become generic, inconsistent, or poorly aligned with customer expectations.
Recent progress in large language models (LLMs) has made it possible to generate fluent marketing copy from natural language prompts [9, 10, 11]. At the same time, vision-language models have improved the ability of AI systems to interpret images and connect visual information with text [3, 4, 5, 6]. Retrieval-augmented generation and web intelligence techniques further allow AI systems to ground their outputs in external information such as reviews, trends, news, and competitor activity [7]. These developments create an opportunity to move beyond simple text generation and toward complete, context-aware campaign generation.
This paper proposes Ad Genie, a multimodal AI agent for intelligent campaign creation. The system accepts two primary inputs: a product image and a textual campaign description. It analyzes the product visually and semantically, retrieves relevant market information from online sources, extracts customer and trend insights, and generates a structured campaign package. The current prototype demonstrates this workflow through a web interface where users upload a product image, describe a campaign goal, and receive organized outputs including social media posts, a blog concept, a video script, an audience persona, and market analysis.
The central research question addressed in this paper is:
How can multimodal product understanding and web intelligence be integrated with generative AI to automate the creation of context-aware digital marketing campaigns?
The main contributions of this work are:
- A unified multimodal campaign generation pipeline that combines product image understanding, textual campaign intent, web intelligence, and generative AI.
- A modular system architecture consisting of input validation, multimodal feature extraction, query generation, market insight extraction, audience strategy generation, and content rendering.
- A practical prototype that demonstrates campaign creation from a product image and campaign topic through a user-friendly web interface.
- A structured output design that produces promotional copy, market trends, persona insights, blog concepts, video scripts, and machine-readable intermediate results.
- A research-oriented evaluation framework for comparing multimodal web-grounded generation with text-only LLM prompting and manual campaign drafting.
Background and Related Work
Generative AI in Digital Marketing
Generative AI has rapidly entered the marketing domain because LLMs can produce fluent text, summarize information, rewrite content in different tones, and generate creative ideas [2, 11, 12]. Marketing applications include ad copywriting, product descriptions, email campaigns, blog outlines, customer support responses, and social media captions. LLMs are particularly useful for reducing the time required for brainstorming and first-draft creation.
Despite these advantages, prompt-based content generation has limitations. If an LLM receives only a short product description, it may generate content that sounds polished but lacks product-specific detail, current market context, or audience precision. The generated text may also contain hallucinated, unsupported claims, which in marketing can mislead customers or damage brand trust. Marketing generation systems therefore benefit from grounding mechanisms that connect generated content to product attributes and external evidence.
Multimodal AI and Vision-Language Models
Product marketing is naturally multimodal. A product image communicates color, shape, aesthetic style, material, usage context, and emotional tone. A product description communicates functional details, intended use cases, technical specifications, and brand messaging. A campaign generation system that uses only text misses visual cues that are important for creating relevant content.
Vision-language models such as CLIP, BLIP, LLaVA, and GPT-4o-style multimodal systems have shown that visual and textual information can be represented and reasoned about jointly [3, 4, 5, 12]. CLIP aligns images and text in a shared embedding space, BLIP supports image captioning and vision-language understanding, and LLaVA-style systems connect visual encoders with language models to enable visual question answering and multimodal reasoning. These models make it possible to extract product-level attributes such as dominant colors, product category, style, use case, and visual mood.
Prior work on image advertisements also shows that visual information can improve the interpretation of advertising symbolism and creative intent [15]. This supports the design choice of treating product images as first-class inputs rather than optional decoration.
In Ad Genie, multimodal analysis is used to convert a product image and text prompt into a richer product representation. This representation informs both market query generation and campaign content generation.
Web Intelligence and Retrieval-Augmented Generation
Marketing content must be sensitive to current trends. Customer preferences, competitor positioning, seasonal demands, and social media discussions change frequently. Static model knowledge alone is not sufficient for trend-aware marketing. Retrieval-augmented generation addresses this issue by combining generative models with external information retrieval [7].
For marketing applications, web intelligence may include search results, product reviews, frequently asked questions, competitor listings, news articles, social media trends, and video review transcripts. This information can be analyzed for sentiment, pain points, keywords, and emerging customer needs. A web-grounded campaign system can then generate content that is more relevant than a purely prompt-based system.
Sentiment Analysis, Personas, and Strategy
Effective campaigns require an understanding of target audiences. Customer personas summarize demographic, psychographic, behavioral, and motivational characteristics of potential buyers [13, 14]. Sentiment analysis identifies whether customer discussions are positive, negative, or neutral [8]. Keyword extraction and trend mining identify the language customers use when searching for or discussing products.
In Ad Genie, the system generates an audience persona using product semantics, market trends, and campaign intent. The persona includes likely age group, user interests, needs, pain points, psychographics, and recommended channels. This makes the generated output more strategic than simple ad copy.
Connection to SOMONITOR
The base paper used for this project, SOMONITOR, presents a framework for marketing analytics using explainable AI, CTR prediction, LLM-based content pillar extraction, persona mining, communication theme mining, and data-driven story generation [1]. SOMONITOR demonstrates how LLMs can support marketing workflows by processing large amounts of advertising content, identifying audience segments, and creating actionable content briefs.
Ad Genie is conceptually related to SOMONITOR but differs in its primary objective. SOMONITOR focuses on monitoring and analyzing existing marketing content and campaign performance. Ad Genie focuses on generating a new campaign package from product-level input. In this sense, Ad Genie adapts the broader idea of AI-assisted marketing intelligence into a product-centered, multimodal campaign generation workflow.
State-of-the-Art Positioning
The current state of the art in AI-assisted marketing is shaped by four converging research directions: large language models for natural language generation, vision-language models for multimodal product understanding, retrieval-augmented generation for grounding responses in external knowledge, and explainable marketing analytics for converting data into actionable strategy. Ad Genie is positioned at the intersection of these directions.
Table 1 summarizes how the proposed framework differs from adjacent approaches. Text-only LLM tools are fast and useful for first drafts, but they depend heavily on prompt quality and may miss product-specific visual signals. Vision-language models can describe product appearance, but they do not independently produce a complete marketing strategy. Retrieval-augmented generation grounds outputs in external sources, but it must be connected to domain-specific insight extraction to become useful for marketing. SOMONITOR and related explainable advertising systems focus on analyzing existing campaigns and competitor content. Ad Genie combines these streams into a product-level campaign generation workflow.
Research Gap
Although AI marketing tools and LLM-based copywriting systems are increasingly available, several gaps remain:
- Many systems are text-only and do not use product images for campaign creation.
- Many generated outputs are generic because they are not grounded in real-time market context.
- Most tools generate isolated pieces of content rather than a complete campaign strategy.
- Existing tools may not produce explicit audience personas, pain points, market trends, and recommended channels.
- Small sellers require simple end-to-end workflows rather than separate tools for research, analysis, writing, and formatting.
- Pure LLM systems may hallucinate claims when no external grounding is provided.
- Marketing analytics frameworks often analyze existing campaign data but do not directly generate ready-to-use product campaign assets.
Ad Genie addresses these gaps by integrating multimodal product understanding, web intelligence, strategic insight extraction, and structured generative output in a single workflow.
Proposed System
Ad Genie is designed as a modular AI framework for automated campaign creation. The system receives multimodal input, processes it through specialized modules, and produces both human-readable and machine-readable outputs.
System Objectives
The major objectives are:
- Automate the generation of digital marketing campaign assets.
- Combine product image understanding with textual product or campaign descriptions.
- Retrieve relevant market information from online sources.
- Extract trends, keywords, sentiment, competitor insights, and audience needs.
- Generate structured outputs such as social media posts, blog concepts, video scripts, and target personas.
- Provide a simple interface suitable for non-technical users.
Target Users
The system is intended for small e-commerce sellers, local businesses, student entrepreneurs, freelancers, digital marketers, influencers, personal-brand builders, and marketing agencies seeking rapid campaign drafts.
High-Level Workflow
Figure 1 shows the overall workflow. The system begins with a user-provided product image and campaign description, validates the inputs, extracts visual and semantic features, retrieves web intelligence, generates insights, and produces campaign-ready assets.
Table 1: State-of-the-art positioning of Ad Genie against adjacent AI marketing approaches.
| Approach | Primary Capability | Limitation for Small-Seller Campaign Creation | Ad Genie Extension |
| --- | --- | --- | --- |
| Text-only LLM copywriting | Generates fluent captions, blogs, and ad copy from prompts | Often generic; lacks image grounding and live market context | Adds product image analysis, market retrieval, persona generation, and structured outputs |
| Vision-language product analysis | Extracts visual attributes and image captions from product photos | Does not automatically generate full campaign strategy | Uses visual features as input to query generation, audience strategy, and campaign generation |
| Retrieval-augmented generation | Grounds generated responses in external documents or search results | Requires domain-specific retrieval and summarization design | Converts retrieved market signals into trends, sentiment, keywords, and content angles |
| Explainable marketing analytics | Analyzes existing campaigns, audiences, and competitor content | Primarily analytic; not designed as a product-to-campaign generator | Adapts explainable marketing insight into an end-to-end campaign creation agent |
| Ad Genie full pipeline | Integrates image, text, retrieval, insight extraction, and generation | Requires broader benchmarking and production hardening | Provides a unified prototype for context-aware, product-specific campaign drafting |
Product image and campaign description -> Input validation -> Multimodal product understanding -> Intelligent query generation -> Web intelligence retrieval -> Insight extraction -> Audience strategy generation -> Campaign content generation -> Output rendering and export
Figure 1: End-to-end workflow of the proposed Ad Genie framework.
Methodology
Input Layer
The input layer accepts a product image and a textual campaign or product description. The image may be uploaded in standard formats such as JPG, JPEG, PNG, or WEBP. The text description captures the user's campaign goal or product context. For example, a user may enter: "gifting a coffee mug to a friend". The system may also provide an optional manual image description field when an image is not available or when the vision model is disabled.
The input layer validates that the image format is supported, the uploaded file is not corrupted, the text input is not empty, and the system has enough information to perform analysis.
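The validation checks described for the input layer can be sketched as follows. This is a minimal illustration rather than the prototype's actual code: the function names `looks_like_image` and `validate_inputs` are hypothetical, and the corruption check is approximated by comparing the first bytes of the file against the standard JPEG, PNG, and WEBP signatures. The optional manual image description path is not covered.

```python
from pathlib import Path

# Formats named in the text. The byte signatures below are the standard
# file headers for each format; everything else here is an assumption.
ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp"}


def looks_like_image(data: bytes) -> bool:
    """Cheap corruption check: does the file start with a known image signature?"""
    if data.startswith(b"\xff\xd8\xff"):  # JPEG
        return True
    if data.startswith(b"\x89PNG\r\n\x1a\n"):  # PNG
        return True
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":  # WEBP
        return True
    return False


def validate_inputs(filename: str, image_bytes: bytes, description: str) -> list[str]:
    """Return a list of problems; an empty list means the inputs pass."""
    problems = []
    if Path(filename).suffix.lower() not in ALLOWED_EXTENSIONS:
        problems.append("unsupported image format")
    elif not looks_like_image(image_bytes):
        problems.append("image file appears corrupted")
    if not description.strip():
        problems.append("campaign description is empty")
    return problems
```

Returning a list of problems instead of raising on the first failure lets a user interface show every issue at once. A production system would more likely decode the image fully with an imaging library instead of trusting the header.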
Multimodal Product Understanding
The multimodal module analyzes both image and text. The vision component extracts product category, dominant colors, physical design, aesthetic style, mood, and usage context. The text component extracts campaign theme, subthemes, product benefits, target use case, brand tone, and constraints such as affordability, energy, elegance, sustainability, or gift suitability.
The outputs from both components are fused into a compact structured representation, as shown in Listing 1.
{
  "topic_semantics": {
    "main_theme": "gifting",
    "subthemes": ["friendship", "coffee", "mug"],
    "brand_tone": "modern"
  },
  "visual_aesthetic": {
    "colors": ["white", "gray"],
    "style_keywords": ["minimalist", "clean"],
    "mood": "invigorating"
  }
}
Listing 1: Example structured representation produced by the multimodal module.
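One way to picture the fusion step is a function that takes the vision output and the text output as separate dictionaries and arranges them into the two-part structure of Listing 1. The sketch below is an assumption about how this could be done: `fuse_representation` and the flat input dictionaries are hypothetical, and only the output keys come from the listing.

```python
import json


def fuse_representation(vision: dict, text: dict) -> dict:
    """Arrange vision and text outputs into the structure shown in Listing 1.

    Missing fields fall back to empty values so downstream modules can rely
    on every key being present.
    """
    return {
        "topic_semantics": {
            "main_theme": text.get("main_theme", ""),
            "subthemes": list(text.get("subthemes", [])),
            "brand_tone": text.get("brand_tone", ""),
        },
        "visual_aesthetic": {
            "colors": list(vision.get("colors", [])),
            "style_keywords": list(vision.get("style_keywords", [])),
            "mood": vision.get("mood", ""),
        },
    }


# Values taken from the coffee mug example in Listing 1.
vision_output = {
    "colors": ["white", "gray"],
    "style_keywords": ["minimalist", "clean"],
    "mood": "invigorating",
}
text_output = {
    "main_theme": "gifting",
    "subthemes": ["friendship", "coffee", "mug"],
    "brand_tone": "modern",
}
representation = fuse_representation(vision_output, text_output)
print(json.dumps(representation, indent=2))
```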
Intelligent Query Generation
After product understanding, the system generates search queries for market intelligence. These queries are derived from the product category, campaign theme, customer use case, and visual attributes. For a coffee mug gift example, possible queries include "personalized coffee mug gift trends", "best gifts for coffee lovers", "unique gifts for friends", "coffee mug customer reviews", and "trending personalized gift ideas".
Query generation improves retrieval relevance because the system searches for market-specific context rather than relying on the user's original prompt alone.
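A simple template-based version of query generation is sketched below. The paper does not say whether queries come from templates or from an LLM, so the templates, the function name `generate_queries`, and its parameters are all illustrative assumptions.

```python
def generate_queries(
    product: str,
    theme: str,
    style_keywords: list[str],
    max_queries: int = 5,
) -> list[str]:
    """Build search queries from product, theme, and visual style keywords."""
    candidates = [
        f"{product} {theme} trends",
        f"{product} customer reviews",
        f"trending {theme} ideas",
    ]
    # Visual attributes contribute their own queries, one per style keyword.
    candidates += [f"{style} {product}" for style in style_keywords]

    # Normalize case and whitespace, then drop duplicates in first-seen order.
    seen, queries = set(), []
    for query in candidates:
        normalized = " ".join(query.lower().split())
        if normalized not in seen:
            seen.add(normalized)
            queries.append(normalized)
    return queries[:max_queries]


queries = generate_queries("coffee mug", "gifting", ["minimalist", "clean"])
```

Capping the number of queries keeps retrieval cost predictable when external search APIs are rate limited.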
Web Intelligence Retrieval
The web intelligence module retrieves data from online sources such as search engines, review pages, news sources, social platforms, and product listings. Depending on implementation and API availability, sources may include search engine results, e-commerce reviews, product FAQs, social media discussions, news articles, and YouTube review content or transcripts.
The retrieved data is cleaned before analysis. Cleaning may include removing HTML tags, duplicate results, irrelevant links, advertisements, stopwords, and noisy text.
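Part of the cleaning step can be sketched with the standard library alone. The example below covers HTML tag removal, stopword removal, and duplicate removal; the tiny stopword set and the regex-based tag stripping are simplifications chosen for illustration, and filtering of advertisements or irrelevant links is not attempted.

```python
import html
import re

# Illustrative stopword list only; a real system would use a fuller one.
STOPWORDS = {"a", "an", "the", "and", "or", "of", "for", "to", "is", "in"}
TAG_PATTERN = re.compile(r"<[^>]+>")


def clean_snippet(raw: str) -> str:
    """Strip HTML tags and entities, lowercase, and drop stopwords."""
    text = html.unescape(TAG_PATTERN.sub(" ", raw))
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return " ".join(token for token in tokens if token not in STOPWORDS)


def clean_results(snippets: list[str]) -> list[str]:
    """Clean every snippet and drop empty or duplicate results, keeping order."""
    cleaned, seen = [], set()
    for snippet in snippets:
        text = clean_snippet(snippet)
        if text and text not in seen:
            seen.add(text)
            cleaned.append(text)
    return cleaned
```

Deduplicating after normalization means two results that differ only in markup or letter case are treated as the same result.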
Market Insight Extraction
The insight extraction module transforms retrieved data into actionable marketing intelligence. It identifies market trends, customer pain points, common product expectations, customer sentiment, competitor strengths and weaknesses, SEO keywords, hashtags, and recommended content angles. For the prototype coffee mug case, the system identified trends such as personalized gifts, experiential gifts, and subscription services. The market sentiment was marked as neutral, and the system recognized the gift as thoughtful but potentially not unique enough for some recipients.
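To make the keyword and sentiment steps concrete, the sketch below counts frequent terms and scores sentiment with a very small hand-written lexicon. This is a stand-in for illustration only: the paper cites rule-based sentiment analysis [8] but does not specify the method used in the prototype, and the word lists and the name `extract_insights` are assumptions.

```python
from collections import Counter

# Tiny illustrative lexicons; real lexicons are far larger.
POSITIVE = {"love", "great", "thoughtful", "cute", "perfect"}
NEGATIVE = {"boring", "generic", "broke", "cheap", "disappointing"}


def extract_insights(documents: list[str], top_k: int = 3) -> dict:
    """Return the most frequent non-sentiment terms and an overall sentiment label."""
    tokens = [token for doc in documents for token in doc.lower().split()]
    sentiment_words = POSITIVE | NEGATIVE

    # Keywords: most frequent terms, excluding the sentiment words themselves.
    keyword_counts = Counter(t for t in tokens if t not in sentiment_words)

    # Sentiment: positive hits minus negative hits across all documents.
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    label = "positive" if score > 0 else "negative" if score < 0 else "neutral"
    return {
        "keywords": [word for word, _ in keyword_counts.most_common(top_k)],
        "sentiment": label,
        "sentiment_score": score,
    }
```

With mixed feedback such as one "thoughtful" and one "generic" mention, the score cancels to zero and the label is neutral, which mirrors the kind of mixed signal reported for the coffee mug case.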
Audience Strategy Generation
The audience strategy module generates a target persona and recommended channels. A persona may include a name, demographic profile, psychographic characteristics, needs and pain points, buying motivations, and recommended platforms. For the coffee mug example, the prototype generated a persona named "Coffee-Loving Friends", with demographics such as young adults aged 18-35, urban dwellers, coffee enthusiasts, and professionals. The recommended channels included Instagram, TikTok, Pinterest, and Facebook Groups.
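The persona fields listed above map naturally onto a small data structure. The dataclass below is a hypothetical schema, not the prototype's actual one; the field names are assumptions, while the example values are taken from the coffee mug case.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class Persona:
    """Hypothetical persona schema following the fields named in the text."""

    name: str
    demographics: list[str]
    psychographics: list[str] = field(default_factory=list)
    needs_and_pain_points: list[str] = field(default_factory=list)
    buying_motivations: list[str] = field(default_factory=list)
    recommended_channels: list[str] = field(default_factory=list)


persona = Persona(
    name="Coffee-Loving Friends",
    demographics=[
        "young adults aged 18-35",
        "urban dwellers",
        "coffee enthusiasts",
        "professionals",
    ],
    recommended_channels=["Instagram", "TikTok", "Pinterest", "Facebook Groups"],
)

# asdict gives a plain dictionary that can be serialized to JSON for export.
persona_json = asdict(persona)
```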
Campaign Content Generation
The final generative module produces campaign-ready assets. The system generates three social media posts or tweets, a blog title and outline, a blog concept summary, a short video script with scenes and voiceover cues, a structured campaign summary, and JSON output for debugging, export, or integration.
The content generator uses the structured product representation and market insights as input. This approach reduces generic output by grounding the LLM in product-specific and market-specific context.
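The grounding idea can be illustrated by how a prompt might be assembled before it is sent to the language model. The wording of the prompt, the function name `build_generation_prompt`, and the requested output keys are assumptions made for this sketch; no model call is shown.

```python
import json


def build_generation_prompt(representation: dict, insights: dict, n_posts: int = 3) -> str:
    """Embed the product representation and market insights in the prompt text."""
    context = json.dumps(
        {"product": representation, "market": insights},
        indent=2,
        sort_keys=True,
    )
    return (
        "You are a marketing assistant. Use ONLY the context below and "
        "do not invent product claims.\n\n"
        f"Context:\n{context}\n\n"
        "Tasks:\n"
        f"1. Write {n_posts} social media posts with hashtags.\n"
        "2. Propose a blog title and outline.\n"
        "3. Write a short video script with scenes and voiceover cues.\n"
        "Return the result as JSON with keys: posts, blog, video_script."
    )


prompt = build_generation_prompt(
    {"main_theme": "gifting", "colors": ["white", "gray"]},
    {"sentiment": "neutral", "trends": ["personalized gifts", "experiential gifts"]},
)
```

Passing the context as serialized JSON keeps the grounding data machine-readable, so the same structure can be logged alongside the generated output for later review.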
System Architecture
The system follows a modular architecture. The major components are:
- Client/UI Layer: Provides fields for campaign topic, product image upload, optional manual image description, model settings, and result display.
- Input Validator: Checks file type, input completeness, and basic constraints.
- Multimodal Engine: Uses vision-language models to extract visual and semantic information.
- Query Generator: Produces search queries based on the product representation.
- Web Intelligence Module: Retrieves market data from online sources.
- NLP Insight Extractor: Performs sentiment analysis, keyword extraction, trend detection, and competitor summarization.
- Content Generator: Uses LLMs to produce campaign assets.
- Output Renderer: Displays the generated insights in structured tabs and provides export options.
Table 2: Module-wise technology mapping for the proposed system.
| Module | Representative Technologies |
| --- | --- |
| User interface | Streamlit prototype; extensible to React-based frontend |
| Vision analysis | BLIP, LLaVA, CLIP-style vision-language models |
| Text strategy | LLaMA/GPT-style large language models |
| Web intelligence | DuckDuckGo, Serper API, Bing Search API, review/news sources |
| Insight extraction | Sentiment analysis, keyword extraction, trend mining |
| Output rendering | Tabbed UI, JSON data view, text/PDF export |
User Interface -> Input Validator -> Multimodal Engine -> Web Intelligence Module -> Insight Extractor -> Content Generator -> Output Renderer
Figure 2: Modular architecture of Ad Genie.
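The modular design can be read as a chain of stages that each add keys to a shared state. The sketch below shows that pattern with three placeholder stages; the stage functions, the state keys, and `run_pipeline` are hypothetical and stand in for the real modules.

```python
from typing import Callable

Stage = Callable[[dict], dict]


def validate(state: dict) -> dict:
    """Input Validator placeholder: reject an empty campaign topic."""
    if not state.get("topic", "").strip():
        raise ValueError("campaign topic is empty")
    return {}


def understand(state: dict) -> dict:
    """Multimodal Engine placeholder: a real stage would call a vision-language model."""
    return {"representation": {"topic": state["topic"]}}


def make_queries(state: dict) -> dict:
    """Query Generator placeholder: derive one query from the representation."""
    return {"queries": [f"{state['representation']['topic']} trends"]}


STAGES: list[tuple[str, Stage]] = [
    ("input_validator", validate),
    ("multimodal_engine", understand),
    ("query_generator", make_queries),
]


def run_pipeline(state: dict, stages: list[tuple[str, Stage]]) -> dict:
    """Run stages in order, merging each stage's output into the state."""
    trace = []
    for name, stage in stages:
        state = {**state, **stage(state)}
        trace.append(name)
    return {**state, "trace": trace}


result = run_pipeline({"topic": "gifting a coffee mug"}, STAGES)
```

Because every stage has the same signature, a module such as the web intelligence retriever can be swapped or disabled by editing the stage list alone, which is the practical benefit of the modular layout.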
Deployment View
The prototype is demonstrated as a local web application. The interface shown in the project screenshots runs at localhost:8501, which is the default Streamlit port and is consistent with a Streamlit-based prototype. The broader architecture can also be extended to a production environment using a separate frontend, a backend API layer, a cloud GPU runtime, and external APIs.
A scalable deployment can consist of a client browser, a web server or backend API, an AI inference runtime, search and scraping APIs, external data sources, and an output storage or export layer.
Prototype Implementation
User Interface
The prototype provides a dark-themed web interface titled "Ad Genie: Where Marketing Meets Magic". The sidebar includes model settings and indicates the active technologies: LLaMA-3-8B for text strategy, BLIP/LLaVA for image analysis, and DuckDuckGo for market trends.
The main interface includes a campaign topic input, product image upload, manual image description, a generate button, and result tabs for market analysis, audience strategy, creative content, and data view.
Product Example Used in Prototype
The demonstrated prototype uses the campaign topic "gifting a coffee mug to a friend" and a product image showing a white and black "Lazy Panda" coffee mug. The uploaded image contains a white mug with a black handle, a panda illustration, and a small panda figure on the lid. The visual design suggests a cute, friendly, and gift-oriented product.
Prototype Output
The system generated visual palette attributes such as white and gray colors, minimalist and clean style, and an invigorating mood. It identified the core theme as gifting, with subthemes of friendship, coffee, and mug. It recognized a modern brand voice, market trends such as personalized gifts and experiential gifts, and neutral market sentiment. It also generated the persona "Coffee-Loving Friends", recommended channels such as Instagram, TikTok, Pinterest, and Facebook Groups, and produced social media posts, a blog concept, a video script, and structured pipeline output.
Example Generated Content
The prototype generated post drafts such as:
- Fuel their friendship with a personalized coffee mug! #coffee #giftideas
- Want to make a lasting impression? Try gifting an experience like a coffee-tasting tour! #experientialgifts #coffee
- Ready to upgrade your gifting game? Discover unique and thoughtful presents that reflect your friend's interests! #giftinspo #coffee
These examples show that the system does not only describe the mug but also expands the campaign toward broader gift-positioning strategies.
Experimental Design and Evaluation Framework
The current project bundle demonstrates a working prototype and sample outputs. For a full research evaluation, this paper proposes a structured experimental design. The evaluation should compare Ad Genie with baseline approaches across multiple product categories.
Dataset Design
A small benchmark dataset can be created using product images and descriptions from different categories: personalized gift items, wireless earbuds, eco-friendly reusable bottles, fashion accessories, skincare products, kitchen appliances, home decor items, stationery products, fitness accessories, and mobile phone accessories. Each sample should include a product image, product title, short product description, intended campaign goal, and a human-written reference campaign if available.
Baselines
The evaluation should compare four configurations: three baselines (manual campaign drafting, text-only LLM prompting, and image captioning followed by LLM generation) and the full Ad Genie pipeline. The full pipeline uses image, text, web intelligence, insight extraction, and content generation.
Evaluation Metrics
The evaluation should include functional, performance, and content-quality metrics.
Functional Metrics
Functional metrics include image upload success rate, invalid input detection, web retrieval success rate, output generation success rate, and export success rate.
Performance Metrics
Performance metrics include image analysis time, web retrieval time, content generation time, end-to-end response time, and failure recovery time.
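Per-stage timings of this kind can be collected with a small context manager. In the sketch below, the stage bodies are placeholders for the real model and API calls, the helper name `timed` is hypothetical, and end-to-end time is approximated as the sum of the measured stages.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(stage: str, timings: dict):
    """Record the wall-clock duration of the enclosed block under `stage`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[stage] = time.perf_counter() - start


timings: dict[str, float] = {}

with timed("image_analysis", timings):
    sum(range(10_000))  # placeholder for the vision model call
with timed("web_retrieval", timings):
    sorted(range(10_000, 0, -1))  # placeholder for the search request
with timed("content_generation", timings):
    "".join(str(i) for i in range(1_000))  # placeholder for the LLM call

# Approximation: ignores any overhead between stages.
timings["end_to_end"] = sum(timings.values())
```

Recording the duration in a `finally` clause means a stage that raises an exception still leaves a timing entry, which helps when measuring failure cases.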
Content Quality Metrics
Human evaluators can rate outputs on a 1-5 scale using product relevance, creativity, audience fit, market awareness, platform suitability, clarity, persuasiveness, practical usefulness, factual safety, and overall campaign quality.
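Ratings from several evaluators can be combined by averaging each criterion. The sketch below assumes every evaluator submits one score per criterion as a dictionary; the function name `summarize_ratings` and the extra `mean_of_criteria` key are illustrative choices, and the numbers in the example are made up purely to show the arithmetic.

```python
from statistics import mean


def summarize_ratings(sheets: list[dict[str, int]]) -> dict[str, float]:
    """Average each criterion across evaluators after checking the 1-5 range."""
    if not sheets:
        raise ValueError("no rating sheets provided")
    for sheet in sheets:
        for criterion, score in sheet.items():
            if not 1 <= score <= 5:
                raise ValueError(f"{criterion}: score {score} is outside 1-5")

    criteria = list(sheets[0])
    summary = {c: mean(sheet[c] for sheet in sheets) for c in criteria}
    # Unweighted average over criteria, kept separate from any criterion name.
    summary["mean_of_criteria"] = mean(list(summary.values()))
    return summary


# Made-up scores from two hypothetical evaluators.
example = summarize_ratings(
    [
        {"product relevance": 4, "creativity": 3},
        {"product relevance": 5, "creativity": 3},
    ]
)
```

Here product relevance averages to 4.5, creativity to 3, and the mean of the two criteria is 3.75.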
Results and Discussion
Prototype Observation
The prototype successfully demonstrates the end-to-end concept of Ad Genie. In the coffee mug example, the system accepted a product image and campaign topic, analyzed visual and semantic attributes, identified market-oriented trends, generated a target persona, and produced structured creative content.
The generated output shows three important capabilities. First, the system performs visual grounding by recognizing color and style cues from the uploaded product image. Second, it performs semantic grounding by connecting the campaign topic to themes such as gifting, friendship, coffee, and mug usage. Third, it performs strategic expansion by moving from a simple mug gift into broader trends such as personalized gifts and experiential gifts.
This suggests that multimodal AI can improve campaign generation by linking product appearance, campaign intent, and market positioning.
Table 3: Comparison of Ad Genie with baseline campaign generation approaches.
| Method | Image | Web Trends | Persona | Posts | Blog | Video Script | Expected Strength |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Manual drafting | Yes | Yes | Yes | Yes | Yes | Yes | High quality but time-consuming |
| Text-only LLM | No | Limited | Partial | Yes | Yes | Yes | Fast but often generic |
| Image caption + LLM | Partial | No | Partial | Yes | Yes | Yes | Better product grounding than text-only generation |
| Ad Genie full pipeline | Yes | Yes | Yes | Yes | Yes | Yes | Structured, context-aware, and product-specific campaign generation |
Table 4: Representative test cases for system validation.
| ID | Module | Expected Output |
| --- | --- | --- |
| MM-01 | Input | Valid image and text are accepted |
| MM-02 | Input | Corrupted image triggers an error |
| MM-03 | Input | Missing text triggers validation warning |
| API-01 | Retrieval | Valid query returns market data |
| API-02 | Retrieval | Timeout triggers retry or fallback |
| IN-01 | Insights | Sentiment and keywords are extracted |
| CG-01 | Generation | Three posts are generated |
| CG-02 | Generation | Blog concept is generated |
| CG-03 | Generation | Video script is generated |
| OP-01 | Output | Results are displayed in tabs |
Strengths
The main strengths of Ad Genie are its end-to-end workflow, multimodal understanding, structured strategy generation, user accessibility, extensibility, and practical relevance. The system combines tasks that are usually separate: product inspection, trend research, audience thinking, and content drafting. Even when the generated content is not final, it gives users a structured starting point.
Limitations
The current system has limitations. Web intelligence quality depends on available search results and external APIs. LLM-generated content may still require human review before publication. Sentiment analysis may be inaccurate when retrieved data is noisy or limited. The prototype screenshots demonstrate one product example; broader evaluation is required. API rate limits and model latency can affect response time. Some product categories may require domain-specific prompts or fine-tuning.
Discussion
Ad Genie should be understood as a campaign drafting assistant rather than a fully autonomous marketing manager. Its outputs can reduce ideation time and help users create a first draft of campaign material. Human review remains necessary for brand accuracy, legal safety, factual correctness, and final publishing decisions.
Ethical, Legal, and Practical Considerations
AI-generated marketing content must be handled carefully. The system should avoid unsupported product claims, disclose AI assistance where appropriate, protect uploaded product images, respect API and platform terms, avoid biased persona assumptions, require human approval before publishing, avoid copying protected marketing material, and prevent misleading or manipulative advertisements.
These safeguards are especially important if Ad Genie is extended to automatic publishing or paid advertising workflows.
Future Enhancements
Future versions of Ad Genie can include multilingual campaign generation, seller dashboard integration, automatic content publishing after user approval, AI video generation, brand voice learning, analytics dashboards, A/B testing, CTR prediction, price intelligence, and a mobile application. These extensions would allow Ad Genie to evolve from a campaign drafting assistant into a broader AI-powered digital marketing platform.
Conclusion
This paper presented Ad Genie, a multimodal generative AI framework for automated marketing campaign creation. The system integrates product image analysis, textual campaign understanding, web intelligence, market insight extraction, audience persona generation, and LLM-based content creation. Unlike text-only copywriting tools, Ad Genie uses both visual and semantic product information and grounds campaign generation in market-oriented insights.
The prototype demonstrates a practical workflow in which a user uploads a product image, enters a campaign topic, and receives structured marketing outputs including social media posts, a blog concept, a video script, persona information, and machine-readable intermediate data. The coffee mug case study illustrates how the system can transform a simple product and campaign goal into a more complete strategy involving gifting themes, audience segments, recommended platforms, and creative content.
Ad Genie is not intended to replace human marketers. Instead, it serves as an AI-assisted campaign drafting and research tool that can reduce manual effort, support small businesses, and provide structured creative direction. With further evaluation, stronger retrieval grounding, multilingual support, analytics integration, and human-in-the-loop safeguards, Ad Genie can evolve into a more complete AI-powered digital marketing assistant.
Acknowledgment
The authors thank the Department of Computer Science and Engineering, Keshav Memorial Institute of Technology, and the project guide Ms. Gouthami for their guidance and support.
References
[1] A. Farseev, Q. Yang, M. Ongpin, I. Gossoudarev, Y.-Y. Chu-Farseeva, and S. Nikolenko, "SOMONITOR: Combining Explainable AI and Large Language Models for Marketing Analytics," arXiv:2407.13117, 2024.
[2] A. Vaswani et al., "Attention Is All You Need," in Advances in Neural Information Processing Systems, 2017.
[3] A. Radford et al., "Learning Transferable Visual Models From Natural Language Supervision," in International Conference on Machine Learning, 2021, arXiv:2103.00020.
[4] J. Li et al., "BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation," in International Conference on Machine Learning, 2022, arXiv:2201.12086.
[5] H. Liu et al., "Visual Instruction Tuning," arXiv:2304.08485, 2023.
[6] L. Baraldi et al., "The Revolution of Multimodal Large Language Models: A Survey," in Findings of the Association for Computational Linguistics, 2024, arXiv:2402.12451.
[7] P. Lewis et al., "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks," in Advances in Neural Information Processing Systems, 2020, arXiv:2005.11401.
[8] C. Hutto and E. Gilbert, "VADER: A Parsimonious Rule-Based Model for Sentiment Analysis of Social Media Text," in International AAAI Conference on Web and Social Media, 2014.
[9] S. Makridakis, F. Petropoulos, and Y. Kang, "Large Language Models: Their Success and Impact," Forecasting, vol. 5, no. 3, pp. 536-549, 2023.
[10] S. Minaee et al., "Large Language Models: A Survey," arXiv preprint, 2024.
[11] OpenAI, "GPT-4 Technical Report," arXiv preprint, 2023.
[12] OpenAI, "Hello GPT-4o," OpenAI technical announcement, 2024. [Online]. Available: https://openai.com/index/hello-gpt-4o/
[13] A. Malik, "Persona Based Marketing Strategies: Creation of Personas Through Data Analytics," Master's thesis, 2019.
[14] D. Pelleg and A. W. Moore, "X-Means: Extending K-Means with Efficient Estimation of the Number of Clusters," in International Conference on Machine Learning, 2000.
[15] A. Savchenko et al., "Ad Lingua: Text Classification Improves Symbolism Prediction in Image Advertisements," in International Conference on Computational Linguistics, 2020.
