
Sketch to Code – A Multi-Modal AI Framework for Automated Conversion of UI Sketches into Functional Web Code

DOI : https://doi.org/10.5281/zenodo.19591214

Mr. Mohammed Amaan Shaikh, Mr. Aadiish Shukla, Mr. Aditya Mishra, Mr. Aaditya Devghare, Prof. Smita Dandge

Department of Computer Engineering, Thakur Shyamnarayan Engineering College, Mumbai, India

Abstract – The translation of hand-drawn user interface (UI) sketches into executable front-end code represents a persistent challenge in modern software engineering, requiring specialized expertise and considerable manual effort. This paper presents SketchToCode, a multi-modal artificial intelligence framework that automates the conversion of UI sketch images into separated, production-ready HTML, CSS, and JavaScript code. The proposed system integrates two complementary inference engines: a cloud-based pipeline leveraging the Google Gemini 2.5 Flash vision-language model for high-fidelity, instruction-guided generation via structured prompting, and a locally deployable sequence-to-sequence model combining a ResNet-18 convolutional encoder with a two-layer LSTM decoder for offline, privacy-preserving inference. The framework supports multi-image stitching, enabling designers to compose composite layouts from distinct header, body, and footer sketches. A client-side image compression pipeline reduces typical payloads from several megabytes to under 200 kilobytes, minimizing API latency. A React-based web interface, paired with cross-platform desktop (Tauri) and mobile (Capacitor/Android) deployment targets, delivers a complete application from a single codebase. The generated code is rendered in a sandboxed iframe for real-time preview and displayed in tabbed code editors for inspection. Persistent user history is managed through a Supabase PostgreSQL backend. Experimental evaluation demonstrates that the structured-prompting strategy reliably produces well-formed, separated code artifacts from diverse sketch inputs within acceptable response times. SketchToCode demonstrates that modern multi-modal AI can substantially reduce the design-to-code gap, offering a practical tool for rapid prototyping and accessible web development.

Keywords – sketch-to-code; multi-modal AI; Gemini API; CNN-LSTM; UI code generation; code automation; front-end prototyping; image-to-code; ResNet; sequence-to-sequence; cross-platform development; rapid prototyping.

  1. INTRODUCTION

    The gap between a designer's conceptual sketch and a deployable UI implementation has long been a bottleneck in software development pipelines. Prototyping tools such as Figma and Adobe XD have partially bridged this divide, yet they still demand that a trained engineer manually translate visual designs into structured code. For small teams, rapid-prototyping environments, and non-technical stakeholders, this translation overhead can slow iteration cycles significantly [1].

    Recent advances in multi-modal large language models (LLMs) and deep learning-based image understanding have opened new avenues for automating this process. Vision-language models capable of jointly reasoning over image pixels and natural language instructions can, in principle, interpret the structural intent of a sketch and emit syntactically correct Hypertext Markup Language (HTML), Cascading Style Sheets (CSS), and JavaScript (JS) in a single inference pass [2]. Large language models such as Google's Gemini family [3] and OpenAI's GPT-4V [4] can accept image inputs alongside natural language instructions and produce structured outputs including source code. Concurrently, sequence-to-sequence architectures combining convolutional neural networks (CNNs) with recurrent decoders have been applied to the specific task of generating markup from UI screenshots, achieving promising results on constrained domains [5].

    Despite these advances, existing approaches suffer from several limitations. Cloud-based LLM solutions, while powerful, depend on internet connectivity and raise data privacy concerns. Local deep learning models trained on limited datasets may lack generalization capacity. Rule-based approaches require structured or domain-specific input formats. Earlier neural models (e.g., pix2code [5]) are constrained to narrow DSL-driven UI grammars and do not scale gracefully to free-form sketches. Contemporary LLM-based tools typically generate monolithic HTML blobs rather than maintainable, separated code artifacts. Few systems offer a unified, cross-platform deployment strategy that brings sketch-to-code capabilities to web, desktop, and mobile environments simultaneously [6].

    This paper presents SketchToCode, an end-to-end multi-modal AI framework designed to address these gaps. The system accepts one or more UI sketch images, optionally enriched by natural language refinement instructions, and produces structured, separated front-end code. A dual inference architecture accommodates both high-throughput cloud-based generation via the Gemini 2.5 Flash API and an offline-capable local inference path built on a purpose-trained CNN-LSTM model. The application is deployed across web, desktop, and mobile platforms using React, Tauri, and Capacitor, respectively, with user session history persisted through a Supabase backend.

    The remainder of this paper is organized as follows. Section II formalizes the problem. Section III states the system objectives. Section IV surveys related work. Section V describes the proposed system. Sections VI and VII detail the architecture and implementation. Section VIII enumerates the technology stack. Section IX discusses testing and validation. Section X presents results and discussion. Sections XI and XII address limitations and future scope. Section XIII concludes the paper.

  2. PROBLEM STATEMENT

    The conversion of a UI sketch to functional code is a non-trivial multi-step process. In current practice, a front-end engineer must (i) interpret the spatial layout of a hand-drawn or digitally sketched prototype, (ii) select appropriate HTML structural elements, (iii) author CSS rules that approximate the visual intent, and (iv) attach any required JavaScript behavior. Each of these steps is error-prone, time-consuming, and depends on domain-specific knowledge unavailable to many stakeholders.

    Several specific challenges compound this problem:

    1. Interpretation Ambiguity: Hand-drawn sketches are inherently imprecise. Developers must infer the intended component types (e.g., distinguishing a text input from a dropdown), spatial relationships, and responsive behavior from rough visual cues.

    2. Code Separation: Professional web development mandates separation of concerns: structure (HTML), presentation (CSS), and behavior (JavaScript) should reside in distinct files or blocks. Automated tools that output monolithic inline-styled HTML fail to meet this standard.

    3. Multi-Section Layouts: Real-world web pages are composed of multiple distinct sections (header, navigation, body content, footer). Users need the ability to specify these sections independently and have them stitched into a cohesive whole.

    4. Latency and Privacy: Cloud-based AI services introduce network latency and require transmitting potentially sensitive design data to external servers. A complementary offline inference capability is desirable for enterprise and air-gapped deployment contexts.

    5. Platform Accessibility: Developers and designers work across diverse platforms: browsers, desktops, and mobile devices. A sketch-to-code tool confined to a single platform limits its practical utility.

    SketchToCode is designed to systematically address each of these challenges through its dual-model architecture, multi-image pipeline, and cross-platform deployment strategy.

  3. OBJECTIVES

    The principal objectives of the SketchToCode system are enumerated below.

    1. O1 Multi-modal Input Handling: Accept single or composite sketch images (header, body, and footer) as system input, with optional natural language refinement instructions.

    2. O2 Structured Code Output: Emit separated HTML, CSS, and JavaScript artifacts from each inference request, rather than an undifferentiated HTML monolith.

    3. O3 Dual Inference Architecture: Provide a high-throughput cloud inference path (Gemini 2.5 Flash) alongside an offline-capable local inference path (CNN-LSTM), selectable at runtime.

    4. O4 Client-Side Image Compression: Build a browser-based compression pipeline that reduces image payloads without significantly degrading visual information, minimizing API latency.

    5. O5 Sandboxed Preview: Render generated code inside a sandboxed iframe, providing an immediate visual preview without exposing the host application to script-injection risks.

    6. O6 Cross-Platform Deployment: Deliver the application on web, desktop (via Tauri), and mobile (via Capacitor for Android) from a single React codebase.

    7. O7 Generation History and Authentication: Persist generation history per authenticated user via a Supabase-backed PostgreSQL database, enabling review and retrieval of prior outputs.

    8. O8 Real-Time Code Inspection: Provide tabbed views for HTML, CSS, JavaScript, and raw output, enabling developers to inspect and copy generated code segments independently.

  4. RELATED WORK

      1. Sketch-to-Code Generation

        Beltramelli [5] introduced pix2code, a neural network model trained to map screenshots of GUIs to a domain-specific language (DSL) from which platform-specific UI code could be compiled. While foundational, pix2code operates on rendered screenshots rather than hand-drawn sketches and targets a constrained DSL grammar, limiting generalization. Microsoft's Sketch2Code [7] leveraged a custom object detection model in conjunction with Azure Cognitive Services to convert whiteboard sketches to HTML prototypes, demonstrating that free-form sketch understanding was tractable, though the system was restricted to a predefined element vocabulary and did not generate CSS or JavaScript.

        Nguyen and Csallner [8] proposed a reverse-engineering approach (REMAUI) to UI code recovery from rendered mobile application images. Chen et al. [9] explored the use of transformer-based vision encoders for web component identification in screenshots. These works collectively establish the viability of vision-driven code generation but do not address the separation of concerns across HTML, CSS, and JavaScript, nor do they support natural language refinement.

        Design2Code [10] by Si et al. (2024) conducted a comprehensive benchmark comparing multi-modal LLMs on the task of converting web design images to code, finding that while GPT-4V achieved strong results, significant gaps remained in layout accuracy and responsive design generation.

      2. Vision-Language Models for Code Generation

        The emergence of large vision-language models (VLMs) such as GPT-4V [4] and Google's Gemini family [3] has substantially altered the landscape. These models jointly process image tokens and text tokens, enabling nuanced instruction-following behavior in response to visual inputs. Recent work by Wu et al. [11] demonstrated that GPT-4V can generate functional UI components from screenshots using a divide-and-conquer approach; however, the reliability of structured, separated output and the latency-management strategies required for production use remain underexplored in the literature.

      3. CNN-LSTM Image Captioning Applied to Code

    Vinyals et al. [12] established the canonical CNN encoder-LSTM decoder architecture for image captioning in their seminal Show and Tell system. SketchToCode adapts this paradigm to the code-generation domain: a ResNet-18 [13] backbone encodes sketch features, and a two-layer LSTM decoder generates HTML tokens autoregressively. Li et al. [14] demonstrated BLIP-2's bootstrapping approach for vision-language pre-training, informing the broader design space. The present work differs from the above in several key respects: (a) it employs a dual-model architecture combining cloud and local inference, (b) it generates separated HTML, CSS, and JavaScript rather than monolithic output, (c) it supports multi-image stitching for compositional page layouts, and (d) it provides cross-platform deployment across web, desktop, and mobile.

  5. PROPOSED SYSTEM

    1. System Overview

      SketchToCode is designed as a modular, layered system comprising four principal components: (1) a client-side image processing pipeline, (2) a cloud-based generative AI service, (3) a locally trained deep learning model, and (4) a cross-platform application shell with persistent storage.

      At the topmost layer, a React 18 / TypeScript single-page application provides the user-facing interface for image upload, natural language instruction entry, code preview, and history browsing. Beneath the UI layer, two parallel inference services are available: the primary path delegates to the Gemini 2.5 Flash API, while the secondary path invokes a locally running Flask inference server wrapping the trained CNN-LSTM model.

      User sessions and generation history are managed by Supabase, which provides a PostgreSQL-backed relational store and a GoTrue-powered authentication service. All generated code is persisted as a JSON string containing the separated HTML, CSS, and JavaScript fields, alongside the user's prompt and a server-side timestamp.

    2. Inference Strategy Selection

      At runtime, the application routes sketch inputs to the Gemini API by default. If the local Flask server is detected (i.e., responsive at the configured endpoint), the user may optionally select the CNN-LSTM inference path. This dual-path design ensures that the system degrades gracefully in environments with restricted external network access, a requirement common in enterprise and educational deployment contexts.

    3. Key Design Decisions

      1. Separated Code Output: Unlike most existing tools that produce monolithic HTML with inline styles, SketchToCode enforces separation of concerns by instructing the AI to return a structured JSON object with distinct “html”, “css”, and “javascript” fields.

      2. Client-Side Compression: Images are resized to a maximum dimension of 800 pixels and compressed to 50% JPEG quality before transmission, reducing typical payloads from several megabytes to under 200 kilobytes without significantly degrading the visual information necessary for code generation.

  6. SYSTEM ARCHITECTURE AND WORKFLOW

        1. Frontend Architecture

          The frontend is structured into four primary layers: (i) page-level view components (Index.tsx) that orchestrate file state and API trigger logic; (ii) reusable UI components including an image upload panel (ImageUpload), a structured code output panel (CodeOutput), and a hero/onboarding surface (Hero); (iii) a service layer (gemini.ts) responsible for image preprocessing and API communication; and (iv) shared infrastructure comprising the Supabase client, Tailwind utility merger, and custom React hooks.

          Fig. 1. Unified Modelling Diagram (UML)

          Routing is handled by React Router DOM v6. Form state is managed via React Hook Form with Zod schema validation. Server state is cached and synchronized using TanStack Query v5, minimizing redundant API calls during history retrieval and session restoration.

          Fig. 2. System Architecture Diagram

          This UML component diagram illustrates the high-level architecture of the SketchToCode framework. The client layer comprises the React/TypeScript frontend deployed across three platform shells: web browser, Tauri desktop wrapper, and Capacitor Android wrapper. The client communicates with two backend services: (1) the Google Gemini 2.5 Flash API over HTTPS for cloud-based inference, and (2) a local Flask server hosting the trained CNN+LSTM model for offline inference. A Supabase backend provides authentication (GoTrue) and persistent storage (PostgreSQL). Arrows indicate data flow direction.

        2. AI Inference Pipeline (Gemini Path)

          When a user submits one or more sketch images, the following workflow executes. First, each image is compressed via a browser-side canvas operation that resizes the image to a maximum dimension of 800 pixels and re-encodes at 50% JPEG quality, reducing payload size and API latency. Second, all compressed images are dispatched concurrently via Promise.all, eliminating sequential bottlenecks in multi-image submissions. Third, a structured JSON prompt instructs the model to respond exclusively with a well-formed JSON object of the form { html, css, javascript }. Fourth, a 120-second timeout is enforced via Promise.race to handle network stalls gracefully.

        3. AI Inference Pipeline (Local CNN-LSTM Path)

          The local inference path exposes a Flask REST endpoint (/generate) that accepts a base64-encoded sketch image and returns a JSON object conforming to the same schema as the Gemini path. Internally, the server loads the trained CNN-LSTM model, preprocesses the input image to a 224×224 RGB tensor, and performs greedy decoding to generate an HTML token sequence. The sequence is then parsed and separated into HTML structural markup, extracted inline CSS, and extracted inline JavaScript before serialization.
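The /generate request/response contract can be illustrated with a framework-agnostic sketch; the request field name "image" and the run_model stand-in are assumptions for illustration, while the actual server wraps the trained CNN-LSTM model behind Flask:

```python
import base64
import json

def handle_generate(request_body: bytes) -> dict:
    """Core logic behind the /generate endpoint, with Flask plumbing
    and the real model call omitted."""
    payload = json.loads(request_body)
    image_bytes = base64.b64decode(payload["image"])  # base64-encoded sketch
    html, css, js = run_model(image_bytes)            # placeholder for inference
    # Same schema as the Gemini path, so the client parser is shared.
    return {"html": html, "css": css, "javascript": js}

def run_model(image_bytes: bytes):
    """Stand-in for preprocessing to a 224x224 tensor + greedy decoding."""
    return "<div>stub</div>", "", ""
```

Because both inference paths emit the same three-field object, the client's parsing, preview, and persistence code is identical regardless of which backend served the request.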

          Fig. 3. System Workflow Sequence Diagram

          This UML sequence diagram depicts the interaction between the four principal actors: the User, the React Client, the Gemini API (or local Flask server), and the Supabase Backend. The sequence begins with image upload and optional instruction entry. The client performs compression and prompt construction. Compressed images are sent to the selected inference backend, which returns a JSON response with separated code. The client parses the response, renders it in a sandboxed iframe, and asynchronously persists the record to Supabase. Error handling paths are shown for API timeout (120s) and JSON parsing failure.

        4. Code Output and Preview

    Generated code is displayed in a tabbed panel offering five views: Preview, HTML, CSS, JS, and Raw. The Preview tab renders the assembled code inside a sandboxed iframe (sandbox="allow-scripts") to provide an immediate visual approximation of the generated interface. A robust JSON recovery routine strips markdown fencing (e.g., ```json ... ```) from model responses before parsing, ensuring that output is correctly interpreted even when the model prepends explanatory prose.
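The recovery behavior described above can be sketched as follows; the production routine lives in the TypeScript service layer, so this Python reimplementation is purely illustrative:

```python
import json
import re

def recover_json(raw: str) -> dict:
    """Best-effort extraction of the {html, css, javascript} object from
    a model reply that may include markdown fencing or a prose preamble."""
    # 1. If the reply is wrapped in ```json ... ``` fencing, unwrap it.
    fenced = re.search(r"```(?:json)?\s*(.*?)```", raw, re.DOTALL)
    if fenced:
        raw = fenced.group(1)
    # 2. Fall back to the outermost brace-delimited span, which skips
    #    any explanatory prose before or after the object.
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("no JSON object found in model response")
    return json.loads(raw[start : end + 1])
```

The two-stage fallback (fence first, then outermost braces) is what makes the parser tolerant of both fully fenced replies and replies with a short natural language preamble.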

  7. IMPLEMENTATION DETAILS

    1. Image Compression Module

      The compressImage() function constructs an off-screen HTMLCanvasElement, draws the source image at the computed maximum-dimension scale, and invokes canvas.toDataURL('image/jpeg', 0.5) to produce a compressed base64 artifact. This client-side preprocessing step reduces typical image payloads from several megabytes to well under 200 kilobytes, yielding a measurable reduction in round-trip latency without perceptible loss of structural information in sketch inputs.
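The scaling rule behind compressImage() can be expressed compactly. The browser implementation is TypeScript; this Python sketch of the same arithmetic is for illustration only:

```python
def target_size(width: int, height: int, max_dim: int = 800) -> tuple[int, int]:
    """Canvas size used before JPEG re-encoding: scale the longer side
    down to max_dim, preserve aspect ratio, and never upscale."""
    scale = min(1.0, max_dim / max(width, height))
    return round(width * scale), round(height * scale)
```

Capping only the longer dimension keeps the sketch's aspect ratio intact, which matters because the model infers layout proportions from the image geometry.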

    2. CNN Encoder (ResNet-18)

      The vision encoder is based on a ResNet-18 [13] backbone pretrained on ImageNet-1K. The original 1,000-class classification head is replaced with a Linear(512, 256) projection layer that maps the pooled feature vector into a 256-dimensional embedding space shared with the LSTM decoder. Layers 0 through 5 of the ResNet backbone are frozen during fine-tuning to preserve general low-level and mid-level visual features, while layers 6 and above, together with the projection head, are trained end-to-end on the sketch-to-code dataset.

      Fig. 4. CNN+LSTM Model Class Diagram

      This UML class diagram details the internal architecture of the SketchToCodeModel class. The diagram shows two primary components: (1) the CNNEncoder class wrapping a pretrained ResNet-18 backbone with Linear(512, 256) projection, and (2) the LSTMDecoder class containing nn.Embedding(vocab_size, 256), a 2-layer LSTM with 512 hidden units, and a fully connected output layer. The init_hidden() method projects CNN features through separate linear layers to initialize h and c. The Tokenizer utility class manages the hybrid character/tag-level vocabulary with special tokens (<PAD>, <START>, <END>, <UNK>).

    3. LSTM Decoder

      The decoder consists of a token embedding layer (nn.Embedding(vocab_size, 256)), followed by a two-layer LSTM with 512 hidden units per layer. The CNN feature vector initializes both the hidden state (h) and the cell state (c) of the LSTM via learned linear projections, enabling the decoder to condition every generated token on the full visual context of the input sketch. Generation proceeds via greedy decoding: at each time step, the token with the highest softmax probability is selected and fed as input to the subsequent step, continuing until the special <END> token is produced or the maximum sequence length is reached.
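The greedy decoding procedure is independent of the network weights and can be sketched with the decoder step abstracted away; here, step is a stand-in for one LSTM forward pass, and the special-token indices follow the tokenization scheme described later (START = 1, END = 2):

```python
START, END = 1, 2  # special-token indices from the tokenizer

def greedy_decode(step, init_state, max_len: int = 50) -> list[int]:
    """Repeatedly pick the argmax token until <END> or max_len.

    `step(token, state) -> (logits, new_state)` abstracts one decoder
    forward pass conditioned on the previous token and LSTM state."""
    state, token, output = init_state, START, []
    for _ in range(max_len):
        logits, state = step(token, state)
        token = max(range(len(logits)), key=logits.__getitem__)  # argmax
        if token == END:
            break
        output.append(token)
    return output
```

The max_len guard mirrors the "maximum sequence length" stopping condition in the text, preventing unbounded generation if the model never emits <END>.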

    4. Tokenization Strategy

      A hybrid character/tag-level tokenizer is employed.

      Common HTML tags (e.g., <div>, <span>, <p>, <button>, <input>, <ul>, <li>, <a>, <img>) are treated as atomic single tokens, substantially reducing sequence lengths relative to pure character-level tokenization and improving decoder accuracy on structurally significant boundaries. Four special tokens are defined: <PAD> (index 0), <START> (index 1), <END> (index 2), and <UNK> (index 3) for out-of-vocabulary characters. This hybrid strategy improves both training efficiency and generation accuracy.
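A minimal sketch of such a hybrid tokenizer follows; the tag list and character set below are illustrative stand-ins, not the full training vocabulary:

```python
import re

class HybridTokenizer:
    """Hybrid character/tag-level tokenizer: common HTML tags are
    atomic tokens, everything else falls back to single characters."""
    SPECIALS = ["<PAD>", "<START>", "<END>", "<UNK>"]  # indices 0-3
    TAGS = ["<div>", "</div>", "<span>", "</span>", "<p>", "</p>",
            "<button>", "</button>", "<input>", "<ul>", "<li>", "<a>", "<img>"]

    def __init__(self, charset: str = "abcdefghijklmnopqrstuvwxyz <>=\"/"):
        self.vocab = self.SPECIALS + self.TAGS + list(charset)
        self.index = {tok: i for i, tok in enumerate(self.vocab)}
        # Tag alternatives first, so "<div>" matches before the "." fallback.
        self.pattern = re.compile("|".join(map(re.escape, self.TAGS)) + "|.")

    def encode(self, text: str) -> list[int]:
        unk = self.index["<UNK>"]
        return [self.index.get(tok, unk) for tok in self.pattern.findall(text)]

    def decode(self, ids: list[int]) -> str:
        return "".join(self.vocab[i] for i in ids)
```

Note how "<div>hi</div>" becomes four tokens instead of thirteen characters, which is exactly the sequence-length saving the hybrid scheme provides.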

    5. Database Schema

      Generation history is stored in a Supabase PostgreSQL table (generations) with the following schema: id (BigInt, primary key, identity sequence); user_id (UUID, foreign key to auth.users); code (Text, JSON string containing html, css, and javascript fields); prompt (Text, the user's natural language refinement instruction, nullable); and created_at (Timestamp, defaulting to now()). Row-level security (RLS) policies restrict each user's access to their own rows.

  8. TOOLS AND TECHNOLOGIES

    Table I summarizes the principal tools and technologies employed in the SketchToCode system.

    TABLE I: TOOLS AND TECHNOLOGIES EMPLOYED IN SKETCHTOCODE

    Category | Technology / Tool | Version / Notes
    UI Framework | React (TypeScript) | v18 / Vite 5
    Styling | Tailwind CSS + PostCSS | v3.4
    Component Library | Shadcn UI (Radix UI) | Radix primitives
    Icons | Lucide React | Latest
    State / Data | TanStack Query v5 | Server state caching
    Routing | React Router DOM | v6
    Forms | React Hook Form + Zod | Schema validation
    Notifications | Sonner | Toast notifications
    Primary AI | Google Gemini 2.5 Flash | Via REST API
    Local AI Framework | PyTorch + Torchvision | ResNet-18 backbone
    Local Inference Server | Flask + Flask-CORS | REST endpoint
    Image Processing | Pillow + NumPy | Python 3.10+
    Model Hub | Hugging Face Hub | Model distribution
    Database / Auth | Supabase (PostgreSQL + GoTrue) | Cloud-hosted
    Desktop Target | Tauri (Rust bridge) | Bundle: com.sketch-to-code.app
    Mobile Target | Capacitor 8 (Android) | App: com.sketchtoco.app

  9. TESTING AND VALIDATION

    The SketchToCode system was evaluated through a combination of functional testing, integration testing, security validation, and qualitative assessment of generated output.

      1. Functional Testing

        Functional testing of the Gemini inference path was conducted by submitting a corpus of diverse sketch inputs ranging from simple single-column layouts to multi-panel dashboards and manually inspecting the resulting HTML, CSS, and JavaScript artifacts for syntactic correctness and structural coherence with the input sketch. The JSON recovery routine was stress-tested against responses that included markdown fencing, extraneous prose preambles, and partial JSON fragments, confirming robust extraction behavior across all observed model output patterns.

        The image compression module was verified to correctly resize images above 800px in maximum dimension and produce JPEG output at the specified quality across varying aspect ratios (portrait, landscape, square) and resolutions (from 640×480 to 4000×3000). The 120-second timeout mechanism was validated by simulating slow network responses, confirming appropriate error state transitions.

      2. Sandbox Security Validation

        The iframe sandbox configuration (sandbox="allow-scripts") was verified to prevent form submission, same-origin API access, and navigation events from within generated code previews, confirming that the preview surface does not expose the host application to script-injection vectors. Adversarially crafted JavaScript payloads were injected into the generation prompt and the resulting previews were confirmed to be correctly isolated from the parent document context.

      3. Cross-Platform Deployment Testing

        The application was packaged and launched on web (npm run dev), desktop (npm run tauri dev), and Android (npx cap run android) targets. Core functionality (image upload, inference dispatch, code preview, and history retrieval) was confirmed to operate correctly across all three deployment configurations. The Tauri desktop build required capability configuration for external API access, while the Capacitor Android build required standard Android project scaffolding. The core application logic and UI remained identical across platforms.

      4. Local Model Validation

    The CNN-LSTM local model was evaluated qualitatively on held-out sketch samples following training. Greedy-decoded output sequences were inspected for syntactic validity and semantic correspondence with the input sketch structure. The ResNet-18 encoder's pretrained visual features proved well-suited to sketch inputs despite the domain shift from natural photographs, consistent with prior observations in the transfer learning literature [13].

  10. RESULTS AND DISCUSSION

    1. Code Generation Quality

      The primary Gemini-based inference path demonstrated consistently reliable structured output generation. The Gemini 2.5 Flash model produced syntactically valid HTML5, CSS3, and ES6 JavaScript from sketch inputs in the majority of test cases. The enforced JSON output schema ensured clean separation of concerns. The structured prompting strategy instructing the model to respond exclusively with a JSON object containing html, css, and javascript keys proved substantially more effective than free-form prompting for downstream parsing. The JSON recovery routine successfully handled all observed non-conforming model responses, including responses wrapped in markdown code blocks and those containing a brief natural language preamble preceding the JSON object.

    2. Compression and Latency

      The client-side image compression pipeline achieved an average payload reduction of approximately 70-85% across test images, reducing typical payloads from several megabytes to under 200 kilobytes, with a median compression time of under 200ms per image in the browser. The combination of client-side compression and parallel multi-image dispatch yielded responsive end-to-end latency for typical sketch inputs. The 120-second timeout guard provided a reliable safety margin against intermittent API delays, with typical Gemini API response times ranging from 8 to 30 seconds depending on sketch complexity and network conditions.

    3. Local Model Performance

      The hybrid tokenization strategy employed in the local CNN-LSTM model reduced effective sequence lengths compared to character-level baselines, improving decoder accuracy on tag boundaries. The CNN+LSTM model demonstrated the ability to generate basic HTML structures from simple sketch inputs. However, as is consistent with the inherent limitations of small-scale sequence-to-sequence models, the local model's output fidelity was substantially lower than that of the Gemini cloud model, particularly for complex layouts and detailed styling. The local model is best positioned as a rapid prototyping tool for offline environments or privacy-sensitive contexts.

    4. Multi-Image Stitching and Cross-Platform Deployment

      The multi-image stitching capability enabled users to compose composite page layouts from independently drawn header, body, and footer components, a workflow pattern not supported by prior sketch-to-code systems. This composability substantially increases the practical utility of the tool for realistic page-level design scenarios. The shared React codebase was successfully deployed across all three target platforms (web, desktop, mobile) with minimal platform-specific adaptations.

  11. LIMITATIONS

    Several limitations of the current system are acknowledged:

      1. Cloud Dependency: The quality of the Gemini inference path is contingent on API availability and is subject to usage quotas and network latency, introducing reliability constraints in production deployments.

      2. Local Model Fidelity: The CNN-LSTM model produces lower-fidelity outputs than the Gemini path, particularly for complex multi-element layouts, and requires a non-trivial training pipeline to update for new UI paradigms.

      3. Output Determinism: LLM-based generation is inherently non-deterministic. Deeply nested or ambiguous sketch regions can yield syntactically valid but semantically inaccurate markup.

      4. Frozen Vocabulary: The tokenization vocabulary is fixed at training time; novel HTML tags or CSS property names introduced after the training data cutoff will be mapped to the <UNK> token, degrading local model accuracy over time.

      5. Sandbox DoS: The iframe sandbox, while providing meaningful isolation, cannot prevent all denial-of-service scenarios (e.g., infinite loops in generated JavaScript) without additional execution time guards.

      6. Responsive Design: Generated code does not consistently include responsive CSS media queries for mobile-first layouts unless explicitly instructed via the natural language prompt.

      7. Accessibility: Generated code does not systematically include ARIA attributes, alt text, or other accessibility features mandated by WCAG guidelines.

      8. Platform Coverage: The mobile deployment target is currently limited to Android; iOS support via Capacitor was not included in the present implementation scope.

  12. FUTURE SCOPE

    Several directions for future development are identified:

    1. Beam Search Decoding: Replacing the greedy decoding strategy in the local CNN-LSTM decoder with beam search or nucleus sampling is expected to improve output diversity and reduce systematic decoding errors.

      2. Spatial Attention Mechanisms: Incorporating attention mechanisms into the local model, analogous to those described in [15], would allow the decoder to focus on specific sketch regions when generating corresponding code segments, improving positional accuracy.

    3. Iterative Refinement: Extending the Gemini inference path to support iterative refinement dialogues would allow users to correct specific regions of the generated code through targeted natural language instructions without re-submitting the full sketch.

    4. Framework-Specific Output: Extending the system to generate code in popular frontend frameworks (React JSX, Vue SFC, Angular templates) rather than vanilla HTML/CSS/JS would increase practical utility.

    5. Component Detection Pipeline: Integrating an object detection model (e.g., YOLO or Faster R-CNN) as a preprocessing step could enable explicit identification and classification of UI components before code generation.

    6. Quantitative Benchmarking: Developing a benchmark comprising annotated sketch-code pairs with ground-truth HTML/CSS/JS and automated structural similarity metrics would enable reproducible comparison with future systems.

    7. Fine-Tuned Open-Source VLM: Fine-tuning an open-source VLM (e.g., LLaVA or Qwen-VL) on a curated sketch-to-code dataset could yield a fully offline inference path competitive with the Gemini API in output quality.

    8. Accessibility Compliance: Integrating automated accessibility linting (e.g., axe-core) into the generation pipeline and training the model to produce WCAG-compliant markup would address the accessibility limitation noted above.

    9. Expanded Platform Coverage: Extending support to iOS via Capacitor, as well as a browser extension interface, would broaden accessibility.
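The beam-search direction (item 1 above) can be sketched as follows. A real implementation would query the CNN-LSTM decoder for next-token probabilities at each step; the hand-written transition table and token names here are illustrative assumptions only:

```python
import math

# Toy beam-search decoder over a hand-written next-token distribution.
# In the actual system this table would be replaced by LSTM decoder logits.
NEXT = {
    "<s>":   {"<div>": 0.6, "<p>": 0.4},
    "<div>": {"text": 0.5, "</div>": 0.5},
    "<p>":   {"text": 0.9, "</p>": 0.1},
    "text":  {"</div>": 0.7, "</p>": 0.3},
}
END = {"</div>", "</p>"}

def beam_search(beam_width=2, max_len=4):
    beams = [(["<s>"], 0.0)]              # (token sequence, log-probability)
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            if seq[-1] in END:            # finished hypotheses carry over unchanged
                candidates.append((seq, score))
                continue
            for tok, p in NEXT[seq[-1]].items():
                candidates.append((seq + [tok], score + math.log(p)))
        # keep only the top-k hypotheses by cumulative log-probability
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return beams

for seq, score in beam_search():
    print(" ".join(seq), round(math.exp(score), 3))
```

Unlike greedy decoding, which commits to the single most probable token at each step, the search retains `beam_width` partial hypotheses and can recover sequences whose early tokens are individually less likely but globally better scoring.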

  13. CONCLUSION

This paper has presented SketchToCode, a multi-modal AI system for automated conversion of hand-drawn UI sketches into separated, production-ready HTML, CSS, and JavaScript code. The system addresses the longstanding prototyping-to-implementation gap in software engineering by combining a cloud-based vision-language model inference path with a locally deployable CNN-LSTM sequence-to-sequence model, unified beneath a cross-platform React-based interface.

The structured prompting strategy, client-side image compression pipeline, and robust JSON recovery mechanism collectively ensure reliable, parseable output from the Gemini API under diverse sketch inputs and model response formats. The dual inference architecture provides a meaningful degree of deployment flexibility, accommodating both cloud-connected and air-gapped operational environments.

SketchToCode represents a practical step toward the broader vision of low-barrier, AI-assisted front-end development, and establishes a foundation upon which future work, including attention-based local decoding, iterative refinement dialogues, and quantitative benchmark development, can build.

ACKNOWLEDGMENT

The authors would like to thank the open-source communities behind React, PyTorch, Supabase, Tauri, and Capacitor for maintaining the foundational tools upon which this system is built. The authors would like to express their sincere gratitude to Mr. Kashif Sheikh & Mrs. Smita Dandge for their valuable guidance, support, and encouragement throughout the development of this project. Their insights and mentorship played a crucial role in the successful completion of this work.

REFERENCES

  1. A. Moran, C. Bernal-Cárdenas, M. Curcio, R. Bonett, and D. Poshyvanyk, “Machine learning-based prototyping of graphical user interfaces for mobile apps,” IEEE Trans. Software Eng., vol. 46, no. 2, pp. 196–221, 2020.

  2. J. Li, D. Li, S. Savarese, and S. Hoi, “BLIP-2: Bootstrapping language-image pre-training with frozen image encoders and large language models,” in Proc. ICML, 2023, pp. 19730–19742.

  3. G. Team et al., “Gemini: A Family of Highly Capable Multimodal Models,” arXiv preprint arXiv:2312.11805, 2023.

  4. OpenAI, “GPT-4 Technical Report,” arXiv preprint arXiv:2303.08774, 2023.

  5. T. Beltramelli, “pix2code: Generating Code from a Graphical User Interface Screenshot,” in Proc. ACM SIGCHI Symp. EICS, Paris, France, 2018, pp. 3:1–3:6.

  6. C. Chen, T. Su, G. Meng, Z. Xing, and Y. Liu, “From UI design image to GUI skeleton: A neural machine translator to bootstrap mobile GUI implementation,” in Proc. IEEE/ACM ICSE, 2018, pp. 665–676.

  7. Microsoft Research, “Sketch2Code: Transforming Hand-Drawn Designs into HTML using Deep Learning,” Microsoft AI Blog, 2018. [Online]. Available: https://www.microsoft.com/en/research/blog/sketch2code.

  8. T. A. Nguyen and C. Csallner, “Reverse Engineering Mobile Application User Interfaces with REMAUI,” in Proc. 30th IEEE/ACM ASE, Lincoln, NE, USA, 2015, pp. 248–259.

  9. X. Chen et al., “Towards Complete Icon Labeling in Mobile Applications,” in Proc. CHI, New Orleans, LA, USA, 2022, pp. 1–14.

  10. C. Si, C. Zhang, T. Li, and D. Ramesh, “Design2Code: How far are we from automating front-end engineering?,” arXiv preprint arXiv:2403.03163, 2024.

  11. A. Wu et al., “Automatically Generating UI Code from Screenshot: A Divide-and-Conquer-Based Approach,” arXiv preprint arXiv:2406.16386, 2024.

  12. O. Vinyals, A. Toshev, S. Bengio, and D. Erhan, “Show and Tell: A Neural Image Caption Generator,” in Proc. IEEE CVPR, Boston, MA, USA, 2015, pp. 3156–3164.

  13. K. He, X. Zhang, S. Ren, and J. Sun, “Deep Residual Learning for Image Recognition,” in Proc. IEEE CVPR, Las Vegas, NV, USA, 2016, pp. 770–778.

  14. S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.

  15. O. Vinyals, A. Toshev, S. Bengio, and D. Erhan, “Show and Tell: Lessons Learned from the 2015 MSCOCO Image Captioning Challenge,” IEEE Trans. PAMI, vol. 39, no. 4, pp. 652–663, Apr. 2017.

  16. A. Vaswani et al., “Attention is all you need,” in Proc. NeurIPS, 2017, pp. 5998–6008.

  17. Supabase Inc., “Supabase Documentation,” 2024. [Online]. Available: https://supabase.com/docs

Manuscript received 08/04/2026. This work was conducted as part of a mini-project in the Department of Computer Engineering under the academic supervision of Mr. Kashif Sheikh & Mrs. Smita Dandge.