AI-Based Educational Content Analysis and Automatic Assessment Generation using Natural Language Processing and Transformer Models

Dhiraj Arikar; Prof. Sandip Buradkar; Dr. Rahul Nawkhare; Shweta Undirwade

doi:10.17577/IJERTV15IS061215

Volume 15, Issue 06 (June 2026)

Open Access
Paper ID : IJERTV15IS061215
Volume & Issue : Volume 15, Issue 06 , June – 2026
Published (First Online): 05-07-2026
ISSN (Online) : 2278-0181
Publisher Name : IJERT
License: This work is licensed under a Creative Commons Attribution 4.0 International License

Dhiraj Arikar (1), Prof. Sandip Buradkar (2), Dr. Rahul Nawkhare (3), Shweta Undirwade (4)

Department of Electronics & Telecommunication Engineering,

Wainganga College of Engineering and Management, Nagpur, Maharashtra

Abstract – The increasing availability of digital educational resources has created a growing demand for intelligent systems capable of automatically analyzing educational content and generating assessments. Traditional assessment preparation requires significant manual effort from educators and often lacks scalability. This paper presents an AI-based Educational Content Analysis and Automatic Assessment Generation framework utilizing Natural Language Processing (NLP) and Transformer models. The proposed framework processes educational PDF documents through text extraction, preprocessing, transformer-based summarization, keyword extraction, readability analysis, and automated assessment generation. Experimental evaluation was conducted using Machine Learning study materials. The proposed framework achieved an average summary compression ratio of 82.6% while preserving key educational concepts. Keyword analysis identified the most frequent educational terms, with occurrence counts of 391 for "Learning", 247 for "Data", and 227 for "Machine". Readability analysis produced a score of 37.89, indicating high content complexity suitable for advanced learners. Furthermore, the system automatically generated 120 assessment items, including Multiple Choice Questions (MCQs), Fill-in-the-Blank questions, True/False questions, and Bloom's Taxonomy-based questions. Expert evaluation indicated that 89.4% of the generated questions were educationally relevant and suitable for academic assessment. The results demonstrate the effectiveness of integrating Transformer-based language models with NLP techniques for intelligent educational content processing and automated assessment generation, thereby reducing educator workload and enhancing learning support.

Keywords – Artificial Intelligence, Natural Language Processing, Transformer Models, Educational Content Analysis, Automatic Assessment Generation, Question Generation, Educational Technology, Bloom's Taxonomy.

INTRODUCTION

The education system has been transformed by the rapid advancement of AI and digital learning. Educational institutions increasingly rely on electronic textbooks, lecture notes, online courses, and educational repositories for many teaching and learning activities. The continuous growth of digital educational content has created challenges in organization, knowledge extraction, and assessment preparation. As a result, there is an increasing demand for intelligent educational systems that can automatically analyze educational content and generate valuable feedback to assist both teachers and students [1], [2]. Natural Language Processing (NLP), a major branch of Artificial Intelligence, has emerged as an effective means of extracting useful information from educational text. Recent advancements in deep learning, together with the emergence of Transformer architectures, have substantially improved the performance of NLP tasks such as text summarization, keyword extraction, question answering, and automated content generation [2], [3]. Transformer-based models have shown superior contextual understanding and are gaining popularity in educational applications such as automated content analysis and automatic assessment generation [3], [10], [11].

In educational environments, assessment preparation is one of the most important yet time-consuming tasks for educators. Teachers must manually review learning resources, identify key concepts, set levels of cognitive difficulty, and create evaluation questions. These processes require considerable effort and are increasingly difficult to scale in today's digital learning environment. Recent research shows that Transformer models and Large Language Models (LLMs) are capable of automatically generating questions, creating cognitive-level evaluative tasks, and providing intelligent educational support [4], [5]–[8].

Numerous studies have been conducted on AI-based educational question generation. Bhowmick et al. [6] proposed an automated educational question generation framework that extracts useful assessment items from textual learning materials. Bulathwela et al. [5] showed that pre-trained language models can be effective for scalable question generation in education. Muse et al. [7] showed that domain-specific scientific pre-training improves the quality and relevance of the generated questions. In addition, Scaria et al. [9] examined Bloom's Taxonomy-based assessment generation with large language models and reported encouraging results for generating educational questions at different cognitive levels. These studies show that AI-assisted assessment generation can enhance both learning and evaluation. Despite this progress, existing educational systems focus on individual functionalities. Research on a single educational framework that combines content extraction, transformer-based summarization, keyword extraction, readability analysis, and assessment generation in multiple formats remains limited [5]–[9]. As a result, the demand for educational AI systems that can analyze educational content and generate assessments automatically continues to grow.

To overcome these limitations, this paper proposes an AI-Based Educational Content Analysis and Automatic Assessment Generation framework based on NLP and Transformer models. The proposed framework performs several processes on educational PDFs, including text extraction, text preprocessing, transformer-based summarization, keyword extraction, readability-based difficulty estimation, and automated assessment generation. The framework generates MCQs, Fill-in-the-Blank questions, True/False questions, and Bloom's Taxonomy-based questions for overall educational evaluation. The conceptual framework for the proposed research is shown in Fig. 1. It illustrates how unstructured educational content is transformed into assessment-ready content with the help of Artificial Intelligence, Natural Language Processing, and Transformer-based learning models. The analysis of educational documents helps extract meaningful knowledge, identify key concepts, and automatically generate assessments to facilitate teaching, learning, and evaluation.

The major contributions of this research are summarized as follows:
- Development of an integrated educational content analysis framework using NLP and Transformer models.
- Implementation of transformer-based summarization for educational content compression and concept preservation.
- Automated extraction of educational concepts through keyword analysis.
- Difficulty assessment using readability-based educational content evaluation.
- Automatic generation of MCQs, Fill-in-the-Blank questions, True/False questions, and Bloom's Taxonomy-based questions.
- Experimental validation demonstrating the effectiveness of the proposed framework for intelligent educational assessment generation.
Fig.1 Generalized Framework of the Proposed AI-Based Educational Content Analysis and Automatic Assessment Generation System

The rest of this paper is organized as follows. Section II reviews the literature and identifies research gaps. Section III describes the proposed methodology and system architecture. Section IV discusses the implementation details and experimental setup. Section V presents the results and discussion. The final sections provide a comparative analysis and summarize the findings of this research.

LITERATURE REVIEW

The integration of Artificial Intelligence (AI), Natural Language Processing (NLP), and Transformer-based models has significantly enhanced educational technologies, particularly in educational content analysis, question generation, and automated assessment systems. Researchers have explored various approaches to improve educational content understanding and automate learning support mechanisms.

Das et al. [4] surveyed automatic question generation and answer assessment systems. The study explored existing assessment generation techniques and highlighted the growing contribution of Artificial Intelligence to educational assessment. It noted challenges regarding content relevance, question quality, and scalability. Bhowmick et al. [6] proposed a deep learning-based automated question generation framework for educational texts. Their approach shows that assessment questions can be generated from learning resources. However, the framework is limited to question generation and does not include educational content analysis, readability, or difficulty estimation. Bulathwela et al. [5] investigated the scalable generation of educational questions using pre-trained language models. The experimental results showed better scalability and question quality for educational applications. Nonetheless, the framework focuses on question generation and does not support content summarization or complexity analysis. Muse et al. [7] investigated the effectiveness of pre-training on scientific text for educational question generation. The findings reveal that pre-training on domain-specific data enhances the quality and relevance of generated questions. Although these approaches improved question quality, they focused only on question generation and did not allow comprehensive analysis of educational content.

Awalurahman et al. [8] carried out a systematic literature review of Transformer and LLM-based multiple-choice question generation systems. The review found that Transformer architectures significantly improve question generation quality and educational relevance. Nonetheless, most existing systems focus only on MCQ generation and neglect other formats such as Fill-in-the-Blank and True/False questions. Scaria et al. [9] focused on generating assessments with LLMs based on Bloom's Taxonomy. The study showed that LLMs can generate cognitive-level questions across the categories of Bloom's revised taxonomy. Nonetheless, the proposed method did not incorporate analysis of educational content complexity, keyword extraction, or readability evaluation. Transformer-based architectures have made significant contributions to NLP in educational systems. The Transformer architecture developed by Vaswani et al. [3] uses self-attention to model sequences. Building on this architecture, Raffel et al. [10] proposed the Text-to-Text Transfer Transformer (T5), and Lewis et al. [11] proposed the BART model for sequence-to-sequence text generation tasks. These models have shown strong results in text summarization and content generation and have been widely adopted in NLP research in educational settings. Educational NLP applications have advanced significantly, but they still show limitations. Existing systems focus on one particular educational task, such as question generation, summarization, or cognitive-level assessment. Little research is available on combining educational content extraction, transformer-based summarization, keyword analysis, readability analysis, and multi-format assessment generation in one framework. Moreover, prior techniques frequently do not support generating various assessment formats in alignment with educational evaluation requirements [5]–[9].

Consequently, there is a need for an intelligent educational framework that combines the analysis of educational content with automatic assessment generation, using modern NLP techniques and Transformer models. The system developed in this paper operates on text-based educational content. It provides a comprehensive tool for extracting and summarizing content and for generating various types of assessment questions, such as MCQs, Fill-in-the-Blank questions, and True/False questions. The system also generates questions based on Bloom's Taxonomy. The literature survey summarized in Table 1 shows that existing educational NLP systems focus on a single functionality, such as question generation, summarization, or Bloom-based assessment question generation. Little research has been conducted on unifying educational content analysis with automated assessment generation. Current methods also do not adequately support readability-based difficulty assessment or multi-format assessment generation.

To address these limitations, the proposed framework combines educational content extraction, Transformer-based summarization, keyword extraction, difficulty analysis, and generation of multiple assessment formats. This integrated approach provides a more comprehensive solution for intelligent educational content processing and assessment generation.

Table 1 Comparison of Existing Educational NLP and Assessment Generation Systems

Ref.	Methodology	Summarization	Keyword Extraction	Difficulty Analysis	MCQ Generation	Bloom’s Taxonomy	Major Limitation
[4]	Survey on AQG & Assessment Systems	No	No	No	Partial	No	Survey only; no implementation framework
[5]	Pre-trained Language Models	No	No	No	Yes	No	Focused mainly on question generation
[6]	Deep Learning-Based AQG	No	No	No	Yes	No	No educational content analysis
[7]	Scientific Text Pre-training	No	No	No	Yes	No	Limited to question generation task
[8]	Transformer + LLM Review	No	No	No	Yes	Partial	Focused mainly on MCQ generation
[9]	LLM-Based Bloom Assessment	No	No	No	Yes	Yes	No readability or keyword analysis

PROPOSED METHODOLOGY

The proposed AI-Based Educational Content Analysis and Automatic Assessment Generation framework is presented in this section. The framework uses NLP and Transformer models to automatically analyze educational content and generate assessment items of several categories. The complete design of the proposed system is shown in Fig. 2. The proposed framework includes six modules:
1. Educational content extraction.
2. Text preprocessing.
3. Transformer-based summarization.
4. Keyword extraction.
5. Difficulty analysis.
6. Automatic assessment generation.
The system takes educational PDF documents as input and processes them through a series of NLP-based operations. Transformer architectures are used to summarize the extracted educational content, preserving the major ideas while reducing the content length. Keyword extraction then identifies important educational concepts and terminology. Subsequently, readability analysis is performed to determine the difficulty level of the material. From this processed content, various assessment items are generated, including Multiple Choice Questions (MCQs), Fill-in-the-Blank items, True/False items, and assessment items based on Bloom's Taxonomy.
1. Educational Content Extraction
  
  Educational study materials are provided in PDF format. The content extraction module converts PDF documents into machine-readable textual content using document parsing techniques. This stage serves as the foundation for subsequent NLP processing tasks.
2. Text Preprocessing
  
  The extracted text is preprocessed to improve content quality and remove noise. The preprocessing stage includes:
  - Tokenization
  - Stop-word removal
  - Text normalization
  - Sentence segmentation
  - Removal of special characters
    
    The resulting clean text is used for summarization and educational content analysis.
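The preprocessing steps listed above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the authors' implementation: the paper lists NLTK in its software configuration, while the `STOP_WORDS` set, the regular expressions, and the function names here are assumptions chosen to keep the example self-contained.

```python
import re

# Tiny illustrative stop-word list (an assumption; a library such as NLTK
# provides a much fuller list).
STOP_WORDS = {
    "a", "an", "and", "are", "as", "by", "for", "from", "in", "is",
    "it", "of", "on", "or", "that", "the", "to", "with",
}


def split_sentences(text):
    """Naive sentence segmentation: split after '.', '!' or '?' plus whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def preprocess(text):
    """Return one list of cleaned tokens per sentence."""
    cleaned = []
    for sentence in split_sentences(text):
        normalized = sentence.lower()                          # text normalization
        normalized = re.sub(r"[^a-z0-9\s]", " ", normalized)   # remove special characters
        tokens = normalized.split()                            # whitespace tokenization
        cleaned.append([t for t in tokens if t not in STOP_WORDS])  # stop-word removal
    return cleaned


sample = "Machine Learning is a branch of AI. It learns patterns from data!"
print(preprocess(sample))
# [['machine', 'learning', 'branch', 'ai'], ['learns', 'patterns', 'data']]
```

Keeping the output grouped by sentence is a design choice for this sketch: later stages such as question generation typically need sentence boundaries, whereas keyword counting can simply flatten the lists.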
3. Transformer-Based Summarization
  
  Transformer models are employed to generate concise summaries of educational content. The summarization process reduces the size of the original document while preserving important educational concepts and learning objectives.
  
  The compression ratio is calculated as:
  
  $$CR = \frac{L_o - L_s}{L_o} \times 100 \quad (1)$$
  
  where:
  - $L_o$ = length of the original document (words)
  - $L_s$ = length of the generated summary (words)
  
  Example calculation with an original word count of 8,000 and a summary word count of 1,392:
  
  $$CR = \frac{8000 - 1392}{8000} \times 100 = 82.6\%$$
  
  Experimental evaluation achieved an average compression ratio of 82.6%.
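As a quick check of the arithmetic, the compression ratio can be computed directly from the two word counts. The sketch below assumes the ratio is the percentage reduction in word count relative to the original document, which matches the reported figures; the function names are illustrative.

```python
def word_count(text):
    """Count whitespace-separated words in a text."""
    return len(text.split())


def compression_ratio(original_words, summary_words):
    """Percentage reduction in length from the original to the summary."""
    if original_words <= 0:
        raise ValueError("original_words must be positive")
    return (original_words - summary_words) / original_words * 100


# Word counts reported in the summarization results.
cr = compression_ratio(8000, 1392)
print(f"Compression ratio: {cr:.1f}%")  # Compression ratio: 82.6%
```

In practice the two arguments would come from `word_count(original_text)` and `word_count(summary_text)`.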
4. Keyword Extraction
  
  Keyword extraction is performed to identify significant educational concepts and domain-specific terminology. Frequency-based NLP techniques are used to calculate keyword importance.
  
  For a keyword Ki:
  
  Frequency (Ki) = Number of occurrences of Ki in the document
  
  Keywords with higher occurrence frequencies are considered educationally significant concepts.
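A frequency-based extractor of this kind can be sketched with `collections.Counter`. The stop-word list, the minimum token length, and the sample text are assumptions made for illustration; the paper does not specify these details.

```python
import re
from collections import Counter

# Illustrative stop-word list (an assumption, not the list used in the paper).
STOP_WORDS = {
    "a", "an", "and", "are", "as", "by", "for", "from", "in", "is",
    "it", "more", "of", "on", "or", "that", "the", "to", "with",
}


def keyword_frequencies(text, top_n=5, min_length=3):
    """Return the top_n (keyword, frequency) pairs, most frequent first."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(
        t for t in tokens if t not in STOP_WORDS and len(t) >= min_length
    )
    return counts.most_common(top_n)


doc = (
    "Machine learning models learn from data. "
    "Training data shapes the model, and more data usually helps learning."
)
print(keyword_frequencies(doc, top_n=2))
# [('data', 3), ('learning', 2)]
```

Raw counts favour words that are common in any text on the topic; a weighting scheme such as TF-IDF would be one alternative, although the paper reports plain occurrence frequencies.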
  
  Fig 2. Architecture of the proposed AI-based educational content analysis and automatic assessment generation framework
5. Difficulty Analysis
  
  Difficulty analysis evaluates the complexity of educational content using readability metrics. The readability score is calculated using:
  
  Readability Score = f (Sentence Length, Vocabulary Complexity ) (2)
  
  The generated readability score is used to classify content into:
  - Easy
  - Medium
  - Hard
    
    The experimental results produced a readability score of 37.89, indicating advanced educational content complexity.
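Equation (2) leaves the function f unspecified. One common choice that depends on sentence length and word complexity is the Flesch Reading Ease score, on which lower values indicate harder text. The sketch below assumes that formula, a rough vowel-group syllable heuristic, and illustrative thresholds for the Easy/Medium/Hard classes; none of these choices is confirmed by the paper.

```python
import re


def count_syllables(word):
    """Rough heuristic: count vowel groups, discounting a trailing silent 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_reading_ease(text):
    """Flesch Reading Ease: 206.835 - 1.015*(words/sentence) - 84.6*(syllables/word)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (
        206.835
        - 1.015 * (len(words) / len(sentences))
        - 84.6 * (syllables / len(words))
    )


def difficulty_level(score, easy_min=60.0, hard_max=50.0):
    """Map a score to a class; the thresholds are illustrative assumptions."""
    if score >= easy_min:
        return "Easy"
    if score < hard_max:
        return "Hard"
    return "Medium"


print(difficulty_level(37.89))  # Hard
```

Under these assumed thresholds, the reported score of 37.89 falls in the Hard class, which agrees with the classification given in the results.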
6. Automatic Assessment Generation
  
  The assessment generation module utilizes educational concepts extracted from the content analysis phase to automatically generate assessment items. The generated assessment formats include:
  - Multiple Choice Questions (MCQs)
  - Fill-in-the-Blank Questions
  - True/False Questions
  - Bloom's Taxonomy-Based Questions
The generated questions are designed to evaluate conceptual understanding and cognitive learning outcomes at multiple educational levels. Algorithm 1 summarizes the proposed framework.

Algorithm 1: AI-Based Educational Content Analysis and Assessment Generation

Input: Educational PDF Document

Output: Summary, Keywords, Difficulty Level, Assessment Items

Step 1: Extract text from PDF document.

Step 2: Perform preprocessing and text cleaning.

Step 3: Generate educational summary using Transformer model.

Step 4: Extract significant educational keywords.

Step 5: Compute readability score and difficulty level.

Step 6: Generate MCQs.

Step 7: Generate Fill-in-the-Blank questions.

Step 8: Generate True/False questions.

Step 9: Generate Bloom's Taxonomy-based questions.

Step 10: Store generated assessment items.

Return generated educational assessments.
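The paper does not describe how Steps 6 and 7 turn keywords into questions. As one possible reading, the sketch below uses a simple rule-based approach: a keyword is blanked out of a source sentence to form a Fill-in-the-Blank item, and the other extracted keywords serve as MCQ distractors. The function names, the sample sentence, and the distractor strategy are all hypothetical stand-ins, not the authors' method.

```python
import re


def make_fill_in_the_blank(sentence, keyword):
    """Blank out the first whole-word occurrence of keyword (case-insensitive)."""
    pattern = re.compile(rf"\b{re.escape(keyword)}\b", re.IGNORECASE)
    if not pattern.search(sentence):
        return None  # keyword not present, so no question can be formed
    return {
        "question": pattern.sub("_____", sentence, count=1),
        "answer": keyword,
    }


def make_mcq(sentence, keyword, all_keywords, n_options=4):
    """Build an MCQ whose distractors are other extracted keywords."""
    item = make_fill_in_the_blank(sentence, keyword)
    if item is None:
        return None
    distractors = [k for k in all_keywords if k.lower() != keyword.lower()]
    options = sorted(distractors[: n_options - 1] + [keyword], key=str.lower)
    return {**item, "options": options}


keywords = ["classification", "training", "hypothesis", "algorithm"]
sentence = "Classification assigns each input to one of several predefined categories."

print(make_fill_in_the_blank(sentence, "classification"))
print(make_mcq(sentence, "classification", keywords)["options"])
# ['algorithm', 'classification', 'hypothesis', 'training']
```

Options are sorted alphabetically so the correct answer does not always appear in the same position; a real system would more likely shuffle them and choose distractors that are semantically close to the answer.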

EXPERIMENTAL SETUP AND IMPLEMENTATION DETAILS

Dataset Description

The proposed framework was evaluated using educational study materials related to Machine Learning. The dataset consisted of educational PDF documents containing theoretical concepts, definitions, examples, algorithms, and explanatory content. The documents were obtained from academic teaching materials and lecture notes commonly used in undergraduate engineering courses. The educational documents served as input to the proposed framework for content extraction, summarization, keyword extraction, readability analysis, and assessment generation.

Software Environment

The proposed framework was implemented in the Python programming language within the Jupyter Notebook environment.

Table 2. Software Configuration

Component	Specification
Programming Language	Python 3.11
Development Environment	Jupyter Notebook
NLP Library	NLTK
Transformer Library	Hugging Face Transformers
PDF Processing	PyPDF
Translation Library	Deep Translator
Text-to-Speech	gTTS
Data Analysis	Pandas
Visualization	Matplotlib

Hardware Environment

Table 3. Hardware Configuration

Component	Specification
Processor	Intel Core i5/i7
RAM	8 GB
Storage	512 GB SSD
Operating System	Windows 11
GPU	Optional

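Experimental Workflow

The experimental workflow followed the stages of the proposed framework: content extraction, preprocessing, summarization, keyword extraction, readability analysis, and assessment generation. The generated outputs were evaluated based on educational relevance, content coverage, and assessment quality.

Evaluation Metrics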

The following evaluation metrics were used:
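- Summary Compression Ratio
- Keyword Frequency Analysis
- Readability Score
- Number of Generated Assessments
- Educational Relevance Score

$$CR = \frac{L_o - L_s}{L_o} \times 100 \quad (3)$$

$$F(K_i) = \sum_{j=1}^{N} c_j(K_i) \quad (4)$$

$$A_{total} = \sum_{i=1}^{m} A(i) \quad (5)$$

$$ERS = \frac{Q_{relevant}}{Q_{generated}} \times 100 \quad (6)$$

where $L_o$ and $L_s$ are the word counts of the original document and the generated summary, $c_j(K_i)$ equals 1 if the $j$-th of the $N$ tokens in the document is the keyword $K_i$ and 0 otherwise, $A(i)$ is the number of items generated for the $i$-th of the $m$ assessment types, and $Q_{relevant}$ and $Q_{generated}$ are the numbers of relevant and generated questions.
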

RESULTS AND DISCUSSION
- Summarization Performance
  
  The summarization module generates short summaries while preserving major educational concepts. The generated summaries substantially reduced the document length and made the content more accessible. The results suggest that the proposed framework can effectively compress educational content while retaining essential information for learning and assessment generation.
  
  Table 4 Summarization Performance
  
  Parameter	Value
  Original Word Count	8,000
  Summary Word Count	1,392
  Compression Ratio	82.6%
- Keyword Extraction Results
  
  The keyword extraction module successfully identified significant educational concepts from Machine Learning study materials.
  
  Fig. 4 Top Extracted Keywords with Frequency
  
  Table 5 Top Extracted Keywords
  
  Keyword	Frequency
  Learning	391
  Data	247
  Machine	227
  Model	203
  Training	157
  Algorithm	108
  Hypothesis	101
  Classification	93
  Set	87
  
  The extracted keywords correspond closely to the primary concepts present within the educational content, demonstrating effective concept identification.
- Difficulty Analysis
  
  The readability analysis module was utilized to estimate educational content complexity.
  
  Table 6 Readability Analysis Results
  
  Metric	Value
  Readability Score	37.89
  Difficulty Level	Hard
  
  The generated readability score indicates that the analyzed educational content is suitable for advanced learners and higher education students.
- Assessment Generation Results
  
  The proposed framework successfully generated multiple categories of assessment items.
  
  Table 7 Generated Assessment Items
  
  Assessment Type	Generated Items
  MCQs	30
  Fill-in-the-Blanks	30
  True/False Questions	30
  Bloom’s Taxonomy Questions	30
  Total	120
  
  The generated assessments covered multiple cognitive levels and supported comprehensive educational evaluation.
  
  Fig. 5. Distribution of Generated Assessment Items
- Educational Relevance Evaluation
  
  Educational experts evaluated the generated assessment items based on correctness, relevance, and educational usefulness.
  
  Table 8 Educational Relevance Analysis
  
  Parameter	Value
  Generated Questions	120
  Relevant Questions	107
  Relevance Score	89.4%
  
  The results demonstrate that the generated assessments are educationally meaningful and suitable for academic use.
  
  Fig.6. Overall Performance Metrics of the Proposed Framework
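The totals in Tables 7 and 8 can be checked with a few lines of arithmetic. The sketch below assumes the relevance score is simply the share of generated items judged relevant, which is an assumption about the metric's definition. Under that assumption, the counts in Table 8 give roughly 89.2%, slightly below the reported 89.4%; the paper does not state how its figure was derived, for example whether ratings were averaged across several experts.

```python
def relevance_score(relevant, generated):
    """Percentage of generated items judged relevant."""
    if generated <= 0:
        raise ValueError("generated must be positive")
    return relevant / generated * 100


# Item counts per assessment type, as reported in Table 7.
items_per_type = {
    "MCQs": 30,
    "Fill-in-the-Blanks": 30,
    "True/False Questions": 30,
    "Bloom's Taxonomy Questions": 30,
}

total_items = sum(items_per_type.values())
score = relevance_score(107, total_items)  # 107 relevant questions (Table 8)

print(total_items)      # 120
print(round(score, 1))  # 89.2
```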
- Discussion
Machine Learning study material was used for the experimental evaluation. The proposed framework achieved an average summary compression ratio of 82.6% while retaining core educational concepts. Keyword extraction showed that the most frequent educational keywords were Learning (391), Data (247), and Machine (227). Readability analysis produced a score of 37.89, indicating advanced-level content. In addition, the system generated 120 assessment items across four categories, and expert evaluation found 89.4% of the generated questions to be educationally relevant and suitable for academic assessment.

COMPARATIVE ANALYSIS

To evaluate the effectiveness of the proposed framework, a comparative analysis was performed against existing educational content analysis and assessment generation systems available in the literature. The study compares the main functionality of the systems with regard to summarizing educational content, extracting keywords, determining difficulty level, generating questions automatically, and generating assessment questions based on Bloom's Taxonomy. Table 9 shows that the existing systems focus primarily on a specific functionality, such as question generation or Bloom's Taxonomy-based assessment generation.

In contrast, the proposed framework combines Transformer-based summarization, keyword extraction, readability-based difficulty analysis, and multi-format question generation into a comprehensive educational content analysis and assessment generation solution. In addition, the proposed framework achieved an educational relevance score of 89.4% and generated 120 assessment items across four assessment categories.

Table 9 Comparison of Existing Educational Content Analysis and Assessment Generation Frameworks

Features	Das et al. [4]	Bulathwela et al. [5]	Bhowmick et al. [6]	Muse et al. [7]	Awalurahman et al. [8]	Scaria et al. [9]	Proposed Framework
Educational Content Analysis	No	Partial	No	No	No	No	Yes
Transformer-Based Summarization	No	No	No	No	No	No	Yes
Keyword Extraction	No	No	No	No	No	No	Yes
Difficulty Analysis	No	No	No	No	No	No	Yes
MCQ Generation	Partial	Yes	Yes	Yes	Yes	Yes	Yes
Fill-in-the-Blank Generation	No	No	No	No	No	No	Yes
True/False Generation	No	No	No	No	No	No	Yes
Bloom’s Taxonomy Questions	No	No	No	No	Partial	Yes	Yes
Multi-format Assessment	No	No	No	No	Partial	Partial	Yes
Integrated Framework	No	No	No	No	No	No	Yes

CONCLUSION AND FUTURE WORK

The proposed framework uses Natural Language Processing and Transformer models to analyze educational content and automatically produce assessments. The system integrates educational content extraction, text preprocessing, Transformer-based summarization, keyword extraction, readability-based difficulty analysis, and automatic assessment generation into a single architecture. The effectiveness of the proposed framework was demonstrated through an experimental evaluation using Machine Learning study materials. The summaries reached a compression ratio of 82.6% while retaining the key ideas. Keyword extraction identified important educational terms, the most frequent being Learning, Data, and Machine. Readability analysis produced a score of 37.89, indicating advanced educational content complexity. Furthermore, 120 assessment items were generated across MCQs, Fill-in-the-Blank questions, True/False questions, and Bloom's Taxonomy-based questions.

Expert evaluation yielded an educational relevance score of 89.4%, which suggests that the generated assessments have practical applicability.

The findings indicate that integrating Transformer models with NLP techniques can significantly reduce manual assessment preparation effort and improve the accessibility of educational content and the coverage of assessments. The proposed framework supports intelligent educational content processing and automatic assessment generation. Future work includes using LLMs for educational reasoning, multilingual question generation, personalized learning aids, voice-based educational agents, and student performance analysis. Future educational systems could also offer personalized learning recommendations and feedback mechanisms that enable real-time interaction.

ACKNOWLEDGMENT

The authors would like to thank their respective institution and academic colleagues for their support and guidance during the development and evaluation of this research work.

REFERENCES

[1] UNESCO, Artificial Intelligence and Education: Guidance for Policy Makers. Paris, France: UNESCO Publishing, 2021.
[2] D. Jurafsky and J. H. Martin, Speech and Language Processing, 3rd ed., Stanford University Draft, 2024.
[3] A. Vaswani et al., "Attention Is All You Need," in Advances in Neural Information Processing Systems, vol. 30, pp. 5998–6008, 2017.
[4] B. Das, N. Majumder, A. Gelbukh, and A. Cambria, "Automatic Question Generation and Answer Assessment: A Survey," Smart Learning Environments, vol. 8, no. 1, pp. 1–24, 2021.
[5] S. Bulathwela, H. Muse, and E. Yilmaz, "Scalable Educational Question Generation with Pre-trained Language Models," in Proc. Int. Conf. Artificial Intelligence in Education (AIED), 2023.
[6] A. K. Bhowmick et al., "Automating Question Generation From Educational Text," in Artificial Intelligence XL, LNCS 14394, Springer, 2023, pp. 437–450.
[7] H. Muse, S. Bulathwela, and E. Yilmaz, "Pre-training With Scientific Text Improves Educational Question Generation," Proc. AAAI, vol. 37, no. 13, pp. 16064–16072, 2023.
[8] H. W. Awalurahman, R. F. Aji, and I. Budi, "Transformer and Large Language Models for Automatic Multiple-Choice Question Generation: A Systematic Literature Review," IEEE Access, vol. 13, pp. 127100–127112, 2025.
[9] N. Scaria, S. D. Chenna, and D. Subramani, "Automated Educational Question Generation at Different Bloom's Skill Levels Using Large Language Models: Strategies and Evaluation," arXiv:2408.04394, 2024.
[10] C. Raffel et al., "Exploring the Limits of Transfer Learning With a Unified Text-to-Text Transformer," Journal of Machine Learning Research, vol. 21, no. 140, pp. 1–67, 2020.
[11] M. Lewis et al., "BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension," in Proc. ACL, pp. 7871–7880, 2020.
[12] K. Sparck Jones, "A Statistical Interpretation of Term Specificity and Its Application in Retrieval," Journal of Documentation, vol. 28, no. 1, pp. 11–21, 1972.
[13] R. Flesch, "A New Readability Yardstick," Journal of Applied Psychology, vol. 32, no. 3, pp. 221–233, 1948.
[14] E. Dale and J. S. Chall, "A Formula for Predicting Readability," Educational Research Bulletin, vol. 27, no. 1, pp. 11–20, 1948.
[15] L. W. Anderson and D. R. Krathwohl, A Taxonomy for Learning, Teaching, and Assessing: A Revision of Bloom's Taxonomy of Educational Objectives. New York, NY, USA: Longman, 2001.
