Gemini API-Based Automated English Paragraph Scoring Aligned with High School Thai Curriculum Writing Indicators

Phisit Deeboonmee Na Chumphae

Abstract


This research aims to develop and evaluate an automated English paragraph scoring system using the Gemini API, aligned with the writing indicators of the Thai upper secondary curriculum, in order to address teacher workload and delayed feedback in writing assessment. The system integrates the Gemini 2.5 Pro API with a prompt-engineering framework designed to simulate expert EFL assessors. The study employs a sequential mixed-methods design. For the quantitative component, 160 upper secondary EFL students in Thailand, selected through cluster sampling, each completed three expository paragraph assignments aligned with the Thai core curriculum. The paragraphs were assessed with a validated analytical evaluation rubric comprising four aspects and were independently scored by three human evaluators; these scores were then compared with the automated scores generated by the Gemini-based system. Inter-rater reliability among the human evaluators was first established using the Intraclass Correlation Coefficient (ICC), and agreement between human and AI scores was measured with the Quadratic Weighted Kappa (QWK). The results showed a high level of agreement between the Gemini-generated scores and the human evaluators (QWK = 0.82), indicating that the system can approximate human judgment in evaluating English as a Foreign Language (EFL) writing. Qualitative analysis of the AI-generated feedback further revealed that the system could provide diagnostic recommendations on grammar, vocabulary, and sentence structure. These findings suggest that the system can support teachers by reducing grading workload while providing timely, criteria-based feedback that enhances students' writing development.
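The abstract reports human-AI agreement as Quadratic Weighted Kappa (QWK = 0.82). As a minimal illustration of how this statistic is computed (the paper's own analysis tooling is not described here, and the function name and rating ranges below are hypothetical), a pure-Python sketch:

```python
from collections import Counter

def quadratic_weighted_kappa(human, ai, min_rating, max_rating):
    """QWK between two lists of integer ratings on the same scale.

    1.0 = perfect agreement; 0.0 = chance-level agreement under
    independent marginals. Disagreements are penalised by the squared
    distance between the two ratings. Assumes at least two categories.
    """
    assert len(human) == len(ai) and len(human) > 0
    n = max_rating - min_rating + 1          # number of rating categories
    total = len(human)

    # Observed joint frequency matrix O[i][j]
    obs = [[0.0] * n for _ in range(n)]
    for h, a in zip(human, ai):
        obs[h - min_rating][a - min_rating] += 1

    # Marginal histograms -> expected counts E[i][j] under independence
    hist_h = Counter(h - min_rating for h in human)
    hist_a = Counter(a - min_rating for a in ai)

    num = den = 0.0
    for i in range(n):
        for j in range(n):
            w = (i - j) ** 2 / (n - 1) ** 2  # quadratic disagreement weight
            num += w * obs[i][j]
            den += w * hist_h[i] * hist_a[j] / total
    return 1.0 - num / den
```

Identical rating lists yield 1.0, while ratings that agree no better than their marginals predict yield 0.0; values around 0.8, as reported in the abstract, are conventionally read as strong agreement.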

Keywords


Automated Scoring; Gemini API; Paragraph Writing; Writing Assessment






DOI: http://dx.doi.org/10.21093/ijeltal.v11i1.2553




IJELTAL (Indonesian Journal of English Language Teaching and Applied Linguistics) by http://ijeltal.org is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.






Contact Us: IJELTAL (Indonesian Journal of English Language Teaching and Applied Linguistics)

Address: Faculty of Teacher Training and Education, Universitas Islam Negeri Sultan Aji Muhammad Idris Samarinda

Jl. H.A.M. Rifadin, Samarinda, Kalimantan Timur, Indonesia. Email: ijeltalj@gmail.com