Applying Generative Artificial Intelligence in Educational Research Instrument Development: A Methodological Framework


Pawaris Saramano

Abstract

The development of educational research instruments is critical to the quality, credibility, and interpretability of research findings. However, many researchers continue to face challenges in constructing instruments that are both theoretically grounded and methodologically sound. Recent advances in Generative Artificial Intelligence (GenAI), particularly tools based on Large Language Models (LLMs) and Natural Language Processing (NLP), have created new opportunities to support construct definition, indicator specification, preliminary item drafting, and early-stage language review. Yet the application of these technologies to educational research instrument development still lacks a clearly articulated methodological framework grounded in educational measurement and evaluation principles.
This conceptual paper proposes a methodological framework for applying Generative Artificial Intelligence to educational research instrument development. The framework integrates four core components: educational measurement and evaluation foundations, instrument development processes, the role of Artificial Intelligence as a methodological support mechanism, and Human-in-the-Loop decision-making. Within this framework, AI is positioned as a methodological assistant rather than a substitute for scholarly judgment. It may support construct clarification, indicator development, preliminary item generation, and early-stage quality review, while responsibility for decisions concerning validity, reliability, measurement precision, measurement fairness, and ethical appropriateness remains with researchers and domain experts.
The proposed framework offers a structured, transparent, and theoretically grounded approach to integrating AI into instrument development. It further emphasizes that the academic value of AI-assisted instrument development depends not merely on procedural efficiency but also on its alignment with sound measurement and evaluation principles and on responsible human oversight. This paper thus provides a preliminary methodological contribution for educational researchers, research instructors, and graduate students, while also offering a foundation for future empirical validation of AI-assisted instrument development practices in education.

Article Details

Section
Academic Article
