Designing Open-Ended Question Scoring for Assessment of Student Mathematical Proficiency Levels Through Digital Technology


Apinya Fiothong
Putcharee Junpeng
Prapawadee Suwannatrai
Samruan Chinjunthuk
Chaiwat Tawarungruang


The study aimed to (1) analyze students’ multidimensional response patterns to determine cut scores for assessing mathematical proficiency levels on the topic of Measurement and Geometry, and (2) design and assess the quality of open-ended question scoring for assessing mathematical proficiency levels through digital technology. A design research approach was applied. The sample consisted of 528 Grade 7 students. The research instrument was an open-ended question test on Measurement and Geometry administered through diagnostic tools in an online testing system, “eMAT-Testing.” The collected data were analyzed with the MRCML model.
The results were as follows:
1. In determining cut scores for mathematical proficiency levels by defining criterion zones on a Wright map, the mathematical processes dimension featured five levels with four cut scores, ranging from lowest to highest: -2.30, -0.43, 0.78, and 1.15. Similarly, the conceptual structures dimension consisted of five levels with four cut scores: -2.76, 0.11, 0.46, and 1.16. These cut scores can be used to determine proficiency ranges, scale scores, and raw scores as criteria for assessing mathematical proficiency in each dimension.
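The mapping implied here, from a student's scale score to one of the five proficiency levels, can be sketched as follows. The cut scores are those reported above; the function name and dimension keys are illustrative assumptions, not the authors' implementation.

```python
# Cut scores reported in the study (logit scale) for each dimension.
CUT_SCORES = {
    "mathematical_process": [-2.30, -0.43, 0.78, 1.15],
    "conceptual_structure": [-2.76, 0.11, 0.46, 1.16],
}

def proficiency_level(dimension: str, scale_score: float) -> int:
    """Return the proficiency level (1-5) for a scale score.

    Four cut scores divide the continuum into five ordered zones:
    scores below the first cut fall in level 1, and scores at or
    above the last cut fall in level 5.
    """
    level = 1
    for cut in CUT_SCORES[dimension]:
        if scale_score >= cut:
            level += 1
    return level
```

For example, a scale score of 0.0 on the mathematical processes dimension lies between the second cut (-0.43) and the third (0.78), so it falls in level 3.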
2. The open-ended question scoring designed through digital technology featured five parts, namely (1) input, (2) process, (3) processing, (4) output, and (5) assessment reporting. Quality assessment through standards-based assessment and heuristic assessment conducted by experts showed that: (1) all three aspects of the standards-based assessment (accuracy, utility, and feasibility) were rated at the highest level; (2) in the heuristic assessment, the overall system had the highest level of suitability, with visibility of system status rated at the highest level, while aesthetic and minimalist design obtained the lowest rating.
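The five-part flow named above can be sketched as a simple pipeline. Only the stage names come from the abstract; the function name, data shapes, and the placeholder word-count "scoring" rule inside each stage are illustrative assumptions, not the system's actual rubric.

```python
def run_scoring_pipeline(raw_answers):
    """Illustrative five-stage flow: input, process, processing,
    output, and assessment reporting. Internals are placeholders."""
    answers = [a.strip() for a in raw_answers]            # (1) input: collect answers
    scored = [min(len(a.split()), 4) for a in answers]    # (2) process: placeholder item scoring
    total = sum(scored)                                   # (3) processing: aggregate scores
    output = {"item_scores": scored, "total": total}      # (4) output: scored record
    report = {**output, "n_items": len(scored)}           # (5) assessment reporting
    return report
```

In the actual system, stage (2) would apply the open-ended scoring rubric and stage (5) would produce the proficiency-level report for each dimension.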




Research Article


AERA, APA, & NCME. (2014). Standards for Educational and Psychological Testing (6th ed.). American Educational Research Association.

Adams, R. J., Wilson, M., and Wang, W.C. (1997). The multidimensional random coefficients multinomial logit model. Applied Psychological Measurement, 21(1), 1-23.

Berggren, S. J., Rama, T., and Øvrelid, L. (2019). Regression or classification? Automated Essay Scoring for Norwegian.

Black, P., and Wiliam, D. (1998). Inside the black box: raising standards through classroom assessment. Phi Delta Kappan, 80(2), 139-148.

DeMars, C. (2010). Item Response Theory (Understanding Statistics: Measurement). Oxford University Press.

European Language Resources Association (ELRA). (2020). Proceedings of the 12th Language Resources and Evaluation Conference (LREC 2020). ELRA.

Junpeng, P., Krotha, J., Chanayota, K., Tang, K. N., & Wilson, M. (2019). Constructing Progress Maps of Digital Technology for Diagnosing Mathematical Proficiency. Journal of Education and Learning, 8(6), 90-102.

Junpeng, P., Marwiang, M., Chinjunthuk, S., Suwannatrai, P., Chanayota, K., Pongboriboon, K., Tang, K. N., & Wilson, M. (2020b). Validation of a digital tool for diagnosing mathematical proficiency. International Journal of Evaluation and Research in Education (IJERE), 9(3), 665-674.

Koyama, Kiyuna, Kobayashi, Arai, and Komachi. (2020). Proceedings of the 12th Language Resources and Evaluation Conference (LREC 2020). France.

Nielsen, J. (1992). Finding Usability Problems through Heuristic Evaluation. Paper presented at the ACM CHI'92, Monterey, CA.

Rodrigues, and Araújo. (2012, April). Automatic assessment of short free text answers.

Wang, J., and Brown, M.S. (2007). Automated Essay Scoring Versus Human Scoring: A Comparative Study. Journal of Technology, Learning, and Assessment (2).

Wilson, M. (2005). Constructing measures: An item response modeling approach. Routledge.

Wright, B. D., and Stone, M. H. (1979). Best test design: Rasch measurement. MESA Press.

Wu, M. L., Adams, R. J., Wilson, M. R., and Haldane, S. A. (2007). ACER ConQuest version 2.0. ACER Press.

Aungkaseraneekul, S. (2012). Automated Thai-language essay scoring. [Unpublished master’s thesis]. Kasetsart University. (in Thai)

Chinjunthuk, S., and Junpeng, P. (2020). Assessment Guidelines for Student’s Personalized Mathematical Proficiency Development. Journal of Educational Measurement, Mahasarakham University, 26(1), 47-64. (in Thai)

Jaihuek, S., and Mungsing, S. (2020). Scoring Thai Language Subjective Answer Automatic System by Semantic. Information Technology Journal, 16(1), 15-23. (in Thai)

Junpeng, P., Marwiang, M., Chinjunthuk, S., Suwannatrai, P., Krotha, J., Chanayota, K., Tawarungruang, C., Thuanman, J., Tang, K. N., and Wilson, M. (2020a). Developing Students’ Mathematical Proficiency Level Diagnostic Tools through Information Technology in Assessment for Learning Report. The Thailand Research Fund and Khon Kaen University. (in Thai)

Suksiri, W., and Worain, C. (2016). Investigating Tentative Cut Scores for Science Learning Area on the Ordinary National Educational Test Scores using the Construct Mapping Method: An Analysis for Further Judgments. National Institute of Educational Testing Service (Public Organization). (in Thai)

The Institute for the Promotion of Teaching Science and Technology (IPST). (2020). PISA 2021 with mathematical literacy assessment. (in Thai)

Wongwanit, S. (2020). Design Research in Education (1st ed.). Chulalongkorn University Press. (in Thai)