Development of Automated Short Essay Models for Statistics and Information in Education Courses

Suthisak Salika
Prapasiri Ratchaprapapornkul

Abstract

This research aims to develop a model for scoring short-answer free-response questions in statistics and educational information courses using machine learning, and to compare the performance of models developed with different algorithms. The models were evaluated on a test dataset of questions verified by experts. Five models were compared: four Single Learner models, each built with one algorithm (Random Forest, Support Vector Machine, Naive Bayes, and Logistic Regression), and an Ensemble Learner model combining all four algorithms. The best-performing models came from the Naive Bayes and Random Forest algorithms. Naive Bayes performed best for scoring question 1, where its performance was close to that of Random Forest; for the other questions, Random Forest performed best. The top-performing models had F1-scores ranging from .90 to .97, precision from .95 to 1.00, recall from .77 to .92, sensitivity from .84 to .96, and specificity from .85 to 1.00. Recall was .77 for one model, indicating moderate performance, while the rest were no less than .85, considered good to very good. In terms of processing time, all Single Learner models were similar and comparable to the Ensemble Learner, with processing times ranging from 1.1 to 2.6 seconds. Therefore, Random Forest emerged as the most effective model in both accuracy and processing speed.
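
As a rough illustration of the evaluation measures named in the abstract, the following minimal sketch computes precision, recall, specificity, and F1 from expert labels and model predictions for a single question. It uses the standard binary definitions, under which recall and sensitivity coincide; how the study derived its separately reported recall and sensitivity values is not stated here. The function name `binary_metrics`, the 0/1 label coding (1 = answer scored as correct), and the toy data are assumptions made for illustration, not the authors' implementation or results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BinaryMetrics:
    precision: float
    recall: float       # identical to sensitivity in the binary case
    specificity: float
    f1: float


def binary_metrics(y_true, y_pred, positive=1):
    """Compute standard binary classification metrics from two label lists."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1

    # Guard each ratio against an empty denominator.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return BinaryMetrics(precision, recall, specificity, f1)


# Hypothetical expert scores and model predictions for ten answers.
expert_scores = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
model_scores = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

metrics = binary_metrics(expert_scores, model_scores)
print(metrics)
```

In this toy case the model never marks a wrong answer as correct but misses one correct answer, so precision and specificity are perfect while recall is lower, the same general pattern as the precision and recall ranges reported above.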

Article Details

Section
Research Article
