Quality Comparison of Scoring of Essay Tests to Measure Mathematics Problem Solving Ability Using Different Scoring Designs: Application of Generalizability Theory

Surachai Matharach
Prakittiya Tuksino

Abstract

The objectives of this research were: 1) to study the quality of scoring of an essay test measuring mathematical problem-solving ability across scorers under different scoring designs; 2) to study the magnitude of the variance components and the generalizability coefficient (G-coefficient) of the essay test scores when the number of scorers and the scoring design differ; and 3) to compare the G-coefficients of the essay test under different scoring designs and different numbers of scorers. The sample consisted of 132 grade 9 students, obtained by two-stage cluster sampling, and 3 scorers who were mathematics teachers, obtained by purposive sampling. The research instrument was a 3-item essay test measuring mathematical problem-solving ability, with item difficulty ranging from 0.22 to 0.69 and item discrimination ranging from 0.56 to 0.80. The results were as follows: 1) Regarding scoring quality, scorer 1 agreed most closely with the scoring criterion on item 3, scorer 2 agreed most closely with the scoring criteria on items 1 and 2, and scorer 3 agreed least closely with the scoring criteria on all 3 items; inter-rater reliability was at a very good level. 2) Regarding the sources of variance, in the P × I × R scoring design the largest variance component was PI and the smallest was PR; in the P × (I : R) scoring design the largest variance component was PI:R and the smallest was PR. 3) The scoring design with the highest G-coefficient was P × I × R, and scores from 3 raters were more reliable than scores from 2 raters.
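
As an illustration of how variance components and a G-coefficient can be obtained for a fully crossed persons × items × raters (P x I x R) design, the following is a minimal sketch. It is not the authors' analysis: the function names `variance_components` and `g_coefficient` are hypothetical, the scores are simulated rather than taken from the study, all facets are assumed random with one score per cell, and negative variance estimates are set to zero, which is a common convention but an assumption here. The coefficient computed is the relative G-coefficient.

```python
import numpy as np

def variance_components(x):
    """ANOVA-based variance components for a crossed p x i x r design.

    x has shape (n_p, n_i, n_r) with one score per cell.
    Negative estimates are clipped to 0 (an assumed convention).
    """
    n_p, n_i, n_r = x.shape
    grand = x.mean()
    m_p, m_i, m_r = x.mean(axis=(1, 2)), x.mean(axis=(0, 2)), x.mean(axis=(0, 1))
    m_pi, m_pr, m_ir = x.mean(axis=2), x.mean(axis=1), x.mean(axis=0)

    # Mean squares for main effects
    ms_p = n_i * n_r * np.sum((m_p - grand) ** 2) / (n_p - 1)
    ms_i = n_p * n_r * np.sum((m_i - grand) ** 2) / (n_i - 1)
    ms_r = n_p * n_i * np.sum((m_r - grand) ** 2) / (n_r - 1)

    # Mean squares for two-way interactions
    ms_pi = n_r * np.sum((m_pi - m_p[:, None] - m_i[None, :] + grand) ** 2) \
        / ((n_p - 1) * (n_i - 1))
    ms_pr = n_i * np.sum((m_pr - m_p[:, None] - m_r[None, :] + grand) ** 2) \
        / ((n_p - 1) * (n_r - 1))
    ms_ir = n_p * np.sum((m_ir - m_i[:, None] - m_r[None, :] + grand) ** 2) \
        / ((n_i - 1) * (n_r - 1))

    # Residual (three-way interaction confounded with error)
    resid = (x - m_pi[:, :, None] - m_pr[:, None, :] - m_ir[None, :, :]
             + m_p[:, None, None] + m_i[None, :, None] + m_r[None, None, :]
             - grand)
    ms_pir = np.sum(resid ** 2) / ((n_p - 1) * (n_i - 1) * (n_r - 1))

    # Solve the expected-mean-square equations
    vc = {
        "p": (ms_p - ms_pi - ms_pr + ms_pir) / (n_i * n_r),
        "i": (ms_i - ms_pi - ms_ir + ms_pir) / (n_p * n_r),
        "r": (ms_r - ms_pr - ms_ir + ms_pir) / (n_p * n_i),
        "pi": (ms_pi - ms_pir) / n_r,
        "pr": (ms_pr - ms_pir) / n_i,
        "ir": (ms_ir - ms_pir) / n_p,
        "pir,e": ms_pir,
    }
    return {k: max(float(v), 0.0) for k, v in vc.items()}

def g_coefficient(vc, n_i, n_r):
    """Relative G-coefficient for a decision study with n_i items, n_r raters."""
    rel_error = vc["pi"] / n_i + vc["pr"] / n_r + vc["pir,e"] / (n_i * n_r)
    return vc["p"] / (vc["p"] + rel_error)

# Simulated scores only (NOT the study's data): 132 persons, 3 items, 3 raters
rng = np.random.default_rng(0)
n_p, n_i, n_r = 132, 3, 3
scores = (rng.normal(0, 1.0, (n_p, 1, 1))        # person effect
          + rng.normal(0, 0.5, (1, n_i, 1))      # item effect
          + rng.normal(0, 0.2, (1, 1, n_r))      # rater effect
          + rng.normal(0, 0.6, (n_p, n_i, 1))    # person-by-item interaction
          + rng.normal(0, 0.5, (n_p, n_i, n_r))) # residual

vc_sim = variance_components(scores)
for raters in (2, 3):
    print(raters, "raters:", round(g_coefficient(vc_sim, n_i, raters), 3))
```

Because every error term in the relative G-coefficient is divided by the number of items or raters, adding raters cannot lower the coefficient in this design, which is consistent with the finding that 3 raters gave more reliable scores than 2. The nested P x (I : R) design uses different expected-mean-square equations and is not covered by this sketch.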

Article Details

Section
Research Article
