A Comparison of the Reliability and Criterion-Related Validity of Scoring by Raters with Different Characteristics on an Essay Test Measuring the Scientific Competence of Grade 9 Students

Phanida Changwa; Prakittiya Tuksino

Published: Jun 28, 2023

Keywords:

holistic scoring rubric, rater agreement, G Theory, inter-rater reliability

Phanida Changwa

Faculty of Education, Khon Kaen University

Prakittiya Tuksino

Faculty of Education, Khon Kaen University

Abstract

This research aimed 1) to study the inter-rater reliability of scoring on an essay test measuring scientific competence, using the Intra-Class Correlation (ICC), for raters with different characteristics; 2) to compare the criterion-related validity of the raters' scoring against a holistic scoring rubric, using rater agreement, for raters with different characteristics; and 3) to compare the G-coefficients obtained under a crossed design [p x i x r] and a nested design [p x (i : r)] for raters with different characteristics. The sample comprised two groups: 100 Grade 9 students and 6 raters, of whom 3 were science majors and 3 were non-science majors. The research instruments were 1) an essay test measuring the scientific competence of Grade 9 students, consisting of 9 questions across 3 situations; and 2) a holistic scoring rubric. Generalizability coefficients were estimated with EduG. The findings were as follows. 1) The reliability of the scores for each item, as measured by the ICC, ranged from low to very good in both rater groups (science majors and non-science majors). 2) For criterion-related validity, item-level rater agreement between the raters' scores (x) and the standard scores (y) ranged from 14 percent to 84 percent for the 3 science-major raters and from 27 percent to 89 percent for the 3 non-science-major raters. 3) The generalizability coefficient of the p x (i : r) design was higher than that of the p x i x r design in both rater groups.
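The abstract does not give the formulas behind the agreement index or the G-coefficients, so the following Python sketch is only an illustration under stated assumptions. It assumes the agreement index is the percentage of exact matches between a rater's scores (x) and the standard scores (y), and that the G-coefficient is the usual relative-error coefficient for the p x i x r and p x (i : r) designs. The function names and all input numbers are hypothetical and are not taken from the study.

```python
from typing import Sequence


def percent_agreement(rater_scores: Sequence[int],
                      standard_scores: Sequence[int]) -> float:
    """Percentage of responses where the rater's score equals the standard score.

    Assumption: "agreement" means an exact match; the abstract does not say so.
    """
    if len(rater_scores) != len(standard_scores) or len(rater_scores) == 0:
        raise ValueError("score lists must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(rater_scores, standard_scores))
    return 100.0 * matches / len(rater_scores)


def g_coefficient_crossed(var_p: float, var_pi: float, var_pr: float,
                          var_pir_e: float, n_i: int, n_r: int) -> float:
    """Relative G-coefficient for the crossed design p x i x r.

    Relative error variance: var_pi/n_i + var_pr/n_r + var_pir_e/(n_i * n_r).
    """
    rel_error = var_pi / n_i + var_pr / n_r + var_pir_e / (n_i * n_r)
    return var_p / (var_p + rel_error)


def g_coefficient_nested(var_p: float, var_pr: float, var_pi_within_r_e: float,
                         n_i: int, n_r: int) -> float:
    """Relative G-coefficient for the nested design p x (i : r).

    Here n_i is the number of items scored by each rater.
    Relative error variance: var_pr/n_r + var_pi_within_r_e/(n_i * n_r).
    """
    rel_error = var_pr / n_r + var_pi_within_r_e / (n_i * n_r)
    return var_p / (var_p + rel_error)


# Hypothetical scores for five responses to one item (not study data).
x = [2, 1, 0, 2, 1]  # one rater's scores
y = [2, 1, 1, 2, 0]  # standard scores
print(percent_agreement(x, y))

# Hypothetical variance components (not study results).
print(g_coefficient_crossed(var_p=1.0, var_pi=0.5, var_pr=0.2,
                            var_pir_e=0.9, n_i=9, n_r=3))
print(g_coefficient_nested(var_p=1.0, var_pr=0.2,
                           var_pi_within_r_e=1.4, n_i=3, n_r=3))
```

In practice the variance components would come from a G-study (for example, the EduG output mentioned above) rather than being typed in by hand; the functions only show how those components combine into a coefficient for each design.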

Issue

Vol. 29 No. 1: January - June 2023

Section

Research Article

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

The content and information contained in the published article in the Journal of Educational Measurement Mahasarakham University represent the opinions and responsibilities of the authors directly. The editorial board of the journal is not necessarily in agreement with or responsible for any of the content.

The articles, data, content, images, etc. that have been published in the Journal of Educational Measurement Mahasarakham University are copyrighted by the journal. If any individual or organization wishes to reproduce or perform any actions involving the entirety or any part of the content, they must obtain written permission from the Journal of Educational Measurement Mahasarakham University.
