# CHARACTERISTICS OF THE OPEN-ENDED MATHEMATICS TEST SCORES FOR DIFFERENT NUMBERS OF RATERS AND SCORING PATTERNS USING GENERALIZABILITY MODEL AND MANY-FACET RASCH MODEL

## Abstract

The purpose of this research was to study the characteristics of open-ended
mathematics test scores, analyzed with the Generalizability Model and the Many-Facet Rasch
Model, under three numbers of raters (2, 3, and 4 raters) and three scoring patterns.
The scoring patterns were 1) each rater rated all items for all students,
2) each rater rated all items for some students, and 3) each rater rated some items for all students.
The score characteristics examined were the magnitude of the variance components,
the reliability, and the concurrent validity.
The research tool was a 12-item open-ended mathematics test for the lower secondary
level, based on the 2008 Basic Education Curriculum. The sample consisted of 180 Mathayomsuksa 4 students in schools under the Nan Educational Service Area Office in the 2009
academic year, selected by two-stage random sampling.
The results of the study were as follows:
1. In the Generalizability Model analysis, for a given scoring pattern the magnitudes
of the variance components were similar across all numbers of raters.
The generalizability coefficient was highest for the 2nd pattern,
followed by the 1st pattern, and lowest for the 3rd pattern. For the 1st pattern, the
generalizability coefficient increased as the number of raters increased. The concurrent
validity of the scores was high in all conditions and did not differ significantly among them.
2. In the Many-Facet Rasch Model analysis, across all numbers of raters
and scoring patterns, the examinee variance was the largest, followed by the item variance,
and the rater variance was the smallest. The person separation reliability was highest
for the 1st pattern at every number of raters, and for the 1st pattern it
increased as the number of raters increased. The concurrent validity
of the measured scores was high in all conditions and did not differ
significantly among them.
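
The two reliability indices reported above can be illustrated with a minimal Python sketch. It assumes the textbook formulas: the relative generalizability coefficient for a fully crossed person × item × rater design (which corresponds only to the 1st scoring pattern) and the usual Rasch person separation reliability computed from person measures and their standard errors. The function names and all numeric values are hypothetical and chosen for illustration only; they are not estimates from this study.

```python
from statistics import mean, pvariance


def g_coefficient(var_p, var_pi, var_pr, var_pir_e, n_items, n_raters):
    """Relative G coefficient for a fully crossed person x item x rater design.

    var_p     : person (examinee) variance component
    var_pi    : person-by-item interaction component
    var_pr    : person-by-rater interaction component
    var_pir_e : residual (three-way interaction confounded with error)
    """
    relative_error = (
        var_pi / n_items
        + var_pr / n_raters
        + var_pir_e / (n_items * n_raters)
    )
    return var_p / (var_p + relative_error)


def person_separation_reliability(measures, standard_errors):
    """Share of observed variance in person measures not due to measurement error."""
    observed_var = pvariance(measures)
    error_var = mean(se ** 2 for se in standard_errors)
    true_var = max(observed_var - error_var, 0.0)
    return true_var / observed_var


# Hypothetical variance components (NOT values from the study), 12 items.
components = dict(var_p=0.50, var_pi=0.30, var_pr=0.05, var_pir_e=0.40)
for n_raters in (2, 3, 4):
    coef = g_coefficient(**components, n_items=12, n_raters=n_raters)
    print(f"{n_raters} raters: G coefficient = {coef:.3f}")
```

With the person variance held fixed, every error term that is divided by the number of raters shrinks as raters are added, which is consistent with the reported rise in the generalizability coefficient for the 1st pattern. The 2nd and 3rd patterns are incomplete designs and would need different error terms than the ones sketched here.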