The main purpose of this study was to investigate the accuracy of concurrent
calibration for assessing examinees' growth in ability (θ2 − θ1) on mixed-format tests
consisting of multiple-choice (MC) items, modeled with a dichotomous response model, and
constructed-response (CR) items, modeled with a polytomous response model.
To this end, the 3PL/GPCM model combination was used to simulate item response
data for 1,000 examinees. Four factors were manipulated: growth ability, test length
(the number of MC : CR items, i.e., 30 : 10, 24 : 8, and 15 : 5), item difficulty, and
the number of scoring categories of the mixed-format test. In total, there were 243
conditions (9 × 3 × 3 × 3) across the four factors. The accuracy of concurrent calibration
was evaluated using the bias (BIAS) and the root mean square error (RMSE) of
the estimated growth ability (θ2 − θ1).
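As an illustration of the simulation design described above, the following minimal sketch generates responses to a mixed-format test under the 3PL model (MC items) and the GPCM (CR items), and computes BIAS and RMSE. It is not the study's own code. The function names, the item parameter values, the omission of a scaling constant, and the definitions of BIAS as the mean signed error and RMSE as the square root of the mean squared error are all assumptions made for illustration.

```python
import math
import random


def p_3pl(theta, a, b, c):
    """3PL probability of a correct response to a dichotomous MC item."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def gpcm_probs(theta, a, steps):
    """GPCM category probabilities for a CR item scored 0..len(steps)."""
    # Cumulative sums of a * (theta - step); category 0 has an empty sum (0.0).
    z = [0.0]
    for d in steps:
        z.append(z[-1] + a * (theta - d))
    m = max(z)  # subtract the maximum for numerical stability
    w = [math.exp(v - m) for v in z]
    total = sum(w)
    return [v / total for v in w]


def simulate_examinee(theta, mc_items, cr_items, rng):
    """Draw one response vector: MC scores (0/1) followed by CR scores."""
    mc = [1 if rng.random() < p_3pl(theta, a, b, c) else 0
          for a, b, c in mc_items]
    cr = [rng.choices(range(len(s) + 1), weights=gpcm_probs(theta, a, s))[0]
          for a, s in cr_items]
    return mc + cr


def bias(estimates, truths):
    """Mean signed error of the estimates (assumed definition of BIAS)."""
    return sum(e - t for e, t in zip(estimates, truths)) / len(truths)


def rmse(estimates, truths):
    """Root mean square error of the estimates (assumed definition of RMSE)."""
    return math.sqrt(
        sum((e - t) ** 2 for e, t in zip(estimates, truths)) / len(truths)
    )


# Hypothetical 15 : 5 test with three-category CR items (0/1/2).
rng = random.Random(0)
mc_items = [(1.0, 0.0, 0.2)] * 15      # (a, b, c), illustrative values only
cr_items = [(1.0, [-0.5, 0.5])] * 5    # (a, step parameters), illustrative
responses = simulate_examinee(0.3, mc_items, cr_items, rng)
```

In a full replication, `bias` and `rmse` would be applied to the estimated growth (the difference between the two estimated abilities) against the true growth, after the abilities have been estimated by concurrent calibration in IRT software. That estimation step is outside the scope of this sketch.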
The results were as follows:
1. Pearson's correlation coefficients between the estimated ability and the true
ability were negatively high and statistically significant at the .01 level.
2. For all conditions, the accuracy of concurrent calibration when the standard
deviation of growth ability was 0.80 or 1.00 was higher than when the standard
deviation was 1.20, although the difference was not statistically significant.
2.1 When item difficulty and the number of scoring categories were fixed, the accuracy
of concurrent calibration for the 24 : 8 mixed-format test was statistically significantly
higher, at the .05 level, than for the 15 : 5 and 30 : 10 mixed-format tests.
2.2 When test length and the number of scoring categories were fixed, the accuracy of
concurrent calibration for the first mixed-format test with a difficulty of 0.00 was
statistically significantly higher, at the .05 level, than for the first mixed-format
test with a difficulty of -0.50 or 0.50.
2.3 When test length and item difficulty were fixed, the accuracy of concurrent
calibration for the mixed-format test with three-category CR items (0/1/2) was
statistically significantly higher, at the .05 level, than for the mixed-format tests
with four-category CR items (0/1/2/3) and five-category CR items (0/1/2/3/4).