Development of a Measure of Students' Scientific Argumentation Skills: An Application of Construct Modeling


Parinya Mutcha
Putcharee Junpeng


The research aimed to (1) develop a construct map of the scientific argumentation skills of lower secondary school students, and (2) develop and verify the quality of a measure for assessing scientific argumentation skills. The sample consisted of 514 lower secondary school students in the academic year 2020 under the Secondary Education Service Area Office 25, drawn from a population of 31,744 students by multistage sampling. Development of the measure followed the construct modeling approach, which has four steps: 1) developing a construct map of scientific argumentation skills; 2) designing items; 3) determining an outcome space; and 4) analyzing data with the multidimensional random coefficients multinomial logit (MRCML) model, interpreting the results through a Wright map. The results were as follows:
(1) The construct map of scientific argumentation skills featured two dimensions, namely development of argumentation elements and use of scientific knowledge. The former had four levels, ranging from level one (drawing an irrelevant conclusion) to level four (constructing a counterclaim with justification). The latter also had four levels, ranging from level one (explaining with irrelevant scientific content, or being unable to recall scientific content) to level four (using complex scientific knowledge to explain the matter).
(2) The measure for assessment of scientific argumentation skills consisted of 22 open-ended questions using dichotomous scoring (0-1) and polytomous scoring (0-3); 12 of the questions measured development of argumentation elements while the rest were concerned with the use of scientific knowledge.
(3) Verification of the measure yielded several strands of validity evidence. Regarding the content of the questions, item difficulty covered the range of students' skill levels; that is, the items could characterize scientific argumentation skills. Regarding students' responses, students understood the content and situations of the questions as intended in this study. Regarding internal structure, the questions measured scientific argumentation skills (Infit MNSQ range of 0.74 to 1.35) through the multidimensional model, which fit students' responses significantly better than the unidimensional model (χ² = 17.8, df = 2, p < .001); the AIC and BIC of the multidimensional model were also lower than those of the unidimensional model. Regarding reliability, the EAP/PV reliability of both dimensions of the argumentation construct was 0.89, within the acceptable range, and the standard errors of measurement were low: SEM for θ1 (development of argumentation elements) ranged from 0.38 to 0.56, while SEM for θ2 (use of scientific knowledge) ranged from 0.58 to 1.87.
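The model comparison reported above (a chi-square difference of 17.8 on 2 degrees of freedom, with lower AIC and BIC for the multidimensional model) can be sketched as below. The deviance values and parameter counts are hypothetical placeholders for illustration, not the study's actual software output; only the difference of 17.8 on df = 2 mirrors the reported result.

```python
# Sketch of unidimensional vs. multidimensional model comparison via a
# likelihood-ratio (chi-square difference) test plus AIC/BIC, assuming the
# deviance (-2 * log-likelihood) and parameter counts reported by IRT software.
import math
from scipy.stats import chi2

def aic(deviance: float, n_params: int) -> float:
    """Akaike information criterion computed from the deviance."""
    return deviance + 2 * n_params

def bic(deviance: float, n_params: int, n_persons: int) -> float:
    """Bayesian information criterion; penalty grows with sample size."""
    return deviance + n_params * math.log(n_persons)

def lr_test(dev_restricted: float, dev_general: float, df_diff: int):
    """Likelihood-ratio test of the restricted (unidimensional) model
    against the general (multidimensional) model.
    Returns the chi-square statistic and its p-value."""
    stat = dev_restricted - dev_general
    return stat, chi2.sf(stat, df_diff)

# Hypothetical deviances; the multidimensional model here estimates
# 2 additional parameters, matching the reported df = 2.
stat, p = lr_test(dev_restricted=10017.8, dev_general=10000.0, df_diff=2)
print(round(stat, 1), p < 0.001)  # prints: 17.8 True
```

A lower AIC/BIC for the multidimensional model, together with a significant chi-square difference, supports treating the two skill dimensions as distinct rather than collapsing them into a single trait.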




