Analysis of the Quality of a Critical Thinking Skills Test Based on Item Response Theory Using the R Program

This research aims to describe the quality of a critical thinking skills (CTS) test on fluid mechanics material for senior high schools in Makassar City. It focuses on content validity and on the characteristics of each item under the One-Parameter Logistic (1PL), Two-Parameter Logistic (2PL), and Three-Parameter Logistic (3PL) models. It is a descriptive quantitative study whose subjects were all student responses to the CTS test on fluid mechanics material at senior high schools in Makassar City, 726 responses in total. The data were collected online through Google Forms and analyzed using descriptive quantitative techniques. The results showed that the CTS test on fluid mechanics material fulfilled content validity. The item quality analysis showed that the 1PL, 2PL, and 3PL models gave consistent results: most test items were able to discriminate high-ability testees from low-ability testees. Of the three logistic models used to estimate item difficulty, item discrimination, and the guessing factor, the 3PL model performed better than the other two.


Introduction
21st-century learning, including physics learning, is characterized by the 4Cs. Teachers must understand the 4Cs: creativity and innovation, collaboration, communication, and critical thinking and problem solving (Arafah, Rusyadi, Arafah & Arafah, 2020). The indicators of critical thinking skills (CTS) are interpretation, analysis, evaluation, inference (drawing conclusions), explanation, and self-regulation (Facione, 2013). In addition, teachers should have digital literacy awareness, consisting of information literacy, media literacy, and ICT literacy, as their standard reference when preparing instruments and making assessments in the affective, cognitive, and psychomotor domains (Andi & Arafah, 2017; Arafah, Arafah & Arafah, 2020). To meet the demands of 21st-century learning, a critical thinking skills instrument on fluid mechanics material for senior high school was developed (Arafah & Kaharuddin, 2019). Analyses of this instrument, where they exist, are limited to the classical test theory approach for determining item discrimination and difficulty (Arafah & Setiyawati, 2020). This research therefore explores the item characteristics with the Item Response Theory (IRT) approach, using the One-Parameter Logistic (1PL), Two-Parameter Logistic (2PL), and Three-Parameter Logistic (3PL) models. It aims to reveal the item difficulty parameter ($b_i$), the item discrimination parameter ($a_i$), and the guessing parameter ($c_i$), as well as the item information function and the test information function. Beyond these parameters, the estimation of the item characteristic curve (ICC) is also important in this research.

PSYCHOLOGY AND EDUCATION (2021) 58(1): 1167-1174 ISSN:00333077
Literature Review
The 1PL model estimates the item parameters by considering only the difficulty level of each item ($b_i$), under the assumption that the discrimination ($a_i$) is the same for all items and the guessing parameter ($c_i$) is zero. Before the analysis, the instrument must pass two assumption tests: a unidimensionality test and a local independence test (Naga, 2010). The unidimensionality assumption is fulfilled if the items in the instrument measure only one ability of the testees. The local independence test determines whether a testee's responses to different items are independent, that is, whether the response to one item is unrelated to the response to another item for testees of the same ability. The responses of different testees to the same item are also assumed to be independent of one another. The 1PL model is given by equation 1 (Umobong, 2017):

$$P_i(\theta) = \frac{e^{D(\theta - b_i)}}{1 + e^{D(\theta - b_i)}} \tag{1}$$

where $D = 1.7$ is a scaling constant, $\theta$ is the ability of the testee, and $b_i$ is the difficulty level of item $i$.

The 2PL model takes into account both the difficulty level ($b_i$) and the discrimination power ($a_i$) of each item, while the guessing factor is still assumed to be zero. Hence the probability that a testee answers an item correctly is determined by two item characteristics: its difficulty and its discrimination. The 2PL model is given by equation 2 (Naga, 2010):

$$P_i(\theta) = \frac{e^{D a_i(\theta - b_i)}}{1 + e^{D a_i(\theta - b_i)}} \tag{2}$$

where $D = 1.7$, $\theta$ is the ability of the testee, $b_i$ is the difficulty level of item $i$, and $a_i$ is the discrimination of item $i$.

The 3PL model is determined by three item parameters that are estimated jointly: difficulty level ($b_i$), discrimination power ($a_i$), and guessing ($c_i$). Hence the probability that a testee answers an item correctly is determined by all three item characteristics.
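The 1PL and 2PL equations above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the R code used in this research: the function names `p_1pl` and `p_2pl` are hypothetical, $D = 1.7$ is the scaling constant from the text, and the common 1PL discrimination is taken as 1 so that the 1PL curve is simply the 2PL curve with $a_i = 1$.

```python
import math

D = 1.7  # scaling constant from equations (1) and (2)


def _logistic(z: float) -> float:
    """Numerically stable e^z / (1 + e^z)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def p_2pl(theta: float, a: float, b: float) -> float:
    """2PL: probability that a testee of ability theta answers item (a, b) correctly."""
    return _logistic(D * a * (theta - b))


def p_1pl(theta: float, b: float) -> float:
    """1PL: all items share one discrimination (assumed to be 1) and there is no guessing."""
    return p_2pl(theta, a=1.0, b=b)


if __name__ == "__main__":
    # Hypothetical items: the harder item (larger b) has a lower probability at every theta.
    for theta in (-2.0, -1.0, 0.0, 1.0, 2.0):
        easy, hard = p_1pl(theta, b=-0.5), p_1pl(theta, b=1.0)
        print(f"theta={theta:+.1f}  easy={easy:.3f}  hard={hard:.3f}")
```

In both models the probability equals 0.5 exactly when $\theta = b_i$, and a larger $a_i$ makes the curve rise more steeply around that point.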
In interpreting the 3PL model, the item difficulty index $b_i$ ranges from -2 to 2 and the discrimination index $a_i$ ranges from 0 to 2. The guessing parameter $c_i$ is the probability that a testee of very low ability still answers correctly, for example by guessing, so a good test is expected to have a guessing parameter close to zero. The 3PL model is written as follows (Naga, 2010):

$$P_i(\theta) = c_i + (1 - c_i)\,\frac{e^{D a_i(\theta - b_i)}}{1 + e^{D a_i(\theta - b_i)}} \tag{3}$$

where $D = 1.7$, $\theta$ is the ability of the testee, $b_i$ is the difficulty level of item $i$, $a_i$ is the discrimination of item $i$, and $c_i$ is the guessing factor of item $i$.

The aspect of test quality estimated in this research is the test information function, which shows how much information the instrument provides when it is given to testees of a certain ability.
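Two properties of the 3PL curve follow directly from its form and help when reading item characteristic curves: the guessing parameter is the lower asymptote, and the difficulty parameter marks the ability at which the probability lies halfway between that asymptote and one. Setting $c_i = 0$ recovers the 2PL model, and additionally fixing a common discrimination recovers the 1PL model.

$$\lim_{\theta \to -\infty} P_i(\theta) = c_i, \qquad \lim_{\theta \to +\infty} P_i(\theta) = 1, \qquad P_i(b_i) = c_i + \frac{1 - c_i}{2} = \frac{1 + c_i}{2}$$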

Methodology
This research is a descriptive study with a quantitative approach. The subjects were all answer sheets of State Senior High School (SMAN) students in Makassar City who responded to the CTS questions given in 2020.
There were 726 answer sheets. The data were analyzed theoretically and empirically. The theoretical analysis examined the test items through Expert Judgment ( (1996). The one-parameter logistic model estimates the item parameters by considering the difficulty level ($b_i$), assuming that the discrimination ($a_i$) is the same for all items and the guessing parameter ($c_i$) is zero. The local independence, unidimensionality, and model fit assumptions were all fulfilled. In the 2PL model, the difficulty level ($b_i$) and discrimination ($a_i$) were estimated with the guessing factor fixed at zero, while in the 3PL model the difficulty level ($b_i$), discrimination ($a_i$), and guessing ($c_i$) were estimated jointly. The item information for the 1PL, 2PL, and 3PL models is displayed in Figure 2.
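The text states that the unidimensionality assumption was fulfilled. One common way to examine this assumption, sketched here as an assumption and not as the procedure used in this research, is to compare the first and second eigenvalues of the inter-item correlation matrix of the 0/1 scores: a first eigenvalue several times larger than the second suggests one dominant dimension. The helper names, the use of phi (Pearson) correlations instead of tetrachoric correlations, and the toy data are all hypothetical choices made for illustration.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def phi_correlation_matrix(responses: List[List[int]]) -> Matrix:
    """Pearson (phi) correlations between items; one row per testee, one 0/1 column per item."""
    n, k = len(responses), len(responses[0])
    means = [sum(row[j] for row in responses) / n for j in range(k)]
    sds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in responses) / n) for j in range(k)]
    if any(sd == 0.0 for sd in sds):
        raise ValueError("every item needs both correct and incorrect responses")
    corr = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(k):
            cov = sum((row[i] - means[i]) * (row[j] - means[j]) for row in responses) / n
            corr[i][j] = cov / (sds[i] * sds[j])
    return corr


def _mat_vec(m: Matrix, v: List[float]) -> List[float]:
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]


def _largest_eigen(m: Matrix, iterations: int = 500) -> Tuple[float, List[float]]:
    """Power iteration: dominant eigenvalue and unit eigenvector of a symmetric PSD matrix."""
    start = [1.0 + 0.1 * i for i in range(len(m))]  # non-uniform start vector
    s = math.sqrt(sum(x * x for x in start))
    v = [x / s for x in start]
    for _ in range(iterations):
        w = _mat_vec(m, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0, v
        v = [x / norm for x in w]
    lam = sum(vi * wi for vi, wi in zip(v, _mat_vec(m, v)))  # Rayleigh quotient
    return lam, v


def eigenvalue_ratio(corr: Matrix) -> Tuple[float, float, float]:
    """Return (first eigenvalue, second eigenvalue, first / second)."""
    lam1, v1 = _largest_eigen(corr)
    k = len(v1)
    # Deflation removes the first component so power iteration then finds the second.
    deflated = [[corr[i][j] - lam1 * v1[i] * v1[j] for j in range(k)] for i in range(k)]
    lam2, _ = _largest_eigen(deflated)
    return lam1, lam2, (lam1 / lam2 if lam2 > 1e-12 else math.inf)


if __name__ == "__main__":
    toy = [  # hypothetical 0/1 responses: 6 testees x 4 items
        [1, 1, 1, 0], [1, 1, 0, 1], [0, 0, 0, 0],
        [1, 0, 1, 1], [0, 1, 0, 0], [1, 1, 1, 1],
    ]
    print(eigenvalue_ratio(phi_correlation_matrix(toy)))
```

Pure-Python power iteration keeps the sketch dependency-free; with real data of 726 testees and 40 items a linear-algebra library would be the more practical choice.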
The test information function curve shows that when the estimation is based on the 1PL model, the CTS test of fluid mechanics material provides maximum information for testees with ability between -1 and +1. The item information function $I_i(\theta)$ describes each item's contribution to the test information function, because the test information function is the sum of the item information functions. The largest item information is shown by item 14, with $I_i(\theta) = 0.69$ at $\theta$ between +0.46 and +0.489. Likewise, when the estimation is based on the 2PL model, the test provides maximum information for testees with ability between -1 and +1. The test information function and item information based on the 2PL model are shown in Figure 3.
The test information function curve shows that when the CTS test of fluid mechanics material is estimated with the 2PL model, it provides maximum information for testees with ability between -1 and +1. In Figure 3, the largest item information is shown by item 34, with $I_i(\theta) = 2.33$ at $\theta$ between -0.079 and 0. Figure 4 presents the estimated test information function and item information function curves of the 40 items of the CTS test of fluid mechanics in senior high schools, based on the 3PL model. In Figure 4, maximum information is obtained for testees with ability between +1 and +1.178, with a test information value of $I(\theta) = 60.05$. Figure 5 shows that the curves of all 40 items rise continuously from left to right. The curves follow the S shape of the normal ogive: the lower asymptote never reaches zero, and the upper asymptote approaches one. The highest curve belongs to item 6, meaning that this item is likely to be answered correctly even by testees of moderate ability. The lowest curve belongs to item 25, an item categorized as very difficult, which is likely to be answered correctly only by high-ability testees. The estimated item discrimination under the 2PL model indicates that 23 items (57.5%) have low discrimination, with indices from +0.842 to +1.656; 14 items (35.0%) have moderate discrimination, with indices from +1.705 to +2.469; and 3 items (7.5%) have high discrimination, with indices from +3.101 to +3.404. Thus only 3 items (7.5%) fall into the high category for discriminating between high-ability and low-ability testees, although the 23 items in the low category are still capable of discriminating between high-ability and low-ability testees.
Regarding item difficulty, 14 items (35.0%) fall into the easy category, with indices between +0.044 and +0.072; 25 items (62.5%) fall into the moderate category, with indices from +0.073 to +0.101; and only 1 item (2.5%) falls into the difficult category, with an index of +0.131. The item characteristic curve (ICC) gives the probability that testees of a certain ability answer an item correctly. Figure 6 shows the item characteristic curves of the 2PL model. Under the 2PL model, maximum information is given to testees with ability ($\theta$) between -1 and +1; under the 3PL model, maximum information is given to testees with ability between +1 and +1.178, with a test information value of $I(\theta) = 60.05$. Recall that the test information function is the sum of the item information functions. The greater the item difficulty parameter ($b_i$), the greater the ability required to answer the item correctly. The ICC expresses the relationship between the probability of a correct answer, $P_i(\theta)$, and the testee's ability, $\theta$. In Figure 5, item 6 is likely to be answered correctly by testees of moderate ability, whereas item 25 is likely to be answered correctly only by testees of high ability, because it is very difficult.
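The statement that a larger $b_i$ demands more ability can be made concrete by inverting the 3PL item characteristic curve. Solving $P_i(\theta) = p$ for $\theta$ gives $\theta = b_i + \frac{1}{D a_i}\ln\frac{p - c_i}{1 - p}$, valid for $c_i < p < 1$, so the required ability rises one-for-one with $b_i$, and for $c_i = 0$ the ability at which $P_i = 0.5$ is exactly $b_i$. The function name `ability_for_probability` and the example parameters below are hypothetical.

```python
import math

D = 1.7  # scaling constant


def p_3pl(theta: float, a: float, b: float, c: float = 0.0) -> float:
    """3PL probability of a correct answer (c = 0 gives the 2PL model)."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def ability_for_probability(p: float, a: float, b: float, c: float = 0.0) -> float:
    """Invert the ICC: the ability theta at which P(theta) equals p (requires c < p < 1)."""
    if not (c < p < 1.0):
        raise ValueError("p must lie strictly between c and 1")
    return b + math.log((p - c) / (1.0 - p)) / (D * a)


if __name__ == "__main__":
    # Hypothetical items that differ only in difficulty b.
    for b in (-1.0, 0.0, 1.0):
        theta = ability_for_probability(0.8, a=1.0, b=b)
        print(f"b = {b:+.1f}: ability needed for P = 0.8 is {theta:+.3f}")
```

Each one-unit increase in $b$ raises the ability needed for the same probability by exactly one unit, which is the shift seen between easy and difficult items in the ICC figures.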
The characteristic curves of the 40 CTS items generally show a fairly gentle S shape. This indicates that the discrimination, difficulty, and guessing parameters of the items function properly. The greater a testee's ability, the greater the probability of answering the items correctly, and the probability of a correct guess is zero.

Conclusion
Based on the research results and discussion, the following conclusions are drawn. The CTS test for fluid mechanics material can be used to measure the critical thinking skills of senior high school students in Makassar City because it fulfills content validity. The analysis of CTS item quality shows that the 1PL, 2PL, and 3PL models are consistent: in general, the test items are able to discriminate between high-ability and low-ability testees. Of the three logistic models used to estimate item difficulty, item discrimination power, and the guessing factor, the 3PL model is the best. Thus, the CTS test for fluid mechanics material can be said to be of good quality.

Limitations and Future Studies
This study only investigated three indicators of process skills presented by Facione (2013): interpretation, analysis, and evaluation. Other