英语论文网

that are "sample-free." IRT can also provide "item-free" estimates of students' abilities. Naturally, a full discussion of IRT is beyond the scope of this article. However, Henning (1987) discusses the topic in terms of the steps involved in item banking for language tests and provides recipe-style descriptions of how to calculate the appropriate IRT statistics. -44- -------------------------------------------------------------------------------- Several other references may prove helpful for readers interested in more information on IRT. In language testing, Madsen and Larson (1986) use computers and IRT to study item bias, while de Jong (1986) demonstrates the use of IRT for item selection purposes. (For readers who want more technical information on applications of IRT to practical testing problems in general education, see Lord, 1980; Hambleton & Swaminathan, 1985; Andrich, 1988; Suen, 1990; Wainer & Mislevy, 1990; and Hambleton, Swaminathan, & Rogers, 1991.) I am not saying that item banking is without potential problems. Green (1988) outlines some of the problems that might be encountered in using IRT in general, and Henning (1991) discusses specific problems that may be encountered with the validity of item banking techniques in language testing settings. Another serious limitation of IRT is the large number of students that must be tested before it can responsibly be applied. Typically, IRT is only applicable for full item analysis (that is, for analysis of two or three parameters) when the numbers of students being tested are very large by the standards of most language programs, that is to say, in excess of one thousand. Smaller samples in the hundreds can be used only if the item difficulty parameter is studied. Minimal item banking can be done without computers by using file cards, and, of course, the traditional item analysis statistics can be done (using the sizes of groups typically found in language programs) with no more sophisticated equipment than a hand-held calculator. Naturally, a personal computer can make both item banking and item analysis procedures much easier and much faster. For example, standard database software can be used to do the item banking, (e.g., Microsoft Access, 1996; or Corel Paradox, 1996). For IRT analyses, more specialized software will be needed. The following are examples of computer programs that can be used for IRT analysis: TESTAT (Stenson, 1988), BIGSTEPS (Wright, Linacre, & Schulz, 1990), and PC-BILOG (Mislevy & Bock, 1986). Alternatively, the MicroCAT Testing System (1984) program can help with both item banking and IRT analyses. BestTest (1990) is another less sophisticated program, which can be used in both item banking and test creation. An example of a software program specifically designed for item banking is the PARTest (1990) program. If PARTest is used in conjunction with PARScore (1990) and PARGrade (1990), a completely integrated item banking, test analysis, and record-keeping system can be set up and integrated with a machine scoring system. Indications have also surfaced that computers may effectively be used to assemble pre-equated language tests from a bank of items (see Henning, Johnson, Boutin, & Rice, 1994). Computer-Assisted Language Testing Tests that are administered at computer terminals, or on personal computers, are called computer-assisted tests. Receptive-response items-including multiple-choice, true-false, and matching items-are fairly ea