pirical item difficulty. Bachman and colleagues, on the other hand (Bachman, Davidson, Lynch, & Ryan, 1989; Bachman, Davidson, & Milanovic, 1991; Bachman, Davidson, Ryan, & Choi, in press), have found that when raters are trained and use a content-rating instrument based on a taxonomy of test method characteristics (Bachman, 1990b), they reach a high degree of agreement, and their content ratings are related to item difficulty and item discrimination. In my view, these results are not inconsistent. The research of Alderson and colleagues presents, I believe, a sobering picture of actual practice in the design and development of language tests: test designers and experts in the field disagree about what language tests measure, and neither the designers nor the experts have a clear sense of the levels of ability their tests measure. This research uncovers a potentially serious problem in the way language testers practice their trade. Bachman's research, on the other hand, shows what can be accomplished in a highly controlled situation and provides one approach to solving this problem. Thus, an important area for future research will be the refinement of approaches to analyzing test method characteristics, of which content is a substantial component, and the investigation of how specific characteristics of test method affect test performance. Language testing practice will advance when insights from this area of research inform the design and development of language tests.
The research on test content analysis conducted by the University of Cambridge Local Examinations Syndicate, and the incorporation of that research into the design and development of EFL tests, are illustrative of this kind of integrated approach (Bachman et al., 1991).

The 1980s saw a wealth of research into the characteristics of test takers and how these are related to test performance, generally under the rubric of investigations into potential sources of test bias; I can do little more than list these studies here. A number of studies have shown differences in test performance across cultural, linguistic, or ethnic groups (e.g., Alderman & Holland, 1981; Chen & Henning, 1985; Politzer & McGroarty, 1985; Swinton & Powers, 1980; Zeidner, 1986), while others have found differential performance between the sexes (e.g., Farhady, 1982; Zeidner, 1987). Still other studies have found relationships between field dependence and test performance (e.g., Chapelle, 1988; Chapelle & Roberts, 1986; Hansen, 1984; Hansen & Stansfield, 1981; Stansfield & Hansen, 1983). Such studies demonstrate the effects of various test taker characteristics on test performance and suggest that these characteristics need to be considered both in the design of language tests and in the interpretation of test scores. To date, however, no clear direction has emerged to suggest how such considerations translate into testing