IV Criteria of a good language test

Reliability and validity
Appeared with the development of the second generation of language tests.
Though challenged by the third generation, they remain very important in language testing studies.

Critics
Some think that the two cannot be separated.
Others consider them two entirely distinct concepts.
Bachman believes both can be better understood by recognizing them as complementary aspects of a common concern in measurement: identifying, estimating and controlling the effects of factors that affect test scores. (Bachman: 160)

4.1. Reliability
100 students
A 100-item test
Taken on Wednesday afternoon and again on Thursday afternoon
Test is excellently designed
Conditions of administration almost identical
No subjective judgment on the part of the scorers (though this is impossible)
Carried out with perfect care
No learning or forgetting taking place during the one-day interval

Would we expect the students to get exactly the same scores?
The answer: no!
This is inevitable. We must accept it.
What we have to do is:
To construct, administer and score the tests in such a way that the scores of the two tests taken on different days are likely to be very similar to each
other. Same student, with the same ability, at a different time. The more similar the scores, the more reliable the test is.

4.1.1 Definition:
Reliability is concerned with the question, “How much of an individual's test performance is due to measurement errors, to factors other than the language ability we want to measure?” (Bachman: 160)

4.1.2 Factors that affect reliability
1. Length of test: the more items and choices, the more reliable.
2. Homogeneity: items similar in difficulty, test form, number of items, coverage, test layout and directions.
3. Power of discrimination: the stronger, the more reliable.
4. Similar test conditions (including the candidates' health).
5. Grading method.

4.1.3 How to make tests more reliable?
1). Have enough samples, but too long a test also decreases reliability.
2). Do not allow candidates too much freedom.
3). Write unambiguous items.
Example: Where does the author direct the reader who is interested in non-standard dialects of English?
Some answered: P3
But the expected answer: the further reading section of the book.

Compare the following writing tasks:
A) Write a composition on tourism.
B) Write a composition on tourism in this country.
C) Write a composition on how we might develop the tourist industry in
this country.
D) Discuss the following measures intended to increase the number of foreign tourists coming to this country:
I) More/better advertising and/or information (Where? What form should it take?)
II) Improved facilities (hotels, transportation, communication)
III) Training of personnel (guides, hotel managers)

The successive tasks impose more and more control over what is written. The fourth task is likely to be much more reliable for a writing test.

4). Provide clear and explicit instructions.
5). Ensure that tests are well laid out and perfectly legible.
6). Candidates should be familiar with the test forms and testing techniques.
7). Provide uniform and non-distracting conditions of administration.
8). Use items that permit scoring which is as objective as possible.
9). Make comparison between candidates as direct as possible (one topic rather than different ones).
10). Provide a detailed scoring key. For compositions or oral tests, select representative samples of different levels. Real scoring should begin only when all scorers agree on the scores for these samples.
11). Train scorers.
12). Identify candidates by number, not name.
13). Employ multiple, independent scorers; compare the sets of scores and investigate discrepancies.

4.1.4 How to know if the test scores are reliable? (coefficient of reliability)
3 ways (Gui: 130; Shu: 61)
1). Test-retest method (use the same test paper twice, with an interval between). Risk: candidates may remember the items.
2). Equivalent forms method (use two papers at two different times, e.g. 2 weeks apart). Advantage: avoids mechanical repetition. Risk: the two papers may not be truly equivalent.
3). Split-half method (use just one test paper): score the first half as one test and the second half as the other, or the odd-numbered items as one and the even-numbered items as the other.

If one test paper has a high reliability coefficient, it can be used as an equivalent form for checking whether another test is reliable. (Gui: 1986)

Questions for thought:
1. Look at your own instructional tests. Use the list of points in the chapter to say in what ways you could improve the reliability.
2. Zhou Shen: 43:2
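
Supplement to 4.1.4: a minimal sketch of how a reliability coefficient might be computed for the test-retest and split-half methods. Assumptions, none of which come from the slides: the coefficient is taken to be the Pearson correlation between two sets of scores; the split-half figure is stepped up with the Spearman-Brown formula, a standard correction for the fact that each half is only half as long as the full test; all scores below are hypothetical, and the function names are illustrative only.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def retest_reliability(first_sitting, second_sitting):
    """Test-retest: correlate the same candidates' scores on two sittings."""
    return pearson(first_sitting, second_sitting)


def spearman_brown(r_half):
    """Estimate full-length reliability from a half-test correlation."""
    return 2 * r_half / (1 + r_half)


def split_half_reliability(item_scores):
    """Split-half using the odd/even split.

    item_scores holds one row per candidate; each row holds 1 (right)
    or 0 (wrong) for every item, in test order.
    """
    odd_totals = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even_totals = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    return spearman_brown(pearson(odd_totals, even_totals))


# Hypothetical data: five candidates, same paper on two afternoons.
wednesday = [78, 85, 62, 90, 71]
thursday = [80, 83, 65, 88, 70]

# Hypothetical data: four candidates, one six-item paper.
answers = [
    [1, 1, 1, 1, 1, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
]

if __name__ == "__main__":
    print("test-retest:", round(retest_reliability(wednesday, thursday), 3))
    print("split-half: ", round(split_half_reliability(answers), 3))
```

The closer a coefficient is to 1, the more similar the two sets of scores, which is the sense of "reliable" used in 4.1. The equivalent forms method could reuse retest_reliability with the scores from the two papers.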