An Introduction to English Language Testing
By Chen Huilin

Definition of terms: measurement, test, evaluation
- Measurement: the process of quantifying the characteristics of persons according to explicit procedures and rules.
- Test: a procedure designed to elicit certain behavior from which one can make inferences about certain characteristics of an individual.
- Evaluation: the systematic gathering of information for the purpose of making decisions.

Approaches to language testing
- The essay-translation approach: the subjective judgment of the teacher is considered to be of paramount importance. Tests usually consist of essay writing, translation, and grammatical analysis, and have a heavy literary and cultural bias.
- The structuralist approach: characterized by the view that language learning is chiefly concerned with the systematic acquisition of a set of habits. Tests identify and measure the learners' mastery of the separate elements and skills of the target language, and it is considered essential to test one thing at a time.
- The integrative approach: involves the testing of language in context and is thus concerned with meaning and the total communicative effect of discourse. Tests are designed to assess the learners' ability to use two or more skills simultaneously and take a global view of proficiency.
- The communicative approach: concerned with how language is used in communication. Success is judged by the effectiveness of the communication that takes place rather than by formal linguistic accuracy. Tests are based on precise and detailed specifications of the learners' needs.

Difference between approach and method
- Approach: theoretical positions and beliefs about the nature of language, the nature of language learning, and the applicability of both to testing.
- Method: the way in which language, or knowledge of language, is elicited from a test taker.

Test methods
A framework of test method facets
- Test environment: familiarity of the place and equipment; personnel; time of testing; physical conditions
- Test rubric: test organization; time allocation; instructions
- Facets of the input: format; nature of language
- Facets of the expected response: format; nature of language; restrictions on response
- Relationship between input and response: reciprocal; nonreciprocal; adaptive

Characteristics of individuals
- Personal characteristics: age; sex; nationality; resident status; native language; level and type of general education; type and amount of preparation
- The topical knowledge that test takers bring to the language testing situation
- Their affective schemata
- Their language ability

Communicative language ability
A theoretical framework of communicative language ability
- Language knowledge
  - Organizational knowledge: grammatical knowledge; textual knowledge
  - Pragmatic knowledge: functional knowledge; sociolinguistic knowledge
- Strategic competence: goal setting; assessment; planning; execution
- Psychophysiological mechanisms

Uses of language tests
Uses of language tests in educational programs
- Information about educational outcomes is essential to effective formal education: it is needed to make decisions.
- To improve learning and teaching through appropriate changes in the program, based on feedback
- To measure educational outcomes
Research uses of language tests
- Research on language proficiency
- Research on the nature of language processing
- Research on the nature of language acquisition
- Research on the nature of language attrition
- Investigation of the effects of different instructional settings and techniques on language acquisition

Classifying types of language tests according to intended use
- Selection: deciding whether or not students should enter the program
- Placement: placing students into appropriate groups
- Diagnosis: diagnosing students' areas of strength and weakness in order to determine appropriate types and levels of teaching and learning activities
- Progress and grading: providing continuous feedback to both the teacher and the learner for making decisions about appropriate modifications to instructional procedures and learning activities

Classifying types of language tests according to content
- Proficiency tests: measuring general ability or skill
- Aptitude tests: measuring capability or potential related to language acquisition as well as to the use of language
- Achievement tests: measuring the extent of learning of the material presented in a particular course, textbook, or program of instruction

Classifying types of language tests according to format
- Direct tests: measuring ability directly in an authentic context and format
- Indirect tests: fostering inference about one kind of behavior or performance through measurement of another, related kind of performance

Classifying types of language tests according to complexity of response
- Discrete-point tests: employing items that measure performance over a unitary set of linguistic structures or features
- Integrative tests: measuring knowledge of a variety of language features, modes, or skills simultaneously

Classifying types of language tests according to scoring
- Objective tests: scored with reference to a scoring key, without requiring expert judgment in the scoring process
- Subjective tests: depending on impression and opinion at the time of scoring

Classifying types of language tests according to norm of reference
- Norm-referenced tests: evaluating ability against a standard of mean or normative performance of a group, implying standardization through prior administration to a large sample of examinees
- Criterion-referenced tests: assessing achievement or performance against a cut-off score that is determined as a reflection of mastery or attainment of specified objectives

Classifying types of language tests according to time limit
- Speed tests: limiting the time allowed for completion so that most examinees would not be expected to finish, but containing items so easy that, given enough time, most persons would respond correctly
- Power tests: allowing sufficient time for nearly all examinees to finish, but containing material of sufficient difficulty that a majority of examinees are not expected to get every item correct

Test usefulness
- Reliability
- Validity
- Authenticity
- Interactiveness
- Impact
- Practicability

Reliability
The consistency of the scores obtainable from a test.
- Test-retest method: the same test is given twice to the same persons, and the two sets of scores are correlated using the product-moment correlation.
- Parallel forms method: two parallel forms of a test are administered to the same sample of persons, and the results are correlated using the product-moment correlation.
- Split-half reliability: dividing a test into two nearly equal parts, correlating the scores for the two parts, and adjusting the coefficient using the Spearman-Brown Prophecy Formula.
- Inter-rater reliability: the correlation between different raters' ratings of the same objects or performances, adjusted by the Spearman-Brown Prophecy Formula.

Validity
The extent to which a test measures the ability or knowledge that it is purported to measure.
- Face validity: a subjective impression, usually on the part of examinees, of the extent to which the test and its format fulfill the intended purpose of measurement.
- Content validity: a non-empirical expert judgment of the extent to which the content of a test is comprehensive and representative of the content domain purported to be measured by the test.
- Concurrent validity: the magnitude of the correlation between scores on a given test and some recognized criterion measure.
- Construct validity: the extent to which we can interpret a given test score as an indicator of the ability(ies), or construct(s), we want to measure.
- Response validity: the extent to which examinee responses to a test or questionnaire can be said to reflect the intended purpose of measurement.
- Predictive validity: an indication of how well a test predicts intended performance.

Relationship between reliability and validity
- Reliability: how much of the variance in test scores is reliable variance; examines the variance in the test scores themselves; agreement between similar measures of the same trait.
- Validity: what abilities contribute to this reliable variance; examines the relationship between test performance and factors outside the test itself; agreement between different measures of the same trait.
- A test cannot be valid unless it is reliable, but it is quite possible for a test to be reliable yet invalid.
- Maximizing reliability may lead to reduced validity.

Authenticity
The degree of correspondence between the characteristics of a given language test task and the features of a TLU (target language use) task.
- Real-life approach:
  - The appearance or perception of the test and how this may affect test performance and test use
  - The accuracy with which test performance predicts future non-test performance
- Interactional/ability approach:
  - The interaction between the language user, the context, and the discourse
  - The extent to which test performance reflects language abilities, that is, construct validity

Interactiveness
The extent and type of involvement of the test taker's individual characteristics in accomplishing a test task:
- Language ability: language knowledge; strategic competence
- Topical knowledge
- Affective schemata

Impact
The positive or negative effect of a test on teaching and learning.
- Washback: the effect of a test on instruction.

Practicability
The relationship between the resources that will be required in the design, development, and use of the test and the resources that will be available for these activities.

Reasons for test planning
- Providing the best means of assuring that the test will be useful for its intended purposes
- Increasing accountability: the ability to say what was done and what was right
- Increasing the amount of satisfaction we experience

Stages of test development
- Statement of the problem
- Writing specifications for the test
- Writing the test
- Pretesting
- Validation of the test

Statement of the problem
- What kind of test is it to be?
- What is its precise purpose?
- What abilities are to be tested?
- How detailed must the results be?
- How accurate must the results be?
- How important is washback?
- What constraints are set by the unavailability of expertise, facilities, or time?

Writing specifications for the test
Test specifications are the blueprint to be followed by test and item writers, and are essential in establishing the test's construct validity.
- Content: operations; type of text; addressees; topics
- Format and timing
- Criterial levels of performance

Writing the test
- Sampling
- Item writing and moderation
- Writing and moderation of the scoring key

Pretesting
Purposes:
- Assessing the usefulness of the test
- Making the inferences or decisions for which the test is intended

- Administering tests and collecting feedback
- Analyzing test scores
- Archiving

Collecting feedback for assessing usefulness
Kinds of feedback
- Feedback about test takers' language ability
- Feedback about the testing procedure itself
Methods of obtaining feedback
- Questionnaires: multiple-choice questionnaires; rating scales; open-ended questions
- Think-aloud protocols
- Observation and description
- Interviews

Item types
- Objective-type items
  - Multiple choice
  - Dichotomous items
  - Matching
  - Information transfer
  - Ordering tasks
  - Editing
  - Gap filling
  - Cloze
  - C-test
  - Dictation
  - Short-answer questions
- Subjectively marked tests
  - Compositions and essays
  - Summaries
  - Oral interviews
  - Information gap activities

General problems of items
- What is an item actually testing?
- Each item should be independent of the others.
- Instructions for all items must be clear.

Multiple-choice items
- The correct answer must be genuinely correct.
- There must be only one correct answer.
- Each wrong alternative should be attractive to at least some of the students.
- Multiple-choice items should be presented in context.
- The correct alternative should not look so different from the distractors that it stands out from the rest.
- Each option should fit equally well into the stem.
- Items should not be answerable independently of the reading or listening passage.

Dichotomous items
- There is a 50% chance of getting any item right by guessing.
- A large number of such items is needed to discount the effect of chance.
- A third category, "not given" or "does not say", can be included.

Matching
- Give more alternatives than the matching task requires.
- Each item in the first column should match only one item in the second.

Information transfer
- The transfer itself can be complicated even when the language is easy.
- The task may be culturally or cognitively biased.

Ordering tasks
- It is not easy to provide words or phrases that make sense in only one order.
- Items are marked wholly right or wholly wrong.
- The effort in constructing and in answering the item may not be considered.

Editing
- There is one mistake per line.
- Students should be told how many errors there are.
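The dichotomous-item notes say each item can be answered correctly by guessing half the time, so many items are needed to discount chance. The sketch below shows why. It assumes, hypothetically, that a candidate guesses blindly and independently on every item, so the score follows a binomial distribution. It then computes the probability of reaching a pass mark by guessing alone. The function name `prob_at_least` and the 70% pass mark are illustrative assumptions, not part of these notes. Passing `p_guess=1/3` models the three-option version with a "not given" category.

```python
from math import comb


def prob_at_least(n_items: int, min_correct: int, p_guess: float = 0.5) -> float:
    """Binomial probability of getting at least `min_correct` of `n_items`
    right when every answer is an independent blind guess (hypothetical model)."""
    return sum(
        comb(n_items, k) * p_guess**k * (1 - p_guess) ** (n_items - k)
        for k in range(min_correct, n_items + 1)
    )


if __name__ == "__main__":
    for n in (10, 20, 50):
        # Hypothetical pass mark: 70% of the items, rounded up, in integer arithmetic.
        pass_mark = (7 * n + 9) // 10
        two_way = prob_at_least(n, pass_mark)           # true/false items
        three_way = prob_at_least(n, pass_mark, 1 / 3)  # true/false/not given items
        print(f"{n} items, pass at {pass_mark}: "
              f"{two_way:.4f} (2 options), {three_way:.4f} (3 options)")
```

Under this model, a blind guesser gets at least 7 of 10 true/false items right with probability 176/1024 (about 0.17). That probability shrinks as the number of items grows, and adding a "not given" option lowers it further.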
Gap filling
- Reduce the number of acceptable answers to a minimum, and ensure that there are no other possible answers that are not listed in the answer key.
- Candidates may fail to think of an answer not because their language is poor but because the word does not spring to mind; a banked gap-filling task may be used.
- Tell students whether each gap is to be filled by one word or by more than one word.

Cloze
- Words are deleted mechanically.
- The choice of the first deletion can affect the validity of the test.
- There may be many possible answers for any one gap.
- Few of the items may test the aspects of language with which the tester is concerned.

C-test
- The instructions may be too complicated.
- The number of missing letters should be shown in each gap.
- Enough clues should be provided.

Dictation
- It must be presented in the same way to all the students.
- When marking, it is not always clear whether a word is misspelt or simply wrong.
- It is both time-consuming and boring to mark.
- There may be many possible answers if students are required to write down only the main points.

Short-answer questions
- Candidates must know what is expected of them.
- There are many ways of saying the same thing.

Compositions and essays
- Instructions must be clear.
- Students may be required to have a wide general knowledge; give them some information before writing.

Summaries
- It may be impossible to know whether the test taker is weak in comprehension or in writing.
- Marking is complex.
- Provide a bank of possible words and phrases.

Oral interviews
- Only a limited vocabulary may be used, which does not stretch the students' ability to use complex structures.
- The interview needs to be carefully structured to cover the aspects of language to be tested.
- Each student should be tested in a similar way.
- Put candidates at ease.

Information gap activities
- Difficult to construct
- Tend to elicit a limited range of language
- The task can be biased

Personal response assessment
- Individual tutorials
- Self- and peer-assessment
- Portfolios

Why test grammar?
- Content validity
- Washback effect
- Impact on skills performance

Writing specifications
- Syllabus (achievement tests)
- Textbooks and teaching materials
- All the structures (placement tests)

Sampling
- Wide selection
- Concentration on the most important

Tests of grammar and usage
- Multiple-choice items
- Error-recognition items
- Rearrangement items
- Completion items
- Transformation items
- Items involv