A Balanced Approach to Health Information Evaluation: A Vocabulary-Based Naïve Bayes Classifier and Readability Formulas

Gondy Leroy and Trudi Miller
School of Information Systems and Technology, Claremont Graduate University, 130 E. Ninth Street, Claremont, CA 91730. E-mail: Gondy.Leroy@cgu.edu; Trudi.Miller@cgu.edu

Graciela Rosemblat and Allen Browne
Lister Hill National Center for Biomedical Communications, U.S. National Library of Medicine, Bethesda, MD 20894. E-mail: grosemblat@mail.nih.gov; browne@nlm.nih.gov

Since millions seek health information online, it is vital for this information to be comprehensible.
Most studies use readability formulas, which ignore vocabulary, and conclude that online health information is too difficult. We developed a vocabulary-based, naïve Bayes classifier to distinguish between three difficulty levels in text. It proved 98% accurate in a 250-document evaluation. We compared our classifier with readability formulas for 90 new documents with different origins and asked representative human evaluators, an expert and a consumer, to judge each document. Average readability grade levels for educational and commercial pages were 10th grade or higher, too difficult according to current literature. In contrast, the classifier showed that 70–90% of these pages were written at an intermediate, appropriate level, indicating that vocabulary usage is frequently appropriate in text considered too difficult by readability formula evaluations. The expert considered the pages more difficult for a consumer than the consumer did.

JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY, 59(9):1409–1419, 2008
Received November 7, 2007; revised January 2, 2008; accepted January 2, 2008. This is a U.S. Government work and, as such, is in the public domain in the United States of America. Published online 28 April 2008 in Wiley InterScience. DOI: 10.1002/asi.20837

Introduction

People have always searched for information to maintain or improve their health. This information used to come from healthcare providers (doctors, nurses) or from close family and friends. In recent years, the health information exchange has changed and millions now look online for information. Today's online consumers are not only people in poor health who want to get healthy but also healthy people who want to remain healthy. Baker et al. (2003) reported in 2003 that 40% of their 60,000-household sample looked for health information online. They found that for at least a third of their respondents, the online health information affected decisions about health, healthcare, and visits to a healthcare provider. Warner and Procaccino (2004) reported a stronger influence, as 80% of the women they interviewed claimed that online information affected their treatment decisions.

Consumers not only look to the Internet to get information but also, increasingly, claim a stake in providing information. Information providers now include many patients who communicate with one another and provide advice online. Johnson and Ambrose (2006) report that almost 30% of Internet users participated in medical or health-related groups. In addition to consumers themselves, there are the typical information providers, such as clinicians, hospitals, the government, and libraries, as well as pharmaceutical companies and other commercial enterprises.

When there are millions of online Web pages and millions of readers, the usability, trustworthiness, and readability of this information are no small matter. Human-computer interaction has focused on optimal Web site usability for average users and increasingly for groups with special needs. For example, Becker (2004) evaluated 125 Web sites based on guidelines provided by the National Institute on Aging and found that, counter to current recommendations (Morrell, 2005), too many homepages still used a small font, were lengthy, required scrolling, did not allow for font resizing, and used pull-down menus. Others focused more on optimal design for online health communities (Neal et al., 2006). In addition to usability, the trustworthiness of information also requires evaluation, and researchers such as Gaudinat et al. (2006) are trying to help consumers assess the credibility of online health information. Finally, the text itself cannot be ignored. This type of evaluation checks if all necessary content is included and if it is presented at an appropriate reading level. When there are mismatches, two approaches can be followed according to Parker and Kreps (2005): Community programs can be developed to increase the health literacy of consumers, or information providers can provide easier, alternative versions to consumers.

Most ongoing research on the readability of healthcare information looks at the content or the readability of the online text. We discuss both in the next section. Our work also focuses on readability, but it differs from existing approaches in that it provides a second, complementary method for assessing readability. We use a vocabulary-based naïve Bayes classifier to categorize documents into three readability groups. This work advances evaluation of the readability of online health information. Developing tools that help providers assess information allows them to tune the information so that it becomes more understandable. When consumers better understand the information, they become more knowledgeable and are able to ask more informed questions of their caregivers (Fox

Q. T. Zeng et al., 2005) that also indicates how easy a term is to understand for this sector of the population (Keselman et al., 2006). We found that patients who blogged used easier-to-understand terms but also discussed easier concepts (Leroy, Eryilmaz,

we did retain all words but not numbers.

Our classifier calculates the probability that a text belongs to the easy, intermediate, or difficult group. For each document, the three hypotheses (easy, intermediate, difficult) are calculated. The evidence consists of the vocabulary contained in the document. In our case, the comparison between the three required probabilities can be simplified. Because the evidence being evaluated (the document) does not change, the denominator can be dropped for the comparison. Moreover, because we do not know how many easy, intermediate, or difficult documents are presented on the Internet, we assume that the numbers are approximately equal, and we further simplify the calculations by ignoring p(h). We found that even without this information, the classifier is very accurate (see Results). The final probability to be calculated is p(e|h), which is formally calculated (see Equation 2) for each category by multiplying, over all the words in the document, their probabilities of occurrence in that specific class:

    p(Doc | Cat_j) = ∏_i p(word_i | Cat_j)    (2)

where:
    Doc = the document being classified
    Cat_j = the category being tested: easy, intermediate, or difficult
    word_i = a word in the document
Smoothing. When classifying a new text, words will be found in the test document that do not appear in the training corpus. This results in zero probabilities for those words, and these decrease the accuracy of the classifier. To avoid these zero probabilities, we used add-lambda smoothing to approximate their frequency. Add-lambda smoothing uses a positive probability to recognize the likelihood of encountering a word unobserved in the training set. This approach was successfully used before by others, such as Dreyer and Eisner (Dreyer

Berland et al. (2001) looked at breast cancer, childhood asthma, depression, and obesity.

Our goal is not to evaluate as many Web sites as possible, but to investigate the complementary nature of vocabulary-based measures with readability formulas. We collected documents discussing three common conditions, melanoma, depression, and prostate cancer, from commercial Web sites, government/educational Web sites, and those provided by consumer groups themselves. All three represent information sources that patients will encounter online. We searched with Google to find the Web sites. The commercial sites were only selected when they offered a treatment, drug, or therapy. We included alternative or complementary medicine sites because patients will read them when looking for information (Walji, Sagaram, Meric-Bernstam, Johnson,

Gemoets et al., 2004), implements a number of readability formulas, including those used in this study. The tool is written in Java with a Web-based frontend. It uses tokenization and variant-generation software developed at NLM and publicly available syllable counters. The tool provides a score that averages the results of applying these different formulas to a given text. It also provides other information such as sentence count, word count, words per sentence, and type-to-token ratio. Interestingly, the readability formulas provided as part of Microsoft Word do not assign grade levels higher than 12th. Thus, we did not deem that tool appropriate for our task.

Manual Evaluation: Expert and Consumer Judgments

To provide an additional evaluation that was not machine-based, we invited a representative expert and a consumer to evaluate each document in our set. The expert has spent over 25 years in Reference and Information Services departments of academic medical libraries that also served patients and the public. She chaired committees that prepared consumer pamphlets, taught medical terminology to staff and students, gave lectures to cancer survivors and families, and worked on projects that analyzed consumer Web site materials. We requested that she provide her expert opinion on the overall readability (in terms of appropriate audience) for the texts. Since research tells us that the average American reading level is only as high as the ninth grade, we asked her to make her determinations based on consumers with no higher than a ninth-grade reading level. We specifically asked her not to rely on readability formulas at all. Our consumer representative is a 55-year-old native English speaker without a medical or healthcare background. Her highest education level was high school, completed 37 years ago. She earned additional certifications, but they are unrelated to medicine or healthcare. Both evaluators assessed the vocabulary, structure, and overall appearance of documents. They received hourly compensation for their task. Table 3 provides an overview of how "easy," "intermediate," and "difficult" were defined for each in their respective instructions.

TABLE 3. Overview of definitions provided to evaluate documents.

Document vocabulary
    Easy
        Expert: medical vocabulary used by the average consumer
        Consumer: there are medical terms that you would use in conversation
    Intermediate
        Expert: medical vocabulary used in consumer health education
        Consumer: after reading the whole document or after asking for help with a few words, you understand the medical terms used
    Difficult
        Expert: medical vocabulary typically used by health professionals but not by consumers
        Consumer: there are many medical terms you do not understand

Document structure
    Easy
        Expert: a manner of speaking or syntactic constructions typically used by the average consumer
        Consumer: this has a structure that you would write
    Intermediate
        Expert: a manner of speaking or syntactic constructions typically used in consumer health education
        Consumer: this has a structure that you can understand
    Difficult
        Expert: a manner of speaking or syntactic constructions typically used by health professionals
        Consumer: this has a structure that health professionals would write

Overall evaluation
    Easy
        Expert: understood by the average consumer without the need to consult reference sources or his/her network of friends/family
        Consumer: you can understand the document without help
    Intermediate
        Expert: understood as consumer health education
        Consumer: you can understand the document with the help of references or your network of friends/family
    Difficult
        Expert: understood by medical professionals but usually not by the "typical" consumer
        Consumer: difficult or impossible to understand; might be understood by medical professionals

Corpus Evaluation

We used both the Readability Analyzer and the naïve Bayes classifier to evaluate the 90 documents. We report two measures from the Readability Analyzer: the Flesch-Kincaid grade level and the average grade level based on the five formulas in Table 2. From the classifier, we report the final classification (easy, intermediate, or difficult) that the documents received. All 90 documents were also shown to both human evaluators. They were provided with all documents in text format and a spreadsheet to indicate their evaluations.

Results

We first describe the Readability Analyzer and classifier results for the documents by origin and topic. Then, we describe the evaluations by the expert and the consumer. We complete the analysis by calculating and comparing the correlations between the different evaluations.

Grade levels. Overall, the Flesch-Kincaid metric scored commercial documents at the 12th-grade level (12.1), the consumer documents at just under the eighth-grade level (7.8), and the governmental/non-profit documents at the 11th-grade level (11.0). The five-formula average provides a slightly higher estimate, with commercial documents at almost the 14th-grade level (13.7), consumer documents at the eighth-grade level (8.2), and government/non-profit documents at the 12th-grade level (12.4). Figure 1 shows a detailed overview of the readability of the texts based on the Flesch-Kincaid formula and on the Readability Analyzer five-formula average for the different types and topics of the documents.

FIG. 1. Readability scores.

To evaluate whether the differences were significant, we performed two 3 × 3 ANOVAs with origin and topic as the independent variables, and either the Flesch-Kincaid Grade Level or the Readability Analyzer average as the dependent variable. The results were identical for both dependent variables. We found a significant main effect for origin for the Flesch-Kincaid readability, F(2, 81) = 22.9, p < .001, and for the Readability Analyzer average, F(2, 81) = 43.7, p < .001. There was no significant effect for topic or for the interaction between origin and topic. Post-hoc contrasts indicated that the differences between pages from consumer versus commercial sources (p < .001, Bonferroni adjustment) or government sources (p < .001, Bonferroni adjustment) were significant for both measures. The difference between pages from commercial versus government/non-profit sources was not significant.

Classifier levels. We then used the classifier to assign one of three levels (easy, intermediate, or difficult) to each document. Figure 2 provides an overview of the scores for documents according to their origin and topic. Commercial and government/non-profit pages scored on average at the intermediate level, with pages on prostate cancer slightly more difficult. In addition to ave