DRAFT
Speech and Language Processing: An introduction to natural language processing, computational linguistics, and speech recognition. Daniel Jurafsky

Chapter 1. Introduction

we will cover the algorithms currently used in the field, as well as important component tasks.

Many other language processing tasks are also related to the Web. Another such task is Web-based question answering. This is a generalization of simple web search, where instead of just typing keywords a user might ask complete questions, ranging from easy to hard, like the following:

• What does “divergent” mean?
• What year was Abraham Lincoln born?
• How many states were in the United States that year?
• How much Chinese silk was exported to England by the end of the 18th century?
• What do scientists think about the ethics of human cloning?

Some of these, such as definition questions, or simple factoid questions like dates and locations, can already be answered by search engines. But answering more complicated questions might require extracting information that is embedded in other text on a Web page, or doing inference (drawing conclusions based on known facts), or synthesizing and summarizing information
from multiple sources or web pages. In this text we study the various components that make up modern understanding systems of this kind, including information extraction, word sense disambiguation, and so on.

Although the subfields and problems we've described above are all very far from completely solved, these are all very active research areas and many technologies are already available commercially. In the rest of this chapter we briefly summarize the kinds of knowledge that are necessary for these tasks (and others like spell correction, grammar checking, and so on), as well as the mathematical models that will be introduced throughout the book.

1.1 KNOWLEDGE IN SPEECH AND LANGUAGE PROCESSING

What distinguishes language processing applications from other data processing systems is their use of knowledge of language. Consider the Unix wc program, which is used to count the total number of bytes, words, and lines in a text file. When used to count bytes and lines, wc is an ordinary data processing application. However, when it is used to count the words in a file it requires knowledge about what it means to be a word, and thus becomes a language processing system.

Of course, wc is an extremely simple system with an extremely limited and impoverished knowledge of language. Sophisticated conversational agents like HAL, or machine translation systems, or robust question-answering systems, require much broader and deeper knowledge of language. To get a feeling for the scope and kind of required knowledge, consider some of what HAL would need to know to engage in the dialogue that begins this chapter, or for a question answering system to answer one of the questions above.

HAL must be able to recognize words from an audio signal and to generate an audio signal from a sequence of words. These
tasks of speech recognition and speech synthesis require knowledge about phonetics and phonology: how words are pronounced in terms of sequences of sounds, and how each of these sounds is realized acoustically.

Note also that unlike Star Trek's Commander Data, HAL is capable of producing contractions like “I'm” and “can't”. Producing and recognizing these and other variations of individual words (e.g., recognizing that “doors” is plural) requires knowledge about morphology, the way words break down into component parts that carry meanings like singular versus plural.
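
As a rough illustration of what breaking a word into component parts can look like, here is a minimal Python sketch. It is not taken from the text: the `CONTRACTIONS` table, the `analyze` function, the `+PLURAL` tag, and the single suffix rule are all hypothetical, and a realistic morphological analyzer would need far more knowledge than this.

```python
# Toy morphological analyzer. The rules and names below are illustrative
# assumptions, not an algorithm taken from the text.

# Hypothetical lookup table covering only two example contractions.
CONTRACTIONS = {
    "I'm": ["I", "am"],
    "can't": ["can", "not"],
}


def analyze(word):
    """Split a word into a list of component parts (morphemes)."""
    if word in CONTRACTIONS:
        return list(CONTRACTIONS[word])
    # Naive plural rule: strip a final "s" from words longer than three
    # letters, unless the word ends in "ss" (e.g. "glass").
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return [word[:-1], "+PLURAL"]
    return [word]


if __name__ == "__main__":
    for w in ["doors", "door", "I'm", "can't", "glass"]:
        print(w, "->", analyze(w))
```

Under these assumptions, `analyze("doors")` yields the stem `door` plus a plural marker. The same rule also splits `lens` into `len` plus a plural marker, which is exactly the kind of mistake that richer morphological knowledge is needed to avoid.
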
Moving beyond individual words, HAL must use structural knowledge to properly string together the words that constitute its response. For example, HAL must know that the following sequence of words will not make sense to Dave, despite the fact that it contains precisely the same set of words as the original.

I'm I do, sorry that afraid Dave I'm can't.

The knowledge needed to order and group words together comes under the heading of syntax.

Now consider a question answering system dealing with the following question:

• How much Chinese silk was exported to Western Europe by the end of the 18th century?

In order to answer this question we need to know something about lexical semantics, the meaning of all the words (export, or silk), as well as compositional semantics (what exactly constitutes Western Europe as opposed to Eastern or Southern Europe, what does “end” mean when combined with “the 18th century”). We also need to know something about the relationship of the words to the syntactic structure. For example, we need to know that “by the end of the 18th century” is a temporal end-point, and not a description of the agent, as the by-phrase is in the following sentence:

• How much Chinese silk was exported to Western Europe by southern merchants?

We also need the kind of knowledge that lets HAL determine that Dave's utterance is a request for action, as opposed to a simple statement about the world or a question about the door, as in the following variations of his original statement.

REQUEST: HAL, open the pod bay door.
STATEMENT: HAL, the pod bay door is open.
INFORMATION QUESTION: HAL, is the pod bay door open?

Next, despite its bad behavior, HAL knows enough to be polite to Dave. It could, for example, have simply replied “No” or “No, I won't open the door.” Instead, it first embellishes its response with the phrases “I'm sorry” and “I'm afraid”, and then only indirectly signals its refusal by saying “I can't”, rather than the more direct (and truthful) “I won't”.[1] This knowledge about the kind of actions that speakers intend by their use of sentences is pragmatic or dialogue knowledge.

Another kind of pragmatic or discourse knowledge is required to answer the question

• How many states were in the United States that year?

What year is “that year”? In order to interpret words like “that year” a question answering system needs to examine the earlier questions that were asked; in this case the previous question talked about the year that Lincoln was born. Thus this task of coreference resolution makes use of knowledge about how words like “that” or pronouns like “it” or “she” refer to previous parts of the discourse.

To summarize, engaging in complex language behavior requires various kinds
of knowledge of language:

• Phonetics and Phonology: knowledge about linguistic sounds
• Morphology: knowledge of the meaningful components of words
• Syntax: knowledge of the structural relationships between words
• Semantics: knowledge of meaning
• Pragmatics: knowledge of the relationship of meaning to the goals and intentions of the speaker
• Discourse: knowledge about linguistic units larger than a single utterance

[1] For those unfamiliar with HAL, it is neither sorry nor afraid, nor is it incapable of opening the door. It has simply decided in a fit of paranoia to kill its crew.

1.2 AMBIGUITY

A perhaps surprising fact about these categories of linguistic knowledge is that most tasks in speech and language processing can be viewed as resolving ambiguity at one of these levels. We say some input is ambiguous if there are multiple alternative linguistic structures that can be built for it. Consider the spoken sentence “I made her duck.” Here are five different meanings this sentence could have (see if you can think of some more), each of which exemplifies an ambiguity at some level:

(1.1) I cooked waterfowl for her.
(1.2) I cooked waterfowl belonging to her.
(1.3) I created the (plaster?) duck she owns.
(1.4) I caused her to quickly lower her head or body.
(1.5) I waved my magic wand and turned her into undifferentiated waterfowl.

These different meanings are caused by a number of ambiguities. First, the words duck and her are morphologically or syntactically ambiguous in their part-of-speech. Duck can be a verb or a noun, while her can be a dative pronoun or a possessive pronoun. Second, the word make is semantically ambiguous; it can mean create or cook. Finally, the verb make is syntactically ambiguous in a different way. Make can be transitive, that is, taking a single direct object (1.2), or it can be ditransitive, that is, taking two objects (1.5), meaning that the first object (her) got made into the second object (duck). Make can also take a direct object and a verb (1.4), meaning that the object (her) got caused to perform the verbal action (duck). Furthermore, in a spoken sentence, there is an even deeper kind of ambiguity; the first word could have been eye or the second word maid.

We will often introduce the models and algorithms we present throughout the book as ways to resolve or disambiguate these ambiguities. For example, deciding whether duck is a verb or a noun can be solved by part-of-speech tagging. Deciding whether make means “create” or “cook” can be solved by word sense disambiguation. Resolution of part-of-speech and word sense ambiguities are two important kinds of lexical disambiguation. A wide variety of tasks can be framed as lexical disambiguation problems. For example, a text-to-speech synthesis system reading the word lead needs to decide whether it should be pronounced as in “lead pipe” or as in “lead me on”. By contrast, deciding whether her and duck are part of the same entity (as in (1.1) or (1.4)) or are different
entities (as in (1.2)) is an example of syntactic disambiguation and can be addressed by probabilistic parsing. Ambiguities that don't arise in this particular example (like whether a given sentence is a statement or a question) will also be resolved, for example
by speech act interpretation.

1.3 MODELS AND ALGORITHMS

One of the key insights of the last 50 years of research in language processing is that the various kinds of knowledge described in the last sections can be captured through the use of a small number of formal models, or theories. Fortunately, these models and theories are all drawn from the standard toolkits of computer science, mathematics, and linguistics and should be generally familiar to those trained in those fields. Among the most important models are state machines, rule systems, logic, probabilistic models, and vector-space models. These models, in turn, lend themselves to a small number of algorithms, among the most important of which are state space search algorithms such as dynamic programming, and machine learning algorithms such as classifiers and EM and other learning algorithms.

In their simplest formulation, state machines are
formal models that consist of states, transitions among states, and an input representation. Some of the variations of this basic model that we will consider are deterministic and non-deterministic finite-state automata and finite-state transducers.

Closely related to these models are their declarative
counterparts: formal rule systems. Among the more important ones we will consider are regular grammars and regular relations, context-free grammars, feature-augmented grammars, as well as probabilistic variants of them all. State machines and formal rule systems are the main tools used when dealing with knowledge of phonology, morphology, and syntax.

The third model that plays a critical role in capturing knowledge of language is logic. We will discuss first order logic, also known as the predicate calculus, as well as such related formalisms as lambda-calculus, feature-structures, and semantic primitives. These logical representations have traditionally been used for modeling semantics and pragmatics, although more recent work has focused on more robust techniques drawn from non-logical lexical semantics.

Probabilistic models are crucial for capturing every kind of linguistic knowledge. Each
of the other models (state machines, formal rule systems, and logic) can be augmented with probabilities. For example, the state machine can be augmented with probabilities to become the weighted automaton or Markov model. We will spend a significant amount of time on hidden Markov models or HMMs, which are used everywhere in the field, in part-of-speech tagging, speech recognition, dialogue understanding, text-to-speech, and machine translation. The key advantage of probabilistic models is their ability to solve the many kinds of ambiguity problems that we discussed earlier; almost any speech and language processing problem can be recast as: “given N choices for some ambiguous input, choose the most probable one”.

Finally, vector-space models, based on linear algebra, underlie information retrieval and many treatments of word meanings.

Processing language using any of these models typically involves a search through a space of states representing hypotheses about an input. In speech recognition, we search through a space of phone sequences for the correct word. In parsing, we search through a space of trees for the syntactic parse of an input sentence. In
machine translation, we search through a space of translation hypotheses for the correct translation of a sentence into another language. For non-probabilistic tasks, such as those involving state machines, we use well-known graph algorithms such as depth-first search. For probabilistic tasks, we use heuristic variants such as best-first and A* search, and rely on dynamic programming algorithms for computational tractability.

For many language tasks, we rely on machine learning tools like classifiers and sequence models. Classifiers like decision trees, support vector machines, Gaussian Mixture Models and logistic
regression are very commonly used. A hidden Markov model is one kind of sequence model; others are Maximum Entropy Markov Models and Conditional Random Fields.

Another tool that is related to machine learning is methodological: the use of distinct training and test sets, statistical techniques like cross-validation, and careful evaluation of our trained systems.

1.4 LANGUAGE, THOUGHT, AND UNDERSTANDING

To many, the ability of computers to process language as skillfully as we humans do will signal the arrival of truly intelligent machines. The basis of this belief is the fact that the effective use of
language is intertwined with our general cognitive abilities. Among the first to consider the computational implications of this intimate connection was Alan Turing (1950). In this famous paper, Turing introduced what has come to be known as the Turing Test. Turing began with the thesis that the question of what it
49、ion of what itTURINGTESTwould mean for a machine to think was essentially unanswerable due to the inherentimprecision in the terms machine and think. Instead, he suggested an empirical test, agame, in which a computers use of language would form the basis for determining ifit could think. If the machine could win the game it would be judged intelligent.In Turings game, there are three participants: two people and a computer. One ofthe people is a contestant and plays the role of an interrogator. To win, the interrogatormust determine which of the ot