Multilingual Natural Language Processing Applications is the first comprehensive single-source guide to building robust and accurate multilingual NLP systems. Edited by two leading experts, it integrates cutting-edge advances with practical solutions drawn from extensive field experience.
Part I introduces the core concepts and theoretical foundations of modern multilingual natural language processing, presenting today’s best practices for understanding word and document structure, analyzing syntax, modeling language, recognizing entailment, and detecting redundancy.
Part II thoroughly addresses the practical considerations associated with building real-world applications, including information extraction, machine translation, information retrieval/search, summarization, question answering, distillation, processing pipelines, and more.
This book contains important new contributions from leading researchers at IBM, Google, Microsoft, Thomson Reuters, BBN, CMU, University of Edinburgh, University of Washington, University of North Texas, and others.
Coverage includes
Core NLP problems, and today’s best algorithms for attacking them
This book will be invaluable for all engineers, software developers, researchers, and graduate students who want to process large quantities of text in multiple languages, in any environment: government, corporate, or academic.
Preface
Acknowledgments
About the Authors
Part I: In Theory
Chapter 1: Finding the Structure of Words
1.1 Words and Their Components
1.2 Issues and Challenges
1.3 Morphological Models
1.4 Summary
Chapter 2: Finding the Structure of Documents
2.1 Introduction
2.2 Methods
2.3 Complexity of the Approaches
2.4 Performances of the Approaches
2.5 Features
2.6 Processing Stages
2.7 Discussion
2.8 Summary
Chapter 3: Syntax
3.1 Parsing Natural Language
3.2 Treebanks: A Data-Driven Approach to Syntax
3.3 Representation of Syntactic Structure
3.4 Parsing Algorithms
3.5 Models for Ambiguity Resolution in Parsing
3.6 Multilingual Issues: What Is a Token?
3.7 Summary
Chapter 4: Semantic Parsing
4.1 Introduction
4.2 Semantic Interpretation
4.3 System Paradigms
4.4 Word Sense
4.5 Predicate-Argument Structure
4.6 Meaning Representation
4.7 Summary
Chapter 5: Language Modeling
5.1 Introduction
5.2 n-Gram Models
5.3 Language Model Evaluation
5.4 Parameter Estimation
5.5 Language Model Adaptation
5.6 Types of Language Models
5.7 Language-Specific Modeling Problems
5.8 Multilingual and Crosslingual Language Modeling
5.9 Summary
Chapter 6: Recognizing Textual Entailment
6.1 Introduction
6.2 The Recognizing Textual Entailment Task
6.3 A Framework for Recognizing Textual Entailment
6.4 Case Studies
6.5 Taking RTE Further
6.6 Useful Resources
6.7 Summary
Chapter 7: Multilingual Sentiment and Subjectivity Analysis
7.1 Introduction
7.2 Definitions
7.3 Sentiment and Subjectivity Analysis on English
7.4 Word- and Phrase-Level Annotations
7.5 Sentence-Level Annotations
7.6 Document-Level Annotations
7.7 What Works, What Doesn’t
7.8 Summary
Part II: In Practice
Chapter 8: Entity Detection and Tracking
8.1 Introduction
8.2 Mention Detection
8.3 Coreference Resolution
8.4 Summary
Chapter 9: Relations and Events
9.1 Introduction
9.2 Relations and Events
9.3 Types of Relations
9.4 Relation Extraction as Classification
9.5 Other Approaches to Relation Extraction
9.6 Events
9.7 Event Extraction Approaches
9.8 Moving Beyond the Sentence
9.9 Event Matching
9.10 Future Directions for Event Extraction
9.11 Summary
Chapter 10: Machine Translation
10.1 Machine Translation Today
10.2 Machine Translation Evaluation
10.3 Word Alignment
10.4 Phrase-Based Models
10.5 Tree-Based Models
10.6 Linguistic Challenges
10.7 Tools and Data Resources
10.8 Future Directions
10.9 Summary
Chapter 11: Multilingual Information Retrieval
11.1 Introduction
11.2 Document Preprocessing
11.3 Monolingual Information Retrieval
11.4 CLIR
11.5 MLIR
11.6 Evaluation in Information Retrieval
11.7 Tools, Software, and Resources
11.8 Summary
Chapter 12: Multilingual Automatic Summarization
12.1 Introduction
12.2 Approaches to Summarization
12.3 Evaluation
12.4 How to Build a Summarizer
12.5 Competitions and Datasets
12.6 Summary
Chapter 13: Question Answering
13.1 Introduction and History
13.2 Architectures
13.3 Source Acquisition and Preprocessing
13.4 Question Analysis
13.5 Search and Candidate Extraction
13.6 Answer Scoring
13.7 Crosslingual Question Answering
13.8 A Case Study
13.9 Evaluation
13.10 Current and Future Challenges
13.11 Summary and Further Reading
Chapter 14: Distillation
14.1 Introduction
14.2 An Example
14.3 Relevance and Redundancy
14.4 The Rosetta Consortium Distillation System
14.5 Other Distillation Approaches
14.6 Evaluation and Metrics
14.7 Summary
Chapter 15: Spoken Dialog Systems
15.1 Introduction
15.2 Spoken Dialog Systems
15.3 Forms of Dialog
15.4 Natural Language Call Routing
15.5 Three Generations of Dialog Applications
15.6 Continuous Improvement Cycle
15.7 Transcription and Annotation of Utterances
15.8 Localization of Spoken Dialog Systems
15.9 Summary
Chapter 16: Combining Natural Language Processing Engines
16.1 Introduction
16.2 Desired Attributes of Architectures for Aggregating Speech and NLP Engines
16.3 Architectures for Aggregation
16.4 Case Studies
16.5 Lessons Learned
16.6 Summary
16.7 Sample UIMA Code
Index