Multilingual Natural Language Processing Applications: From Theory to Practice

By Daniel Bikel, Imed Zitouni
Published May 1, 2012 by IBM Press. Part of the IBM Press series.

Description

Copyright 2012
Dimensions: 7" x 9-1/8"
Edition: 1st

eBook
ISBN-10: 0-13-704780-0
ISBN-13: 978-0-13-704780-2

Multilingual Natural Language Processing Applications is the first comprehensive single-source guide to building robust and accurate multilingual NLP systems. Edited by two leading experts, it integrates cutting-edge advances with practical solutions drawn from extensive field experience.

Part I introduces the core concepts and theoretical foundations of modern multilingual natural language processing, presenting today’s best practices for understanding word and document structure, analyzing syntax, modeling language, recognizing entailment, and detecting redundancy.

Part II thoroughly addresses the practical considerations associated with building real-world applications, including information extraction, machine translation, information retrieval/search, summarization, question answering, distillation, processing pipelines, and more.

This book contains important new contributions from leading researchers at IBM, Google, Microsoft, Thomson Reuters, BBN, CMU, University of Edinburgh, University of Washington, University of North Texas, and others.

Coverage includes

Core NLP problems, and today’s best algorithms for attacking them

Processing the diverse morphologies present in the world’s languages
Uncovering syntactical structure, parsing semantics, using semantic role labeling, and scoring grammaticality
Recognizing inferences, subjectivity, and opinion polarity
Managing key algorithmic and design tradeoffs in real-world applications
Extracting information via mention detection, coreference resolution, and event extraction
Building large-scale systems for machine translation, information retrieval, and summarization
Answering complex questions through distillation and other advanced techniques
Creating dialog systems that leverage advances in speech recognition, synthesis, and dialog management
Constructing common infrastructure for multiple multilingual text processing applications

This book will be invaluable for all engineers, software developers, researchers, and graduate students who want to process large quantities of text in multiple languages, in any environment: government, corporate, or academic.



Sample Content

Preface

Acknowledgments

About the Authors

Part I: In Theory

Chapter 1: Finding the Structure of Words

1.1 Words and Their Components

1.2 Issues and Challenges

1.3 Morphological Models

1.4 Summary

Chapter 2: Finding the Structure of Documents

2.1 Introduction

2.2 Methods

2.3 Complexity of the Approaches

2.4 Performances of the Approaches

2.5 Features

2.6 Processing Stages

2.7 Discussion

2.8 Summary

Chapter 3: Syntax

3.1 Parsing Natural Language

3.2 Treebanks: A Data-Driven Approach to Syntax

3.3 Representation of Syntactic Structure

3.4 Parsing Algorithms

3.5 Models for Ambiguity Resolution in Parsing

3.6 Multilingual Issues: What Is a Token?

3.7 Summary

Chapter 4: Semantic Parsing

4.1 Introduction

4.2 Semantic Interpretation

4.3 System Paradigms

4.4 Word Sense

4.5 Predicate-Argument Structure

4.6 Meaning Representation

4.7 Summary

Chapter 5: Language Modeling

5.1 Introduction

5.2 n-Gram Models

5.3 Language Model Evaluation

5.4 Parameter Estimation

5.5 Language Model Adaptation

5.6 Types of Language Models

5.7 Language-Specific Modeling Problems

5.8 Multilingual and Crosslingual Language Modeling

5.9 Summary

Chapter 6: Recognizing Textual Entailment

6.1 Introduction

6.2 The Recognizing Textual Entailment Task

6.3 A Framework for Recognizing Textual Entailment

6.4 Case Studies

6.5 Taking RTE Further

6.6 Useful Resources

6.7 Summary

Chapter 7: Multilingual Sentiment and Subjectivity Analysis

7.1 Introduction

7.2 Definitions

7.3 Sentiment and Subjectivity Analysis on English

7.4 Word- and Phrase-Level Annotations

7.5 Sentence-Level Annotations

7.6 Document-Level Annotations

7.7 What Works, What Doesn’t

7.8 Summary

Part II: In Practice

Chapter 8: Entity Detection and Tracking

8.1 Introduction

8.2 Mention Detection

8.3 Coreference Resolution

8.4 Summary

Chapter 9: Relations and Events

9.1 Introduction

9.2 Relations and Events

9.3 Types of Relations

9.4 Relation Extraction as Classification

9.5 Other Approaches to Relation Extraction

9.6 Events

9.7 Event Extraction Approaches

9.8 Moving Beyond the Sentence

9.9 Event Matching

9.10 Future Directions for Event Extraction

9.11 Summary

Chapter 10: Machine Translation

10.1 Machine Translation Today

10.2 Machine Translation Evaluation

10.3 Word Alignment

10.4 Phrase-Based Models

10.5 Tree-Based Models

10.6 Linguistic Challenges

10.7 Tools and Data Resources

10.8 Future Directions

10.9 Summary

Chapter 11: Multilingual Information Retrieval

11.1 Introduction

11.2 Document Preprocessing

11.3 Monolingual Information Retrieval

11.4 CLIR

11.5 MLIR

11.6 Evaluation in Information Retrieval

11.7 Tools, Software, and Resources

11.8 Summary

Chapter 12: Multilingual Automatic Summarization

12.1 Introduction

12.2 Approaches to Summarization

12.3 Evaluation

12.4 How to Build a Summarizer

12.5 Competitions and Datasets

12.6 Summary

Chapter 13: Question Answering

13.1 Introduction and History

13.2 Architectures

13.3 Source Acquisition and Preprocessing

13.4 Question Analysis

13.5 Search and Candidate Extraction

13.6 Answer Scoring

13.7 Crosslingual Question Answering

13.8 A Case Study

13.9 Evaluation

13.10 Current and Future Challenges

13.11 Summary and Further Reading

Chapter 14: Distillation

14.1 Introduction

14.2 An Example

14.3 Relevance and Redundancy

14.4 The Rosetta Consortium Distillation System

14.5 Other Distillation Approaches

14.6 Evaluation and Metrics

14.7 Summary

Chapter 15: Spoken Dialog Systems

15.1 Introduction

15.2 Spoken Dialog Systems

15.3 Forms of Dialog

15.4 Natural Language Call Routing

15.5 Three Generations of Dialog Applications

15.6 Continuous Improvement Cycle

15.7 Transcription and Annotation of Utterances

15.8 Localization of Spoken Dialog Systems

15.9 Summary

Chapter 16: Combining Natural Language Processing Engines

16.1 Introduction

16.2 Desired Attributes of Architectures for Aggregating Speech and NLP Engines

16.3 Architectures for Aggregation

16.4 Case Studies

16.5 Lessons Learned

16.6 Summary

16.7 Sample UIMA Code

Index


