Computational Methods for Corpus Annotation and Analysis

Abstract This introductory chapter provides a brief overview of the objectives and rationale of the book, the need for corpus annotation, the key concepts and issues involved in corpus annotation and in using annotated corpora for ...

Author: Xiaofei Lu

Publisher: Springer

ISBN: 9789401786454

Category: Language Arts & Disciplines

Page: 186

In the past few decades, the use of increasingly large text corpora has grown rapidly in language and linguistics research. This growth has been enabled by remarkable strides in natural language processing (NLP) technology, which allows computers to automatically and efficiently process, annotate and analyze large amounts of spoken and written text in linguistically and/or pragmatically meaningful ways. It has become more desirable than ever before for language and linguistics researchers who use corpora in their research to gain an adequate understanding of the relevant NLP technology so that they can take full advantage of its capabilities. This volume provides language and linguistics researchers with an accessible introduction to the state-of-the-art NLP technology that facilitates automatic annotation and analysis of large text corpora at both shallow and deep linguistic levels. The book covers a wide range of computational tools for lexical, syntactic, semantic, pragmatic and discourse analysis, together with detailed instructions on how to obtain, install and use each tool in different operating systems and platforms. The book illustrates how NLP technology has been applied in recent corpus-based language studies and suggests effective ways to better integrate such technology in future corpus linguistics research. This book provides language and linguistics researchers with a valuable reference for corpus annotation and analysis.

Corpus based Language Studies

A4.2 CORPUS ANNOTATION = ADDED VALUE Like corpus mark-up, annotation adds value to a corpus. Leech (1997a: 2) maintains that corpus annotation is a crucial contribution to the benefit a corpus brings, since it enriches the ...

Author: Tony McEnery

Publisher: Taylor & Francis

ISBN: 0415286220

Category: Language Arts & Disciplines

Page: 386

Covering the major approaches to the use of corpus data, this work gathers together influential readings from leading names in the discipline, including Biber, Widdowson, Sinclair, Carter and McCarthy.

Corpus Linguistics and Linguistically Annotated Corpora

In section 2.3, we gave three examples where we searched for specific linguistic phenomena and explained what types of linguistic annotation are required to be able to find these phenomena. Then we looked at best practices in corpus ...

Author: Sandra Kuebler

Publisher: Bloomsbury Publishing

ISBN: 9781441119803

Category: Language Arts & Disciplines

Page: 288

Linguistically annotated corpora are becoming a central part of the corpus linguistics field. One of their main strengths is the level of searchability they offer, but with the annotation come problems of the initial complexity of queries and query tools. This book gives a full, pedagogic account of this burgeoning field. Beginning with an overview of corpus linguistics, its prerequisites and goals, the book then introduces linguistically annotated corpora. It explores the different levels of linguistic annotation, including morphological, parts of speech, syntactic, semantic and discourse-level, as well as advantages and challenges for such annotations. It covers the main annotated corpora for English, the Penn Treebank, the International Corpus of English, and OntoNotes, as well as a wide range of corpora for other languages. In its third part, search strategies required for different types of data are explored. All chapters are accompanied by exercises and by sections on further reading.

News Discourse and Digital Currents

of given tags encoded in the corpus, without browsing through the corpus in order to verify that the tags do reveal certain patterns. This should always be done since, in the specific case of automatic corpus annotation, for instance, ...

Author: Antonio Fruttaldo

Publisher: Cambridge Scholars Publishing

ISBN: 9781443893404

Category: Language Arts & Disciplines

Page: 250

In recent years, journalistic practices have undergone a radical change due to the increasing pressure of new digital media on the professional practice. The ever-growing development of new technologies and the ceaseless fluctuation of social practices have challenged some of the traditional genres found in these professional contexts. On the basis of these premises, this book investigates a particular genre found in the context of TV newscasts. The genre under investigation is that of news tickers (or crawlers), that is, the graphic elements that scroll at the bottom of the screen during newscasts. The book introduces readers to this under-researched genre through a year-long collection of the news tickers displayed on BBC World News. Thanks to a corpus-based genre analysis, the generic status of news tickers is better defined by highlighting the presence of given strategies of marketization. Additionally, this volume investigates if news tickers can be seen as a mixed (sub-)genre that interdiscursively combines traditional linguistic elements of headlines and lead paragraphs to achieve, from a (Critical) Genre Analysis point of view, a specific private intention in the context of the BBC.

Discourse Analysis and the New Testament

Or, as defined by Leech, '[corpus annotation is] the practice of adding interpretative, linguistic information to an ... written data'.5 He includes the word interpretative in his definition to indicate that when a corpus is annotated, ...

Author: Stanley E. Porter

Publisher: A&C Black

ISBN: 9780567559326

Category: Religion

Page: 432

The volume contains contributions by many of the major discourse analysts of the New Testament, including E.A. Nida, W. Schenk, J.P. Louw and J. Callow. Some of these essays deal with methodology, raising necessary questions about what it means to analyse discourse. Others demonstrate an already committed approach by reading specific texts. A 'state-of-the-art' volume for all scholars interested in this increasingly important area of New Testament research.

Handbook of Linguistic Annotation

Corpus: Annotation Levels and Applications”), spatial information (ISOspace, chapter “ISOspace: Annotating Static and Dynamic Spatial Information”, and SRL, chapter “Spatial Role Labeling Annotation Scheme”), time and event annotation ...

Author: Nancy Ide

Publisher: Springer

ISBN: 9789402408812

Category: Language Arts & Disciplines

Page: 1459

This handbook offers a thorough treatment of the science of linguistic annotation. Leaders in the field guide the reader through the process of modeling, creating an annotation language, building a corpus and evaluating it for correctness. Essential reading for both computer scientists and linguistic researchers. Linguistic annotation is an increasingly important activity in the field of computational linguistics because of its critical role in the development of language models for natural language processing applications. Part one of this book covers all phases of the linguistic annotation process, from annotation scheme design and choice of representation format through both the manual and automatic annotation process, evaluation, and iterative improvement of annotation accuracy. The second part of the book includes case studies of annotation projects across the spectrum of linguistic annotation types, including morpho-syntactic tagging, syntactic analyses, a range of semantic analyses (semantic roles, named entities, sentiment and opinion), time and event and spatial analyses, and discourse level analyses including discourse structure, co-reference, etc. Each case study addresses the various phases and processes discussed in the chapters of part one.

The Functional Perspective on Language and Discourse

Introduction The task of manual (or human-coded) corpus annotation is currently the object of extensive research in the Natural Language Processing (NLP) community for a number of computational applications.1 NLP researchers have ...

Author: María de los Ángeles Gómez González

Publisher: John Benjamins Publishing Company

ISBN: 9789027270207

Category: Language Arts & Disciplines

Page: 292

Over the last forty years, the functionalist approach to linguistic description and explanation has given rise to several major schools of thought that share two crucial assumptions: (i) form is not independent of meaning/function or language use; and (ii) linguistic description and explanation need to take into account the communicative function of language. This volume offers readers interested in functional linguistics a selected sample of studies that jointly prove the efficacy of the analytical tools and procedures broadly accepted within the functionalist tradition in order to investigate language and discourse, with special focus on key pragmatic/discourse notions such as contextualization, grammaticalisation, reference, politeness, (in-)directness, discourse markers, speech acts, subjective evaluation and sentiment analysis in texts, among others. In addition, this volume offers specific corpus-based techniques for the objective contextualisation of linguistic data, which is crucial given the central role allotted to context in both functional linguistics and pragmatics/discourse analysis.

Collaborative Annotation for Reliable Natural Language Processing

[LAN 12] LANDRAGIN F., POIBEAU T., VICTORRI B., “ANALEC: a new tool for the dynamic annotation of textual data”, International Conference on ... [LEE 93] LEECH G., “Corpus annotation schemes”, Literary and Linguistic Computing, vol.

Author: Karën Fort

Publisher: John Wiley & Sons

ISBN: 9781119307655

Category: Computers

Page: 192

This book presents a unique opportunity for constructing a consistent image of collaborative manual annotation for Natural Language Processing (NLP). NLP has witnessed two major evolutions in the past 25 years: firstly, the extraordinary success of machine learning, which is now, for better or for worse, overwhelmingly dominant in the field, and secondly, the multiplication of evaluation campaigns or shared tasks. Both involve manually annotated corpora, for the training and evaluation of the systems. These corpora have progressively become the hidden pillars of our domain, providing food for our hungry machine learning algorithms and reference for evaluation. Annotation is now the place where linguistics hides in NLP. However, manual annotation has largely been ignored for some time, and it has taken a while even for annotation guidelines to be recognized as essential. Although some efforts have been made lately to address some of the issues presented by manual annotation, there has still been little research done on the subject. This book aims to provide some useful insights into the subject. Manual corpus annotation is now at the heart of NLP, and is still largely unexplored. There is a need for manual annotation engineering (in the sense of a precisely formalized process), and this book aims to provide a first step towards a holistic methodology, with a global view on annotation.

Natural Language Annotation for Machine Learning

Some corpora and their uses (Corpus: Summary sentence). PropBank: For annotating verbal propositions and their arguments for examining semantic roles. Manually Annotated Sub-Corpus: For annotating sentence boundaries, tokens, lemma, ...

Author: James Pustejovsky

Publisher: "O'Reilly Media, Inc."

ISBN: 9781449306663

Category: Computers

Page: 344

Includes bibliographical references (p. 305-315) and index.

Language Corpora Annotation and Processing

Through the application of this process, we add representational information to a text that is included in a corpus. Extratextual annotation is important when a corpus is built with different types of texts obtained from different ...

Author: Niladri Sekhar Dash

Publisher: Springer Nature

ISBN: 9789811629600

Category: Computational linguistics

Page:

This book addresses the research, analysis, and description of the methods and processes that are used in the annotation and processing of language corpora in advanced, semi-advanced, and non-advanced languages. It provides the background information and empirical data needed to understand the nature and depth of problems related to corpus annotation and text processing and shows readers how the linguistic elements found in texts are analyzed and applied to develop language technology systems and devices. As such, it offers valuable insights for researchers, educators, and students of linguistics and language technology.