Introduction to AntConc

Overview

Teaching: 5 min
Exercises: 0 min

Questions

What is AntConc? What is corpus linguistics? How can they be combined to analyse catalogue data?

Objectives

Explain what the AntConc software does

What is AntConc?

AntConc is freeware for working with language corpora through a graphical user interface. It contains a number of ‘tools’ that support linguistic analysis by enabling the user to, for example, search corpora, generate lists of the words they contain, and browse ‘concordances’ of word use.
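To give a rough sense of what a word list and a concordance are, the Python sketch below builds both from a tiny made-up corpus. It is only an illustration and not how AntConc itself is implemented: the sample sentences, the function names (`tokenize`, `word_list`, `concordance`) and the very simple tokenisation rule are all assumptions made for this example.

```python
import re
from collections import Counter

# Hypothetical mini-corpus: two invented catalogue-style descriptions.
corpus = [
    "A satire on the minister, who stands before the King.",
    "The King sits on a throne; a minister kneels before him.",
]

def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately simple rule)."""
    return re.findall(r"[a-z']+", text.lower())

def word_list(texts):
    """Count how often each word occurs across all texts."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts

def concordance(texts, keyword, width=2):
    """Return keyword-in-context tuples: up to `width` words either side of each hit."""
    lines = []
    for text in texts:
        tokens = tokenize(text)
        for i, token in enumerate(tokens):
            if token == keyword:
                left = " ".join(tokens[max(0, i - width):i])
                right = " ".join(tokens[i + 1:i + 1 + width])
                lines.append((left, token, right))
    return lines

frequencies = word_list(corpus)
kwic = concordance(corpus, "king")

# Print the five most frequent words, then each concordance line.
for word, count in frequencies.most_common(5):
    print(f"{count:>3}  {word}")
for left, hit, right in kwic:
    print(f"{left:>15} [{hit}] {right}")
```

In AntConc the same kinds of output are produced through the graphical interface, so no programming is needed to follow this lesson.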

AntConc major update

Note that these lesson materials use an older version of AntConc (3.5.9) rather than 4.0.0 or later. Versions from 4.0.0 onwards represent a major shift in the look and functionality of AntConc. Please make sure you have the correct version on your computer. If you aren’t sure, check the AntConc download page.

What is corpus linguistics?

Corpus linguistics is the study of language through corpora, usually large collections of machine-readable text. To study large collections, corpus linguists, and those adopting their methods, use software tools to query their chosen corpora and the ‘strings’ and ‘lemmas’ they contain. The outputs of that processing are typically a combination of word counts, statistical inferences about word use, comparisons with standard language corpora, and subsets of text. People then analyse these outputs, form new queries, and carry out further processing and analysis. Corpus linguistics is therefore an iterative study of text, where a phenomenon suggested by one output is tested and refined by the next.
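The idea of comparing a corpus with a reference corpus can be sketched in a few lines of Python. The example below is a simplified illustration under stated assumptions: both texts are invented, the function names (`per_thousand`, `compare`) are hypothetical, and the score is a plain difference in rate per thousand words rather than one of the statistical measures that corpus tools normally offer.

```python
import re
from collections import Counter

def per_thousand(text):
    """Return each word's frequency per thousand tokens in `text`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens)
    return {word: 1000 * count / total for word, count in counts.items()}

def compare(target_text, reference_text):
    """Rank words by how much more frequent they are in the target than in the reference."""
    target_rates = per_thousand(target_text)
    reference_rates = per_thousand(reference_text)
    differences = [
        # Words absent from the reference are treated as having a rate of 0.
        (word, rate - reference_rates.get(word, 0.0))
        for word, rate in target_rates.items()
    ]
    return sorted(differences, key=lambda pair: pair[1], reverse=True)

# Invented stand-ins for a catalogue corpus and a general reference corpus.
target = "print shows the minister print shows the king"
reference = "the cat sat on the mat and the dog sat too"

ranking = compare(target, reference)
for word, difference in ranking:
    print(f"{word:<10} {difference:+.1f}")
```

Words at the top of the ranking are relatively over-represented in the target text, which is the kind of output that might prompt a new query in the iterative process described above.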

For further reading on corpus linguistics see Stefanowitsch A. 2020. Corpus linguistics: A guide to the methodology. Berlin: Language Science Press. ISBN 978-3-96110-225-9, doi:10.5281/zenodo.3735822 (open access at https://langsci-press.org/catalog/book/148).

How can AntConc and corpus linguistics be combined to analyse catalogue data?

These training materials have been developed because the project team believe that corpus linguistic techniques can be usefully applied to analyse catalogue data, specifically what we call “curatorial voice”: the authorial voice of institutions produced by curatorial labour.

Having investigated this in our previous research, our current work seeks to investigate the broader applicability of our methods to the practice of cataloguers and those who maintain collection catalogues. For example, in a recent paper (Salway & Baker, 2020) we speculate that:

Given a set of guidelines for producing curatorial descriptions, corpus techniques could be used to check the extent to which guidelines are being followed at a macro-level, e.g. by identifying what aspects of objects tend to be referred to or not, and by gauging the overall extent of description versus interpretation/evaluation. Further, such analysis could form a basis for plans to edit and enhance a catalogue by providing areas to focus on and estimates of the person time required. It could also be that a corpus-based characterization of the language used in an exemplary catalogue could be used to develop or refine guidelines by identifying that catalogue’s distinctive linguistic features.

These training materials have been designed as a forum in which to test, refine and develop these ideas, and have been iterated in response to substantial input from the GLAM community.

Some examples of successful work in this area include:

Baker, James, and Andrew Salway. ‘Curatorial Labour, Voice and Legacy: Mary Dorothy George and the Catalogue of Political and Personal Satires, 1930–54’. Historical Research (2020).
Bowker, Lynne. ‘Corpus Linguistics Is Not Just for Linguists: Considering the Potential of Computer-Based Corpus Methods for Library and Information Science Research’. Library Hi Tech (2018).
Froehlich, Heather. ‘Distance-reading the feminine landscapes of The Awakening’. The CliC Dickens Blog (2018).
Tolonen, Mikko, Mark J. Hill, Ali Ijaz, Leo Lahti, and Ville Vaara. ‘Examining the Early Modern Canon: The English Short Title Catalogue and Large-Scale Patterns of Cultural Production’. Data Visualization in Enlightenment Literature and Culture (2020).

Key Points

AntConc is a tool for working with language corpora
