Computational Analysis of Catalogue Data

Key Points

Introduction to AntConc
  • AntConc is a tool for working with language corpora

Importing data into AntConc
  • Use the Open option to import data

  • You can import individual files or a folder

  • AntConc works only with plain text files, for example those with the file extension .txt

  • AntConc will not read common formats like .doc, .xls, or .pdf. You will need to convert these into .txt files to use AntConc.
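
Before importing a folder, it can help to check which files are already plain text. The following is a minimal Python sketch, not part of AntConc: the function name `split_by_plain_text` is hypothetical, and it assumes that a `.txt` extension reliably indicates a plain text file.

```python
from pathlib import Path

def split_by_plain_text(folder, extension=".txt"):
    """Return (ready, needs_conversion) lists of file names in `folder`.

    Assumption: a file counts as plain text if its extension matches
    `extension` (case-insensitive). File contents are not inspected.
    """
    ready, needs_conversion = [], []
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue  # skip sub-folders
        if path.suffix.lower() == extension:
            ready.append(path.name)
        else:
            needs_conversion.append(path.name)
    return ready, needs_conversion

# Usage (the folder name here is hypothetical):
# ready, needs_conversion = split_by_plain_text("my-corpus")
```

Files in the second list, such as .doc, .xls, or .pdf files, would need converting to .txt before AntConc can read them.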

Layout of AntConc
  • Many options for working with data in AntConc are accessed through a menu pane below the output window

Settings in AntConc
  • In AntConc, settings can be changed for individual tools and globally

  • Settings can fundamentally change your use of a tool

  • You can export and import settings

Word lists
  • Word lists are a way of getting an overview of the linguistic features of a corpus

  • AntConc provides a number of options for presenting word lists

  • When using AntConc to count things, we need to be mindful that machine readable strings are not the same as human readable words

  • Outputs from AntConc queries can be saved locally as text files
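
The gap between machine readable strings and human readable words can be illustrated outside AntConc. The Python sketch below builds a word list from a made-up sentence under two different token definitions. Both definitions are assumptions chosen for illustration and are not claimed to match AntConc's settings.

```python
import re
from collections import Counter

def word_list(text, token_pattern=r"[A-Za-z]+", lowercase=True):
    """Return (token, count) pairs, most frequent first, ties alphabetical.

    `token_pattern` decides what counts as a "word"; changing it
    changes the counts.
    """
    if lowercase:
        text = text.lower()
    tokens = re.findall(token_pattern, text)
    counts = Counter(tokens)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

# Made-up sample sentence, not taken from any corpus
sample = "The print-seller's print shows a Print of 1790."

# Tokens are runs of letters only: "print-seller's" splits into three tokens
letters_only = word_list(sample)

# Tokens may also contain hyphens and apostrophes: "print-seller's" stays whole
with_punct = word_list(sample, token_pattern=r"[A-Za-z'-]+")
```

Under the first definition "print" is counted three times, because "print-seller's" contributes one of them. Under the second it is counted twice. Neither definition treats "1790" as a token.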

Searching concordances
  • You can search a corpus in AntConc using free text and wildcards

  • Carefully changing the search settings enables you to build better queries

  • In addition to generating precise data, AntConc can be used to get to know a corpus and make rough suggestions as to its character
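
As a rough illustration of what a concordance search does, the Python sketch below returns each match with its left and right context (a keyword-in-context view). It assumes that `*` stands for zero or more word characters. AntConc's own wildcard settings may define this differently, and the function name `kwic` is hypothetical.

```python
import re

def kwic(text, query, width=20):
    """Return (left, match, right) tuples for each hit of `query` in `text`.

    Assumption: `*` in the query matches zero or more word characters.
    Matching is case-insensitive.
    """
    parts = [re.escape(part) for part in query.split("*")]
    pattern = r"\b" + r"\w*".join(parts) + r"\b"
    rows = []
    for m in re.finditer(pattern, text, flags=re.IGNORECASE):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        rows.append((left, m.group(0), right))
    return rows

# Made-up sample text, not taken from any corpus
sample = "A satire on printing. Two printers argue over a printed sheet."
hits = kwic(sample, "print*")
matched_words = [match for _, match, _ in hits]
```

Here the query `print*` matches "printing", "printers", and "printed". Narrowing or widening the pattern changes which strings count as hits, which is why search settings matter when building queries.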

Collocates
  • The collocates of a word are those words that tend to occur in proximity to that word more than they occur in proximity to all other words in the corpus
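
The definition above can be sketched in Python as a comparison between observed and expected co-occurrence. This is a simplified mutual-information-style score written for illustration. It is not claimed to be the statistic AntConc calculates, and the window size and the `collocates` function name are assumptions.

```python
import math
from collections import Counter

def collocates(tokens, node, span=2):
    """Score words that occur within `span` tokens of `node`.

    The score is log2(observed / expected), where `expected` is how often
    the word would fall inside the window if words were spread evenly
    through the corpus. Higher scores suggest stronger collocates.
    """
    totals = Counter(tokens)
    near = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        # words to the left and right of this occurrence of the node
        window = tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span]
        near.update(window)
    n = len(tokens)
    scores = {}
    for word, observed in near.items():
        expected = totals[node] * totals[word] * 2 * span / n
        scores[word] = math.log2(observed / expected)
    return sorted(scores.items(), key=lambda item: -item[1])

# Made-up sample text, not taken from any corpus
sample = ("a satirical print of a fat man and "
          "a satirical print of a thin man and a dog").split()
scores = dict(collocates(sample, "print"))
```

In this sample "a" appears next to "print" more often than "satirical" does, but "a" is common everywhere, so "satirical" receives the higher score.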

Next Steps 1: comparing corpora
  • If used creatively, AntConc tools offer a number of ways to compare corpora

  • Comparison to baseline corpora can provide greater confidence in findings
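
One simple way to compare a corpus with a baseline is to normalise counts by corpus size, so that corpora of different lengths can be compared directly. The Python sketch below is an illustration under that assumption. The function names are hypothetical and it does not reproduce any particular AntConc tool.

```python
from collections import Counter

def relative_frequencies(tokens, per=1000):
    """Return each word's frequency per `per` tokens."""
    total = len(tokens)
    if total == 0:
        return {}
    counts = Counter(tokens)
    return {word: count * per / total for word, count in counts.items()}

def compare(target_tokens, baseline_tokens, word, per=1000):
    """Return (target, baseline) normalised frequencies for `word`."""
    target = relative_frequencies(target_tokens, per).get(word, 0.0)
    baseline = relative_frequencies(baseline_tokens, per).get(word, 0.0)
    return target, baseline

# Made-up token lists standing in for two corpora of different sizes
target_corpus = ["print"] * 3 + ["of"] * 7
baseline_corpus = ["print"] * 1 + ["of"] * 19
result = compare(target_corpus, baseline_corpus, "print")
```

A word that is much more frequent in the target corpus than in the baseline, after normalisation, is a candidate for being characteristic of the target corpus rather than of the language in general.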

Next Steps 2: Named Entity Recognition
  • NER tools create tagged datasets

  • With some creativity, AntConc can be used to analyse tagged datasets

BM-MDG.zip: Word lists
  • Word lists are a way of getting an overview of the linguistic features of a corpus

  • AntConc provides a number of options for presenting word lists

  • When using AntConc to count things, we need to be mindful that machine readable strings are not the same as human readable words

  • Outputs from AntConc queries can be saved locally as text files

BM-MDG.zip: Searching concordances
  • You can search a corpus in AntConc using free text and wildcards

  • Carefully changing the search settings enables you to build better queries

  • In addition to generating precise data, AntConc can be used to get to know a corpus and make rough suggestions as to its character

BM-MDG.zip: Collocates
  • The collocates of a word are those words that tend to occur in proximity to that word more than they occur in proximity to all other words in the corpus

LWL-prints: Word lists
  • Word lists are a way of getting an overview of the linguistic features of a corpus

  • AntConc provides a number of options for presenting word lists

  • When using AntConc to count things, we need to be mindful that machine readable strings are not the same as human readable words

  • Outputs from AntConc queries can be saved locally as text files

LWL-prints: Searching concordances
  • You can search a corpus in AntConc using free text and wildcards

  • Carefully changing the search settings enables you to build better queries

  • In addition to generating precise data, AntConc can be used to get to know a corpus and make rough suggestions as to its character

LWL-prints: Collocates
  • The collocates of a word are those words that tend to occur in proximity to that word more than they occur in proximity to all other words in the corpus