BAWE – British Academic Written English corpus

The British Academic Written English (BAWE) corpus is a collection of academic texts written by students at universities in the UK. It represents a sample of British academic English with a fairly even distribution across disciplinary areas (Arts and Humanities, Social Sciences, Life Sciences and Physical Sciences) and levels of study (undergraduate and taught master's level).

The corpus consists of 2,761 pieces of proficient assessed student writing, ranging in length from 500 to 5,000 words, and contains 6.9 million words in total. It was prepared for Sketch Engine by Dr Paul Thompson and Dr Alois Heuboeck at the University of Reading.

Part-of-speech tagset and lemmatization

The BAWE corpus was part-of-speech tagged by Paul Rayson (Lancaster University) using the CLAWS version 7 tagset, which indicates the part of speech and grammatical category of each word; semantic categories were also annotated with the WMatrix tool. The texts are lemmatized as well, meaning that each word form in the corpus is assigned its base form (lemma).

Tools for working with the BAWE corpus

The complete set of Sketch Engine tools is available for the BAWE corpus and can be used to generate:

  • word sketch – English collocations categorized by grammatical relations
  • thesaurus – synonyms and similar words for every word
  • keywords – terminology extraction of one-word and multi-word units
  • word lists – lists of English nouns, verbs, adjectives etc. organized by frequency
  • n-grams – frequency lists of multi-word units
  • concordance – examples in context
  • text type analysis – statistics of metadata in the corpus

For more information about the BAWE corpus and how to cite it, please visit www.coventry.ac.uk/bawe.

A list of all corpus metadata is available in the document BAWE Corpus Holdings.

Search the BAWE corpus

Sketch Engine offers a range of tools to work with the BAWE corpus.

Concordance from the BAWE corpus

