Corpus Info (main statistics and info)

The British National Corpus (BNC) is a 100-million-word collection of samples of written and spoken British English from the later part of the 20th century. It consists of a larger written part (90%, e.g. newspapers, academic books, letters, essays) and a smaller spoken part (the remaining 10%, e.g. informal conversations, radio shows).

The official website: http://www.natcorp.ox.ac.uk/

Tags legend (see the whole tagset)


Special attributes

  • ambtag: the ambiguous part-of-speech tag (all candidate tags before disambiguation)
  • pos: a shortened form of the part of speech (only the second part of lempos)
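As an illustration of how the two special attributes relate to other values, the following is a minimal sketch, not Sketch Engine's own code. It assumes that a lempos value has the form "lemma-x" with the part-of-speech suffix after the last hyphen (e.g. "house-n"), and that an ambtag value joins its candidate tags with hyphens (e.g. "AJ0-NN1"). The function names are hypothetical.

```python
# Hypothetical helpers; the attribute formats below are assumptions:
#   lempos = "<lemma>-<pos>", pos suffix after the LAST hyphen ("well-being-n")
#   ambtag = candidate tags joined by "-" ("AJ0-NN1")


def _split_lempos(lempos: str) -> tuple[str, str]:
    """Split a lempos value into (lemma, pos) at the last hyphen."""
    lemma, sep, pos = lempos.rpartition("-")
    if not sep or not lemma or not pos:
        raise ValueError(f"not a lempos value: {lempos!r}")
    return lemma, pos


def pos_from_lempos(lempos: str) -> str:
    """Return the short part of speech, i.e. the second part of lempos."""
    return _split_lempos(lempos)[1]


def lemma_from_lempos(lempos: str) -> str:
    """Return the lemma, i.e. the first part of lempos."""
    return _split_lempos(lempos)[0]


def candidate_tags(ambtag: str) -> list[str]:
    """List the candidate tags of an (assumed) hyphen-joined ambiguity tag."""
    return ambtag.split("-")


if __name__ == "__main__":
    print(pos_from_lempos("house-n"))         # n
    print(lemma_from_lempos("well-being-n"))  # well-being
    print(candidate_tags("AJ0-NN1"))          # ['AJ0', 'NN1']
```

Splitting at the last hyphen rather than the first keeps hyphenated lemmas such as "well-being" intact.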

Changelog

v2.0 (8th November 2010)

  • replaced SGML entities (such as &quot;) with the corresponding Unicode characters
  • added <stext> tags (spoken texts)
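The entity replacement in v2.0 can be sketched roughly as below. This is an illustrative assumption, not the conversion script actually used: standard named entities are decoded with Python's `html.unescape`, and any corpus-specific entity names would go in a lookup table. The two entries in `EXTRA_ENTITIES` are hypothetical placeholders, not a verified list of BNC entities.

```python
import html
import re

# Hypothetical table for entity names that the standard HTML5 table
# may not know. These entries are placeholders only; a real conversion
# would take them from the corpus's own entity declarations.
EXTRA_ENTITIES = {
    "bquo": "\u201c",  # assumed: opening double quotation mark
    "equo": "\u201d",  # assumed: closing double quotation mark
}

# Matches a named entity reference such as "&eacute;".
_ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def replace_entities(text: str) -> str:
    """Replace SGML named entities with the corresponding Unicode characters."""

    def _custom(match: re.Match) -> str:
        # Use the extra table first; leave unknown names for html.unescape.
        return EXTRA_ENTITIES.get(match.group(1), match.group(0))

    text = _ENTITY_RE.sub(_custom, text)
    # Decode the standard named and numeric entities (&quot;, &pound;, &#233; ...).
    return html.unescape(text)


if __name__ == "__main__":
    print(replace_entities("caf&eacute; &quot;x&quot; &pound;5"))  # café "x" £5
```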

Bibliographic references

How to reference Sketch Engine

  • The British National Corpus, version 3 (BNC XML Edition). 2007. Distributed by Oxford University Computing Services on behalf of the BNC Consortium. URL: http://www.natcorp.ox.ac.uk/
  • Reference Guide for the British National Corpus (XML Edition) edited by Lou Burnard, February 2007. URL: http://www.natcorp.ox.ac.uk/XMLedition/URG/
  • The British National Corpus, version 2 (BNC World). 2001. Distributed by Oxford University Computing Services on behalf of the BNC Consortium. URL: http://www.natcorp.ox.ac.uk/
  • The British National Corpus Users Reference Guide edited by Lou Burnard, October 2000. URL: http://www.natcorp.ox.ac.uk/archive/index.xml
  • The BNC Baby, version 2. 2005. Distributed by Oxford University Computing Services on behalf of the BNC Consortium. URL: http://www.natcorp.ox.ac.uk/
  • The BNC Sampler, XML version. 2005. Distributed by Oxford University Computing Services on behalf of the BNC Consortium. URL: http://www.natcorp.ox.ac.uk/

Data from the BNC

Our policy is to request that citations from the British National Corpus should include the text identifier (a three-character code) and the sentence number. A suitable form of words for crediting the BNC would be:

  • “Examples of usage taken from the British National Corpus (BNC) were obtained under the terms of the BNC End User Licence. Copyright in the individual texts cited resides with the original IPR holders. For information and licensing conditions relating to the BNC, please see the web site at http://www.natcorp.ox.ac.uk”
  • or: “Data cited herein have been extracted from the British National Corpus, distributed by Oxford University Computing Services on behalf of the BNC Consortium. All rights in the texts cited are reserved.”