DOAJ corpora – Open Access Journals corpora

The Open Access Journals (OAJ) corpora are text corpora composed of journal articles covering all areas of science, technology, medicine, social science, and the humanities in dozens of languages.

The OAJ corpora contain rich metadata about journals, such as title, country, year of publication, etc. It is also possible to search by the keywords of articles.

Detailed information about Open Access Journals can be found on the original website, the Directory of Open Access Journals (DOAJ).

A list of OAJ corpora in Sketch Engine

  • Open Access Journals (English) – 2.6 billion words

More languages will be available soon.

Part-of-speech tagset

Each OAJ corpus is part-of-speech (POS) tagged with a tagset appropriate to its language.

Tools to work with the Open Access Journals corpus

A complete set of Sketch Engine tools is available to work with this OAJ corpus to generate:

  • word sketch – collocations categorized by grammatical relations
  • thesaurus – synonyms and similar words for every word
  • word lists – lists of nouns, verbs, adjectives etc. organized by frequency
  • n-grams – frequency list of multi-word units
  • concordance – examples in context
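To make the last three outputs more concrete, here is a minimal sketch of what a word list, an n-gram frequency list, and a concordance compute. It is an illustration only, not Sketch Engine's implementation or API: the function names, the naive tokenizer, and the sample sentence are all assumptions made for this example.

```python
from collections import Counter

PUNCT = ".,;:!?()\"'"

def tokenize(text):
    # Hypothetical, naive tokenizer: split on whitespace, strip punctuation, lowercase.
    stripped = (w.strip(PUNCT).lower() for w in text.split())
    return [w for w in stripped if w]

def word_list(tokens):
    # Words ordered by frequency, most frequent first.
    return Counter(tokens).most_common()

def ngrams(tokens, n=2):
    # Frequency list of all contiguous n-token sequences.
    grams = (tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return Counter(grams).most_common()

def concordance(tokens, keyword, window=3):
    # Keyword-in-context lines: (left context, keyword, right context).
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, tok, right))
    return lines

# Made-up sample text, not taken from the corpus.
sample = ("Open access journals publish open research. "
          "Open access improves access to research.")
tokens = tokenize(sample)

print(word_list(tokens)[:3])
print(ngrams(tokens, 2)[:1])
print(concordance(tokens, "journals", window=2))
```

On a real corpus these computations run over lemmatized, POS-tagged text at a scale of billions of words, which is why dedicated corpus tools are used rather than a script like this.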


Texts in DOAJ are published under Creative Commons (CC) licenses.

More information about the licensing can be found at

Search the Open Access Journals corpus

Sketch Engine offers a range of tools to work with the Open Access Journals corpus.


Other text corpora in Sketch Engine

Sketch Engine offers 350+ language corpora.

Use Sketch Engine in minutes

Generating collocations, frequency lists, examples in context, and n-grams, or extracting terms, is easy with Sketch Engine. Use our Quick Start Guide to learn it in minutes.