DOAJ corpora – Directory of Open Access Journals

The Directory of Open Access Journals (DOAJ) corpora are text corpora composed of journals covering all areas of science, technology, medicine, social science, and the humanities in dozens of languages.

The DOAJ corpora contain rich metadata about the journals, such as title, country, and year of publication. It is also possible to search by article keywords.

Detailed information about the Directory of Open Access Journals can be found on the original website.

A list of DOAJ corpora in Sketch Engine

  • Directory of Open Access Journals (English) – 2.6 billion words

More languages will be available soon.

Part-of-speech tagset

The DOAJ corpora are part-of-speech (POS) tagged. The tagset used depends on the language of the corpus.

Tools to work with the Directory of Open Access Journals corpus

A complete set of Sketch Engine tools is available to work with the Directory of Open Access Journals corpus. They can be used to generate:

  • word sketch – collocations categorized by grammatical relations
  • thesaurus – synonyms and similar words for every word
  • word lists – lists of nouns, verbs, adjectives, etc., organized by frequency
  • n-grams – frequency lists of multi-word units
  • concordance – examples in context
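
To make the n-gram item above concrete, here is a minimal Python sketch of how a frequency list of multi-word units can be computed from a tokenized text. It only illustrates the concept and is not how Sketch Engine implements the tool. The function name ngram_frequencies and the sample sentence are hypothetical, chosen for this example.

```python
from collections import Counter


def ngram_frequencies(tokens, n):
    """Count every run of n consecutive tokens (hypothetical helper)."""
    # Zip n shifted copies of the token list to form the n-grams.
    ngrams = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(gram) for gram in ngrams)


# Made-up sample text, split on whitespace for simplicity.
tokens = "open access journals publish open access articles".split()
bigram_counts = ngram_frequencies(tokens, 2)

# Print the bigrams from most to least frequent.
for ngram, freq in bigram_counts.most_common():
    print(ngram, freq)
```

In this sample, "open access" is the only bigram that occurs more than once, so it heads the list. A real corpus tool would also handle tokenization, case, and punctuation, which this sketch leaves out.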

Copyright

Texts in DOAJ are published under Creative Commons (CC) licenses.

More information about the licensing can be found at https://doaj.org/publishers#licensing

Search the Directory of Open Access Journals corpus

Sketch Engine offers a range of tools to work with the Directory of Open Access Journals corpus.

Other text corpora in Sketch Engine

Sketch Engine offers 350+ language corpora.

Use Sketch Engine in minutes

Generating collocations, frequency lists, examples in context, and n-grams, or extracting terms, is easy with Sketch Engine. Use our Quick Start Guide to learn it in minutes.