Corpus of Hebrew translation texts

The Hebrew translation corpus, also known as the Hebrew Comparable Corpus, is a language corpus made up of translated and non-translated Hebrew texts. Each of the two components contains about fifteen books (fiction and non-fiction). The components are matched for topic and genre: for example, there is one biography in each. The corpus is best suited to studying differences between translated and non-translated language, but it can also be used to study language use more generally.

The corpus was compiled as part of a project funded by the Israel Science Foundation and carried out in the Department of Translation and Interpreting Studies at Bar Ilan University.

Detailed information about the corpus

Part-of-speech tagset

The Hebrew translation corpus is POS-tagged. The part-of-speech tags are listed under the pos attribute in the corpus attribute overview below.

Tools to work with the Hebrew translation corpus

A complete set of Sketch Engine tools is available to work with the Hebrew translation corpus and generate:

  • word sketch – Hebrew collocations categorized by grammatical relations
  • thesaurus – synonyms and similar words for every word
  • word lists – lists of Hebrew nouns, verbs, adjectives etc. organized by frequency
  • n-grams – frequency list of multi-word units
  • concordance – examples in context
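
To make the n-grams item above more concrete, the following Python sketch counts contiguous n-grams in a list of tokens. It only illustrates the general idea and is not how Sketch Engine computes its n-gram lists; the function name `ngram_frequencies` and the toy token list are assumptions made for this example.

```python
from collections import Counter


def ngram_frequencies(tokens, n=2):
    """Count every contiguous run of n tokens (hypothetical helper)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # Zipping n shifted views of the token list yields each n-gram once.
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(grams)


# Toy input for illustration only, not taken from the corpus.
tokens = ["in", "the", "house", "in", "the", "garden"]
for gram, freq in ngram_frequencies(tokens, n=2).most_common(3):
    print(" ".join(gram), freq)
```

Sorting the counter by frequency, as `most_common` does, gives the kind of frequency list the n-grams tool presents, here for a six-token toy input rather than a full corpus.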

A corpus attribute overview

A list of positional attributes used in the corpus

Attribute | Name | Values
token | token |
transliteration (token) | trans |
lemma | lemma |
transliteration (lemma) | transl |
pos | tag | adjective, adverb, conjunction, copula, existential, foreign, interjection, interrogative, modal, negation, noun, numberExpression, numeral, participle, preposition, pronoun, properName, punctuation, quantifier, title, url, verb, wPrefix
pos-type | postype | amount, and, arithmetic-operation, bracket-end, bracket-start, colon, comma, coordinating, demonstrative, determiner, dot, exclamation-mark, gematria, hyphen, impersonal, literal-number, numeral-cardinal, numeral-fractional, numeral-ordinal, or, other, partitive, personal, proadverb, prodet, pronoun, question-mark, quote, reflexive, relativizing, semicolon, slash, subordinating, yesno
prefix string | prestring | ב, בכ, ו, וב, ובכ, וכ, וכש, וכשל, ול, ומ, ומכ, ומש, וש, ושב, ושל, ושמ, כ, ככ, כש, כשב, כשל, כשמ, ל, לכ, לכש, מ, מכ, מש, משב, משכ, משל, משמ, ש, שב, שכ, שכש, שכשמ, של, שמ, שמש
base string | basestring |
suffix string | sufstring | גם, ה, הם, הן, ו, י, ך, כם, כן, ם, ן, נו
gender | gender | feminine, masculine, masculine-and-feminine
number | number | dual, dual-and-plural, plural, singular, singular-and-plural
status | status | absolute, construct
polarity | polarity | negative, positive
person | person | 1, 2, 3, any
tense | tense | beinoni, future, imperative, infinitive, past
binyan | binyan | Hifil, Hitpael, Hufal, Nifal, Paal, Piel, Pual
prefix conjunction | prefconj | conjunction
prefix definite article | prefdefinite | definiteArticle
prefix interrogative | prefinterrog |
prefix preposition | prefprep | preposition
prefix subordination conjunction / relativizer | relativizer | relativizer/subordinatingConjunction
prefix temporal subordinating conjunction | preftemp | temporalSubConj
prefix adverb | prefadv | adverb
suffix function | suffunction | accusative-or-nominative, possessive, pronomial
suffix number | sufnum | feminine, masculine, masculine-and-feminine
suffix gender | sufgender | plural, singular
suffix person | sufper | 1, 2, 3
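
Positional attributes such as these are typically combined in a Corpus Query Language (CQL) token expression of the form `[attr="value" & attr="value"]`. The Python sketch below builds such an expression from attribute/value pairs and checks the values against a small excerpt of the attribute list. The helper name `cql_token` and the `KNOWN_VALUES` excerpt are assumptions made for illustration, and the resulting query has not been run against the corpus here.

```python
# Excerpt of the attribute list above (attribute name -> listed values).
# Attributes missing from this excerpt are passed through unchecked.
KNOWN_VALUES = {
    "tag": {"noun", "verb", "adjective", "adverb", "preposition"},
    "tense": {"beinoni", "future", "imperative", "infinitive", "past"},
    "binyan": {"Hifil", "Hitpael", "Hufal", "Nifal", "Paal", "Piel", "Pual"},
    "gender": {"feminine", "masculine", "masculine-and-feminine"},
}


def cql_token(**conditions):
    """Build a CQL token expression such as [tag="verb" & tense="past"]."""
    parts = []
    for attr, value in conditions.items():
        allowed = KNOWN_VALUES.get(attr)
        if allowed is not None and value not in allowed:
            raise ValueError(f"{value!r} is not a listed value of {attr!r}")
        parts.append(f'{attr}="{value}"')
    return "[" + " & ".join(parts) + "]"


# Past-tense verbs in the Piel binyan
print(cql_token(tag="verb", binyan="Piel", tense="past"))
```

The printed string can be pasted into the CQL field of the concordance search. Validating values before building the query catches typos such as `binyan="piel"`, which would otherwise simply return no hits.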

Search the Hebrew translation corpus

Sketch Engine offers a range of tools to work with the Hebrew translation corpus.


Other text corpora in Sketch Engine

Sketch Engine offers 350+ language corpora.

Use Sketch Engine in minutes

Generating collocations, frequency lists, examples in context and n-grams, or extracting terms, is easy with Sketch Engine. Use our Quick Start Guide to learn it in minutes.