Also referred to as the “Hebrew Comparable Corpus”; uploaded in 2010.
The corpus comprises two components: translated and non-translated texts in Hebrew. Each component contains about fifteen books (fiction and non-fiction). The two components are matched for topic and genre; for example, each contains one biography. The corpus is best suited to people who want to study differences between translated and non-translated language. It can also be used to study language use more generally.
The corpus was compiled as part of a project funded by the Israel Science Foundation and carried out in the Department of Translation and Interpreting Studies at Bar Ilan University.
See the part-of-speech tagset summary.