The project continuously processes 75,000 RSS feeds which bring between 100,000 and 150,000 articles every day.
The Timestamped JSI web corpus was tagged for parts of speech, and the time stamps were used to augment the corpus with diachronic annotation. Currently the corpus covers the time period of 2014 and 2016. By combining this data with other web corpora, a total period of between 2009 and 2015 can be covered. There are plans to receive regular daily updates from the Jozef Stefan Institute and to extend the corpus regularly with the latest data.
The diachronic annotation is particularly valuable in combination with the Sketch Engine trends feature. The trends feature analyses how the frequency of a word changes over time by comparing its frequency across a series of comparable time periods.
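The idea of comparing word frequency across time periods can be illustrated with a small Python sketch. This is not the Sketch Engine implementation: the function names `relative_frequencies` and `trend_slope`, the input format of `(period, tokens)` pairs, and the use of a simple least-squares slope are all assumptions chosen for illustration. Frequencies are normalised per million tokens so that periods of different size remain comparable.

```python
from collections import Counter


def relative_frequencies(docs, word):
    """Return {period: hits per million tokens} for `word`.

    `docs` is an iterable of (period, tokens) pairs; this input format
    is an assumption made for the sketch, not a Sketch Engine format.
    """
    hits = Counter()
    totals = Counter()
    target = word.lower()
    for period, tokens in docs:
        totals[period] += len(tokens)
        hits[period] += sum(1 for t in tokens if t.lower() == target)
    # Normalise so that periods with different amounts of text are comparable.
    return {
        p: 1_000_000 * hits[p] / totals[p]
        for p in sorted(totals)
        if totals[p] > 0
    }


def trend_slope(freqs):
    """Least-squares slope of frequency over the period index.

    A positive value suggests rising use, a negative value falling use.
    The choice of ordinary least squares is an illustrative assumption.
    """
    ys = [freqs[p] for p in sorted(freqs)]
    n = len(ys)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den


# Toy data: one tiny "document" per year.
docs = [
    ("2014", ["a", "b", "a", "c"]),
    ("2015", ["a", "a", "a", "c"]),
    ("2016", ["a", "a", "a", "a"]),
]
freqs = relative_frequencies(docs, "a")
slope = trend_slope(freqs)
```

In this toy data the word makes up a growing share of each year's tokens, so the slope is positive. On a real corpus the periods would be days, months or years derived from the article time stamps.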
The corpus is accessible to all users, including trial users.