The Chinese Wiki corpus is first segmented with the Stanford Word Segmenter and then tagged with the Stanford Tagger, using a model trained on a combination of Chinese Treebank texts from Chinese and Hong Kong sources.
The tags come from the LDC Chinese Treebank tag set.
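
As an illustration of working with segmented and tagged text, the following minimal sketch splits a tagged line into (word, tag) pairs and counts the tags. It assumes a whitespace-separated `word#TAG` layout; the `#` separator, the function names, and the sample sentence are assumptions made for this example and are not taken from the corpus documentation, so adjust `sep` to match the actual files.

```python
from collections import Counter


def parse_tagged_line(line, sep="#"):
    """Split one whitespace-separated tagged line into (word, tag) pairs.

    The separator is an assumption; pass a different `sep` if the data
    uses another convention such as "/" or "_".
    """
    pairs = []
    for token in line.split():
        # Split on the LAST separator so a word containing `sep` stays intact.
        word, found, tag = token.rpartition(sep)
        if found:
            pairs.append((word, tag))
        else:
            # No separator present: keep the token and leave the tag unknown.
            pairs.append((token, None))
    return pairs


def tag_counts(lines, sep="#"):
    """Count how often each tag occurs across an iterable of tagged lines."""
    counts = Counter()
    for line in lines:
        for _, tag in parse_tagged_line(line, sep):
            if tag is not None:
                counts[tag] += 1
    return counts


# Hypothetical sample sentence using Chinese Treebank tags (PN, VV, NR, PU).
sample = "我#PN 爱#VV 北京#NR 。#PU"
pairs = parse_tagged_line(sample)
counts = tag_counts([sample])
```

Splitting on the last separator is a deliberate choice here: it keeps a word intact even when the word itself contains the separator character.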
v1.0 (16 April 2012)
- initial version – 0.1 billion tokens