The Bengali web corpus was created with the Corpus Factory method. The corpus is encoded in UTF-8 and contains 11.7 million words.

This corpus uses shallow tagging (we do not have a language-specific tagger) and the Universal Sketch Grammar.

Tag legend:

(Shallow tagging)

frequent FREQ
word CONTENT
number CRD
punctuation PUN
other OTHER
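
When processing corpus output in a script, the legend above can be kept as a small lookup table. The following is only a minimal sketch: the names SHALLOW_TAGS and describe_tag are hypothetical and are not part of the corpus or of any tool's API. Only the tag/description pairs come from the legend itself.

```python
# Shallow tag legend, copied from the list above.
# SHALLOW_TAGS and describe_tag are hypothetical names used for illustration only.
SHALLOW_TAGS = {
    "FREQ": "frequent",
    "CONTENT": "word",
    "CRD": "number",
    "PUN": "punctuation",
    "OTHER": "other",
}


def describe_tag(tag: str) -> str:
    """Return the legend description for a shallow tag.

    Tags that are not in the legend are reported as "unknown tag"
    instead of raising an error.
    """
    return SHALLOW_TAGS.get(tag, "unknown tag")


if __name__ == "__main__":
    for tag in ("FREQ", "CRD", "PUN"):
        print(tag, "=", describe_tag(tag))
```

The table is keyed by tag rather than by description because the tag is what appears in the corpus data. How a token is assigned to a tag is not described here, so the sketch only covers the lookup.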