A corpus structure is a segment or part into which a corpus can be divided. Typically, a corpus is divided into sentences, paragraphs, and documents, but corpora can use various other structures depending on the type of corpus.
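As a rough illustration of the typical structures, the sketch below splits the text of one document into paragraphs and sentences. The function name `split_structures` is hypothetical, and the splitting rules (blank lines between paragraphs, sentence-final punctuation followed by whitespace) are simplifying assumptions; real corpus tools use more robust segmentation.

```python
import re

def split_structures(text):
    """Split one document's text into paragraphs, each a list of sentences.

    Assumptions: paragraphs are separated by blank lines, and a sentence
    ends at '.', '!' or '?' followed by whitespace.
    """
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    return [re.split(r"(?<=[.!?])\s+", p) for p in paragraphs]

doc = "Corpora are large. They are structured.\n\nThis is a second paragraph."
structure = split_structures(doc)
```

Here `structure` is a nested list: the outer list stands for the document, each inner list for a paragraph, and each string for a sentence.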
A corpus built from web data can contain dozens of unnecessary structures that are remnants of HTML markup. These can be removed when the corpus is compiled.
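The removal of leftover HTML structures might be sketched as follows. This assumes the corpus is stored one item per line, with structure boundaries written as tags on their own lines; the whitelist `KEEP` and the function `drop_html_structures` are hypothetical names chosen for the example and do not describe any particular tool's compilation step.

```python
import re

# Assumed whitelist of structures worth keeping; everything else is dropped.
KEEP = {"doc", "p", "s"}

# Matches a line that consists solely of an opening or closing tag.
TAG_RE = re.compile(r"^</?([A-Za-z][\w-]*)\b[^>]*>$")

def drop_html_structures(lines, keep=KEEP):
    """Return the lines with all structure tags not in `keep` removed."""
    out = []
    for line in lines:
        m = TAG_RE.match(line.strip())
        if m and m.group(1).lower() not in keep:
            continue  # HTML remnant such as <div> or <span>: skip it
        out.append(line)
    return out

raw = ["<doc>", '<div class="nav">', "<p>", "<s>", "Hello", "world",
       "</s>", "</p>", "</div>", "</doc>"]
cleaned = drop_html_structures(raw)
```

Only the tags are removed; the tokens inside a dropped structure are kept, so the text itself is unchanged.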