The Philippine web corpus consists of web texts in six dialects spoken in the Philippines. It was collected in June 2016 using WebBootCaT, with seed URLs and word bigrams from An Crúbadán for the respective dialects. The HTML data was cleaned with jusText, passed through an English language filter, and deduplicated with Onion.
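
The deduplication step can be illustrated with a minimal sketch. This is not Onion's implementation or its command-line interface; it only shows the general idea of dropping a paragraph when too many of its word n-grams have already been seen. The function names `ngrams` and `deduplicate`, the n-gram length, and the threshold are all assumptions chosen for illustration, and the sample sentences are made up.

```python
from typing import Iterable, List, Set, Tuple


def ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """Return all contiguous word n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def deduplicate(paragraphs: Iterable[str], n: int = 3,
                threshold: float = 0.5) -> List[str]:
    """Keep paragraphs whose share of already-seen n-grams is at most `threshold`.

    Hypothetical helper: `n` and `threshold` are illustrative values,
    not settings taken from the corpus build.
    """
    seen: Set[Tuple[str, ...]] = set()
    kept: List[str] = []
    for paragraph in paragraphs:
        grams = ngrams(paragraph.lower().split(), n)
        if not grams:
            # Fewer than n tokens: too short to judge, so keep it.
            kept.append(paragraph)
            continue
        duplicated = sum(1 for g in grams if g in seen) / len(grams)
        if duplicated <= threshold:
            kept.append(paragraph)
        # Record the n-grams whether or not the paragraph was kept.
        seen.update(grams)
    return kept


if __name__ == "__main__":
    sample = [
        "ang bata ay naglalaro sa labas ng bahay",
        "ang bata ay naglalaro sa labas ng bahay",
        "maayong buntag sa inyong tanan",
    ]
    for line in deduplicate(sample):
        print(line)
```

Working at the n-gram level rather than comparing whole paragraphs lets the filter catch near-duplicates, such as boilerplate repeated with small changes across pages.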