As a rule of thumb, use the default settings and do not worry about the advanced settings. Look into the advanced settings only if the defaults do not produce the results you need.
Max tuples (100) is the number of queries (seed word combinations) to be sent to the search engine. Use these maximum values to get as much data as possible.
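To illustrate what a tuple is, the following minimal Python sketch builds queries from seed words. It is not Sketch Engine's actual implementation: the function name `build_tuples`, the tuple size of 3 and the random sampling are all assumptions made for the example.

```python
import itertools
import random


def build_tuples(seeds, tuple_size=3, max_tuples=100, rng_seed=0):
    """Return up to max_tuples seed word combinations (hypothetical helper).

    tuple_size=3 and random sampling are assumptions for illustration only.
    """
    unique_seeds = sorted(set(seeds))
    all_combos = list(itertools.combinations(unique_seeds, tuple_size))
    if len(all_combos) <= max_tuples:
        # Fewer combinations exist than the limit, so use all of them.
        return all_combos
    # Otherwise pick a reproducible random subset of the combinations.
    return random.Random(rng_seed).sample(all_combos, max_tuples)


seeds = ["climate", "emissions", "carbon", "green", "renewable",
         "solar", "wind", "recycling", "pollution", "biodiversity"]
queries = [" ".join(t) for t in build_tuples(seeds)]
```

With ten seed words there are 120 possible three-word combinations, so the limit of 100 applies; with only five seed words there are just ten combinations, and raising the limit would not produce more queries.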
A high number of URLs per query can result in a bigger but less relevant corpus, because links found on the second, third and subsequent pages of the search engine results will also be included.
Finally, you can repeat the same procedure several times to enlarge the corpus. Sketch Engine will make sure no page, text or part of text is included twice (deduplication).
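The idea of deduplication across repeated runs can be sketched as follows. This is only an illustration under simplifying assumptions (exact matching of paragraphs after normalising case and whitespace); how Sketch Engine actually detects duplicates is not described here, and the names `fingerprint` and `add_documents` are hypothetical.

```python
import hashlib


def fingerprint(paragraph):
    """Hash a paragraph after normalising case and whitespace."""
    normalized = " ".join(paragraph.lower().split())
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


def add_documents(seen, documents):
    """Keep only paragraphs not seen in this or any earlier run.

    `seen` is a set of fingerprints shared across runs; it is updated in place.
    Documents left with no new paragraphs are dropped entirely.
    """
    kept = []
    for doc in documents:
        fresh = []
        for para in doc.split("\n\n"):
            if not para.strip():
                continue
            fp = fingerprint(para)
            if fp not in seen:
                seen.add(fp)
                fresh.append(para.strip())
        if fresh:
            kept.append("\n\n".join(fresh))
    return kept


seen = set()
run1 = add_documents(seen, ["Wind power is growing.\n\nSolar is cheap."])
run2 = add_documents(seen, ["wind  power is growing.\n\nTidal energy is new."])
run3 = add_documents(seen, ["Solar is cheap."])
```

Here the second run keeps only the new paragraph, and the third run adds nothing because its only paragraph is already in the corpus.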
Whitelist keywords can help resolve ambiguity in the seed words: you can make some of the unambiguous seed words compulsory to make sure each document matches the topic.
Blacklist keywords can also be used to reduce ambiguity (e.g. you might blacklist “party” when collecting a corpus on the environment using seeds which include “green”, so that documents about the Green Party are excluded). The whitelist and blacklist are only necessary if you are getting irrelevant documents.
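The effect of the two lists can be sketched in Python. The rule used here (keep a document if it contains at least one whitelist keyword and no blacklist keyword) is an assumption chosen for simplicity, and `keep_document` is a hypothetical helper; the actual thresholds are set in the advanced settings.

```python
import re


def tokenize(text):
    """Return the set of lowercased word tokens in the text."""
    return set(re.findall(r"\w+", text.lower()))


def keep_document(text, whitelist=(), blacklist=()):
    """Decide whether a document stays in the corpus (simplified rule).

    Assumption: at least one whitelist keyword must occur (if a whitelist
    is given) and no blacklist keyword may occur.
    """
    words = tokenize(text)
    if whitelist and not any(w.lower() in words for w in whitelist):
        return False
    if any(w.lower() in words for w in blacklist):
        return False
    return True


docs = [
    "The Green Party won three seats in the election.",
    "Green energy reduces carbon emissions.",
    "Green paint dries quickly.",
]
kept = [d for d in docs
        if keep_document(d, whitelist=["emissions", "climate"],
                         blacklist=["party"])]
```

In this example only the second document is kept: the first contains the blacklisted word “party”, and the third contains none of the whitelist keywords.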