Regular expressions are special characters that stand in for unspecified letters and numbers in a search. They find different words (or strings of characters) that share a common pattern, e.g. words that start the same way, end the same way or contain certain characters.

This page gives only a few basic examples. For more, please refer to Wikipedia, try our regular expressions exercises or take this interactive course.

Wild cards
(in simple concordance search only)

Wild cards are not regular expressions, but users might be familiar with them from other software. Wild cards are allowed only in a simple concordance search. Only the asterisk (*) and question mark (?) can be used, as follows:

asterisk (*) stands for zero or more characters
test* will find
test, tests, tested, testing

c*t will find
CT, cut, cat, craft, construct

question mark (?) stands for exactly 1 character
test? will find
tests, Testa, testy
but will not find
test, testing

c?t will find these lemmas
cat, cut, cot
Note that simple search always treats each search word as a lemma, so c?t searches for the lemmas cut, cat and cot. These lemmas produce results which include all their word forms. The final concordance will thus show: cut, cutting, cat, cats, cot, cots, etc.

To search for a literal asterisk or question mark, use \* and \?
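
The relationship between the two wild cards and the regular expressions described below can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names wildcard_to_regex and matches are hypothetical, Python's re module stands in for whatever engine the search actually uses, and the lemma behaviour of simple search is ignored.

```python
import re

def wildcard_to_regex(pattern: str) -> str:
    """Translate a wild card pattern into a regular expression.

    Hypothetical helper, for illustration only.
    """
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            # \* and \? stand for a literal asterisk or question mark
            out.append(re.escape(pattern[i + 1]))
            i += 2
            continue
        if ch == "*":
            out.append(".*")  # zero or more characters
        elif ch == "?":
            out.append(".")   # exactly one character
        else:
            out.append(re.escape(ch))
        i += 1
    return "".join(out)

def matches(pattern: str, word: str) -> bool:
    # fullmatch: the whole word has to fit the pattern
    return re.fullmatch(wildcard_to_regex(pattern), word) is not None

print(wildcard_to_regex("test*"))    # test.*
print(wildcard_to_regex("c?t"))      # c.t
print(matches("c*t", "construct"))   # True
print(matches("test?", "test"))      # False
print(matches(r"what\?", "what?"))   # True
```

The wild card * corresponds to the regular expression .* and the wild card ? to a single dot, which is why the same symbols behave differently in the two kinds of search.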

Regular expressions

With all the other concordance searches and with wordlists, regular expressions are used in their standard form.

dot ‘ . ‘

A dot stands for a single unspecified character.

Video manual

This video introduces searching words with an apostrophe and using operators * and ?.

Note: to find “cannot”, type “can not”, because text pre-processing separates “not” into its own token (the video does not cover this search completely).

search query    possible result(s)
w.n             win won wen wun wan
ca.             car cap cab can
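
The dot can be tried out with a small Python sketch. The word list and the helper name find are made up for illustration, and Python's re module is assumed here only as a stand-in for the search engine; re.fullmatch is used so that the pattern has to cover the whole word.

```python
import re

# Made-up word list for illustration
words = ["win", "won", "wen", "wn", "when", "car", "cap", "cart"]

def find(pattern, candidates):
    # Keep only the words that the pattern covers completely
    return [w for w in candidates if re.fullmatch(pattern, w)]

print(find("w.n", words))  # ['win', 'won', 'wen']
print(find("ca.", words))  # ['car', 'cap']
```

Note that "wn" and "when" are not found by w.n: the dot stands for exactly one character, not zero and not two.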

asterisk ‘ * ‘

The asterisk stands for zero or more occurrences of the preceding character.

search query    possible result(s)
co*l            CL col cool coool cooool
hallo*          hall hallo halloo hallooo halloooo
c.*ing          cycling camping cutting cooking contemplating
                (any number of unspecified characters between c and ing)
*ool            produces an error: no character precedes the asterisk
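
A minimal Python sketch of the asterisk, including the error case. Python's re module is an assumption used for illustration; the exact error message of the search interface may differ. The helper name whole_word is hypothetical.

```python
import re

def whole_word(pattern, word):
    # True only if the pattern covers the entire word
    return re.fullmatch(pattern, word) is not None

print([w for w in ["cl", "col", "cool", "cold"] if whole_word("co*l", w)])
# ['cl', 'col', 'cool']

print([w for w in ["cycling", "camping", "cooked"] if whole_word("c.*ing", w)])
# ['cycling', 'camping']

# An asterisk with nothing in front of it is not a valid pattern
try:
    re.compile("*ool")
except re.error as exc:
    print("invalid pattern:", exc)
```

The combination .* is the regular expression counterpart of the wild card *: the dot supplies "any character" and the asterisk repeats it.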

question mark ‘ ? ‘

The question mark stands for zero or one occurrence of the preceding character.

search query    possible result(s)
be?t            bt bet
                (but will not find beet beeet beeeet)
bet?            be bet
                (but will not find bets betting)
.?at            at hat bat cat mat
                (zero or one unspecified character at the beginning)
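
The optional character can be checked with a short Python sketch. The candidate list and the helper name hits are invented for illustration, and Python's re module is assumed as a stand-in for the search engine.

```python
import re

# Invented candidates for illustration
candidates = ["bt", "bet", "beet", "be", "bets", "at", "hat", "chat"]

def hits(pattern):
    # Words from the list that the pattern covers completely
    return [w for w in candidates if re.fullmatch(pattern, w)]

print(hits("be?t"))  # ['bt', 'bet']
print(hits("bet?"))  # ['bet', 'be']
print(hits(".?at"))  # ['at', 'hat']
```

In each pattern the question mark affects only the single character directly before it, so "beet" and "chat" are not found.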

digits ‘ \d ‘

\d stands for a digit, i.e. characters 0-9

search query    possible result(s)
b\d             b1 b2 b3 b4
b\d*            b b1 b12 b89 b43958
                (zero or more digits after b)
\d\db           58b 46b 89b
                (b preceded by two digits)
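
The digit examples can be reproduced in Python as a rough sketch. The token list and the helper name hits are made up, and Python's re module is assumed for illustration only. The patterns are written as raw strings (r"...") so that Python passes the backslash through unchanged.

```python
import re

# Made-up tokens for illustration
tokens = ["b", "b1", "b12", "b43958", "bx", "58b", "7b", "b1a"]

def hits(pattern):
    # Tokens that the pattern covers completely
    return [t for t in tokens if re.fullmatch(pattern, t)]

print(hits(r"b\d"))    # ['b1']
print(hits(r"b\d*"))   # ['b', 'b1', 'b12', 'b43958']
print(hits(r"\d\db"))  # ['58b']
```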

range ‘ [ ] ‘

Use square brackets to specify a list or range:
[bmpg] stands for b OR m OR p OR g
[a-d] stands for a letter between a and d
[3-5] stands for a digit between 3 and 5

search query    possible result(s)
[mpgb]et        met pet get bet
m[2-5]          m2 m3 m4 m5
m[2-5]*         m m22 m52 m3425 m23453234 m222345
                (m followed by zero or more digits between 2 and 5)
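
Lists and ranges in square brackets can be sketched in Python as follows. The token list and the helper name hits are invented, and Python's re module is an assumption used only to illustrate the behaviour.

```python
import re

# Invented tokens for illustration
tokens = ["met", "pet", "get", "bet", "set", "m2", "m5", "m6", "m", "m3425", "m1"]

def hits(pattern):
    # Tokens that the pattern covers completely
    return [t for t in tokens if re.fullmatch(pattern, t)]

print(hits("[mpgb]et"))  # ['met', 'pet', 'get', 'bet']
print(hits("m[2-5]"))    # ['m2', 'm5']
print(hits("m[2-5]*"))   # ['m2', 'm5', 'm', 'm3425']
```

The range 2-5 includes both ends, so m2 and m5 are found while m1 and m6 are not.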

not ‘ ^ ‘

Use ^ to indicate that the character(s) should not be included. The characters have to be enclosed in square brackets, with ^ placed directly after the opening bracket.

search query    possible result(s)
[^m]et          pet get bet let
                (but will not find met)
[^mpg]et        set let
                (but will not find met pet get)
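
A short Python sketch of the negated list. The token list and the helper name hits are made up, and Python's re module is assumed for illustration only.

```python
import re

# Made-up tokens for illustration
tokens = ["met", "pet", "get", "bet", "let", "set", "et"]

def hits(pattern):
    # Tokens that the pattern covers completely
    return [t for t in tokens if re.fullmatch(pattern, t)]

print(hits("[^m]et"))    # ['pet', 'get', 'bet', 'let', 'set']
print(hits("[^mpg]et"))  # ['bet', 'let', 'set']
```

The bare "et" is not found by either pattern: a negated list still stands for exactly one character, just not one of those listed.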

or ‘ | ‘

The pipe | is used to indicate OR.

search query    possible result(s)
get|met         will find lines which contain the word get OR the word met
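
The effect of get|met on whole lines can be sketched in Python. The sample lines and the helper name lines_with are invented, splitting on spaces is a simplification of real tokenization, and Python's re module is assumed for illustration only.

```python
import re

# Invented sample lines for illustration
lines = [
    "I will get the book",
    "we met last year",
    "a quiet forget-me-not",
]

def lines_with(pattern, text_lines):
    # Keep a line if any space-separated word fits the whole pattern
    return [ln for ln in text_lines
            if any(re.fullmatch(pattern, w) for w in ln.split())]

print(lines_with("get|met", lines))
# ['I will get the book', 'we met last year']

# For single-character alternatives, a list in square brackets is equivalent
print(lines_with("[gm]et", lines) == lines_with("get|met", lines))  # True
```

The third line is not returned because "forget-me-not" merely contains the letters get; the pattern has to cover a whole word.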