What is a Regular Expression (Regex)? A Plain-English Guide
A regular expression is a pattern that describes a set of strings. Learn what regex is, how to read and write patterns, what the symbols mean, and when to use regular expressions in your code.
The simplest explanation
A regular expression (regex or regexp) is a search pattern written in a compact notation. Instead of searching for a fixed string like hello, you write a pattern that describes a shape of text — and the regex engine finds everything that fits that shape.
For example, instead of searching for one specific phone number, you can write a pattern that matches any phone number. Instead of checking for one email address, you write a pattern that validates the structure of any email.
A regex that matches any 4-digit year:
\d{4}
Matches: 2024, 1999, 0001 — any four consecutive digits
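As a minimal sketch, here is the same pattern run through Python's standard `re` module. The sample sentences are made up for illustration. The second call shows a caveat: the pattern matches any run of four digits, including the first four digits of a longer number.

```python
import re

# The pattern from above: exactly four digits in a row.
year_pattern = re.compile(r"\d{4}")

# Illustrative sample text (not from any real data set).
text = "Released in 1999, remastered in 2024."

# findall returns every non-overlapping match, left to right.
years = year_pattern.findall(text)
print(years)  # ['1999', '2024']

# Caveat: the pattern also matches inside longer digit runs.
inside_longer_number = year_pattern.findall("Order 12345")
print(inside_longer_number)  # ['1234']
```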
Where regex is used
- Form validation — checking that an email address, phone number, or postcode has the right format
- Search and replace — in code editors (VS Code, Sublime), find all occurrences matching a pattern and replace them at once
- Log parsing — extracting IP addresses, timestamps, or error codes from server logs
- Data extraction (scraping) — pulling structured data from unstructured text
- URL routing — web frameworks use regex to match URL patterns to handler functions
- Command line — grep, sed, and awk all use regex
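Two of the uses above, log parsing and search and replace, can be sketched in a few lines of Python. The log line and its layout are hypothetical, invented for this example, and the IP pattern checks only the shape of an address, not whether each number is in range.

```python
import re

# Hypothetical log line; the layout is an assumption for this example.
log_line = "2024-03-15 10:42:07 ERROR 192.168.0.12 E1042 Connection refused"

# Timestamp shape: YYYY-MM-DD HH:MM:SS
timestamp = re.search(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}", log_line)

# IPv4-like shape: four groups of 1-3 digits separated by literal dots.
# Shape only: this would also accept 999.999.999.999.
ip_shape = r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}"
ip_address = re.search(ip_shape, log_line)

print(timestamp.group())   # 2024-03-15 10:42:07
print(ip_address.group())  # 192.168.0.12

# Search and replace: mask every IP-shaped token in the line.
masked = re.sub(ip_shape, "[ip]", log_line)
print(masked)  # 2024-03-15 10:42:07 ERROR [ip] E1042 Connection refused
```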
The key symbols — what they mean
| Symbol | Means | Example | Matches |
|---|---|---|---|
| . | Any single character except newline | c.t | cat, cut, c3t |
| * | 0 or more of the previous | ab*c | ac, abc, abbc |
| + | 1 or more of the previous | ab+c | abc, abbc (not ac) |
| ? | 0 or 1 of the previous (optional) | colou?r | color, colour |
| ^ | Start of string | ^Hello | Hello world (not say Hello) |
| $ | End of string | world$ | hello world (not worldwide) |
| \d | Any digit (0–9) | \d{4} | 2024, 9999 |
| \w | Any word character (letter, digit, _) | \w+ | hello, user_42 |
| \s | Any whitespace (space, tab, newline) | hello\sworld | hello world |
| [abc] | Any one of a, b, or c | [aeiou] | a, e, i, o, u |
| [^abc] | Anything except a, b, or c | [^0-9] | any non-digit |
| (abc) | Capture group — remember this match | (\d+)-(\d+) | 123-456 |
| a\|b | a or b (alternation) | cat\|dog | cat, dog |
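One way to check your understanding of the table is to try each symbol against a string that should match and one that should not. The sketch below does this in Python with `re.search`, which looks for the pattern anywhere in the text. The non-matching samples that do not appear in the table (such as "ct" and "bird") are added here for illustration.

```python
import re

# Each tuple: (pattern, text that should match, text that should not).
cases = [
    (r"c.t", "cut", "ct"),
    (r"ab*c", "ac", "abd"),
    (r"ab+c", "abbc", "ac"),
    (r"colou?r", "colour", "colouur"),
    (r"^Hello", "Hello world", "say Hello"),
    (r"world$", "hello world", "worldwide"),
    (r"[^0-9]", "x", "7"),
    (r"cat|dog", "dog", "bird"),
]

for pattern, should_match, should_not_match in cases:
    assert re.search(pattern, should_match) is not None
    assert re.search(pattern, should_not_match) is None

# Capture groups remember the parts of the match inside parentheses.
match = re.search(r"(\d+)-(\d+)", "123-456")
print(match.group(1), match.group(2))  # 123 456
```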
Reading a real regex
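Here is a commonly used regex that checks the basic structure of an email address, broken down piece by piece. It is a simplified check, not a full implementation of the email address standard: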
[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}
[a-zA-Z0-9._%+-]+ — one or more letters, digits, dots, underscores, %, +, or -
@ — literal @ symbol
[a-zA-Z0-9.-]+ — the domain name: one or more letters, digits, dots, or hyphens
\. — literal dot (backslash escapes the special meaning of .)
[a-zA-Z]{2,} — the top-level domain: 2 or more letters, such as com or org (in example.co.uk this part matches uk, and co is absorbed by the domain part)
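The breakdown can be tried out with a short Python sketch. The capture groups and the helper name `looks_like_email` are additions for this example, not part of the pattern itself. The sketch uses `re.fullmatch` so that the whole string must fit the pattern; a plain search would also accept an address buried inside other text.

```python
import re

# The pattern from above, split into its parts, with one capture group
# per part so the pieces can be inspected (groups added for this sketch).
email_pattern = re.compile(
    r"([a-zA-Z0-9._%+-]+)"   # local part
    r"@"                     # literal @
    r"([a-zA-Z0-9.-]+)"      # domain name
    r"\."                    # literal dot
    r"([a-zA-Z]{2,})"        # top-level domain
)

def looks_like_email(text: str) -> bool:
    """Return True if the whole string fits the pattern (shape check only)."""
    return email_pattern.fullmatch(text) is not None

print(looks_like_email("jane.doe@example.com"))  # True
print(looks_like_email("jane.doe@example"))      # False
print(looks_like_email("not an email"))          # False

# For a multi-part domain, the greedy domain group takes "example.co"
# and the final group takes "uk".
parts = email_pattern.fullmatch("user@example.co.uk").groups()
print(parts)  # ('user', 'example.co', 'uk')
```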
Flags — modifying how matching works
- g (global) — find all matches, not just the first one
- i (case-insensitive) — treat uppercase and lowercase as equal
- m (multiline) — make ^ and $ match the start/end of each line, not just the whole string
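The letters g, i, and m are the flag names used in JavaScript-style regex literals. The sketch below shows the same three behaviours in Python, where they are spelled differently: `re.IGNORECASE` and `re.MULTILINE` correspond to i and m, and there is no g flag, because `re.search` returns the first match while `re.findall` returns all of them. The sample text is made up for illustration.

```python
import re

# Illustrative three-line text.
text = "Error: disk full\nerror: retry failed\nok"

# First match only versus all matches (the role of the g flag).
first = re.search(r"error", text, re.IGNORECASE).group()
all_matches = re.findall(r"error", text, re.IGNORECASE)
print(first)        # Error
print(all_matches)  # ['Error', 'error']

# Without the case-insensitive flag, only the lowercase word matches.
case_sensitive = re.findall(r"error", text)
print(case_sensitive)  # ['error']

# Without the multiline flag, ^ matches only at the start of the string.
start_of_string = re.findall(r"^\w+", text)
print(start_of_string)  # ['Error']

# With it, ^ matches at the start of every line.
start_of_lines = re.findall(r"^\w+", text, re.MULTILINE)
print(start_of_lines)  # ['Error', 'error', 'ok']
```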