
What is a Regular Expression (Regex)? A Plain-English Guide

A regular expression is a pattern that describes a set of strings. Learn what regex is, how to read and write patterns, what the symbols mean, and when to use regular expressions in your code.

The simplest explanation

A regular expression (regex or regexp) is a search pattern written in a compact notation. Instead of searching for a fixed string like hello, you write a pattern that describes a shape of text — and the regex engine finds everything that fits that shape.

For example, instead of searching for one specific phone number, you can write a pattern that matches any phone number written in a given format. Instead of checking for one email address, you can write a pattern that checks whether text has the general structure of an email address.

A regex that matches any 4-digit year:
\d{4}
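Matches: 2024, 1999, 0001, or any other run of four consecutive digits. The pattern checks shape, not meaning, so it also matches the first four digits of a longer number such as 123456.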
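
A minimal sketch in Python's built-in re module shows this behavior; the sample sentence is invented for illustration. findall returns every non-overlapping match, so the last result is taken from inside a longer number:

```python
import re

# \d{4}: a digit, exactly four times in a row (shape only, no notion of "year").
YEAR_LIKE = re.compile(r"\d{4}")

# Invented sample text for illustration.
text = "Founded 1999, relaunched 2024, build 0001, serial 123456"

# findall scans left to right and returns every non-overlapping match.
matches = YEAR_LIKE.findall(text)
print(matches)  # ['1999', '2024', '0001', '1234']
```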

Where regex is used

  • Form validation: checking that an email address, phone number, or postcode has the right format
  • Search and replace: in code editors (VS Code, Sublime), finding every occurrence that matches a pattern and replacing them all at once
  • Log parsing: extracting IP addresses, timestamps, or error codes from server logs
  • Data extraction (scraping): pulling structured data out of unstructured text
  • URL routing: many web frameworks match incoming URLs against regex (or regex-like) patterns to pick the handler function
  • Command line: grep, sed, and awk all use regex
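
As a sketch of the log-parsing and search-and-replace uses, here is Python's re module applied to one invented log line. The IPv4 and date patterns are simplified assumptions that check shape only (they would also accept 999.999.999.999 or 2024-13-45):

```python
import re

# Invented log line, used only for illustration.
log_line = "2024-05-01 12:34:56 ERROR 500 from 192.168.1.20"

# Simplified IPv4 shape: four groups of 1 to 3 digits separated by literal dots.
# It does not check that each number is 255 or less.
IPV4_SHAPE = re.compile(r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}")

# Date in YYYY-MM-DD form (shape only, not a calendar check).
DATE_SHAPE = re.compile(r"\d{4}-\d{2}-\d{2}")

# Extraction: search returns the first match, group() returns its text.
ip = IPV4_SHAPE.search(log_line).group()
date = DATE_SHAPE.search(log_line).group()

# Search and replace: mask every IPv4-shaped address in the line.
masked = IPV4_SHAPE.sub("x.x.x.x", log_line)

print(ip, date)
print(masked)
```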

The key symbols — what they mean

Symbol  Means                                     Example         Matches
.       Any single character except newline       c.t             cat, cut, c3t
*       0 or more of the previous                 ab*c            ac, abc, abbc
+       1 or more of the previous                 ab+c            abc, abbc (not ac)
?       0 or 1 of the previous (optional)         colou?r         color, colour
{n,m}   n to m of the previous ({n}: exactly n)   \d{2,3}         42, 123 (not 7)
^       Start of string                           ^Hello          Hello world (not say Hello)
$       End of string                             world$          hello world (not worldwide)
\d      Any digit (0–9)                           \d{4}           2024, 9999
\w      Any word character (letter, digit, _)     \w+             hello, user_42
\s      Any whitespace (space, tab, newline)      hello\sworld    hello world
[abc]   Any one of a, b, or c                     [aeiou]         a, e, i, o, u
[^abc]  Any one character except a, b, or c       [^0-9]          any non-digit
(abc)   Capture group: remember this match        (\d+)-(\d+)     123-456 (captures 123 and 456)
a|b     a or b (alternation)                      cat|dog         cat, dog
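
A few rows of the table can be checked with Python's re module; this is a minimal sketch. re.search looks for the pattern anywhere in the string, so only ^ and $ tie it to the start or end. The test strings come from the Matches column where possible:

```python
import re

# (pattern, test string, should it match?) for several table rows.
examples = [
    (r"c.t", "cut", True),
    (r"ab*c", "ac", True),
    (r"ab+c", "ac", False),
    (r"colou?r", "color", True),
    (r"^Hello", "say Hello", False),
    (r"world$", "worldwide", False),
    (r"[^0-9]", "7", False),
    (r"cat|dog", "dog", True),
]

# True for each row whose actual result agrees with the expectation.
results = [bool(re.search(p, s)) == expected for p, s, expected in examples]
print(all(results))

# Capture groups: each pair of parentheses remembers what it matched.
m = re.search(r"(\d+)-(\d+)", "123-456")
print(m.group(1), m.group(2))
```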

Reading a real regex

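Here is a common, simplified pattern for email addresses, broken down piece by piece:

[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}
  • [a-zA-Z0-9._%+-]+ matches the part before the @: one or more letters, digits, dots, underscores, %, +, or -
  • @ matches a literal @ symbol
  • [a-zA-Z0-9.-]+ matches the domain name: one or more letters, digits, dots, or hyphens
  • \. matches a literal dot (the backslash escapes the special meaning of .)
  • [a-zA-Z]{2,} matches the top-level domain: 2 or more letters, such as com or uk (in example.co.uk, the domain piece matches example.co and this piece matches uk)

Inside square brackets, . is already a literal dot and a - placed last is a literal hyphen, so neither needs a backslash there. Without anchors, the pattern finds an email-shaped substring anywhere in the text; to validate a whole input, anchor it as ^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$. Even then it is an approximation: it accepts some invalid addresses (such as a..b@example.com) and rejects some unusual valid ones.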
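
A minimal sketch in Python's re module of the difference between finding and validating: re.search pulls an email-shaped substring out of longer text, while re.fullmatch only succeeds if the entire string fits (similar to anchoring with ^ and $). The addresses are invented examples, and the pattern remains an approximation of real email syntax:

```python
import re

# The simplified email pattern (not a full RFC 5322 validator).
EMAIL = r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"

# search finds the pattern anywhere, so it can extract an address from text.
found = re.search(EMAIL, "Contact: jane.doe@example.co.uk (office)").group()

# fullmatch requires the whole string to fit, which is what validation needs.
valid = re.fullmatch(EMAIL, "jane.doe@example.co.uk") is not None
invalid = re.fullmatch(EMAIL, "jane.doe@example") is not None

# Capturing the last two pieces shows how example.co.uk is split:
# the greedy domain piece backs off until "\.[a-zA-Z]{2,}" can match "uk".
parts = re.fullmatch(
    r"[a-zA-Z0-9._%+-]+@([a-zA-Z0-9.-]+)\.([a-zA-Z]{2,})",
    "jane.doe@example.co.uk",
)
print(found, valid, invalid, parts.groups())
```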

Flags — modifying how matching works

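  • g (global): find all matches, not just the first one
  • i (case-insensitive): treat uppercase and lowercase letters as equal
  • m (multiline): make ^ and $ match at the start and end of each line, not just of the whole string

These are the single-letter flags used in JavaScript (for example /cat/gi); other languages spell some of them differently.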
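
As a hedged sketch of how the same ideas look in Python's re module: there is no g flag (re.findall returns all matches, re.search only the first), and i and m correspond to re.IGNORECASE and re.MULTILINE. The sample text is invented:

```python
import re

text = "Error: disk full\nerror: retrying\nOK"

# "g": search stops at the first match; findall returns every match.
first_only = re.search(r"\d", "a1b2c3").group()
every_digit = re.findall(r"\d", "a1b2c3")

# "i": re.IGNORECASE makes "error" match both "Error" and "error".
case_insensitive = re.findall(r"error", text, re.IGNORECASE)

# "m": without re.MULTILINE, ^ only matches at the very start of the string.
without_m = re.findall(r"^\w+", text)
with_m = re.findall(r"^\w+", text, re.MULTILINE)

print(first_only, every_digit)
print(case_insensitive, without_m, with_m)
```

Flags can be combined with |, for example re.IGNORECASE | re.MULTILINE.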

DataToolkit