2025-01-14 Programming, Productivity

The Fundamentals of Regex

By O. Wolfson

Regular expressions (regex) are a powerful tool for searching, matching, and manipulating strings. Using them well requires knowing the basic building blocks that match patterns in text.

1. Literals

  • Matches exact text.
  • Example:
    • Pattern: apple
    • Matches: "apple" in "I like apple pie."
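
As a minimal sketch of literal matching, the snippet below uses Python's built-in re module. Python is an assumption here, since the article does not name a language, and the variable name is only illustrative.

```python
import re

# A literal pattern matches the same characters in the same order.
match = re.search(r"apple", "I like apple pie.")

print(match.group())  # the matched text
print(match.span())   # (start, end) offsets of the match in the string
```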

2. Metacharacters

  • Special characters with specific meanings. To match them literally, escape them with a backslash (\).

Metacharacter   Meaning
.               Matches any single character except a newline
^               Matches the start of a string (a)
$               Matches the end of a string (b)
*               Matches 0 or more of the preceding token
+               Matches 1 or more of the preceding token
?               Matches 0 or 1 of the preceding token (optional)
\               Escapes a special character
|               Logical OR (alternation)
()              Groups expressions
[]              Matches a set or range of characters

  • (a) ^ matches only at the start of the string, so no characters may come before the match. ^apple matches "apple" in "apple pie" but not in "I like apple pie".
  • (b) $ matches only at the end of the string, so no characters may come after the match. apple$ matches "apple" in "I like apple" but not in "I like apple pie".

Examples:

  • (apple|banana|orange) matches "apple" in "apple pie" and also the "apple" inside "pineapple", because alternation on its own does not require word boundaries.
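
The behavior of a few of these metacharacters can be checked with a short sketch. It assumes Python's re module (the article itself is language-neutral), and all variable names are illustrative.

```python
import re

# ^ anchors the match at the start of the string.
start_hit = re.search(r"^apple", "apple pie")
start_miss = re.search(r"^apple", "I like apple pie")

# $ anchors the match at the end of the string.
end_hit = re.search(r"apple$", "I like apple")
end_miss = re.search(r"apple$", "I like apple pie")

# | tries each alternative; with no anchors it also matches inside longer words.
inside_word = re.search(r"(apple|banana|orange)", "pineapple")

# . matches any character except a newline; \. matches only a literal dot.
any_char = re.findall(r"a.c", "abc a-c a.c")
literal_dot = re.findall(r"a\.c", "abc a-c a.c")

print(bool(start_hit), bool(start_miss))
print(bool(end_hit), bool(end_miss))
print(inside_word.group())
print(any_char, literal_dot)
```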

3. Character Classes

  • Used to match specific sets of characters.
  • Example: [abc] matches 'a', 'b', or 'c'.

Class     Meaning
[abc]     Matches any character in the set
[^abc]    Matches any character NOT in the set
[a-z]     Matches any lowercase letter
[0-9]     Matches any digit
\d        Matches any digit ([0-9])
\D        Matches any non-digit
\w        Matches any word character ([a-zA-Z0-9_])
\W        Matches any non-word character
\s        Matches any whitespace ([ \t\n\r\f\v])
\S        Matches any non-whitespace
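
To see the classes in action, here is a small sketch that assumes Python's re module; the sample strings and variable names are made up for illustration.

```python
import re

text = "Room 42, Floor_3 (east wing)"

digits = re.findall(r"\d", text)              # every single digit
words = re.findall(r"\w+", text)              # runs of word characters
lowercase = re.findall(r"[a-z]+", "Hello World")
not_abc = re.findall(r"[^abc]", "abcde")      # anything except a, b, or c
non_word = re.findall(r"\W", "a-b c!")        # punctuation and whitespace

print(digits)
print(words)
print(lowercase)
print(not_abc)
print(non_word)
```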

4. Quantifiers

  • Specify how many times a token can be repeated.

Quantifier   Meaning
*            Matches 0 or more times
+            Matches 1 or more times
?            Matches 0 or 1 time (optional)
{n}          Matches exactly n times
{n,}         Matches n or more times
{n,m}        Matches between n and m times

  • Example:
    • Pattern: a{2,4}
    • Matches: "aa", "aaa", "aaaa"
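
The quantifiers can be compared side by side in a short sketch, assuming Python's re module. Note that a run of five a's yields "aaaa" for a{2,4}, because the quantifier takes at most four and the single leftover "a" is too short to match again.

```python
import re

# {2,4}: between two and four repetitions of "a".
runs = re.findall(r"a{2,4}", "a aa aaa aaaa aaaaa")

# ?: the preceding token is optional.
spellings = re.findall(r"colou?r", "color colour")

# * allows zero repetitions of "b"; + requires at least one.
star = re.findall(r"ab*", "a ab abbb")
plus = re.findall(r"ab+", "a ab abbb")

print(runs)
print(spellings)
print(star)
print(plus)
```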

5. Anchors

  • Match positions within a string.

Anchor   Meaning
^        Start of a string
$        End of a string
\b       Word boundary
\B       Non-word boundary

  • Example:

    • Pattern: ^apple$
    • Matches: "apple" (only if it's the entire string).
    • \b(apple|banana|orange)\b matches "apple", "banana", or "orange" only as whole words: it matches "apple" in "I like apple pie" but not the "apple" inside "pineapple".
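
The anchor examples above can be sketched as follows, assuming Python's re module; the names fruit and whole are illustrative, not from the article.

```python
import re

fruit = re.compile(r"\b(apple|banana|orange)\b")
whole = re.compile(r"^apple$")

# \b requires a word boundary on both sides of the fruit name.
in_sentence = fruit.search("I like apple pie")
in_pineapple = fruit.search("pineapple")

# ^...$ only matches when "apple" is the entire string.
exact = whole.search("apple")
longer = whole.search("apple pie")

# \B is the opposite of \b: it matches only where there is no boundary.
inner_starts = [m.start() for m in re.finditer(r"\Bapple", "pineapple apple")]

print(in_sentence.group(), in_pineapple)
print(bool(exact), bool(longer))
print(inner_starts)
```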

6. Groups and Capturing

  • Used to group expressions or capture parts of a match.

  • Example:

    • Pattern: (cat|dog)
    • Matches: "cat" or "dog"
  • Capturing:

    • Pattern: (\d{3})-(\d{4})
    • Matches: "123-4567" and captures "123" and "4567"
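
A minimal sketch of reading captured groups, assuming Python's re module: group 0 is the whole match, and groups 1 and 2 are the parenthesized parts in left-to-right order. The sample sentence is made up.

```python
import re

match = re.search(r"(\d{3})-(\d{4})", "Call 123-4567 today")

whole_match = match.group(0)  # everything the pattern matched
prefix = match.group(1)       # first pair of parentheses
line = match.group(2)         # second pair of parentheses
both = match.groups()         # all captured groups as a tuple

print(whole_match, prefix, line, both)
```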

7. Lookaheads and Lookbehinds

  • Lookahead: checks what follows the current position without including it in the match.

    • Positive: (?=pattern)
    • Negative: (?!pattern)
  • Lookbehind: checks what precedes the current position without including it in the match.

    • Positive: (?<=pattern)
    • Negative: (?<!pattern)
  • Example:

    • Pattern: \d(?= dollars)
    • Matches: "5" in "5 dollars"
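
All four lookaround forms can be tried in one sketch, assuming Python's re module. The sample strings are invented, and the \b anchors in the negative cases are an addition that keeps the engine from matching only part of a number.

```python
import re

prices = "5 dollars, 10 euros, 20 dollars"

# Positive lookahead: digits that are followed by " dollars".
dollars = re.findall(r"\d+(?= dollars)", prices)

# Negative lookahead: whole numbers that are NOT followed by " dollars".
not_dollars = re.findall(r"\b\d+\b(?! dollars)", prices)

costs = "cost $30 or 40 units"

# Positive lookbehind: digits preceded by a literal "$".
after_sign = re.findall(r"(?<=\$)\d+", costs)

# Negative lookbehind: whole numbers NOT preceded by "$".
no_sign = re.findall(r"(?<!\$)\b\d+", costs)

print(dollars, not_dollars)
print(after_sign, no_sign)
```

In every case the lookaround text itself (" dollars" or "$") is left out of the result.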

8. Flags/Modifiers

  • Modify how the regex is applied.
  • Common flags:
    • i: Case-insensitive
    • g: Global (match all occurrences)
    • m: Multiline (treat ^ and $ as start/end of a line)
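
Flag names vary between regex engines. As a sketch under the assumption that Python's re module is used: i corresponds to re.IGNORECASE and m to re.MULTILINE, while Python has no g flag. Instead, re.search returns the first match and re.findall returns all of them.

```python
import re

text = "Apple pie\napple tart\nAPPLE cake"

# No flags: case-sensitive, and re.search stops at the first match.
first_only = re.search(r"apple", text).group()

# i: ignore case. findall plays the role of the g flag here.
any_case = re.findall(r"apple", text, flags=re.IGNORECASE)

# Without m, ^ matches only at the very start of the string.
string_start = re.findall(r"^apple", text, flags=re.IGNORECASE)

# With m, ^ also matches at the start of every line.
line_starts = re.findall(r"^apple", text, flags=re.IGNORECASE | re.MULTILINE)

print(first_only)
print(any_case)
print(string_start)
print(line_starts)
```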

9. Escaping

  • Use \ to escape metacharacters to match them literally.
  • Example:
    • Pattern: \$5
    • Matches: "$5" in "The price is $5"
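
The sketch below shows why the backslash matters, assuming Python's re module. It also uses re.escape, a Python helper that escapes every metacharacter in a string so it can be matched literally; the sample strings are illustrative.

```python
import re

# Escaped: \$ is a literal dollar sign.
escaped = re.search(r"\$5", "The price is $5")

# Unescaped: $ means "end of string", so a "5" can never follow it.
unescaped = re.search(r"$5", "The price is $5")

# re.escape builds a literal pattern from arbitrary text.
label = "3.50 (USD)"
literal = re.search(re.escape(label), "Total: 3.50 (USD)")

# Without escaping, "." and "()" keep their special meaning.
loose = re.search(label, "3x50 USD")

print(escaped.group(), unescaped)
print(literal.group(), loose.group())
```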

10. Practical Examples

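  • Email validation (simplified; rejects names containing dots or hyphens): ^\w+@\w+\.\w{2,}$
  • Phone number in the form (123) 456-7890: ^\(\d{3}\) \d{3}-\d{4}$
  • URL (scheme and domain only): https?://(www\.)?\w+\.\w+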
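
These three patterns can be exercised with a short sketch, assuming Python's re module; the constant names and test strings are made up. It also shows their limits: the email pattern rejects a valid address with a dot in the name, so treat all three as starting points and not as complete validators.

```python
import re

EMAIL = r"^\w+@\w+\.\w{2,}$"
PHONE = r"^\(\d{3}\) \d{3}-\d{4}$"
URL = r"https?://(www\.)?\w+\.\w+"

# Email: simple addresses pass, but \w does not allow a dot in the name.
email_ok = re.search(EMAIL, "user@example.com")
email_dotted = re.search(EMAIL, "first.last@example.com")

# Phone: only the exact "(123) 456-7890" layout is accepted.
phone_ok = re.search(PHONE, "(555) 123-4567")
phone_dashes = re.search(PHONE, "555-123-4567")

# URL: unanchored, so it finds the scheme and domain inside longer text.
url_full = re.search(URL, "https://www.example.com")
url_in_text = re.search(URL, "Docs: http://example.org/page")

print(bool(email_ok), bool(email_dotted))
print(bool(phone_ok), bool(phone_dashes))
print(url_full.group(), url_in_text.group())
```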

Summary

Regex is all about recognizing patterns using combinations of these elements. Start simple, build incrementally, and test as you go. There are many online tools like regex101 to practice and visualize matches.