[concept]String & File Ops
Regex Basics
# theory
what regex is
A regular expression (regex) is a pattern that describes a set of strings. Python's re module uses these patterns to find, extract, and replace text.
import re
# Find all matches
re.findall(r"\d+", "Order 123 has 45 items") # ['123', '45']
# Search for pattern
re.search(r"\d+", "Order 123") # Match object or None
# Replace pattern
re.sub(r"\d+", "X", "Order 123") # "Order X"
groups
# Parentheses create capture groups
match = re.search(r"(\d+)-(\d+)", "Phone: 555-1234")
if match:
    area = match.group(1) # "555"
    number = match.group(2) # "1234"
pandas + regex
# Extract with regex
df["digits"] = df["text"].str.extract(r"(\d+)")
# Replace with regex
df["clean"] = df["text"].str.replace(r"\d+", "", regex=True)
# Filter rows
df[df["text"].str.contains(r"\d+", regex=True)]
# examples [3]
# example 01 · finding patterns
Use findall to get all matches
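A minimal sketch of this example, reusing the sample string from the theory section. The variable names (`text`, `numbers`, `total`) are illustrative assumptions.

```python
import re

# Sample string taken from the theory section
text = "Order 123 has 45 items"

# findall returns every non-overlapping match as a list of strings
numbers = re.findall(r"\d+", text)
print(numbers)  # ['123', '45']

# The matches are strings, so convert them before doing arithmetic
total = sum(int(n) for n in numbers)
print(total)  # 168

# When nothing matches, findall returns an empty list rather than None
print(re.findall(r"\d+", "no digits here"))  # []
```

Because `findall` always returns a list, the result can be looped over or measured with `len()` without a `None` check.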
# example 02 · search and groups
Extract specific parts of a match
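A minimal sketch of this example, assuming the same phone-number string used in the theory section. The names `match` and `missing` are illustrative.

```python
import re

text = "Phone: 555-1234"

# search returns the first match as a Match object, or None if nothing matches
match = re.search(r"(\d+)-(\d+)", text)
if match:
    print(match.group(0))  # 555-1234  (the whole match)
    print(match.group(1))  # 555       (first capture group)
    print(match.group(2))  # 1234      (second capture group)
    print(match.groups())  # ('555', '1234')

# Always check for None before calling .group() on the result
missing = re.search(r"(\d+)-(\d+)", "no phone here")
print(missing)  # None
```

Calling `.group()` on `None` raises `AttributeError`, which is why the `if match:` check is there.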
# example 03 · substitution
Replace patterns with new text
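A minimal sketch of this example, assuming the sample strings from the theory section. The `count` argument and the `\1`/`\2` backreferences are standard features of `re.sub`; the variable names are illustrative.

```python
import re

text = "Order 123 has 45 items"

# Replace every run of digits with a placeholder
masked = re.sub(r"\d+", "X", text)
print(masked)  # Order X has X items

# count limits how many replacements are made (here: only the first)
first_only = re.sub(r"\d+", "X", text, count=1)
print(first_only)  # Order X has 45 items

# Backreferences (\1, \2) reuse captured groups in the replacement
swapped = re.sub(r"(\d+)-(\d+)", r"\2-\1", "Phone: 555-1234")
print(swapped)  # Phone: 1234-555
```

`re.sub` returns a new string and leaves the original unchanged, so the result has to be assigned to a variable.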
# challenges [2]
# challenge 01/02
Use regex to find all words that start with a capital letter in 'Alice met Bob in New York'
# challenge 02/02
Replace all digits in 'Order #12345 total $99.99' with 'X' and print the result.
# project
# project-challenge
thread: Sales Performance Dashboard · reward: 50 xp
# brief
Each sale has an ID like 'S001' or 'S015'. For database integration, you need to extract just the numeric portion of each SaleID. Use regex to parse out the numbers.
# task
Extract Sale ID Numbers
# your code