[practice] Data Cleaning
String Cleaning
# theory
the .str accessor
Pandas provides string methods through the .str accessor:
df["name"].str.lower()
df["name"].str.upper()
df["name"].str.title()
df["name"].str.strip()
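As a minimal sketch of how the accessor behaves, the snippet below applies two of these methods to a small, made-up DataFrame (the sample values are assumptions for illustration). It shows that each call works element-wise, returns a new Series, and leaves missing values missing.

```python
import pandas as pd

# Hypothetical sample data; only the column name "name" comes from the snippets above.
df = pd.DataFrame({"name": ["  alice SMITH ", "BOB jones", None]})

# Each .str method runs element-wise and returns a new Series.
# The original column is unchanged until the result is assigned back.
stripped = df["name"].str.strip()
titled = stripped.str.title()

print(titled.tolist()[:2])     # ['Alice Smith', 'Bob Jones']
print(titled.isna().tolist())  # [False, False, True] -> the missing entry stays missing
print(repr(df["name"][0]))     # '  alice SMITH ' -> original data untouched
```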
common operations
# Remove whitespace
df["text"].str.strip() # Both ends
df["text"].str.lstrip() # Left only
df["text"].str.rstrip() # Right only
# Case conversion
df["text"].str.lower()
df["text"].str.upper()
df["text"].str.title() # Capitalize Words
# Replace patterns
df["text"].str.replace("old", "new")
df["text"].str.replace(r"\d+", "", regex=True) # Remove numbers
# Check content
df["text"].str.contains("word") # Boolean Series: True where "word" appears
df["text"].str.startswith("prefix")
df["text"].str.endswith("suffix")
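Two details of these operations are easy to miss: contains() interprets its pattern as a regular expression by default, and missing values produce missing results unless told otherwise. The sketch below uses made-up sample text to illustrate the case, regex, and na parameters. Passing regex explicitly to replace() is a defensive habit here, since its default has differed between pandas versions.

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame(
    {"text": ["Order 66 shipped", "order pending", None, "refund (partial)"]}
)

# case=False ignores case; na=False makes missing rows count as "no match".
has_order = df["text"].str.contains("order", case=False, na=False)

# regex=False matches the pattern literally, so "(" needs no escaping.
has_paren = df["text"].str.contains("(", regex=False, na=False)

# Stating regex=True explicitly keeps the intent clear.
no_digits = df["text"].str.replace(r"\d+", "", regex=True)

print(has_order.tolist())  # [True, True, False, False]
print(has_paren.tolist())  # [False, False, False, True]
print(no_digits[0])        # 'Order  shipped' (the space on each side of "66" remains)
```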
extracting
# Split and get parts
df["name"].str.split(" ").str[0] # First word
df["name"].str.split(" ").str[-1] # Last word
# Extract with regex
df["text"].str.extract(r"(\d+)") # First number
# Get length
df["text"].str.len()
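The extraction calls above can be tried on a small, made-up DataFrame, as sketched below. The sample values are assumptions. Note that extract() returns a DataFrame by default (one column per capture group), so the sketch passes expand=False to get a Series, and that the extracted number is still a string.

```python
import pandas as pd

# Hypothetical sample data; column names follow the snippets above.
df = pd.DataFrame(
    {
        "name": ["Ada Lovelace", "Alan Mathison Turing"],
        "text": ["room 12", "no number"],
    }
)

# split() produces a list per row; .str[i] then indexes into each list.
first = df["name"].str.split(" ").str[0]
last = df["name"].str.split(" ").str[-1]

# expand=False returns a Series for a single capture group.
# Rows without a match become missing values.
number = df["text"].str.extract(r"(\d+)", expand=False)

lengths = df["text"].str.len()

print(first.tolist())          # ['Ada', 'Alan']
print(last.tolist())           # ['Lovelace', 'Turing']
print(number[0])               # '12' (a string, not an integer)
print(number.isna().tolist())  # [False, True]
print(lengths.tolist())        # [7, 9]
```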
chaining
Chain multiple operations:
df["clean_name"] = (
    df["name"]
    .str.strip()
    .str.lower()
    .str.replace(" ", "_")
)
# examples [3]
# example 01 · basic string cleaning
Strip whitespace and standardize case
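The original code for this example is not shown here, so the following is only a sketch of the idea using a made-up "city" column. It counts distinct values before and after cleaning to show why stray whitespace and mixed case matter: values that look the same to a person are different strings to pandas.

```python
import pandas as pd

# Hypothetical sample data: the same two cities typed five different ways.
df = pd.DataFrame({"city": [" Paris", "paris ", "PARIS", "Lyon", " lyon "]})

before = df["city"].nunique()

# Strip surrounding whitespace, then lower-case, and assign the result back.
df["city"] = df["city"].str.strip().str.lower()

after = df["city"].nunique()

print(before)  # 5
print(after)   # 2
print(df["city"].tolist())
```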
# example 02 · replace and contains
Find and replace text patterns
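The original code for this example is not shown here. As one possible illustration, the sketch below expands an abbreviation in a made-up "address" column and then filters rows with contains(). The data and column name are assumptions. It uses regex=False because the search text contains a dot, which would match any character if treated as a regular expression.

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame(
    {"address": ["12 Main St.", "4 Oak Street", "99 Elm Ave.", "7 Pine St."]}
)

# Literal replacement: only the exact text "St." is replaced.
df["address"] = df["address"].str.replace("St.", "Street", regex=False)

# contains() returns a boolean Series that can be used to filter rows.
on_street = df[df["address"].str.contains("Street", regex=False)]

print(df["address"].tolist())
print(len(on_street))  # 3
```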
# example 03 · splitting strings
Break apart strings into columns
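The original code for this example is not shown here. A minimal sketch of the idea, assuming a made-up "full_name" column, is to pass expand=True so that split() returns a DataFrame with one column per piece. Limiting the split with n=1 keeps everything after the first space together.

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame(
    {"full_name": ["Ada Lovelace", "Alan Mathison Turing", "Grace Hopper"]}
)

# n=1 splits at the first space only; expand=True returns a DataFrame
# whose columns are labelled 0 and 1.
parts = df["full_name"].str.split(" ", n=1, expand=True)

df["first_name"] = parts[0]
df["rest"] = parts[1]

print(df["first_name"].tolist())  # ['Ada', 'Alan', 'Grace']
print(df["rest"].tolist())        # ['Lovelace', 'Mathison Turing', 'Hopper']
```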
# challenges [2]
# challenge 01/02
Clean the 'name' column of the students data: strip whitespace and convert to title case, then print the names.
# challenge 02/02
Find all students whose name contains the letter 'a' (case insensitive) and print them.
# project
# project-challenge
thread: Survey Insights Report · reward: 50 xp
# brief
Job titles in the survey have inconsistent formatting. Clean the JobTitle column by stripping whitespace and converting to title case for consistent reporting in your recruiter dashboard.
# task
Standardize Job Titles
# your code