
[practice] Data Cleaning

Duplicates & Reset Index

# theory

finding duplicates

# Check for duplicate rows
df.duplicated()                    # True for each row that repeats an earlier row
df.duplicated().sum()              # Count of duplicates
df[df.duplicated()]                # View duplicate rows

# Check specific columns
df.duplicated(subset=["name"])     # Compare the name column only
df.duplicated(subset=["name", "email"])  # name and email must both match

removing duplicates

# Remove duplicate rows (returns a new DataFrame; df itself is unchanged)
df.drop_duplicates()

# Keep first or last occurrence
df.drop_duplicates(keep="first")   # Default
df.drop_duplicates(keep="last")
df.drop_duplicates(keep=False)     # Drop every row that has a duplicate, keeping no copy

# Based on specific columns
df.drop_duplicates(subset=["email"])
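
To make the effect of `keep` concrete, here is a minimal sketch on a small made-up DataFrame (the names and emails are hypothetical, chosen only for illustration). It compares which rows survive under each setting:

```python
import pandas as pd

# Hypothetical data: rows 0 and 2 share an email, row 1 is unique.
df = pd.DataFrame({
    "name": ["Ana", "Ben", "Ana M."],
    "email": ["ana@example.com", "ben@example.com", "ana@example.com"],
})

first = df.drop_duplicates(subset=["email"])               # keep="first" is the default
last = df.drop_duplicates(subset=["email"], keep="last")
none = df.drop_duplicates(subset=["email"], keep=False)

print(first.index.tolist())  # [0, 1]
print(last.index.tolist())   # [1, 2]
print(none.index.tolist())   # [1]
print(len(df))               # 3, the original DataFrame is not modified
```

Note that the surviving rows keep their original index labels, which is why a reset is often the next step.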

reset_index

After filtering, the index keeps its original labels, so it can have gaps; after sorting, the labels are out of order. Reset it to a clean 0, 1, 2, … sequence:

df.reset_index()                   # Old index becomes a column
df.reset_index(drop=True)          # Discard old index

set_index

df.set_index("id")                 # Use 'id' column as index
df.set_index("id", drop=True)      # Remove 'id' from the columns (default)
df.set_index(["year", "month"])    # Multi-level index
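
As a rough illustration of the multi-level case, the sketch below builds a two-level index and looks rows up by it. The `sales` DataFrame and its `revenue` column are assumptions made up for this example:

```python
import pandas as pd

# Hypothetical data: one row per (year, month) pair.
sales = pd.DataFrame({
    "year": [2023, 2023, 2024],
    "month": [1, 2, 1],
    "revenue": [100, 120, 130],
})

indexed = sales.set_index(["year", "month"])
print(indexed.index.nlevels)               # 2

# Look up a single row with a (year, month) tuple.
print(indexed.loc[(2023, 2), "revenue"])   # 120

# Selecting on the first level alone leaves the second level as the index.
print(indexed.loc[2023].index.tolist())    # [1, 2]
```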

# examples [3]

# example 01 · finding and removing duplicates

Identify and remove duplicate rows
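
A minimal sketch of this workflow, assuming a small hypothetical DataFrame in which one row is an exact repeat:

```python
import pandas as pd

# Hypothetical data: the second "Ben" row repeats the first one exactly.
df = pd.DataFrame({
    "name": ["Ana", "Ben", "Ben", "Cy"],
    "score": [90, 85, 85, 70],
})

dup_mask = df.duplicated()       # False, False, True, False
n_dups = int(dup_mask.sum())     # number of repeated rows
print(n_dups)                    # 1
print(df[dup_mask])              # shows the repeated "Ben" row (index 2)

cleaned = df.drop_duplicates()   # new DataFrame; df itself is unchanged
print(cleaned.shape)             # (3, 2)
```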

# example 02 · reset index after filtering

Clean up index after removing rows
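
One way this might look, using a hypothetical `score` column and a pass mark of 60 chosen only for illustration:

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({
    "name": ["Ana", "Ben", "Cy", "Dee"],
    "score": [90, 55, 72, 48],
})

passed = df[df["score"] >= 60]
print(passed.index.tolist())     # [0, 2]  (gaps where rows were filtered out)

tidy = passed.reset_index(drop=True)
print(tidy.index.tolist())       # [0, 1]

kept = passed.reset_index()      # old labels are preserved in an "index" column
print(kept.columns.tolist())     # ['index', 'name', 'score']
```

Use `drop=True` when the old labels carry no meaning; leave it off when you may need to trace rows back to their original positions.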

# example 03 · set a column as index

Use a meaningful column as the row identifier
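
A short sketch of the idea, assuming a hypothetical `id` column whose values are unique:

```python
import pandas as pd

# Hypothetical data; "id" values are assumed to be unique.
df = pd.DataFrame({
    "id": [101, 102, 103],
    "name": ["Ana", "Ben", "Cy"],
})

by_id = df.set_index("id")         # "id" moves from the columns to the index
print(by_id.columns.tolist())      # ['name']
print(by_id.loc[102, "name"])      # Ben

restored = by_id.reset_index()     # turns "id" back into an ordinary column
print(restored.columns.tolist())   # ['id', 'name']
```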


# challenges [2]

# challenge 01/02 · todo
Check how many duplicate rows exist in the students DataFrame and print the count.

# challenge 02/02 · todo
Filter students with grade 'A', reset the index (dropping the old one), and print the result.

# project

# project-challenge

thread: Survey Insights Report · reward: 50 xp

# brief

Before finalizing your report, verify there are no duplicate survey submissions. Check for duplicate RespondentIDs and reset the index after any filtering operations.

# task

Check for Duplicate Respondents

# your code