pyodide: loading…

[concept]Grouping & Combining

Merge & Join

# theory

pd.merge()

Think of merge like a SQL JOIN. You combine two tables based on matching values.

pd.merge(left_df, right_df, on="common_column")

join types

TypeKeeps
innerOnly matching rows (default)
leftAll from left + matches from right
rightAll from right + matches from left
outerAll rows from both
pd.merge(orders, customers, on="customer_id", how="left")

different column names

When the join columns have different names:

pd.merge(orders, customers,
         left_on="cust_id",
         right_on="customer_id")

duplicate column names

When both DataFrames have columns with the same name (that aren't join keys):

pd.merge(df1, df2, on="id", suffixes=("_left", "_right"))
# Columns become: value_left, value_right

patterns

# Add customer details to orders
orders_with_details = pd.merge(orders, customers, on="customer_id")

# Find orders without customers (left join, then filter NaN)
all_orders = pd.merge(orders, customers, on="customer_id", how="left")
orphan_orders = all_orders[all_orders["customer_name"].isna()]

# Find all combinations (cross join)
pd.merge(df1, df2, how="cross")

See also

The SQL equivalent is INNER JOIN ... ON (and LEFT JOIN for how="left"). The matching mental model with table-flavored examples is on damato-sql at /learn/joining-tables/inner-join.

# examples [3]

# example 01 · inner join (default)

Only keep rows that match in both tables

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
🐍
Loading PythonSetting up pandas & numpy...
# example 02 · left join

Keep all rows from left table

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
🐍
Loading PythonSetting up pandas & numpy...
# example 03 · using suffixes

Handle duplicate column names

1
2
3
4
5
6
7
🐍
Loading PythonSetting up pandas & numpy...

# challenges [2]

# challenge 01/02todo
Merge students with a grades_info DataFrame containing grade descriptions. Use the 'grade' column as the key.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
🐍
Loading PythonSetting up pandas & numpy...
# challenge 02/02todo
Perform a left join between sales and a category_info table. Show all sales even if category info is missing.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
🐍
Loading PythonSetting up pandas & numpy...

# project

# project-challenge

thread: Survey Insights Report · reward: 50 xp

# brief

Your report needs regional context. Create a country_info table with regions (North America, Europe, Asia, Oceania) and merge it with the survey data to enable regional salary analysis.

# task

Enrich Survey with Region Data

# your code
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
🐍
Loading PythonSetting up pandas & numpy...