pyodide: loading…

[concept]Data Manipulation (WCTC)

Pivot Tables & Reshaping

# theory

pivot_table

If you've used pivot tables in Excel, this is the same idea but way more powerful.

pd.pivot_table(
    df,
    values="sales",        # what to aggregate
    index="region",        # rows
    columns="product",     # columns
    aggfunc="sum"          # how to aggregate
)

Multiple aggregations:

pd.pivot_table(
    df,
    values="sales",
    index="region",
    aggfunc=["sum", "mean", "count"]
)

groupby + unstack

This is another way to get pivot-style output. Sometimes it's cleaner.

# Group then unstack
df.groupby(["region", "product"])["sales"].sum().unstack()

The result looks exactly like a pivot table. I actually prefer this approach sometimes because it's more explicit about what's happening.

melt (wide to long)

Sometimes data comes in wide format and you need it long. melt fixes that.

# Wide format:
#   name  Jan  Feb  Mar
#   Alice  100  120  130

# Convert to long:
pd.melt(df, id_vars=["name"], var_name="month", value_name="sales")

# Result:
#   name  month  sales
#   Alice   Jan    100
#   Alice   Feb    120
#   Alice   Mar    130

This is super useful when you need to plot data or do time series analysis.

stack / unstack

These are for reshaping multi-index data.

# unstack; move inner index to columns
grouped = df.groupby(["region", "year"])["sales"].sum()
grouped.unstack()  # years become columns

# stack; move columns to index (opposite of unstack)
pivoted.stack()

fill_value for missing combos

When pivoting, some combinations might not exist in your data. You get NaN.

pd.pivot_table(
    df,
    values="sales",
    index="region",
    columns="product",
    fill_value=0  # fill missing with 0 instead of NaN
)

margins (totals)

pd.pivot_table(
    df,
    values="sales",
    index="region",
    columns="product",
    aggfunc="sum",
    margins=True,      # add row/column totals
    margins_name="Total"
)

Pivot tables with margins are useful for reports. You get row and column totals in one call.

# examples [3]

# example 01 · basic pivot table

Summarize sales by region and product

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
🐍
Loading PythonSetting up pandas & numpy...
# example 02 · using melt to reshape

Convert wide data to long format; happens a lot with imported spreadsheets

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
🐍
Loading PythonSetting up pandas & numpy...
# example 03 · pivot with margins (totals)

Add row and column totals to your pivot table

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
🐍
Loading PythonSetting up pandas & numpy...

# challenges [2]

# challenge 01/02todo
Create a pivot table from sales showing the sum of 'price' for each 'category'. Print the result.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
🐍
Loading PythonSetting up pandas & numpy...
# challenge 02/02todo
Use groupby and unstack to show total quantity sold by category. This gives the same result as a pivot table but using a different approach.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
🐍
Loading PythonSetting up pandas & numpy...