[practice]Functions & Apply
Custom Aggregations
# theory
custom funcs in agg()
You can use custom functions with groupby().agg():
def range_func(x):
return x.max() - x.min()
df.groupby("category")["value"].agg(range_func)
multiple custom aggregations
df.groupby("category")["value"].agg([
"sum",
"mean",
("range", lambda x: x.max() - x.min()),
("cv", lambda x: x.std() / x.mean()) # coefficient of variation
])
named aggregations w/ custom funcs
df.groupby("category").agg(
total=("value", "sum"),
average=("value", "mean"),
spread=("value", lambda x: x.max() - x.min())
)
transform() vs agg()
- agg(): Returns one row per group
- transform(): Returns same shape as input
# agg - one value per group
df.groupby("category")["value"].agg("mean")
# transform - value for each row (group's mean)
df["group_mean"] = df.groupby("category")["value"].transform("mean")
See also
The SQL equivalents of transform("rank") and "percentage of group total" are window functions: RANK() OVER (PARTITION BY group ORDER BY value) and value * 100.0 / SUM(value) OVER (PARTITION BY group). Cross-reference on damato-sql at /learn/window-functions/ranking-functions.
# examples [3]
Define your own aggregation logic
Calculate group stats for each row
Calculate relative contribution within groups
# challenges [2]
# project
# project-challenge
thread: Sales Performance Dashboard · reward: 50 xp
# brief
Finance needs to understand revenue variability by category. Create a custom aggregation function that calculates the coefficient of variation (std/mean) for each category's revenue.
# task