[concept] Functions & Apply
Vectorized Operations
# theory
why vectorization matters
Slow (loop):
result = []
for i in range(len(df)):
    result.append(df.iloc[i]["a"] + df.iloc[i]["b"])
df["sum"] = result
Fast (vectorized):
df["sum"] = df["a"] + df["b"]
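Vectorized operations are often 10-100x faster than row-by-row Python loops because the arithmetic runs over whole columns in optimized, compiled C code (via NumPy) instead of being dispatched one element at a time by the Python interpreter.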
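To make the comparison concrete, here is a minimal sketch that runs both versions on the same data, checks that they agree, and times each once. The DataFrame contents, the row count, and the helper names `loop_sum` and `vectorized_sum` are assumptions for illustration; the measured speedup depends on your machine and data size.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical data: two integer columns, 2,000 rows
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "a": rng.integers(0, 100, 2_000),
    "b": rng.integers(0, 100, 2_000),
})


def loop_sum(frame):
    """Row-by-row version: one Python-level iteration per row."""
    result = []
    for i in range(len(frame)):
        result.append(frame.iloc[i]["a"] + frame.iloc[i]["b"])
    return pd.Series(result, index=frame.index)


def vectorized_sum(frame):
    """Column-wise version: a single operation over whole columns."""
    return frame["a"] + frame["b"]


# Both approaches produce the same values
same_result = bool((loop_sum(df) == vectorized_sum(df)).all())

# Time a single run of each
loop_time = timeit.timeit(lambda: loop_sum(df), number=1)
vec_time = timeit.timeit(lambda: vectorized_sum(df), number=1)

print(f"same result: {same_result}")
print(f"loop: {loop_time:.4f}s, vectorized: {vec_time:.6f}s")
```

The gap generally widens as the number of rows grows, since the loop pays interpreter overhead on every row.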
vectorized math
df["doubled"] = df["value"] * 2
df["squared"] = df["value"] ** 2
df["total"] = df["price"] * df["qty"]
df["pct"] = df["value"] / df["value"].sum() * 100
string ops (.str)
df["lower"] = df["name"].str.lower()
df["first_letter"] = df["name"].str[0]
df["has_a"] = df["name"].str.contains("a")
df["parts"] = df["text"].str.split(",")
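The `.str` methods above skip missing values rather than raising, which is easy to overlook. The sketch below applies the same calls to a small hypothetical DataFrame with one missing name; the sample data and the `has_a_any_case` column are assumptions added for illustration.

```python
import pandas as pd

# Hypothetical data with one missing name
df = pd.DataFrame({
    "name": ["Alice", "bob", None],
    "text": ["x,y", "z", "p,q,r"],
})

df["lower"] = df["name"].str.lower()        # missing value stays missing
df["first_letter"] = df["name"].str[0]      # first character of each string

# contains() is case-sensitive by default; na=False maps missing values to False
df["has_a"] = df["name"].str.contains("a", na=False)
df["has_a_any_case"] = df["name"].str.contains("a", case=False, na=False)

# split() returns a list per row
df["parts"] = df["text"].str.split(",")

print(df)
```

Note that "Alice" does not match the case-sensitive pattern `"a"`, because its only "A" is uppercase.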
conditionals and binning (np.where, np.select, pd.cut)
import numpy as np
import pandas as pd
# np.where: if-else vectorized
df["label"] = np.where(df["value"] > 50, "High", "Low")
# np.select: multiple conditions, checked in order (first match wins)
conditions = [
    df["score"] >= 90,
    df["score"] >= 80,
    df["score"] >= 70,
]
choices = ["A", "B", "C"]
df["grade"] = np.select(conditions, choices, default="F")
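# pd.cut: binning into right-closed intervals (0, 25], (25, 50], (50, 75], (75, 100]
# A value of exactly 0 falls outside the first bin and becomes NaN unless include_lowest=True
df["bucket"] = pd.cut(df["value"], bins=[0, 25, 50, 75, 100])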
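Because `np.select` takes the first condition that is true for each row, the order of the conditions matters. A minimal sketch, assuming a hypothetical `score` column, that contrasts the ordering used above with the reversed ordering:

```python
import numpy as np
import pandas as pd

# Hypothetical scores
df = pd.DataFrame({"score": [95, 85, 72, 40]})

conditions = [
    df["score"] >= 90,
    df["score"] >= 80,
    df["score"] >= 70,
]
choices = ["A", "B", "C"]

# Strictest condition first: each row gets the highest grade it qualifies for
df["grade"] = np.select(conditions, choices, default="F")

# Broadest condition first: every score >= 70 matches ">= 70" and stops there
df["grade_wrong_order"] = np.select(conditions[::-1], choices[::-1], default="F")

print(df)
```

With overlapping thresholds like these, list the strictest condition first.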
vectorized vs apply()
| Operation | Use |
|---|---|
| Math on columns | Vectorized |
| String methods | .str accessor |
| Simple conditions | np.where / np.select |
| Complex row logic | apply() |
| Logic spanning multiple columns | Vectorized where possible, otherwise apply(axis=1) |
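As a rough illustration of the last two rows, the sketch below writes the same multi-column rule twice: once with `apply(axis=1)` and once vectorized with `np.select`. The data, the rule, and the function name `describe_row` are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical order data
df = pd.DataFrame({"price": [10.0, 20.0, 5.0], "qty": [3, 0, 12]})


def describe_row(row):
    """Hypothetical rule that reads two columns of a single row."""
    if row["qty"] == 0:
        return "out of stock"
    total = row["price"] * row["qty"]
    return "bulk" if total >= 50 else "regular"


# apply(axis=1): calls the Python function once per row
df["kind_apply"] = df.apply(describe_row, axis=1)

# Vectorized equivalent: same rule expressed as ordered column-wise conditions
total = df["price"] * df["qty"]
df["kind_vectorized"] = np.select(
    [df["qty"] == 0, total >= 50],
    ["out of stock", "bulk"],
    default="regular",
)

print(df)
```

When a rule can be expressed as ordered conditions like this, the vectorized form is usually the better choice; `apply(axis=1)` remains useful when the per-row logic is too irregular to express column-wise.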
# examples [3]
# example 01 · vectorized math
Column-wise calculations without loops
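A minimal sketch of column-wise calculation, assuming a small hypothetical DataFrame with `price` and `qty` columns:

```python
import pandas as pd

# Hypothetical order lines
df = pd.DataFrame({"price": [2.5, 4.0, 1.0], "qty": [4, 1, 6]})

# Each line operates on whole columns at once; no explicit loop
df["total"] = df["price"] * df["qty"]                # price times quantity per row
df["doubled"] = df["total"] * 2                      # scalar applied to every row
df["pct"] = df["total"] / df["total"].sum() * 100    # share of the grand total

print(df)
```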
# example 02 · np.where for conditionals
Vectorized if-else
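A minimal sketch of `np.where` as a vectorized if-else, assuming a hypothetical `value` column. The second call combines two conditions with `&`; each condition needs its own parentheses.

```python
import numpy as np
import pandas as pd

# Hypothetical values
df = pd.DataFrame({"value": [10, 50, 51, 99]})

# np.where(condition, value_if_true, value_if_false)
df["label"] = np.where(df["value"] > 50, "High", "Low")

# Combined condition: parenthesize each comparison before joining with &
in_middle = (df["value"] > 20) & (df["value"] < 90)
df["band"] = np.where(in_middle, "mid", "edge")

print(df)
```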
# example 03 · pd.cut for binning
Group continuous values into bins
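A minimal sketch of binning with `pd.cut`, assuming a hypothetical `value` column and hypothetical bin labels. It shows the default right-closed intervals, where a value equal to the lowest edge is left unbinned, and the `labels` and `include_lowest` parameters that name the bins and pull that edge value in.

```python
import pandas as pd

# Hypothetical values, including both bin edges
df = pd.DataFrame({"value": [0, 10, 25, 26, 80, 100]})
bins = [0, 25, 50, 75, 100]

# Default: intervals are (0, 25], (25, 50], ...; 0 itself falls outside and becomes NaN
df["bucket"] = pd.cut(df["value"], bins=bins)

# Named bins, with the lowest edge included in the first bin
df["bucket_named"] = pd.cut(
    df["value"],
    bins=bins,
    labels=["low", "mid-low", "mid-high", "high"],
    include_lowest=True,
)

# Count of rows per named bin
counts = df["bucket_named"].value_counts(sort=False)

print(df)
print(counts)
```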
# challenges [2]
# challenge 01/02
Use np.where to create a 'status' column: 'High' if score >= 85, else 'Normal'. Print the results.
# challenge 02/02
Use pd.cut to bin ages into 'Teen' (0-19), 'Young' (20-21), 'Adult' (22+). Print the distribution.
# project
# project-challenge
thread: Sales Performance Dashboard · reward: 50 xp
# brief
Management wants to tier sales reps by revenue performance. Use np.where to assign Star, Solid, or Developing tiers based on total-revenue thresholds; with three tiers, this means nesting one np.where call inside another.
# task
Assign Performance Tiers
# your code