[practice] NumPy Foundations
Vectorization vs loops
# theory
the slow way
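You can iterate over a NumPy array with a Python for loop. It works. For elementwise arithmetic on large arrays it is also typically one to two orders of magnitude slower than the vectorized version, because every element passes through the interpreter.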
import numpy as np
a = np.arange(10)  # example input; any numeric array works
# Slow: Python loop runs once per element
result = []
for x in a:
    result.append(x ** 2 + 1)
result = np.array(result)
# Fast: one array expression; the loop runs in compiled C code
result = a ** 2 + 1
The vectorized form is also shorter and easier to read, and with no index or accumulator to manage there is no off-by-one mistake to make.
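To check the speed claim on your own machine, here is a minimal timing sketch. The array size, the helper names loop_version and vectorized_version, and the repeat count are all illustrative choices, not part of the lesson. The ratio you see depends on hardware, NumPy version, and array size, so no expected numbers are shown.

```python
import timeit

import numpy as np

# Hypothetical input: 200,000 float64 values (size chosen only for a quick run).
a = np.arange(200_000, dtype=np.float64)


def loop_version(arr):
    """Python-level loop: one interpreter iteration per element."""
    result = []
    for x in arr:
        result.append(x ** 2 + 1)
    return np.array(result)


def vectorized_version(arr):
    """Array expression: the per-element work runs inside NumPy's compiled code."""
    return arr ** 2 + 1


# Both forms should agree before their timings are worth comparing.
assert np.allclose(loop_version(a), vectorized_version(a))

repeats = 3
loop_s = timeit.timeit(lambda: loop_version(a), number=repeats) / repeats
vec_s = timeit.timeit(lambda: vectorized_version(a), number=repeats) / repeats
print(f"loop: {loop_s:.4f} s  vectorized: {vec_s:.6f} s  ratio: {loop_s / vec_s:.0f}x")
```

On small arrays the gap shrinks, because fixed call overhead dominates both versions.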
vectorized conditionals
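np.where(condition, x, y) is the vectorized if/else: it takes x where the condition is true and y everywhere else. Branchless, no Python loop. Both x and y are computed for every element before the selection happens.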
# Slow:
out = []
for x in a:
    out.append(0 if x < 0 else x)
out = np.array(out)
# Fast:
out = np.where(a < 0, 0, a)
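A small worked example, with made-up sample arrays a and b, shows what np.where returns and one consequence of it evaluating both choices up front. Treat it as a sketch, not the only way to write either operation.

```python
import numpy as np

# Hypothetical sample data.
a = np.array([-3, -1, 0, 2, 5])

# Negative values become 0; everything else passes through unchanged.
out = np.where(a < 0, 0, a)
print(out)  # [0 0 0 2 5]

# For this particular clamp, np.maximum gives the same result.
assert np.array_equal(out, np.maximum(a, 0))

# Both choice arguments are computed in full before selection, so 1.0 / b
# still divides by zero at index 1. np.where then discards that value,
# and errstate silences the warning the division would otherwise emit.
b = np.array([4.0, 0.0, 2.0])
with np.errstate(divide="ignore"):
    inv = np.where(b != 0, 1.0 / b, 0.0)
print(inv)  # [0.25 0.   0.5 ]
```

The second half is the main thing to remember: np.where selects between results that already exist, it does not skip the work for the branch that loses.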
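For more than two branches, use np.select. Conditions are checked in order and the first one that matches wins:
np.select(
    [a < 0, a < 100, a >= 100],
    ["below_zero", "small", "big"],
    default="unknown",  # used only where no condition is true, e.g. NaN
)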
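Here is the same np.select call run on a small made-up array, so the ordering rule and the default are visible. The sample values, including the NaN, are chosen only for illustration.

```python
import numpy as np

# Hypothetical sample data; the NaN is there to trigger the default.
a = np.array([-5.0, 0.0, 42.0, 100.0, 250.0, np.nan])

labels = np.select(
    [a < 0, a < 100, a >= 100],      # checked in order, first match wins
    ["below_zero", "small", "big"],  # one choice per condition
    default="unknown",               # NaN fails every comparison above
)
print(labels)
# ['below_zero' 'small' 'small' 'big' 'big' 'unknown']
```

Note that -5.0 also satisfies a < 100, but it is labeled below_zero because that condition comes first. Reordering the conditions changes the result.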
See also
The SQL equivalent of np.where is a CASE WHEN ... THEN ... ELSE ... END expression. Same branchless mental model, different syntax. The full SQL CASE lesson lives on damato-sql at /learn/data-analysis/case-expressions.
# examples [3]
Replace if/else inside a loop with one call.
Three or more buckets. Conditions list, choices list, default fallback.
Some computations (running totals and other cumulative ops) depend on the elements before them, so they need a dedicated vectorized primitive such as np.cumsum, not just elementwise math.
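As a sketch of the cumulative case, with a made-up sample array: a running total cannot be written as elementwise math because each output depends on the one before it, but NumPy provides np.cumsum for it, and ufuncs expose an accumulate method for other running operations.

```python
import numpy as np

# Hypothetical sample data.
a = np.array([3, 1, 4, 1, 5])

# Slow: carry the running total through a Python loop.
total = 0
running = []
for x in a:
    total += x
    running.append(total)
running = np.array(running)

# Fast: one call to the cumulative primitive.
running_vec = np.cumsum(a)
print(running_vec)  # [ 3  4  8  9 14]
assert np.array_equal(running, running_vec)

# Other running operations use the ufunc's accumulate method,
# for example a running maximum.
running_max = np.maximum.accumulate(a)
print(running_max)  # [3 3 4 4 5]
```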
# challenges [2]