[concept] Pandas Fundamentals
Reading & Writing Files
# theory
reading csv
The most common operation is loading data from a CSV file:
import pandas as pd
df = pd.read_csv("data.csv")
Useful parameters (listed together for reference; pass only the ones a given file needs):
pd.read_csv(
    "data.csv",
    sep=",",                       # Delimiter (default is a comma)
    header=0,                      # Row number to use as the header (0 = first row)
    names=["a", "b", "c"],         # Custom column names (replace the header row when header=0)
    index_col="id",                # Use this column as the index
    usecols=["a", "b"],            # Read only these columns
    nrows=1000,                    # Read only the first 1000 data rows
    skiprows=5,                    # Skip the first 5 lines of the file
    na_values=["", "NA", "NULL"],  # Additional strings to treat as missing
    dtype={"zip": str},            # Force column types (keeps leading zeros in zip codes)
)
reading from a string
import io
csv_string = """name,age
Alice,25
Bob,30"""
df = pd.read_csv(io.StringIO(csv_string))
writing csv
df.to_csv("output.csv") # Includes the row index as the first column
df.to_csv("output.csv", index=False) # Without row index
df.to_csv("output.csv", columns=["name", "age"]) # Specific columns
other formats
# Excel (needs an engine such as openpyxl installed)
df = pd.read_excel("data.xlsx")
df.to_excel("output.xlsx", index=False)
# JSON
df = pd.read_json("data.json")
df.to_json("output.json")
# Parquet (fast, compressed; needs pyarrow or fastparquet installed)
df = pd.read_parquet("data.parquet")
df.to_parquet("output.parquet")
# examples [3]
# example 01 · reading with options
Common read_csv parameters
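The original code for this example is not preserved here, so the following is only a minimal sketch of how several of the parameters from the theory section might be combined. The sample data, its semicolon delimiter, and the column names are assumptions made up for illustration.

```python
import io

import pandas as pd

# Hypothetical sample data; a real script would pass a file path instead.
csv_text = """id;name;zip;score
1;Alice;02134;88
2;Bob;10001;92
3;Cara;94105;79
"""

df = pd.read_csv(
    io.StringIO(csv_text),
    sep=";",             # the sample uses semicolons, not commas
    header=0,            # the first row holds the column names
    index_col="id",      # use the id column as the row index
    dtype={"zip": str},  # read zip as text so leading zeros survive
    nrows=2,             # read only the first two data rows
)
print(df)
```

Reading zip as text matters here because an integer column would turn 02134 into 2134.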
# example 02 · reading specific columns
Only load the columns you need
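As a rough illustration of the idea, the sketch below passes usecols so that only two of four columns are parsed. The data and column names are hypothetical and stand in for a real file.

```python
import io

import pandas as pd

# Hypothetical sample data with four columns.
csv_text = """name,age,city,salary
Alice,25,Boston,70000
Bob,30,Denver,82000
"""

# Only the listed columns are parsed; the rest are skipped.
df = pd.read_csv(io.StringIO(csv_text), usecols=["name", "salary"])
print(df.columns.tolist())
```

On wide files this can reduce both load time and memory use, since the skipped columns are never turned into DataFrame columns.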
# example 03 · handling missing values
Specify what values should be treated as NA
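One way this might look in practice is sketched below. The sample data and the placeholder strings "missing" and "-" are assumptions chosen for illustration; "NULL" is already among the strings pandas treats as missing by default, so it does not need to be listed.

```python
import io

import pandas as pd

# Hypothetical sample data using three different markers for missing values.
csv_text = """name,age,city
Alice,25,Boston
Bob,NULL,missing
Cara,-,Denver
"""

# na_values adds to the default list of missing-value strings.
df = pd.read_csv(io.StringIO(csv_text), na_values=["missing", "-"])
print(df)

# Count missing values per column to confirm the markers were recognized.
print(df.isna().sum())
```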
# challenges [2]
# challenge 01/02 · todo
Read this CSV string into a DataFrame and print the first 2 rows: 'item,qty\nApple,10\nBanana,25\nOrange,15'
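One possible approach, offered as a sketch and not as the expected answer, wraps the string in io.StringIO as shown in the theory section and uses head to limit the output.

```python
import io

import pandas as pd

# The CSV string from the challenge; \n separates the rows.
csv_string = "item,qty\nApple,10\nBanana,25\nOrange,15"

# StringIO lets read_csv treat the string like an open file.
df = pd.read_csv(io.StringIO(csv_string))

# head(2) returns the first two rows.
print(df.head(2))
```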
# challenge 02/02 · todo
Inspect the sales DataFrame and print how many non-null values each column contains.
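The sales DataFrame itself is not shown here, so the sketch below builds a small hypothetical stand-in with made-up columns. It illustrates two ways to see non-null counts: info prints a summary, and count returns the numbers as a Series.

```python
import pandas as pd

# Hypothetical stand-in for the sales DataFrame; the real columns may differ.
sales = pd.DataFrame({
    "product": ["A", "B", "C"],
    "units": [10, None, 7],
    "region": ["East", None, None],
})

# info() prints column names, non-null counts, and dtypes.
sales.info()

# count() returns the number of non-null values per column.
non_null_counts = sales.count()
print(non_null_counts)
```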
# project
# project-challenge
thread: SF Permits Analysis · reward: 50 xp
# brief
Before building reports, you need to check the data quality. Some permits are missing their Issued Date. Find and report any missing values in the dataset.
# task
Check Data Quality
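A minimal sketch of such a check is shown below. The permits data is a hypothetical stand-in: only the Issued Date column name comes from the brief, and the other columns and all values are invented for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the permits dataset.
permits = pd.DataFrame({
    "Permit Number": ["P-001", "P-002", "P-003", "P-004"],
    "Issued Date": ["2023-01-15", None, "2023-02-03", None],
    "Status": ["issued", "filed", "issued", "filed"],
})

# Missing values per column across the whole dataset.
missing_per_column = permits.isna().sum()
print(missing_per_column)

# Rows where the Issued Date specifically is missing.
missing_issued = permits[permits["Issued Date"].isna()]
print(f"{len(missing_issued)} permits have no Issued Date")
```

Checking every column first, then drilling into Issued Date, helps catch gaps the brief does not mention.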
# your code