Working with data is a core part of Python programming. Whether you're saving logs, loading spreadsheets, parsing JSON from APIs, or reading large datasets, everything begins with understanding how Python handles input and output.
This lesson covers text files, CSV, JSON, XML, binary formats like pickle/parquet, API requests, and a practical introduction to pandas DataFrames for real-world data analysis.
1) Why Data I/O Matters
Your programs become useful when they can talk to the outside world — files, folders, databases, or APIs. This lesson will help you:
- Read and write files safely using context managers
- Work with CSV, JSON, and XML
- Use binary formats like pickle and parquet
- Fetch JSON from web APIs using requests
- Load, explore, and filter data using pandas DataFrames
2) File Handling — Paths, Modes & Context Managers
Before dealing with structured formats, you need a solid foundation in file basics. Python works with files using the open() function and the context manager pattern:
from pathlib import Path
path = Path("data/notes.txt")
with path.open("w", encoding="utf-8") as f:
f.write("Hello, world!\n")
Why use with? It closes the file automatically — fewer bugs, cleaner code.
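Under the hood, with is roughly equivalent to a try/finally block. Here's a minimal sketch of what you'd have to write by hand without it:
f = open("data/notes.txt", "w", encoding="utf-8")
try:
    f.write("Hello, world!\n")
finally:
    f.close()  # runs even if write() raises, just like the with block does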
Common File Modes
- r – read (default)
- w – write (overwrites)
- a – append
- b – binary mode
- t – text mode (default)
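Modes can also be combined (for example, "rb" reads a file in binary). As a quick sketch, append mode adds to the end of a file instead of overwriting it:
with open("data/notes.txt", "a", encoding="utf-8") as f:
    f.write("One more line\n")  # "a" keeps existing content; "w" would erase it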
Reading Files
with open("data/notes.txt", "r", encoding="utf-8") as f:
    for line in f:
        print(line.strip())
3) Text Files — Read & Write
Write Text
with open("data/info.txt", "w", encoding="utf-8") as f:
f.write("Line 1\n")
f.write("Line 2\n")
Read Text
with open("data/info.txt", "r", encoding="utf-8") as f:
    content = f.read()
print(content)
Text files are flexible and simple, but when working with structured data, CSV and JSON are much more powerful.
4) CSV Files — Using the csv Module
CSVs represent rows and columns — perfect for spreadsheets, reports, and tabular datasets.
Reading CSV
import csv
with open("data/students.csv", newline="", encoding="utf-8") as f:
    reader = csv.DictReader(f)
    for row in reader:
        print(row["name"], row["score"])
Writing CSV
import csv
rows = [
{"name": "Ali", "score": 90},
{"name": "Ammar", "score": 85},
]
with open("data/students.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["name", "score"])
    writer.writeheader()
    writer.writerows(rows)
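If your rows are plain lists rather than dictionaries, csv.reader and csv.writer work by position instead of by field name. A small sketch reading the same file (note that csv always gives you strings):
import csv
with open("data/students.csv", newline="", encoding="utf-8") as f:
    reader = csv.reader(f)
    header = next(reader)  # first row holds the column names
    for row in reader:
        print(row)  # each value is a string, e.g. ['Ali', '90']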
5) JSON — The Most Popular Data Format
JSON looks like Python dictionaries, which makes it easy to work with. It's the standard for modern APIs.
Write JSON
import json
data = {"name": "Ammar", "score": 95}
with open("data/user.json", "w", encoding="utf-8") as f:
    json.dump(data, f, indent=4)
Read JSON
with open("data/user.json", "r", encoding="utf-8") as f:
    loaded = json.load(f)
print(loaded)
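When the JSON lives in a string rather than a file, say the body of an API response, use json.dumps and json.loads instead:
import json
text = json.dumps({"name": "Ammar", "score": 95})  # dict -> JSON string
data = json.loads(text)                            # JSON string -> dict
print(data["score"])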
6) XML Basics — Using xml.etree.ElementTree
XML appears in older APIs and enterprise systems. It’s verbose but still used in many workflows.
import xml.etree.ElementTree as ET
tree = ET.parse("data/books.xml")
root = tree.getroot()
for book in root.findall("book"):
    title = book.find("title").text
    print(title)
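The code above assumes a data/books.xml file with <book> elements that each contain a <title>. If you want to experiment without a file, ET.fromstring parses XML straight from a string (the sample below is made up for illustration):
import xml.etree.ElementTree as ET
xml_text = """
<catalog>
    <book><title>Python Basics</title></book>
    <book><title>Working with Data</title></book>
</catalog>
"""
root = ET.fromstring(xml_text)  # fromstring returns the root element directly
for book in root.findall("book"):
    print(book.find("title").text)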
7) Binary Formats — Pickle & Parquet
a) Pickle (Python Object Storage)
Warning: never unpickle untrusted data; loading a pickle can execute arbitrary code.
import pickle
data = {"a": [1, 2, 3]}
with open("data/data.pkl", "wb") as f:
    pickle.dump(data, f)
with open("data/data.pkl", "rb") as f:
    loaded = pickle.load(f)
b) Parquet (Efficient Columnar Storage)
Parquet files are great for large datasets. They're usually used through pandas, which needs the pyarrow or fastparquet package installed:
import pandas as pd
df = pd.read_csv("data/students.csv")
df.to_parquet("data/students.parquet", index=False)
df2 = pd.read_parquet("data/students.parquet")
8) Fetching API Data — Using requests
Most modern APIs return JSON. Here's how you grab it:
import requests
url = "https://api.github.com/repos/pandas-dev/pandas"
response = requests.get(url, timeout=10)
data = response.json()
print(data["full_name"])
print(data["stargazers_count"])
Always check the status code for errors before parsing the response:
response.raise_for_status()
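Putting it together, here's a sketch of a more defensive version of the same request. Any network call can time out or fail, so the errors are caught explicitly:
import requests
url = "https://api.github.com/repos/pandas-dev/pandas"
try:
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # raises HTTPError for 4xx/5xx responses
    data = response.json()
    print(data["full_name"])
except requests.RequestException as err:  # base class for connection errors, timeouts, HTTP errors
    print(f"Request failed: {err}")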
9) Introduction to Pandas & DataFrames
Pandas is the most popular library for data analysis in Python. A DataFrame is a 2D table with labeled columns — perfect for working with CSV, Excel, JSON, APIs, and more.
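Before loading any files, it helps to see that a DataFrame can be built straight from Python data. A minimal sketch with made-up values:
import pandas as pd
df = pd.DataFrame({
    "name": ["Ali", "Ammar"],  # each key becomes a labeled column
    "score": [90, 85],
})
print(df)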
Reading CSV into a DataFrame
import pandas as pd
df = pd.read_csv("data/students.csv")
print(df.head())
Reading Excel
df_excel = pd.read_excel("data/sales.xlsx", sheet_name="Sheet1")  # .xlsx support needs the openpyxl package
Inspecting Data
print(df.info())
print(df.describe())
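A few more quick checks that are often handy while exploring (using the same df):
print(df.shape)    # (number of rows, number of columns)
print(df.columns)  # column labels
print(df.dtypes)   # data type of each column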
Selecting & Filtering Data
# Select columns
print(df[["name", "score"]])
# Filter rows
high_scores = df[df["score"] >= 90]
print(high_scores)
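Conditions can be combined with & (and) and | (or). Each condition needs its own parentheses because of operator precedence:
# Rows where score is between 80 and 95 (inclusive)
mid_scores = df[(df["score"] >= 80) & (df["score"] <= 95)]
print(mid_scores)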
Exporting Data
df.to_csv("data/students_clean.csv", index=False)
df.to_excel("data/students_clean.xlsx", index=False)
df.to_parquet("data/students.parquet", index=False)
API JSON → DataFrame
import requests
import pandas as pd
url = "https://api.coindesk.com/v1/bpi/currentprice.json"
data = requests.get(url, timeout=10).json()
df = pd.DataFrame(data["bpi"]).T
print(df)
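For more deeply nested JSON, pd.json_normalize can flatten it into columns. A minimal sketch on a made-up payload:
import pandas as pd
payload = {"user": {"name": "Ammar", "stats": {"score": 95}}}  # made-up nested JSON
flat = pd.json_normalize(payload)
print(flat.columns.tolist())  # ['user.name', 'user.stats.score']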
10) Mini Project — Simple Sales Report Using Pandas
import pandas as pd
# Load data
df = pd.read_csv("data/sales.csv")
# Clean and filter
df = df.dropna()
df = df[df["quantity"] > 0]
# Compute revenue
df["total"] = df["price"] * df["quantity"]
report = df.groupby("product")["total"].sum()
# Export summary
report.to_csv("data/sales_summary.csv")
report.to_excel("data/sales_summary.xlsx")
print("Report generated!")
Key Takeaways
- Use context managers (with open) for safe file handling
- CSV is great for tables, JSON for nested/API data
- XML still appears in enterprise systems
- Pickle saves Python objects, but only for trusted data
- Parquet is efficient for large datasets
- requests makes API calls simple
- Pandas DataFrames let you filter, analyze, and export data easily