Working with data is a core part of Python programming. Whether you're saving logs, loading spreadsheets, parsing JSON from APIs, or reading large datasets, everything begins with understanding how Python handles input and output.
This lesson covers text files, CSV, JSON, XML, binary formats like pickle/parquet, API requests, and a practical introduction to pandas DataFrames for real-world data analysis.
1) Why Data I/O Matters
Your programs become useful when they can talk to the outside world — files, folders, databases, or APIs. This lesson will help you:
- Read and write files safely using context managers
- Work with CSV, JSON, and XML
- Use binary formats like pickle and parquet
- Fetch JSON from web APIs using requests
- Load, explore, and filter data using pandas DataFrames
2) File Handling — Paths, Modes & Context Managers
Before dealing with structured formats, you need a solid foundation in file basics. Python works with files using the open() function and the context manager pattern:
from pathlib import Path
path = Path("data/notes.txt")
with path.open("w", encoding="utf-8") as f:
f.write("Hello, world!\n")
Why use with? It closes the file automatically — fewer bugs, cleaner code.
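Under the hood, with is roughly equivalent to a try/finally block. Here's a minimal sketch of what you'd have to write by hand without it:
f = open("data/notes.txt", "w", encoding="utf-8")
try:
    f.write("Hello, world!\n")
finally:
    f.close()  # runs even if write() raises, just like the with block does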
Common File Modes
- r – read (default)
- w – write (overwrites)
- a – append
- b – binary mode
- t – text mode (default)
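Modes can also be combined (for example, "rb" reads a file in binary). As a quick sketch, append mode adds to the end of a file instead of overwriting it:
with open("data/notes.txt", "a", encoding="utf-8") as f:
    f.write("One more line\n")  # "a" keeps existing content; "w" would erase it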
Reading Files
with open("data/notes.txt", "r", encoding="utf-8") as f:
    for line in f:
        print(line.strip())
3) Text Files — Read & Write
Write Text
with open("data/info.txt", "w", encoding="utf-8") as f:
f.write("Line 1\n")
f.write("Line 2\n")
Read Text
with open("data/info.txt", "r", encoding="utf-8") as f:
    content = f.read()
print(content)
Text files are flexible and simple, but when working with structured data, CSV and JSON are much more powerful.
4) CSV Files — Using the csv Module
CSVs represent rows and columns — perfect for spreadsheets, reports, and tabular datasets.
Reading CSV
import csv
with open("data/students.csv", newline="", encoding="utf-8") as f:
    reader = csv.DictReader(f)
    for row in reader:
        print(row["name"], row["score"])
Writing CSV
import csv
rows = [
{"name": "Ali", "score": 90},
{"name": "Ammar", "score": 85},
]
with open("data/students.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["name", "score"])
    writer.writeheader()
    writer.writerows(rows)
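If your rows are plain lists rather than dictionaries, csv.reader and csv.writer work by position instead of by field name. A small sketch reading the same file (note that csv always gives you strings):
import csv
with open("data/students.csv", newline="", encoding="utf-8") as f:
    reader = csv.reader(f)
    header = next(reader)  # first row holds the column names
    for row in reader:
        print(row)  # each value is a string, e.g. ['Ali', '90']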
5) JSON — The Most Popular Data Format
JSON looks like Python dictionaries, which makes it easy to work with. It's the standard for modern APIs.
Write JSON
import json
data = {"name": "Ammar", "score": 95}
with open("data/user.json", "w", encoding="utf-8") as f:
    json.dump(data, f, indent=4)
Read JSON
with open("data/user.json", "r", encoding="utf-8") as f:
    loaded = json.load(f)
print(loaded)
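When the JSON lives in a string rather than a file, say the body of an API response, use json.dumps and json.loads instead:
import json
text = json.dumps({"name": "Ammar", "score": 95})  # dict -> JSON string
data = json.loads(text)                            # JSON string -> dict
print(data["score"])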
6) XML Basics — Using xml.etree.ElementTree
XML appears in older APIs and enterprise systems. It’s verbose but still used in many workflows.
import xml.etree.ElementTree as ET
tree = ET.parse("data/books.xml")
root = tree.getroot()
for book in root.findall("book"):
    title = book.find("title").text
    print(title)
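The code above assumes a data/books.xml file with <book> elements that each contain a <title>. If you want to experiment without a file, ET.fromstring parses XML straight from a string (the sample below is made up for illustration):
import xml.etree.ElementTree as ET
xml_text = """
<catalog>
    <book><title>Python Basics</title></book>
    <book><title>Working with Data</title></book>
</catalog>
"""
root = ET.fromstring(xml_text)  # fromstring returns the root element directly
for book in root.findall("book"):
    print(book.find("title").text)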
7) Binary Formats — Pickle & Parquet
a) Pickle (Python Object Storage)
Warning: never unpickle untrusted data; loading a pickle can execute arbitrary code.
import pickle
data = {"a": [1, 2, 3]}
with open("data/data.pkl", "wb") as f:
    pickle.dump(data, f)
with open("data/data.pkl", "rb") as f:
    loaded = pickle.load(f)
b) Parquet (Efficient Columnar Storage)
Parquet files are great for large datasets. They're usually used through pandas, which needs the pyarrow or fastparquet package installed:
import pandas as pd
df = pd.read_csv("data/students.csv")
df.to_parquet("data/students.parquet", index=False)
df2 = pd.read_parquet("data/students.parquet")
8) Fetching API Data — Using requests
Most modern APIs return JSON. Here's how you grab it:
import requests
url = "https://api.github.com/repos/pandas-dev/pandas"
response = requests.get(url, timeout=10)
data = response.json()
print(data["full_name"])
print(data["stargazers_count"])
Always check the status code for errors before parsing the response:
response.raise_for_status()
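Putting it together, here's a sketch of a more defensive version of the same request. Any network call can time out or fail, so the errors are caught explicitly:
import requests
url = "https://api.github.com/repos/pandas-dev/pandas"
try:
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # raises HTTPError for 4xx/5xx responses
    data = response.json()
    print(data["full_name"])
except requests.RequestException as err:  # base class for connection errors, timeouts, HTTP errors
    print(f"Request failed: {err}")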
9) Introduction to Pandas & DataFrames
Pandas is the most popular library for data analysis in Python. A DataFrame is a 2D table with labeled columns — perfect for working with CSV, Excel, JSON, APIs, and more.
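Before loading any files, it helps to see that a DataFrame can be built straight from Python data. A minimal sketch with made-up values:
import pandas as pd
df = pd.DataFrame({
    "name": ["Ali", "Ammar"],  # each key becomes a labeled column
    "score": [90, 85],
})
print(df)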
Reading CSV into a DataFrame
import pandas as pd
df = pd.read_csv("data/students.csv")
print(df.head())
Reading Excel
df_excel = pd.read_excel("data/sales.xlsx", sheet_name="Sheet1")  # .xlsx support needs the openpyxl package
Inspecting Data
print(df.info())
print(df.describe())
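A few more quick checks that are often handy while exploring (using the same df):
print(df.shape)    # (number of rows, number of columns)
print(df.columns)  # column labels
print(df.dtypes)   # data type of each column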
Selecting & Filtering Data
# Select columns
print(df[["name", "score"]])
# Filter rows
high_scores = df[df["score"] >= 90]
print(high_scores)
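Conditions can be combined with & (and) and | (or). Each condition needs its own parentheses because of operator precedence:
# Rows where score is between 80 and 95 (inclusive)
mid_scores = df[(df["score"] >= 80) & (df["score"] <= 95)]
print(mid_scores)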
Exporting Data
df.to_csv("data/students_clean.csv", index=False)
df.to_excel("data/students_clean.xlsx", index=False)
df.to_parquet("data/students.parquet", index=False)
API JSON → DataFrame
import requests
import pandas as pd
url = "https://api.coindesk.com/v1/bpi/currentprice.json"
data = requests.get(url, timeout=10).json()
df = pd.DataFrame(data["bpi"]).T
print(df)
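For more deeply nested JSON, pd.json_normalize can flatten it into columns. A minimal sketch on a made-up payload:
import pandas as pd
payload = {"user": {"name": "Ammar", "stats": {"score": 95}}}  # made-up nested JSON
flat = pd.json_normalize(payload)
print(flat.columns.tolist())  # ['user.name', 'user.stats.score']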
10) Mini Project — Simple Sales Report Using Pandas
import pandas as pd
# Load data
df = pd.read_csv("data/sales.csv")
# Clean and filter
df = df.dropna()
df = df[df["quantity"] > 0]
# Compute revenue
df["total"] = df["price"] * df["quantity"]
report = df.groupby("product")["total"].sum()
# Export summary
report.to_csv("data/sales_summary.csv")
report.to_excel("data/sales_summary.xlsx")
print("Report generated!")
Key Takeaways
- Use context managers (with open) for safe file handling
- CSV is great for tables, JSON for nested/API data
- XML still appears in enterprise systems
- Pickle saves Python objects, but only for trusted data
- Parquet is efficient for large datasets
- requests makes API calls simple
- Pandas DataFrames let you filter, analyze, and export data easily