Python Data Input & Output: Files, JSON, CSV, APIs & Pandas

Working with data is a core part of Python programming. Whether you're saving logs, loading spreadsheets, parsing JSON from APIs, or reading large datasets, everything begins with understanding how Python handles input and output.

This lesson covers text files, CSV, JSON, XML, binary formats like pickle/parquet, API requests, and a practical introduction to pandas DataFrames for real-world data analysis.


1) Why Data I/O Matters

Your programs become useful when they can talk to the outside world — files, folders, databases, or APIs. This lesson will help you:

  • Read and write files safely using context managers
  • Work with CSV, JSON, and XML
  • Use binary formats like pickle and parquet
  • Fetch JSON from web APIs using requests
  • Load, explore, and filter data using pandas DataFrames

2) File Handling — Paths, Modes & Context Managers

Before dealing with structured formats, you need a solid foundation in file basics. Python opens files with the built-in open() function (or Path.open() on a pathlib path) combined with the context manager pattern:


from pathlib import Path

path = Path("data/notes.txt")

with path.open("w", encoding="utf-8") as f:
    f.write("Hello, world!\n")

Why use with? It closes the file automatically, even if an exception occurs mid-write — fewer bugs, cleaner code.

Common File Modes

  • r – read (default)
  • w – write (overwrites)
  • a – append
  • b – binary mode
  • t – text mode (default)
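The difference between "w" and "a" matters in practice: "w" overwrites, while "a" accumulates. A minimal self-contained sketch (using a temporary directory so it doesn't touch your real files):

```python
from pathlib import Path
import tempfile

# A temporary directory keeps the sketch self-contained
with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "app.log"

    # "w" creates the file (or overwrites an existing one)
    with log.open("w", encoding="utf-8") as f:
        f.write("started\n")

    # "a" appends instead of overwriting
    with log.open("a", encoding="utf-8") as f:
        f.write("stopped\n")

    print(log.read_text(encoding="utf-8"))  # both lines survive
```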

Reading Files


with open("data/notes.txt", "r", encoding="utf-8") as f:
    for line in f:
        print(line.strip())

3) Text Files — Read & Write

Write Text


with open("data/info.txt", "w", encoding="utf-8") as f:
    f.write("Line 1\n")
    f.write("Line 2\n")

Read Text


with open("data/info.txt", "r", encoding="utf-8") as f:
    content = f.read()
    print(content)

Text files are flexible and simple, but when working with structured data, CSV and JSON are much more powerful.


4) CSV Files — Using the csv Module

CSVs represent rows and columns — perfect for spreadsheets, reports, and tabular datasets.

Reading CSV


import csv

with open("data/students.csv", newline="", encoding="utf-8") as f:
    reader = csv.DictReader(f)
    for row in reader:
        print(row["name"], row["score"])

Writing CSV


import csv

rows = [
    {"name": "Ali", "score": 90},
    {"name": "Ammar", "score": 85},
]

with open("data/students.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["name", "score"])
    writer.writeheader()
    writer.writerows(rows)
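If you don't need column names, csv.reader yields each row as a plain list of strings. A small sketch using io.StringIO in place of a real file so it runs anywhere:

```python
import csv
import io

# StringIO stands in for a file on disk
raw = "name,score\nAli,90\nAmmar,85\n"

reader = csv.reader(io.StringIO(raw))
header = next(reader)           # first row is the header
rows = [row for row in reader]  # each row is a list of strings

print(header)  # ['name', 'score']
print(rows)    # [['Ali', '90'], ['Ammar', '85']]
```

Note that the csv module returns everything as strings; convert the scores with int() if you need numbers.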

5) JSON — The Most Popular Data Format

JSON looks like Python dictionaries, which makes it easy to work with. It's the standard for modern APIs.

Write JSON


import json

data = {"name": "Ammar", "score": 95}

with open("data/user.json", "w", encoding="utf-8") as f:
    json.dump(data, f, indent=4)

Read JSON


with open("data/user.json", "r", encoding="utf-8") as f:
    loaded = json.load(f)
    print(loaded)
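Besides files, the json module also converts to and from strings with json.dumps and json.loads — which is exactly what you receive from an API response body:

```python
import json

data = {"name": "Ammar", "score": 95}

# Serialize to a JSON string (what an API would send over the wire)
text = json.dumps(data)

# Parse it back into a Python dict
parsed = json.loads(text)

print(text)             # {"name": "Ammar", "score": 95}
print(parsed["score"])  # 95
```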

6) XML Basics — Using xml.etree.ElementTree

XML appears in older APIs and enterprise systems. It’s verbose but still used in many workflows.


import xml.etree.ElementTree as ET

tree = ET.parse("data/books.xml")
root = tree.getroot()

for book in root.findall("book"):
    title = book.find("title")
    if title is not None:  # guard: find() returns None if <title> is missing
        print(title.text)
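You can also parse XML from a string with ET.fromstring, which makes for a quick self-contained demo (the books.xml layout assumed here — a root element containing <book><title> children — mirrors the example above):

```python
import xml.etree.ElementTree as ET

xml_text = """
<library>
    <book><title>Fluent Python</title></book>
    <book><title>Python Cookbook</title></book>
</library>
"""

# fromstring returns the root element directly (no tree/getroot step)
root = ET.fromstring(xml_text)
titles = [book.find("title").text for book in root.findall("book")]
print(titles)  # ['Fluent Python', 'Python Cookbook']
```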

7) Binary Formats — Pickle & Parquet

a) Pickle (Python Object Storage)

Warning: Never unpickle untrusted data.


import pickle

data = {"a": [1, 2, 3]}

with open("data/data.pkl", "wb") as f:
    pickle.dump(data, f)

with open("data/data.pkl", "rb") as f:
    loaded = pickle.load(f)

b) Parquet (Efficient Columnar Storage)

Parquet files are great for large datasets. They're usually used through pandas:


import pandas as pd

df = pd.read_csv("data/students.csv")
df.to_parquet("data/students.parquet", index=False)

df2 = pd.read_parquet("data/students.parquet")

8) Fetching API Data — Using requests

Most modern APIs return JSON. Here's how you grab it:


import requests

url = "https://api.github.com/repos/pandas-dev/pandas"
response = requests.get(url, timeout=10)
data = response.json()

print(data["full_name"])
print(data["stargazers_count"])

Always check for HTTP errors: raise_for_status() raises requests.exceptions.HTTPError for any 4xx or 5xx response, so failures don't slip by silently.


response.raise_for_status()

9) Introduction to Pandas & DataFrames

Pandas is the most popular library for data analysis in Python. A DataFrame is a 2D table with labeled columns — perfect for working with CSV, Excel, JSON, APIs, and more.
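DataFrames don't have to come from files — you can build one directly from in-memory Python data, such as a list of dicts (handy for quick experiments, or for JSON pulled from an API):

```python
import pandas as pd

records = [
    {"name": "Ali", "score": 90},
    {"name": "Ammar", "score": 85},
]

# Each dict becomes a row; keys become column labels
df = pd.DataFrame(records)

print(df.shape)          # (2, 2)
print(list(df.columns))  # ['name', 'score']
```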

Reading CSV into a DataFrame


import pandas as pd

df = pd.read_csv("data/students.csv")
print(df.head())

Reading Excel


df_excel = pd.read_excel("data/sales.xlsx", sheet_name="Sheet1")

Inspecting Data


print(df.info())
print(df.describe())

Selecting & Filtering Data


# Select columns
print(df[["name", "score"]])

# Filter rows
high_scores = df[df["score"] >= 90]
print(high_scores)
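Filtering pairs naturally with sorting: sort_values orders rows by a column. A sketch using an in-memory DataFrame so it runs without the CSV file:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Ali", "Ammar", "Sara"],
    "score": [90, 85, 95],
})

# Highest scores first
top = df.sort_values("score", ascending=False)
print(top["name"].tolist())  # ['Sara', 'Ali', 'Ammar']
```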

Exporting Data


df.to_csv("data/students_clean.csv", index=False)
df.to_excel("data/students_clean.xlsx", index=False)
df.to_parquet("data/students.parquet", index=False)

API JSON → DataFrame


import requests
import pandas as pd

url = "https://api.coindesk.com/v1/bpi/currentprice.json"
data = requests.get(url, timeout=10).json()

df = pd.DataFrame(data["bpi"]).T
print(df)

10) Mini Project — Simple Sales Report Using Pandas


import pandas as pd

# Load data
df = pd.read_csv("data/sales.csv")

# Clean and filter
df = df.dropna()
df = df[df["quantity"] > 0]

# Compute revenue
df["total"] = df["price"] * df["quantity"]
report = df.groupby("product")["total"].sum()

# Export summary
report.to_csv("data/sales_summary.csv")
report.to_excel("data/sales_summary.xlsx")

print("Report generated!")

Key Takeaways

  • Use context managers (with open) for safe file handling
  • CSV is great for tables, JSON for nested/API data
  • XML still appears in enterprise systems
  • Pickle saves Python objects, but only for trusted data
  • Parquet is efficient for large datasets
  • requests makes API calls simple
  • Pandas DataFrames let you filter, analyze, and export data easily

Next up: Modules, CLI Apps & Environment Variables

Ahmad Ali
