Why JSON to CSV Is Non-Trivial
JSON allows nested objects and arrays:
{
"user": {
"id": 1,
"name": "Alice",
"address": {
"city": "Boston",
"country": "USA"
},
"orders": [
{ "id": 100, "amount": 50 },
{ "id": 101, "amount": 75 }
]
}
}
CSV is flat:
id,name,city,country
1,Alice,Boston,USA
Converting nested JSON to CSV requires deciding:
- Flatten nested objects (separate columns for address.city and address.country)
- Explode arrays (create one row per array element)
- Keep nested as JSON strings in cells
Different decisions produce dramatically different CSV outputs. This post covers the practical pandas workflow. For broader CSV context, see XLS vs XLSX vs CSV.
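To make the three options concrete, here is a minimal sketch that applies each one to the sample user record (without the outer "user" wrapper). The variable names are illustrative, and dropping the orders column in the first variant is an assumption about what a "flatten only" export would want.

```python
import json
import pandas as pd

record = {
    "id": 1,
    "name": "Alice",
    "address": {"city": "Boston", "country": "USA"},
    "orders": [{"id": 100, "amount": 50}, {"id": 101, "amount": 75}],
}

# 1. Flatten nested objects: address.* become columns; the array is dropped.
flat = pd.json_normalize(record).drop(columns=["orders"])

# 2. Explode the array: one row per order, with the user's name repeated.
exploded = pd.json_normalize(record, record_path="orders", meta=["name"])

# 3. Keep the array, but serialize it to a JSON string in a single cell.
as_string = pd.json_normalize(record)
as_string["orders"] = as_string["orders"].apply(json.dumps)

for label, frame in [("flat", flat), ("exploded", exploded), ("as_string", as_string)]:
    print(label)
    print(frame.to_csv(index=False))
```

The same record yields one row with four columns, two rows with three columns, or one row with five columns, depending on the choice.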
Pandas Approach to Flat JSON
For simple JSON arrays:
import pandas as pd
# JSON array of flat objects
data = [
{"id": 1, "name": "Alice", "city": "Boston"},
{"id": 2, "name": "Bob", "city": "Seattle"},
]
df = pd.DataFrame(data)
df.to_csv("output.csv", index=False)
Most "data export" JSON is already flat like this and converts cleanly.
Flattening Nested Objects
For JSON with nested objects (not arrays):
import pandas as pd
import json
with open("input.json") as f:
    data = json.load(f)
# pandas json_normalize handles nested
df = pd.json_normalize(data)
# Nested keys become 'parent.child' columns
df.to_csv("output.csv", index=False)
Result when data is just the user object (without the outer "user" key or the orders array):
id,name,address.city,address.country
1,Alice,Boston,USA
The dot notation indicates nesting. Tool-friendly but ugly. Rename for production:
df.rename(columns={"address.city": "city", "address.country": "country"}, inplace=True)
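When there are many nested columns, renaming them one by one gets tedious. The sketch below shows two alternatives: the sep parameter of json_normalize, and a rename that strips a known prefix from every column. The sample record and the "address." prefix are assumptions for illustration.

```python
import pandas as pd

user = {"id": 1, "name": "Alice", "address": {"city": "Boston", "country": "USA"}}

# sep sets the joiner between parent and child keys (the default is ".").
df_underscore = pd.json_normalize(user, sep="_")

# rename accepts a function, so a known prefix can be stripped in one pass.
# str.removeprefix needs Python 3.9 or newer.
df_short = pd.json_normalize(user).rename(
    columns=lambda col: col.removeprefix("address.")
)

print(list(df_underscore.columns))
print(list(df_short.columns))
```

Stripping prefixes is only safe when the shortened names stay unique; two nested objects that both contain a "city" key would collide.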
For batch processing, see Batch Processing Files Guide.
Exploding Arrays
For JSON with array fields:
import pandas as pd
data = {
"user": "Alice",
"orders": [{"id": 100, "amount": 50}, {"id": 101, "amount": 75}]
}
df = pd.json_normalize(data, record_path="orders", meta="user")
Result:
id,amount,user
100,50,Alice
101,75,Alice
The array is "exploded" into multiple rows. The non-array fields are repeated for each row.
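json_normalize works best when every element of the array at record_path is an object with the same shape. Arrays that mix types usually need pre-processing first (see Mixed Array Types below), and complex JSON often needs custom handling.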
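One pitfall worth sketching: when the parent and the array elements share a key name (here both have "id"), json_normalize refuses to merge them and raises a ValueError. The record_prefix and meta_prefix parameters resolve the clash. The prefixes "order_" and "user_" are arbitrary choices for this example.

```python
import pandas as pd

user = {
    "id": 1,
    "name": "Alice",
    "orders": [{"id": 100, "amount": 50}, {"id": 101, "amount": 75}],
}

conflict = None
try:
    # Both the user and each order define "id", so the column names collide.
    pd.json_normalize(user, record_path="orders", meta=["id", "name"])
except ValueError as exc:
    conflict = str(exc)

# Prefixing each side keeps both ids in separate columns.
df = pd.json_normalize(
    user,
    record_path="orders",
    meta=["id", "name"],
    record_prefix="order_",
    meta_prefix="user_",
)

print(conflict)
print(df.to_csv(index=False))
```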
Multi-level Nesting
For deeply nested JSON:
{
"company": "Acme",
"departments": [
{
"name": "Eng",
"employees": [{ "name": "Alice" }, { "name": "Bob" }]
},
{
"name": "Sales",
"employees": [{ "name": "Charlie" }]
}
]
}
To produce one row per employee:
data = {
"company": "Acme",
"departments": [
{"name": "Eng", "employees": [{"name": "Alice"}, {"name": "Bob"}]},
{"name": "Sales", "employees": [{"name": "Charlie"}]}
]
}
# Multi-level flatten
df = pd.json_normalize(
data,
record_path=["departments", "employees"],
meta=["company", ["departments", "name"]]
)
Result:
name,company,departments.name
Alice,Acme,Eng
Bob,Acme,Eng
Charlie,Acme,Sales
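Here record_path is the list of keys leading down to the innermost array, and each meta entry is either a top-level key or a list of keys giving the path to a field at an intermediate level.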
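For nested objects (as opposed to nested arrays), the max_level parameter limits how deep json_normalize flattens. A small sketch, using a made-up record with a "geo" object two levels down:

```python
import pandas as pd

record = {
    "id": 1,
    "address": {"city": "Boston", "geo": {"lat": 42.36, "lon": -71.06}},
}

# Default: every level of nested objects is flattened into dotted columns.
full = pd.json_normalize(record)

# max_level=1: flatten one level only; anything deeper stays a dict in its cell.
shallow = pd.json_normalize(record, max_level=1)

print(sorted(full.columns))
print(sorted(shallow.columns))
```

Capping the depth can be useful when the deeper levels are irregular and are better stored as a JSON string than spread across sparse columns.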
Mixed Array Types
Arrays of different types (strings + objects mixed) are tricky:
{
"user": "Alice",
"tags": ["important", { "label": "VIP", "level": 3 }]
}
Solution: pre-process to normalize types:
# Convert all elements to dicts
def normalize(item):
    return item if isinstance(item, dict) else {"value": item}
data["tags"] = [normalize(t) for t in data["tags"]]
For complex mixed-type arrays: write custom handling logic.
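To show where that normalization step leads, here is a minimal end-to-end sketch: wrap the bare values, then explode the tags with json_normalize. The "value" key is the same placeholder used above; the resulting frame is sparse because the two tag shapes share no keys.

```python
import pandas as pd

data = {"user": "Alice", "tags": ["important", {"label": "VIP", "level": 3}]}

def normalize(item):
    """Wrap bare values in a dict so every array element is an object."""
    return item if isinstance(item, dict) else {"value": item}

data["tags"] = [normalize(tag) for tag in data["tags"]]

# One row per tag; keys missing from a given tag become NaN in that row.
df = pd.json_normalize(data, record_path="tags", meta="user")

print(df.to_csv(index=False))
```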
Streaming Large JSON
For multi-GB JSON files that don't fit in memory:
import ijson

def iter_records(path):
    # Assumes the records live in a top-level "items" array
    with open(path, "rb") as f:
        for record in ijson.items(f, "items.item"):
            # Hand each record to the caller as soon as it is parsed
            yield record
ijson is a streaming JSON parser: it never loads the entire file into memory.
For batch CSV writing:
import ijson
import csv
with open("large.json", "rb") as fin, open("output.csv", "w", newline="", encoding="utf-8") as fout:
    writer = csv.DictWriter(fout, fieldnames=["id", "name", "amount"])
    writer.writeheader()
    # Write each record as soon as it is parsed; nothing accumulates in memory
    for record in ijson.items(fin, "items.item"):
        writer.writerow({
            "id": record["id"],
            "name": record["name"],
            "amount": record["amount"]
        })
Streaming approach handles arbitrarily large JSON files.
CSV to JSON
For the reverse direction:
import pandas as pd
df = pd.read_csv("input.csv")
df.to_json("output.json", orient="records", indent=2)
orient="records" produces an array of objects (most common JSON format).
Other orients:
- index: JSON object keyed by row index
- columns: JSON object keyed by column name
- values: just the data, no metadata
- split: separate metadata and data
- table: with schema metadata
For most CSV-to-JSON: orient="records".
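The differences between orients are easiest to see side by side. This sketch serializes a tiny two-row frame (made up for illustration) with each orient and parses the result back so the shapes print as Python objects; "table" is left out because its schema block is long.

```python
import json
import pandas as pd

df = pd.DataFrame({"id": [1, 2], "name": ["Alice", "Bob"]})

# to_json returns a string when no path is given.
shapes = {
    orient: json.loads(df.to_json(orient=orient))
    for orient in ["records", "index", "columns", "values", "split"]
}

for orient, parsed in shapes.items():
    print(f"{orient}: {parsed}")
```

Note that index and columns turn the integer row labels into string keys, since JSON object keys are always strings.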
Pivot and Aggregation
For JSON like:
[
{ "date": "2026-01", "product": "A", "sales": 100 },
{ "date": "2026-01", "product": "B", "sales": 150 },
{ "date": "2026-02", "product": "A", "sales": 120 },
{ "date": "2026-02", "product": "B", "sales": 180 }
]
Pivot to wide format:
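# convert_dates=False keeps "2026-01" as text; by default read_json parses a column named "date"
df = pd.read_json("data.json", convert_dates=False)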
pivoted = df.pivot(index="date", columns="product", values="sales")
pivoted.to_csv("output.csv")
Result:
date,A,B
2026-01,100,150
2026-02,120,180
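For complex pivots, or when the same index/column pair appears more than once (which makes pivot raise an error), use pandas's pivot_table with an aggregation function.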
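As a rough sketch of the pivot_table route, the data below adds a second January row for product A (hypothetical, not from the sample above) so that the date/product pair is duplicated. Summing is one reasonable aggregation; mean or max would work the same way.

```python
import pandas as pd

rows = [
    {"date": "2026-01", "product": "A", "sales": 100},
    {"date": "2026-01", "product": "A", "sales": 30},   # duplicate date/product pair
    {"date": "2026-01", "product": "B", "sales": 150},
    {"date": "2026-02", "product": "A", "sales": 120},
]
df = pd.DataFrame(rows)

# aggfunc combines duplicates; fill_value replaces combinations with no data.
pivoted = df.pivot_table(
    index="date", columns="product", values="sales", aggfunc="sum", fill_value=0
)

print(pivoted.to_csv())
```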
Common Issues
Numbers showing as strings: JSON values are typed but CSV cells are not, and a column with quoted or mixed values may load as text. Force types:
df = pd.read_json("data.json", dtype={"id": int, "amount": float})
Date format inconsistent: parse explicitly:
df["date"] = pd.to_datetime(df["date"])
df["date"] = df["date"].dt.strftime("%Y-%m-%d")
Encoding issues with special characters: ensure UTF-8 throughout:
df.to_csv("output.csv", encoding="utf-8", index=False)
Memory error on large JSON: use streaming with ijson.
Nested JSON in CSV cells: store as JSON string:
df["nested"] = df["nested"].apply(json.dumps)
df.to_csv("output.csv", index=False)
For batch CSV processing, see Batch Text Replacement in CSV.
Tools Beyond Pandas
| Tool | Use case |
|---|---|
| jq (command-line) | Quick JSON manipulation |
| miller (mlr) | CSV/JSON command-line conversion |
| csvkit | CSV-focused tools |
| jc | Convert command output to JSON |
| dasel | Multi-format query language |
For one-off conversions: jq or miller. For complex transformations or pipelines: pandas.
# jq example: extract specific field
jq -r '.users[] | [.id, .name, .email] | @csv' input.json > output.csv
# miller example: nested JSON to flat CSV
mlr --ijson --ocsv flatten input.json > output.csv
Frequently Asked Questions
Should I use pandas or jq for JSON to CSV?
For one-off command-line work: jq. For complex programmatic transformations or large data: pandas.
How do I handle null values?
Pandas treats JSON null as a missing value (NaN or None) and writes it to CSV as an empty field by default. To write an explicit marker instead:
df.to_csv("output.csv", na_rep="NULL")
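A quick way to compare the two behaviours is to render the CSV to a string instead of a file. The two-row frame here is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame([{"id": 1, "name": "Alice"}, {"id": 2, "name": None}])

# With no path argument, to_csv returns the CSV text.
default_csv = df.to_csv(index=False)
marked_csv = df.to_csv(index=False, na_rep="NULL")

print(default_csv)
print(marked_csv)
```

Whatever marker is chosen, the consumer of the CSV has to know about it; an empty field is the more widely understood convention.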
What about Excel format?
df.to_excel("output.xlsx", index=False, engine="openpyxl")
For Excel-specific work, see XLS vs XLSX vs CSV.
Can I convert JSON Lines (JSONL)?
Yes:
df = pd.read_json("input.jsonl", lines=True)
df.to_csv("output.csv", index=False)
JSONL has one JSON object per line. Common for log files and streaming data.
Performance for very large files?
Streaming with ijson and the csv module keeps memory use roughly constant regardless of file size. Pandas with chunking works for moderate sizes.
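For the pandas chunking route, one option is read_json with lines=True and chunksize, which applies to JSON Lines input. The sketch below writes a tiny five-row JSONL file to a temporary directory so it is self-contained; the file names and the chunk size of 2 are placeholders, and a real job would use a far larger chunk.

```python
import os
import tempfile
import pandas as pd

tmpdir = tempfile.mkdtemp()
src = os.path.join(tmpdir, "input.jsonl")
dst = os.path.join(tmpdir, "output.csv")

# Create a small JSON Lines file to stand in for a large one.
with open(src, "w", encoding="utf-8") as f:
    for i in range(5):
        f.write('{"id": %d, "amount": %d}\n' % (i, i * 10))

# chunksize makes read_json return an iterator of DataFrames.
with pd.read_json(src, lines=True, chunksize=2) as reader:
    for i, chunk in enumerate(reader):
        # The first chunk creates the file and writes the header; later chunks append.
        chunk.to_csv(dst, mode="w" if i == 0 else "a", header=(i == 0), index=False)

print(pd.read_csv(dst))
```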
How do I keep nested structure as a single CSV cell?
df["nested"] = df["nested"].apply(json.dumps)
df.to_csv("output.csv", index=False)
Stored as JSON string. Re-parseable later.
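A round trip makes the "re-parseable later" point concrete. This sketch writes to an in-memory buffer rather than a file, and the nested value is invented for the example.

```python
import io
import json
import pandas as pd

df = pd.DataFrame({"id": [1], "nested": [{"city": "Boston", "tags": ["a", "b"]}]})

# Serialize the nested value so it fits in one CSV cell.
df["nested"] = df["nested"].apply(json.dumps)

buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)

# Reading back gives a plain string; json.loads restores the structure.
restored = pd.read_csv(buffer)
restored["nested"] = restored["nested"].apply(json.loads)

print(restored.loc[0, "nested"])
```

The CSV writer quotes the cell and doubles the embedded quotation marks, so the commas inside the JSON do not split the column.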
Bottom Line
For JSON to CSV conversion: pandas with json_normalize for flattening nested objects, with record_path for exploding arrays. For large files: ijson streaming. For one-off work: jq or miller. Always handle types and encoding explicitly. Our document converter handles related format conversions in pipelines.



