
Python JSON Parsing Best Practices


Install: `npx skills add basher83/lunar-claude/basher83-lunar-claude-python-json-parsing`


Comprehensive guide to JSON parsing in Python with focus on performance, security, and scalability.

Quick Start

Basic JSON Parsing

```python
import json

# Parse a JSON string
data = json.loads('{"name": "Alice", "age": 30}')

# Parse a JSON file
with open("data.json", "r", encoding="utf-8") as f:
    data = json.load(f)

# Write a JSON file
with open("output.json", "w", encoding="utf-8") as f:
    json.dump(data, f, indent=2)
```

Key Rule: Always specify `encoding="utf-8"` when reading or writing files.
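When input may be malformed, catch `json.JSONDecodeError` (a `ValueError` subclass) rather than a bare exception; it reports where parsing failed:

```python
import json

try:
    json.loads("{not valid json}")
except json.JSONDecodeError as exc:
    # lineno/colno/msg pinpoint the parse failure
    print(f"Invalid JSON at line {exc.lineno}, column {exc.colno}: {exc.msg}")
```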

When to Use This Skill

Use this skill when:

  • Working with JSON APIs or data interchange

  • Optimizing JSON performance in high-throughput applications

  • Handling large JSON files (> 100MB)

  • Securing applications against JSON injection

  • Extracting data from complex nested JSON structures

Performance: Choose the Right Library

Library Comparison (10,000 records benchmark)

| Library | Serialize (s) | Deserialize (s) | Best For |
|---|---|---|---|
| orjson | 0.42 | 1.27 | FastAPI, web APIs (3.9x faster) |
| msgspec | 0.49 | 0.93 | Maximum performance (1.7x faster deserialization) |
| json (stdlib) | 1.62 | 1.62 | Universal compatibility |
| ujson | 1.41 | 1.85 | Drop-in replacement (2x faster) |
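Absolute numbers depend on hardware, data shape, and library versions; a stdlib-only harness in the same spirit (record shape and repeat counts are illustrative) lets you reproduce the comparison locally:

```python
import json
import timeit

# Illustrative micro-benchmark: 10,000 small records with stdlib json.
# Swap in orjson/msgspec (note orjson.dumps returns bytes) to compare libraries.
records = [{"id": i, "name": f"user{i}", "active": True} for i in range(10_000)]
payload = json.dumps(records)

ser = timeit.timeit(lambda: json.dumps(records), number=10)
deser = timeit.timeit(lambda: json.loads(payload), number=10)
print(f"serialize: {ser:.3f}s  deserialize: {deser:.3f}s")
```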

Recommendation:

  • Use orjson for FastAPI/web APIs (native support, fastest serialization)

  • Use msgspec for data pipelines (fastest overall, typed validation)

  • Use json when compatibility is critical

Installation

```bash
# High-performance libraries
pip install orjson msgspec ujson

# Advanced querying
pip install jsonpath-ng jmespath

# Streaming large files
pip install ijson

# Schema validation
pip install jsonschema
```

Large Files: Streaming Strategies

For files larger than 100 MB, avoid loading the entire document into memory.

Strategy 1: JSONL (JSON Lines)

Convert large JSON arrays to line-delimited format:

```python
import json

# Stream-process JSONL: one JSON document per line
with open("large.jsonl", "r", encoding="utf-8") as infile, \
     open("output.jsonl", "w", encoding="utf-8") as outfile:
    for line in infile:
        obj = json.loads(line)
        obj["processed"] = True
        outfile.write(json.dumps(obj) + "\n")
```
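Strategy 1 assumes the data is already line-delimited. A one-time converter from a JSON array file is sketched below (filenames are hypothetical; it loads the array once, so it only suits sources that still fit in memory — otherwise stream the array with ijson as in Strategy 2):

```python
import json
import os
import tempfile

def array_to_jsonl(src_path: str, dst_path: str) -> None:
    # Read a JSON array and write one document per line
    with open(src_path, "r", encoding="utf-8") as f:
        items = json.load(f)
    with open(dst_path, "w", encoding="utf-8") as f:
        for item in items:
            f.write(json.dumps(item) + "\n")

# Demo with temporary files
tmp = tempfile.mkdtemp()
src, dst = os.path.join(tmp, "array.json"), os.path.join(tmp, "array.jsonl")
with open(src, "w", encoding="utf-8") as f:
    json.dump([{"id": 1}, {"id": 2}], f)
array_to_jsonl(src, dst)
with open(dst, encoding="utf-8") as f:
    lines = f.read().splitlines()
```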

Strategy 2: Streaming with ijson

```python
import ijson

# Process a large JSON file without loading it into memory
with open("huge.json", "rb") as f:
    for item in ijson.items(f, "products.item"):
        process(item)  # Handle one item at a time
```

See: patterns/streaming-large-json.md

Security: Prevent JSON Injection

Critical Rules:

  • Always use `json.loads()`, never `eval()`

  • Validate input with `jsonschema`

  • Sanitize user input before serialization

  • Escape special characters (such as `"` and `\`) in user-supplied strings

Vulnerable Code:

```python
# NEVER DO THIS: building JSON by string formatting
username = request.GET["username"]  # User input: admin", "role": "admin
json_string = f'{{"user":"{username}","role":"user"}}'
# Result: privilege escalation
```

Secure Code:

```python
# Use json.dumps for serialization
data = {"user": username, "role": "user"}
json_string = json.dumps(data)  # Properly escaped
```

See: anti-patterns/security-json-injection.md, anti-patterns/eval-usage.md
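The validation rule above can also be sketched without third-party dependencies; the field names and allowed roles below are illustrative assumptions, and `jsonschema` expresses the same checks declaratively:

```python
import json

def parse_user(raw: str) -> dict:
    # Validate structure after parsing, before the data is trusted.
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    if not isinstance(data.get("user"), str):
        raise ValueError("'user' must be a string")
    if data.get("role") not in {"user", "admin"}:
        raise ValueError("invalid 'role'")
    return data

safe = parse_user('{"user": "alice", "role": "user"}')
```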

Advanced: JSONPath for Complex Queries

Extract data from nested JSON without complex loops:

```python
from jsonpath_ng.ext import parse  # the ext parser supports filter expressions

data = {
    "products": [
        {"name": "Apple", "price": 12.88},
        {"name": "Peach", "price": 27.25},
    ]
}

# Filter products by price, then select the name
query = parse("$.products[?price > 20].name")
results = [match.value for match in query.find(data)]
# Output: ["Peach"]
```

Key Operators:

| Operator | Meaning |
|---|---|
| `$` | Root selector |
| `..` | Recursive descent |
| `*` | Wildcard |
| `[?<predicate>]` | Filter (e.g., `[?price > 20]`) |
| `[start:end:step]` | Array slicing |

See: patterns/jsonpath-querying.md
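When pulling in a JSONPath dependency is overkill, a small recursive helper (an illustrative sketch, not part of any library) handles the common "collect every value of a key anywhere in the tree" case:

```python
def find_all(node, key):
    # Yield every value stored under `key`, at any nesting depth
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                yield v
            yield from find_all(v, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_all(item, key)

data = {"products": [{"name": "Apple", "price": 12.88},
                     {"name": "Peach", "price": 27.25}]}
prices = list(find_all(data, "price"))
```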

Custom Objects: Serialization

Handle datetime, UUID, Decimal, and custom classes:

```python
import json
from datetime import datetime

class CustomEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, datetime):
            return obj.isoformat()
        if isinstance(obj, set):
            return list(obj)
        return super().default(obj)

# Usage
data = {"timestamp": datetime.now(), "tags": {"python", "json"}}
json_str = json.dumps(data, cls=CustomEncoder)
```

See: patterns/custom-object-serialization.md
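Deserializing needs a matching hook on the way back in; a sketch using `object_hook` to revive ISO-format timestamps (the `timestamp` field name is an assumption for illustration):

```python
import json
from datetime import datetime

def revive(obj):
    # Attempt to turn an ISO-8601 string under "timestamp" back into a datetime
    ts = obj.get("timestamp")
    if isinstance(ts, str):
        try:
            obj["timestamp"] = datetime.fromisoformat(ts)
        except ValueError:
            pass  # leave non-datetime strings untouched
    return obj

restored = json.loads('{"timestamp": "2025-01-01T12:00:00"}', object_hook=revive)
```

`object_hook` runs on every parsed object, so keep the checks cheap.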

Performance Checklist

  • Use orjson/msgspec for high-throughput applications

  • Specify UTF-8 encoding when reading/writing files

  • Use streaming (ijson/JSONL) for files > 100MB

  • Minify JSON for production (`separators=(",", ":")`)

  • Pretty-print for development (`indent=2`)
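The last two checklist items side by side, with stdlib `json`:

```python
import json

data = {"name": "Alice", "tags": ["a", "b"]}

compact = json.dumps(data, separators=(",", ":"))  # production: no extra whitespace
pretty = json.dumps(data, indent=2)                # development: human-readable
```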

Security Checklist

  • Never use eval() for JSON parsing

  • Validate input with jsonschema

  • Sanitize user input before serialization

  • Use json.dumps() to prevent injection

  • Escape special characters in user data

Reference Documentation

Performance:

  • reference/python-json-parsing-best-practices-2025.md: comprehensive research with benchmarks

Patterns:

  • patterns/streaming-large-json.md: ijson and JSONL strategies

  • patterns/custom-object-serialization.md: handle datetime, UUID, custom classes

  • patterns/jsonpath-querying.md: advanced nested data extraction

Security:

  • anti-patterns/security-json-injection.md: prevent injection attacks

  • anti-patterns/eval-usage.md: why never to use eval()

Examples:

  • examples/high-performance-parsing.py: orjson and msgspec code

  • examples/large-file-streaming.py: streaming with ijson

  • examples/secure-validation.py: jsonschema validation

Tools:

  • tools/json-performance-benchmark.py: benchmark different libraries
