moai-formats-data

Data Format Specialist

Safety Notice

This listing is imported from the skills.sh public index metadata. Review the upstream SKILL.md and repository scripts before running anything.


Install skill "moai-formats-data" with this command: npx skills add rdmptv/adbautoplayer/rdmptv-adbautoplayer-moai-formats-data

Data Format Specialist

Quick Reference (30 seconds)

Advanced Data Format Management - Comprehensive data handling covering TOON encoding, JSON/YAML optimization, serialization patterns, and data validation for performance-critical applications.

Core Capabilities:

  • TOON Encoding: 40-60% token reduction vs JSON for LLM communication

  • JSON/YAML Optimization: Efficient serialization and parsing patterns

  • Data Validation: Schema validation, type checking, error handling

  • Format Conversion: Seamless transformation between data formats

  • Performance: Optimized data structures and caching strategies

  • Schema Management: Dynamic schema generation and evolution

When to Use:

  • Optimizing data transmission to LLMs within token budgets

  • High-performance serialization/deserialization

  • Schema validation and data integrity

  • Format conversion and data transformation

  • Large dataset processing and optimization

Quick Start:

TOON encoding (40-60% token reduction)

from moai_formats_data import TOONEncoder

encoder = TOONEncoder()
compressed = encoder.encode({"user": "John", "age": 30})
original = encoder.decode(compressed)

Fast JSON processing

from moai_formats_data import JSONOptimizer

optimizer = JSONOptimizer()
fast_json = optimizer.serialize_fast(large_dataset)

Data validation

from moai_formats_data import DataValidator

validator = DataValidator()
schema = validator.create_schema({"name": {"type": "string", "required": True}})
result = validator.validate({"name": "John"}, schema)

Implementation Guide (5 minutes)

Core Concepts

TOON (Token-Optimized Object Notation):

  • Custom binary-compatible format optimized for LLM token usage

  • Type markers: # (numbers), ! (booleans), @ (timestamps), ~ (null); an illustrative sketch follows this list

  • 40-60% size reduction vs JSON for typical data structures

  • Lossless round-trip encoding/decoding
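
To make the marker scheme concrete, here is a minimal illustrative sketch of how scalar values could map onto those markers. It is not the TOONEncoder implementation (see modules/toon-encoding.md for that); the function name and the exact output strings are assumptions for illustration only.

# Illustrative sketch of the marker scheme above; encode_scalar and its exact
# output strings are hypothetical, not the real TOONEncoder wire format.
from datetime import datetime

def encode_scalar(value):
    if value is None:
        return "~"                               # null marker
    if isinstance(value, bool):                  # check bool before int: bool subclasses int
        return "!1" if value else "!0"           # boolean marker
    if isinstance(value, (int, float)):
        return f"#{value}"                       # number marker
    if isinstance(value, datetime):
        return f"@{int(value.timestamp())}"      # timestamp marker
    return str(value)                            # plain strings carry no marker

print(encode_scalar(30))    # -> '#30'
print(encode_scalar(None))  # -> '~'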

Performance Optimization:

  • Ultra-fast JSON processing with orjson (2-5x faster than the standard json module); a standalone sketch using these libraries follows this list

  • Streaming processing for large datasets using ijson

  • Intelligent caching with LRU eviction and memory management

  • Schema compression and validation optimization
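
The bullets above reference orjson, ijson, and LRU caching. As a rough standalone sketch of those underlying techniques, using the library APIs directly rather than the moai_formats_data wrappers:

# Standalone sketch using orjson, ijson, and functools.lru_cache directly;
# not the JSONOptimizer / StreamProcessor / SmartCache wrappers.
import orjson                      # fast C-backed JSON (de)serialization
import ijson                       # incremental JSON parsing for large files
from functools import lru_cache

def serialize_fast(obj) -> bytes:
    return orjson.dumps(obj)       # returns UTF-8 bytes

def deserialize_fast(raw: bytes):
    return orjson.loads(raw)

def stream_items(path: str):
    """Yield items of a top-level JSON array without loading the whole file."""
    with open(path, "rb") as fh:
        yield from ijson.items(fh, "item")

@lru_cache(maxsize=1024)
def schema_fingerprint(schema_json: str) -> int:
    # memoized helper; real schema compression lives in JSONOptimizer
    return hash(schema_json)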

Data Validation:

  • Type-safe validation with custom rules and patterns

  • Schema evolution and migration support

  • Cross-field validation and dependency checking

  • Performance-optimized batch validation

Basic Implementation

from moai_formats_data import TOONEncoder, JSONOptimizer, DataValidator
from datetime import datetime

# 1. TOON encoding for LLM optimization
encoder = TOONEncoder()
data = {
    "user": {"id": 123, "name": "John", "active": True, "created": datetime.now()},
    "permissions": ["read", "write", "admin"]
}

# Encode and compare sizes
toon_data = encoder.encode(data)
original_data = encoder.decode(toon_data)

# 2. Fast JSON processing
optimizer = JSONOptimizer()

# Ultra-fast serialization
json_bytes = optimizer.serialize_fast(data)
parsed_data = optimizer.deserialize_fast(json_bytes)

# Schema compression for repeated validation
schema = {"type": "object", "properties": {"name": {"type": "string"}}}
compressed_schema = optimizer.compress_schema(schema)

# 3. Data validation
validator = DataValidator()

# Create validation schema
user_schema = validator.create_schema({
    "username": {"type": "string", "required": True, "min_length": 3},
    "email": {"type": "email", "required": True},
    "age": {"type": "integer", "required": False, "min_value": 13}
})

# Validate data
user_data = {"username": "john_doe", "email": "john@example.com", "age": 30}
result = validator.validate(user_data, user_schema)

if result['valid']:
    print("Data is valid!")
    sanitized = result['sanitized_data']
else:
    print("Validation errors:", result['errors'])

Common Use Cases

API Response Optimization:

from typing import Dict, List

# Optimize API responses for LLM consumption
def optimize_api_response(data: Dict) -> str:
    encoder = TOONEncoder()
    return encoder.encode(data)

# Parse optimized responses
def parse_optimized_response(toon_data: str) -> Dict:
    encoder = TOONEncoder()
    return encoder.decode(toon_data)

Configuration Management:

# Fast YAML configuration loading
from moai_formats_data import YAMLOptimizer

yaml_optimizer = YAMLOptimizer()
config = yaml_optimizer.load_fast("config.yaml")

# Merge multiple configurations
merged = yaml_optimizer.merge_configs(base_config, env_config, user_config)

Large Dataset Processing:

# Stream processing for large JSON files
from moai_formats_data import StreamProcessor

processor = StreamProcessor(chunk_size=8192)

# Process items one at a time without loading the file into memory
def process_item(item):
    print(f"Processing: {item['id']}")

processor.process_json_stream("large_dataset.json", process_item)

Advanced Features (10+ minutes)

Advanced TOON Features

Custom Type Handlers:

# Extend the TOON encoder with custom types
class CustomTOONEncoder(TOONEncoder):
    def _encode_value(self, value):
        # Handle UUID objects
        if hasattr(value, 'hex') and len(value.hex) == 32:  # UUID
            return f'${value.hex}'
        # Handle Decimal objects
        elif hasattr(value, 'as_tuple'):  # Decimal
            return f'&{str(value)}'
        return super()._encode_value(value)

    def _parse_value(self, s):
        # Parse custom UUIDs
        if s.startswith('$') and len(s) == 33:
            import uuid
            return uuid.UUID(s[1:])
        # Parse custom Decimals
        elif s.startswith('&'):
            from decimal import Decimal
            return Decimal(s[1:])
        return super()._parse_value(s)

Streaming TOON Processing:

# Process TOON data in streaming mode
def stream_toon_data(data_generator):
    encoder = TOONEncoder()
    for data in data_generator:
        yield encoder.encode(data)

# Batch TOON processing
def batch_encode_toon(data_list: List[Dict], batch_size: int = 1000):
    encoder = TOONEncoder()
    results = []

    for i in range(0, len(data_list), batch_size):
        batch = data_list[i:i + batch_size]
        encoded_batch = [encoder.encode(item) for item in batch]
        results.extend(encoded_batch)

    return results

Advanced Validation Patterns

Cross-Field Validation:

# Validate relationships between fields
class CrossFieldValidator:
    def __init__(self):
        self.base_validator = DataValidator()

    def validate_user_data(self, data: Dict) -> Dict:
        # Base validation
        schema = self.base_validator.create_schema({
            "password": {"type": "string", "required": True, "min_length": 8},
            "confirm_password": {"type": "string", "required": True},
            "email": {"type": "email", "required": True}
        })

        result = self.base_validator.validate(data, schema)

        # Cross-field validation
        if data.get("password") != data.get("confirm_password"):
            result['errors']['password_mismatch'] = "Passwords do not match"
            result['valid'] = False

        return result

Schema Evolution:

# Handle schema changes over time
from moai_formats_data import SchemaEvolution

evolution = SchemaEvolution()

# Define schema versions
v1_schema = {"name": {"type": "string"}, "age": {"type": "integer"}}
v2_schema = {"full_name": {"type": "string"}, "age": {"type": "integer"}, "email": {"type": "email"}}

# Register schemas
evolution.register_schema("v1", v1_schema)
evolution.register_schema("v2", v2_schema)

# Add migration function
def migrate_v1_to_v2(data: Dict) -> Dict:
    return {
        "full_name": data["name"],
        "age": data["age"],
        "email": None  # New required field
    }

evolution.add_migration("v1", "v2", migrate_v1_to_v2)

# Migrate data
old_data = {"name": "John Doe", "age": 30}
new_data = evolution.migrate_data(old_data, "v1", "v2")

Performance Optimization

Intelligent Caching:

import time

from moai_formats_data import SmartCache

# Create cache with memory constraints
cache = SmartCache(max_memory_mb=50, max_items=10000)

@cache.cache_result(ttl=1800)  # 30 minutes
def expensive_data_processing(data: Dict) -> Dict:
    # Simulate expensive computation
    time.sleep(0.1)
    return {"processed": True, "data": data}

# Cache statistics
print(cache.get_stats())

# Cache warming
def warm_common_data():
    common_queries = [
        {"type": "user", "id": 1},
        {"type": "user", "id": 2},
        {"type": "config", "key": "app"}
    ]

    for query in common_queries:
        expensive_data_processing(query)

warm_common_data()

Batch Processing Optimization:

# Optimized batch validation
def validate_batch_optimized(data_list: List[Dict], schema: Dict) -> List[Dict]:
    validator = DataValidator()

    # Pre-compile patterns for performance
    validator._compile_schema_patterns(schema)

    # Process in batches for memory efficiency
    batch_size = 1000
    results = []

    for i in range(0, len(data_list), batch_size):
        batch = data_list[i:i + batch_size]
        batch_results = [validator.validate(data, schema) for data in batch]
        results.extend(batch_results)

    return results

Integration Patterns

LLM Integration:

# Prepare data for LLM consumption
def prepare_for_llm(data: Dict, max_tokens: int = 2000) -> str:
    encoder = TOONEncoder()
    toon_data = encoder.encode(data)

    # Check token count
    estimated_tokens = len(toon_data.split())

    if estimated_tokens > max_tokens:
        # Implement data reduction strategy
        reduced_data = reduce_data_complexity(data, max_tokens)
        toon_data = encoder.encode(reduced_data)

    return toon_data

def reduce_data_complexity(data: Dict, max_tokens: int) -> Dict:
    """Reduce data complexity to fit token budget."""
    # Implement selective field removal
    priority_fields = ["id", "name", "email", "status"]
    reduced = {k: v for k, v in data.items() if k in priority_fields}

    # Further reduction if needed
    encoder = TOONEncoder()
    while len(encoder.encode(reduced).split()) > max_tokens:
        # Remove least important fields
        if len(reduced) <= 1:
            break
        reduced.popitem()

    return reduced

Database Integration:

# Optimize database queries with format conversion
def optimize_db_response(db_data: List[Dict]) -> Dict:
    # Convert database results to optimized format
    optimizer = JSONOptimizer()

    # Compress and cache schema
    common_schema = {"type": "object", "properties": {"id": {"type": "integer"}}}
    compressed_schema = optimizer.compress_schema(common_schema)

    # Process in batches
    processor = StreamProcessor()
    processed_data = []

    for item in db_data:
        # Apply validation and transformation
        processed_item = transform_db_item(item)
        processed_data.append(processed_item)

    return {
        "data": processed_data,
        "count": len(processed_data),
        "schema": compressed_schema
    }

Works Well With

  • moai-domain-backend - Backend data serialization and API responses

  • moai-domain-database - Database data format optimization

  • moai-integration-mcp - MCP data serialization and transmission

  • moai-docs-generation - Documentation data formatting

  • moai-foundation-core - Core data architecture principles

Module References

Core Implementation Modules:

  • modules/toon-encoding.md - TOON encoding implementation and examples

  • modules/json-optimization.md - High-performance JSON/YAML processing

  • modules/data-validation.md - Advanced validation and schema management

  • modules/caching-performance.md - Caching strategies and performance optimization

Usage Examples

CLI Usage

Encode data to TOON format

moai-formats encode-toon --input data.json --output data.toon

Validate data against schema

moai-formats validate --schema schema.json --data data.json

Convert between formats

moai-formats convert --input data.json --output data.yaml --format yaml

Optimize JSON structure

moai-formats optimize-json --input large-data.json --output optimized.json

Python API

from moai_formats_data import TOONEncoder, DataValidator, JSONOptimizer

# TOON encoding
encoder = TOONEncoder()
toon_data = encoder.encode({"user": "John", "age": 30})
original_data = encoder.decode(toon_data)

# Data validation
validator = DataValidator()
schema = validator.create_schema({
    "name": {"type": "string", "required": True, "min_length": 2},
    "email": {"type": "email", "required": True}
})
result = validator.validate({"name": "John", "email": "john@example.com"}, schema)

# JSON optimization
optimizer = JSONOptimizer()
fast_json = optimizer.serialize_fast(large_dataset)
parsed_data = optimizer.deserialize_fast(fast_json)

Technology Stack

Core Libraries:

  • orjson: Ultra-fast JSON parsing and serialization

  • PyYAML: YAML processing with C-based loaders (see the loader sketch after this list)

  • ijson: Streaming JSON parser for large files

  • python-dateutil: Advanced datetime parsing

  • regex: Advanced regular expression support
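
The "C-based loaders" note for PyYAML refers to the optional libyaml bindings. A minimal sketch of loading with the C loader and falling back when libyaml is not installed, using the plain PyYAML API rather than YAMLOptimizer:

# Plain PyYAML usage with the C-based loader mentioned above; falls back to the
# pure-Python SafeLoader when the libyaml bindings are unavailable.
import yaml

try:
    from yaml import CSafeLoader as SafeLoader   # libyaml-backed, much faster
except ImportError:
    from yaml import SafeLoader                  # pure-Python fallback

def load_yaml_fast(path: str):
    with open(path, "r", encoding="utf-8") as fh:
        return yaml.load(fh, Loader=SafeLoader)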

Performance Tools:

  • lru_cache: Built-in memoization

  • pickle: Object serialization

  • hashlib: Hash generation for caching (see the cache-key sketch after this list)

  • functools: Function decorators and utilities
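
These standard-library pieces combine naturally for caching: hash a canonical serialization of the payload, then memoize on the resulting key. The sketch below is illustrative only; SmartCache's actual internals are not documented here.

# Illustrative only: a deterministic cache key plus memoization using the
# standard-library tools listed above; not SmartCache's implementation.
import hashlib
import json
from functools import lru_cache

def cache_key(payload: dict) -> str:
    # sort_keys gives a canonical byte string for structurally equal payloads
    canonical = json.dumps(payload, sort_keys=True, default=str).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()

@lru_cache(maxsize=4096)
def process_by_key(key: str) -> dict:
    # stand-in for an expensive computation looked up by hashed key
    return {"key": key, "processed": True}

result = process_by_key(cache_key({"type": "user", "id": 1}))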

Validation Libraries:

  • jsonschema: JSON Schema validation (see the sketch after this list)

  • cerberus: Lightweight data validation

  • marshmallow: Object serialization/deserialization

  • pydantic: Data validation using Python type hints
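
For comparison with DataValidator's schema dictionaries, the plain jsonschema API from this list looks like the following. This is a sketch of the underlying library only, not the DataValidator interface shown earlier.

# Plain jsonschema usage for reference; DataValidator's schema format is its
# own separate dictionary convention.
from jsonschema import Draft7Validator

schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string", "minLength": 2},
        "email": {"type": "string", "format": "email"},
    },
    "required": ["name", "email"],
}

validator = Draft7Validator(schema)
errors = [e.message for e in validator.iter_errors({"name": "J"})]
print(errors)   # e.g. ["'email' is a required property", "'J' is too short"]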

Status: Production Ready
Last Updated: 2025-11-30
Maintained by: MoAI-ADK Data Team

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
