spice-acceleration

Accelerate data locally for sub-second query performance. Use when enabling data acceleration, choosing an engine (Arrow, DuckDB, SQLite, Cayenne), configuring refresh modes, setting up retention policies, creating snapshots, adding indexes, or materializing datasets.

Install the skill: `npx skills add spiceai/skills/spiceai-skills-spice-acceleration`

Accelerate Data

Data acceleration materializes working sets of data locally, reducing query latency from seconds to milliseconds. Hot data gets materialized for instant access while cold data remains federated.

Unlike traditional caches that store query results, Spice accelerates entire datasets with configurable refresh strategies and the flexible compute of an embedded database.

Enable Acceleration

```yaml
datasets:
  - from: postgres:my_table
    name: my_table
    acceleration:
      enabled: true
      engine: duckdb # arrow, duckdb, sqlite, cayenne, postgres, turso
      mode: memory # memory or file
      refresh_check_interval: 1h
```

Choosing an Engine

| Use Case | Engine | Why |
|---|---|---|
| Small datasets (<1 GB), max speed | `arrow` | In-memory, lowest latency |
| Medium datasets (1-100 GB), complex SQL | `duckdb` | Mature SQL, memory management |
| Large datasets (100 GB-1+ TB), analytics | `cayenne` | Built on Vortex (Linux Foundation), 10-20x faster scans |
| Point lookups on large datasets | `cayenne` | 100x faster random access vs Parquet |
| Simple queries, low resource usage | `sqlite` | Lightweight, minimal overhead |
| Async operations, concurrent workloads | `turso` | Native async, modern connection pooling |
| External database integration | `postgres` | Leverage existing PostgreSQL infra |

Cayenne vs DuckDB

Choose Cayenne when datasets exceed ~1 TB, multi-file ingestion is needed, or point lookups are common. Choose DuckDB when datasets are under ~1 TB, complex SQL (window functions, CTEs) is needed, or DuckDB tooling is beneficial.
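As a concrete starting point, a Cayenne acceleration can be sketched like this — a minimal sketch that reuses only parameters shown elsewhere in this document; `postgres:large_table` is a placeholder source name:

```yaml
datasets:
  - from: postgres:large_table # placeholder source
    name: large_table
    acceleration:
      enabled: true
      engine: cayenne
      mode: file # Cayenne is file-based only
      refresh_check_interval: 1h
```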

Supported Engines

| Engine | Mode | Status |
|---|---|---|
| `arrow` | memory | Stable |
| `duckdb` | memory, file | Stable |
| `sqlite` | memory, file | Release Candidate |
| `cayenne` | file | Beta |
| `postgres` | N/A (attached) | Release Candidate |
| `turso` | memory, file | Beta |

Refresh Modes

| Mode | Description | Use Case |
|---|---|---|
| `full` | Complete dataset replacement on each refresh | Small, slowly-changing datasets |
| `append` (batch) | Adds new records based on a `time_column` | Append-only logs, time-series data |
| `append` (stream) | Continuous streaming without a time column | Real-time event streams (Kafka, Debezium) |
| `changes` | CDC-based incremental updates via Debezium or DynamoDB Streams | Frequently updated transactional data |
| `caching` | Request-based row-level caching | API responses, HTTP endpoints |

```yaml
# Full refresh every 8 hours
acceleration:
  refresh_mode: full
  refresh_check_interval: 8h
```

```yaml
# Append mode: check for new records from the last day every 10 minutes
acceleration:
  refresh_mode: append
  time_column: created_at
  refresh_check_interval: 10m
  refresh_data_window: 1d
```

```yaml
# Continuous ingestion using Kafka
acceleration:
  refresh_mode: append
```

```yaml
# CDC with Debezium or DynamoDB Streams
acceleration:
  refresh_mode: changes
```
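The caching mode from the table above follows the same pattern. This is a hedged sketch assuming `refresh_mode: caching` is the only required key — consult the Spice documentation for cache sizing and expiry parameters:

```yaml
# Request-based row-level caching (API responses, HTTP endpoints)
acceleration:
  refresh_mode: caching
```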

Common Configurations

In-Memory with Interval Refresh

```yaml
acceleration:
  enabled: true
  engine: arrow
  refresh_check_interval: 5m
```

File-Based with Append and Time Window

```yaml
datasets:
  - from: postgres:events
    name: events
    time_column: created_at
    acceleration:
      enabled: true
      engine: duckdb
      mode: file
      refresh_mode: append
      refresh_check_interval: 1h
      refresh_data_window: 7d
```

Retention Policies

Prevent unbounded growth of accelerated datasets. Spice supports time-based and custom SQL-based retention:

Time-Based Retention

```yaml
acceleration:
  enabled: true
  engine: duckdb
  retention_check_enabled: true
  retention_period: 30d
  retention_check_interval: 1h
```

SQL-Based Retention

```yaml
acceleration:
  retention_check_enabled: true
  retention_check_interval: 1h
  retention_sql: "DELETE FROM logs WHERE status = 'archived'"
```

Constraints and Indexes

```yaml
acceleration:
  enabled: true
  engine: duckdb
  primary_key: order_id # Creates non-null unique index
  indexes:
    customer_id: enabled # Single column index
    '(created_at, status)': unique # Multi-column unique index
```
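With a primary key in place, refreshed rows can collide with existing ones, and Spice exposes conflict handling for this case. A sketch assuming the `on_conflict` parameter — verify the exact name and accepted values against the Spice docs:

```yaml
acceleration:
  enabled: true
  engine: duckdb
  primary_key: order_id
  on_conflict:
    order_id: upsert # assumed values: upsert or drop
```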

Snapshots

Bootstrap file-based accelerations from S3 or filesystem snapshots on startup, dramatically reducing cold-start latency in distributed deployments.

```yaml
snapshots:
  enabled: true
  location: s3://my_bucket/snapshots/
  bootstrap_on_failure_behavior: warn # warn | retry | fallback
  params:
    s3_auth: iam_role
```

Per-dataset opt-in:

```yaml
acceleration:
  enabled: true
  engine: duckdb
  mode: file
  snapshots:
    enabled: true
```

Snapshot triggers vary by refresh mode:

- `refresh_complete`: After each refresh (full and batch-append modes)
- `time_interval`: On a fixed schedule (all refresh modes)
- `stream_batches`: After every N batches (streaming modes: Kafka, Debezium, DynamoDB Streams)
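A per-dataset sketch selecting one of these triggers — note the key name `trigger` is a hypothetical illustration, not confirmed syntax; check the Spice snapshot docs for the actual parameter:

```yaml
acceleration:
  mode: file
  snapshots:
    enabled: true
    trigger: time_interval # hypothetical key name; verify in docs
```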

Engine-Specific Parameters

DuckDB

```yaml
acceleration:
  engine: duckdb
  mode: file
  params:
    duckdb_file: ./data/cache.db
```

SQLite

```yaml
acceleration:
  engine: sqlite
  mode: file
  params:
    sqlite_file: ./data/cache.sqlite
```

Memory Considerations

When using mode: memory (default), the dataset is loaded into RAM. Ensure sufficient memory including overhead for queries and the runtime. Use mode: file for duckdb, sqlite, turso, or cayenne to avoid memory pressure.
