spice-connect-data

Connect Spice to data sources and query across them with federated SQL. Use when connecting to databases (Postgres, MySQL, DynamoDB), data lakes (S3, Delta Lake, Iceberg), warehouses (Snowflake, Databricks), files, APIs, or catalogs; configuring datasets; creating views; writing data; or setting up cross-source queries.

Install the skill with: npx skills add spiceai/skills/spiceai-skills-spice-connect-data

Connect to Data Sources

Spice federates SQL queries across 30+ data sources without ETL. Connect databases, data lakes, warehouses, and APIs, then query across them with standard SQL.

How Federation Works

Configure datasets pointing to different sources. Spice's query planner (built on Apache DataFusion) optimizes and routes queries with filter pushdown and column projection:

datasets:
  - from: postgres:customers
    name: customers
    params:
      pg_host: db.example.com
      pg_user: ${secrets:PG_USER}
  - from: s3://bucket/orders/
    name: orders
    params:
      file_format: parquet
  - from: snowflake:analytics.sales
    name: sales

-- Query across all three sources in one statement
SELECT c.name, o.order_total, s.region
FROM customers c
  JOIN orders o ON c.id = o.customer_id
  JOIN sales s ON o.id = s.order_id
WHERE s.region = 'EMEA';

Dataset Configuration

datasets:
  - from: <connector>:<identifier>
    name: <dataset_name>
    params:
      # connector-specific parameters
    acceleration:
      enabled: true # optional: materialize locally (see spice-acceleration)
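As a concrete instance of the template, a MySQL dataset might look like the following. The host, schema, and table here are placeholders, and the `mysql_*` parameter names are assumptions based on the connector's naming convention; verify exact names against the MySQL connector documentation:

```yaml
datasets:
  - from: mysql:shop.orders    # placeholder schema.table
    name: orders
    params:
      mysql_host: localhost
      mysql_tcp_port: "3306"
      mysql_user: ${ env:MYSQL_USER }
      mysql_pass: ${ env:MYSQL_PASS }
```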

Supported Connectors

Databases

Connector | From Format | Status
PostgreSQL | postgres:schema.table | Stable (also Amazon Redshift)
MySQL | mysql:schema.table | Stable
DuckDB | duckdb:database.table | Stable
MS SQL Server | mssql:db.table | Beta
DynamoDB | dynamodb:table | Release Candidate
MongoDB | mongodb:collection | Alpha
ClickHouse | clickhouse:db.table | Alpha

Data Warehouses

Connector | From Format | Status
Snowflake | snowflake:db.schema.table | Beta
Databricks (Delta Lake) | databricks:catalog.schema.table | Stable
Spark | spark:db.table | Beta
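For example, a Databricks Delta Lake table can be configured as below. The catalog path, endpoint, and token are placeholders, and the `mode` and `databricks_*` parameter names are assumptions to verify against the Databricks connector documentation:

```yaml
datasets:
  - from: databricks:main.sales.orders    # placeholder catalog.schema.table
    name: orders
    params:
      mode: delta_lake    # assumed: direct Delta Lake reads vs. Spark Connect
      databricks_endpoint: ${ env:DATABRICKS_HOST }
      databricks_token: ${ secrets:DATABRICKS_TOKEN }
```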

Data Lakes & Object Storage

Connector | From Format | Status
S3 | s3://bucket/path/ | Stable
Delta Lake | delta_lake:/path/to/delta/ | Stable
Iceberg | iceberg:table | Beta
Azure BlobFS | abfs://container/path/ | Alpha
File (local) | file:./path/to/data | Stable

Other Sources

Connector | From Format | Status
Spice.ai | spice.ai:path/to/dataset | Stable
Dremio | dremio:source.table | Stable
GitHub | github:github.com/owner/repo/issues | Stable
GraphQL | graphql:endpoint | Release Candidate
FlightSQL | flightsql:query | Beta
ODBC | odbc:connection | Beta
FTP/SFTP | sftp://host/path/ | Alpha
HTTP/HTTPS | https://url/path/data.csv | Alpha
Kafka | kafka:topic | Alpha
Debezium CDC | debezium:topic | Alpha
SharePoint | sharepoint:site/path | Alpha
IMAP | imap:mailbox | Alpha
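As a sketch of the HTTP/HTTPS connector, a remote CSV can be mapped directly to a table (the URL is a placeholder; file_format may also be inferred from the extension):

```yaml
datasets:
  - from: https://example.com/data/prices.csv    # placeholder URL
    name: prices
    params:
      file_format: csv
```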

Common Examples

PostgreSQL

datasets:
  - from: postgres:public.users
    name: users
    params:
      pg_host: localhost
      pg_port: 5432
      pg_user: ${ env:PG_USER }
      pg_pass: ${ env:PG_PASS }
    acceleration:
      enabled: true

S3 with Parquet

datasets:
  - from: s3://my-bucket/data/sales/
    name: sales
    params:
      file_format: parquet
      s3_region: us-east-1
    acceleration:
      enabled: true
      engine: duckdb

GitHub Issues

datasets:
  - from: github:github.com/spiceai/spiceai/issues
    name: spiceai.issues
    params:
      github_token: ${ secrets:GITHUB_TOKEN }
    acceleration:
      enabled: true
      refresh_mode: append
      refresh_check_interval: 24h
      refresh_data_window: 14d
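Once loaded, the issues dataset is queried like any other table. The column names below (title, state) are assumptions about the GitHub connector's schema; confirm them with SHOW COLUMNS or the connector docs:

```sql
SELECT title, state
FROM spiceai.issues
WHERE state = 'open'
LIMIT 10;
```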

Local File

datasets:
  - from: file:./data/sales.parquet
    name: sales

File Formats

Connectors reading from object stores (S3, ABFS) or network storage (FTP, SFTP) support:

Format | file_format | Type
Apache Parquet | parquet | Structured
CSV | csv | Structured
Markdown | md | Document
Text | txt | Document
PDF | pdf | Document
Microsoft Word | docx | Document

Document files produce a table with location and content columns:

datasets:
  - from: file:docs/decisions/
    name: my_documents
    params:
      file_format: md
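Because document datasets expose location and content columns, a simple filter can locate matching files (the search term here is illustrative):

```sql
SELECT location
FROM my_documents
WHERE content LIKE '%decision%';
```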

Hive Partitioning

datasets:
  - from: s3://bucket/data/
    name: partitioned_data
    params:
      file_format: parquet
      hive_partitioning_enabled: true

SELECT * FROM partitioned_data WHERE year = '2024' AND month = '01';

Dataset Naming

  • name: foo → queried as spice.public.foo
  • name: myschema.foo → queried as spice.myschema.foo
  • Use . to organize datasets into schemas
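Putting the naming rules together, a dotted name places the dataset into a schema (the crm schema here is a hypothetical example):

```yaml
datasets:
  - from: postgres:public.users
    name: crm.users   # queried as spice.crm.users
```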

Catalogs

Catalog connectors expose external data catalogs, preserving the source schema hierarchy. Tables are accessed as <catalog>.<schema>.<table>.

Note: Acceleration is not supported for catalog tables. Use datasets for accelerated access.

catalogs:
  - from: <connector>
    name: <catalog_name>
    params:
      # connector-specific parameters
    include:
      - 'schema.*' # optional: filter with glob patterns

Supported Catalogs

Connector | From Value | Status
Unity Catalog | unity_catalog | Stable
Databricks | databricks | Beta
Iceberg | iceberg | Beta
Spice.ai | spice.ai | Beta
AWS Glue | glue | Alpha

Catalog Example

catalogs:
  - from: unity_catalog
    name: unity
    params:
      unity_catalog_endpoint: https://my-workspace.cloud.databricks.com
      databricks_token: ${ secrets:DATABRICKS_TOKEN }
    include:
      - 'my_schema.*'

SELECT * FROM unity.my_schema.customers LIMIT 10;

Views

Views are virtual tables defined by SQL queries — useful for pre-aggregations, transformations, and simplified access:

views:
  - name: daily_sales
    sql: |
      SELECT DATE(created_at) as date, SUM(amount) as total, COUNT(*) as orders
      FROM orders
      GROUP BY DATE(created_at)

  - name: order_details
    sql: |
      SELECT o.id, c.name as customer, p.name as product, o.quantity
      FROM orders o
      JOIN customers c ON o.customer_id = c.id
      JOIN products p ON o.product_id = p.id

Views can be accelerated:

views:
  - name: rankings
    sql: |
      SELECT product_id, SUM(quantity) as total_sold
      FROM orders GROUP BY product_id ORDER BY total_sold DESC LIMIT 100
    acceleration:
      enabled: true
      refresh_check_interval: 1h

Views are read-only and queried like regular tables: SELECT * FROM daily_sales.

Writing Data

Spice supports writing to Apache Iceberg tables and Amazon S3 Tables via INSERT INTO:

datasets:
  - from: iceberg:https://catalog.example.com/v1/namespaces/sales/tables/transactions
    name: transactions
    access: read_write # required for writes

INSERT INTO transactions SELECT * FROM staging_transactions;

Referencing Secrets

Use ${ store_name:KEY } syntax in params. See spice-secrets for full configuration:

params:
  pg_user: ${ env:PG_USER }
  pg_pass: ${ secrets:PG_PASSWORD }

Related Skills

  • spice-data-connector
  • spice-models
  • spicepod-config
  • spice-accelerators