spice-data-connector

Connect Spice to data sources like PostgreSQL, MySQL, S3, Databricks, Snowflake, DuckDB, GitHub, and more. Use when asked to "add a dataset", "connect to a database", "load data from S3", "configure a data source", "read files", "query external data", or "set up federated queries".

Spice Data Connectors

Data Connectors enable federated SQL queries across databases, data warehouses, data lakes, and files. Spice connects directly to your existing data sources and provides a unified SQL interface — no ETL pipelines required. The query planner (built on Apache DataFusion) optimizes and routes queries, including filter pushdown and column projection.

Cross-Source Federation

Query across multiple heterogeneous sources in one SQL statement:

datasets:
  - from: postgres:customers
    name: customers
    params:
      pg_host: db.example.com
      pg_user: ${secrets:PG_USER}
  - from: s3://bucket/orders/
    name: orders
    params:
      file_format: parquet
  - from: snowflake:analytics.sales
    name: sales

-- Query across all three sources in one statement
SELECT c.name, o.order_total, s.region
FROM customers c
  JOIN orders o ON c.id = o.customer_id
  JOIN sales s ON o.id = s.order_id
WHERE s.region = 'EMEA';

Without acceleration, each query fetches data directly from the underlying sources, with filters and column projections pushed down where the source supports it.
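
To check what actually gets pushed down, you can prefix a query with EXPLAIN; because the planner is built on DataFusion, the plan output lists pushed-down filters and projected columns. The query below is a hypothetical sketch against the datasets defined above.

-- Hypothetical: inspect the federated plan; pushed-down filters and
-- projected columns appear in the DataFusion plan output
EXPLAIN SELECT c.name, o.order_total
FROM customers c
  JOIN orders o ON c.id = o.customer_id
WHERE o.order_total > 100;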

Basic Dataset Configuration

datasets:
  - from: <connector>:<identifier>
    name: <dataset_name>
    params:
      # connector-specific parameters
    acceleration:
      enabled: true # optional: enable local materialization

Supported Connectors

Databases

| Connector | From Format | Status |
| --- | --- | --- |
| PostgreSQL | postgres:schema.table | Stable (also Amazon Redshift) |
| MySQL | mysql:schema.table | Stable |
| DuckDB | duckdb:database.table | Stable |
| MS SQL Server | mssql:db.table | Beta |
| MongoDB | mongodb:collection | Alpha |
| ClickHouse | clickhouse:db.table | Alpha |
| DynamoDB | dynamodb:table | Release Candidate |
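
For example, a MySQL dataset follows the same pattern as the PostgreSQL example further below; the parameter names shown here (mysql_host, mysql_tcp_port, mysql_db, mysql_user, mysql_pass) and the table are a sketch to verify against the MySQL connector docs.

datasets:
  - from: mysql:shop.orders
    name: orders
    params:
      mysql_host: localhost
      mysql_tcp_port: 3306 # assumed parameter name
      mysql_db: shop
      mysql_user: ${ env:MYSQL_USER }
      mysql_pass: ${ env:MYSQL_PASS }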

Data Warehouses

| Connector | From Format | Status |
| --- | --- | --- |
| Snowflake | snowflake:db.schema.table | Beta |
| Databricks (Delta Lake) | databricks:catalog.schema.table | Stable |
| Spark | spark:db.table | Beta |
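
As a sketch, a Databricks dataset typically points at a catalog table and authenticates with a workspace endpoint and token; mode, databricks_endpoint, and databricks_token are assumed parameter names to confirm against the Databricks connector docs.

datasets:
  - from: databricks:samples.nyctaxi.trips
    name: trips
    params:
      mode: delta_lake # assumed: read Delta files directly rather than via Spark Connect
      databricks_endpoint: ${ env:DATABRICKS_HOST } # assumed parameter name
      databricks_token: ${ secrets:DATABRICKS_TOKEN } # assumed parameter name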

Data Lakes & Object Storage

| Connector | From Format | Status |
| --- | --- | --- |
| S3 | s3://bucket/path/ | Stable |
| Delta Lake | delta_lake:/path/to/delta/ | Stable |
| Iceberg | iceberg:table | Beta |
| Azure BlobFS | abfs://container/path/ | Alpha |
| File (local) | file:./path/to/data | Stable |
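
For instance, a Delta Lake table on local disk needs only its path (the path and dataset name below are placeholders):

datasets:
  - from: delta_lake:/data/events_delta/
    name: events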

Other Sources

| Connector | From Format | Status |
| --- | --- | --- |
| Spice.ai | spice.ai:path/to/dataset | Stable |
| Dremio | dremio:source.table | Stable |
| GitHub | github:github.com/owner/repo/issues | Stable |
| GraphQL | graphql:endpoint | Release Candidate |
| FlightSQL | flightsql:query | Beta |
| ODBC | odbc:connection | Beta |
| FTP/SFTP | sftp://host/path/ | Alpha |
| HTTP/HTTPS | https://url/path/data.csv | Alpha |
| Kafka | kafka:topic | Alpha |
| Debezium CDC | debezium:topic | Alpha |
| SharePoint | sharepoint:site/path | Alpha |
| IMAP | imap:mailbox | Alpha |
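
For example, a CSV file published over HTTPS can be mapped directly to a table (the URL and dataset name below are placeholders):

datasets:
  - from: https://example.com/reports/daily.csv
    name: daily_report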

Common Examples

PostgreSQL

datasets:
  - from: postgres:public.users
    name: users
    params:
      pg_host: localhost
      pg_port: 5432
      pg_user: ${ env:PG_USER }
      pg_pass: ${ env:PG_PASS }
    acceleration:
      enabled: true

S3 with Parquet

datasets:
  - from: s3://my-bucket/data/sales/
    name: sales
    params:
      file_format: parquet
      s3_region: us-east-1
    acceleration:
      enabled: true
      engine: duckdb

GitHub Issues

datasets:
  - from: github:github.com/spiceai/spiceai/issues
    name: spiceai.issues
    params:
      github_token: ${ secrets:GITHUB_TOKEN }
    acceleration:
      enabled: true
      refresh_mode: append
      refresh_check_interval: 24h
      refresh_data_window: 14d
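
Once loaded, the dataset can be queried like any other table; the column names below (title, state, created_at) are assumptions about the GitHub issues schema and should be verified against the connector docs.

-- Hypothetical query; verify column names against the GitHub connector schema
SELECT title, state, created_at
FROM spiceai.issues
WHERE state = 'open'
ORDER BY created_at DESC
LIMIT 10;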

Local File

datasets:
  - from: file:./data/sales.parquet
    name: sales

File Formats

Connectors reading from object stores (S3, ABFS, GCS) or network storage (FTP, SFTP) support:

| Format | file_format | Status | Type |
| --- | --- | --- | --- |
| Apache Parquet | parquet | Stable | Structured |
| CSV | csv | Stable | Structured |
| Markdown | md | Stable | Document |
| Text | txt | Stable | Document |
| PDF | pdf | Alpha | Document |
| Microsoft Word | docx | Alpha | Document |

Document Formats

Document files (md, txt, pdf, docx) produce a table with location and content columns:

datasets:
  - from: file:docs/decisions/
    name: my_documents
    params:
      file_format: md

SELECT location, content FROM my_documents LIMIT 5;

Hive Partitioning

datasets:
  - from: s3://bucket/data/
    name: partitioned_data
    params:
      file_format: parquet
      hive_partitioning_enabled: true

SELECT * FROM partitioned_data WHERE year = '2024' AND month = '01';

Dataset Naming

  • name: foo creates spice.public.foo
  • name: myschema.foo creates spice.myschema.foo
  • Use . to organize datasets into schemas
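
For example (the source, schema, and table names here are illustrative):

datasets:
  - from: postgres:public.users
    name: crm.users # queryable as spice.crm.users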
