exasol-udfs

Exasol UDFs & Script Language Containers

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.


Install skill "exasol-udfs" with this command: npx skills add exasol-labs/exasol-agent-skills/exasol-labs-exasol-agent-skills-exasol-udfs

Trigger when the user mentions UDF, user defined function, CREATE SCRIPT, ExaIterator, SCALAR, SET EMITS, BucketFS, script language container, SLC, exaslct, custom packages, GPU UDF, ctx.emit, ctx.next, variadic script, dynamic parameters, EMITS(...), default_output_columns, or any UDF/SLC-related topic.

When to Use UDFs

Use UDFs to extend SQL with custom logic that runs inside the Exasol cluster:

  • Per-row transforms (cleaning, parsing, hashing)

  • Custom aggregation across grouped rows

  • ML model inference (load model from BucketFS, score rows)

  • Calling external APIs from within SQL

  • Batch processing with DataFrames

SCALAR vs SET Decision Guide

|               | SCALAR                          | SET                                          |
| ------------- | ------------------------------- | -------------------------------------------- |
| Input         | One row at a time               | Group of rows (via GROUP BY)                 |
| Output        | RETURNS <type> (single value)   | EMITS (col1 TYPE, ...) (zero or more rows)   |
| Row iteration | Not needed                      | ctx.next() loop required                     |
| SQL usage     | SELECT udf(col) FROM t          | SELECT udf(col) FROM t GROUP BY key          |
| Use case      | Per-row transforms              | Aggregation, ML batch predict, multi-row emit |

Language Selection

| Language                | Startup | Best For                                              | Expandable via SLC?                |
| ----------------------- | ------- | ----------------------------------------------------- | ---------------------------------- |
| Python 3 (3.10 or 3.12) | ~200 ms | ML, data science, pandas, string processing           | Yes                                |
| Java (11 or 17)         | ~1 s    | Enterprise libs, type safety, Virtual Schema adapters | Yes                                |
| Lua 5.4                 | <10 ms  | Low-latency transforms, row-level security            | No (natively compiled into Exasol) |
| R (4.4)                 | ~200 ms | Statistical modeling, R model deployment              | Yes                                |

CREATE SCRIPT Syntax

Python SCALAR

```sql
CREATE OR REPLACE PYTHON3 SCALAR SCRIPT my_schema.clean_text(input VARCHAR(10000))
RETURNS VARCHAR(10000) AS
import re

def run(ctx):
    if ctx.input is None:
        return None
    return re.sub(r'[^\w\s]', '', ctx.input).strip().lower()
/
```

SELECT clean_text(description) FROM products;

Python SET

```sql
CREATE OR REPLACE PYTHON3 SET SCRIPT my_schema.top_n(
    item VARCHAR(200), score DOUBLE, n INT
) EMITS (item VARCHAR(200), score DOUBLE) AS
def run(ctx):
    rows = []
    limit = ctx.n
    while True:
        rows.append((ctx.item, ctx.score))
        if not ctx.next():
            break
    rows.sort(key=lambda x: x[1], reverse=True)
    for item, score in rows[:limit]:
        ctx.emit(item, score)
/
```

SELECT top_n(product, revenue, 5) FROM sales GROUP BY category;

Java SCALAR

```sql
CREATE OR REPLACE JAVA SCALAR SCRIPT my_schema.hash_value(input VARCHAR(2000))
RETURNS VARCHAR(64) AS
import java.security.MessageDigest;

class HASH_VALUE {
    static String run(ExaMetadata exa, ExaIterator ctx) throws Exception {
        String input = ctx.getString("input");
        if (input == null) return null;
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        byte[] hash = md.digest(input.getBytes("UTF-8"));
        StringBuilder hex = new StringBuilder();
        for (byte b : hash) hex.append(String.format("%02x", b));
        return hex.toString();
    }
}
/
```

Java with External JARs

```sql
CREATE OR REPLACE JAVA SCALAR SCRIPT my_schema.custom(input VARCHAR(2000))
RETURNS VARCHAR(2000) AS
%scriptclass com.mycompany.MyProcessor;
%jar /buckets/bfsdefault/default/jars/my-lib.jar;
/
```

Lua SCALAR

```sql
CREATE OR REPLACE LUA SCALAR SCRIPT my_schema.my_avg(a DOUBLE, b DOUBLE)
RETURNS DOUBLE AS
function run(ctx)
    if ctx.a == nil or ctx.b == nil then
        return nil
    end
    return (ctx.a + ctx.b) / 2
end
/
```

R SET (ML Prediction)

```sql
CREATE OR REPLACE R SET SCRIPT my_schema.predict(
    feature1 DOUBLE, feature2 DOUBLE
) EMITS (prediction DOUBLE) AS
run <- function(ctx) {
    model <- readRDS("/buckets/bfsdefault/default/models/model.rds")
    repeat {
        if (!ctx$next_row(1000)) break
        df <- data.frame(f1 = ctx$feature1, f2 = ctx$feature2)
        ctx$emit(predict(model, newdata = df))
    }
}
/
```

Variadic Scripts (Dynamic Parameters)

Use ... to accept any number of input columns, output columns, or both.

Dynamic Input

```sql
CREATE OR REPLACE PYTHON3 SCALAR SCRIPT schema.to_json(...)
RETURNS VARCHAR(2000000) AS
import simplejson

def run(ctx):
    obj = {}
    # caller passes: name, value, name, value, ...
    for i in range(0, exa.meta.input_column_count, 2):
        obj[ctx[i]] = ctx[i + 1]
    return simplejson.dumps(obj)
/
```

SELECT to_json('fruit', fruit, 'price', price) FROM products;

  • Access by index: ctx[i] — 0-based in Python/Java, 1-based in Lua/R

  • Parameter names inside a variadic script are always 0, 1, 2, ... — never the original column names

  • exa.meta.input_column_count — total number of input columns

  • exa.meta.input_columns[i].name / .sql_type — per-column metadata
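The 0-based Python indexing convention can be illustrated outside the database with a stand-in for the ctx object (FakeCtx below is hypothetical; the real ctx and exa objects exist only inside the UDF sandbox):

```python
# Stand-in mimicking variadic ctx[i] access (illustration only; the real
# ctx/exa objects are injected by Exasol inside the UDF sandbox).
class FakeCtx:
    def __init__(self, values):
        self._values = values

    def __getitem__(self, i):  # ctx[i] is 0-based in Python UDFs
        return self._values[i]

def row_as_dict(ctx, input_column_count):
    # Same name/value pairing as the to_json example above
    return {ctx[i]: ctx[i + 1] for i in range(0, input_column_count, 2)}

ctx = FakeCtx(["fruit", "apple", "price", 1.25])
print(row_as_dict(ctx, 4))  # {'fruit': 'apple', 'price': 1.25}
```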

Dynamic Output (EMITS(...))

Declare EMITS(...) in CREATE SCRIPT. At call time, the output columns must be provided in one of two ways:

| Method                   | Where specified    | Use when                                                      |
| ------------------------ | ------------------ | ------------------------------------------------------------- |
| EMITS in SELECT          | Caller's SQL query | Output structure depends on data values                       |
| default_output_columns() | Script body        | Output structure derivable from input column count/types alone |

```sql
-- EMITS in SELECT (required when output depends on data content)
SELECT split_csv(line)
EMITS (a VARCHAR(100), b VARCHAR(100), c VARCHAR(100))
FROM t;
```

default_output_columns() — called before run(), no ctx/data access available

```python
def default_output_columns():
    parts = []
    for i in range(exa.meta.input_column_count):
        parts.append("c" + exa.meta.input_columns[i].name + " "
                     + exa.meta.input_columns[i].sql_type)
    return ",".join(parts)
```

If neither is provided, the query fails with:

The script has dynamic return arguments. Either specify the return arguments in the query via EMITS or implement the method default_output_columns in the UDF.

ExaIterator API Quick Reference

Python

| Method/Property                         | SCALAR | SET | Description                               |
| --------------------------------------- | ------ | --- | ----------------------------------------- |
| ctx.<column>                            | yes    | yes | Access input column value                 |
| return value                            | yes    | no  | Return single value (RETURNS)             |
| ctx.emit(v1, v2, ...)                   | no     | yes | Emit output row (EMITS)                   |
| ctx.emit(dataframe)                     | no     | yes | Emit DataFrame as rows                    |
| ctx.next()                              | no     | yes | Advance to next row; returns False at end |
| ctx.size()                              | no     | yes | Number of rows in current group           |
| ctx.reset()                             | no     | yes | Reset iterator to first row               |
| ctx.get_dataframe(num_rows, start_col)  | no     | yes | Get rows as pandas DataFrame              |

Important: There is no emit_dataframe() method — use ctx.emit(dataframe) to emit a DataFrame.
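The get_dataframe/emit batching pattern can be sketched as below. FakeBatchCtx is a hypothetical local stand-in for the real ctx (which Exasol supplies inside the UDF), and column access is positional via iloc to avoid assumptions about how the DataFrame's columns are named; verify details against references/udf-python.md:

```python
import pandas as pd

# Sketch of a batched SET-script body: read chunks as DataFrames, transform,
# emit each chunk back as output rows.
def run(ctx):
    while True:
        df = ctx.get_dataframe(num_rows=1000)
        if df is None:                   # no rows left in the group
            break
        out = pd.DataFrame({"doubled": df.iloc[:, 0] * 2})
        ctx.emit(out)                    # each DataFrame row becomes an output row

# Local stand-in so the pattern can be exercised outside the database.
class FakeBatchCtx:
    def __init__(self, df):
        self._df, self._emitted = df, []

    def get_dataframe(self, num_rows):
        chunk, self._df = self._df.head(num_rows), self._df.iloc[num_rows:]
        return chunk if len(chunk) else None

    def emit(self, df):
        self._emitted.append(df)

ctx = FakeBatchCtx(pd.DataFrame({"x": [1, 2, 3]}))
run(ctx)
print(pd.concat(ctx._emitted)["doubled"].tolist())  # [2, 4, 6]
```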

Java

| Method                  | Description                    |
| ----------------------- | ------------------------------ |
| ctx.getString("col")    | Get string value               |
| ctx.getInteger("col")   | Get integer value              |
| ctx.getDouble("col")    | Get double value               |
| ctx.getBigDecimal("col")| Get decimal value              |
| ctx.getDate("col")      | Get date value                 |
| ctx.getTimestamp("col") | Get timestamp value            |
| ctx.next()              | Advance to next row (SET only) |
| ctx.emit(v1, v2, ...)   | Emit output row (SET only)     |
| ctx.size()              | Row count in group (SET only)  |
| ctx.reset()             | Reset to first row (SET only)  |

BucketFS File Access

All languages can read files from BucketFS at /buckets/<service>/<bucket>/<path>:

Python — load a pickled ML model

```python
import pickle

with open('/buckets/bfsdefault/default/models/model.pkl', 'rb') as f:
    model = pickle.load(f)
```

Java — reference JARs via the %jar directive:

```sql
%jar /buckets/bfsdefault/default/jars/my-library.jar;
```

Performance tip: Load models/resources once (outside the row loop or in a module-level variable), not per-row.

GPU Acceleration (Exasol 2025.2+)

Exasol supports GPU-accelerated UDFs via CUDA-enabled Script Language Containers:

  • Use template-Exasol-8-python-3.{10,12}-cuda-conda flavors

  • Requires NVIDIA driver on the Exasol host

  • Install GPU libraries (PyTorch, TensorFlow, RAPIDS) via conda in the SLC

  • Standard UDF API — no code changes needed beyond importing GPU libraries

Script Language Containers (SLC) Overview

UDFs run inside Script Language Containers — Docker-based runtime environments. The default SLC includes standard libraries. When you need additional packages (e.g., scikit-learn, PyTorch, custom JARs), build a custom SLC.

When You Need a Custom SLC

  • Installing pip/conda packages not in the default container

  • Adding system libraries (apt packages)

  • Using a different Python version (3.10 vs 3.12)

  • Enabling GPU/CUDA support

  • Adding R packages from CRAN

Quick Activation

```sql
-- Activate for current session
ALTER SESSION SET SCRIPT_LANGUAGES='PYTHON3=localzmq+protobuf:///<bfs-name>/<bucket>/<path>/<container>?lang=python#buckets/<bfs-name>/<bucket>/<path>/<container>/exaudf/exaudfclient_py3';
```

```sql
-- Activate system-wide (requires admin)
ALTER SYSTEM SET SCRIPT_LANGUAGES='...';
```

Install the Build Tool

```shell
pip install exasol-script-languages-container-tool
```

Performance Tips

  • Load once, use many: Load models/resources outside the row loop

  • Use SET for batching: Collect rows into a list/DataFrame, process in bulk

  • Lua for low latency: Avoids JVM/Python startup overhead

  • Parallelism is automatic: UDFs run on all cluster nodes simultaneously

Detailed References

  • Python patterns — context API, DataFrame pattern, type mapping, testing: references/udf-python.md

  • Java & Lua patterns — ExaMetadata API, JARs, adapters, Lua libraries: references/udf-java-lua.md

  • Building custom SLCs — exaslct CLI, flavors, customization, deployment, troubleshooting: references/slc-reference.md

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
