browser-onnx

Implements high-performance local machine learning inference in the browser using ONNX Runtime Web. Use this skill when the user needs privacy-first, low-latency, or offline AI capabilities (e.g., image classification, object detection, or NLP) without server-side processing.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install skill "browser-onnx" with this command: npx skills add thongnt0208/browser-onnx-skills/thongnt0208-browser-onnx-skills-browser-onnx

Browser-Based ONNX Inference

This skill provides a comprehensive workflow for executing ONNX models locally in the browser using ONNX Runtime Web (ORT-Web). Local inference offers significant advantages: data never leaves the device, server costs drop, and the workload scales naturally because each user supplies their own compute.

1. Setup and Installation

Install the required library via npm:

npm install onnxruntime-web

Note: For experimental features like WebGPU or WebNN, use the nightly version onnxruntime-web@dev.

2. Global Environment Configuration

Set global ort.env flags before creating a session to optimize the runtime environment.

  • WebAssembly (CPU): Enable multi-threading by setting ort.env.wasm.numThreads (default is half of hardware concurrency) and use a Proxy Worker (ort.env.wasm.proxy = true) to keep the UI responsive.
  • WASM Paths: If binaries are not in the same directory as the JS bundle, manually override paths using ort.env.wasm.wasmPaths to point to local assets or a CDN.
  • WebGPU (GPU): Use ort.env.webgpu.profiling = { mode: 'default' } for performance diagnosis during development.
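The flags above can be set in one place before the first session is created. A sketch (the thread count and CDN path are illustrative, not recommendations):

```javascript
import * as ort from 'onnxruntime-web';

// Configure the runtime before the first InferenceSession is created;
// flags changed afterwards do not affect existing sessions.
ort.env.wasm.numThreads = 4;   // cap CPU threads (default: half of hardware concurrency)
ort.env.wasm.proxy = true;     // run WASM inference in a worker to keep the UI responsive
ort.env.wasm.wasmPaths = 'https://cdn.jsdelivr.net/npm/onnxruntime-web/dist/'; // illustrative CDN
ort.env.webgpu.profiling = { mode: 'default' }; // dev-only: log per-kernel GPU timings
```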

3. Creating an Inference Session

Initialize the session by choosing the appropriate Execution Provider (EP):

import * as ort from 'onnxruntime-web';

const session = await ort.InferenceSession.create('./model.onnx', {
  executionProviders: ['webgpu', 'wasm'], // Prioritize GPU, fallback to CPU
  graphOptimizationLevel: 'all' // Enable all graph-level optimizations
});

4. Data Preprocessing

Input data must match the model's training format (e.g., NCHW for vision models).

  • Image-to-Tensor: Use libraries like Jimp or OpenCV.js to resize, normalize (divide by 255.0), and convert RGBA to RGB.
  • Tensor Creation: Use new ort.Tensor('float32', float32Data, dims) — e.g. dims = [1, 3, 224, 224] for a single NCHW image — to prepare the input feeds.
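As a concrete sketch (assuming simple /255 normalization and a model that takes NCHW input), the RGBA-to-planar conversion can be done by hand without an image library:

```javascript
// Convert RGBA pixel data (e.g. from canvas getImageData) into a
// normalized float32 array laid out as NCHW: [1, 3, height, width].
function rgbaToNchw(rgba, width, height) {
  const plane = width * height;
  const out = new Float32Array(3 * plane);
  for (let i = 0; i < plane; i++) {
    out[i]             = rgba[i * 4]     / 255.0; // R plane
    out[plane + i]     = rgba[i * 4 + 1] / 255.0; // G plane
    out[2 * plane + i] = rgba[i * 4 + 2] / 255.0; // B plane (alpha dropped)
  }
  return out;
}
```

The result can then be wrapped as new ort.Tensor('float32', data, [1, 3, height, width]).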

5. Optimized Inference Patterns

  • Graph Capture: For models with static shapes on WebGPU, enable enableGraphCapture: true to reduce CPU overhead by replaying kernel executions.
  • IO Binding: For transformer models, keep data on the GPU by using ort.Tensor.fromGpuBuffer() and setting preferredOutputLocation: 'gpu-buffer' to avoid expensive memory copies.
  • Quantization: Prefer uint8 quantized models for CPU (WASM) inference to improve performance; avoid float16 on CPU as it lacks native support and is slow.
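A session configured for the WebGPU patterns above might look like this. This is a sketch assuming a recent onnxruntime-web; enableGraphCapture requires static input shapes, and 'gpu-buffer' outputs must eventually be disposed:

```javascript
import * as ort from 'onnxruntime-web';

const session = await ort.InferenceSession.create('./model.onnx', {
  executionProviders: ['webgpu'],
  enableGraphCapture: true,             // replay recorded GPU commands on later runs
  preferredOutputLocation: 'gpu-buffer' // keep outputs on the GPU for chained models
});
```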

6. Large Model Handling (>2GB)

  • Platform Limits: Browsers like Chrome limit ArrayBuffer to ~2GB. Models exceeding this must be exported with external data.
  • Loading External Data: Explicitly link external weight files in the session options:
    const session = await ort.InferenceSession.create(modelUrl, {
      // 'path' must match the external-data filename referenced inside the .onnx file;
      // 'data' is a URL (or a Uint8Array) from which the weights are fetched.
      externalData: [{ path: './model.data', data: dataUrl }]
    });
    

7. Common Edge Cases

  • Memory Management: Explicitly call tensor.dispose() for GPU tensors to prevent memory leaks.
  • Zero-Sized Tensors: ORT-Web treats tensors with a dimension of 0 as CPU tensors regardless of the selected EP.
  • Thermal Throttling: Sustained inference on mobile devices can trigger thermal throttling (frequency scaling), which may roughly double latency over time. Prefer lightweight "tiny" model variants to stay within the device's thermal budget.
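For the first point, a defensive pattern is a helper that always releases output tensors, even when downstream processing throws. This sketch assumes outputs expose dispose(), as GPU-located ort.Tensor values do:

```javascript
// Run inference, hand the outputs to a callback, and guarantee that
// every output tensor is disposed afterwards, even if the callback throws.
async function runAndDispose(session, feeds, use) {
  const results = await session.run(feeds);
  try {
    return await use(results);
  } finally {
    for (const tensor of Object.values(results)) {
      if (typeof tensor.dispose === 'function') tensor.dispose();
    }
  }
}
```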

8. Examples

Multilingual Translation

Offload heavy translation tasks to a separate Web Worker using a singleton pattern to ensure the model (e.g., NLLB-200) loads only once.
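The singleton can be as simple as caching the in-flight promise, so concurrent callers during the initial load share one download. A sketch with a hypothetical loadModel loader (not part of ORT-Web):

```javascript
// Cache the load *promise* rather than the loaded model, so that a second
// request arriving mid-download reuses the same in-flight load.
let modelPromise = null;

function getModel(loadModel) {
  if (modelPromise === null) modelPromise = loadModel();
  return modelPromise;
}
```

Inside a Web Worker, loadModel would typically wrap ort.InferenceSession.create for the translation model.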

Object Detection (YOLO)

Implement Non-Max Suppression (NMS). If the browser lacks support for specific NMS ops, run a separate NMS ONNX model to filter overlapping boxes locally.
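When a pure-JavaScript fallback is acceptable, greedy NMS is short enough to implement directly. A sketch with boxes given as [x1, y1, x2, y2]:

```javascript
// Intersection-over-union of two axis-aligned boxes [x1, y1, x2, y2].
function iou(a, b) {
  const ix = Math.max(0, Math.min(a[2], b[2]) - Math.max(a[0], b[0]));
  const iy = Math.max(0, Math.min(a[3], b[3]) - Math.max(a[1], b[1]));
  const inter = ix * iy;
  const areaA = (a[2] - a[0]) * (a[3] - a[1]);
  const areaB = (b[2] - b[0]) * (b[3] - b[1]);
  return inter / (areaA + areaB - inter);
}

// Greedy non-max suppression: keep the highest-scoring box, suppress any
// remaining box overlapping it above iouThreshold, then repeat.
function nms(boxes, scores, iouThreshold = 0.5) {
  const order = scores.map((_, i) => i).sort((i, j) => scores[j] - scores[i]);
  const keep = [];
  const suppressed = new Set();
  for (const i of order) {
    if (suppressed.has(i)) continue;
    keep.push(i);
    for (const j of order) {
      if (j !== i && !suppressed.has(j) && iou(boxes[i], boxes[j]) > iouThreshold) {
        suppressed.add(j);
      }
    }
  }
  return keep; // indices of retained boxes, highest score first
}
```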

