nebius-datalab-pipeline

Full DataLab end-to-end pipeline on Nebius Token Factory — from raw inference logs through synthetic data generation, fine-tuning, and model deployment. Use this skill whenever the user wants to run a complete MLOps workflow on Nebius, combine DataLab with fine-tuning, do teacher-student distillation at scale, build a data flywheel, automate the path from prompts to a deployed custom model, or orchestrate multiple Nebius services together. Trigger for phrases like "full Nebius pipeline", "DataLab workflow", "end-to-end fine-tuning on Nebius", "teacher distillation pipeline", "data flywheel", "automate Nebius training and deployment", or any question involving multiple Nebius services chained together.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

To install this skill, send the following command to your AI assistant:

npx skills add arindam200/nebius-skills/arindam200-nebius-skills-nebius-datalab-pipeline

Nebius DataLab — End-to-End Pipeline

Orchestrate the full MLOps loop: inference logs → batch synthesis → curate → fine-tune → deploy → serve.

What DataLab stores

| Data type               | Source                                                           |
|-------------------------|------------------------------------------------------------------|
| Inference Logs          | Chat completions via API/Playground (unless Zero Data Retention) |
| Filtered Datasets       | SQL queries over inference logs                                  |
| Uploaded Datasets       | Manual upload                                                    |
| Batch Inference Outputs | Results from batch jobs                                          |
| Fine-tuning Outputs     | Checkpoints + fine-tuned model artifacts                         |

Prerequisites

pip install openai requests
export NEBIUS_API_KEY="your-key"

The 8-step pipeline

For the full working implementation, see scripts/06_datalab_e2e_workflow.py.


Step 1 — Generate inference logs

Run live chat completions against your domain topics. These are automatically stored as inference logs in DataLab.

import os
from openai import OpenAI

API_KEY = os.environ["NEBIUS_API_KEY"]
client = OpenAI(base_url="https://api.tokenfactory.nebius.com/v1/", api_key=API_KEY)

for topic in domain_topics:
    resp = client.chat.completions.create(
        model="meta-llama/Meta-Llama-3.1-8B-Instruct",
        messages=[{"role": "user", "content": topic}],
    )
    # Each completion is logged automatically as an inference log in DataLab

Step 2 — Upload raw dataset to DataLab

Upload JSONL as a fine-tune file so it's accessible in DataLab for filtering/reuse.

with open("raw_dataset.jsonl", "rb") as f:
    client.files.create(file=f, purpose="fine-tune")

Step 3 — Batch inference with teacher model

Use a large teacher model to generate high-quality responses. See nebius-batch-synthetic skill for the full batch API.

# Build JSONL with teacher model (e.g., 70B)
# Upload + create batch → poll → download outputs

Tip: use temperature=0.6 and a generous max_tokens so the teacher produces a rich training signal.
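The comment outline above can be fleshed out as follows. This is a sketch that assumes Nebius exposes the OpenAI-compatible Batch API (files + batches endpoints, per the nebius-batch-synthetic skill); the teacher model name and the `build_batch_request` / `run_teacher_batch` helper names are illustrative, not part of any SDK.

```python
import json
import os
import time


def build_batch_request(i, prompt,
                        model="meta-llama/Meta-Llama-3.1-70B-Instruct"):
    """One line of the batch input JSONL (OpenAI batch request format)."""
    return {
        "custom_id": f"req-{i}",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
            "temperature": 0.6,
            "max_tokens": 2048,
        },
    }


def run_teacher_batch(prompts):
    """Upload a batch request file, run the job, and return parsed results."""
    from openai import OpenAI  # deferred so build_batch_request works offline

    client = OpenAI(base_url="https://api.tokenfactory.nebius.com/v1/",
                    api_key=os.environ["NEBIUS_API_KEY"])

    # Build the request file: one JSON object per line.
    with open("batch_requests.jsonl", "w", encoding="utf-8") as f:
        for i, prompt in enumerate(prompts):
            f.write(json.dumps(build_batch_request(i, prompt)) + "\n")

    # Upload, start the batch job, then poll until it reaches a terminal state.
    with open("batch_requests.jsonl", "rb") as f:
        batch_file = client.files.create(file=f, purpose="batch")
    batch = client.batches.create(
        input_file_id=batch_file.id,
        endpoint="/v1/chat/completions",
        completion_window="24h",
    )
    while batch.status not in ("completed", "failed", "expired"):
        time.sleep(30)
        batch = client.batches.retrieve(batch.id)

    # Download outputs: one JSON result per line.
    output = client.files.content(batch.output_file_id)
    return [json.loads(line) for line in output.text.splitlines()]
```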


Step 4 — Curate outputs

Filter by quality (min length, confidence) and convert to fine-tuning conversational format:

for rec in batch_results:
    reply = rec["response"]["body"]["choices"][0]["message"]["content"].strip()
    if len(reply) < 50:
        continue  # skip low-quality replies
    # original_prompt: the prompt that produced this record (match on custom_id)
    training_examples.append({
        "messages": [
            {"role": "user",      "content": original_prompt},
            {"role": "assistant", "content": reply},
        ]
    })
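Step 5 uploads curated_training.jsonl, so the curated examples must be serialized first. A minimal sketch (`write_jsonl` is an illustrative helper name, not an SDK call):

```python
import json


def write_jsonl(examples, path="curated_training.jsonl"):
    """Serialize curated training examples, one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for ex in examples:
            f.write(json.dumps(ex, ensure_ascii=False) + "\n")
    return path


# Usage: write_jsonl(training_examples)
```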

Step 5 — Upload curated training file

with open("curated_training.jsonl", "rb") as f:
    training_file = client.files.create(file=f, purpose="fine-tune")

Step 6 — Launch fine-tuning

See nebius-finetune skill for full details.

job = client.fine_tuning.jobs.create(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct",
    training_file=training_file.id,
    hyperparameters={"n_epochs": 2, "lora_rank": 16},
)
# poll until job.status == "succeeded"
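The polling comment above can be sketched as a small loop; `wait_for_job` is an illustrative helper, and the terminal-state names follow the OpenAI fine-tuning job conventions that the SDK uses:

```python
import time


def wait_for_job(client, job_id, poll_seconds=60):
    """Poll a fine-tuning job until it reaches a terminal state."""
    terminal = {"succeeded", "failed", "cancelled"}
    while True:
        job = client.fine_tuning.jobs.retrieve(job_id)
        if job.status in terminal:
            return job
        time.sleep(poll_seconds)


# Usage: job = wait_for_job(client, job.id)
```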

Step 7 — Deploy fine-tuned LoRA

See nebius-deploy-lora skill for full details.

import requests

checkpoints = client.fine_tuning.jobs.checkpoints.list(job.id).data
ckpt_id = checkpoints[-1].id

requests.post("https://api.tokenfactory.nebius.com/v0/models", json={
    "source":     f"{job.id}:{ckpt_id}",
    "base_model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
    "name":       "my-domain-model-v1",
}, headers={"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"})
# poll until status == "active"
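The poll-until-active step can be sketched like this. It assumes a GET endpoint at /v0/models/{name} returning a JSON body with a "status" field, which is not confirmed by this document — check the nebius-deploy-lora skill for the actual API. The `get` parameter is injectable so the loop can be exercised without network access:

```python
import time


def wait_until_active(name, api_key, get=None, poll_seconds=30):
    """Poll the (assumed) GET /v0/models/{name} endpoint until deployment
    reaches a terminal state; returns the final status string."""
    if get is None:
        import requests  # deferred so the function is importable offline
        get = requests.get
    url = f"https://api.tokenfactory.nebius.com/v0/models/{name}"
    headers = {"Authorization": f"Bearer {api_key}"}
    while True:
        status = get(url, headers=headers).json().get("status")
        if status in ("active", "error"):
            return status
        time.sleep(poll_seconds)


# Usage: wait_until_active("my-domain-model-v1", API_KEY)
```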

Step 8 — Smoke test

resp = client.chat.completions.create(
    model=deployed_model_name,
    messages=[{"role": "user", "content": "Test your domain knowledge..."}],
)
print(resp.choices[0].message.content)

Skill cross-references

This pipeline combines all Nebius skills. For details on each step:

| Step               | Skill                     |
|--------------------|---------------------------|
| Batch inference    | nebius-batch-synthetic    |
| Fine-tuning        | nebius-finetune           |
| Deploy LoRA        | nebius-deploy-lora        |
| Monitor inference  | nebius-observability      |
| Dedicated endpoint | nebius-dedicated-endpoint |

Bundled reference

Read references/datalab-overview.md when the user asks about DataLab data types, SQL filtering, or the relationship between inference logs and fine-tuning.

Reference script

Full 8-step orchestrated pipeline: scripts/06_datalab_e2e_workflow.py

