triton_inference_server

Production-grade model server with TensorRT, ONNX, and PyTorch support on GPU

Safety Notice

This listing is imported from the skills.sh public index metadata. Review the upstream SKILL.md and repository scripts before running anything.

Installation

Install the skill with: npx skills add davidcastagnetoa/skills/davidcastagnetoa-skills-triton-inference-server

NVIDIA Triton Inference Server centralizes serving for every ML model in the pipeline, with GPU optimization, dynamic batching, and simultaneous support for multiple frameworks.

When to use

Use it to serve all production ML models: MiniFASNet, ArcFace, YOLOv8, PaddleOCR, and the FaceForensics classifier.

Instructions

  1. Launch with Docker: docker run --gpus all -p 8000:8000 -p 8001:8001 -p 8002:8002 -v /models:/models nvcr.io/nvidia/tritonserver:23.10-py3.
  2. Structure the model repository as models/{model_name}/{version}/model.onnx, with a config.pbtxt per model.
  3. Write a config.pbtxt for each model: input/output shapes, instance groups (GPU/CPU), dynamic batching (see the sketch after this list).
  4. Export models to ONNX before deploying, using torch.onnx.export(...) (example below).
  5. Apply TensorRT optimization where possible (see the tensorrt skill).
  6. Use the gRPC client for inference: pip install tritonclient[grpc] (client sketch below).
  7. Health check: GET http://triton:8000/v2/health/ready.
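
A minimal config.pbtxt sketch for step 3, assuming a hypothetical ArcFace-style ONNX model; the model name, tensor names, and shapes are illustrative placeholders, not taken from the upstream skill. The file sits at models/arcface/config.pbtxt, next to the versioned models/arcface/1/model.onnx:

    name: "arcface"
    platform: "onnxruntime_onnx"
    max_batch_size: 16

    input [
      {
        name: "input"
        data_type: TYPE_FP32
        dims: [ 3, 112, 112 ]
      }
    ]
    output [
      {
        name: "embedding"
        data_type: TYPE_FP32
        dims: [ 512 ]
      }
    ]

    # Run one instance of the model on GPU 0.
    instance_group [
      {
        kind: KIND_GPU
        count: 1
        gpus: [ 0 ]
      }
    ]

    # Let Triton merge concurrent requests into batches of up to max_batch_size.
    dynamic_batching {
      max_queue_delay_microseconds: 100
    }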
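
For step 4, a sketch of the ONNX export, assuming a placeholder PyTorch module; the module, dummy input shape, tensor names, and opset version are illustrative and must be adapted per model:

    import torch

    # Placeholder module standing in for a real model (e.g. an ArcFace head).
    model = torch.nn.Linear(512, 512)
    model.eval()

    dummy_input = torch.randn(1, 512)

    # Export with a dynamic batch axis so Triton's dynamic batching can apply.
    torch.onnx.export(
        model,
        dummy_input,
        "models/arcface/1/model.onnx",
        input_names=["input"],
        output_names=["embedding"],
        dynamic_axes={"input": {0: "batch"}, "embedding": {0: "batch"}},
        opset_version=17,
    )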
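
For steps 6 and 7, a minimal tritonclient sketch; the model name and tensor names mirror the hypothetical config above, and 8001 is Triton's default gRPC port (the HTTP health endpoint from step 7 lives on 8000):

    import numpy as np
    import tritonclient.grpc as grpcclient

    client = grpcclient.InferenceServerClient(url="triton:8001")

    # gRPC equivalent of GET http://triton:8000/v2/health/ready.
    assert client.is_server_ready()

    # Build a single-image batch matching the hypothetical config's input shape.
    batch = np.random.rand(1, 3, 112, 112).astype(np.float32)
    infer_input = grpcclient.InferInput("input", list(batch.shape), "FP32")
    infer_input.set_data_from_numpy(batch)

    result = client.infer(
        model_name="arcface",
        inputs=[infer_input],
        outputs=[grpcclient.InferRequestedOutput("embedding")],
    )
    print(result.as_numpy("embedding").shape)  # expect (1, 512) with this config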

Notes

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

All four related skills fall under the General category; none provides a summary upstream, and each is sourced from its repository and flagged "Needs Review".

  - traefik
  - c4_model_structurizr
  - fastapi
  - exif_metadata_analyzer