
TensorRT-LLM Template

Image: nvcr.io/nvidia/tritonserver:latest-trtllm | Min VRAM: 24 GB | Port: 8000

NVIDIA’s optimized LLM inference with TensorRT-LLM for maximum throughput.

What’s Included

  • TensorRT-LLM
  • Triton Inference Server
  • Optimized for NVIDIA GPUs
  • INT8/FP8 quantization support
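Since the template serves models through Triton Inference Server on port 8000, you can verify a running instance with Triton's standard KServe v2 readiness endpoint. The sketch below assumes only that the instance's public IP is reachable; the hostname shown is a placeholder.

```python
from urllib.parse import urlparse

def triton_ready_url(host: str, port: int = 8000) -> str:
    """Build the URL for Triton's KServe v2 HTTP readiness check.

    Triton returns HTTP 200 on GET /v2/health/ready once the server
    and all loaded models are ready to serve requests.
    """
    return f"http://{host}:{port}/v2/health/ready"

# Placeholder host -- substitute your instance's public IP.
url = triton_ready_url("203.0.113.10")
print(url)
```

A `curl -f` against the same URL works equally well from a shell; a non-zero exit code means the server is not yet ready.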

Launch

curl -X POST https://api.pulserun.dev/v1/instances \
  -H "Authorization: Bearer pr_live_xxxxxxxxxxxxx" \
  -H "Content-Type: application/json" \
  -d '{"gpu": "h100_80gb", "template": "tensorrt"}'

Recommended GPUs

  • H100 — Best performance with FP8 support
  • A100 80GB — Excellent for INT8 models