
TensorRT-LLM Template

Image: nvcr.io/nvidia/tritonserver:latest-trtllm | Min VRAM: 24 GB | Port: 8000

NVIDIA’s optimized LLM inference with TensorRT-LLM for maximum throughput.

What’s Included

  • TensorRT-LLM
  • Triton Inference Server
  • Optimized for NVIDIA GPUs
  • INT8/FP8 quantization support
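Since the template serves models through Triton Inference Server on port 8000, you can verify a running instance with Triton's standard KServe v2 readiness endpoint. The sketch below assumes only that the instance's public IP is reachable; the hostname shown is a placeholder.

```python
from urllib.parse import urlparse

def triton_ready_url(host: str, port: int = 8000) -> str:
    """Build the URL for Triton's KServe v2 HTTP readiness check.

    Triton returns HTTP 200 on GET /v2/health/ready once the server
    and all loaded models are ready to serve requests.
    """
    return f"http://{host}:{port}/v2/health/ready"

# Placeholder host -- substitute your instance's public IP.
url = triton_ready_url("203.0.113.10")
print(url)
```

A `curl -f` against the same URL works equally well from a shell; a non-zero exit code means the server is not yet ready.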

Launch

curl -X POST https://api.pulserun.dev/v1/instances \
  -H "Authorization: Bearer pr_live_xxxxxxxxxxxxx" \
  -H "Content-Type: application/json" \
  -d '{"gpu": "h100_80gb", "template": "tensorrt"}'

Recommended GPUs

  • H100 — Best performance with FP8 support
  • A100 80GB — Excellent for INT8 models