TensorRT-LLM Template
Image: nvcr.io/nvidia/tritonserver:latest-trtllm
Min VRAM: 24 GB | Port: 8000
High-throughput LLM inference built on NVIDIA's TensorRT-LLM, served through Triton Inference Server.
What’s Included
- TensorRT-LLM
- Triton Inference Server
- Optimized for NVIDIA GPUs
- INT8/FP8 quantization support
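If you want to try the same image outside PulseRun, a minimal Docker Compose sketch is below. This is an assumption-laden example, not part of the template: the service name, the `./models` volume path, and the `--model-repository` location are placeholders you would adapt to your own Triton model repository.

```yaml
services:
  triton:
    image: nvcr.io/nvidia/tritonserver:latest-trtllm
    ports:
      - "8000:8000"          # Triton HTTP endpoint (matches the Port above)
    volumes:
      - ./models:/models     # placeholder: local Triton model repository
    command: tritonserver --model-repository=/models
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```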
Launch
curl -X POST https://api.pulserun.dev/v1/instances \
-H "Authorization: Bearer pr_live_xxxxxxxxxxxxx" \
-H "Content-Type: application/json" \
-d '{"gpu": "h100_80gb", "template": "tensorrt"}'

Recommended GPUs
- H100 — Best performance with FP8 support
- A100 80GB — Excellent for INT8 models
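For scripting, the launch call above can be wrapped in a small POSIX shell helper. The function names (`launch_payload`, `launch_instance`) and the `PULSERUN_API_KEY` environment variable are hypothetical conveniences, not part of the PulseRun API; the endpoint and request shape come straight from the curl example.

```shell
#!/bin/sh
# Build the JSON body for a given GPU type and optional template
# (defaults to "tensorrt", matching the curl example above).
launch_payload() {
  printf '{"gpu": "%s", "template": "%s"}' "$1" "${2:-tensorrt}"
}

# Launch an instance; reads the API key from $PULSERUN_API_KEY
# instead of hard-coding a pr_live_... token in the script.
launch_instance() {
  curl -sS -X POST https://api.pulserun.dev/v1/instances \
    -H "Authorization: Bearer $PULSERUN_API_KEY" \
    -H "Content-Type: application/json" \
    -d "$(launch_payload "$1" "$2")"
}

# Example:
#   PULSERUN_API_KEY=pr_live_xxxxxxxxxxxxx launch_instance h100_80gb
```

Keeping the payload builder separate makes it easy to inspect or log the request body before sending it.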