Ollama Template

Image: ollama/ollama:latest | Min VRAM: 8 GB | Port: 11434

Run open-source LLMs on your own GPU instance with a simple API.

What’s Included

  • Ollama runtime
  • Easy model pulling (ollama pull llama2)
  • REST API on port 11434
  • GPU acceleration
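Once the instance is running, you can confirm the Ollama API is reachable and see which models are already pulled. A minimal Python sketch using only the standard library; `/api/tags` is Ollama's endpoint for listing local models, and the helper name is illustrative:

```python
import json
import urllib.request

def list_models(host, port=11434):
    """Return the names of models already pulled on the instance.

    GET /api/tags is Ollama's endpoint for listing local models.
    """
    with urllib.request.urlopen(f"http://{host}:{port}/api/tags", timeout=10) as r:
        return [m["name"] for m in json.load(r)["models"]]

# e.g. list_models("<instance-ip>")
```

A fresh instance returns an empty list until you pull a model.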

Launch

curl -X POST https://api.pulserun.dev/v1/instances \
  -H "Authorization: Bearer pr_live_xxxxxxxxxxxxx" \
  -H "Content-Type: application/json" \
  -d '{"gpu": "rtx_4090", "template": "ollama"}'

Usage

# Pull a model
curl http://<instance-ip>:11434/api/pull -d '{"name": "llama2"}'
 
# Chat
curl http://<instance-ip>:11434/api/chat -d '{
  "model": "llama2",
  "messages": [{"role": "user", "content": "Hello!"}]
}'
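The same chat call can be made from Python with only the standard library. Note that Ollama streams newline-delimited JSON by default; setting `"stream": false` returns a single JSON object instead. The helper name below is illustrative:

```python
import json
import urllib.request

def build_chat_request(host, model, content):
    # Non-streaming chat request: Ollama streams JSON lines unless
    # "stream" is set to false.
    body = json.dumps({
        "model": model,
        "stream": False,
        "messages": [{"role": "user", "content": content}],
    }).encode()
    return urllib.request.Request(
        f"http://{host}:11434/api/chat",
        data=body,
        headers={"Content-Type": "application/json"},
    )

# reply = json.load(urllib.request.urlopen(build_chat_request(
#     "<instance-ip>", "llama2", "Hello!")))["message"]["content"]
```

With `"stream": false` the response body is one JSON object whose `message.content` field holds the assistant's reply.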