You've got Hermes running on your VPS, answering messages on Telegram, managing your tasks, and generally being helpful. But here's the thing: you're paying for every API call to the cloud, and sometimes you just want to run a quick local model for sensitive data or offline work.

What if you could have both? Cloud models when you need power, local models when you want privacy or cost savingsโ€”and the ability to switch between them instantly?

โšก The Short Version

This guide shows you how to configure Hermes with:

  • Venice.ai cloud models โ€” Kimi K2.5, ZAI GLM series, Gemma variants
  • Ollama local models โ€” Running on your laptop via SSH reverse tunnel
  • Hermes web dashboard โ€” Access via SSH forward tunnel at localhost:9119
  • Seamless switching โ€” Use /model in Telegram to change on the fly

The secret sauce? SSH tunnels that bridge your VPS and laptop in both directions.

Why Mix Cloud and Local?

Each approach has trade-offs:

Consideration Cloud (Venice.ai) Local (Ollama)
Cost Per-token pricing Free after hardware cost
Speed Fast, GPU-backed Depends on your GPU
Privacy Data leaves your system 100% local, zero telemetry
Availability Requires internet Works offline
Model variety Access to cutting-edge Limited by your VRAM

The hybrid approach gives you the best of both worlds. Use cloud models for complex reasoning, coding, and when you need the latest capabilities. Drop down to local models for routine tasks, sensitive data, or when you just want zero latency.

The Architecture

Here's how everything connects:

๐Ÿ“ฑ Telegram
โ†’
๐Ÿ–ฅ๏ธ VPS (Hermes)
โ†™
โ˜๏ธ Venice.ai API
Cloud Models
โ†˜
๐Ÿ’ป SSH Tunnel
Port Forward
โ†“
๐Ÿ–ฅ๏ธ Your Laptop
Ollama API

The key insight: Hermes on your VPS can't directly access your laptop's Ollama instance because your laptop is behind NAT/firewall. The SSH reverse tunnel fixes this by forwarding a port from your VPS back to your laptop. Meanwhile, a forward tunnel lets you access the Hermes web dashboard from your laptop as if it were running locally.

Part 1: Setting Up Venice.ai Cloud Models

Venice.ai offers several excellent models through their API. Here's how to add them to Hermes.

Step 1: Get Your API Key

Head to venice.ai and generate an API key from your settings. You'll need this for the config file.

Step 2: Configure Hermes Providers

Edit your Hermes config file (usually at ~/.hermes/config.yaml):

# ~/.hermes/config.yaml

# Default model for new conversations
default_model: venice-kimi

# Optional: Fallback if primary fails
# fallback_model:
#   provider: venice-glm-5
#   model: zai-org/glm-5

custom_providers:
  # Venice.ai models - all use the same base URL and key
  - name: venice-kimi
    base_url: https://api.venice.ai/api/v1
    api_key: ${VENICE_API_KEY}
    model: kimi-k2-5

  - name: venice-glm-5
    base_url: https://api.venice.ai/api/v1
    api_key: ${VENICE_API_KEY}
    model: zai-org/glm-5

  - name: venice-glm-flash
    base_url: https://api.venice.ai/api/v1
    api_key: ${VENICE_API_KEY}
    model: zai-org/glm-4.7-flash

  - name: venice-gemma
    base_url: https://api.venice.ai/api/v1
    api_key: ${VENICE_API_KEY}
    model: google/gemma-4-31b-it

๐Ÿ’ก Pro Tip: Use Environment Variables

Instead of hardcoding your API key, use ${VENICE_API_KEY} and set it in your shell:

export VENICE_API_KEY="your-key-here"

Add this to your ~/.bashrc or ~/.zshrc to make it persistent.

Step 3: Test Your Cloud Models

Restart Hermes and test in Telegram:

/model venice-glm-5

You should get a confirmation message. Ask something to verify it's working.

Part 2: Setting Up Ollama on Your Laptop

Now for the fun partโ€”getting local models accessible from your VPS.

Step 1: Install Ollama

On your laptop (not the VPS), install Ollama:

# macOS/Linux
curl -fsSL https://ollama.com/install.sh | sh

# Or download from https://ollama.com/download

Step 2: Pull Some Models

# Good general-purpose model (4-bit quantized, ~5GB)
ollama pull qwen2.5:7b

# Smaller, faster model for quick tasks (~2GB)
ollama pull gemma2:2b

# Larger model for complex tasks (~9GB)
ollama pull qwen2.5:14b

โš ๏ธ Ollama Must Listen on All Interfaces

By default, Ollama only listens on localhost. You need to configure it to accept external connections (which the SSH tunnel will use):

# macOS: Set environment variable
launchctl setenv OLLAMA_HOST "0.0.0.0:11434"

# Linux: Edit service or use:
OLLAMA_HOST=0.0.0.0:11434 ollama serve

# Or create a systemd override:
sudo systemctl edit ollama.service

Step 3: The SSH Reverse Tunnel

This is the magic that makes everything work. From your laptop, run:

ssh -N -R 11434:localhost:11434 user@your-vps-ip

What this does:

Leave this terminal open. The tunnel is now active.

๐Ÿ”’ Using a Different Port?

If port 11434 is taken on your VPS, use any available port:

ssh -N -R 11435:localhost:11434 user@your-vps-ip

Just remember to update the Hermes config accordingly.

Making the Tunnel Persistent (Optional)

Using autossh for automatic reconnection:

# Install autossh
sudo apt install autossh  # Debian/Ubuntu
brew install autossh      # macOS

# Run with auto-reconnect
autossh -N -R 11434:localhost:11434 user@your-vps-ip

Or add to your SSH config (~/.ssh/config on your laptop):

Host vps-ollama
    HostName your-vps-ip
    User your-username
    RemoteForward 11434 localhost:11434
    LocalForward 9119 localhost:9119
    ServerAliveInterval 60
    ServerAliveCountMax 3

# For both OpenClaw and Hermes dashboards
Host vps-both
    HostName your-vps-ip
    User your-username
    RemoteForward 11434 localhost:11434
    LocalForward 18789 localhost:18789
    LocalForward 9119 localhost:9119
    ServerAliveInterval 60
    ServerAliveCountMax 3

Then run ssh vps-ollama or ssh vps-both to establish the tunnels.

Step 4: Accessing the Hermes Dashboard (Forward Tunnel)

The reverse tunnel lets your VPS reach Ollama on your laptop. But what if you want to access Hermes' web dashboard from your laptop? For that, you need a forward tunnel (the -L flag).

Here's the combined command to access both the Hermes dashboard and keep the Ollama tunnel active:

# Forward tunnel for Hermes dashboard + reverse tunnel for Ollama
ssh -L 9119:localhost:9119 -R 11434:localhost:11434 user@your-vps-ip

What this does:

Once connected, open http://localhost:9119 on your laptop to access the Hermes web UI.

๐Ÿค Running Both OpenClaw and Hermes?

If you run both agents on the same VPS, you can forward both dashboards in one command:

# Both dashboards + Ollama tunnel
ssh -L 18789:localhost:18789 -L 9119:localhost:9119 -R 11434:localhost:11434 user@your-vps-ip

Then access:

  • OpenClaw dashboard: http://localhost:18789
  • Hermes dashboard: http://localhost:9119
  • Ollama API: Available to both agents via the reverse tunnel

Part 3: Configuring Hermes for Local Models

Now add your local Ollama instance to the Hermes config:

# ~/.hermes/config.yaml

custom_providers:
  # Venice.ai models (from Part 1)
  - name: venice-kimi
    base_url: https://api.venice.ai/api/v1
    api_key: ${VENICE_API_KEY}
    model: kimi-k2-5

  # ... other Venice providers ...

  # Local Ollama via SSH tunnel
  - name: ollama-qwen
    base_url: http://localhost:11434
    api_key: "ollama"  # Ollama doesn't require auth, but needs a placeholder
    model: qwen2.5:7b

  - name: ollama-gemma
    base_url: http://localhost:11434
    api_key: "ollama"
    model: gemma2:2b

  - name: ollama-local
    base_url: http://localhost:11434
    api_key: "ollama"
    model: llama3.2:latest  # Change to whatever you have loaded

๐ŸŽฏ Why localhost:11434?

Because the SSH tunnel makes your laptop's Ollama appear as if it's running on the VPS itself. Hermes just sees localhost:11434 and has no idea it's actually forwarding to your laptop across the internet.

Part 4: Switching Models on the Fly

The real power comes from being able to instantly switch. In any Telegram chat with Hermes:

/model venice-kimi          # Switch to cloud Kimi K2.5
/model ollama-qwen          # Switch to local Qwen 2.5
/model venice-glm-flash     # Switch to fast cloud model
/model ollama-gemma         # Switch to local lightweight model

Each chat can have its own active model. Your coding discussion can use the heavy cloud model while your casual conversation uses a local oneโ€”without affecting each other.

Per-Chat Model Persistence

Hermes remembers which model each chat is using. Set up specialized chats:

Use Cases and Recommendations

When to Use Cloud (Venice.ai)

When to Use Local (Ollama)

My Personal Workflow

Here's what works for me:

  1. Keep the SSH tunnel running in a tmux session on my laptop
  2. Default to ollama-qwen for most day-to-day chatting
  3. Switch to venice-kimi when I need serious coding help
  4. Use venice-glm-flash when I want speed over depth
  5. Any sensitive prompts automatically go to local models

Troubleshooting

"Connection refused" when using local models

The SSH tunnel isn't active or Ollama isn't listening on the right interface:

# Test from your VPS
curl http://localhost:11434/api/tags

# If this fails, check the tunnel is running
# Then check Ollama is listening on 0.0.0.0
netstat -tlnp | grep 11434  # Should show 0.0.0.0:11434

"Model not found" errors

Make sure you've pulled the model on your laptop:

ollama list  # See what's available
ollama pull qwen2.5:7b  # Pull if missing

Tunnel disconnects frequently

Use autossh or add keepalive settings:

ssh -N -o ServerAliveInterval=60 -o ServerAliveCountMax=3 \
    -R 11434:localhost:11434 user@your-vps-ip

Local models are too slow

You're probably CPU-bound. Check GPU usage:

# nVidia
nvidia-smi

# macOS (Apple Silicon)
# Activity Monitor โ†’ Window โ†’ GPU History

If Ollama isn't using your GPU, check the GPU support docs.

Security Considerations

๐Ÿ” Lock Down Your Setup

The SSH tunnel exposes Ollama to your VPS. Consider these precautions:

  • Ensure your VPS firewall only allows SSH (port 22) from trusted IPs
  • Use SSH key authentication, not passwords
  • Consider binding Ollama to localhost only and relying solely on the tunnel
  • Monitor your VPS with fail2ban or similar

The beauty of this approach is that even if someone compromised your VPS, they'd only see the forwarded portโ€”not your actual laptop or home network.

Final Thoughts

The hybrid cloud/local setup gives you something no single approach can: flexibility. You're not locked into per-token pricing for every query, but you're also not limited by your laptop's hardware when you need serious compute.

The SSH tunnel is the unsung hero here. It's simple, secure, and lets you treat your laptop as an extension of your VPS. Combined with Hermes' per-chat model switching, you get a truly personalized AI experience.

Start with one cloud provider and one local model. Once that works, expand your roster. The config is just YAMLโ€”easy to tweak, version control, and share.

๐Ÿ“š Related Reading