Hybrid AI Setup: Venice Cloud + Ollama Local with SSH Tunnels

You've got Hermes running on your VPS, answering messages on Telegram, managing your tasks, and generally being helpful. But here's the thing: you're paying for every API call to the cloud, and sometimes you just want to run a quick local model for sensitive data or offline work.

What if you could have both? Cloud models when you need power, local models when you want privacy or cost savings—and the ability to switch between them instantly?

⚡ The Short Version

This guide shows you how to configure Hermes with:

Venice.ai cloud models — Kimi K2.5, ZAI GLM series, Gemma variants
Ollama local models — Running on your laptop via SSH reverse tunnel
Hermes web dashboard — Access via SSH forward tunnel at localhost:9119
Seamless switching — Use /model in Telegram to change on the fly

The secret sauce? SSH tunnels that bridge your VPS and laptop in both directions.

Why Mix Cloud and Local?

Each approach has trade-offs:

Consideration	Cloud (Venice.ai)	Local (Ollama)
Cost	Per-token pricing	Free after hardware cost
Speed	Fast, GPU-backed	Depends on your GPU
Privacy	Data leaves your system	100% local, zero telemetry
Availability	Requires internet	Works offline
Model variety	Access to cutting-edge	Limited by your VRAM

The hybrid approach gives you the best of both worlds. Use cloud models for complex reasoning, coding, and when you need the latest capabilities. Drop down to local models for routine tasks, sensitive data, or when you just want zero latency.

The Architecture

Here's how everything connects:

📱 Telegram

→

🖥️ VPS (Hermes)

↙

☁️ Venice.ai API
Cloud Models

↘

💻 SSH Tunnel
Port Forward

↓

🖥️ Your Laptop
Ollama API

The key insight: Hermes on your VPS can't directly access your laptop's Ollama instance because your laptop is behind NAT/firewall. The SSH reverse tunnel fixes this by forwarding a port from your VPS back to your laptop. Meanwhile, a forward tunnel lets you access the Hermes web dashboard from your laptop as if it were running locally.

Part 1: Setting Up Venice.ai Cloud Models

Venice.ai offers several excellent models through their API. Here's how to add them to Hermes.

Step 1: Get Your API Key

Head to venice.ai and generate an API key from your settings. You'll need this for the config file.

Step 2: Configure Hermes Providers

Edit your Hermes config file (usually at ~/.hermes/config.yaml):

# ~/.hermes/config.yaml

# Default model for new conversations
default_model: venice-kimi

# Optional: Fallback if primary fails
# fallback_model:
#   provider: venice-glm-5
#   model: zai-org/glm-5

custom_providers:
  # Venice.ai models - all use the same base URL and key
  - name: venice-kimi
    base_url: https://api.venice.ai/api/v1
    api_key: ${VENICE_API_KEY}
    model: kimi-k2-5

  - name: venice-glm-5
    base_url: https://api.venice.ai/api/v1
    api_key: ${VENICE_API_KEY}
    model: zai-org/glm-5

  - name: venice-glm-flash
    base_url: https://api.venice.ai/api/v1
    api_key: ${VENICE_API_KEY}
    model: zai-org/glm-4.7-flash

  - name: venice-gemma
    base_url: https://api.venice.ai/api/v1
    api_key: ${VENICE_API_KEY}
    model: google/gemma-4-31b-it

💡 Pro Tip: Use Environment Variables

Instead of hardcoding your API key, use ${VENICE_API_KEY} and set it in your shell:

export VENICE_API_KEY="your-key-here"

Add this to your ~/.bashrc or ~/.zshrc to make it persistent.

Step 3: Test Your Cloud Models

Restart Hermes and test in Telegram:

/model venice-glm-5

You should get a confirmation message. Ask something to verify it's working.

Part 2: Setting Up Ollama on Your Laptop

Now for the fun part—getting local models accessible from your VPS.

Step 1: Install Ollama

On your laptop (not the VPS), install Ollama:

# macOS/Linux
curl -fsSL https://ollama.com/install.sh | sh

# Or download from https://ollama.com/download

Step 2: Pull Some Models

# Good general-purpose model (4-bit quantized, ~5GB)
ollama pull qwen2.5:7b

# Smaller, faster model for quick tasks (~2GB)
ollama pull gemma2:2b

# Larger model for complex tasks (~9GB)
ollama pull qwen2.5:14b

⚠️ Ollama Must Listen on All Interfaces

By default, Ollama only listens on localhost. You need to configure it to accept external connections (which the SSH tunnel will use):

# macOS: Set environment variable
launchctl setenv OLLAMA_HOST "0.0.0.0:11434"

# Linux: Edit service or use:
OLLAMA_HOST=0.0.0.0:11434 ollama serve

# Or create a systemd override:
sudo systemctl edit ollama.service

Step 3: The SSH Reverse Tunnel

This is the magic that makes everything work. From your laptop, run:

ssh -N -R 11434:localhost:11434 user@your-vps-ip

What this does:

-N — Don't execute remote commands (tunnel only)
-R 11434:localhost:11434 — Forward remote port 11434 (VPS) to local port 11434 (laptop)
user@your-vps-ip — Your VPS SSH credentials

Leave this terminal open. The tunnel is now active.

🔒 Using a Different Port?

If port 11434 is taken on your VPS, use any available port:

ssh -N -R 11435:localhost:11434 user@your-vps-ip

Just remember to update the Hermes config accordingly.

Making the Tunnel Persistent (Optional)

Using autossh for automatic reconnection:

# Install autossh
sudo apt install autossh  # Debian/Ubuntu
brew install autossh      # macOS

# Run with auto-reconnect
autossh -N -R 11434:localhost:11434 user@your-vps-ip

Or add to your SSH config (~/.ssh/config on your laptop):

Host vps-ollama
    HostName your-vps-ip
    User your-username
    RemoteForward 11434 localhost:11434
    LocalForward 9119 localhost:9119
    ServerAliveInterval 60
    ServerAliveCountMax 3

# For both OpenClaw and Hermes dashboards
Host vps-both
    HostName your-vps-ip
    User your-username
    RemoteForward 11434 localhost:11434
    LocalForward 18789 localhost:18789
    LocalForward 9119 localhost:9119
    ServerAliveInterval 60
    ServerAliveCountMax 3

Then run ssh vps-ollama or ssh vps-both to establish the tunnels.

Step 4: Accessing the Hermes Dashboard (Forward Tunnel)

The reverse tunnel lets your VPS reach Ollama on your laptop. But what if you want to access Hermes' web dashboard from your laptop? For that, you need a forward tunnel (the -L flag).

Here's the combined command to access both the Hermes dashboard and keep the Ollama tunnel active:

# Forward tunnel for Hermes dashboard + reverse tunnel for Ollama
ssh -L 9119:localhost:9119 -R 11434:localhost:11434 user@your-vps-ip

What this does:

-L 9119:localhost:9119 — Forwards your laptop's port 9119 to the VPS's port 9119 (Hermes dashboard)
-R 11434:localhost:11434 — Forwards VPS port 11434 to your laptop's Ollama (as before)

Once connected, open http://localhost:9119 on your laptop to access the Hermes web UI.

🤝 Running Both OpenClaw and Hermes?

If you run both agents on the same VPS, you can forward both dashboards in one command:

# Both dashboards + Ollama tunnel
ssh -L 18789:localhost:18789 -L 9119:localhost:9119 -R 11434:localhost:11434 user@your-vps-ip

Then access:

OpenClaw dashboard: http://localhost:18789
Hermes dashboard: http://localhost:9119
Ollama API: Available to both agents via the reverse tunnel

Part 3: Configuring Hermes for Local Models

Now add your local Ollama instance to the Hermes config:

# ~/.hermes/config.yaml

custom_providers:
  # Venice.ai models (from Part 1)
  - name: venice-kimi
    base_url: https://api.venice.ai/api/v1
    api_key: ${VENICE_API_KEY}
    model: kimi-k2-5

  # ... other Venice providers ...

  # Local Ollama via SSH tunnel
  - name: ollama-qwen
    base_url: http://localhost:11434
    api_key: "ollama"  # Ollama doesn't require auth, but needs a placeholder
    model: qwen2.5:7b

  - name: ollama-gemma
    base_url: http://localhost:11434
    api_key: "ollama"
    model: gemma2:2b

  - name: ollama-local
    base_url: http://localhost:11434
    api_key: "ollama"
    model: llama3.2:latest  # Change to whatever you have loaded

🎯 Why localhost:11434?

Because the SSH tunnel makes your laptop's Ollama appear as if it's running on the VPS itself. Hermes just sees localhost:11434 and has no idea it's actually forwarding to your laptop across the internet.

Part 4: Switching Models on the Fly

The real power comes from being able to instantly switch. In any Telegram chat with Hermes:

/model venice-kimi          # Switch to cloud Kimi K2.5
/model ollama-qwen          # Switch to local Qwen 2.5
/model venice-glm-flash     # Switch to fast cloud model
/model ollama-gemma         # Switch to local lightweight model

Each chat can have its own active model. Your coding discussion can use the heavy cloud model while your casual conversation uses a local one—without affecting each other.

Per-Chat Model Persistence

Hermes remembers which model each chat is using. Set up specialized chats:

Coding Channel → /model venice-kimi (best reasoning)
Quick Questions → /model ollama-gemma (fast, free)
Sensitive Data → /model ollama-local (100% private)

Use Cases and Recommendations

When to Use Cloud (Venice.ai)

Complex coding tasks — Cloud models have better reasoning and larger context windows
Long documents — Kimi K2.5 handles 256K tokens, local models are limited by your VRAM
Tool use — Cloud models are better at function calling and multi-step workflows
When you're away from your laptop — Tunnel needs to be active for local models

When to Use Local (Ollama)

Quick lookups — "What's the syntax for..." type questions
Private data — Financial info, personal documents, proprietary code
Offline work — On a plane, spotty internet, or data-capped connection
Cost control — Running 100+ small queries daily adds up in the cloud

My Personal Workflow

Here's what works for me:

Keep the SSH tunnel running in a tmux session on my laptop
Default to ollama-qwen for most day-to-day chatting
Switch to venice-kimi when I need serious coding help
Use venice-glm-flash when I want speed over depth
Any sensitive prompts automatically go to local models

Troubleshooting

"Connection refused" when using local models

The SSH tunnel isn't active or Ollama isn't listening on the right interface:

# Test from your VPS
curl http://localhost:11434/api/tags

# If this fails, check the tunnel is running
# Then check Ollama is listening on 0.0.0.0
netstat -tlnp | grep 11434  # Should show 0.0.0.0:11434

"Model not found" errors

Make sure you've pulled the model on your laptop:

ollama list  # See what's available
ollama pull qwen2.5:7b  # Pull if missing

Tunnel disconnects frequently

Use autossh or add keepalive settings:

ssh -N -o ServerAliveInterval=60 -o ServerAliveCountMax=3 \
    -R 11434:localhost:11434 user@your-vps-ip

Local models are too slow

You're probably CPU-bound. Check GPU usage:

# nVidia
nvidia-smi

# macOS (Apple Silicon)
# Activity Monitor → Window → GPU History

If Ollama isn't using your GPU, check the GPU support docs.

Security Considerations

🔐 Lock Down Your Setup

The SSH tunnel exposes Ollama to your VPS. Consider these precautions:

Ensure your VPS firewall only allows SSH (port 22) from trusted IPs
Use SSH key authentication, not passwords
Consider binding Ollama to localhost only and relying solely on the tunnel
Monitor your VPS with fail2ban or similar

The beauty of this approach is that even if someone compromised your VPS, they'd only see the forwarded port—not your actual laptop or home network.

Final Thoughts

The hybrid cloud/local setup gives you something no single approach can: flexibility. You're not locked into per-token pricing for every query, but you're also not limited by your laptop's hardware when you need serious compute.

The SSH tunnel is the unsung hero here. It's simple, secure, and lets you treat your laptop as an extension of your VPS. Combined with Hermes' per-chat model switching, you get a truly personalized AI experience.

Start with one cloud provider and one local model. Once that works, expand your roster. The config is just YAML—easy to tweak, version control, and share.

📚 Related Reading

Meet Hermes: Your AI Agent for Multi-Platform Collaboration — Introduction to Hermes
How to Run OpenClaw and Hermes Side-by-Side — VPS setup guide
Venice.ai — Privacy-focused AI inference platform
Ollama — Run LLMs locally