Choosing the right models for your OpenClaw agents
AI agents are only as good as the models powering them. With OpenClaw, you can mix and match local models (Ollama), cloud APIs (Venice, OpenAI, Anthropic), and everything in between. But which models should you use? When should you go local vs cloud? And how do you balance cost, privacy, and performance?
This guide cuts through the noise with practical recommendations based on real-world usage.
Unlike a simple chatbot, an AI agent needs to call tools and reason through complex, multi-step tasks.
Smaller models struggle with both, and larger models are expensive. The trick is matching the model to the task.
| Factor | Local (Ollama) | Cloud (Venice, etc.) |
|---|---|---|
| Cost | Free (after hardware) | Pay per token; $0.125–$30.00 per 1M tokens for the Venice models priced below |
| Privacy | 100% private — nothing leaves your machine | Data sent to API (varies by provider) |
| Speed | Depends on hardware (can be slow) | Fast (cloud GPUs) |
| Quality | Good for 7B–32B models; weaker at complex tasks | State-of-the-art (Claude, GPT-4, etc.) |
| Availability | Always on (no internet needed) | Requires internet connection |
| Hardware | Needs GPU or fast CPU + lots of RAM | No hardware requirements |
Use local models for routine tasks (chat, simple queries, formatting) and cloud models for complex work (research, coding, tool orchestration). OpenClaw makes switching seamless with aliases and fallbacks.
Ollama is the easiest way to run LLMs locally. Install it, pull a model, and you're running:
# Install (Linux/Mac)
curl -fsSL https://ollama.ai/install.sh | sh
# Pull a model
ollama pull qwen2.5:7b
# Run it
ollama run qwen2.5:7b
| Model Size | Minimum RAM | Recommended |
|---|---|---|
| 3B–7B | 8 GB | 16 GB + GPU |
| 13B–14B | 16 GB | 32 GB + GPU |
| 27B–32B | 32 GB | 64 GB + GPU (8GB+ VRAM) |
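As a rough cross-check on the table above, the memory a model needs can be estimated from its parameter count and quantization. The sketch below assumes 4-bit quantized weights (about 0.5 bytes per parameter, a common choice for local models) and a flat 20% allowance for context cache and runtime buffers. Both numbers are illustrative assumptions rather than Ollama specifications, and real usage varies with context length.

```python
def estimate_model_memory_gb(params_billions: float,
                             bits_per_weight: int = 4,
                             overhead_fraction: float = 0.20) -> float:
    """Rough memory estimate in GB for loading a quantized model.

    Assumptions (not taken from Ollama's docs): weights dominate memory
    use, and a flat overhead fraction covers the KV cache and buffers.
    """
    # 1 billion parameters at 8 bits per weight is roughly 1 GB.
    weight_gb = params_billions * bits_per_weight / 8
    return round(weight_gb * (1 + overhead_fraction), 1)


for size in (7, 14, 32):
    print(f"{size}B at 4-bit: ~{estimate_model_memory_gb(size)} GB")
```

Under these assumptions a 7B model needs roughly 4 GB and a 32B model roughly 19 GB, which lines up with the 8 GB and 32 GB minimums once the operating system and other processes are accounted for.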
Most VPS plans don't have GPUs, and running 32B models on CPU is slow (1–3 tokens/second). For production agents, either run on GPU-backed hardware or use a cloud model.
Venice.ai is a privacy-focused API provider with access to top-tier models — Claude, GPT-4, Grok, Qwen, and more — without the censorship of direct APIs.
Prices shown are input/output per 1M tokens. See Venice docs for current pricing.
| Model | Use Case | Input/Output (per 1M) |
|---|---|---|
| Claude Opus 4.6 | Complex reasoning, coding, research | $6.00 / $30.00 |
| Claude Sonnet 4.6 | Balanced — best value | $3.60 / $18.00 |
| Grok 4.1 Fast | Fast responses, general tasks | $0.23 / $0.57 |
| Qwen 3.6 Plus | Cost-effective, good quality | $0.625 / $3.75 |
| Kimi K2.5 | Long context (1M+ tokens) | $0.56 / $3.50 |
| GLM 5 | Strong reasoning, affordable | $1.00 / $3.20 |
| GLM 4.7 Flash | Fast, budget-friendly | $0.125 / $0.50 |
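To turn the per-token prices into a budget, multiply your expected input and output volume by the two rates. The sketch below is a minimal illustration using three rows copied from the table above. The monthly token volumes are made-up numbers for the example, and the dictionary keys are display labels for this sketch, not Venice model IDs.

```python
# (input $, output $) per 1M tokens, copied from the pricing table above.
PRICES_PER_MILLION = {
    "Claude Opus 4.6": (6.00, 30.00),
    "Claude Sonnet 4.6": (3.60, 18.00),
    "GLM 4.7 Flash": (0.125, 0.50),
}


def monthly_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in dollars for the given token volumes."""
    input_rate, output_rate = PRICES_PER_MILLION[model]
    cost = ((input_tokens / 1_000_000) * input_rate
            + (output_tokens / 1_000_000) * output_rate)
    return round(cost, 2)


# Hypothetical workload: 10M input tokens and 2M output tokens per month.
for name in PRICES_PER_MILLION:
    print(f"{name}: ${monthly_cost(name, 10_000_000, 2_000_000):.2f}")
```

For this hypothetical workload the same traffic costs $120.00 on Opus, $72.00 on Sonnet, and $2.25 on GLM 4.7 Flash, which is the gap that makes matching the model to the task worthwhile.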
# Set the high-tier model as the default
openclaw models set venice/zai-org-glm-5-1
# Create aliases for different quality levels
openclaw models aliases add local ollama/qwen2.5:7b
openclaw models aliases add high venice/zai-org-glm-5-1
openclaw models aliases add super venice/claude-opus-4-6
openclaw models aliases add grok venice/grok-41-fast
openclaw models aliases add med venice/kimi-k2-5
openclaw models aliases add low venice/zai-org-glm-4.7-flash
Here's how the most popular models stack up for AI agent workloads:
| Model | Reasoning | Tool Use | Speed | Cost | Best For |
|---|---|---|---|---|---|
| Claude Opus 4.6 | ★★★★★ | ★★★★★ | ★★★☆☆ | $$$$$ | Complex tasks, coding |
| Claude Sonnet 4.6 | ★★★★☆ | ★★★★★ | ★★★★☆ | $$ | General purpose |
| Grok 4.1 Fast | ★★★☆☆ | ★★★★☆ | ★★★★★ | $ | Fast responses |
| Qwen 3.6 Plus | ★★★☆☆ | ★★★☆☆ | ★★★★☆ | $ | Budget agent work |
| GLM 5 | ★★★★☆ | ★★★★☆ | ★★★★☆ | $$ | Balanced reasoning |
| Kimi K2.5 | ★★★★☆ | ★★★☆☆ | ★★★☆☆ | $$ | Long context |
| Qwen 2.5 7B (local) | ★★☆☆☆ | ★★☆☆☆ | ★★★★☆ | Free | Simple tasks |
| Qwen 2.5 32B (local) | ★★★☆☆ | ★★★☆☆ | ★★☆☆☆ | Free | Local + capable |
Use local models or cheap cloud models:
openclaw models set ollama/qwen2.5:7b
# or
openclaw models set venice/grok-41-fast
Examples: "What's the weather?", "Format this JSON", "Summarize this email"
Use Sonnet or GLM:
openclaw models set venice/claude-4.6-sonnet
Examples: "Research the best laptops for coding", "Draft a reply to this email"
Use Opus:
openclaw models set venice/claude-4.6-opus
Examples: "Debug this code and push a fix", "Plan a trip: search flights, book hotel, add to calendar"
Stay local:
openclaw models set ollama/qwen2.5:32b
Examples: Processing personal documents, financial data, medical info
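The task-based guidance above amounts to a small lookup from task category to model. The sketch below is one hypothetical way to express it in Python. The category names and the `pick_model` helper are inventions for this example rather than part of OpenClaw, and the model IDs are the ones used in the commands above.

```python
# Task category -> model ID, following the recommendations above.
# The category names are hypothetical labels chosen for this sketch.
MODEL_BY_TASK = {
    "simple": "ollama/qwen2.5:7b",
    "medium": "venice/claude-4.6-sonnet",
    "complex": "venice/claude-4.6-opus",
    "private": "ollama/qwen2.5:32b",
}


def pick_model(task_category: str, contains_sensitive_data: bool = False) -> str:
    """Choose a model ID; sensitive data always stays on the local model."""
    if contains_sensitive_data:
        return MODEL_BY_TASK["private"]
    # Unknown categories fall back to the balanced mid-tier choice.
    return MODEL_BY_TASK.get(task_category, MODEL_BY_TASK["medium"])


print(pick_model("simple"))
print(pick_model("complex"))
print(pick_model("complex", contains_sensitive_data=True))
```

Checking the privacy flag first means a complex task over medical or financial data still runs locally, trading some quality for keeping the data on your machine.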
OpenClaw supports automatic fallbacks — if your primary model fails (rate limit, timeout, error), it tries the next one:
# Primary: Claude Sonnet (best balance)
openclaw models set venice/claude-4.6-sonnet
# Fallback 1: Cheaper cloud option
openclaw models fallbacks add venice/qwen-3-6-plus
# Fallback 2: Local (always available)
openclaw models fallbacks add ollama/qwen2.5:7b
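Conceptually, a fallback chain is a loop that tries each model in order and moves on when a call fails. The sketch below illustrates that idea only. It is not OpenClaw's implementation, and `call_model` is a hypothetical stand-in that simulates a failing primary so the example runs without network access.

```python
from typing import Callable, List

FALLBACK_CHAIN = [
    "venice/claude-4.6-sonnet",  # primary
    "venice/qwen-3-6-plus",      # fallback 1: cheaper cloud option
    "ollama/qwen2.5:7b",         # fallback 2: local
]


def call_model(model: str, prompt: str) -> str:
    """Hypothetical stand-in for a real API call; the primary always fails here."""
    if model == "venice/claude-4.6-sonnet":
        raise TimeoutError("simulated rate limit or timeout")
    return f"[{model}] reply to: {prompt}"


def complete_with_fallbacks(prompt: str,
                            chain: List[str],
                            call: Callable[[str, str], str]) -> str:
    """Try each model in order and return the first successful reply."""
    errors = []
    for model in chain:
        try:
            return call(model, prompt)
        except Exception as exc:  # a real client would catch narrower errors
            errors.append(f"{model}: {exc}")
    raise RuntimeError("all models failed: " + "; ".join(errors))


print(complete_with_fallbacks("Summarize this email", FALLBACK_CHAIN, call_model))
```

Because the simulated primary fails, the reply comes from the first fallback. Putting the local model last means the chain only raises when every option, including the one on your own hardware, is unavailable.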
You can switch models mid-conversation without changing the global default:
/model high # Switch to the high-quality tier
/model med # Balanced (medium quality)
/model low # Focus on speed/cost
/model local # Go back to local model
/model default # Reset to global default
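One way to picture what the `/model` command does is as a lookup from alias to full model ID, with `default` mapping back to the global setting. The sketch below is a hypothetical illustration, not OpenClaw's code. The `resolve_model_command` helper is invented for this example, and the alias table mirrors the `openclaw models aliases add` commands earlier in this guide.

```python
# Alias -> model ID, mirroring the alias commands earlier in this guide.
ALIASES = {
    "local": "ollama/qwen2.5:7b",
    "high": "venice/zai-org-glm-5-1",
    "super": "venice/claude-opus-4-6",
    "grok": "venice/grok-41-fast",
    "med": "venice/kimi-k2-5",
    "low": "venice/zai-org-glm-4.7-flash",
}
GLOBAL_DEFAULT = "venice/zai-org-glm-5-1"


def resolve_model_command(command: str) -> str:
    """Resolve a '/model <alias>' chat command to a full model ID."""
    parts = command.split()
    if len(parts) != 2 or parts[0] != "/model":
        raise ValueError(f"expected '/model <alias>', got: {command!r}")
    alias = parts[1]
    if alias == "default":
        return GLOBAL_DEFAULT  # reset to the global default
    if alias not in ALIASES:
        raise ValueError(f"unknown alias: {alias}")
    return ALIASES[alias]


print(resolve_model_command("/model med"))
print(resolve_model_command("/model default"))
```

Keeping aliases in one table means you can repoint `high` or `low` at a different model later without changing the commands you type in chat.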
With OpenClaw's multi-agent swarms, you can assign different models to different agents:
# Research Agent
- Name: research
- Model: venice/claude-4.6-opus
- Instructions: Deep research, always cite sources
# Personal Assistant
- Name: pa
- Model: venice/claude-4.6-sonnet
- Instructions: Handle emails, calendar, daily tasks
# Quick Chat Agent
- Name: chat
- Model: ollama/qwen2.5:7b
- Instructions: Casual conversation, simple queries
# Code Agent
- Name: coder
- Model: venice/claude-4.6-opus
- Skills: github, shell, file_editor
- Instructions: Write and review code, handle git operations
Now your research and code agents use Opus for quality, your chat agent uses a local model for speed, and your PA uses Sonnet for balance.
Start with one model (Sonnet). Add complexity only when you hit real limitations. Most agents work fine with a single well-chosen model.
Share your model choices and why they work for you in the comments below!