Run Local AI Models
Run OpenClaw (formerly Moltbot) completely free with local AI models. Use Ollama, LM Studio, or any OpenAI-compatible server.
- Completely Free - No API costs, no subscription fees, forever
- Total Privacy - All data stays on your machine and never leaves it
- Works Offline - Use OpenClaw without internet after setup
- No Rate Limits - Send as many messages as your hardware allows
- Model Choice - Run any open model, switch anytime
Local AI Runners
- Ollama - Default port: 11434
- LM Studio - Default port: 1234
- llama.cpp server - Direct llama.cpp HTTP server; maximum control and performance. https://github.com/ggerganov/llama.cpp - Default port: 8080
- Any other OpenAI-compatible server - Commonly on port 8080 or 8000
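Whichever runner you pick, OpenClaw only needs its HTTP port to be reachable. A quick way to check which runners are up (a generic sketch, not part of OpenClaw) is to probe the default ports:

```python
import socket

def runner_reachable(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if something is listening on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Probe the default ports of the runners listed above
for name, port in [("Ollama", 11434), ("LM Studio", 1234), ("llama.cpp", 8080)]:
    print(f"{name}: {'up' if runner_reachable('localhost', port) else 'down'}")
```

If a runner shows as down, start it (or adjust the port) before pointing OpenClaw at it.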
Recommended Models
| Model | ID | RAM | Quality | Speed |
|---|---|---|---|---|
| Llama 3.2 3B | `llama3.2:3b` | 4 GB | Good for simple tasks | Very Fast |
| Llama 3.1 8B (Recommended) | `llama3.1:8b` | 8 GB | Best balance | Fast |
| Mistral 7B | `mistral:7b` | 8 GB | Great for coding | Fast |
| Qwen2.5 14B | `qwen2.5:14b` | 16 GB | Strong reasoning | Medium |
| Llama 3.1 70B | `llama3.1:70b` | 48 GB | Near GPT-4 | Slow |
Quick Setup with Ollama
Install Ollama
```shell
curl -fsSL https://ollama.ai/install.sh | sh
```

For the Windows/Mac GUI, download from ollama.ai.
Download a Model
```shell
ollama pull llama3.1:8b
```

Configure OpenClaw

```json
{
  "agent": {
    "provider": "ollama",
    "model": "llama3.1:8b",
    "baseUrl": "http://localhost:11434"
  }
}
```

LM Studio provides an OpenAI-compatible API:
```json
{
  "agent": {
    "provider": "openai-compatible",
    "model": "local-model",
    "baseUrl": "http://localhost:1234/v1",
    "apiKey": "not-needed"
  }
}
```

Start LM Studio, load a model, and enable the server under Settings > Local Server.
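Because the server speaks the standard OpenAI chat-completions protocol, any HTTP client can talk to it directly. A minimal Python sketch (the `chat` helper is illustrative, not an OpenClaw or LM Studio API) posting to the endpoint configured above:

```python
import json
import urllib.request

def chat(base_url: str, model: str, prompt: str, api_key: str = "not-needed") -> str:
    """Send one user message to an OpenAI-compatible server and return the reply."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    req = urllib.request.Request(
        f"{base_url}/chat/completions",  # e.g. http://localhost:1234/v1/chat/completions
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # local servers usually ignore the key
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

For example, `chat("http://localhost:1234/v1", "local-model", "Hello")` returns the model's reply once LM Studio's server is running.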
Configure multiple local models with automatic failover:
```json
{
  "agent": {
    "provider": "ollama",
    "model": "llama3.1:70b",
    "baseUrl": "http://localhost:11434",
    "fallbackModels": [
      "llama3.1:8b",
      "mistral:7b"
    ],
    "fallbackOnError": true,
    "fallbackOnTimeout": true,
    "timeout": 30000
  }
}
```

- `fallbackModels` - List of backup models to try
- `fallbackOnError` - Switch on model errors
- `fallbackOnTimeout` - Switch if the model is too slow
- `timeout` - Timeout in milliseconds before failover
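The failover behavior amounts to a simple loop over the configured models. This Python sketch is illustrative, not OpenClaw's actual implementation; `ask` stands in for a request to the runner and is expected to raise on error or timeout:

```python
def ask_with_failover(models, ask):
    """Try each model in order; return (model, reply) from the first that succeeds.

    models: primary model followed by its fallbacks, e.g.
            ["llama3.1:70b", "llama3.1:8b", "mistral:7b"]
    ask:    callable ask(model) -> reply; raises on error or timeout
            (a timeout would normally come from the HTTP client).
    """
    errors = []
    for model in models:
        try:
            return model, ask(model)
        except Exception as err:  # fallbackOnError / fallbackOnTimeout
            errors.append((model, err))
    raise RuntimeError(f"all models failed: {errors}")
```

With the config above, a 70B model that times out would be retried transparently on the 8B model, then on Mistral.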
Use local models primarily, fall back to cloud for complex tasks:
```json
{
  "agent": {
    "provider": "ollama",
    "model": "llama3.1:8b",
    "baseUrl": "http://localhost:11434",
    "cloudFallback": {
      "provider": "anthropic",
      "model": "claude-3-5-sonnet-20241022",
      "apiKey": "sk-ant-...",
      "triggerOn": ["complex_reasoning", "long_context"]
    }
  }
}
```

Stay mostly free with local AI, using cloud only when needed.
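Conceptually, the `cloudFallback` block routes each request like this (a hypothetical sketch; the trigger tags mirror the `triggerOn` values above):

```python
def route(task_tags,
          local=("ollama", "llama3.1:8b"),
          cloud=("anthropic", "claude-3-5-sonnet-20241022"),
          triggers=("complex_reasoning", "long_context")):
    """Return (provider, model): local by default, cloud if any trigger matches."""
    if any(tag in triggers for tag in task_tags):
        return cloud
    return local
```

Everyday chat stays on the local model; only tasks tagged as needing deep reasoning or a long context incur cloud API costs.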
| Model Size | RAM Needed | Best For |
|---|---|---|
| 3B models | 4 GB | Basic tasks, older hardware |
| 7-8B models | 8 GB | Most users, good balance |
| 13-14B models | 16 GB | Better quality, modern laptops |
| 70B models | 48+ GB | Near cloud quality, high-end |
Apple Silicon Macs are particularly efficient due to unified memory.
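The RAM figures above follow a common rule of thumb: a 4-bit quantized model needs roughly half a byte per weight, plus overhead for the KV cache and runtime, with the table adding extra headroom for the OS. A rough estimate in Python (a heuristic, not an exact formula):

```python
def est_ram_gb(params_billion: float, bits: int = 4, overhead: float = 1.2) -> float:
    """Rough model memory estimate in GB.

    params_billion: parameter count in billions (e.g. 8 for an 8B model)
    bits:           quantization width (4-bit is typical for local use)
    overhead:       ~20% extra for KV cache and runtime buffers
    """
    return round(params_billion * bits / 8 * overhead, 1)
```

For example, `est_ram_gb(8)` gives about 4.8 GB for an 8B model at 4-bit, which fits comfortably in the table's 8 GB recommendation once the OS is accounted for.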
For detailed local model configuration, see the OpenClaw Local Models Documentation.