Running Ollama Locally: A Step-by-Step Guide
A complete walkthrough for installing and running Ollama on your local machine or homelab, including model selection, GPU configuration, and integration with other tools.
Ollama makes running large language models locally as easy as running a Docker container. With support for dozens of models — from lightweight Phi-3 to powerful Llama 3.3 — you can have a private, cost-free AI inference server running in minutes.
Quick Setup with better-openclaw
The fastest way to get Ollama running with a complete supporting stack is through better-openclaw. Run npx create-better-openclaw --preset ai-playground --yes and you'll get Ollama, Open WebUI (a ChatGPT-like interface), Qdrant (for RAG), and LiteLLM (for multi-provider routing) — all pre-configured and connected.
Model Selection
For general-purpose tasks, llama3.1:8b offers the best balance of quality and speed on consumer hardware (note that Llama 3.3 is published only at 70B; the 8B size comes from the Llama 3.1 family). For coding, codellama:13b or deepseek-coder:6.7b are excellent choices. If you have 24+ GB of VRAM, llama3.3:70b with 4-bit quantization provides near-GPT-4 quality. Pull any model with ollama pull followed by its tag, for example ollama pull llama3.1:8b.
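The recommendations above can be summed up as a small rule of thumb. The helper below is purely illustrative — the function name and VRAM thresholds are assumptions, not anything Ollama provides — but it captures the mapping from hardware budget to model tag:

```python
def pick_model(vram_gb: float, task: str = "general") -> str:
    """Illustrative mapping from VRAM budget to the models discussed above.

    Thresholds are rough assumptions: quantization level and context
    length also affect real memory usage.
    """
    if task == "coding":
        # codellama:13b wants roughly 10+ GB at 4-bit; otherwise use the smaller coder
        return "codellama:13b" if vram_gb >= 10 else "deepseek-coder:6.7b"
    if vram_gb >= 24:
        return "llama3.3:70b"  # Ollama serves this 4-bit quantized by default
    return "llama3.1:8b"

print(pick_model(8))             # llama3.1:8b
print(pick_model(24))            # llama3.3:70b
print(pick_model(12, "coding"))  # codellama:13b
```

Whatever the helper suggests, the actual download is always the same one-liner: ollama pull plus the tag it returned.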
GPU Acceleration
Ollama automatically detects NVIDIA GPUs with CUDA support. For AMD GPUs, ROCm support is available on Linux. Without a GPU, Ollama falls back to CPU inference, which works but is significantly slower. The better-openclaw-generated Docker config automatically passes through GPU devices when detected.
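In Compose terms, NVIDIA passthrough for the Ollama container typically looks like the snippet below. This is a sketch of the standard Docker Compose device-reservation syntax, not the exact file better-openclaw generates:

```yaml
services:
  ollama:
    image: ollama/ollama
    ports:
      - "11434:11434"          # default Ollama API port
    volumes:
      - ollama:/root/.ollama   # persist pulled models across restarts
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all       # expose every detected GPU
              capabilities: [gpu]
volumes:
  ollama:
```

This requires the NVIDIA Container Toolkit on the host; without it, remove the deploy block and the container simply runs on CPU.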
Integration Tips
Once Ollama is running, you can connect it to n8n workflows for automated AI processing, use it as a backend for LibreChat or Open WebUI, or integrate it with your RAG pipeline via LiteLLM. The Ollama API is OpenAI-compatible, so any tool that works with the OpenAI API can be pointed at your local Ollama instance.
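Because the API is OpenAI-compatible, any OpenAI client can target Ollama just by changing the base URL. The sketch below builds the request body by hand using only the standard library; the model tag and prompt are placeholders:

```python
import json
import urllib.request

# Ollama's OpenAI-compatible endpoint on the default port
OLLAMA_URL = "http://localhost:11434/v1/chat/completions"

def build_chat_request(model: str, prompt: str) -> dict:
    """Construct an OpenAI-style chat completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }

body = build_chat_request("llama3.1:8b", "Summarize RAG in one sentence.")
print(json.dumps(body, indent=2))

# Actually sending it requires a running Ollama instance:
# req = urllib.request.Request(
#     OLLAMA_URL,
#     data=json.dumps(body).encode(),
#     headers={"Content-Type": "application/json"},
# )
# print(urllib.request.urlopen(req).read().decode())
```

The same payload shape works through LiteLLM or the official openai Python client; those just add retries, routing, and typed responses on top.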