How to Build a Private RAG Pipeline with Qdrant and SearXNG
Step-by-step guide to building a Retrieval-Augmented Generation pipeline that keeps all data on your infrastructure using Qdrant, SearXNG, and Ollama.
Retrieval-Augmented Generation (RAG) supercharges LLM responses by grounding them in your specific data. Instead of relying on the model's training data alone, RAG retrieves relevant documents before generating an answer. Building this pipeline privately means your documents, queries, and embeddings never leave your network.
Architecture Overview
A private RAG pipeline consists of four components: a document ingestion layer, an embedding model, a vector database, and an LLM for generation. SearXNG handles web research and document fetching. Qdrant stores embeddings with metadata for fast similarity search. Ollama runs both the embedding model and the generation model locally.
Setting Up the Stack
Use better-openclaw's Research Agent preset: npx create-better-openclaw --preset researcher --yes. This gives you Qdrant, SearXNG, Browserless (for web scraping), and Redis (for caching) — all pre-configured. Add Ollama manually or use the AI Playground preset for the complete package.
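If you add Ollama by hand, one extra Compose service is enough. This sketch is an assumption about your setup rather than part of the preset: it uses the official ollama/ollama image, Ollama's default API port, and a named volume so pulled models survive restarts.

```yaml
services:
  ollama:
    image: ollama/ollama
    ports:
      - "11434:11434"        # Ollama's default API port
    volumes:
      - ollama_models:/root/.ollama   # persist downloaded models

volumes:
  ollama_models:
```

Once the container is up, pull the models the pipeline needs, e.g. docker compose exec ollama ollama pull nomic-embed-text (and whichever generation model you plan to use).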
Document Ingestion
For document ingestion, chunk your documents into 512–1024 token segments with a 50-token overlap. Generate embeddings using Ollama's nomic-embed-text model, then store them in Qdrant along with metadata (source, date, title). Use n8n to automate this pipeline: watch a folder for new PDFs, extract text, chunk, embed, and store.
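The ingestion step can be sketched in Python against the two services' REST APIs. This is a minimal sketch, not the n8n workflow itself: the collection name is illustrative, the URLs are the Qdrant and Ollama defaults, and the chunker splits on words as a rough stand-in for token-based chunking.

```python
import uuid
import requests

OLLAMA_URL = "http://localhost:11434"   # Ollama's default API port
QDRANT_URL = "http://localhost:6333"    # Qdrant's default REST port
COLLECTION = "documents"                # illustrative collection name

def chunk_text(text: str, chunk_size: int = 800, overlap: int = 50) -> list[str]:
    """Split text into overlapping chunks. Word-based for simplicity;
    a token-based splitter would be used in practice."""
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break
    return chunks

def embed(text: str) -> list[float]:
    """Fetch an embedding from Ollama's nomic-embed-text model."""
    resp = requests.post(f"{OLLAMA_URL}/api/embeddings",
                         json={"model": "nomic-embed-text", "prompt": text})
    resp.raise_for_status()
    return resp.json()["embedding"]

def ingest(text: str, source: str, title: str, date: str) -> None:
    # Create the collection; nomic-embed-text produces 768-dim vectors.
    # (If the collection already exists, Qdrant rejects this call; fine for a sketch.)
    requests.put(
        f"{QDRANT_URL}/collections/{COLLECTION}",
        json={"vectors": {"size": 768, "distance": "Cosine"}},
    )
    # One point per chunk, with the metadata stored as payload.
    points = [
        {
            "id": str(uuid.uuid4()),
            "vector": embed(chunk),
            "payload": {"text": chunk, "source": source,
                        "title": title, "date": date},
        }
        for chunk in chunk_text(text)
    ]
    requests.put(
        f"{QDRANT_URL}/collections/{COLLECTION}/points",
        params={"wait": "true"},
        json={"points": points},
    ).raise_for_status()
```

In an n8n workflow, the folder-watch trigger and PDF text extraction would feed into a function node doing essentially what ingest does here.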
Query Pipeline
When a user asks a question: (1) embed the query with the same model, (2) search Qdrant for the top-5 most similar chunks, (3) construct a prompt with the retrieved context, and (4) send it to your LLM. The result is a grounded, factual response based on your specific documents — with zero data leaving your infrastructure.
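The four steps above can be sketched as a single function. As before, the URLs are the services' local defaults; the collection name must match ingestion, and the generation model name (llama3 here) is an assumption you should swap for whatever you pulled into Ollama.

```python
import requests

OLLAMA_URL = "http://localhost:11434"
QDRANT_URL = "http://localhost:6333"
COLLECTION = "documents"   # must match the ingestion collection

def embed(text: str) -> list[float]:
    """Step 1: embed the query with the same model used at ingestion."""
    resp = requests.post(f"{OLLAMA_URL}/api/embeddings",
                         json={"model": "nomic-embed-text", "prompt": text})
    resp.raise_for_status()
    return resp.json()["embedding"]

def build_prompt(question: str, chunks: list[str]) -> str:
    """Step 3: pack the retrieved chunks into a grounded prompt."""
    context = "\n\n".join(chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )

def ask(question: str) -> str:
    # Step 2: top-5 similarity search in Qdrant, payloads included.
    resp = requests.post(
        f"{QDRANT_URL}/collections/{COLLECTION}/points/search",
        json={"vector": embed(question), "limit": 5, "with_payload": True},
    )
    resp.raise_for_status()
    chunks = [hit["payload"]["text"] for hit in resp.json()["result"]]
    # Step 4: generate locally with Ollama; model name is an assumption.
    gen = requests.post(
        f"{OLLAMA_URL}/api/generate",
        json={"model": "llama3",
              "prompt": build_prompt(question, chunks),
              "stream": False},
    )
    gen.raise_for_status()
    return gen.json()["response"]
```

Everything here talks to localhost, so the question, its embedding, the retrieved chunks, and the generated answer all stay on your machine.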