Monitoring Your AI Stack with Grafana and Prometheus
Set up comprehensive monitoring for your self-hosted AI infrastructure using Grafana dashboards and Prometheus metrics collection.
Running an AI stack without monitoring means flying blind. You need visibility into GPU utilization, inference latency, memory pressure, disk usage, and service health, especially when several resource-intensive services share the same hardware. Grafana (visualization and alerting) and Prometheus (metrics collection and storage) are the de facto open-source standard for this kind of monitoring.
Setting Up the Monitoring Stack
better-openclaw's DevOps preset bundles Grafana, Prometheus, and pre-configured scrape targets for every service in your stack. Either choose the DevOps preset or list the services explicitly: npx create-better-openclaw --services grafana,prometheus,your-services --yes. Grafana is then reachable at your configured domain with its data sources auto-provisioned.
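The generated scrape configuration will differ by stack, but a minimal prometheus.yml along these lines illustrates the idea; the job names, targets, and ports below are assumptions for illustration, not the preset's actual output:

```yaml
# Hypothetical scrape config; job names, targets, and ports are illustrative.
global:
  scrape_interval: 15s        # how often Prometheus pulls metrics

scrape_configs:
  - job_name: node            # host-level CPU/memory/disk via node_exporter
    static_configs:
      - targets: ["node-exporter:9100"]
  - job_name: cadvisor        # per-container resource usage
    static_configs:
      - targets: ["cadvisor:8080"]
  - job_name: llm-server      # assumed /metrics endpoint on your inference service
    static_configs:
      - targets: ["llm-server:8000"]
```

Each job pulls from an exporter's /metrics endpoint on its own schedule, so adding a service to the stack usually just means adding another job here.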
Key Metrics to Track
For AI-specific monitoring, track: LLM inference latency (p50, p95, p99), tokens per second, GPU memory utilization, vector database query latency, embedding generation throughput, and queue depth for async workflows. For infrastructure, monitor CPU, memory, disk I/O, and network traffic per container.
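As a sketch of what querying those metrics looks like, a few PromQL expressions; the metric names assume a Prometheus histogram called llm_inference_latency_seconds, a counter llm_generated_tokens_total, and NVIDIA's DCGM exporter for GPU memory, so substitute whatever your exporters actually expose:

```promql
# p95 inference latency over 5-minute windows (assumes a Prometheus histogram)
histogram_quantile(0.95, sum(rate(llm_inference_latency_seconds_bucket[5m])) by (le))

# tokens per second (assumes a monotonically increasing counter)
rate(llm_generated_tokens_total[1m])

# fraction of GPU framebuffer memory in use (DCGM exporter metric names)
DCGM_FI_DEV_FB_USED / (DCGM_FI_DEV_FB_USED + DCGM_FI_DEV_FB_FREE)
```

Swapping 0.95 for 0.5 or 0.99 in the first query gives the other percentiles, and the same expressions can be reused verbatim as Grafana panel queries.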
Pre-Built Dashboards
better-openclaw generates Grafana dashboard JSON files tailored to your selected services. The default dashboard includes panels for system overview, per-container resource usage, and service-specific metrics. Import community dashboards from Grafana's library for deep dives into specific services like PostgreSQL or Redis.
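The generated dashboard JSON is service-specific, but a single panel entry has roughly this shape; this is an illustrative fragment, not the tool's exact output, and the datasource uid and metric name are assumptions:

```json
{
  "title": "LLM p95 latency",
  "type": "timeseries",
  "datasource": { "type": "prometheus", "uid": "prometheus" },
  "targets": [
    {
      "expr": "histogram_quantile(0.95, sum(rate(llm_inference_latency_seconds_bucket[5m])) by (le))",
      "legendFormat": "p95"
    }
  ],
  "fieldConfig": { "defaults": { "unit": "s" } }
}
```

Because dashboards are plain JSON, they can be versioned alongside the rest of your stack configuration and re-imported or provisioned on a fresh install.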
Alerting
Configure Grafana alerts for critical conditions: disk usage above 85%, memory pressure on LLM containers, service health check failures, and high error rates. Pair with Gotify or ntfy (both available in better-openclaw) for push notifications to your phone when something needs attention.
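For example, the disk-usage threshold could be expressed as a Prometheus alerting rule like the sketch below; the node_filesystem_* metrics come from node_exporter, and the label filters and timings are assumptions about a typical setup:

```yaml
groups:
  - name: capacity
    rules:
      - alert: DiskUsageHigh
        # fires when any real filesystem has been more than 85% full for 10 minutes
        expr: |
          (1 - node_filesystem_avail_bytes{fstype!~"tmpfs|overlay"}
             / node_filesystem_size_bytes{fstype!~"tmpfs|overlay"}) > 0.85
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Disk usage above 85% on {{ $labels.instance }}"
```

The for: 10m clause keeps short spikes from paging you; route the firing alert to Gotify or ntfy via a webhook contact point to get it on your phone.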