guatulabs/dev

Technical walkthroughs, open-source projects, and real-world guides on AI agents, homelab infrastructure, Kubernetes, and IIoT.

GPU PCI Address Instability: When Your Card Moves Between Reboots

Dealing with shifting PCI addresses in Proxmox and how to stop your GPU from disappearing or changing IDs after a reboot.

Cognitive Memory for Agents: Vector Search vs Activation-Based Recall

Comparing vector databases and activation-based memory for AI agents. Trade-offs in latency, scale, and interpretability.

AdGuard Home: Network-Wide DNS Filtering with Failover

Setting up AdGuard Home for network-wide DNS filtering with a robust failover strategy to prevent total internet outages.

Unprivileged LXC + Docker: The runc Sysctl Permission Trap

Fixing the silent failure of sysctl settings when running Docker inside unprivileged Proxmox LXC containers.

Three-Layer Safety for Autonomous Agents: Stopping the Infinite Loop

Moving beyond prompt engineering to implement token-level schema enforcement, pre-execution gates, and shell-safe execution pipelines for AI agents.

Stop Merging Broken YAML: Kubernetes Manifest Validation in CI

Don't let invalid manifests break your GitOps pipeline. Learn how to use kubeconform and Kyverno exclusions to catch errors before they hit production.

GPU D3cold Power States: How to Brick Your Card Without Trying

Bricking GPUs with D3cold: Real-world gotchas and fixes for Proxmox users

cert-manager + Cloudflare DNS-01: Automated TLS for Everything

Automating TLS with cert-manager and Cloudflare DNS-01 in Kubernetes

SealedSecrets Key Backup: Don't Lose Your Encryption Keys

How to back up and recover SealedSecrets encryption keys in Kubernetes

Ollama on Kubernetes: Recreate Strategy and Single-GPU Deadlock

Deploying Ollama on Kubernetes can lead to GPU deadlocks. Here's how to avoid them.

MQTT Broker Selection: HiveMQ vs Mosquitto for Industrial Use

Comparing HiveMQ and Mosquitto for industrial IoT: scalability, security, and reliability

Wildcard DNS + ndots:5: The TLS Nightmare and How to Fix It

Kubernetes default DNS settings can cause TLS certificate mismatches when using wildcard DNS. Here is how to debug and fix it.

Self-Improving AI Infrastructure: How Your Homelab Wiki Updates Itself

How to automate your homelab wiki with self-improving AI infrastructure

The 6-Layer Memory Architecture I Run for Claude Code

Open-sourcing the memory system behind my Claude Code setup: CLAUDE.md, path-scoped rules, wiki, vector search, cognitive memory. With the mistakes.

Building Karpathy's LLM Wiki: A Production Homelab Implementation

Implementing Karpathy's LLM Wiki in a homelab with real-world lessons and gotchas

Proxmox API Tokens: Bash History Expansion and the ! Character

Bash history expansion breaks Proxmox API tokens — here's how to fix it

AMD iGPU Stealing Your RAM: UMA Frame Buffer on Headless Servers

An AMD iGPU reserves system RAM for its UMA frame buffer even on headless servers. Here's how to reclaim it.

Agent Credential Management: Two-Tier Service Accounts for Secure AI Agent Workflows

Managing agent credentials with two-tier service accounts: a secure approach for AI agent orchestration

Infrastructure as Code, but Automated: OpenTofu and GitHub Actions

Stop manual applies. Learn how to build a production-ready CI/CD pipeline for your infrastructure using OpenTofu and GitHub Actions.

Equipment Health Scoring: How One Number Made My Operators Stop Checking the Dashboard

Turn raw sensor data into a single, actionable health score operators actually use

Pod Disruption Budgets: Why kubectl drain Gets Stuck on Longhorn

Pod Disruption Budgets can block kubectl drain on Longhorn. Here's how to avoid it.

Attention Residuals: How Kimi Is Rethinking Transformer Depth

Kimi's Attention Residuals replace fixed residual connections with learned layer aggregation. What it means for LLM depth.

Helm fullnameOverride: Naming Sanity in ArgoCD

Avoid naming chaos in ArgoCD by using Helm fullnameOverride effectively

NVIDIA Container Toolkit: Why the Default Runtime Matters

Fixing default runtime misconfigurations in NVIDIA Container Toolkit for GPU workloads

DOCP/XMP: Why Your Proxmox Node Runs at Half RAM Speed

DOCP/XMP enabled but Proxmox still runs at half RAM speed? Check your kernel and BIOS.

AMD Ryzen C-State Freezes: How `processor.max_cstate=1` Saved My Proxmox Node

Ryzen freezes in Proxmox? Learn how to disable deep C-states and stop random system lockups.

Kubernetes Storage on Bare Metal: Longhorn in Practice

How I configured Longhorn storage for a Kubernetes cluster on bare metal — what worked, what didn't, and what I'd do differently next time.

GPU Passthrough on Proxmox: A Field Guide to the Gotchas That Bit Me

The documentation won't warn you about D3cold bricking, PCIe bus renumbering, or why the NVIDIA device plugin silently fails. This is that guide.

Building MCP Servers with FastMCP: Stop Writing Boilerplate, Start Writing Tools

FastMCP makes building Model Context Protocol servers feel like FastAPI. Here's how to go from zero to a working MCP server in under an hour.

GitOps for Homelabs: How ArgoCD App-of-Apps Scales Your Cluster

How the ArgoCD app-of-apps pattern brings real GitOps discipline to homelab Kubernetes — repo structure, examples, and what I'd do differently.

Multi-Agent AI Systems: Architecture Patterns That Actually Work

A practical guide to designing multi-agent AI systems — orchestrator patterns, trust boundaries, and the tradeoffs I learned running agents in production.

Building a Production Homelab: Multi-Node Proxmox Cluster with Kubernetes

How I built a multi-node Proxmox cluster running Kubernetes with GPU passthrough, GitOps, and dozens of services — and what broke along the way.