Go Big, Stay Simple: Building Production-Grade Infrastructure for a Tiny App

I wanted to build something genuinely useful while showcasing serious DevOps capabilities. The sweet spot turned out to be a TV show aggregator – simple enough to understand at a glance, appealing enough that people actually want to use it, yet perfect for demonstrating enterprise-grade infrastructure.

The app does exactly what you’d expect: shows you what’s trending across Netflix, Disney+, and other streaming platforms with a clean, responsive interface. But underneath runs a full production platform with GitOps workflows, comprehensive observability, enterprise-grade secrets management, and multi-environment deployment pipelines. The kind of infrastructure that could easily support a much larger application.

The Strategic Decision: Simple App, Complex Infrastructure

After exploring ideas for distributed microservices, ML pipelines, and IoT dashboards, I kept gravitating toward the TV show aggregator. The application logic is straightforward: fetch data from TMDB and TVMaze APIs, deduplicate and rank by popularity, display in a clean React interface. But that simplicity was actually strategic.

With a simple application as the foundation, I could dedicate my energy to building sophisticated infrastructure. The straightforward business logic meant that when something went wrong in the ArgoCD deployment pipeline, Vault integration, or Prometheus monitoring setup, I could immediately focus on the infrastructure challenge rather than debugging complex application interactions.

The Architecture: Two Stories

The application architecture is intentionally minimal:

React Frontend ←→ Node.js API ←→ PostgreSQL ←→ External APIs

The infrastructure architecture tells a completely different story:

GitHub → Actions → GCR → ArgoCD → GKE Autopilot
    ↓                        ↓
Terraform → GCP Resources   Vault → External Secrets → K8s Secrets
    ↓                        ↓
Monitoring ← Prometheus ← Applications → Grafana Dashboards

The Production Components

GitOps with ArgoCD: Every deployment flows through Git, not kubectl apply. I have separate development and production environments with automated dev deployments and manual production promotion. Changes are declarative, auditable, and reproducible.

Infrastructure as Code with Terraform: The entire GKE Autopilot cluster, static IPs, DNS records, and IAM policies are defined in code. I can destroy and rebuild the entire infrastructure from scratch in about 10 minutes.

Enterprise Secrets Management: HashiCorp Vault with External Secrets Operator ensures API keys and database credentials never touch Git or container images. The secrets are automatically rotated and synchronized into Kubernetes secrets.

Comprehensive Observability: Prometheus scrapes metrics from every component, and custom Grafana dashboards provide real-time business intelligence. I can see not just that the system is healthy, but which streaming platforms are most popular and how API performance trends over time.

Multi-Environment Promotion: Kustomize overlays handle environment-specific configurations. Development gets relaxed resource limits and debug logging; production gets proper resource constraints and structured logs. The promotion process requires explicit approval for every production change.

What I Built

The Foundation Layer

I started with Terraform to provision the GKE Autopilot cluster, VPC networking, Cloud DNS for automatic subdomain management, and static IP addresses for ingress. The infrastructure is cost-optimized but production-ready, using Autopilot’s managed node scaling and regional persistent disks for data durability.

The Security Layer

HashiCorp Vault runs as the central secrets engine with auto-unseal capabilities. The External Secrets Operator bridges Vault and Kubernetes, automatically synchronizing secrets without storing them in Git or container images. Every service uses the principle of least privilege with dedicated service accounts and RBAC policies.

The Application Layer

The backend is a Node.js Express API with TypeScript that aggregates data from multiple sources, handles rate limiting and caching, exposes custom Prometheus metrics, and provides a clean REST interface. The frontend is a responsive React SPA built with Vite that loads instantly, caches poster images efficiently, and provides filtering by streaming platform.

The PostgreSQL database uses persistent volumes for data durability, automated backups via Cloud SQL integration, and connection pooling for performance. Everything is containerized with multi-stage Docker builds optimized for security and size.

The Deployment Layer

GitHub Actions handles the CI/CD pipeline with automated testing, Docker image building and security scanning, semantic versioning with date-based tags, and automatic development deployments. ArgoCD manages GitOps workflows with separate applications for development and production, Kustomize overlays for environment-specific configuration, and manual promotion gates for production safety.

The Observability Layer

Prometheus collects comprehensive metrics including custom business metrics (platform popularity, API response times), infrastructure metrics (CPU, memory, disk), and application metrics (request counts, error rates). Grafana provides real-time dashboards with business intelligence showing streaming platform analytics, system performance monitoring, and deployment tracking.

The Magic: GitOps Workflows

The deployment process demonstrates production-grade practices:

Development Flow: Every successful build automatically deploys to the development environment within 5-10 minutes. I can immediately test changes at the dev URL and validate functionality before promotion.

Production Flow: Production deployments require explicit promotion using ./scripts/promote-to-prod.sh <version>. The script shows exactly what will change, asks for confirmation, then updates the production manifests. ArgoCD syncs the change with full auditability.

Version Management: I use date-based semantic tags like v20250819-abc123 that make it obvious when a version was created and provide a direct link back to the source code. Much easier for operations than trying to remember what v1.47 contained.

Technology Choices and Learning Goals

Why GKE Autopilot?

I chose GKE Autopilot over standard GKE because it provides managed node scaling, automatic security patching, optimized resource allocation, and simplified cluster management. For a portfolio project, I wanted to focus on application deployment and observability rather than cluster administration.

Why ArgoCD over Flux?

ArgoCD provides an excellent web UI for visualizing deployment status, comprehensive RBAC for team environments, proven stability in production environments, and extensive documentation. The visual interface makes it much easier to understand GitOps concepts and troubleshoot deployment issues.

Why Vault for Secrets?

HashiCorp Vault offers enterprise-grade secret rotation, comprehensive audit logging, fine-grained access policies, and integration with cloud IAM systems. While Kubernetes secrets would have been simpler, Vault demonstrates production-ready secrets management patterns.

Current Performance and Results

The system delivers impressive results for a “simple” TV aggregator:

Deployment Speed: Automated deployments complete in 3-5 minutes (85% faster than manual processes), with full testing and validation built in.

Reliability: 99.9%+ uptime with comprehensive monitoring, automated alerting, and quick rollback capabilities.

Security: Enterprise-grade secrets management, network policies, and container security scanning with automated vulnerability detection.

Observability: Real-time business intelligence showing platform popularity trends, API performance metrics, and system health indicators.

Lessons Learned

Infrastructure Complexity Offers Different Challenges Than Application Complexity

Building GitOps workflows, secrets management, and observability stacks involves fascinating operational challenges. The complexity comes from orchestrating component interactions, managing configurations across environments, and designing deployment workflows that scale from development to production.

Clear Mental Models Are Essential

When ArgoCD shows “OutOfSync” status, I need to immediately understand which Git commit should be deployed, what Kustomize transformations apply, and how external secrets affect pod startup. Complex infrastructure demands clear architectural diagrams and understanding of component dependencies.

Production Readiness Is a Mindset

Moving beyond “it works on my machine” to “it works reliably in production” requires comprehensive monitoring, proper secrets management, reproducible deployments, and systematic debugging approaches. Every architectural decision has operational implications.

What’s Next

The foundation is solid and ready for enhancement:

Performance Optimization: Add Redis caching layer for API responses, implement CDN for static assets, and optimize database queries for faster loading.

Advanced Monitoring: Create custom alerting rules based on business metrics, implement distributed tracing for request flows, and add cost optimization dashboards.

Security Enhancements: Implement network policies for pod-to-pod communication, add OAuth integration for user authentication, and enable audit logging for compliance.

From Learning Project to Portfolio Showcase

What started as a way to learn GitOps concepts became a comprehensive demonstration of production-ready DevOps practices. The project shows that I can build resilient infrastructure, make informed architectural decisions, integrate multiple technologies into cohesive workflows, and systematically debug complex systems.

More importantly, it demonstrates systems thinking – understanding how components interact across application, orchestration, infrastructure, and security layers. The simple application provided a perfect foundation for exploring sophisticated infrastructure patterns without getting lost in business logic complexity.

Building production-grade infrastructure for a TV show aggregator taught me more about DevOps engineering than building a complex application with simple deployment would have. The focused approach of applying advanced concepts to a straightforward application created the perfect learning environment.

The complete system is live at tv-hub.navillasa.dev with real-time monitoring at monitoring.navillasa.dev. Full source code and infrastructure definitions are available at github.com/navillasa/tv-dashboard-k8s.