Kubernetes Everywhere: Lessons Learned From Going Multi-Cloud - Niko Smeds, Grafana Labs

CNCF [Cloud Native Computing Foundation]


Why Opt for Multi-Cloud?

  • Increased Regional Coverage:

    • Access to unique locations not available on a single provider.
    • Grafana's synthetic monitoring product needed diverse deployment locations.
  • Avoid Vendor Lock-In:

    • Flexibility to shift workloads based on cost, performance, or stability.
  • Customer Preferences:

    • Latency, data sovereignty, and vendor discounts influence cloud selection.

Grafana's Cloud Expansion Project

  • Expansion from GCP to AWS:

    • Majority of services were on GCP with some presence on other providers.
    • Established foundational AWS resources: an AWS Organization, VPCs, IAM policies, and Kubernetes clusters.
  • Networking Setup:

    • Connected clusters across providers using managed VPNs.
    • Kept internal communication between clusters on private IP ranges.
  • Managed Kubernetes for Efficiency:

    • Easier to maintain, leveraging provider expertise.
    • Installed essential workloads (e.g., Prometheus, Grafana, Flux) before product deployment.

Key Lessons Learned

  • Cloud Providers Are Similar But Not the Same:

    • Services vary between providers; adapting configurations is necessary.
      • Examples:
        • GCP VPCs are global resources, whereas AWS VPCs are regional.
        • GCP allows larger subnets (up to a /8), while an AWS VPC CIDR block can be no larger than a /16.
        • Object storage rate limits and managed load balancers differ.
  • Expect Iterative Planning:

    • Initial plans are likely to fail; iteration is critical.
    • Use infrastructure as code (e.g., Terraform) and version control for:
      • Peer reviews, live project tracking, and historical documentation.
  • Prepare for Documentation Overload:

    • Multi-cloud projects involve extensive documentation ("documentation hell").
    • Practical learning occurs during implementation, not just planning.
  • Plan for Unexpected Issues:

    • Bugs and unforeseen challenges are inevitable.
    • Flexibility and quick iterations are key to progress.
  • Tailored Approaches Per Provider:

    • Each provider's specifics (e.g., networking, IP allocation) impact the implementation.
    • Refactor plans to avoid resource conflicts (e.g., overlapping private IP ranges).
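
The IP planning points above (provider-specific block size limits and overlapping private ranges) can be checked mechanically before any infrastructure is created. The following is a minimal sketch in Python, not the tooling used in the talk. The network names, the plan layout, and the `check_plan` helper are hypothetical, and the size limits are taken from the notes above rather than looked up per provider.

```python
import ipaddress
from itertools import combinations

# Largest block each provider accepts, as stated in the notes above
# (a smaller prefix length means a larger network). Illustrative values.
MAX_BLOCK_PREFIX = {"gcp": 8, "aws": 16}


def check_plan(plan):
    """Return a list of human-readable problems found in an IP plan.

    `plan` maps a hypothetical network name to a (provider, CIDR) pair.
    """
    problems = []
    networks = {}
    for name, (provider, cidr) in plan.items():
        net = ipaddress.ip_network(cidr)
        networks[name] = net
        # Internal traffic is expected to stay on private ranges.
        if not net.is_private:
            problems.append(f"{name}: {cidr} is not a private range")
        # Reject blocks larger than the provider allows.
        if net.prefixlen < MAX_BLOCK_PREFIX[provider]:
            problems.append(f"{name}: /{net.prefixlen} is too large for {provider}")
    # Any two networks joined over VPN must not share addresses.
    for (a, net_a), (b, net_b) in combinations(networks.items(), 2):
        if net_a.overlaps(net_b):
            problems.append(f"{a} overlaps {b}")
    return problems


if __name__ == "__main__":
    first_draft = {
        "gcp-main": ("gcp", "10.0.0.0/9"),
        "aws-main": ("aws", "10.64.0.0/12"),
    }
    for problem in check_plan(first_draft):
        print(problem)
```

Running a check like this in code review fits the iterative approach described above: a first draft that reuses part of the GCP range on AWS is flagged, and the plan is refactored before the VPN is connected.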

Additional Recommendations

  • Start Small: Begin with proof of concept (POC) clusters to test configurations.
  • Focus on Dependencies: Avoid inter-provider dependencies to minimize cascading failures.
  • Understand Scale: Define expected cluster sizes and capacity requirements upfront.
  • Leverage Team Collaboration: Utilize tools like Git for shared learning and troubleshooting.
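
As a worked example of the "Understand Scale" point, the arithmetic below estimates how large an address block a cluster needs. It is a rough sketch under stated assumptions: one IP per pod, 110 pods per node (the Kubernetes default maximum), and no allowance for provider-reserved addresses or the allocation model of a particular CNI. The function names are hypothetical.

```python
def pod_ips_needed(max_nodes, pods_per_node=110):
    """Upper bound on pod IPs, assuming one IP per pod on every node."""
    return max_nodes * pods_per_node


def smallest_prefix_for(addresses):
    """Return the IPv4 prefix length of the smallest block holding `addresses`."""
    if addresses < 1:
        raise ValueError("addresses must be at least 1")
    # Number of host bits needed to count `addresses` distinct values.
    host_bits = (addresses - 1).bit_length()
    if host_bits > 32:
        raise ValueError("does not fit in IPv4")
    return 32 - host_bits


if __name__ == "__main__":
    for nodes in (100, 500, 1000):
        ips = pod_ips_needed(nodes)
        print(f"{nodes} nodes -> {ips} pod IPs -> /{smallest_prefix_for(ips)}")
```

Under these assumptions, 500 nodes fit in a /16, while 1000 nodes need a /15, which is larger than the /16 limit noted above for AWS. Estimating this upfront shows early whether a single block will be enough on each provider.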

By recognizing and addressing these nuances, organizations can build robust multi-cloud infrastructures while minimizing disruption and inefficiencies.


© 2025 2023 Sanjeeb KC. All rights reserved.