Experimenting with Kubernetes Chaos Engineering

Ever felt that jolt of panic when something breaks in production? It’s a feeling most of us know all too well. Instead of just reacting to these incidents, FlowFactor, one of our DevOps competence centres, decided to take a proactive approach. They dove into the world of chaos engineering within a Kubernetes environment to learn how to build more resilient systems.

A quick word on Kubernetes (you probably already know)

If you work in tech, you likely already know how important Kubernetes (or K8s) is. Quite simply, it’s the go-to platform for managing containerized applications. Born from Google’s own internal container management system, it’s now an open-source standard for orchestrating applications in the cloud and beyond. For a deeper dive into the specifics of Kubernetes, Google’s blog post is an excellent introduction.

Chaos engineering – sounds bad, actually good

In essence, chaos engineering is about intentionally introducing controlled failures into your systems. Think of it as stress-testing your infrastructure to find and fix weaknesses before they cause real problems. It’s a proactive approach to building resilience, and it aligns well with the robust and scalable nature of Google’s cloud technologies.

In a Kubernetes environment, this is particularly relevant because of the dynamic nature of containerized applications. Kubernetes is designed to manage interconnected systems where containers are constantly being created, destroyed, and scaled.

Put differently, chaos engineering checks whether the system can withstand the unexpected events that will inevitably occur in production. It helps ensure that the very features that make Kubernetes so powerful, such as automatic rescheduling and scaling, don’t become sources of fragility.

FlowFactor’s hands-on chaos experiment

Of course, FlowFactor didn’t jump straight into causing massive disruptions in production systems. They started with a “mini chaos engineering test” by setting up a demo online boutique application. This allowed them to safely experiment without impacting any live systems.

They then used a tool called Chaos Mesh, which is specifically designed for use with Kubernetes. Their approach was methodical: first, they established a baseline of “normal” system behavior, documenting key metrics and how the application should behave.
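To make the baseline step a bit more concrete, here is a minimal Python sketch of a steady-state check. It is not FlowFactor’s actual tooling: the two metrics (95th-percentile latency and error rate), the names `Baseline`, `summarize` and `within_steady_state`, and the tolerance values are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from statistics import quantiles


@dataclass(frozen=True)
class Baseline:
    """The two metrics tracked in this sketch (an assumed, minimal set)."""
    p95_latency_ms: float
    error_rate: float


def summarize(latencies_ms: list[float], errors: int, total: int) -> Baseline:
    """Reduce raw observations to a p95 latency and an error rate."""
    # quantiles(n=20) returns 19 cut points; the last one is the 95th percentile.
    p95 = quantiles(latencies_ms, n=20)[-1]
    return Baseline(p95_latency_ms=p95, error_rate=errors / total)


def within_steady_state(
    baseline: Baseline,
    observed: Baseline,
    latency_factor: float = 1.5,            # assumed tolerance: p95 may grow by 50%
    max_error_rate_increase: float = 0.01,  # assumed tolerance: +1 percentage point
) -> bool:
    """Return True if the observed metrics stay inside the tolerances."""
    latency_ok = observed.p95_latency_ms <= baseline.p95_latency_ms * latency_factor
    errors_ok = observed.error_rate <= baseline.error_rate + max_error_rate_increase
    return latency_ok and errors_ok
```

The idea is to call `summarize` once on measurements taken before any fault is injected, once more while the experiment runs, and then compare the two with `within_steady_state`. Writing the tolerances down before the experiment starts keeps the result from being interpreted after the fact.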

Then, they introduced controlled failures, like simulating the sudden termination of pods (the smallest deployable units in Kubernetes) or creating temporary service outages. These smaller-scale experiments allowed them to observe how their Kubernetes environment, built on the foundation of Google’s innovative technology, responded to unexpected stresses.
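As an illustration of what a pod-termination experiment can look like, the sketch below builds a Chaos Mesh `PodChaos` resource in Python and prints it as JSON. The resource name, the `boutique` namespace and the `app: cartservice` label are hypothetical placeholders, not values taken from FlowFactor’s setup; adjust them to whatever your demo application actually uses.

```python
import json


def pod_kill_manifest(name: str, namespace: str, app_label: str) -> dict:
    """Build a Chaos Mesh PodChaos resource that kills one matching pod."""
    return {
        "apiVersion": "chaos-mesh.org/v1alpha1",
        "kind": "PodChaos",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Terminate the pod; its Deployment should then recreate it.
            "action": "pod-kill",
            # Pick a single pod among those matching the selector.
            "mode": "one",
            "selector": {
                "namespaces": [namespace],
                "labelSelectors": {"app": app_label},
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical names, chosen only for this example.
    manifest = pod_kill_manifest("kill-one-cartservice", "boutique", "cartservice")
    print(json.dumps(manifest, indent=2))
```

Because `kubectl` accepts JSON as well as YAML, the output can be piped straight into `kubectl apply -f -` on a cluster where Chaos Mesh is installed. Generating the manifest from a function makes it easy to run the same experiment against different services, one at a time.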

Ready to learn more?

Want to understand exactly how FlowFactor conducted their chaos engineering experiment and what they learned? We encourage you to read their detailed blog post, “Getting started with Kubernetes chaos engineering: Our mini chaos engineering test”.

By the way, this is just one example of the cutting-edge expertise within the GC innovate network. Ready to take the next step and explore how Google Cloud and innovative DevOps practices can benefit your organization? Contact GC innovate today. We can connect you with experts like FlowFactor and guide you on your journey to building more robust, reliable, and scalable systems.

Competence Center:

Date: 08/01/25

Length: 5 min

Tags:

App Modernization

Blogs


