Kubernetes StatefulSets for Databases: A Practical Weaviate Deployment Case Study

Kubernetes StatefulSets

At Madgical Techdom, we work extensively with Kubernetes to run production-grade systems, and Kubernetes StatefulSets play a critical role in managing stateful workloads. While stateless applications are straightforward to deploy using Deployments, databases are a completely different story.

Databases require:

  • Stable network identity
  • Persistent storage
  • Ordered scaling
  • Availability Zone (AZ) awareness

In this blog, we’ll walk through how we deployed Weaviate using Kubernetes StatefulSets and why they are the backbone of reliable stateful workloads.

We’ll also break down:

  • What PV and PVC really are
  • How EBS enforces AZ constraints
  • What happens when a pod crashes
  • How Kubernetes ensures it gets the same volume again

Let’s dive in.


Why Kubernetes StatefulSets Exist

In Kubernetes, a Deployment creates identical, replaceable pods. If one pod dies, another one is created — possibly on a different node, possibly with a new identity.

That works perfectly for:

  • APIs
  • Frontend services
  • Background workers

But not for databases.

Databases require specific characteristics that stateless workloads do not.

Why Databases Require These Capabilities

  1. Stable pod names (e.g., db-0, db-1)
    Distributed databases rely on stable network identities for communication between nodes. A predictable hostname ensures that replicas can consistently identify and connect to the same peer in the cluster.
  2. Persistent storage tied to each replica
    Each database replica stores its own data. If a pod restarts or moves to another node, it must reattach to the same storage volume to avoid data loss and maintain data consistency.
  3. Predictable startup order
    Many databases require an ordered startup to function correctly. For example, a primary node may need to start first before replicas can connect and synchronize their data.

This is where Kubernetes StatefulSets come in.

StatefulSets guarantee:

  • Stable identity for each pod
  • Persistent storage per pod
  • Ordered deployment and scaling
  • Controlled rolling updates

These capabilities make StatefulSets the preferred approach for running databases and other stateful workloads on Kubernetes.
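To make these guarantees concrete, here is a minimal sketch of a StatefulSet paired with a headless Service (the names, image, and sizes are illustrative, not a production manifest):

```yaml
# Headless Service: gives each pod a stable DNS name (db-0.db, db-1.db, ...)
apiVersion: v1
kind: Service
metadata:
  name: db
spec:
  clusterIP: None          # headless: per-pod DNS records instead of a load-balanced VIP
  selector:
    app: db
  ports:
    - port: 5432
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: db
spec:
  serviceName: db          # ties pod identities to the headless Service above
  replicas: 3
  selector:
    matchLabels:
      app: db
  template:
    metadata:
      labels:
        app: db
    spec:
      containers:
        - name: db
          image: postgres:16        # illustrative database image
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:             # one PVC per replica: data-db-0, data-db-1, ...
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi
```

With this manifest the pods are named db-0, db-1, db-2, each reachable at a stable DNS name such as db-0.db, and each permanently bound to its own PVC (data-db-0, and so on).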


Case Study: Deploying Weaviate as a StatefulSet on Kubernetes

For our case study, we deployed Weaviate — a vector database designed for AI-powered search and embeddings — using Kubernetes StatefulSets to ensure reliable storage and stable pod identity.

We used Kubernetes StatefulSets because they provide persistent storage, ordered deployment, and predictable recovery, which are essential for running databases in production environments.

We followed the official Kubernetes installation guide provided by the Weaviate team and used their Helm chart to deploy the database cluster.


Deployment Approach

The official Weaviate Kubernetes deployment uses Helm charts, which package all required Kubernetes resources, such as:

  • StatefulSets
  • Services
  • PersistentVolumeClaims
  • Configurations

Helm simplifies managing upgrades, scaling, and configuration changes for the database cluster.


Prerequisites

Before deploying Weaviate, the following prerequisites were required:

  • A running Kubernetes cluster (v1.23 or later)
  • kubectl configured to access the cluster
  • Helm v3 installed
  • A storage backend capable of dynamically provisioning Persistent Volumes through PVCs (for example, Amazon Elastic Block Store via the EBS CSI driver when running on Amazon Elastic Kubernetes Service)

These requirements ensure that Kubernetes can dynamically provision storage for each database replica.


StorageClass Requirement

When deploying stateful workloads in Kubernetes, the cluster must have a StorageClass capable of dynamically provisioning volumes.

During our deployment on Amazon Elastic Kubernetes Service (EKS), the rollout initially failed because the cluster did not have a default StorageClass configured.

As a result:

  • PersistentVolumeClaims remained in Pending state
  • Pods were unable to start

Example check:

kubectl get pvc -n weaviate

Output:

STATUS: Pending
STORAGECLASS: <unset>

After configuring a default StorageClass backed by AWS EBS, Kubernetes was able to dynamically provision the required volumes and the pods started successfully.

This highlights an important operational requirement when deploying stateful applications on Kubernetes.
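For reference, the sketch below shows what such a default StorageClass can look like with the AWS EBS CSI driver (the name gp3 and the reclaim policy are our illustrative choices). The WaitForFirstConsumer binding mode delays volume creation until the pod is scheduled, so the volume is provisioned in the pod's AZ:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"  # mark as the cluster default
provisioner: ebs.csi.aws.com          # AWS EBS CSI driver
parameters:
  type: gp3                           # EBS volume type
volumeBindingMode: WaitForFirstConsumer  # provision in the AZ where the pod lands
reclaimPolicy: Delete
```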


Step 1: Add the Weaviate Helm Repository

First, we added the official Helm repository that contains the Weaviate deployment chart.

helm repo add weaviate https://weaviate.github.io/weaviate-helm
helm repo update

Helm charts act as templates that generate Kubernetes resources needed to run Weaviate in the cluster.


Step 2: Create a Namespace

To isolate the deployment, we created a dedicated namespace.

kubectl create namespace weaviate

Namespaces help organize workloads and apply access control policies within a Kubernetes cluster.


Step 3: Configure the Helm Values

The Helm chart uses a values.yaml file to define configuration such as:

  • Number of replicas
  • Resource limits
  • Storage settings
  • Networking configuration

Example configuration for a multi-replica setup:

replicaCount: 3
resources:
  requests:
    cpu: "500m"
    memory: "1Gi"
  limits:
    cpu: "2"
    memory: "4Gi"

Setting replicaCount: 3 creates three Weaviate replicas, enabling horizontal scaling and higher availability.


Step 4: Deploy Weaviate

Once the configuration was ready, we deployed Weaviate using Helm:

helm upgrade --install \
  weaviate \
  weaviate/weaviate \
  --namespace weaviate \
  --values values.yaml

This command creates all the required Kubernetes resources including the StatefulSet that manages the database pods.


Stateful Architecture

Each replica in the deployment runs as a StatefulSet pod.

Example pod structure:

| Pod        | Storage        | Purpose                  |
| ---------- | -------------- | ------------------------ |
| weaviate-0 | pvc-weaviate-0 | Stores vector index data |
| weaviate-1 | pvc-weaviate-1 | Replica node             |
| weaviate-2 | pvc-weaviate-2 | Replica node             |

Each pod receives its own Persistent Volume Claim, ensuring data remains attached to the same replica even if the pod restarts.


Failure Recovery Test

To verify persistence behavior, we manually deleted one of the pods:

kubectl delete pod weaviate-0 -n weaviate

Kubernetes automatically recreated the pod.

Key observations:

  • The pod restarted with the same identity
  • The same PersistentVolumeClaim was reattached
  • No new storage was created

This demonstrates how StatefulSets guarantee storage persistence and deterministic pod identity, which are essential for databases like Weaviate.


Why StatefulSets Were Critical

Because Weaviate stores vector indexes and metadata on disk, losing or incorrectly attaching storage can corrupt the database.

StatefulSets ensure:

  • Stable pod identity
  • Persistent storage per replica
  • Ordered deployment and scaling
  • Reliable recovery after pod failure

This architecture allowed us to deploy Weaviate as a production-ready vector database cluster on Kubernetes.


Architecture Overview


Each Weaviate replica:

  • Runs as a StatefulSet pod
  • Gets its own Persistent Volume Claim (PVC)
  • Binds to its own EBS volume
  • Is scheduled in a specific Availability Zone

Example:

| Pod        | AZ          | EBS Volume |
| ---------- | ----------- | ---------- |
| weaviate-0 | ap-south-1a | vol-a      |
| weaviate-1 | ap-south-1b | vol-b      |
| weaviate-2 | ap-south-1c | vol-c      |

This ensures:

  • Data locality
  • No cross-AZ attachment issues
  • Predictable recovery behavior
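Spreading replicas across AZs like this can be expressed with Pod topology spread constraints in the pod template. Whether the Helm chart exposes this setting directly depends on the chart version, so treat this as an illustrative fragment (the label is an assumption):

```yaml
# Fragment of a pod template spec: spread matching pods evenly across AZs
topologySpreadConstraints:
  - maxSkew: 1                                 # at most 1 more pod in any one zone
    topologyKey: topology.kubernetes.io/zone   # spread over Availability Zones
    whenUnsatisfiable: DoNotSchedule           # leave the pod Pending rather than skew
    labelSelector:
      matchLabels:
        app: weaviate                          # illustrative label
```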

Persistent Storage Per Pod: Understanding PV & PVC

This is one of the most misunderstood areas in Kubernetes.

Persistent Volume (PV)

A PV is a cluster-level storage resource.

Think of it as: “Actual disk storage provisioned in the infrastructure.”

In our case, PVs are backed by AWS EBS volumes.


Persistent Volume Claim (PVC)

A PVC is a request for storage made by a pod.

The flow looks like this:

  1. Pod requests storage via PVC
  2. StorageClass provisions an EBS volume
  3. A PV is created
  4. PVC binds to that PV
  5. Pod mounts that PVC
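Step 1 of that flow is just a small manifest. A PVC sketch (name, size, and StorageClass are illustrative):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: demo-data
spec:
  accessModes:
    - ReadWriteOnce      # an EBS volume attaches to one node at a time
  storageClassName: gp3  # assumes a StorageClass named gp3 exists in the cluster
  resources:
    requests:
      storage: 10Gi
```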

How StatefulSets Handle This

StatefulSets use a field called volumeClaimTemplates.

This means: Each replica automatically gets its own PVC.

Example:

  • weaviate-0 → pvc-weaviate-0 → pv-0 → ebs-volume-0
  • weaviate-1 → pvc-weaviate-1 → pv-1 → ebs-volume-1

Each pod owns its own storage permanently.

Even if the pod dies, the PVC remains.

That is the key.


How AZ Enforcement Works with EBS

Now let’s talk about Availability Zones.

AWS EBS volumes are AZ-scoped.

An EBS volume created in:

  • ap-south-1a

Cannot be attached to a node in:

  • ap-south-1b

This is not a Kubernetes rule.

This is an AWS infrastructure rule.
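Kubernetes records this constraint on the PV itself as volume node affinity. An excerpt of a dynamically provisioned, EBS-backed PV looks roughly like this (the volume ID is hypothetical):

```yaml
# Excerpt of a PV created by the EBS CSI driver
# (inspect yours with: kubectl get pv <name> -o yaml)
spec:
  csi:
    driver: ebs.csi.aws.com
    volumeHandle: vol-0123456789abcdef0   # hypothetical EBS volume ID
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: topology.ebs.csi.aws.com/zone
              operator: In
              values:
                - ap-south-1a   # pods using this PV can only run on nodes in this AZ
```

The scheduler reads this node affinity and will only place the consuming pod on a node in the listed zone.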


Scenario: What Happens When a Pod Crashes?

Let’s say:

  • weaviate-1 is running in ap-south-1b
  • It is attached to EBS volume vol-b
  • The pod crashes

Step-by-Step Recovery

  1. Kubernetes detects the pod failure
  2. StatefulSet controller recreates pod weaviate-1
  3. It uses the same PVC (pvc-weaviate-1)
  4. That PVC is already bound to the same PV
  5. That PV references the same EBS volume (vol-b)

Now here’s the important part:

The EBS volume vol-b can only attach in ap-south-1b.

So the scheduler:

  • Looks for nodes in ap-south-1b
  • Schedules the pod there

If no node exists in that AZ?

The pod remains Pending.

Kubernetes will NOT move the volume.

It will NOT reassign storage.

It will wait until a node in the correct AZ becomes available.

This is how AZ enforcement works.

It is enforced by:

  • EBS topology constraints
  • Volume node affinity
  • Kubernetes scheduler logic

Not by magic.


Why This Matters in Production

Without StatefulSets:

  • Pods could move across AZs
  • Volumes would fail to attach
  • Recovery would break

With StatefulSets + EBS:

  • Each replica has guaranteed storage
  • Recovery is deterministic
  • AZ constraints are respected
  • Data remains safe

In our Weaviate deployment, this allowed:

  • Predictable failover
  • No manual reattachment
  • Zero data loss during pod restarts

Lessons from Deploying Weaviate

Here are our real-world takeaways:

1️⃣ Never run databases as Deployments

Use StatefulSets.

2️⃣ Always use a StorageClass with topology awareness

Ensure it provisions volumes in the correct AZ.

3️⃣ Understand crash recovery behavior

Test what happens when:

  • A pod crashes
  • A node crashes
  • An AZ loses capacity

4️⃣ Monitor PVC and PV states

Storage is your real source of truth.


When Should You Use Kubernetes StatefulSets?

Use StatefulSets when:

  • Running databases (Weaviate, PostgreSQL, MongoDB, etc.)
  • Running Kafka
  • Running Elasticsearch
  • Any workload needing a stable identity + storage

Do NOT use it for:

  • Stateless APIs
  • Frontend apps
  • Short-lived jobs

Final Thoughts

Kubernetes StatefulSets are not just another Kubernetes object.

They are the backbone of reliable stateful workloads.

Our Weaviate deployment proved that:

  • Stable pod identity
  • Dedicated persistent volumes
  • AZ-aware scheduling
  • Deterministic recovery

are absolutely critical for production-grade systems.

If you’re running databases on Kubernetes and still using Deployments, it’s time to rethink your architecture.

Stateful workloads demand StatefulSets.


Thank you for reading!

If this blog helped clarify how StatefulSets enforce storage consistency and AZ constraints, share your thoughts in the comments below or contact us.

See you in the next deep-dive.
