Nakama on GKE: A Complete Kubernetes Setup Guide

At our company, we specialize in designing scalable backend solutions for modern digital platforms, including multiplayer games and real-time applications. One of our recent projects involved deploying Nakama, a powerful open-source game server by Heroic Labs, within a Kubernetes environment on Google Kubernetes Engine (GKE).

Our client wanted a stable, low-latency infrastructure for their real-time multiplayer features. However, they were facing WebSocket connection drops, inconsistent deployment configurations, and lacked observability into server behavior.

This guide is aimed at DevOps engineers familiar with Kubernetes and game developers who are new to GKE, providing a step-by-step, production-ready approach for hosting Nakama. Beginners will also get a brief introduction to Nakama, CockroachDB, and essential Kubernetes concepts.

Problem Statement

Our client wanted to use Nakama to power real-time multiplayer features for their game. However, they faced several challenges:

WebSocket Timeout Issues: WebSocket connections were getting terminated after 30 seconds due to idle timeouts.
Non-Production-Ready Setup: The initial deployment was done manually with basic settings, lacking scalability and observability.
No Load Balancer Usage: Due to security and cost constraints, the client wanted to run Nakama behind a ClusterIP service instead of a public LoadBalancer.
Persistent Data Management: Ensuring stable connectivity with CockroachDB, used by Nakama as its primary database.
Monitoring and Debugging: Debug logs were needed to trace and resolve issues, but logging was not set up properly.

Our Solution

To address the issues, we created a production-ready Kubernetes deployment of Nakama that included:

Custom timeout configurations for WebSocket stability
A clean YAML-based deployment using Kubernetes best practices
A BackendConfig (for GKE Ingress, if needed) and ClusterIP-based service
Prometheus metrics exposure for observability
Database migrations: We run nakama migrate up inside the container before starting the server process:

/nakama/nakama migrate up --database.address root@cockroachdb:26257 && exec /nakama/nakama ...

Since Nakama’s migrations are idempotent, this ensures that every time a pod starts, the schema is updated automatically without conflicts. This eliminates manual intervention while still keeping the database schema consistent.

By tuning Nakama’s runtime flags and Kubernetes networking setup, we eliminated idle timeouts and ensured smooth multiplayer interaction through WebSockets.

Implementation

All YAML files used in this deployment are available in our GitHub repository. Below are key snippets and explanations.

Nakama Deployment (Deployment YAML)

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nakamadeploy
  annotations:
    beta.cloud.google.com/backend-config: '{"default": "nakama-backendconfig"}'
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nakama
  template:
    metadata:
      labels:
        app: nakama
    spec:
      containers:
        - name: nakama
          image: registry.heroiclabs.com/heroiclabs/nakama:3.22.0
          command:
            - "/bin/sh"
            - "-ecx"
            - |
              /nakama/nakama migrate up --database.address root@cockroachdb:26257 &&
              exec /nakama/nakama \
              --name nakama1 \
              --database.address root@cockroachdb:26257 \
              --logger.level DEBUG \
              --session.token_expiry_sec 7200 \
              --metrics.prometheus_port 9100 \
              --socket.read_timeout_ms 60000 \
              --socket.idle_timeout_ms 120000
          ports:
            - containerPort: 7349
            - containerPort: 7350
            - containerPort: 7351
            - containerPort: 9100

1. /nakama/nakama migrate up → Runs database migrations against CockroachDB before starting. Since Nakama migrations are idempotent, this ensures schema consistency across restarts.

2. exec /nakama/nakama → Starts the Nakama server process itself.

Key flags:

–database.address root@cockroachdb:26257: Connects to the CockroachDB service running in the cluster.
–logger.level DEBUG: Enables verbose logging for troubleshooting.
–session.token_expiry_sec 7200: Sets token expiry (2 hours).
–metrics.prometheus_port 9100: Exposes Prometheus metrics.
–socket.read_timeout_ms and –socket.idle_timeout_ms: Prevent WebSocket disconnects by increasing idle/read timeout thresholds.

Ports:

7349 → Nakama console (not usually exposed).
7350 → gRPC API (main client/game communication).
7351 → HTTP/REST API.
9100 → Prometheus metrics endpoint.

Why it matters:
This ensures DB migrations always run safely before Nakama starts, WebSockets don’t randomly drop, and Prometheus can scrape metrics for observability.

Nakama Service (ClusterIP)

apiVersion: v1
kind: Service
metadata:
  name: nakama
  annotations:
    cloud.google.com/backend-config: '{"default": "nakama-backendconfig"}'
spec:
  type: ClusterIP
  selector:
    app: nakama
  ports:
    - name: grpc
      port: 7350
      protocol: TCP
      targetPort: 7350
    - name: http
      port: 80
      protocol: TCP
      targetPort: 7351
    - name: https
      port: 443
      protocol: TCP
      targetPort: 7351
    - name: prometheus
      port: 9100
      protocol: TCP
      targetPort: 9100

This defines how other services (inside the cluster) reach Nakama.

Type: ClusterIP → Keeps the service internal only (not exposed to the public internet).
Ports mapped:
- grpc → Port 7350 (gRPC).
- http → Port 80 mapped to Nakama’s 7351 (REST API).
- https → Port 443 mapped to 7351.
- prometheus → Port 9100 mapped to metrics.

Why it matters:

Developers and game clients inside the same VPC/cluster can call Nakama using Kubernetes DNS (nakama.default.svc.cluster.local).
Prometheus can scrape metrics from port 9100.
No external LoadBalancer is used → saves cost and adds a layer of security.

BackendConfig for GKE Ingress

apiVersion: cloud.google.com/v1
kind: BackendConfig
metadata:
  name: nakama-backendconfig
spec:
  timeoutSec: 3600
  connectionDraining:
    drainingTimeoutSec: 300
  sessionAffinity:
    affinityType: NONE

This is a GKE-only resource that fine-tunes how Google Cloud Ingress handles backend services.

timeoutSec: 3600 → Increases idle timeout to 1 hour, preventing WebSocket drops (fix for your earlier 30s timeout issue).
connectionDraining: 300s → Allows existing connections to gracefully drain for 5 minutes during pod updates.
sessionAffinity: NONE → No sticky sessions. Each client request may go to any Nakama pod (good for stateless scaling).

Why it matters:
This prevents idle WebSocket disconnects and makes rolling updates smoother.

To get started, simply clone the repository and follow the Readme file steps.

📌 Make sure your current Kubernetes context is set to your GKE cluster and you have the necessary permissions to create these resources.

Architecture Diagram

Here’s a visual overview of the deployment:

Nakama Pod: Runs the game server and performs schema migrations on startup. Metrics are exposed on port 9100.

Nakama Service (ClusterIP): Internal access point for gRPC/HTTP traffic and metrics.

CockroachDB: Primary database used by Nakama, running as a StatefulSet with persistent volumes.

Prometheus: Scrapes metrics from Nakama’s exposed 9100 endpoint, enabling monitoring and alerting.

Benefits

After deploying the updated setup:

WebSocket Stability Improved: No more disconnects after 30 seconds—sessions remained alive for the configured duration.
Production-Ready Setup: The game server is now running with Kubernetes-native features like config-based deployments and service discovery.
Cost-Effective Networking: We used ClusterIP internally, avoiding the cost of LoadBalancer services.
Extensible Architecture: This setup can now be scaled to multiple Nakama pods with a simple replica change and Ingress setup.
Better Observability: Prometheus metrics and debug logs enabled better monitoring and alerting.

Conclusion

Deploying Nakama on Kubernetes can be straightforward and production-grade—if you configure it correctly. By deploying Nakama in GKE using Kubernetes-native components and best practices, we achieved a robust, secure, and scalable real-time gaming backend setup. Here’s a snapshot of the improvements:

Reduced WebSocket timeouts by 90% by fine-tuning idle and read timeout settings at the socket level and aligning GKE’s BackendConfig appropriately.
Achieved 100% SSL coverage with GKE-managed certificates and Ingress, ensuring all client communications are encrypted.
Improved deployment speed by over 60%, thanks to automated YAML-based configuration and reusable manifests.
Increased observability with Prometheus integration, allowing the dev team to monitor live metrics and respond to issues proactively.

This setup not only ensured a smoother gameplay experience but also gave our client’s team confidence in managing their backend infrastructure without vendor lock-in or operational headaches.

If you’re planning to self-host Nakama or any real-time backend service in Kubernetes, this setup offers a production-ready baseline you can rely on.

Thank you for reading!! 🙌🏻😁📃, see you in the next blog.🤘

I hope this article proves beneficial to you. If you have any doubts or suggestions, feel free to mention them in the comment section below or contact us.

The end ✌🏻

Nakama on GKE: A Complete Kubernetes Setup Guide

Problem Statement

Our Solution

Implementation

Nakama Deployment (Deployment YAML)

Nakama Service (ClusterIP)

BackendConfig for GKE Ingress

Architecture Diagram

Benefits

Conclusion

References

How CI/CD Cut Costs & Time for a Leading Chatbot Company?

API Testing in Action: Enhanced Fintech Quality Assurance

Big Data – Hadoop and Hive 101: Everything You Need to Know

Company

Product

Legal

Follow Us

Problem Statement

Our Solution

Implementation

Nakama Deployment (Deployment YAML)

Nakama Service (ClusterIP)

BackendConfig for GKE Ingress

Architecture Diagram

Benefits

Conclusion

References

You may also like

How CI/CD Cut Costs & Time for a Leading Chatbot Company?

API Testing in Action: Enhanced Fintech Quality Assurance

Big Data – Hadoop and Hive 101: Everything You Need to Know

Company

Product

Legal

Follow Us