Docker Swarm uses the same YAML syntax as Compose. Six targeted changes unlock rolling deploys, multi-host HA, Docker Secrets, and zero-downtime updates.
Starting from scratch? Build a solid, production-ready Compose file first at vmfarms.com/compose - it covers Bad, Good, Better, and Strictest security tiers with annotated diffs. Come back here when you're ready to scale.
## 1. Add a deploy: block to every service

Swarm ignores Compose's restart: key, and without update_config every deploy is a cold restart - 10 to 30 seconds of downtime.
Before:

```yaml
services:
  web:
    image: myapp:1.4.0
    env_file: .env
    restart: unless-stopped
    networks: [app]
```
After:

```yaml
services:
  web:
    image: myapp:1.4.0
    env_file: .env
    networks: [app]
    deploy:
      replicas: 2
      restart_policy:
        condition: on-failure
        max_attempts: 3
      update_config:
        parallelism: 1
        delay: 10s
        failure_action: pause
      rollback_config:
        parallelism: 0
        failure_action: pause
```
restart: is a Compose-only key - Swarm ignores it. Use deploy.restart_policy instead. update_config.parallelism: 1 replaces containers one at a time, so traffic keeps flowing during deploys. failure_action: pause stops a broken rollout and waits for you, rather than silently replacing every container with a broken image.
Note: if your app runs database migrations on startup, keep failure_action: pause (not rollback). A rolled-back container running against a migrated schema typically crashes immediately, leaving you stuck with neither version starting.
## 2. Replace .env credentials with encrypted Swarm secrets

A password passed through environment variables is visible in docker inspect, docker service inspect, and any tool that reads container environment variables. Anyone with Docker socket access - including compromised containers - can read it.
Before:

```yaml
services:
  db:
    image: postgres:16-alpine
    env_file: .env
    environment:
      # plaintext in env
      POSTGRES_PASSWORD: mysecretpassword
```
After:

```yaml
services:
  db:
    image: postgres:16-alpine
    environment:
      POSTGRES_PASSWORD_FILE: /run/secrets/db_password
    secrets:
      - db_password

secrets:
  db_password:
    external: true
```
Swarm stores secrets encrypted at rest on manager nodes and delivers them as files mounted at /run/secrets/<name>. They never appear in docker inspect, environment dumps, or container metadata.
Create the secret once before deploy: echo "mysecretpassword" | docker secret create db_password -. Most databases support a _FILE variant of their password env var (Postgres: POSTGRES_PASSWORD_FILE, MySQL: MYSQL_ROOT_PASSWORD_FILE). For apps that read env vars directly, mount the secret and read the file at startup with a small entrypoint wrapper.
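If you can't change the app, one option is to override the entrypoint so the secret file is read into the variable just before the process starts. A minimal sketch, assuming the app wants a DB_PASSWORD environment variable and starts with node server.js (both placeholders - adapt to your image):

```yaml
services:
  web:
    image: myapp:1.4.0
    secrets:
      - db_password
    # $$ stops Compose/stack interpolation so the container's shell expands it instead
    entrypoint: ["sh", "-c", "export DB_PASSWORD=$$(cat /run/secrets/db_password) && exec node server.js"]
```

The exec keeps your app as PID 1 so it still receives stop signals directly.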
## 3. Add start_period to health checks

Before:

```yaml
services:
  web:
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      # no start_period
```
After:

```yaml
services:
  web:
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 60s
```
start_period gives the container a grace window at startup during which health check failures do not count toward retries. Set it to however long your slowest startup takes: a Django app running migrations might need 30-90 seconds; a Rails app warming its cache might need 120 seconds.
Swarm uses health check status to decide when to move to the next container during a rolling update. Without start_period, a slow-starting container fails its health checks, triggers failure_action, and stalls your deploy.
## 4. Route traffic through Traefik labels

Before:

```yaml
services:
  web:
    image: myapp:1.4.0
    ports:
      - "3000:3000"
    networks: [app]
```
After:

```yaml
services:
  web:
    image: myapp:1.4.0
    # no ports: needed - Traefik routes internally
    networks: [app, traefik_public]
    deploy:
      # in Swarm mode, Traefik reads labels from the service (deploy.labels), not the container
      labels:
        - "traefik.enable=true"
        - "traefik.http.routers.web.rule=Host(`app.example.com`)"
        - "traefik.http.routers.web.tls.certresolver=le"
        - "traefik.http.services.web.loadbalancer.server.port=3000"
        - "traefik.http.services.web.loadbalancer.healthcheck.path=/health"
```
Traefik watches Docker for label changes and auto-configures routing. It load-balances across all replicas, only routing to healthy ones, which is what makes rolling deploys zero-downtime: the old container keeps serving until the new one passes its health check, then Traefik switches traffic.
The traefik_public network is the shared overlay network where Traefik listens. Your app joins both it and your internal app network. Databases stay on the internal network only and are never reachable from the proxy.
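The Traefik service itself isn't shown above. A minimal sketch of what it might look like as its own stack, assuming Traefik v3's Swarm provider, the le certificate resolver referenced in the labels, and a placeholder ACME email - adjust to your own setup:

```yaml
services:
  traefik:
    image: traefik:v3.1
    command:
      - "--providers.swarm.endpoint=unix:///var/run/docker.sock"
      - "--providers.swarm.exposedbydefault=false"
      - "--entrypoints.web.address=:80"
      - "--entrypoints.websecure.address=:443"
      - "--certificatesresolvers.le.acme.email=ops@example.com"
      - "--certificatesresolvers.le.acme.storage=/letsencrypt/acme.json"
      - "--certificatesresolvers.le.acme.httpchallenge.entrypoint=web"
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - traefik_certs:/letsencrypt
    networks:
      - traefik_public
    deploy:
      placement:
        constraints:
          - node.role == manager  # needs the Swarm API via the manager's Docker socket

networks:
  traefik_public:
    external: true  # create once: docker network create --driver overlay traefik_public

volumes:
  traefik_certs:
```

Your app stack declares traefik_public as an external network in the same way, so both stacks join the overlay Traefik listens on.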
## 5. Set resource limits in the deploy: block

Before:

```yaml
services:
  web:
    image: myapp:1.4.0
    deploy:
      replicas: 2
      # no resource limits
```
After:

```yaml
services:
  web:
    image: myapp:1.4.0
    deploy:
      replicas: 2
      resources:
        limits:
          cpus: "1.0"
          memory: 1G
        reservations:
          cpus: "0.25"
          memory: 256M
```
limits is the hard cap - the container is OOM-killed if it exceeds the memory limit, and throttled if it exceeds the CPU limit. reservations is what Swarm uses to decide which node has room to schedule the container. Set reservations to your typical baseline and limits to your worst-case spike.
Start with numbers from your local load tests, then tighten them over a week of production monitoring. A common starting point: 1G limit / 256M reservation for a typical web process, 512M limit / 128M reservation for a Redis-backed worker.
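For the Redis-backed worker, that starting point might look like the sketch below. The worker service name, image, and command are placeholders, and the cpus figures are assumptions to tune against your own load tests:

```yaml
services:
  worker:
    image: myapp:1.4.0
    command: ["node", "worker.js"]  # placeholder - whatever starts your worker process
    deploy:
      replicas: 2
      resources:
        limits:
          cpus: "0.5"      # assumption - size against your own measurements
          memory: 512M
        reservations:
          cpus: "0.1"
          memory: 128M
```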
## 6. Use named volumes with placement constraints

docker stack rm followed by docker stack deploy - the standard way to clean up a stuck Swarm deployment - removes anonymous volumes, taking your database with it. This is a common cause of data loss in staging and production environments.
Before:

```yaml
services:
  db:
    image: postgres:16-alpine
    volumes:
      - ./data:/var/lib/postgresql/data  # bind-mounted host path
```
After:

```yaml
services:
  db:
    image: postgres:16-alpine
    volumes:
      - db_data:/var/lib/postgresql/data
    deploy:
      placement:
        constraints:
          - node.role == manager

volumes:
  db_data:  # survives stack rm, persists across updates
```
Named volumes survive docker stack rm because Docker tracks them separately. Bind mounts (./data:/path) work on single-host Compose but behave unpredictably in multi-host Swarm where the scheduler can place your container on any node.
The placement constraint pins the database to the same node where the volume exists. In a multi-node Swarm, this is critical - without it, Swarm might schedule the database on a different host on the next deploy, and it will start with an empty data directory.
Named volumes are not a backup. Use an automated backup tool (pg_dump to object storage, restic, or your platform's snapshot system) on a schedule you're comfortable with.
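One lightweight way to get scheduled dumps without leaving the stack is a sidecar service that runs pg_dump in a loop. A rough sketch, not a complete backup strategy - the db hostname, postgres user, appdb database name, and daily interval are assumptions, and you still need to ship the dump files off the host:

```yaml
services:
  db-backup:
    image: postgres:16-alpine
    secrets:
      - db_password
    networks: [app]
    volumes:
      - db_backups:/backups
    # dump once a day; $$ stops Compose/stack interpolation so the shell expands it at runtime
    entrypoint: ["sh", "-c", "while true; do PGPASSWORD=$$(cat /run/secrets/db_password) pg_dump -h db -U postgres appdb > /backups/appdb-$$(date +%Y-%m-%d).sql; sleep 86400; done"]
    deploy:
      restart_policy:
        condition: on-failure

volumes:
  db_backups:

secrets:
  db_password:
    external: true
```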
| # | Change | What it prevents |
|---|--------|------------------|
| 1 | deploy: block | Crashed container stays down; cold restarts on every deploy |
| 2 | Docker Secrets | Passwords visible in docker inspect and env dumps |
| 3 | start_period on health checks | Traffic routed to containers still initializing |
| 4 | Traefik labels | 10-30s downtime on every deploy; no TLS; no load balancing |
| 5 | Resource limits in deploy: | Memory leak OOM-kills the entire host |
| 6 | Named volumes + placement constraints | Data loss on docker stack rm; DB scheduled on wrong host |
vmfarms manages Docker Swarm clusters on hardware we own, so you're not sharing a hypervisor with strangers. All of the above is configured, monitored, and backed up automatically.