Docker Compose → Swarm

Your Compose file is one step from production.

Docker Swarm uses the same YAML syntax as Compose. Six targeted changes unlock rolling deploys, multi-host HA, Docker Secrets, and zero-downtime updates.

Starting from scratch? Build a solid, production-ready Compose file first at vmfarms.com/compose - it covers Bad, Good, Better, and Strictest security tiers with annotated diffs. Come back here when you're ready to scale.

1
Add a deploy: block to every service
Replicas, restart policy, and rolling update config
What breaks without it: a crashed container stays down indefinitely. Swarm does nothing. No replicas means no HA, and no update_config means every deploy is a cold restart - 10 to 30 seconds of downtime.
✗ docker-compose.yml
services:
  web:
    image: myapp:1.4.0
    env_file: .env
    restart: unless-stopped
    networks: [app]
✓ docker-stack.yml
services:
  web:
    image: myapp:1.4.0
    env_file: .env
    networks: [app]
    deploy:
      replicas: 2
      restart_policy:
        condition: on-failure
        max_attempts: 3
      update_config:
        parallelism: 1
        delay: 10s
        failure_action: pause
      rollback_config:
        parallelism: 0
        failure_action: pause

restart: is a Compose-only key - Swarm ignores it. Use deploy.restart_policy instead. update_config.parallelism: 1 replaces containers one at a time, so traffic keeps flowing during deploys. failure_action: pause stops a broken rollout and waits for you, rather than silently replacing every container with a broken image. In rollback_config, parallelism: 0 means a triggered rollback replaces all replicas at once - when you are backing out a bad version, you want out fast, not gradually.

Note: if your app runs database migrations on startup, keep failure_action: pause (not rollback). A rolled-back container running against a migrated schema typically crashes immediately, leaving you stuck with neither version starting.
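To see these settings in action, deploy the stack and watch tasks replace one at a time. The stack name myapp below is a placeholder for your own:

# Deploy (or update) the stack from the file above
docker stack deploy -c docker-stack.yml myapp

# Watch tasks replace one at a time; a paused rollout shows up here too
docker service ps myapp_web

# Manually roll back a rollout that failure_action paused
docker service update --rollback myapp_web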

2
Move passwords to Docker Secrets
Replace .env credentials with encrypted Swarm secrets
What breaks without it: your database password is visible in docker inspect, docker service inspect, and any tool that reads container environment variables. Anyone with Docker socket access - including compromised containers - can read it.
✗ docker-compose.yml
services:
  db:
    image: postgres:16-alpine
    env_file: .env
    environment:
      # plaintext in env
      POSTGRES_PASSWORD: mysecretpassword
✓ docker-stack.yml
services:
  db:
    image: postgres:16-alpine
    environment:
      POSTGRES_PASSWORD_FILE: /run/secrets/db_password
    secrets:
      - db_password

secrets:
  db_password:
    external: true

Swarm stores secrets encrypted at rest on manager nodes and delivers them as files mounted at /run/secrets/<name>. They never appear in docker inspect, environment dumps, or container metadata.

Create the secret once before deploy: echo "mysecretpassword" | docker secret create db_password -. Most databases support a _FILE variant of their password env var (Postgres: POSTGRES_PASSWORD_FILE, MySQL: MYSQL_ROOT_PASSWORD_FILE). For apps that read env vars directly, mount the secret and read the file at startup with a small entrypoint wrapper.
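Here is a minimal sketch of such a wrapper, assuming your app expects a DB_PASSWORD environment variable (the variable and secret names are placeholders):

#!/bin/sh
# entrypoint.sh: export the secret file's contents as the env var the
# app expects, then exec the real command so signals pass through.
if [ -f /run/secrets/db_password ]; then
  export DB_PASSWORD="$(cat /run/secrets/db_password)"
fi
exec "$@"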

3
Upgrade health checks with a start_period
Prevents Swarm from routing traffic before the app is actually ready
What breaks without it: health check failures during startup count toward retries, so Swarm kills containers that are still initializing, running migrations, or warming caches - then restarts them into the same loop. Deploys stall or churn, and during the churn traffic hits a 500-returning container.
✗ docker-compose.yml
services:
  web:
    healthcheck:
      test: ["CMD", "curl",
             "-f", "http://localhost:3000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      # no start_period
✓ docker-stack.yml
services:
  web:
    healthcheck:
      test: ["CMD", "curl",
             "-f", "http://localhost:3000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 60s

start_period gives the container a grace window at startup during which health check failures do not count toward retries. Set it to however long your slowest startup takes: a Django app running migrations might need 30-90 seconds; a Rails app that warms caches on boot might need 120s.

Swarm uses health check status to decide when to move to the next container during a rolling update. Without start_period, a slow-starting container fails its health checks, triggers failure_action, and stalls your deploy.
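To verify your start_period is long enough, watch the health state flip from "starting" to "healthy" on a freshly deployed container (the container ID is a placeholder):

# .State.Health reports "starting" during the grace window, then
# "healthy" or "unhealthy" once checks begin to count
docker inspect --format '{{json .State.Health}}' <container-id>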

4
Add a reverse proxy via Traefik labels
TLS termination, load balancing across replicas, zero-downtime deploys
What breaks without it: with no proxy in front, every deploy takes the app offline for 10-30 seconds while containers restart. You also have no TLS termination, no rate limiting, and traffic hits whichever container happens to be up - not the healthy one.
✗ docker-compose.yml
services:
  web:
    image: myapp:1.4.0
    ports:
      - "3000:3000"
    networks: [app]
✓ docker-stack.yml
services:
  web:
    image: myapp:1.4.0
    # no ports: needed - Traefik routes over the overlay network
    networks: [app, traefik_public]
    deploy:
      labels:
        - "traefik.enable=true"
        - "traefik.http.routers.web.rule=Host(`app.example.com`)"
        - "traefik.http.routers.web.tls.certresolver=le"
        - "traefik.http.services.web.loadbalancer.server.port=3000"
        - "traefik.http.services.web.loadbalancer.healthcheck.path=/health"

Traefik watches Docker for label changes and auto-configures routing. Note that the labels sit under deploy: - in Swarm mode, Traefik reads labels from the service, not from individual containers, so labels placed outside deploy: are never seen. Traefik load-balances across all replicas, only routing to healthy ones, which is what makes rolling deploys zero-downtime: the old container keeps serving until the new one passes its health check, then Traefik switches traffic.

The traefik_public network is the shared overlay network where Traefik listens. Your app joins both it and your internal app network. Databases stay on the internal network only and are never reachable from the proxy.
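The overlay network must exist before any stack references it. One way to set this up, assuming the traefik_public name from the example above:

# Create the shared overlay network once, on a manager node
docker network create --driver overlay traefik_public

Then declare it as external at the bottom of docker-stack.yml, so Swarm attaches to the existing network instead of creating a stack-prefixed copy:

networks:
  app:
  traefik_public:
    external: true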

5
Set resource limits in the deploy: block
Prevents one leaking container from OOM-killing the entire host
What breaks without it: a memory leak or traffic spike in one container consumes all available host memory. The OOM killer terminates processes unpredictably - often taking out the database or Swarm manager in addition to your app.
✗ docker-compose.yml
services:
  web:
    image: myapp:1.4.0
    deploy:
      replicas: 2
      # no resource limits
✓ docker-stack.yml
services:
  web:
    image: myapp:1.4.0
    deploy:
      replicas: 2
      resources:
        limits:
          cpus: "1.0"
          memory: 1G
        reservations:
          cpus: "0.25"
          memory: 256M

limits is the hard cap: a container that exceeds its memory limit is OOM-killed, while CPU use beyond the limit is throttled. reservations is what Swarm uses to decide which node has room to schedule the container. Set reservations to your typical baseline and limits to your worst-case spike.

Start with numbers from your local load tests, then tighten them over a week of production monitoring. A common starting point: 1G limit / 256M reservation for a typical web process, 512M limit / 128M reservation for a Redis-backed worker.
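Those baselines are easiest to read straight off the daemon. Both commands are standard Docker CLI; myapp_web assumes a stack named myapp:

# One-shot snapshot of per-container CPU and memory usage
docker stats --no-stream

# Where each replica landed and whether any were restarted - an exit
# code 137 in the error column usually means the memory limit was hit
docker service ps myapp_web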

6
Declare named volumes and have a backup plan
Data persists across service updates and host reboots
What breaks without it: docker stack rm followed by docker stack deploy - the standard way to clear a stuck Swarm deployment - orphans any anonymous volume, and the redeployed database starts on a fresh, empty one. This is a common cause of data loss in staging and production environments.
✗ docker-compose.yml
services:
  db:
    image: postgres:16-alpine
    volumes:
      - ./data:/var/lib/postgresql/data
      # bind-mounted host path
✓ docker-stack.yml
services:
  db:
    image: postgres:16-alpine
    volumes:
      - db_data:/var/lib/postgresql/data
    deploy:
      placement:
        constraints:
          - node.role == manager

volumes:
  db_data:
    # survives stack rm, persists across updates

Named volumes survive docker stack rm because Docker tracks them separately. Bind mounts (./data:/path) work on single-host Compose but behave unpredictably in multi-host Swarm where the scheduler can place your container on any node.

The placement constraint pins the database to a manager node. In a multi-node Swarm this is critical - without a constraint, Swarm might schedule the database on a different host on the next deploy, and it will start with an empty data directory. If you run more than one manager, pin to a specific host instead with node.hostname == <name>.

Named volumes are not a backup. Use an automated backup tool (pg_dump to object storage, restic, or your platform's snapshot system) on a schedule you're comfortable with.
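As a starting point, here is a pg_dump one-liner you can put on a cron schedule; the stack name myapp and database name app are placeholders for your own, and in production the dump should be shipped to object storage rather than kept on the host:

# Dump the database from inside the running container
docker exec $(docker ps -q -f name=myapp_db) \
  pg_dump -U postgres app > backup-$(date +%F).sql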

All 6 changes at a glance

#  Change                                  What it prevents
1  deploy: block                           Crashed container stays down; cold restarts on every deploy
2  Docker Secrets                          Passwords visible in docker inspect and env dumps
3  start_period on health checks           Traffic routed to containers still initializing
4  Traefik labels                          10-30s downtime on every deploy; no TLS; no load balancing
5  Resource limits in deploy:              Memory leak OOM-kills the entire host
6  Named volumes + placement constraints   Data loss on docker stack rm; DB scheduled on wrong host
Ready to hand off ops?

We run this stack - and watch it - for you

vmfarms manages Docker Swarm clusters on hardware we own, so you're not sharing a hypervisor with strangers. All of the above is configured, monitored, and backed up automatically.

Solo plan
Single server
One VM, fully managed. Docker Compose on a dedicated Proxmox host with all ops included - backups, WAF, scanning, monitoring. Best for teams that don't need HA yet.
Cluster plan
Multi-server Swarm
Multiple VMs in a managed Docker Swarm. Rolling deploys, replica scheduling, and failover built in. For teams that need the six changes on this page handled for them.
Hardware we own in Canadian data centres
Automated backups
Trivy + Wazuh security scanning
WAF + CrowdSec rate limiting
Free migration from DigitalOcean, Hetzner, AWS