Docker Swarm uses the same YAML syntax as Compose. Six targeted changes unlock rolling deploys, multi-host HA, Docker Secrets, and zero-downtime updates.
Starting from scratch? Build a solid, production-ready Compose file first at vmfarms.com/compose - it covers Bad, Good, Better, and Strictest security tiers with annotated diffs. Come back here when you're ready to scale.
## 1. Add a deploy: block to every service

Swarm ignores Compose's restart: key, and without update_config every deploy is a cold restart - 10 to 30 seconds of downtime.
Before:

```yaml
services:
  web:
    image: myapp:1.4.0
    env_file: .env
    restart: unless-stopped
    networks: [app]
```
After:

```yaml
services:
  web:
    image: myapp:1.4.0
    env_file: .env
    networks: [app]
    deploy:
      replicas: 2
      restart_policy:
        condition: on-failure
        max_attempts: 3
      update_config:
        parallelism: 1
        delay: 10s
        failure_action: pause
      rollback_config:
        parallelism: 0
        failure_action: pause
```
restart: is a Compose-only key - Swarm ignores it. Use deploy.restart_policy instead. update_config.parallelism: 1 replaces containers one at a time, so traffic keeps flowing during deploys. failure_action: pause stops a broken rollout and waits for you, rather than silently replacing every container with a broken image.
Note: if your app runs database migrations on startup, keep failure_action: pause (not rollback). A rolled-back container running against a migrated schema typically crashes immediately, leaving you stuck with neither version starting.
## 2. Replace .env credentials with encrypted Swarm secrets

A password passed through environment variables is visible in docker inspect, docker service inspect, and any tool that reads container environment variables. Anyone with Docker socket access - including compromised containers - can read it.
Before:

```yaml
services:
  db:
    image: postgres:16-alpine
    env_file: .env
    environment:
      # plaintext in env
      POSTGRES_PASSWORD: mysecretpassword
```
After:

```yaml
services:
  db:
    image: postgres:16-alpine
    environment:
      POSTGRES_PASSWORD_FILE: /run/secrets/db_password
    secrets:
      - db_password

secrets:
  db_password:
    external: true
```
Swarm stores secrets encrypted at rest on manager nodes and delivers them as files mounted at /run/secrets/<name>. They never appear in docker inspect, environment dumps, or container metadata.
Create the secret once before deploy: echo "mysecretpassword" | docker secret create db_password -. Most databases support a _FILE variant of their password env var (Postgres: POSTGRES_PASSWORD_FILE, MySQL: MYSQL_ROOT_PASSWORD_FILE). For apps that read env vars directly, mount the secret and read the file at startup with a small entrypoint wrapper.
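If you can't change the app, one option is to override the entrypoint so the secret file is read into the variable just before the process starts. A minimal sketch, assuming the app wants a DB_PASSWORD environment variable and starts with node server.js (both placeholders - adapt to your image):

```yaml
services:
  web:
    image: myapp:1.4.0
    secrets:
      - db_password
    # $$ stops Compose/stack interpolation so the container's shell expands it instead
    entrypoint: ["sh", "-c", "export DB_PASSWORD=$$(cat /run/secrets/db_password) && exec node server.js"]
```

The exec keeps your app as PID 1 so it still receives stop signals directly.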
## 3. Add start_period to health checks

Before:

```yaml
services:
  web:
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      # no start_period
```
After:

```yaml
services:
  web:
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 60s
```
start_period gives the container a grace window at startup during which health check failures do not count toward retries. Set it to however long your slowest startup takes: a Django app running migrations might need 30-90 seconds; a Rails app warming its cache might need 120 seconds.
Swarm uses health check status to decide when to move to the next container during a rolling update. Without start_period, a slow-starting container fails its health checks, triggers failure_action, and stalls your deploy.
## 4. Route traffic through Traefik labels

Before:

```yaml
services:
  web:
    image: myapp:1.4.0
    ports:
      - "3000:3000"
    networks: [app]
```
After:

```yaml
services:
  web:
    image: myapp:1.4.0
    # no ports: needed - Traefik routes internally
    networks: [app, traefik_public]
    deploy:
      # in Swarm mode, Traefik reads labels from the service (deploy.labels), not the container
      labels:
        - "traefik.enable=true"
        - "traefik.http.routers.web.rule=Host(`app.example.com`)"
        - "traefik.http.routers.web.tls.certresolver=le"
        - "traefik.http.services.web.loadbalancer.server.port=3000"
        - "traefik.http.services.web.loadbalancer.healthcheck.path=/health"
```
Traefik watches Docker for label changes and auto-configures routing. It load-balances across all replicas, only routing to healthy ones, which is what makes rolling deploys zero-downtime: the old container keeps serving until the new one passes its health check, then Traefik switches traffic.
The traefik_public network is the shared overlay network where Traefik listens. Your app joins both it and your internal app network. Databases stay on the internal network only and are never reachable from the proxy.
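The Traefik service itself isn't shown above. A minimal sketch of what it might look like as its own stack, assuming Traefik v3's Swarm provider, the le certificate resolver referenced in the labels, and a placeholder ACME email - adjust to your own setup:

```yaml
services:
  traefik:
    image: traefik:v3.1
    command:
      - "--providers.swarm.endpoint=unix:///var/run/docker.sock"
      - "--providers.swarm.exposedbydefault=false"
      - "--entrypoints.web.address=:80"
      - "--entrypoints.websecure.address=:443"
      - "--certificatesresolvers.le.acme.email=ops@example.com"
      - "--certificatesresolvers.le.acme.storage=/letsencrypt/acme.json"
      - "--certificatesresolvers.le.acme.httpchallenge.entrypoint=web"
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - traefik_certs:/letsencrypt
    networks:
      - traefik_public
    deploy:
      placement:
        constraints:
          - node.role == manager  # needs the Swarm API via the manager's Docker socket

networks:
  traefik_public:
    external: true  # create once: docker network create --driver overlay traefik_public

volumes:
  traefik_certs:
```

Your app stack declares traefik_public as an external network in the same way, so both stacks join the overlay Traefik listens on.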
## 5. Set resource limits in the deploy: block

Before:

```yaml
services:
  web:
    image: myapp:1.4.0
    deploy:
      replicas: 2
      # no resource limits
```
After:

```yaml
services:
  web:
    image: myapp:1.4.0
    deploy:
      replicas: 2
      resources:
        limits:
          cpus: "1.0"
          memory: 1G
        reservations:
          cpus: "0.25"
          memory: 256M
```
limits is the hard cap - the container is OOM-killed if it exceeds the memory limit, and throttled if it exceeds the CPU limit. reservations is what Swarm uses to decide which node has room to schedule the container. Set reservations to your typical baseline and limits to your worst-case spike.
Start with numbers from your local load tests, then tighten them over a week of production monitoring. A common starting point: 1G limit / 256M reservation for a typical web process, 512M limit / 128M reservation for a Redis-backed worker.
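For the Redis-backed worker, that starting point might look like the sketch below. The worker service name, image, and command are placeholders, and the cpus figures are assumptions to tune against your own load tests:

```yaml
services:
  worker:
    image: myapp:1.4.0
    command: ["node", "worker.js"]  # placeholder - whatever starts your worker process
    deploy:
      replicas: 2
      resources:
        limits:
          cpus: "0.5"      # assumption - size against your own measurements
          memory: 512M
        reservations:
          cpus: "0.1"
          memory: 128M
```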
## 6. Use named volumes with placement constraints

docker stack rm followed by docker stack deploy - the standard way to clean up a stuck Swarm deployment - removes anonymous volumes, taking your database with it. This is a common cause of data loss in staging and production environments.
Before:

```yaml
services:
  db:
    image: postgres:16-alpine
    volumes:
      - ./data:/var/lib/postgresql/data  # bind-mounted host path
```
After:

```yaml
services:
  db:
    image: postgres:16-alpine
    volumes:
      - db_data:/var/lib/postgresql/data
    deploy:
      placement:
        constraints:
          - node.role == manager

volumes:
  db_data:  # survives stack rm, persists across updates
```
Named volumes survive docker stack rm because Docker tracks them separately. Bind mounts (./data:/path) work on single-host Compose but behave unpredictably in multi-host Swarm where the scheduler can place your container on any node.
The placement constraint pins the database to the same node where the volume exists. In a multi-node Swarm, this is critical - without it, Swarm might schedule the database on a different host on the next deploy, and it will start with an empty data directory.
Named volumes are not a backup. Use an automated backup tool (pg_dump to object storage, restic, or your platform's snapshot system) on a schedule you're comfortable with.
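One lightweight way to get scheduled dumps without leaving the stack is a sidecar service that runs pg_dump in a loop. A rough sketch, not a complete backup strategy - the db hostname, postgres user, appdb database name, and daily interval are assumptions, and you still need to ship the dump files off the host:

```yaml
services:
  db-backup:
    image: postgres:16-alpine
    secrets:
      - db_password
    networks: [app]
    volumes:
      - db_backups:/backups
    # dump once a day; $$ stops Compose/stack interpolation so the shell expands it at runtime
    entrypoint: ["sh", "-c", "while true; do PGPASSWORD=$$(cat /run/secrets/db_password) pg_dump -h db -U postgres appdb > /backups/appdb-$$(date +%Y-%m-%d).sql; sleep 86400; done"]
    deploy:
      restart_policy:
        condition: on-failure

volumes:
  db_backups:

secrets:
  db_password:
    external: true
```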
| # | Change | What it prevents |
|---|--------|------------------|
| 1 | deploy: block | Crashed container stays down; cold restarts on every deploy |
| 2 | Docker Secrets | Passwords visible in docker inspect and env dumps |
| 3 | start_period on health checks | Traffic routed to containers still initializing |
| 4 | Traefik labels | 10-30s downtime on every deploy; no TLS; no load balancing |
| 5 | Resource limits in deploy: | Memory leak OOM-kills the entire host |
| 6 | Named volumes + placement constraints | Data loss on docker stack rm; DB scheduled on wrong host |
vmfarms manages Docker Swarm clusters on hardware we own, so you're not sharing a hypervisor with strangers. All of the above is configured, monitored, and backed up automatically.