Debug Docker Swarm Services
The current stable version of Docker Swarm is missing an important feature: the same control over logs that is available without orchestration.
It can be really painful to debug a docker service create command that ends in a "task: non-zero exit (1)" kind of error.
But the Docker team is working on a solution. They started developing it when Docker Swarm Mode appeared, and it is already available as an experimental feature. In this article I will explain how to enable experimental features on Docker Swarm and how to use the new docker service logs command.
$docker service logs px8fv06fft8s
only supported with experimental daemon
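To avoid running into this error in the middle of a script, the call can be guarded. The following is only a minimal sketch: service_logs is a hypothetical helper name of my own, and it assumes the daemon reports its experimental status through docker version -f '{{.Server.Experimental}}'.

```shell
#!/bin/sh
# Sketch: call "docker service logs" only if the daemon reports experimental mode.
# service_logs is a hypothetical helper name, not part of the Docker CLI.
service_logs() {
    if [ "$(docker version -f '{{.Server.Experimental}}')" != "true" ]; then
        echo "experimental features are disabled on this daemon" >&2
        return 1
    fi
    # Pass all arguments (service ID or name, extra flags) through unchanged.
    docker service logs "$@"
}

# Usage on a swarm manager:
#   service_logs px8fv06fft8s
```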
How to enable Docker Daemon experimental features
You can follow the official documentation, but I will briefly explain the necessary steps.
- Create the file /etc/docker/daemon.json with the following content:
{ "experimental": true }
- Restart the docker daemon:
systemctl restart docker
- Test if the experimental features are enabled:
docker version -f '{{.Server.Experimental}}'
true
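The steps above could be scripted roughly as follows. This is a sketch under assumptions: enable_experimental is a hypothetical helper name, and the optional directory argument exists only so the function can be tried outside /etc/docker. It refuses to overwrite an existing daemon.json, because that file may already hold other settings that must be merged by hand.

```shell
#!/bin/sh
# Sketch: write a daemon.json that enables experimental features.
# enable_experimental is a hypothetical helper; the directory argument
# defaults to /etc/docker and is overridable for dry runs.
enable_experimental() {
    config_dir="${1:-/etc/docker}"
    config_file="$config_dir/daemon.json"
    if [ -e "$config_file" ]; then
        # Do not clobber existing daemon settings.
        echo "refusing to overwrite $config_file; add \"experimental\": true by hand" >&2
        return 1
    fi
    mkdir -p "$config_dir" && printf '{ "experimental": true }\n' > "$config_file"
}

# Usage on a real host (needs root):
#   enable_experimental && systemctl restart docker
#   docker version -f '{{.Server.Experimental}}'
```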
docker service logs is not the only experimental feature being developed for Docker Swarm; there are other very interesting ones, such as Prometheus metrics for basic container, image, and daemon operations.
Using Docker Service Logs
This new feature is very straightforward to use, especially if you are familiar with debugging problems using logs on non-orchestrated Docker installations (without the swarm).
After any docker service create ... command, an ID for the service is shown:
$docker service create --name jenkins -p 8082:8080 -p 50000:50000 -e JENKINS_OPTS="--prefix=/jenkins" --mount "type=bind,source=/mnt/swarm-nfs/jenkins-fg,target=/var/jenkins_home" --reserve-memory 300m jenkins
v6oq1shzs51fw6q8d2bstkd2f
In this example we created a service to run Jenkins, using NFS for data persistence. After the command is executed we obtain the ID of the service, which in this case is v6oq1shzs51fw6q8d2bstkd2f.
To check the status of the service, you can use the docker service ps command:
$docker service ps jenkins
ID             NAME              IMAGE            NODE                                           DESIRED STATE   CURRENT STATE                   ERROR                       PORTS
j33856j3oxt1   jenkins-fg.1      jenkins:latest   mc-swarm-app-place01-live.mcon-group.systems   Ready           Ready 2 seconds ago
aaig0ummuhnq   \_ jenkins-fg.1   jenkins:latest   mc-swarm-app-place03-live.mcon-group.systems   Shutdown        Failed less than a second ago   "task: non-zero exit (1)"
fqsp16ixgft3   \_ jenkins-fg.1   jenkins:latest   mc-swarm-app-place01-live.mcon-group.systems   Shutdown        Failed 10 seconds ago           "task: non-zero exit (1)"
kn9c09vy06x1   \_ jenkins-fg.1   jenkins:latest   mc-swarm-app-place02-live.mcon-group.systems   Shutdown        Failed 11 seconds ago           "task: non-zero exit (1)"
t48zk8hmn2ga   \_ jenkins-fg.1   jenkins:latest   mc-swarm-app-place03-live.mcon-group.systems   Shutdown        Failed 18 seconds ago           "task: non-zero exit (1)"
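When a service restarts many times, it can help to see whether the failures concentrate on one node. The sketch below assumes the task list is printed with the --format option of docker service ps, using "|" as a separator of my own choosing; count_failures_by_node is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: count failed tasks per node.
# Input: one line per task in the form "node|current state|error".
# count_failures_by_node is a hypothetical helper, not a Docker command.
count_failures_by_node() {
    awk -F '|' '$2 ~ /^Failed/ { failures[$1]++ }
                END { for (node in failures) print failures[node], node }' |
        sort -rn
}

# Usage on a swarm manager:
#   docker service ps --no-trunc \
#       --format '{{.Node}}|{{.CurrentState}}|{{.Error}}' jenkins |
#       count_failures_by_node
```

If every node shows failures, as in the listing above, the cause is more likely the service definition or shared storage than a single broken host.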
Here we can see that the container failed to start. We are also told on which member of the swarm each attempt ran, but we don’t get any information about why the error happened.
To find out, we check the logs of the service:
$docker service logs j4kawvh4b41h
jenkins-fg.1.r7qd0204i7sz@mc-swarm-app-place01-live.mcon-group.systems | touch: cannot touch ‘/var/jenkins_home/copy_reference_file.log’: Permission denied
jenkins-fg.1.60jtwoggnm0q@mc-swarm-app-place03-live.mcon-group.systems | touch: cannot touch ‘/var/jenkins_home/copy_reference_file.log’: Permission denied
jenkins-fg.1.kkvqtiyek4wq@mc-swarm-app-place01-live.mcon-group.systems | touch: cannot touch ‘/var/jenkins_home/copy_reference_file.log’: Permission denied
jenkins-fg.1.r7qd0204i7sz@mc-swarm-app-place01-live.mcon-group.systems | Can not write to /var/jenkins_home/copy_reference_file.log. Wrong volume permissions?
jenkins-fg.1.kkvqtiyek4wq@mc-swarm-app-place01-live.mcon-group.systems | Can not write to /var/jenkins_home/copy_reference_file.log. Wrong volume permissions?
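Because every restarted task prints the same messages, the output gets repetitive quickly. A small filter can strip the task prefix and count each distinct message. This is a sketch that assumes the "task@node | message" layout shown above; summarize_logs is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: collapse repeated service log lines into "count message" pairs.
# Assumes each line looks like "task@node | message", as in the output above.
# summarize_logs is a hypothetical helper, not a Docker command.
summarize_logs() {
    # Drop everything up to and including the first "| ", then count duplicates.
    sed 's/^[^|]*| //' | sort | uniq -c | sort -rn
}

# Usage on a swarm manager:
#   docker service logs jenkins 2>&1 | summarize_logs
```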
The logs clearly show that the problem in this example was wrong permissions on a folder that was mounted from an NFS filesystem during service creation, a problem that we can easily solve using chmod or mapping on NFS.
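Before changing anything, it is worth confirming who owns the mounted folder. The sketch below makes several assumptions: check_owner is a hypothetical helper, it relies on GNU stat (the -c option), and the default uid 1000 is the jenkins user as I understand the official image to define it, so verify that for the image you run.

```shell
#!/bin/sh
# Sketch: report whether a directory is owned by the uid the container runs as.
# check_owner is a hypothetical helper; requires GNU stat.
# The default uid 1000 is an assumption about the jenkins image.
check_owner() {
    dir="$1"
    expected_uid="${2:-1000}"
    actual_uid="$(stat -c '%u' "$dir")" || return 2
    if [ "$actual_uid" = "$expected_uid" ]; then
        echo "ok: $dir is owned by uid $expected_uid"
    else
        echo "mismatch: $dir is owned by uid $actual_uid, expected $expected_uid"
        return 1
    fi
}

# Usage on the host that exports or mounts the share:
#   check_owner /mnt/swarm-nfs/jenkins-fg
```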