Well, life catches up with you fast, I guess. I haven’t posted any updates because I was posting as I built, and I hit a roadblock. I’ll talk about that first, in case Google got you here while you’re running into the same problem, and then cover the pieces I did figure out.
Docker Swarm in Proxmox LXC Containers
Docker Swarm runs quite easily in VMs: install Docker, create a swarm, add nodes to the swarm, toss some stuff in it, and you’re basically done. Inside an LXC container, some restrictions are going to give you problems. For the pct.conf file, I ended up with this configuration that works.
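For reference, the VM workflow really is that short. A minimal sketch, where the manager IP and service name are placeholders for your own:

```shell
# On the first VM, initialize the swarm (use that VM's address)
docker swarm init --advertise-addr 192.168.1.10

# The init output prints a join command with a token; run it on each additional VM
docker swarm join --token <worker-token> 192.168.1.10:2377

# Back on the manager, deploy a service to confirm scheduling works
docker service create --name web --replicas 2 -p 80:80 nginx
```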
```
arch: amd64
cores: 2
hostname: swarm-wrkr3-lxc
memory: 2048
mp0: /lib/modules/4.15.18-11-pve,mp=/lib/modules/4.15.18-11-pve,ro=1
net0: name=eth0,bridge=vmbr0,gw=192.168.1.1,hwaddr=C6:1F:1A:5A:A2:83,ip=192.168.1.4/22,type=veth
onboot: 1
ostype: ubuntu
rootfs: local-lvm:vm-111-disk-0,size=25G
swap: 512
unprivileged: 0
lxc.apparmor.profile: unconfined
lxc.cgroup.devices.allow: a
lxc.cap.drop:
```
The important bits are the ones at the end. Proxmox provides a handful of AppArmor profiles that allow you to run certain applications that would otherwise be blocked from running inside a container. Unconfined isn’t great from a security perspective, but since Docker will still isolate processes from the LXC container, you do preserve some level of isolation. This profile allows you to get a container running to begin with.
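Those raw lxc options go in the container’s config file on the Proxmox host. A sketch, assuming container ID 111 (matching the rootfs line above; adjust for yours):

```shell
# On the Proxmox host, append the raw lxc options to the container's config
cat >> /etc/pve/lxc/111.conf <<'EOF'
lxc.apparmor.profile: unconfined
lxc.cgroup.devices.allow: a
lxc.cap.drop:
EOF

# Restart the container so the new profile takes effect
pct stop 111 && pct start 111
```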
After running my docker-ce and swarm install playbooks, things would look up and running, but containers would always fail to launch on the LXC worker nodes. My docker-compose file spun up Traefik and another network to let services talk to each other without being exposed publicly. A `docker network ls` on my worker showed:
```
root@swarm-wrkr3-lxc:~# docker network ls
NETWORK ID          NAME                DRIVER              SCOPE
3bca5c1794b6        bridge              bridge              local
6ec1975c097b        host                host                local
f6c66b4943e1        none                null                local
root@swarm-wrkr3-lxc:~#
```
But listing the networks on the manager, running in a virtual machine, showed:
```
root@swarm-mgr-vm1:~# docker network ls
NETWORK ID          NAME                DRIVER              SCOPE
6cee66aec471        bridge              bridge              local
a7cd74d3722a        docker_gwbridge     bridge              local
mq2gpjfvhgz0        exposed             overlay             swarm
b4ef8f4e3714        host                host                local
1mkakpah4inc        ingress             overlay             swarm
fa853d2b4f5e        none                null                local
7ozgltfb5q9j        sample              overlay             swarm
xwyi8zbxy488        traefik             overlay             swarm
```
The swarm-scoped overlay networks, along with docker_gwbridge, were not being created at all on the worker. A `docker info` on the worker showed why.
```
<SNIP>
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false
Product License: Community Engine

WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled
```
I have IPv6 disabled on my Proxmox hosts, so the IPv4 warning is the one causing problems. Docker needs iptables to route traffic appropriately and manage your network traffic, and bridge-nf-call-iptables is what sends bridged traffic through iptables in the first place. The problem is with the Linux kernel, not anything I was doing (which was frustrating to find out only now): the br_netfilter kernel module doesn’t handle network namespaces correctly.
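On an ordinary host, clearing those warnings would just be a module load and a sysctl, sketched below. Inside an LXC container it doesn’t help, because those bridge sysctls aren’t namespaced, which is exactly the kernel issue here:

```shell
# On a normal host, this is all it would take:
modprobe br_netfilter
sysctl net.bridge.bridge-nf-call-iptables=1

# Verify the setting stuck (expect "1" on a normal host)
cat /proc/sys/net/bridge/bridge-nf-call-iptables

# Inside the LXC worker, the sysctl either isn't visible or can't be
# set per-container, so Docker's warning persists.
```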
This issue in LXD is where I finally found someone describing it and the fix. That leads you here and to this discussion on LKML. The short of it is there’s a proposed solution, some back and forth on it, and more work to do.
So what now?
I have my swarm running in VMs now. It works: Traefik serves the traffic for all my hosts, I watch things with Portainer, and I have a few services in there doing random jobs. I’ll continue to run it in VMs until the changes are merged into the kernel, and migrate to LXC from there. Fortunately, I can add and remove nodes as I scale services up and down. I’ll update this when the feature is available.
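Adding and removing nodes as capacity changes is a short loop. A sketch, where the node name is a placeholder for one of your workers:

```shell
# On a manager: print the join command (with token) for a new worker VM
docker swarm join-token worker

# When scaling down: move that node's tasks elsewhere, then remove it
docker node update --availability drain swarm-wrkr2-vm
docker node rm --force swarm-wrkr2-vm
```

Draining first lets Swarm reschedule the node’s tasks gracefully before the node disappears.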