How Docker's Native Overlay Network Works
System environment
manager node: CentOS Linux release 7.4.1708 (Core)
worker node: CentOS Linux release 7.5.1804 (Core)
Docker versions
manager node: Docker version 18.09.4, build d14af54266
worker node: Docker version 19.03.1, build 74b1e89
Docker Swarm environment
manager node: 192.168.246.194
worker node: 192.168.246.195
Networks before creating the Docker Swarm cluster
manager node:
# docker network ls
NETWORK ID NAME DRIVER SCOPE
e01d59fe00e5 bridge bridge local
15426f623c37 host host local
dd5d570ac60e none null local
worker node:
# docker network ls
NETWORK ID NAME DRIVER SCOPE
70ed15a24acd bridge bridge local
e2da5d928935 host host local
a7dbda3b96e8 none null local
Creating the Docker Swarm cluster
Initialize the Docker Swarm cluster
On the manager node, run: docker swarm init
On the worker node, run: docker swarm join --token SWMTKN-1-0p3g6ijmphmw5xrikh9e3asg5n3yzan0eomnsx1xuvkovvgfsp-enrmg2lj1dejg5igmnpoaywr1 192.168.246.194:2377
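Note ⚠️:
If you have lost the docker swarm join command, you can print it again with one of the following commands:
(1) To join as a worker node: docker swarm join-token worker
(2) To join as a manager node: docker swarm join-token manager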
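The join command can also be rebuilt in a script instead of being copied by hand. The following is a minimal sketch, assuming it runs on a manager node. The helper name join_command is hypothetical; docker swarm join-token -q (print only the token) is a standard CLI option.

```shell
#!/bin/sh
# join_command is a hypothetical helper, not part of the Docker CLI.
# Usage: join_command <worker|manager> <manager-address:port>
join_command() {
    role=$1
    manager_addr=$2
    # -q prints only the token instead of the full help text.
    token=$(docker swarm join-token -q "$role") || return 1
    printf 'docker swarm join --token %s %s\n' "$token" "$manager_addr"
}

# Example (run on the manager node used in this article):
#   join_command worker 192.168.246.194:2377
```

The printed line can then be run on the node that should join the cluster.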
Viewing cluster node information
manager node:
# docker node ls
ID HOSTNAME STATUS AVAILABILITY MANAGER STATUS ENGINE VERSION
hplz9lawfpjx6fpz0j1bevocp MyTest03 Ready Active 19.03.1
q5af6b67bmho8z0d7**m2yy5j * MySQL-nginx Ready Active Leader 18.09.4
Viewing cluster network information
manager node:
# docker network ls
NETWORK ID NAME DRIVER SCOPE
e01d59fe00e5 bridge bridge local
7c90d1bf0f62 docker_gwbridge bridge local
15426f623c37 host host local
8lyfiluksqu0 ingress overlay swarm
dd5d570ac60e none null local
worker node:
# docker network ls
NETWORK ID NAME DRIVER SCOPE
70ed15a24acd bridge bridge local
985367037d3b docker_gwbridge bridge local
e2da5d928935 host host local
8lyfiluksqu0 ingress overlay swarm
a7dbda3b96e8 none null local
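Note ⚠️:
When the Docker Swarm cluster is first created, Docker creates two networks on every host in addition to docker0: a bridge network (the docker_gwbridge bridge) and an overlay network (ingress), plus a transitional network namespace named ingress_sbox. We can also create our own overlay network on the manager node with the following command: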
docker network create -d overlay uber-svc
List the Docker Swarm cluster networks on the manager and worker hosts again:
manager node:
# docker network ls
NETWORK ID NAME DRIVER SCOPE
e01d59fe00e5 bridge bridge local
7c90d1bf0f62 docker_gwbridge bridge local
15426f623c37 host host local
8lyfiluksqu0 ingress overlay swarm
dd5d570ac60e none null local
kzxwwwtunpqe uber-svc overlay swarm ===> this is the uber-svc network we just created
worker node:
# docker network ls
NETWORK ID NAME DRIVER SCOPE
70ed15a24acd bridge bridge local
985367037d3b docker_gwbridge bridge local
e2da5d928935 host host local
8lyfiluksqu0 ingress overlay swarm
a7dbda3b96e8 none null local
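Note ⚠️:
The uber-svc network does not appear on the worker node. An overlay network only becomes available on a node once a running container on that node is attached to it. This lazy approach improves scalability by reducing the amount of network gossip.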
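To observe the lazy behaviour described above, one can check on each node whether a network is currently listed. This is a minimal sketch; has_network is a hypothetical helper name, and it only wraps docker network ls with its --format option.

```shell
#!/bin/sh
# has_network is a hypothetical helper: it succeeds if `docker network ls`
# on this node lists a network with exactly the given name.
has_network() {
    docker network ls --format '{{.Name}}' | grep -qxF -- "$1"
}

# Example:
#   has_network uber-svc && echo "uber-svc is visible on this node"
```

Running it on the worker before and after a task attached to uber-svc is scheduled there should show the network appearing only in the second case.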
Viewing network namespace information
manager node:
# ip netns
1-8lyfiluksq (id: 0)
ingress_sbox (id: 1)
worker node:
# ip netns
1-8lyfiluksq (id: 0)
ingress_sbox (id: 1)
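Note ⚠️:
(1) The network namespace files for containers and overlay networks are not in the operating system's default /var/run/netns directory, so ip netns can only see them through a manually created symbolic link: ln -s /var/run/docker/netns /var/run/netns
(2) The namespace name of a network sometimes carries a numeric prefix such as 1- or 2-, and sometimes does not. This does not affect network communication or operation.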
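In the listings above, the namespace name 1-8lyfiluksq contains the first ten characters of the ingress network ID 8lyfiluksqu0. The sketch below uses that observation to pick the namespace that belongs to a network. The function name netns_for_network is hypothetical, and the matching rule is inferred from this output rather than taken from Docker documentation.

```shell
#!/bin/sh
# netns_for_network is a hypothetical helper, not a Docker command.
# Assumption (inferred from the listings above): an overlay network's
# namespace name is the first 10 characters of the network ID,
# optionally preceded by a numeric prefix such as "1-".
netns_for_network() {
    net_id=$1
    short=$(printf '%s' "$net_id" | cut -c1-10)
    # Read namespace names on stdin, keep only the first field
    # ("1-8lyfiluksq (id: 0)" -> "1-8lyfiluksq"), then match the short ID.
    awk '{print $1}' | grep -E "^([0-9]+-)?${short}\$"
}

# Example with the names shown by `ip netns` above; prints 1-8lyfiluksq
printf '1-8lyfiluksq (id: 0)\ningress_sbox (id: 1)\n' | netns_for_network 8lyfiluksqu0
```

On a real host the input would come from ip netns (after creating the symbolic link) or from ls -1 /var/run/docker/netns.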
Viewing network IPAM (IP Address Management) information
(1) The IPAM allocation for the ingress network is as follows:
It is identical on the manager node and the worker node:
# docker network inspect ingress
[
{
"Name": "ingress",
"Id": "8lyfiluksqu09jfdjndhj68hl",
"Created": "2019-09-09T17:59:06.326723762+08:00",
"Scope": "swarm",
"Driver": "overlay",
"EnableIPv6": false,
"IPAM": {
"Driver": "default",
"Options": null,
"Config": [
{
"Subnet": "10.255.0.0/16", ===> ingress subnet
"Gateway": "10.255.0.1" ===> ingress gateway
}
(2) The self-created uber-svc overlay network uses an IPAM range assigned automatically by Docker:
# docker network inspect uber-svc
[
{
"Name": "uber-svc",
"Id": "kzxwwwtunpqeucnrhmirg6rhm",
"Created": "2019-09-09T10:14:06.606521342Z",
"Scope": "swarm",
"Driver": "overlay",
"EnableIPv6": false,
"IPAM": {
"Driver": "default",
"Options": null,
"Config": [
{
"Subnet": "10.0.0.0/24", ===> uber-svc subnet
"Gateway": "10.0.0.1" ===> uber-svc gateway
}
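When only the subnet and gateway are needed, the full JSON can be skipped by passing a Go template to docker network inspect. This is a minimal sketch; network_ipam is a hypothetical helper name, and the template fields follow the IPAM.Config structure visible in the output above.

```shell
#!/bin/sh
# network_ipam is a hypothetical helper that prints one
# "<subnet> <gateway>" line per IPAM config entry of a network.
network_ipam() {
    docker network inspect \
        -f '{{range .IPAM.Config}}{{.Subnet}} {{.Gateway}}{{println}}{{end}}' \
        "$1"
}

# Example:
#   network_ipam ingress
#   network_ipam uber-svc
```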
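Load balancing (LB) in Docker Swarm comes in two forms
(1) Ingress Load Balancing
(2) Internal Load Balancing
Note ⚠️: This section focuses on the second form, Internal Load Balancing.
Defining the shell scripts
Before starting the hands-on part, we first write the following two scripts. Concrete usage examples are given for each.
The first script, docker_netns.sh: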
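#!/bin/bash
# Run a command inside a Docker network namespace.
# Usage: docker_netns.sh [NAMESPACE|CONTAINER [COMMAND...]]
NAMESPACE=$1
# With no argument, list all Docker-managed network namespaces.
if [[ -z $NAMESPACE ]]; then
    ls -1 /var/run/docker/netns/
    exit 0
fi
# Treat the argument as a namespace name first.
NAMESPACE_FILE=/var/run/docker/netns/${NAMESPACE}
# Otherwise treat it as a container ID or name and ask Docker for its sandbox.
if [[ ! -f $NAMESPACE_FILE ]]; then
    NAMESPACE_FILE=$(docker inspect -f "{{.NetworkSettings.SandboxKey}}" "$NAMESPACE" 2>/dev/null)
fi
if [[ ! -f $NAMESPACE_FILE ]]; then
    echo "Cannot open network namespace '$NAMESPACE': No such file or directory" >&2
    exit 1
fi
shift
if [[ $# -lt 1 ]]; then
    echo "No command specified" >&2
    exit 1
fi
# Execute the remaining arguments inside the namespace.
nsenter --net="${NAMESPACE_FILE}" "$@"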
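Note ⚠️:
(1) Given a container ID, a container name, or a namespace name, the script enters the corresponding network namespace and runs the given shell command there.
(2) With no arguments, it lists all network namespaces related to Docker containers.
Running the script gives results like the following:
# sh docker_netns.sh ==> list all network namespaces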
1-ycqv46f5tl
8402c558c13c
ingress_sbox
# sh docker_netns.sh deploy_nginx_nginx_1 ip r ==> show the routes of the container named deploy_nginx_nginx_1
default via 172.18.0.1 dev eth0
172.18.0.0/16 dev eth0 proto kernel scope link src 172.18.0.2
# sh docker_netns.sh 8402c558c13c ip r ==> show the routes in the network namespace 8402c558c13c
default via 172.18.0.1 dev eth0
172.18.0.0/16 dev eth0 proto kernel scope link src 172.18.0.2
The second script, find_links.sh:
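#!/bin/bash
# Find the network namespace that holds the link with a given ifindex.
# Requires an executable docker_netns.sh in the current directory.
DOCKER_NETNS_SCRIPT=./docker_netns.sh
IFINDEX=$1
if [[ -z $IFINDEX ]]; then
    # No ifindex given: print the links of every namespace.
    for namespace in $($DOCKER_NETNS_SCRIPT); do
        printf "\e[1;31m%s:\e[0m" "$namespace"
        $DOCKER_NETNS_SCRIPT "$namespace" ip -c -o link
        printf "\n"
    done
else
    # Print only the namespaces whose link list contains the ifindex.
    for namespace in $($DOCKER_NETNS_SCRIPT); do
        if $DOCKER_NETNS_SCRIPT "$namespace" ip -c -o link | grep -Pq "^$IFINDEX: "; then
            printf "\e[1;31m%s:\e[0m" "$namespace"
            $DOCKER_NETNS_SCRIPT "$namespace" ip -c -o link | grep -P "^$IFINDEX: "
            printf "\n"
        fi
    done
fi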
This script looks up the namespace that holds a virtual network device by its ifindex. Its output in the two cases is as follows:
# sh find_links.sh ==> without an ifindex, list the link devices of all namespaces
1-3gt8phomoc:1: lo: mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1\ link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: ip_vti0@NONE: mtu 1480 qdisc noop state DOWN mode DEFAULT group default qlen 1\ link/ipip 0.0.0.0 brd 0.0.0.0
3: br0: mtu 1450 qdisc noqueue state UP mode DEFAULT group default \ link/ether e6:c5:04:ad:7b:31 brd ff:ff:ff:ff:ff:ff
74: vxlan0: mtu 1450 qdisc noqueue master br0 state UNKNOWN mode DEFAULT group default \ link/ether e6:c5:04:ad:7b:31 brd ff:ff:ff:ff:ff:ff link-netnsid 0
76: veth0@if75: mtu 1450 qdisc noqueue master br0 state UP mode DEFAULT group default \ link/ether e6:fa:db:53:40:fd brd ff:ff:ff:ff:ff:ff link-netnsid 1
ingress_sbox:1: lo: mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1\ link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: ip_vti0@NONE: mtu 1480 qdisc noop state DOWN mode DEFAULT group default qlen 1\ link/ipip 0.0.0.0 brd 0.0.0.0
75: eth0@if76: mtu 1450 qdisc noqueue state UP mode DEFAULT group default \ link/ether 02:42:0a:ff:00:02 brd ff:ff:ff:ff:ff:ff link-netnsid 0
78: eth2@if79: mtu 1500 qdisc noqueue state UP mode DEFAULT group default \ link/ether 02:42:ac:14:00:02 brd ff:ff:ff:ff:ff:ff link-netnsid 1
# sh find_links.sh 76 ==> with ifindex=76
1-3gt8phomoc:76: veth0@if75: mtu 1450 qdisc noqueue master br0 state UP mode DEFAULT group default \ link/ether e6:fa:db:53:40:fd brd ff:ff:ff:ff:ff:ff link-netnsid 1
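Link names such as veth0@if75 carry the ifindex of the other end of the veth pair after @if, and that number is what find_links.sh takes as its argument. A minimal sketch for extracting it from a link name or from a whole ip -o link line follows; peer_ifindex is a hypothetical helper name.

```shell
#!/bin/sh
# peer_ifindex is a hypothetical helper: it prints the number that
# follows "@if" in a link name or in a line of `ip -o link` output.
# It prints nothing for links without a peer, such as "lo".
peer_ifindex() {
    printf '%s\n' "$1" | sed -n 's/.*@if\([0-9][0-9]*\).*/\1/p'
}

# Example with the link found above; prints 75
peer_ifindex 'veth0@if75'
```

The result can be passed straight back in, for example sh find_links.sh "$(peer_ifindex 'veth0@if75')", to locate the namespace that holds the other end.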
Hands-on: Internal Load Balancing
Deploy a service that uses the uber-svc network we created
docker service create --name uber-svc --network uber-svc -p 80:80 --replicas 2 nigelpoulton/tu-demo:v1
The two containers of this service run on the manager node and the worker node respectively:
# docker service ls
ID NAME MODE REPLICAS IMAGE PORTS
pfnme5ytk59w uber-svc replicated 2/2 nigelpoulton/tu-demo:v1 *:80->80/tcp
# docker service ps uber-svc
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
kh8zs9a2umwf uber-svc.1 nigelpoulton/tu-demo:v1 mysql-nginx Running Running 57 seconds ago
31p0rgg1f59w uber-svc.2 nigelpoulton/tu-demo:v1 MyTest03 Running Running 49 seconds ago
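Note ⚠️:
-p (you can also use --publish instead) exposes the service running inside the container on the host, so that we can reach the service from outside.
Normally, a container of a service deployed in swarm has a single network interface, attached to the docker0 network. Once the service is published, swarm does the following:
(1) It gives the container three network interfaces, which in the output below are named eth0, eth2, and eth3. eth0 connects to the overlay network named ingress, which is used for communication between hosts. eth3 connects to the bridge network named docker_gwbridge, which lets the container reach external networks. eth2 connects to the uber-svc network we created, which is also used for container-to-container traffic; unlike ingress, this user-defined overlay network provides DNS resolution, that is, service discovery.
(2) The swarm nodes use the ingress overlay network and its load balancing to publish the service outside the cluster.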
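With the default endpoint mode, swarm assigns a service one virtual IP per attached network, and these addresses are what the load balancing works with. The sketch below lists them; service_vips is a hypothetical helper name, and the template assumes the Endpoint.VirtualIPs structure that docker service inspect returns.

```shell
#!/bin/sh
# service_vips is a hypothetical helper that prints one
# "<network ID> <virtual IP>" line per network the service is attached to.
service_vips() {
    docker service inspect \
        -f '{{range .Endpoint.VirtualIPs}}{{.NetworkID}} {{.Addr}}{{println}}{{end}}' \
        "$1"
}

# Example (on a manager node):
#   service_vips uber-svc
```

The network IDs in the output can be compared with the IDs from docker network ls to tell the ingress address apart from the uber-svc address.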
Inspecting the network interfaces of the uber-svc.1 container and the uber-svc network namespace
(1) First, find the uber-svc.1 container
# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
a2a763734e42 nigelpoulton/tu-demo:v1 "python app.py" About a minute ago Up About a minute 80/tcp uber-svc.1.kh8zs9a2umwf9cix381zr9x38
(2) List the network interfaces inside the uber-svc.1 container
# sh docker_netns.sh uber-svc.1.kh8zs9a2umwf9cix381zr9x38 ip addr
1: lo: mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
2: ip_vti0@NONE: mtu 1480 qdisc noop state DOWN group default qlen 1
link/ipip 0.0.0.0 brd 0.0.0.0
54: eth0@if55: mtu 1450 qdisc noqueue state UP group default
link/ether 02:42:0a:ff:00:05 brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 10.255.0.5/16 brd 10.255.255.255 scope global eth0
valid_lft forever preferred_lft forever
56: eth3@if57: mtu 1500 qdisc noqueue state UP group default
link/ether 02:42:ac:13:00:03 brd ff:ff:ff:ff:ff:ff link-netnsid 2
inet 172.19.0.3/16 brd 172.19.255.255 scope global eth3
valid_lft forever preferred_lft forever
58: eth2@if59: mtu 1450 qdisc noqueue state UP group default
link/ether 02:42:0a:00:00:03 brd ff:ff:ff:ff:ff:ff link-netnsid 1
inet 10.0.0.3/24 brd 10.0.0.255 scope global eth2
valid_lft forever preferred_lft forever
Alternatively, you can run the following command directly:
docker exec uber-svc.1.kh8zs9a2umwf9cix381zr9x38 ip addr
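In the ip addr output above, each container MAC address is 02:42 followed by the four bytes of the interface's IPv4 address in hexadecimal, for example 02:42:0a:ff:00:05 for 10.255.0.5. The sketch below decodes that pattern. mac_to_ip is a hypothetical helper name, and the rule is an observation about this output; MAC addresses set manually or by other tools will not follow it.

```shell
#!/bin/sh
# mac_to_ip is a hypothetical helper. It assumes the MAC has the form
# 02:42:aa:bb:cc:dd, where aa.bb.cc.dd are the IPv4 bytes in hexadecimal.
mac_to_ip() {
    old_ifs=$IFS
    IFS=:
    # Split the MAC on ":" into the positional parameters $1..$6.
    set -- $1
    IFS=$old_ifs
    # Convert the last four hex bytes to decimal.
    printf '%d.%d.%d.%d\n' "0x$3" "0x$4" "0x$5" "0x$6"
}

# Example with eth0 from the output above; prints 10.255.0.5
mac_to_ip 02:42:0a:ff:00:05
```

This makes it easy to tell from a MAC address alone which network an interface belongs to.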
(3) List the network interfaces in the uber-svc network namespace
# ip netns ==> list the network namespaces on the manager
d2feb68e3183 (id: 3)
1-kzxwwwtunp (id: 2)
lb_kzxwwwtun
1-8lyfiluksq (id: 0)
ingress_sbox (id: 1)
# docker network ls ==> list the cluster networks on the manager
NETWORK ID NAME DRIVER SCOPE
e01d59fe00e5 bridge bridge local
7c90d1bf0f62 docker_gwbridge bridge local
15426f623c37 host host local
8lyfiluksqu0 ingress overlay swarm
dd5d570ac60e none null local
kzxwwwtunpqe uber-svc overlay swarm
# sh docker_netns.sh 1-kzxwwwtunp ip addr ==> list the network interfaces in the uber-svc network namespace
1: lo: mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
2: ip_vti0@NONE: mtu 1480 qdisc noop state DOWN group default qlen 1
link/ipip 0.0.0.0 brd 0.0.0.0
3: br0: mtu 1450 qdisc noqueue state UP group default
link/ether 3e:cb:12:d3:a3:cb brd ff:ff:ff:ff:ff:ff
inet 10.0.0.1/24 brd 10.0.0.255 scope global br0
valid_lft forever preferred_lft forever
51: vxlan0: mtu 1450 qdisc noqueue master br0 state UNKNOWN group default
link/ether e2:8e:35:4c:a3:7b brd ff:ff:ff:ff:ff:ff link-netnsid 0
53: veth0@if52: mtu 1450 qdisc noqueue master br0 state UP group default
link/ether 3e:cb:12:d3:a3:cb brd ff:ff:ff:ff:ff:ff link-netnsid 1
59: veth2@if58: mtu 1450 qdisc noqueue master br0 state UP group default
link/ether 9e:b4:8c:72:4e:74 brd ff:ff:ff:ff:ff:ff link-netnsid 2
Alternatively, you can use the following command:
ip netns exec 1-kzxwwwtunp ip addr
# ip netns exec 1-kzxwwwtunp brctl show ==> show the bridge ports in the uber-svc network namespace
bridge name bridge id STP enabled interfaces
br0 8000.3ecb12d3a3cb no veth0
veth2
vxlan0
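Note ⚠️:
<1> docker exec uber-svc.1.kh8zs9a2umwf9cix381zr9x38 ip addr
This command shows that the container on the manager node has four network interfaces besides the ip_vti0 tunnel device: lo, eth0, eth2, and eth3.
The veth peer of eth2 is veth2 in the uber-svc network namespace, and the veth peer of eth3 is vethef74971 on the host.
<2> ip netns exec 1-kzxwwwtunp brctl show
The bridge listing for the uber-svc network namespace shows that veth2 is attached to the br0 bridge.
(4) Look up the VXLAN ID of the uber-svc network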
ip netns exec 1-kzxwwwtunp ip -o -c -d link show vxlan0
***** vxlan id 4097 *****
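The VXLAN ID can be pulled out of the ip -d link output with a small filter, which is handy when comparing several overlay networks. This is a minimal sketch; vxlan_id is a hypothetical helper name, and it relies only on the "vxlan id N" text visible in the output above.

```shell
#!/bin/sh
# vxlan_id is a hypothetical helper: it reads `ip -d link show` output
# on stdin and prints the number that follows "vxlan id".
vxlan_id() {
    sed -n 's/.*vxlan id \([0-9][0-9]*\).*/\1/p'
}

# Example with the abbreviated line shown above; prints 4097
printf '%s\n' '***** vxlan id 4097 *****' | vxlan_id
```

On the manager node the input would come from ip netns exec 1-kzxwwwtunp ip -d link show vxlan0.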
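Figure: network connections between the uber-svc network namespace and the service container
Getting information about the ingress namespace
The main steps are:
(1) Get the ingress network information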
# docker network ls
NETWORK ID NAME DRIVER SCOPE
8lyfiluksqu0 ingress overlay swarm
(2) Get the ingress network namespace
# ip netns
1-8lyfiluksq (id: 0)
(3) Get the IP information in the ingress namespace
# sh docker_netns.sh 1-8lyfiluksq ip addr
1: lo: mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
2: ip_vti0@NONE: mtu 1480 qdisc noop state DOWN group default qlen 1
link/ipip 0.0.0.0 brd 0.0.0.0
3: br0: mtu 1450 qdisc noqueue state UP group default
link/ether 6e:5c:bd:c0:95:ea brd ff:ff:ff:ff:ff:ff
inet 10.255.0.1/16 brd 10.255.255.255 scope global br0
valid_lft forever preferred_lft forever
45: vxlan0: mtu 1450 qdisc noqueue master br0 state UNKNOWN group default
link/ether e6:f3:7a:00:85:e1 brd ff:ff:ff:ff:ff:ff link-netnsid 0
47: veth0@if46: mtu 1450 qdisc noqueue master br0 state UP group default
link/ether fa:98:37:aa:83:2a brd ff:ff:ff:ff:ff:ff link-netnsid 1
55: veth2@if54: mtu 1450 qdisc noqueue master br0 state UP group default
link/ether 6e:5c:bd:c0:95:ea brd ff:ff:ff:ff:ff:ff link-netnsid 2
(4) Get the VXLAN ID of vxlan0 in the ingress namespace
# sh docker_netns.sh 1-8lyfiluksq ip -d link show vxlan0
***** vxlan id 4096 *****
(5) Find the veth peer for the ingress namespace
# sh find_links.sh 46
ingress_sbox:46: eth0@if47: mtu 1450 qdisc noqueue state UP mode DEFAULT group default \ link/ether 02:42:0a:ff:00:02 brd ff:ff:ff:ff:ff:ff link-netnsid 0
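Figure: network connections between the ingress network namespace and the service container
Getting information about the ingress_sbox network namespace
The main steps are:
(1) Get the IP information of ingress_sbox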
# sh docker_netns.sh ingress_sbox ip addr
1: lo: mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
2: ip_vti0@NONE: mtu 1480 qdisc noop state DOWN group default qlen 1
link/ipip 0.0.0.0 brd 0.0.0.0
46: eth0@if47: mtu 1450 qdisc noqueue state UP group default
link/ether 02:42:0a:ff:00:02 brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 10.255.0.2/16 brd 10.255.255.255 scope global eth0
valid_lft forever preferred_lft forever
inet 10.255.0.4/32 brd 10.255.0.4 scope global eth0
valid_lft forever preferred_lft forever
49: eth2@if50: mtu 1500 qdisc noqueue state UP group default
link/ether 02:42:ac:13:00:02 brd ff:ff:ff:ff:ff:ff link-netnsid 1
inet 172.19.0.2/16 brd 172.19.255.255 scope global eth2
valid_lft forever preferred_lft forever
(2) Find the veth peer of the ingress_sbox interface
# sh find_links.sh 47
1-8lyfiluksq:47: veth0@if46: mtu 1450 qdisc noqueue master br0 state UP mode DEFAULT group default \ link/ether fa:98:37:aa:83:2a brd ff:ff:ff:ff:ff:ff link-netnsid 1
(3) List the veth interfaces on the manager host
# ip link show
1: lo: mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: ens37: mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
link/ether 00:0c:29:25:8b:ac brd ff:ff:ff:ff:ff:ff
3: docker0: mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default
link/ether 02:42:cf:31:ee:03 brd ff:ff:ff:ff:ff:ff
14: ip_vti0@NONE: mtu 1480 qdisc noop state DOWN mode DEFAULT group default qlen 1
link/ipip 0.0.0.0 brd 0.0.0.0
48: docker_gwbridge: mtu 1500 qdisc noqueue state UP mode DEFAULT group default
link/ether 02:42:9c:aa:15:e6 brd ff:ff:ff:ff:ff:ff
50: vetheaa661b@if49: mtu 1500 qdisc noqueue master docker_gwbridge state UP mode DEFAULT group default
link/ether 8a:3e:01:ab:db:75 brd ff:ff:ff:ff:ff:ff link-netnsid 1
57: vethef74971@if56: mtu 1500 qdisc noqueue master docker_gwbridge state UP mode DEFAULT group default
link/ether 82:5c:65:e1:9c:e8 brd ff:ff:ff:ff:ff:ff link-netnsid 3
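Figure: network connections between the ingress network namespace and the ingress_sbox network namespace
Note: the situation on a swarm worker node is essentially the same as on the manager.
Figure: overall Swarm network connections
Note ⚠️:
(1) ingress_sbox and the namespace of the created container are attached to the same ingress network namespace.
(2) Running docker exec [container ID/name] ip r shows the traffic path more directly: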
# docker exec uber-svc.1.kh8zs9a2umwf9cix381zr9x38 ip r
default via 172.19.0.1 dev eth3
10.0.0.0/24 dev eth2 proto kernel scope link src 10.0.0.3
10.255.0.0/16 dev eth0 proto kernel scope link src 10.255.0.5
172.19.0.0/16 dev eth3 proto kernel scope link src 172.19.0.3
This shows that the container's default gateway is 172.19.0.1; in other words, outbound traffic from the container leaves through eth3.
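The same conclusion can be read mechanically from the routing table. The sketch below extracts the gateway and the outgoing device from ip r output; default_route is a hypothetical helper name.

```shell
#!/bin/sh
# default_route is a hypothetical helper: it reads `ip r` output on stdin
# and prints "<gateway> <device>" for the default route.
default_route() {
    awk '$1 == "default" {
        for (i = 2; i < NF; i++) {
            if ($i == "via") gw = $(i + 1)
            if ($i == "dev") dev = $(i + 1)
        }
        print gw, dev
    }'
}

# Example with the first line of the output above; prints 172.19.0.1 eth3
printf 'default via 172.19.0.1 dev eth3\n' | default_route
```

On the manager node the input would come from docker exec uber-svc.1.kh8zs9a2umwf9cix381zr9x38 ip r.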
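Closing remarks
There is much more to explore in the underlying networking of Docker Swarm. This section is a basic summary of what I have recently learned about Docker networking. If you find errors or omissions, corrections are welcome. Thank you!
Also: if any of the reference documents infringes on your rights, please contact me and I will remove it immediately.
Finally, thanks to open source, and embrace open source!
References
(1) Docker swarm中的LB和服务发现详解 [LB and service discovery in Docker swarm, explained]
(2) 万字长文:聊聊几种主流Docker网络的实现原理 [A long read: how several mainstream Docker networks are implemented]
(3) Docker跨主机网络——overlay [Docker cross-host networking: overlay]
(4) Docker 跨主机网络 overlay(十六) [Docker cross-host networking: overlay, part 16]
(5) Docker overlay覆盖网络及VXLAN详解 [Docker overlay networks and VXLAN, explained]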