news 2026/6/10 10:58:59

[Tech] Understanding Kubernetes Calico Networking in One Article (Part 2)


张小明

Front-end Development Engineer


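📌 This series explores the architecture and network implementation of the Calico CNI component in Kubernetes. This article covers how traffic is carried when Calico runs in IPIP mode.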

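1. Calico Network Modes

Mode    Packet encapsulation                    Overlay?
BGP     None (pod IPs are routed natively)      No
IPIP    IPv4-in-IPv4 (IP protocol 4)            Yes
VXLAN   VXLAN (inner Ethernet frame over UDP)   Yes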

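2. About IPIP

IPIP is a layer-3 tunneling protocol supported natively by the Linux kernel; the name stands for IPv4 in IPv4. It works by prepending a second IPv4 header to the original IPv4 packet, so that the packet can be carried transparently across different networks.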
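The encapsulation described above can be sketched in a few lines of Python. This is only an illustration of the packet layout, not how the kernel or Calico implements it; the helper names (`ipv4_header`, `ipip_encapsulate`) are hypothetical, and the addresses are borrowed from the cross-node example later in this article.

```python
import socket
import struct

IPPROTO_IPIP = 4  # IANA protocol number for IPv4-in-IPv4
IPV4_FMT = "!BBHHHBBH4s4s"  # 20-byte IPv4 header without options


def ipv4_checksum(header: bytes) -> int:
    """16-bit one's-complement checksum over an IPv4 header."""
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def ipv4_header(src: str, dst: str, proto: int, payload_len: int) -> bytes:
    """Build a minimal IPv4 header (hypothetical helper, no options)."""
    fields = [
        0x45,              # version 4, header length 5 words (20 bytes)
        0,                 # DSCP / ECN
        20 + payload_len,  # total length
        0,                 # identification
        0,                 # flags + fragment offset
        64,                # TTL
        proto,             # payload protocol
        0,                 # checksum placeholder
        socket.inet_aton(src),
        socket.inet_aton(dst),
    ]
    fields[7] = ipv4_checksum(struct.pack(IPV4_FMT, *fields))
    return struct.pack(IPV4_FMT, *fields)


def ipip_encapsulate(inner: bytes, outer_src: str, outer_dst: str) -> bytes:
    """Prepend an outer IPv4 header whose protocol field is 4 (IPIP)."""
    return ipv4_header(outer_src, outer_dst, IPPROTO_IPIP, len(inner)) + inner


# Inner packet: pod-to-pod ICMP (protocol 1) with a 64-byte dummy payload.
inner = ipv4_header("192.168.79.72", "192.168.232.194", 1, 64) + bytes(64)
# Outer packet: node-to-node, carrying the inner packet unchanged.
outer = ipip_encapsulate(inner, "172.22.1.229", "172.22.3.64")
print(len(inner), len(outer), outer[9])  # 84 104 4
```

The outer packet is exactly 20 bytes longer than the inner one, which is the cost of the extra header and the reason tunnel interfaces use a smaller MTU than the underlying link.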

3. Calico with IPIP

Check which network mode Calico is currently running in:

# kubectl -n kube-system get ippool default-ipv4-ippool -o yaml
apiVersion: crd.projectcalico.org/v1
kind: IPPool
metadata:
  annotations:
    projectcalico.org/metadata: '{"creationTimestamp":"2025-12-09T07:33:15Z"}'
  creationTimestamp: "2025-12-09T07:33:15Z"
  generation: 1
  name: default-ipv4-ippool
  resourceVersion: "941"
  uid: bfa1b297-4402-4fa7-bfdf-1e1dc629e2cd
spec:
  allowedUses:
  - Workload
  - Tunnel
  blockSize: 26
  cidr: 192.168.0.0/16
  ipipMode: Always
  natOutgoing: true
  nodeSelector: all()
  vxlanMode: Never

The line ipipMode: Always shows that this Calico installation uses IPIP mode by default.
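The same reading of the pool spec can be written as a small function. This is a sketch under assumptions: `encapsulation_mode` is a hypothetical helper, a missing field is treated as `Never`, and a pool with both modes enabled is treated as invalid.

```python
def encapsulation_mode(spec: dict) -> str:
    """Classify an IPPool spec by its ipipMode / vxlanMode fields."""
    ipip = spec.get("ipipMode", "Never")    # assumption: absent means Never
    vxlan = spec.get("vxlanMode", "Never")  # assumption: absent means Never
    if ipip != "Never" and vxlan != "Never":
        raise ValueError("ipipMode and vxlanMode are both enabled")
    if ipip != "Never":
        return f"IPIP ({ipip})"
    if vxlan != "Never":
        return f"VXLAN ({vxlan})"
    return "no encapsulation"


# The spec shown above, reduced to the two relevant fields.
pool_spec = {"ipipMode": "Always", "vxlanMode": "Never"}
print(encapsulation_mode(pool_spec))  # IPIP (Always)
```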

3.1. Pod-to-pod communication on the same node

192.168.79.67 and 192.168.79.68 run on the same node and communicate with each other.

# kubectl get pods -o wide -A
NAMESPACE   NAME                               READY   STATUS    RESTARTS   AGE     IP              NODE                 NOMINATED NODE   READINESS GATES
default     nginx-deployment-bf744486c-8ppjm   1/1     Running   0          5d17h   192.168.79.68   host-10-16-217-141   <none>           <none>
default     nginx-deployment-bf744486c-cvhhz   1/1     Running   0          5d17h   192.168.79.67   host-10-16-217-141   <none>           <none>
a. Inspect the interfaces

Inspect the pod with IP 192.168.79.67; its veth peer on the host is 8: cali88e3e62ccbf@if3

# ip netns exec cni-d84e16a0-941d-7bca-80af-3e8914638e88 ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: tunl0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN group default qlen 1000
    link/ipip 0.0.0.0 brd 0.0.0.0
3: eth0@if8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1430 qdisc noqueue state UP group default qlen 1000
    link/ether 1a:39:0f:40:34:09 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 192.168.79.67/32 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::1839:fff:fe40:3409/64 scope link
       valid_lft forever preferred_lft forever
# ip netns exec cni-d84e16a0-941d-7bca-80af-3e8914638e88 ip r
default via 169.254.1.1 dev eth0
169.254.1.1 dev eth0 scope link

Inspect the pod with IP 192.168.79.68; its veth peer on the host is 9: califc335b22756@if3

# ip netns exec cni-6f3c9c59-8654-b2c4-756e-eca53e0db3b4 ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: tunl0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN group default qlen 1000
    link/ipip 0.0.0.0 brd 0.0.0.0
3: eth0@if9: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1430 qdisc noqueue state UP group default qlen 1000
    link/ether ea:b2:2d:44:f1:73 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 192.168.79.68/32 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::e8b2:2dff:fe44:f173/64 scope link
       valid_lft forever preferred_lft forever
# ip netns exec cni-6f3c9c59-8654-b2c4-756e-eca53e0db3b4 ip r
default via 169.254.1.1 dev eth0
169.254.1.1 dev eth0 scope link
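b. Network analysis

The routes of 192.168.79.67 show that the pod sends all outbound traffic to 169.254.1.1. This address is a dummy gateway that no device actually owns. The pod's eth0 is one end of a veth pair whose other end is the host's caliXXX interface, and the host answers the pod's ARP request for the gateway through proxy ARP on that interface. As a result, everything the pod sends out of eth0 arrives at caliXXX on the host even though the gateway IP is fake.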

[ Pod netns ] eth0 <====veth====> caliXXXX [ Node netns ]

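So traffic from pod 192.168.79.67 leaves through eth0, enters the host, and is then routed by the host to the other local pod.

The host has a per-pod route for each local pod: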

# ip r
192.168.79.67 dev cali88e3e62ccbf scope link
192.168.79.68 dev califc335b22756 scope link

The actual communication path is therefore:

POD1 ====veth====> caliXXX1 [ Node ] ====node===> caliXXX2 [ Node ] ====veth====> POD2
c. Packet capture verification

Capturing on the host host-10-16-217-141 shows that the caliXXX interfaces receive the pods' packets:

# tcpdump -i any -nnee icmp
tcpdump: data link type LINUX_SLL2
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
10:58:25.977391 cali88e3e62ccbf In  ifindex 8 1a:39:0f:40:34:09 ethertype IPv4 (0x0800), length 104: 192.168.79.67 > 192.168.79.68: ICMP echo request, id 4, seq 1, length 64
10:58:25.977492 califc335b22756 Out ifindex 9 ee:ee:ee:ee:ee:ee ethertype IPv4 (0x0800), length 104: 192.168.79.67 > 192.168.79.68: ICMP echo request, id 4, seq 1, length 64
10:58:25.977515 califc335b22756 In  ifindex 9 ea:b2:2d:44:f1:73 ethertype IPv4 (0x0800), length 104: 192.168.79.68 > 192.168.79.67: ICMP echo reply, id 4, seq 1, length 64
10:58:25.977534 cali88e3e62ccbf Out ifindex 8 ee:ee:ee:ee:ee:ee ethertype IPv4 (0x0800), length 104: 192.168.79.68 > 192.168.79.67: ICMP echo reply, id 4, seq 1, length 64

In summary, the capture confirms the analysis above.

3.2. Cross-node pod communication

192.168.79.72 and 192.168.232.194 run on different nodes and communicate with each other.

# kubectl get pods -o wide
NAME                                READY   STATUS    RESTARTS   AGE     IP                NODE                 NOMINATED NODE   READINESS GATES
nginx-deployment-bf744486c-8ppjm    1/1     Running   0          5d17h   192.168.79.68     host-10-16-217-141   <none>           <none>
nginx-deployment-bf744486c-cvhhz    1/1     Running   0          5d17h   192.168.79.67     host-10-16-217-141   <none>           <none>
nginx-deployment2-fb46746f5-5w77x   1/1     Running   0          12s     192.168.79.72     host-10-16-217-141   <none>           <none>
nginx-deployment2-fb46746f5-cddm6   1/1     Running   0          9s      192.168.232.194   host-10-16-217-208   <none>           <none>
nginx-deployment2-fb46746f5-j2p6k   1/1     Running   0          28s     192.168.232.193   host-10-16-217-208   <none>           <none>
a. Inspect the interfaces

Inspect the pod with IP 192.168.79.72; its veth peer on the host is 13: cali58644df6687@if3

# ip netns exec cni-338ada74-f1bd-25c7-a223-c0d2c0cbf7e1 ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: tunl0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN group default qlen 1000
    link/ipip 0.0.0.0 brd 0.0.0.0
3: eth0@if13: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1430 qdisc noqueue state UP group default qlen 1000
    link/ether 5a:49:52:d3:d0:b7 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 192.168.79.72/32 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::5849:52ff:fed3:d0b7/64 scope link
       valid_lft forever preferred_lft forever
# ip netns exec cni-338ada74-f1bd-25c7-a223-c0d2c0cbf7e1 ip r
default via 169.254.1.1 dev eth0
169.254.1.1 dev eth0 scope link

Inspect the pod with IP 192.168.232.194; its veth peer on the host is 7: caliab6a8d8a743@if3

# ip netns exec cni-41842320-511e-c641-304f-1f58c4fdb0bf ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: tunl0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN group default qlen 1000
    link/ipip 0.0.0.0 brd 0.0.0.0
3: eth0@if7: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1430 qdisc noqueue state UP group default qlen 1000
    link/ether 4a:30:47:7d:15:c7 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 192.168.232.194/32 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::4830:47ff:fe7d:15c7/64 scope link
       valid_lft forever preferred_lft forever
# ip netns exec cni-41842320-511e-c641-304f-1f58c4fdb0bf ip r
default via 169.254.1.1 dev eth0
169.254.1.1 dev eth0 scope link
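b. Network analysis

As above, traffic from pod 192.168.79.72 leaves through eth0 and enters the host host-10-16-217-141, whose routing table then forwards it toward the pod on the other node.

On host-10-16-217-141, the destination 192.168.232.194 matches the following route. Note that it is marked proto bird, meaning it was installed by BIRD rather than configured statically: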

# ip r
192.168.232.192/26 via 172.22.3.64 dev tunl0 proto bird onlink
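How the host chooses between the per-pod /32 routes and the /26 tunnel route can be illustrated with a longest-prefix-match lookup. This is a simplified sketch: the route entries are copied from the two `ip r` outputs in this article, `lookup` is a hypothetical helper, and the kernel's real lookup (multiple tables, policy rules) is more involved.

```python
import ipaddress

# (prefix, output device, next hop) taken from host-10-16-217-141's routes.
ROUTES = [
    ("192.168.79.67/32", "cali88e3e62ccbf", None),
    ("192.168.79.68/32", "califc335b22756", None),
    ("192.168.232.192/26", "tunl0", "172.22.3.64"),
]


def lookup(dst: str):
    """Return (device, next_hop) of the most specific matching route."""
    addr = ipaddress.ip_address(dst)
    best = None
    for prefix, dev, via in ROUTES:
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, dev, via)
    return None if best is None else (best[1], best[2])


print(lookup("192.168.79.68"))    # ('califc335b22756', None)
print(lookup("192.168.232.194"))  # ('tunl0', '172.22.3.64')
```

A local pod resolves to its caliXXX interface directly, while a pod in the remote node's /26 block resolves to tunl0 with the remote node as next hop, which is where the IPIP encapsulation happens.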

The actual network path is therefore:

POD1 ====veth====> caliXXX1 [ Node1 ] ====node1===> tunl0 [ Node1 ] ====ipip====> eth0 [ Node2 ]

After the packet enters the destination host host-10-16-217-208:

eth0 [ Node2 ] ===node2===> caliXXX2 [ Node2 ] ====veth====> POD2
c. Packet capture verification

Capturing on the destination host host-10-16-217-208 shows that eth0 receives the IPIP packets sent by the source host host-10-16-217-141:

# tcpdump -i any -nnee proto 4
tcpdump: data link type LINUX_SLL2
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
11:43:22.174218 eth0 In  ifindex 2 fa:16:3e:46:0b:f2 ethertype IPv4 (0x0800), length 124: 172.22.1.229 > 172.22.3.64: 192.168.79.72 > 192.168.232.194: ICMP echo request, id 5, seq 259, length 64
11:43:22.174323 eth0 Out ifindex 2 fa:16:3e:cd:14:d4 ethertype IPv4 (0x0800), length 124: 172.22.3.64 > 172.22.1.229: 192.168.232.194 > 192.168.79.72: ICMP echo reply, id 5, seq 259, length 64

After tunl0 decapsulates the packet, the local per-pod route delivers it to the caliXXX2 interface:

192.168.232.194 dev caliab6a8d8a743 src 172.22.3.64 uid 0

11:39:07.198229 tunl0 In  ifindex 3 ethertype IPv4 (0x0800), length 104: 192.168.79.72 > 192.168.232.194: ICMP echo request, id 5, seq 10, length 64
11:39:07.198313 caliab6a8d8a743 Out ifindex 7 ee:ee:ee:ee:ee:ee ethertype IPv4 (0x0800), length 104: 192.168.79.72 > 192.168.232.194: ICMP echo request, id 5, seq 10, length 64
11:39:07.198345 caliab6a8d8a743 In  ifindex 7 4a:30:47:7d:15:c7 ethertype IPv4 (0x0800), length 104: 192.168.232.194 > 192.168.79.72: ICMP echo reply, id 5, seq 10, length 64
11:39:07.198364 tunl0 Out ifindex 3 ethertype IPv4 (0x0800), length 104: 192.168.232.194 > 192.168.79.72: ICMP echo reply, id 5, seq 10, length 64
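To read a capture like this as a sequence of hops, the interface and direction can be pulled out of each line. The snippet below is a rough sketch: the four lines are the decapsulated capture from host-10-16-217-208, while the regular expression and the `hops` helper are assumptions tailored to this particular `-i any -e` output format and would need adjusting for other tcpdump options.

```python
import re

CAPTURE = """\
11:39:07.198229 tunl0 In ifindex 3 ethertype IPv4 (0x0800), length 104: 192.168.79.72 > 192.168.232.194: ICMP echo request, id 5, seq 10, length 64
11:39:07.198313 caliab6a8d8a743 Out ifindex 7 ee:ee:ee:ee:ee:ee ethertype IPv4 (0x0800), length 104: 192.168.79.72 > 192.168.232.194: ICMP echo request, id 5, seq 10, length 64
11:39:07.198345 caliab6a8d8a743 In ifindex 7 4a:30:47:7d:15:c7 ethertype IPv4 (0x0800), length 104: 192.168.232.194 > 192.168.79.72: ICMP echo reply, id 5, seq 10, length 64
11:39:07.198364 tunl0 Out ifindex 3 ethertype IPv4 (0x0800), length 104: 192.168.232.194 > 192.168.79.72: ICMP echo reply, id 5, seq 10, length 64
"""

# Timestamp, interface, direction, then the ICMP summary after "length N:".
LINE_RE = re.compile(
    r"^(?P<time>\S+) (?P<iface>\S+)\s+(?P<dir>In|Out)\s+ifindex \d+ "
    r".*length \d+: (?P<src>\S+) > (?P<dst>\S+): ICMP echo (?P<kind>request|reply)"
)


def hops(capture: str, kind: str) -> list:
    """List 'interface direction' for every line of the given ICMP kind."""
    result = []
    for line in capture.splitlines():
        match = LINE_RE.match(line)
        if match and match.group("kind") == kind:
            result.append(f"{match.group('iface')} {match.group('dir')}")
    return result


print(hops(CAPTURE, "request"))  # ['tunl0 In', 'caliab6a8d8a743 Out']
print(hops(CAPTURE, "reply"))    # ['caliab6a8d8a743 In', 'tunl0 Out']
```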

In summary, the capture confirms the analysis above.

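3.3. Pod-to-Service communication

Services bring kube-proxy to mind. In this environment kube-proxy runs in iptables mode.

In iptables mode, kube-proxy does three things:

  1. Intercept traffic addressed to a Service IP
  2. Select a backend pod
  3. Perform DNAT

So how does pod 192.168.79.72 communicate with a Service such as 172.16.0.10?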

# kubectl get svc -o wide -A
NAMESPACE     NAME         TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)                  AGE   SELECTOR
default       kubernetes   ClusterIP   172.16.0.1    <none>        443/TCP                  20d   <none>
kube-system   kube-dns     ClusterIP   172.16.0.10   <none>        53/UDP,53/TCP,9153/TCP   20d   k8s-app=kube-dns
a. How traffic to a Service IP is intercepted

The traffic is intercepted by passing through the following iptables chains in order:

PREROUTING --> KUBE-SERVICES --> KUBE-SVC-xxxx --> KUBE-SEP-xxxx

Inspect the PREROUTING chain on the host:

# iptables -t nat -nvL PREROUTING
Chain PREROUTING (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target           prot opt in   out   source      destination
55182 2870K cali-PREROUTING  all  --  *    *     0.0.0.0/0   0.0.0.0/0     /* cali:6gwbT8clXdHdC1b1 */
55284 2875K KUBE-SERVICES    all  --  *    *     0.0.0.0/0   0.0.0.0/0     /* kubernetes service portals */

Once the pod's traffic reaches the host, PREROUTING hands it to the KUBE-SERVICES chain:

Chain KUBE-SERVICES (2 references)
 pkts bytes target                     prot opt in   out   source      destination
    0     0 KUBE-SVC-TCOU7JCQXEZGVUNU  udp  --  *    *     0.0.0.0/0   172.16.0.10   /* kube-system/kube-dns:dns cluster IP */
    0     0 KUBE-SVC-ERIFXISQEP7F7OF4  tcp  --  *    *     0.0.0.0/0   172.16.0.10   /* kube-system/kube-dns:dns-tcp cluster IP */
    0     0 KUBE-SVC-JD5MR3NA4I4DYORP  tcp  --  *    *     0.0.0.0/0   172.16.0.10   /* kube-system/kube-dns:metrics cluster IP */
    0     0 KUBE-SVC-NPX46M4PTMTKRN6Y  tcp  --  *    *     0.0.0.0/0   172.16.0.1    /* default/kubernetes:https cluster IP */
   65  4547 KUBE-NODEPORTS             all  --  *    *     0.0.0.0/0   0.0.0.0/0     /* kubernetes service nodeports; NOTE: this must be the last rule in this chain */ ADDRTYPE match dst-type LOCAL

In the KUBE-SERVICES chain, UDP packets destined for 172.16.0.10 are passed to the KUBE-SVC-TCOU7JCQXEZGVUNU chain:

Chain KUBE-SVC-TCOU7JCQXEZGVUNU (1 references)
 pkts bytes target                     prot opt in   out   source            destination
    0     0 KUBE-MARK-MASQ             udp  --  *    *     !192.168.0.0/16   172.16.0.10   /* kube-system/kube-dns:dns cluster IP */
    0     0 KUBE-SEP-V3WL5PSHR6KK4LJN  all  --  *    *     0.0.0.0/0         0.0.0.0/0     /* kube-system/kube-dns:dns -> 192.168.34.193:53 */ statistic mode random probability 0.50000000000
    0     0 KUBE-SEP-NJ5U6PSIJNX4FJ6P  all  --  *    *     0.0.0.0/0         0.0.0.0/0     /* kube-system/kube-dns:dns -> 192.168.79.66:53 */
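The `statistic mode random probability 0.5` rule followed by an unconditional rule gives each of the two endpoints an equal chance. The sketch below generalizes that idea to n endpoints, assuming rule i fires with probability 1/(n-i) and the last rule always matches; `chain_probabilities` and `pick_endpoint` are hypothetical names, not kube-proxy functions.

```python
import random


def chain_probabilities(n: int) -> list:
    """Per-rule probabilities that make n sequential rules uniform overall."""
    return [1.0 / (n - i) for i in range(n)]  # last entry is always 1.0


def pick_endpoint(endpoints: list, rng=random.random) -> str:
    """Walk the rules in order; the first rule that fires wins."""
    for endpoint, probability in zip(endpoints, chain_probabilities(len(endpoints))):
        if rng() < probability:
            return endpoint
    return endpoints[-1]  # not reached: the final probability is 1.0


# The two kube-dns endpoints from the chain above.
endpoints = ["192.168.34.193:53", "192.168.79.66:53"]
print(chain_probabilities(len(endpoints)))  # [0.5, 1.0]
print(pick_endpoint(endpoints))             # one of the two, chosen at random
```

With three endpoints the probabilities would be 1/3, 1/2 and 1: the second rule is only reached two thirds of the time, so it too selects its endpoint one third of the time overall.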

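The KUBE-SVC chain picks one endpoint chain at random: the first KUBE-SEP rule matches with probability 0.5, and otherwise the packet falls through to the second. In KUBE-SEP-V3WL5PSHR6KK4LJN, the first rule marks packets whose source is 192.168.34.193 itself (the endpoint pod reaching its own Service) so that they can be SNATed/MASQUERADEd in POSTROUTING, and the second rule rewrites the packet's destination IP and port to 192.168.34.193:53.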

Chain KUBE-SEP-V3WL5PSHR6KK4LJN (1 references)
 pkts bytes target          prot opt in   out   source           destination
    0     0 KUBE-MARK-MASQ  all  --  *    *     192.168.34.193   0.0.0.0/0     /* kube-system/kube-dns:dns */
    0     0 DNAT            udp  --  *    *     0.0.0.0/0        0.0.0.0/0     /* kube-system/kube-dns:dns */ udp to:192.168.34.193:53
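The effect of the two rules in this chain on a packet can be modelled as follows. This is a toy model for illustration only: the `Packet` class and `kube_sep` function are hypothetical, and real netfilter NAT also tracks the connection so that replies are translated back.

```python
from dataclasses import dataclass, replace

ENDPOINT_IP = "192.168.34.193"  # endpoint behind KUBE-SEP-V3WL5PSHR6KK4LJN
ENDPOINT_PORT = 53


@dataclass(frozen=True)
class Packet:
    src_ip: str
    dst_ip: str
    dst_port: int
    proto: str
    masq_mark: bool = False  # stands in for the mark set by KUBE-MARK-MASQ


def kube_sep(pkt: Packet) -> Packet:
    """Apply the two KUBE-SEP rules in order."""
    # Rule 1: mark traffic that originates from the endpoint itself.
    if pkt.src_ip == ENDPOINT_IP:
        pkt = replace(pkt, masq_mark=True)
    # Rule 2: DNAT UDP traffic to the endpoint's IP and port.
    if pkt.proto == "udp":
        pkt = replace(pkt, dst_ip=ENDPOINT_IP, dst_port=ENDPOINT_PORT)
    return pkt


# A DNS query from pod 192.168.79.72 to the kube-dns Service IP.
query = Packet("192.168.79.72", "172.16.0.10", 53, "udp")
print(kube_sep(query))
```

After the rewrite the packet is ordinary pod-to-pod traffic to 192.168.34.193, so it follows the same routing paths analysed in sections 3.1 and 3.2.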
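b. Packet capture verification

Because a pod's access to a Service is ultimately DNATed to a pod IP, the capture looks the same as the pod-to-pod captures above, so it is not repeated here.

4. Conclusion

Calico's networking relies mainly on the Linux kernel's overlay encapsulation support, whether IPIP or VXLAN, and uses iptables to implement fine-grained isolation policy. OpenStack and Kubernetes alike build on these kernel capabilities, which makes this a general-purpose, in-kernel approach.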

References:

https://cloud.tencent.com/developer/article/2394273
