Parameter Configuration
Overview
kube-proxy is mainly used to control network traffic on the host through iptables or IPVS rules.
Configuration example
```yaml
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
bindAddress: 0.0.0.0
bindAddressHardFail: false
clientConnection:
  acceptContentTypes: ""
  burst: 10
  contentType: application/vnd.kubernetes.protobuf
  kubeconfig: "/var/lib/kube-proxy/kubeconfig.conf"
  qps: 5
clusterCIDR: 192.168.201.0/24
configSyncPeriod: 15m0s
conntrack:
  maxPerCore: 32768
  min: 131072
  tcpCloseWaitTimeout: 1h0m0s
  tcpEstablishedTimeout: 24h0m0s
detectLocalMode: ""
enableProfiling: false
healthzBindAddress: 127.0.0.1:10256
hostnameOverride: ""
iptables:
  masqueradeAll: false
  masqueradeBit: 14
  minSyncPeriod: 1s
  syncPeriod: 30s
ipvs:
  excludeCIDRs: null
  minSyncPeriod: 0s
  scheduler: rr
  strictARP: false
  syncPeriod: 30s
  tcpFinTimeout: 0s
  tcpTimeout: 0s
  udpTimeout: 0s
metricsBindAddress: 127.0.0.1:10249
mode: "ipvs"
nodePortAddresses:
  - 192.168.201.0/24
oomScoreAdj: -999
portRange: ""
showHiddenMetricsForVersion: ""
udpIdleTimeout: 250ms
winkernel:
  enableDSR: false
  networkName: ""
  sourceVip: ""
```
Parameter reference
Apart from the command-line flags --config, --write-config-to and --cleanup, all other settings are best configured through KubeProxyConfiguration.
- External components
--boot-id-file --log-backtrace-at --log-dir --log-file --log-file-max-size --log-flush-frequency --logtostderr --machine-id-file --one-output --profiling --skip-headers --skip-log-headers --stderrthreshold
- Built-in components
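Flag | KubeProxyConfiguration field | Default | Description |
---|---|---|---|
add-dir-header | - | false | Same as in kubelet |
alsologtostderr | - | false | Same as in kubelet |
bind-address | bindAddress | 0.0.0.0 | IP address the proxy server listens on (0.0.0.0 for all interfaces) |
bind-address-hard-fail | bindAddressHardFail | false | If true, failure to bind a port is treated as fatal and kube-proxy exits |
cleanup | - | false | Clean up iptables and IPVS rules, then exit |
cluster-cidr | clusterCIDR | "" | CIDR range of the Pods in the cluster. When set, traffic sent to a Service cluster IP from outside this range is masqueraded, and traffic sent from Pods to an external load balancer IP is redirected to the corresponding cluster IP |
config | - | "" | Path to the KubeProxyConfiguration file |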
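config-sync-period | configSyncPeriod | 15m0s | How often configuration from the apiserver is refreshed; must be greater than 0 |
conntrack-max-per-core | conntrack.maxPerCore | 32768 | Maximum number of NAT connections to track per CPU core (0 leaves the current limit as-is and ignores conntrack-min) |
conntrack-min | conntrack.min | 131072 | Minimum number of conntrack entries to allocate, regardless of conntrack-max-per-core (set conntrack-max-per-core to 0 to leave the current limit as-is) |
conntrack-tcp-timeout-close-wait | conntrack.tcpCloseWaitTimeout | 1h0m0s | NAT timeout for TCP connections in the CLOSE_WAIT state |
conntrack-tcp-timeout-established | conntrack.tcpEstablishedTimeout | 24h0m0s | Idle timeout for established TCP connections (0 leaves the current setting as-is) |
detect-local-mode | detectLocalMode | "" | Mode used to detect local traffic; supported values: ClusterCIDR, NodeCIDR |
feature-gates | featureGates | "" | Feature gates that enable or disable alpha/experimental features |
healthz-bind-address | healthzBindAddress | 0.0.0.0:10256 | IP address and port of the health check server; an empty value disables it |
hostname-override | hostnameOverride | "" | If non-empty, this string is used as the identity instead of the actual hostname |
iptables-masquerade-bit | iptables.masqueradeBit | 14 | When using the pure iptables proxy, the bit of the fwmark space used to mark packets that need SNAT; must be within [0, 31] |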
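iptables-min-sync-period | iptables.minSyncPeriod | 1s | Minimum interval at which iptables rules can be refreshed as endpoints and Services change |
iptables-sync-period | iptables.syncPeriod | 30s | Maximum interval at which iptables rules are refreshed (e.g. '5s', '1m', '2h22m'); must be greater than 0 |
ipvs-exclude-cidrs | ipvs.excludeCIDRs | "" | Comma-separated list of CIDRs that the IPVS proxier should not touch when cleaning up IPVS rules |
ipvs-min-sync-period | ipvs.minSyncPeriod | 0s | Minimum interval at which IPVS rules can be refreshed as endpoints and Services change |
ipvs-scheduler | ipvs.scheduler | "" | IPVS scheduler type to use (e.g. rr for round-robin) |
ipvs-strict-arp | ipvs.strictARP | false | Enable strict ARP by setting arp_ignore to 1 and arp_announce to 2 |
ipvs-sync-period | ipvs.syncPeriod | 30s | Maximum interval at which IPVS rules are refreshed (e.g. '5s', '1m', '2h22m'); must be greater than 0 |
ipvs-tcp-timeout | ipvs.tcpTimeout | 0s | Timeout for idle IPVS TCP connections (0 leaves the current setting as-is) |
ipvs-tcpfin-timeout | ipvs.tcpFinTimeout | 0s | Timeout for IPVS TCP connections after a FIN packet is received (0 leaves the current setting as-is) |
ipvs-udp-timeout | ipvs.udpTimeout | 0s | Timeout for IPVS UDP packets (0 leaves the current setting as-is) |
kube-api-burst | clientConnection.burst | 10 | Burst to use while talking to the Kubernetes apiserver |
kube-api-content-type | clientConnection.contentType | application/vnd.kubernetes.protobuf | Content type of requests sent to the apiserver |
kube-api-qps | clientConnection.qps | 5 | QPS to use while talking to the Kubernetes apiserver |
kubeconfig | clientConnection.kubeconfig | "" | Path to the kubeconfig file with the credentials used to connect to the apiserver |
masquerade-all | iptables.masqueradeAll | false | When using the pure iptables proxy, SNAT all traffic sent via Service cluster IPs (usually not needed) |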
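master | - | "" | Address of the Kubernetes API server; overrides any value in kubeconfig |
metrics-bind-address | metricsBindAddress | 127.0.0.1:10249 | IP address and port of the metrics server (/metrics) |
nodeport-addresses | nodePortAddresses | "" | Addresses to use for NodePort Services, given as valid CIDR blocks; an empty value selects all local addresses |
oom-score-adj | oomScoreAdj | -999 | oom-score-adj value for the kube-proxy process; must be within [-1000, 1000] |
profiling | enableProfiling | false | If true, enables profiling via the web interface at /debug/pprof |
proxy-mode | mode | "" | Proxy mode: userspace, iptables, ipvs or kernelspace (Windows only); an empty value means iptables |
proxy-port-range | portRange | 0-0 | Range of host ports (beginPort-endPort, inclusive) that may be consumed to proxy Service traffic; if unspecified (0-0), ports are chosen randomly. This is not the NodePort range (30000-32767 by default), which is set with kube-apiserver's --service-node-port-range |
show-hidden-metrics-for-version | showHiddenMetricsForVersion | "" | Previous version for which hidden metrics should be shown; only the previous minor version is meaningful and other values are rejected, e.g. "1.16" |
udp-timeout | udpIdleTimeout | 250ms | How long an idle UDP connection is kept open; must be greater than 0 and applies only to proxy-mode=userspace |
write-config-to | - | - | Write the default configuration to this file and exit |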
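Several rows in the table state hard constraints: oom-score-adj within [-1000, 1000], iptables-masquerade-bit within [0, 31], and sync periods that must be greater than 0 and are written as Go-style durations such as '5s', '1m' or '2h22m'. The Go sketch below checks a handful of them. The `proxySettings` type and the `validate` function are hypothetical illustrations, not kube-proxy's own validation code, which covers many more fields.

```go
package main

import (
	"fmt"
	"time"
)

// proxySettings is a hypothetical, trimmed-down subset of KubeProxyConfiguration
// holding only the fields whose constraints are checked here.
type proxySettings struct {
	OOMScoreAdj           int32
	IPTablesMasqueradeBit int32
	IPTablesSyncPeriod    string // Go duration string, e.g. "30s"
	IPVSSyncPeriod        string
	ConfigSyncPeriod      string
}

// validate checks the ranges documented in the table: oomScoreAdj in
// [-1000, 1000], iptables.masqueradeBit in [0, 31], and sync periods that
// parse as durations and are greater than 0. It returns every violation found.
func validate(s proxySettings) []error {
	var errs []error
	if s.OOMScoreAdj < -1000 || s.OOMScoreAdj > 1000 {
		errs = append(errs, fmt.Errorf("oomScoreAdj %d not in [-1000, 1000]", s.OOMScoreAdj))
	}
	if s.IPTablesMasqueradeBit < 0 || s.IPTablesMasqueradeBit > 31 {
		errs = append(errs, fmt.Errorf("iptables.masqueradeBit %d not in [0, 31]", s.IPTablesMasqueradeBit))
	}
	periods := []struct{ name, value string }{
		{"iptables.syncPeriod", s.IPTablesSyncPeriod},
		{"ipvs.syncPeriod", s.IPVSSyncPeriod},
		{"configSyncPeriod", s.ConfigSyncPeriod},
	}
	for _, p := range periods {
		d, err := time.ParseDuration(p.value)
		if err != nil {
			errs = append(errs, fmt.Errorf("%s: %v", p.name, err))
			continue
		}
		if d <= 0 {
			errs = append(errs, fmt.Errorf("%s must be greater than 0", p.name))
		}
	}
	return errs
}

func main() {
	// Values taken from the configuration example above.
	ok := proxySettings{-999, 14, "30s", "30s", "15m0s"}
	// Deliberately broken values: out-of-range numbers, a zero period, a bad duration.
	bad := proxySettings{-2000, 32, "0s", "2h22m", "abc"}
	fmt.Println(len(validate(ok)), len(validate(bad)))
	for _, err := range validate(bad) {
		fmt.Println(err)
	}
}
```

Durations are parsed with `time.ParseDuration`, which accepts the same '5s' / '1m' / '2h22m' notation the table uses.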
Data structure
KubeProxyConfiguration
```go
// KubeProxyConfiguration contains everything necessary to configure the
// Kubernetes proxy server.
type KubeProxyConfiguration struct {
	metav1.TypeMeta
	// featureGates is a map of feature names to bools that enable or disable alpha/experimental features.
	FeatureGates map[string]bool
	// bindAddress is the IP address for the proxy server to serve on (set to 0.0.0.0
	// for all interfaces)
	BindAddress string
	// healthzBindAddress is the IP address and port for the health check server to serve on,
	// defaulting to 0.0.0.0:10256
	HealthzBindAddress string
	// metricsBindAddress is the IP address and port for the metrics server to serve on,
	// defaulting to 127.0.0.1:10249 (set to 0.0.0.0 for all interfaces)
	MetricsBindAddress string
	// BindAddressHardFail, if true, kube-proxy will treat failure to bind to a port as fatal and exit
	BindAddressHardFail bool
	// enableProfiling enables profiling via web interface on /debug/pprof handler.
	// Profiling handlers will be handled by metrics server.
	EnableProfiling bool
	// clusterCIDR is the CIDR range of the pods in the cluster. It is used to
	// bridge traffic coming from outside of the cluster. If not provided,
	// no off-cluster bridging will be performed.
	ClusterCIDR string
	// hostnameOverride, if non-empty, will be used as the identity instead of the actual hostname.
	HostnameOverride string
	// clientConnection specifies the kubeconfig file and client connection settings for the proxy
	// server to use when communicating with the apiserver.
	ClientConnection componentbaseconfig.ClientConnectionConfiguration
	// iptables contains iptables-related configuration options.
	IPTables KubeProxyIPTablesConfiguration
	// ipvs contains ipvs-related configuration options.
	IPVS KubeProxyIPVSConfiguration
	// oomScoreAdj is the oom-score-adj value for kube-proxy process. Values must be within
	// the range [-1000, 1000]
	OOMScoreAdj *int32
	// mode specifies which proxy mode to use.
	Mode ProxyMode
	// portRange is the range of host ports (beginPort-endPort, inclusive) that may be consumed
	// in order to proxy service traffic. If unspecified (0-0) then ports will be randomly chosen.
	PortRange string
	// udpIdleTimeout is how long an idle UDP connection will be kept open (e.g. '250ms', '2s').
	// Must be greater than 0. Only applicable for proxyMode=userspace.
	UDPIdleTimeout metav1.Duration
	// conntrack contains conntrack-related configuration options.
	Conntrack KubeProxyConntrackConfiguration
	// configSyncPeriod is how often configuration from the apiserver is refreshed. Must be greater
	// than 0.
	ConfigSyncPeriod metav1.Duration
	// nodePortAddresses is the --nodeport-addresses value for kube-proxy process. Values must be valid
	// IP blocks. These values are as a parameter to select the interfaces where nodeport works.
	// In case someone would like to expose a service on localhost for local visit and some other interfaces for
	// particular purpose, a list of IP blocks would do that.
	// If set it to "127.0.0.0/8", kube-proxy will only select the loopback interface for NodePort.
	// If set it to a non-zero IP block, kube-proxy will filter that down to just the IPs that applied to the node.
	// An empty string slice is meant to select all network interfaces.
	NodePortAddresses []string
	// winkernel contains winkernel-related configuration options.
	Winkernel KubeProxyWinkernelConfiguration
	// ShowHiddenMetricsForVersion is the version for which you want to show hidden metrics.
	ShowHiddenMetricsForVersion string
	// DetectLocalMode determines mode to use for detecting local traffic, defaults to LocalModeClusterCIDR
	DetectLocalMode LocalMode
}
```
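The nodePortAddresses comment describes a filter: each entry is a CIDR, only the node's own IPs that fall inside one of those CIDRs are used for NodePort, and an empty list selects everything. A minimal Go sketch of that filtering follows, assuming the node's addresses are already available as a plain list of strings (kube-proxy itself inspects the node's network interfaces instead). `filterNodePortIPs` and the sample node IPs are hypothetical.

```go
package main

import (
	"fmt"
	"net"
)

// filterNodePortIPs is a hypothetical helper: it keeps only the node IPs that
// fall inside at least one CIDR from nodePortAddresses. An empty CIDR list
// means "use all addresses", matching the struct comment above.
func filterNodePortIPs(cidrs []string, nodeIPs []string) ([]string, error) {
	if len(cidrs) == 0 {
		return nodeIPs, nil
	}
	var nets []*net.IPNet
	for _, c := range cidrs {
		_, n, err := net.ParseCIDR(c)
		if err != nil {
			return nil, fmt.Errorf("invalid nodePortAddresses entry %q: %w", c, err)
		}
		nets = append(nets, n)
	}
	var selected []string
	for _, s := range nodeIPs {
		ip := net.ParseIP(s)
		if ip == nil {
			continue // skip strings that are not IP addresses
		}
		for _, n := range nets {
			if n.Contains(ip) {
				selected = append(selected, s)
				break
			}
		}
	}
	return selected, nil
}

func main() {
	// Hypothetical node addresses: loopback, one address in the example CIDR, one outside it.
	nodeIPs := []string{"127.0.0.1", "192.168.201.10", "10.0.0.5"}
	for _, cidrs := range [][]string{{"192.168.201.0/24"}, {"127.0.0.0/8"}, nil} {
		ips, err := filterNodePortIPs(cidrs, nodeIPs)
		fmt.Println(cidrs, ips, err)
	}
}
```

With the example configuration (`nodePortAddresses: [192.168.201.0/24]`) only 192.168.201.10 is selected; `127.0.0.0/8` selects just the loopback address, as the struct comment describes.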
Configuration notes
clusterCIDR
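If clusterCIDR is set to 192.168.209.0/24, kube-proxy (in IPVS mode) creates the following iptables rule:
```
-A KUBE-SERVICES ! -s 192.168.209.0/24 -m comment --comment "Kubernetes service cluster ip + port for masquerade purpose" -m set --match-set KUBE-CLUSTER-IP dst,dst -j KUBE-MARK-MASQ
```
In other words, packets whose destination IP and port match a Service cluster IP in the KUBE-CLUSTER-IP ipset, but whose source address is outside the Pod CIDR, jump to KUBE-MARK-MASQ and are marked for SNAT.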
conntrack
- Table size limit
Check the maximum number of conntrack entries currently allowed on the system:
```shell
cat /proc/sys/net/netfilter/nf_conntrack_max
```
or
```shell
sysctl -a | grep net.netfilter.nf_conntrack_max
```
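With conntrack.maxPerCore=32768 on a 2-core system, the scaled limit is 32768 × 2 = 65536. kube-proxy never sets nf_conntrack_max below conntrack.min, so because 65536 is smaller than the default conntrack.min of 131072, nf_conntrack_max ends up as 131072.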
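The calculation above can be written as a short Go sketch. It is a simplified reconstruction of the behaviour described in this section and in the table (scale maxPerCore by the CPU count, never go below min, and treat maxPerCore=0 as "leave the current limit unchanged"); `conntrackMax` is a hypothetical name, not a kube-proxy API.

```go
package main

import (
	"fmt"
	"runtime"
)

// conntrackMax is a hypothetical sketch of how nf_conntrack_max is derived:
// maxPerCore * numCPU, but never below minEntries. A return value of 0 means
// "do not touch the kernel's current limit".
func conntrackMax(maxPerCore, minEntries int32, numCPU int) int {
	if maxPerCore <= 0 {
		return 0 // maxPerCore=0: keep the current limit and ignore minEntries
	}
	scaled := int(maxPerCore) * numCPU
	if scaled > int(minEntries) {
		return scaled
	}
	return int(minEntries)
}

func main() {
	fmt.Println(conntrackMax(32768, 131072, 2)) // 2 cores: 65536 is below min, so 131072 is returned
	fmt.Println(conntrackMax(32768, 131072, 8)) // 8 cores: 262144 exceeds min and is used as-is
	fmt.Println(conntrackMax(0, 131072, 8))     // maxPerCore=0: returns 0, current limit kept
	fmt.Println(conntrackMax(32768, 131072, runtime.NumCPU()))
}
```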
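- Timeouts
net.netfilter.nf_conntrack_tcp_timeout_close_wait (which kube-proxy sets from conntrack.tcpCloseWaitTimeout) gives the passive closer time to finish sending its remaining data. If the application is poorly written, for example if it throws an uncaught exception at this point, it may never reach the step of sending its own FIN, and the connection stays stuck in CLOSE_WAIT.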
Last modified 2023.02.08: feat: update k8s base component structure (fadf0dc)