Parameter Configuration

Overview

kube-proxy mainly manages iptables and IPVS rules to control network traffic on the host.

Configuration Example

apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
bindAddress: 0.0.0.0
bindAddressHardFail: false
clientConnection:
  acceptContentTypes: ""
  burst: 10
  contentType: application/vnd.kubernetes.protobuf
  kubeconfig: "/var/lib/kube-proxy/kubeconfig.conf"
  qps: 5
clusterCIDR: 192.168.201.0/24
configSyncPeriod: 15m0s
conntrack:
  maxPerCore: 32768
  min: 131072
  tcpCloseWaitTimeout: 1h0m0s
  tcpEstablishedTimeout: 24h0m0s
detectLocalMode: ""
enableProfiling: false
healthzBindAddress: 127.0.0.1:10256
hostnameOverride: ""
iptables:
  masqueradeAll: false
  masqueradeBit: 14
  minSyncPeriod: 1s
  syncPeriod: 30s
ipvs:
  excludeCIDRs: null
  minSyncPeriod: 0s
  scheduler: rr
  strictARP: false
  syncPeriod: 30s
  tcpFinTimeout: 0s
  tcpTimeout: 0s
  udpTimeout: 0s
metricsBindAddress: 127.0.0.1:10249
mode: "ipvs"
nodePortAddresses:
- 192.168.201.0/24
oomScoreAdj: -999
portRange: ""
showHiddenMetricsForVersion: ""
udpIdleTimeout: 250ms
winkernel:
  enableDSR: false
  networkName: ""
  sourceVip: ""

Parameter Reference

Apart from the command-line flags --config, --write-config-to, and --cleanup, all other options are best set through KubeProxyConfiguration.

  • External components

--boot-id-file --log-backtrace-at --log-dir --log-file --log-file-max-size --log-flush-frequency --logtostderr --machine-id-file --one-output --profiling --skip-headers --skip-log-headers --stderrthreshold

  • Built-in components
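Flag | KubeProxyConfiguration field | Default | Description
add-dir-header | - | false | Same as kubelet
alsologtostderr | - | false | Same as above
bind-address | bindAddress | 0.0.0.0 | IP address the proxy server listens on
bind-address-hard-fail | bindAddressHardFail | false | If true, failure to bind to a port is treated as a fatal error and kube-proxy exits
cleanup | - | false | Remove the iptables and ipvs rules, then exit
cluster-cidr | clusterCIDR | "" | CIDR range of the Pods in the cluster. When set, traffic sent to a cluster IP from outside this range is masqueraded, and traffic sent from Pods to an external load balancer IP is redirected to the corresponding cluster IP
config | - | "" | Path to the KubeProxyConfiguration file
config-sync-period | configSyncPeriod | 15m0s | TODO; how often configuration from the apiserver is refreshed
conntrack-max-per-core | conntrack.maxPerCore | 32768 | Maximum number of NAT connections to track per CPU core (0 leaves the current limit as-is and ignores conntrack-min)
conntrack-min | conntrack.min | 131072 | Minimum number of conntrack entries to allocate, regardless of conntrack-max-per-core (set conntrack-max-per-core to 0 to leave the current limit as-is)
conntrack-tcp-timeout-close-wait | conntrack.tcpCloseWaitTimeout | 1h0m0s | TODO; NAT timeout for TCP connections in the CLOSE_WAIT state
conntrack-tcp-timeout-established | conntrack.tcpEstablishedTimeout | 24h0m0s | TODO; idle timeout for established TCP connections (0 leaves the current setting as-is)
detect-local-mode | detectLocalMode | "" | TODO; mode used to detect local traffic; accepted values: ClusterCIDR, NodeCIDR
feature-gates | - | "" | Features to enable
healthz-bind-address | healthzBindAddress | 0.0.0.0:10256 | IP address and port for the health check server; set to empty to disable
hostname-override | hostnameOverride | "" | If non-empty, this string is used as the identity instead of the actual hostname
iptables-masquerade-bit | iptables.masqueradeBit | 14 | TODO; when using the pure iptables proxy, the bit of the fwmark space used to mark packets that require SNAT. Must be within [0,31]
iptables-min-sync-period | iptables.minSyncPeriod | 1s | Minimum interval between iptables rule refreshes as endpoints and services change
iptables-sync-period | iptables.syncPeriod | 30s | Maximum interval between iptables rule refreshes (e.g. '5s', '1m', '2h22m'). Must be greater than 0
ipvs-exclude-cidrs | ipvs.excludeCIDRs | "" | Comma-separated list of CIDRs; the ipvs proxier will not touch addresses in these ranges when cleaning up IPVS rules
ipvs-min-sync-period | ipvs.minSyncPeriod | 0s | Minimum interval between ipvs rule refreshes as endpoints and services change
ipvs-scheduler | ipvs.scheduler | "" | Scheduler type used by ipvs
ipvs-strict-arp | ipvs.strictARP | false | Enable strict ARP by setting arp_ignore to 1 and arp_announce to 2
ipvs-sync-period | ipvs.syncPeriod | 30s | Maximum interval between ipvs rule refreshes (e.g. '5s', '1m', '2h22m'). Must be greater than 0
ipvs-tcp-timeout | ipvs.tcpTimeout | 0s | Timeout for idle IPVS TCP connections; 0 leaves the current setting unchanged
ipvs-tcpfin-timeout | ipvs.tcpFinTimeout | 0s | Timeout for IPVS TCP connections after a FIN packet is received; 0 leaves the current setting unchanged
ipvs-udp-timeout | ipvs.udpTimeout | 0s | Timeout for IPVS UDP packets; 0 leaves the current setting unchanged
kube-api-burst | clientConnection.burst | 10 | Burst allowed when talking to the Kubernetes apiserver
kube-api-content-type | clientConnection.contentType | application/vnd.kubernetes.protobuf | Content type of requests sent to the apiserver
kube-api-qps | clientConnection.qps | 5 | QPS to use when talking to the Kubernetes apiserver
kubeconfig | clientConnection.kubeconfig | "" | Credentials file used to connect to the apiserver
masquerade-all | iptables.masqueradeAll | false | With the pure iptables proxy, SNAT all traffic sent via service cluster IPs (usually not needed)
master | - | "" | Address of the Kubernetes API server; overrides the value in kubeconfig
metrics-bind-address | metricsBindAddress | 127.0.0.1:10249 | Address for the /metrics performance data endpoint
nodeport-addresses | nodePortAddresses | "" | Addresses used for NodePort services; each value must be a valid CIDR block
oom-score-adj | oomScoreAdj | -999 | oom-score-adj value for the kube-proxy process; must be within [-1000,1000]
profiling | enableProfiling | false | If true, enables profiling via the web interface at /debug/pprof
proxy-mode | mode | "" | Four proxy modes: userspace, iptables, ipvs, kernelspace; defaults to iptables
proxy-port-range | portRange | 0-0 | Range of host ports that may be consumed to proxy service traffic, in the form beginPort-endPort (inclusive); if unspecified (0-0), ports are chosen randomly
show-hidden-metrics-for-version | showHiddenMetricsForVersion | "" | Previous version for which hidden metrics should be shown; only the previous minor version is meaningful and other values are not allowed, e.g. "1.16"
udp-timeout | udpIdleTimeout | 250ms | How long an idle UDP connection is kept open; must be greater than 0; only applies to proxy-mode=userspace
write-config-to | - | - | Write the default configuration to this file and exit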
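
Several rows above state value constraints: masqueradeBit within [0,31], oomScoreAdj within [-1000,1000], and sync periods greater than 0. The following is a minimal sketch of such checks, not kube-proxy's own validation code; the helper names checkRange, masqueradeMark, and checkSyncPeriod are hypothetical. It also shows how a bit index maps to a fwmark value (bit 14 corresponds to 0x4000).

```go
package main

import (
	"fmt"
	"time"
)

// checkRange reports an error when v lies outside [lo, hi].
// Hypothetical helper for illustration only.
func checkRange(name string, v, lo, hi int) error {
	if v < lo || v > hi {
		return fmt.Errorf("%s=%d is outside [%d,%d]", name, v, lo, hi)
	}
	return nil
}

// masqueradeMark returns the fwmark value selected by a masquerade bit index.
func masqueradeMark(bit int) (uint32, error) {
	if err := checkRange("iptables.masqueradeBit", bit, 0, 31); err != nil {
		return 0, err
	}
	return uint32(1) << uint(bit), nil
}

// checkSyncPeriod enforces the "must be greater than 0" rule for a duration string.
func checkSyncPeriod(name, value string) error {
	d, err := time.ParseDuration(value)
	if err != nil {
		return fmt.Errorf("%s: %w", name, err)
	}
	if d <= 0 {
		return fmt.Errorf("%s must be greater than 0, got %s", name, value)
	}
	return nil
}

func main() {
	mark, _ := masqueradeMark(14)
	fmt.Printf("masqueradeBit 14 -> fwmark %#x\n", mark)
	fmt.Println(checkRange("oomScoreAdj", -999, -1000, 1000))
	fmt.Println(checkSyncPeriod("ipvs.syncPeriod", "30s"))
	fmt.Println(checkSyncPeriod("iptables.syncPeriod", "0s"))
}
```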

Data Structure

KubeProxyConfiguration

// KubeProxyConfiguration contains everything necessary to configure the
// Kubernetes proxy server.
type KubeProxyConfiguration struct {
    metav1.TypeMeta

    // featureGates is a map of feature names to bools that enable or disable alpha/experimental features.
    FeatureGates map[string]bool

    // bindAddress is the IP address for the proxy server to serve on (set to 0.0.0.0
    // for all interfaces)
    BindAddress string
    // healthzBindAddress is the IP address and port for the health check server to serve on,
    // defaulting to 0.0.0.0:10256
    HealthzBindAddress string
    // metricsBindAddress is the IP address and port for the metrics server to serve on,
    // defaulting to 127.0.0.1:10249 (set to 0.0.0.0 for all interfaces)
    MetricsBindAddress string
    // BindAddressHardFail, if true, kube-proxy will treat failure to bind to a port as fatal and exit
    BindAddressHardFail bool
    // enableProfiling enables profiling via web interface on /debug/pprof handler.
    // Profiling handlers will be handled by metrics server.
    EnableProfiling bool
    // clusterCIDR is the CIDR range of the pods in the cluster. It is used to
    // bridge traffic coming from outside of the cluster. If not provided,
    // no off-cluster bridging will be performed.
    ClusterCIDR string
    // hostnameOverride, if non-empty, will be used as the identity instead of the actual hostname.
    HostnameOverride string
    // clientConnection specifies the kubeconfig file and client connection settings for the proxy
    // server to use when communicating with the apiserver.
    ClientConnection componentbaseconfig.ClientConnectionConfiguration
    // iptables contains iptables-related configuration options.
    IPTables KubeProxyIPTablesConfiguration
    // ipvs contains ipvs-related configuration options.
    IPVS KubeProxyIPVSConfiguration
    // oomScoreAdj is the oom-score-adj value for kube-proxy process. Values must be within
    // the range [-1000, 1000]
    OOMScoreAdj *int32
    // mode specifies which proxy mode to use.
    Mode ProxyMode
    // portRange is the range of host ports (beginPort-endPort, inclusive) that may be consumed
    // in order to proxy service traffic. If unspecified (0-0) then ports will be randomly chosen.
    PortRange string
    // udpIdleTimeout is how long an idle UDP connection will be kept open (e.g. '250ms', '2s').
    // Must be greater than 0. Only applicable for proxyMode=userspace.
    UDPIdleTimeout metav1.Duration
    // conntrack contains conntrack-related configuration options.
    Conntrack KubeProxyConntrackConfiguration
    // configSyncPeriod is how often configuration from the apiserver is refreshed. Must be greater
    // than 0.
    ConfigSyncPeriod metav1.Duration
    // nodePortAddresses is the --nodeport-addresses value for kube-proxy process. Values must be valid
    // IP blocks. These values are as a parameter to select the interfaces where nodeport works.
    // In case someone would like to expose a service on localhost for local visit and some other interfaces for
    // particular purpose, a list of IP blocks would do that.
    // If set it to "127.0.0.0/8", kube-proxy will only select the loopback interface for NodePort.
    // If set it to a non-zero IP block, kube-proxy will filter that down to just the IPs that applied to the node.
    // An empty string slice is meant to select all network interfaces.
    NodePortAddresses []string
    // winkernel contains winkernel-related configuration options.
    Winkernel KubeProxyWinkernelConfiguration
    // ShowHiddenMetricsForVersion is the version for which you want to show hidden metrics.
    ShowHiddenMetricsForVersion string
    // DetectLocalMode determines mode to use for detecting local traffic, defaults to LocalModeClusterCIDR
    DetectLocalMode LocalMode
}
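
The comment on NodePortAddresses describes a filter: the listed IP blocks are narrowed down to the IPs actually present on the node, and an empty list selects every interface. A minimal sketch of that filtering follows. It is an illustration under stated assumptions, not kube-proxy's implementation; selectNodePortIPs and the sample node IPs are hypothetical.

```go
package main

import (
	"fmt"
	"net"
)

// selectNodePortIPs returns the node IPs that fall inside any of the given CIDR blocks.
// An empty block list selects every node IP.
func selectNodePortIPs(blocks, nodeIPs []string) ([]string, error) {
	if len(blocks) == 0 {
		return nodeIPs, nil
	}
	var nets []*net.IPNet
	for _, b := range blocks {
		_, n, err := net.ParseCIDR(b)
		if err != nil {
			return nil, fmt.Errorf("invalid nodePortAddresses entry %q: %w", b, err)
		}
		nets = append(nets, n)
	}
	var selected []string
	for _, s := range nodeIPs {
		ip := net.ParseIP(s)
		if ip == nil {
			continue // skip strings that are not IP addresses
		}
		for _, n := range nets {
			if n.Contains(ip) {
				selected = append(selected, s)
				break
			}
		}
	}
	return selected, nil
}

func main() {
	// Hypothetical addresses assigned to the node's interfaces.
	nodeIPs := []string{"127.0.0.1", "192.168.201.10", "10.0.0.5"}

	fmt.Println(selectNodePortIPs([]string{"192.168.201.0/24"}, nodeIPs))
	fmt.Println(selectNodePortIPs([]string{"127.0.0.0/8"}, nodeIPs))
	fmt.Println(selectNodePortIPs(nil, nodeIPs))
}
```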

Configuration Notes

clusterCIDR

If clusterCIDR is set to 192.168.209.0/24, the following iptables rule is created:

-A KUBE-SERVICES ! -s 192.168.209.0/24 -m comment --comment "Kubernetes service cluster ip + port for masquerade purpose" -m set --match-set KUBE-CLUSTER-IP dst,dst -j KUBE-MARK-MASQ
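
The "! -s 192.168.209.0/24" part of the rule matches packets whose source address lies outside the Pod range. The sketch below models only that source match (the destination match against the KUBE-CLUSTER-IP set is left out); markedForMasquerade is a hypothetical name used for illustration.

```go
package main

import (
	"fmt"
	"net"
)

// markedForMasquerade mirrors the "! -s <clusterCIDR>" match:
// it returns true when src lies outside the cluster CIDR.
func markedForMasquerade(clusterCIDR, src string) (bool, error) {
	_, podNet, err := net.ParseCIDR(clusterCIDR)
	if err != nil {
		return false, err
	}
	ip := net.ParseIP(src)
	if ip == nil {
		return false, fmt.Errorf("invalid source IP %q", src)
	}
	return !podNet.Contains(ip), nil
}

func main() {
	for _, src := range []string{"192.168.209.17", "10.1.2.3"} {
		marked, err := markedForMasquerade("192.168.209.0/24", src)
		fmt.Println(src, marked, err)
	}
}
```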

conntrack

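  • Entry limits

Check the maximum number of conntrack entries the system currently supports:

cat /proc/sys/net/netfilter/nf_conntrack_max

or

sysctl -a | grep net.netfilter.nf_conntrack_max

With conntrack.maxPerCore=32768 on a system with 2 CPU cores, the scaled limit is nf_conntrack_max = 32768 * 2 = 65536. The effective value is never lower than conntrack.min: kube-proxy uses the larger of the two, so with conntrack.min=131072 the limit becomes 131072.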
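
The calculation above can be sketched as follows, assuming kube-proxy takes the larger of the per-core scaled value and conntrack.min, and treats maxPerCore=0 as "leave the current limit alone". The function name conntrackMax is hypothetical.

```go
package main

import (
	"fmt"
	"runtime"
)

// conntrackMax returns the nf_conntrack_max value implied by the settings.
// A result of 0 means "do not change the current limit".
func conntrackMax(maxPerCore, floor, cores int) int {
	if maxPerCore <= 0 {
		return 0
	}
	scaled := maxPerCore * cores
	if scaled > floor {
		return scaled
	}
	return floor // conntrack.min acts as a lower bound
}

func main() {
	fmt.Println(conntrackMax(32768, 131072, 2)) // 65536 is below the floor, so 131072
	fmt.Println(conntrackMax(32768, 131072, 8)) // 262144 exceeds the floor
	fmt.Println(conntrackMax(32768, 131072, runtime.NumCPU()))
}
```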

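  • Timeout limits

net.netfilter.nf_conntrack_tcp_timeout_close_wait covers the time the passive closer needs to finish sending its remaining data. If the program is poorly written, for example it throws an uncaught exception at this point, it may never reach the step that sends its own FIN, and the connection stays in CLOSE_WAIT.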