Cleaning up pods that did not exit properly in Kubernetes

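On long-running k8s nodes, some pods may fail to exit on their own and stay in the Terminating state indefinitely.
The script below can be run on a schedule to clean them up.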

#!/bin/bash
#################################
### clean up Terminating pods ###
### run at your own risk!     ###
#################################
export PATH=/usr/local/cfssl/bin:/usr/local/docker/:/usr/local/kubernetes/:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin

# Force-delete pod $1 in namespace $2.
delpod(){
    echo "kubectl delete pods $1 -n $2 --grace-period=0 --force"
    # stdin is redirected so kubectl cannot swallow the caller's pod list
    kubectl delete pods "$1" -n "$2" --grace-period=0 --force < /dev/null
}

# Delete every pod in namespace $1 whose STATUS column is Terminating.
# (The original passed only the first matching pod on to delpod.)
getpod(){
    local ns=$1 pod
    kubectl get pods -n "$ns" --no-headers |
        awk '$3 == "Terminating" {print $1}' |
        while read -r pod; do
            delpod "$pod" "$ns"
        done
}

# Walk through all namespaces.
getns(){
    local n
    for n in $(kubectl get namespaces --no-headers | awk '{print $1}'); do
        getpod "$n"
    done
}

main(){
    getns
}
main
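
Because the script force-deletes pods, it can help to preview what it would remove first. The following is a minimal dry-run sketch, not part of the original script: it assumes the default column layout of `kubectl get pods --all-namespaces --no-headers` (NAMESPACE NAME READY STATUS RESTARTS AGE), and the function names are hypothetical.

```shell
#!/bin/bash
# Read `kubectl get pods --all-namespaces --no-headers` output on stdin and
# print "namespace pod" for each pod whose STATUS column (4th) is Terminating.
list_terminating() {
    awk '$4 == "Terminating" {print $1, $2}'
}

# Dry run: print the delete commands instead of executing them.
print_delete_commands() {
    local ns pod
    list_terminating | while read -r ns pod; do
        echo "kubectl delete pods $pod -n $ns --grace-period=0 --force"
    done
}

# Example (requires cluster access):
#   kubectl get pods --all-namespaces --no-headers | print_delete_commands
```

To run the real cleanup periodically, one option is a cron entry such as `*/30 * * * * /root/bin/clean-terminating-pods.sh`, where the script path and the 30-minute interval are placeholders to adapt.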

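Automatically cleaning up containers, volumes, and images in k8s

Image source: https://github.com/meltwater/docker-cleanup

Note: this image removes all exited containers, unused images, and data-only containers unless you add them to the keep variables. Make sure the docker API version is configured correctly, otherwise all images may be deleted. Be careful when mounting /var/lib/docker, because anything mounted there that is not in use is also treated as an unused volume and removed.

Supported variables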

  • CLEAN_PERIOD=1800 - Interval in seconds to sleep after completing a cleaning run. Defaults to 1800 seconds = 30 minutes.
  • DELAY_TIME=1800 - Seconds to wait before removing exited containers and unused images. Defaults to 1800 seconds = 30 minutes.
  • KEEP_IMAGES - List of images to avoid cleaning, e.g. “ubuntu:trusty, ubuntu:latest”. Defaults to clean all unused images.
  • KEEP_CONTAINERS - List of images for exited or dead containers to avoid cleaning, e.g. “ubuntu:trusty, ubuntu:latest”.
  • KEEP_CONTAINERS_NAMED - List of names for exited or dead containers to avoid cleaning, e.g. “my-container1, persistent-data”.
  • LOOP - Add the ability to do non-looped cleanups, run it once and exit. Options are true, false. Defaults to true to run it forever in loops.
  • DEBUG - Set to 1 to enable more debugging output on pattern matches.
  • DOCKER_API_VERSION - The docker API version to use. This defaults to 1.20, but you can override it here in case the docker version on your host differs from the one that is installed in this container. You can find this on your host system by running docker version --format '{{.Client.APIVersion}}'.
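
Since a wrong DOCKER_API_VERSION is called out as a risk, the value can be sanity-checked before deploying. This is a rough sketch under two assumptions: GNU `sort -V` is available, and the requested version is acceptable as long as it is not newer than the version reported by the host. The function name is hypothetical.

```shell
#!/bin/bash
# api_version_supported REQUESTED HOST_VERSION
# Succeeds when REQUESTED is less than or equal to HOST_VERSION,
# comparing the dotted numbers with version sort (so 1.9 < 1.20).
api_version_supported() {
    local requested=$1 host_version=$2 lowest
    lowest=$(printf '%s\n%s\n' "$requested" "$host_version" | sort -V | head -n 1)
    [ "$lowest" = "$requested" ]
}

# Example (requires docker on the host):
#   host=$(docker version --format '{{.Client.APIVersion}}')
#   api_version_supported 1.29 "$host" || echo "DOCKER_API_VERSION is too new"
```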

For images that should not be cleaned up even when they are no longer running, use the KEEP_IMAGES variable. Here we set it to:

vmware/harbor-*:*,*calico:*,*registry:*,*kubernetes-dashboard-amd64:*,*nginx-ingress-controller:*,*cvallance/mongo-k8s-sidecar:*
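
To get a feel for which images such a list protects, the patterns can be tried out locally. The sketch below assumes the entries are comma-separated shell-style globs, which the `*` wildcards suggest; it is not taken from the docker-cleanup source, and the function name is hypothetical.

```shell
#!/bin/bash
# matches_keep_list IMAGE LIST
# Succeeds when IMAGE matches at least one comma-separated glob in LIST.
matches_keep_list() {
    local image=$1 list=$2 pattern patterns
    # read -a splits on commas without expanding the globs against files
    IFS=',' read -ra patterns <<< "$list"
    for pattern in "${patterns[@]}"; do
        pattern="${pattern# }"   # strip one leading space, as in "a, b"
        # Unquoted right-hand side: bash treats it as a glob pattern.
        if [[ $image == $pattern ]]; then
            return 0
        fi
    done
    return 1
}

# Example:
#   KEEP='vmware/harbor-*:*,*calico:*,*registry:*'
#   matches_keep_list 'registry:2' "$KEEP" && echo "kept"
```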

docker-cleanup-daemonset.yaml is configured as follows:

apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  labels:
    name: clean-up
  name: clean-up
  namespace: kube-system
spec:
  updateStrategy:
    type: "RollingUpdate"
    rollingUpdate:
      maxUnavailable: 1
  template:
    metadata:
      labels:
        app: clean-up
    spec:
      tolerations:
      - key: "LB"
        operator: "Exists"
        effect: "NoExecute"
      volumes:
      - name: docker-sock
        hostPath:
          path: /var/run/docker.sock
      - name: docker-directory
        hostPath:
          path: /data/kubernetes/docker
      containers:
      - image: meltwater/docker-cleanup:latest
        name: clean-up
        env:
        - name: CLEAN_PERIOD
          value: "1800"
        - name: DELAY_TIME
          value: "60"
        - name: DOCKER_API_VERSION
          value: "1.29"
        - name: KEEP_IMAGES
          value: "vmware/harbor-*:*,*calico:*,*registry:*,*kubernetes-dashboard-amd64:*,*nginx-ingress-controller:*,*cvallance/mongo-k8s-sidecar:*"
        volumeMounts:
        - mountPath: /var/run/docker.sock
          name: docker-sock
          readOnly: false
        - mountPath: /var/lib/docker
          name: docker-directory
          readOnly: false

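Deploying the Nginx ingress controller with DaemonSet + Taint/Tolerations + NodeSelector

Define the Nginx Ingress Controller as a DaemonSet with a NodeSelector and Tolerations, and put a Label and a Taint on dedicated nodes. Those nodes then run only the Nginx Ingress Controller and act purely as proxy nodes; no other workload containers are scheduled or run on them.

  • Prepare N nodes in the Kubernetes cluster, referred to here as proxy nodes. Only Nginx Ingress Controller (NIC for short) instances are deployed on these N nodes; no other workload containers run there.

  • Put a NoExecute taint on each proxy node to keep workload containers from being scheduled or run on it.

    kubectl taint nodes 10.8.8.234 LB=NIC:NoExecute

  • Label each proxy node so that the NIC is deployed only on nodes carrying that label.

    kubectl label nodes 10.8.8.234 LB=NIC

  • Modify the calico-node configuration so that calico can run on the NoExecute nodes.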

    spec:
      ...
      spec:
        tolerations:
        - key: "LB"
          operator: "Exists"
          effect: "NoExecute"
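
The calico toleration uses operator "Exists" while the NIC DaemonSet uses "Equal" with value "NIC"; both tolerate the LB=NIC:NoExecute taint. The sketch below illustrates the matching rule as described in the Kubernetes taints and tolerations documentation (same key, same effect or an empty toleration effect, and either Exists or an equal value). It is an illustration only, not the scheduler's code, and the function name is hypothetical.

```shell
#!/bin/bash
# tolerates TAINT_KEY TAINT_VALUE TAINT_EFFECT TOL_KEY TOL_OPERATOR TOL_VALUE TOL_EFFECT
# Succeeds when the toleration matches the taint.
tolerates() {
    local t_key=$1 t_value=$2 t_effect=$3
    local key=$4 op=$5 value=$6 effect=$7
    # An empty toleration effect matches every taint effect.
    [ -z "$effect" ] || [ "$effect" = "$t_effect" ] || return 1
    case "$op" in
        # Exists ignores the value; an empty key matches every taint key.
        Exists) [ -z "$key" ] || [ "$key" = "$t_key" ] ;;
        # Equal requires both key and value to be identical.
        Equal)  [ "$key" = "$t_key" ] && [ "$value" = "$t_value" ] ;;
        *)      return 1 ;;
    esac
}

# Example: the calico-node toleration against the proxy-node taint
#   tolerates LB NIC NoExecute LB Exists "" NoExecute && echo "tolerated"
```

Using "Exists" keeps calico-node running on any node tainted with the LB key regardless of its value, whereas "Equal" ties the NIC to the specific LB=NIC taint.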

  • Define the DaemonSet YAML file, making sure to add the Tolerations and the Node Selector. (Create the serviceAccount, role, and so on first.)

    apiVersion: extensions/v1beta1
    kind: DaemonSet
    metadata:
      annotations:
        deployment.kubernetes.io/revision: "4"
      labels:
        k8s-app: nginx-ingress-controller
      name: nginx-ingress-controller
      namespace: kube-system
    spec:
      selector:
        matchLabels:
          k8s-app: nginx-ingress-controller
      template:
        metadata:
          annotations:
            prometheus.io/port: "10254"
            prometheus.io/scrape: "true"
          creationTimestamp: null
          labels:
            k8s-app: nginx-ingress-controller
        spec:
          # Add the matching Node Selector
          nodeSelector:
            LB: NIC
          # Add the matching Tolerations
          tolerations:
          - key: "LB"
            operator: "Equal"
            value: "NIC"
            effect: "NoExecute"
          containers:
          - args:
            - /nginx-ingress-controller
            - --default-backend-service=$(POD_NAMESPACE)/default-http-backend
            - --tcp-services-configmap=$(POD_NAMESPACE)/nginx-tcp-ingress-configmap
            - --configmap=$(POD_NAMESPACE)/nginx-configuration
            env:
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: metadata.name
            - name: POD_NAMESPACE
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: metadata.namespace
            image: dceph02.rmz.flamingo-inc.com:8888/mynginx/nginx-ingress-controller:0.9.0-beta.11
            imagePullPolicy: IfNotPresent
            livenessProbe:
              failureThreshold: 3
              httpGet:
                path: /healthz
                port: 10254
                scheme: HTTP
              initialDelaySeconds: 10
              periodSeconds: 10
              successThreshold: 1
              timeoutSeconds: 1
            name: nginx-ingress-controller
            ports:
            - containerPort: 80
              hostPort: 80
              protocol: TCP
            - containerPort: 443
              hostPort: 443
              protocol: TCP
            readinessProbe:
              failureThreshold: 3
              httpGet:
                path: /healthz
                port: 10254
                scheme: HTTP
              periodSeconds: 10
              successThreshold: 1
              timeoutSeconds: 1
            resources: {}
          hostNetwork: true
          serviceAccount: ingress
          serviceAccountName: ingress
    
  • Create the default backend service.

    apiVersion: extensions/v1beta1
    kind: Deployment
    metadata:
      name: default-http-backend
      labels:
        k8s-app: default-http-backend
      namespace: kube-system
    spec:
      replicas: 1
      template:
        metadata:
          labels:
            k8s-app: default-http-backend
        spec:
          terminationGracePeriodSeconds: 60
          containers:
          - name: default-http-backend
            # Any image is permissible as long as:
            # 1. It serves a 404 page at /
            # 2. It serves 200 on a /healthz endpoint
            image: gcr.io/google_containers/defaultbackend:1.0
            livenessProbe:
              httpGet:
                path: /healthz
                port: 8080
                scheme: HTTP
              initialDelaySeconds: 30
              timeoutSeconds: 5
            ports:
            - containerPort: 8080
            resources:
              limits:
                cpu: 10m
                memory: 20Mi
              requests:
                cpu: 10m
                memory: 20Mi
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: default-http-backend
      namespace: kube-system
      labels:
        k8s-app: default-http-backend
    spec:
      ports:
      - port: 80
        targetPort: 8080
      selector:
        k8s-app: default-http-backend
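
    Create the corresponding Deployment and Service from default-backend.yaml: `kubectl create -f default-backend.yaml`

  • Create the NIC DaemonSet from the DaemonSet YAML to start the NIC.

    `kubectl create -f nginx-ingress-daemonset.yaml`

    The NIC is now running on the proxy nodes. The remaining steps are for testing.

  • (Optional) After confirming that the NIC started successfully, create services for testing.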

    kubectl run echoheaders --image=gcr.io/google_containers/echoserver:1.8 --replicas=1 --port=8080
    kubectl expose deployment echoheaders --port=80 --target-port=8080 --name=echoheaders-x
    kubectl expose deployment echoheaders --port=80 --target-port=8080 --name=echoheaders-y

    Create the Ingress object for testing:

    apiVersion: extensions/v1beta1
    kind: Ingress
    metadata:
      name: echomap
      namespace: default
    spec:
      rules:
      - host: foo.bar.com
        http:
          paths:
          - backend:
              serviceName: echoheaders-x
              servicePort: 80
            path: /foo
      - host: bar.baz.com
        http:
          paths:
          - backend:
              serviceName: echoheaders-y
              servicePort: 80
            path: /bar
          - backend:
              serviceName: echoheaders-x
              servicePort: 80
            path: /foo
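
The rules in this Ingress amount to a small routing table keyed on host and path. The sketch below spells that table out as a shell function purely for illustration. It assumes paths are treated as plain prefixes and that unmatched requests go to default-http-backend:80; it is not how the controller itself is implemented, and the function name is hypothetical.

```shell
#!/bin/bash
# route HOST PATH
# Print the backend (service:port) that the echomap rules select.
route() {
    local host=$1 path=$2
    case "$host$path" in
        foo.bar.com/foo*) echo "echoheaders-x:80" ;;
        bar.baz.com/bar*) echo "echoheaders-y:80" ;;
        bar.baz.com/foo*) echo "echoheaders-x:80" ;;
        # No rule matched: fall through to the default backend.
        *)                echo "default-http-backend:80" ;;
    esac
}

# Example:
#   route bar.baz.com /bar
```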

  • (Optional) Check the proxy address of the ingress.

    [root@host ~]# kubectl describe ing echomap
    Name:             echomap
    Namespace:        default
    Address:          10.8.8.234
    Default backend:  default-http-backend:80 (172.254.109.193:8080)
    Rules:
      Host         Path  Backends
      foo.bar.com
                   /foo  echoheaders-x:80 (<none>)
      bar.baz.com
                   /bar  echoheaders-y:80 (<none>)
                   /foo  echoheaders-x:80 (<none>)
    Annotations:
    Events:
      FirstSeen  LastSeen  Count  From                SubObjectPath  Type    Reason  Message
      35m        35m       1      ingress-controller                 Normal  CREATE  Ingress default/echomap
      35m        35m       1      ingress-controller                 Normal  UPDATE  Ingress default/echomap


  • Test.

    [root@host ~]# curl 10.8.8.234/foo -H 'Host: foo.bar.com'

    Hostname: echoheaders-1076692255-p1ndv

    Pod Information:
        -no pod information available-

    Server values:
        server_version=nginx: 1.13.3 - lua: 10008

    Request Information:
        client_address=172.254.246.192
        method=GET
        real path=/foo
        query=
        request_version=1.1
        request_uri=http://foo.bar.com:8080/foo

    Request Headers:
        accept=*/*
        connection=close
        host=foo.bar.com
        user-agent=curl/7.29.0
        x-forwarded-for=10.8.8.234
        x-forwarded-host=foo.bar.com
        x-forwarded-port=80
        x-forwarded-proto=http
        x-original-uri=/foo
        x-real-ip=10.8.8.234
        x-scheme=http

    Request Body:
        -no body in request-

    [root@dceph04 ~]# curl 10.8.8.234/foo -H 'Host: bar.baz.com'

    Hostname: echoheaders-1076692255-p1ndv

    Pod Information:
        -no pod information available-

    Server values:
        server_version=nginx: 1.13.3 - lua: 10008

    Request Information:
        client_address=172.254.246.192
        method=GET
        real path=/foo
        query=
        request_version=1.1
        request_uri=http://bar.baz.com:8080/foo

    Request Headers:
        accept=*/*
        connection=close
        host=bar.baz.com
        user-agent=curl/7.29.0
        x-forwarded-for=10.8.8.234
        x-forwarded-host=bar.baz.com
        x-forwarded-port=80
        x-forwarded-proto=http
        x-original-uri=/foo
        x-real-ip=10.8.8.234
        x-scheme=http

    Request Body:
        -no body in request-
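
The manual curl checks above can be turned into a repeatable assertion by pulling single `key=value` fields out of the echoserver response. A minimal sketch, assuming the response keeps the layout shown above; the function name is hypothetical.

```shell
#!/bin/bash
# response_field KEY
# Read an echoserver response on stdin and print the value of the first
# "KEY=value" line, ignoring leading whitespace.
response_field() {
    awk -F= -v key="$1" '
        { sub(/^[ \t]+/, "") }                     # drop indentation
        $1 == key { print substr($0, length(key) + 2); exit }
    '
}

# Example (requires access to the proxy node):
#   curl -s 10.8.8.234/foo -H 'Host: foo.bar.com' | response_field host
```

Comparing the `host` and `x-original-uri` fields against the values sent in the request is a quick way to confirm that the request passed through the NIC with the expected host and path.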

References

https://my.oschina.net/jxcdwangtao/blog/1523812