Cilium agent logs complain about `Unable to fetch kubernetes labels`

Affected versions:

  • Cilium 1.7.x
  • Potentially later versions, not yet observed.

At this stage, only observed in CI (https://github.com/cilium/cilium/issues/10442).

Symptoms

  1. Roughly every 15 seconds, the cilium-agent log prints a warning about `Unable to fetch kubernetes labels`:

    level=warning msg="Unable to fetch kubernetes labels" containerID=015ef4fdbf datapathPolicyRevision=2 desiredPolicyRevision=1 endpointID=1596 error="pod.core \"app3-c6c587577-fq6kr\" not found" identity=5 ipv4=10.10.0.197 ipv6="f00d::a0a:0:0:aeec" k8sPodName=default/app3-c6c587577-fq6kr subsys=resolve-labels-default/app3-c6c587577-fq6kr
    
  2. `cilium status` reports failing `resolve-labels-xxx` controllers:

    Failed controllers:
     controller resolve-labels-default/app1-68cb4f68c5-cftnr failure 'pod.core "app1-68cb4f68c5-cftnr" not found'
    
  3. There is no corresponding pod for these endpoints (a quick cross-check is sketched after this list).

  4. Endpoints show up in `cilium endpoint list` with the identity `reserved:init`, and they never get a proper identity:

    cmd: kubectl exec -n kube-system cilium-56pmk -- cilium endpoint list
    Exitcode: 0
    Stdout:
    ENDPOINT   POLICY (ingress)   POLICY (egress)   IDENTITY   LABELS (source:key[=value])   IPv6                 IPv4          STATUS
               ENFORCEMENT        ENFORCEMENT
    898        Enabled            Enabled           5          reserved:init                 f00d::a0a:0:0:af1    10.10.0.151   ready
    1479       Enabled            Enabled           5          reserved:init                 f00d::a0a:0:0:969e   10.10.0.149   ready
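
As noted in symptom 3, you can cross-check a stuck endpoint against the API server before assuming it is orphaned. A minimal sketch, reusing the pod and agent names from the output above (substitute your own):

    # Expect a NotFound error for the pod named in the warning:
    kubectl get pod -n default app3-c6c587577-fq6kr

    # List the endpoints still stuck at the reserved:init identity:
    kubectl exec -n kube-system cilium-56pmk -- cilium endpoint list | grep 'reserved:init'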
    

Impact

There is no known impact beyond the failure logs potentially consuming disk space. The related application pods were already deleted, so there is no traffic impact.

Mitigation

Restarting the cilium-agent should cause Cilium to re-evaluate the existence of these endpoints and clean them up.
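
A minimal sketch of such a restart, assuming the default installation where the agents run as a DaemonSet named `cilium` in `kube-system` (pod name taken from the output above; substitute your own):

    # Restart the agent on the affected node; the DaemonSet recreates the pod:
    kubectl -n kube-system delete pod cilium-56pmk

    # Or restart every agent in the cluster:
    kubectl -n kube-system rollout restart daemonset/cilium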

If you have more details on this issue, for example because you observe it in a real Cilium deployment, please post them below and react to this issue with :+1: so we know how widely it affects the community.


Answer from diversario

Seeing this currently happening to one Cilium 1.7.4 agent in k8s 1.16:

Controller Status:      157/159 healthy
  Name                                                                  Last success   Last error   Count   Message
  resolve-labels-deployment-4459/webserver-deployment-c7997dcc8-tgq7m   never          2m26s ago    231     pod.core "webserver-deployment-c7997dcc8-tgq7m" not found
  sync-to-k8s-ciliumendpoint (3912)                                     never          2m8s ago     231     namespaces "deployment-4459" not found

The pod and namespace in question were created and removed by Sonobuoy, and it doesn't appear to be causing actual issues. Running `cilium endpoint disconnect 3912` resolved the complaints.
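
If many endpoints are affected, the same cleanup can be scripted. A rough sketch, assuming the agents run in `kube-system` and that every `reserved:init` endpoint on the node really is orphaned (verify first, as sketched under Symptoms):

    AGENT=cilium-56pmk  # substitute the agent pod on the affected node
    # Disconnect every endpoint still stuck at the reserved:init identity.
    for ep in $(kubectl exec -n kube-system "$AGENT" -- cilium endpoint list \
        | awk '/reserved:init/ {print $1}'); do
      kubectl exec -n kube-system "$AGENT" -- cilium endpoint disconnect "$ep"
    done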
