Explore, compromise and harden Kubernetes pods

By default, many container workloads run as the privileged root user. Every container is just a running process. Preventing root execution by using non-root containers (configured when the image is built) or a rootless container engine (some container engines, such as podman, run in an unprivileged context rather than relying on a daemon running as root) limits the impact of a container compromise. Hardening Kubernetes pods is one of the very first steps in cluster security.

If the process has root privileges and gets exploited, you are effectively handing root access to the attacker.
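As a minimal sketch of the non-root approach (the pod name and container command are illustrative, not part of this lab), a Kubernetes securityContext can refuse to start a container as root:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nonroot-demo       # illustrative name
spec:
  securityContext:
    runAsUser: 65534       # run as "nobody" instead of root
    runAsNonRoot: true     # kubelet refuses to start a container as UID 0
  containers:
  - name: app
    image: google/cloud-sdk:latest
    command: ["sleep", "infinity"]
```

With runAsNonRoot: true, the kubelet rejects the container outright if it would otherwise start as UID 0, even if the image's default user is root.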

You can find a very comprehensive guide to Kubernetes cluster hardening here. In this article I will focus only on Pod Security Policies, using a GKE cluster as the example.

Create a simple GKE cluster

You need at least a free trial account in GCP; use it to create a simple test cluster.

After the cluster is created, check your installed version of Kubernetes using the kubectl version command:

kubectl version

View your running nodes in the Cloud Console: on the Navigation menu, click Compute Engine > VM Instances and confirm that your Kubernetes cluster is ready for use.

Run a Google Cloud-SDK pod

kubectl run -it --rm gcloud --image=google/cloud-sdk:latest --restart=Never -- bash

You should now have a bash shell inside the pod’s container:

root@gcloud:/#

It may take a few seconds for the container to be started and the command prompt to be displayed. If you don’t see a command prompt, try pressing Enter.

Explore the Compute Metadata endpoint

Run the following command to access the v1 Compute Metadata endpoint:

root@gcloud:/# curl -s http://metadata.google.internal/computeMetadata/v1/instance/name


Output looks like:

……
Your client does not have permission to get URL /computeMetadata/v1/instance/name from this server. Missing Metadata-Flavor:Google header.
……
Notice how it returns an error stating that it requires the custom HTTP header to be present.

Add the custom header on the next run and retrieve the Compute Engine instance name that is running this pod:

root@gcloud:/# curl -s -H "Metadata-Flavor: Google" http://metadata.google.internal/computeMetadata/v1/instance/name

Output looks like:

gke-simplecluster-default-pool-b57a043a-6z5v
Note: If a custom HTTP header were not required to access a Compute Engine instance metadata endpoint, an attacker would need only an application flaw (such as SSRF) to trick a web request into leaking credentials. By requiring a custom HTTP header, the attack is harder: the attacker needs both an application flaw and the ability to set that header to succeed.
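The header check can be sketched offline with a small shell function. To be clear, metadata_server below is a hypothetical stand-in that mimics the curl results above, not a real API:

```shell
# Hypothetical sketch of the metadata server's header check (no network needed).
# $1 simulates the value of the Metadata-Flavor header; "" means it was absent.
metadata_server() {
  if [ "$1" = "Google" ]; then
    # Header present: the endpoint answers (instance name from the lab output).
    echo "gke-simplecluster-default-pool-b57a043a-6z5v"
  else
    # Header missing: the request is rejected, as in the first curl above.
    echo "Missing Metadata-Flavor:Google header."
  fi
}

denied=$(metadata_server "")
allowed=$(metadata_server "Google")
echo "$denied"
echo "$allowed"
```

The real endpoint behaves the same way: identical URL, and only the presence of the Metadata-Flavor: Google header separates a rejection from an answer.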

Explore the GKE node bootstrapping credentials. From inside the same pod shell, run the following command to list the attributes associated with the underlying Compute Engine instances. Be sure to include the trailing slash:

root@gcloud:/# curl -s -H "Metadata-Flavor: Google" http://metadata.google.internal/computeMetadata/v1/instance/attributes/

Perhaps the most sensitive data in this listing is kube-env. It contains several variables which the kubelet uses as initial credentials when attaching the node to the GKE cluster. The variables CA_CERT, KUBELET_CERT, and KUBELET_KEY contain this information and are therefore considered sensitive to non-cluster administrators.

To see the potentially sensitive variables and data, run the following command:

root@gcloud:/# curl -s -H "Metadata-Flavor: Google" http://metadata.google.internal/computeMetadata/v1/instance/attributes/kube-env

What we found: any of the following gives an attacker access to this endpoint:

  • A flaw that allows SSRF in a pod's application
  • An application or library flaw that allows RCE in a pod
  • An internal user with the ability to create or exec into a pod

In any of these cases, there is a high likelihood of compromise and exfiltration of the sensitive kubelet bootstrapping credentials via the Compute Metadata endpoint. In certain circumstances the kubelet credentials can be leveraged to escalate privileges to cluster-admin, granting full control of the GKE cluster, including all data, applications, and access to the underlying nodes.

Leverage the Permissions assigned to this Node Pool’s service account

Run the following curl command to list the OAuth scopes associated with the service account attached to the underlying Compute Engine instance:

root@gcloud:/# curl -s -H "Metadata-Flavor: Google" http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/scopes

(output)

https://www.googleapis.com/auth/devstorage.read_only
https://www.googleapis.com/auth/logging.write
https://www.googleapis.com/auth/monitoring
https://www.googleapis.com/auth/service.management.readonly
https://www.googleapis.com/auth/servicecontrol
https://www.googleapis.com/auth/trace.append

The combination of authentication scopes and the permissions of the service account determines what applications on this node can access. The list above is the minimum set of scopes needed by most GKE clusters, but some use cases require broader scopes.

Warning: If, during cluster creation, you configured the authentication scope to include https://www.googleapis.com/auth/cloud-platform, any Google Cloud API would be in scope and only IAM permissions assigned to the service account would determine access.

Further, if the default service account with the default IAM Role of Editor is in use, any pod on this node pool has Editor permissions to the Google Cloud project where the GKE cluster is deployed. As the Editor IAM Role has a wide range of read/write permissions to interact with project resources such as Compute instances, Cloud Storage buckets, GCR registries, and more, this is a significant security risk.
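As a quick, hedged illustration of the warning above, you can check a scope listing for the broad cloud-platform scope. The list below is a subset of the minimal set shown earlier; in practice you would feed in the curl output:

```shell
# Flag a node whose service account carries the broad cloud-platform scope.
# "scopes" stands in for the output of the scopes curl command above.
scopes="https://www.googleapis.com/auth/devstorage.read_only
https://www.googleapis.com/auth/logging.write
https://www.googleapis.com/auth/monitoring"

if echo "$scopes" | grep -q "auth/cloud-platform"; then
  verdict="broad scope: IAM permissions alone gate access to every Google Cloud API"
else
  verdict="minimal scopes only"
fi
echo "$verdict"
```

Piping the real curl output into the same grep gives you a fast audit of any node pool's scope exposure.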

Keep this shell inside the pod available for the next step.

Deploy a pod that mounts the host filesystem

One of the simplest paths for “escaping” to the underlying host is by mounting the host’s filesystem into the pod’s filesystem using standard Kubernetes volumes and volumeMounts in a Pod specification.

To demonstrate this, run the following to create a Pod that mounts the underlying host filesystem / at the folder named /rootfs inside the container:

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: hostpath
spec:
  containers:
  - name: hostpath
    image: google/cloud-sdk:latest
    command: ["/bin/bash"]
    args: ["-c", "tail -f /dev/null"]
    volumeMounts:
    - mountPath: /rootfs
      name: rootfs
  volumes:
  - name: rootfs
    hostPath:
      path: /
EOF

Run kubectl get pod and re-run until it’s in the “Running” state:

kubectl get pod

(output)

NAME READY STATUS RESTARTS AGE
hostpath 1/1 Running 0 30s

Explore and compromise the underlying host

Run the following to obtain a shell inside the pod you just created:

kubectl exec -it hostpath -- bash

Switch the pod shell's root filesystem to that of the underlying host:

chroot /rootfs /bin/bash

With that one command, the pod is now effectively a root shell on the node. You are now able to do the following:

  • Run the standard docker command with full permissions: docker ps
  • List docker images: docker images
  • Run a privileged container of your choosing: docker run --privileged <imagename>:<imageversion>
  • Examine the Kubernetes secrets mounted on the node: mount | grep volumes | awk '{print $3}' | xargs ls
  • Exec into any running container (even into a pod in another namespace): docker exec -it <docker container ID> sh

Nearly every operation that the root user can perform is available to this pod shell. This includes persistence mechanisms like adding SSH users/keys, running privileged docker containers on the host outside the view of Kubernetes, and much more.
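The secret-volume pipeline above can be tried offline against a sample line in the style of mount output on a node (the pod UID in the path is made up for illustration):

```shell
# One sample line shaped like `mount` output for a Kubernetes secret volume.
# The pod UID "1234" and token name are illustrative, not from a real node.
sample='tmpfs on /var/lib/kubelet/pods/1234/volumes/kubernetes.io~secret/token type tmpfs (rw)'

# grep keeps only volume mounts; awk field 3 is the mount point directory,
# which `xargs ls` would then list to reveal the secret files.
secret_dir=$(echo "$sample" | grep volumes | awk '{print $3}')
echo "$secret_dir"
# → /var/lib/kubelet/pods/1234/volumes/kubernetes.io~secret/token
```

On a real node the same pipeline surfaces every pod's mounted secrets, which is why host filesystem access is equivalent to reading all secrets scheduled there.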

Now you can delete the hostpath pod:

kubectl delete pod hostpath

Understand the available controls

Next, we will focus on how to harden pods and avoid leaving root privileges exposed, using the following controls:

  • Disable the legacy Compute Engine Metadata API endpoint – by specifying a custom metadata key and value, the v1beta1 metadata endpoint is no longer served from the instance.
  • Enable Metadata Concealment – by passing an additional configuration during cluster and/or node pool creation, a lightweight proxy is installed on each node that intercepts all requests to the Metadata API and blocks access to sensitive endpoints.
  • Enable and configure PodSecurityPolicy – configuring this option on a GKE cluster adds the PodSecurityPolicy admission controller, which can restrict the use of insecure settings during pod creation. In this demo's case, it prevents containers from running as the root user and from mounting the underlying host filesystem.

Deploy a second node pool

To enable you to experiment with and without the Metadata endpoint protections in place, you’ll create a second node pool that includes two additional settings. Pods that are scheduled to the generic node pool will not have the protections, and Pods scheduled to the second node pool will have them enabled.

Note: Legacy endpoints were deprecated on September 30, 2020. In GKE versions 1.12 and newer, the --metadata=disable-legacy-endpoints=true setting is automatically enabled. The next command below explicitly defines it for clarity.

Create the second node pool:

gcloud beta container node-pools create second-pool --cluster=simplecluster --zone=$MY_ZONE --num-nodes=1 --metadata=disable-legacy-endpoints=true --workload-metadata-from-node=SECURE

Run a Google Cloud-SDK pod

In Cloud Shell, launch a single instance of the Google Cloud-SDK container that will be run only on the second node pool with the protections enabled and not run as the root user:

kubectl run -it --rm gcloud --image=google/cloud-sdk:latest --restart=Never --overrides='{ "apiVersion": "v1", "spec": { "securityContext": { "runAsUser": 65534, "fsGroup": 65534 }, "nodeSelector": { "cloud.google.com/gke-nodepool": "second-pool" } } }' -- bash

You should now have a bash shell inside the pod’s container running on the node pool named second-pool. You should see the following:

nobody@gcloud:/$

It may take a few seconds for the container to start and the command prompt to open.
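A quick sanity check you can run in any shell, including this pod's, to confirm that hardening actually dropped root:

```shell
# The effective UID tells you whether the securityContext took effect:
# in the hardened pod above this prints 65534 ("nobody"), not 0.
uid=$(id -u)
if [ "$uid" -eq 0 ]; then
  echo "running as root (uid 0)"
else
  echo "running as non-root (uid $uid)"
fi
```

In the unhardened gcloud pod from earlier, the same check reports uid 0.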

Explore various blocked endpoints

With the second node pool configured with --workload-metadata-from-node=SECURE, the following command to retrieve the sensitive file kube-env will now fail:

nobody@gcloud:/$ curl -s -H "Metadata-Flavor: Google" http://metadata.google.internal/computeMetadata/v1/instance/attributes/kube-env

(Output)

This metadata endpoint is concealed.

But other commands to non-sensitive endpoints will still succeed if the proper HTTP header is passed:

nobody@gcloud:/$ curl -s -H "Metadata-Flavor: Google" http://metadata.google.internal/computeMetadata/v1/instance/name

(output)

gke-simplecluster-second-pool-8fbd68c5-gzzp

Deploy PodSecurityPolicy objects

In order to have the necessary permissions to proceed, grant explicit permissions to your own user account to become cluster-admin:

kubectl create clusterrolebinding clusteradmin --clusterrole=cluster-admin --user="$(gcloud config list account --format 'value(core.account)')"

(Output)

clusterrolebinding.rbac.authorization.k8s.io/clusteradmin created

Next, deploy a more restrictive PodSecurityPolicy to be applied to all authenticated users in the default namespace:

cat <<EOF | kubectl apply -f -
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: restrictive-psp
  annotations:
    seccomp.security.alpha.kubernetes.io/allowedProfileNames: 'docker/default'
    apparmor.security.beta.kubernetes.io/allowedProfileNames: 'runtime/default'
    seccomp.security.alpha.kubernetes.io/defaultProfileName: 'docker/default'
    apparmor.security.beta.kubernetes.io/defaultProfileName: 'runtime/default'
spec:
  privileged: false
  # Required to prevent escalations to root.
  allowPrivilegeEscalation: false
  # This is redundant with non-root + disallow privilege escalation,
  # but we can provide it for defense in depth.
  requiredDropCapabilities:
  - ALL
  # Allow core volume types.
  volumes:
  - 'configMap'
  - 'emptyDir'
  - 'projected'
  - 'secret'
  - 'downwardAPI'
  # Assume that persistentVolumes set up by the cluster admin are safe to use.
  - 'persistentVolumeClaim'
  hostNetwork: false
  hostIPC: false
  hostPID: false
  runAsUser:
    # Require the container to run without root privileges.
    rule: 'MustRunAsNonRoot'
  seLinux:
    # This policy assumes the nodes are using AppArmor rather than SELinux.
    rule: 'RunAsAny'
  supplementalGroups:
    rule: 'MustRunAs'
    ranges:
    # Forbid adding the root group.
    - min: 1
      max: 65535
  fsGroup:
    rule: 'MustRunAs'
    ranges:
    # Forbid adding the root group.
    - min: 1
      max: 65535
EOF

(output)

podsecuritypolicy.extensions/restrictive-psp created

Next, add the ClusterRole that provides the necessary ability to “use” this PodSecurityPolicy:

cat <<EOF | kubectl apply -f -
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: restrictive-psp
rules:
- apiGroups:
  - extensions
  resources:
  - podsecuritypolicies
  resourceNames:
  - restrictive-psp
  verbs:
  - use
EOF

(output)

clusterrole.rbac.authorization.k8s.io/restrictive-psp created

Finally, create a RoleBinding in the default namespace that allows any authenticated user permission to leverage the PodSecurityPolicy:

cat <<EOF | kubectl apply -f -
# All authenticated users in the default namespace
# can 'use' the 'restrictive-psp' PSP
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: restrictive-psp
  namespace: default
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: restrictive-psp
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: Group
  name: system:authenticated
EOF

(output)

rolebinding.rbac.authorization.k8s.io/restrictive-psp created
Note: In a real environment, consider replacing the system:authenticated user in the RoleBinding with the specific user or service accounts that you want to have the ability to create pods in the default namespace.
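As a sketch of that suggestion (the service account name app-deployer is hypothetical), the subjects stanza can name one specific ServiceAccount instead of the whole group:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: restrictive-psp
  namespace: default
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: restrictive-psp
subjects:
- kind: ServiceAccount
  name: app-deployer    # hypothetical service account
  namespace: default
```

This narrows pod-creation rights under the PSP to exactly the identities you intend, instead of every authenticated user.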

Enable PodSecurityPolicy

Enable the PodSecurityPolicy Admission Controller:

gcloud beta container clusters update simplecluster --zone $MY_ZONE --enable-pod-security-policy

Deploy a blocked pod that mounts the host filesystem

Because the account used to deploy the GKE cluster was granted cluster-admin permissions in a previous step, it’s necessary to create another separate “user” account to interact with the cluster and validate the PodSecurityPolicy enforcement.

To do this, run:

gcloud iam service-accounts create demo-developer

(output)

Created service account [demo-developer].

Next, run the following commands to grant the service account the permissions it needs: the ability to interact with the cluster and attempt to create pods:

MYPROJECT=$(gcloud config list --format 'value(core.project)')

gcloud projects add-iam-policy-binding "${MYPROJECT}" --role=roles/container.developer --member="serviceAccount:demo-developer@${MYPROJECT}.iam.gserviceaccount.com"

Obtain the service account credentials file by running:

gcloud iam service-accounts keys create key.json --iam-account "demo-developer@${MYPROJECT}.iam.gserviceaccount.com"

Configure kubectl to authenticate as this service account:

gcloud auth activate-service-account --key-file=key.json

To configure kubectl to use these credentials when communicating with the cluster, run:

gcloud container clusters get-credentials simplecluster --zone $MY_ZONE

Now, try to create another pod that mounts the underlying host filesystem / at the folder named /rootfs inside the container:

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: hostpath
spec:
  containers:
  - name: hostpath
    image: google/cloud-sdk:latest
    command: ["/bin/bash"]
    args: ["-c", "tail -f /dev/null"]
    volumeMounts:
    - mountPath: /rootfs
      name: rootfs
  volumes:
  - name: rootfs
    hostPath:
      path: /
EOF

The output confirms that the pod is blocked by the PSP:

Error from server (Forbidden): error when creating "STDIN": pods "hostpath" is forbidden: unable to validate against any pod security policy: [spec.volumes[0]: Invalid value: "hostPath": hostPath volumes are not allowed to be used]

Deploy another pod that meets the criteria of the restrictive-psp:

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: hostpath
spec:
  securityContext:
    runAsUser: 1000
    fsGroup: 2000
  containers:
  - name: hostpath
    image: google/cloud-sdk:latest
    command: ["/bin/bash"]
    args: ["-c", "tail -f /dev/null"]
EOF

(output)

pod/hostpath created

To view the annotation that gets added to the pod indicating which PodSecurityPolicy authorized the creation, run:

kubectl get pod hostpath -o=jsonpath="{ .metadata.annotations.kubernetes\.io/psp }"

Note: PodSecurityPolicy was deprecated in Kubernetes v1.21 and removed in v1.25.

Instead of using PodSecurityPolicy, you can enforce similar restrictions on Pods using either or both:

  • Pod Security Admission, the built-in admission controller that enforces the Pod Security Standards
  • A third-party admission plugin or policy engine that you deploy and configure yourself
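For example, with Pod Security Admission (built into modern Kubernetes versions), labeling a namespace is enough to enforce the restricted Pod Security Standard for every pod created in it. This is a sketch for orientation, not part of the lab above:

```yaml
# Namespaces labeled this way reject pods that violate the "restricted"
# Pod Security Standard (root users, hostPath volumes, privileged mode, ...).
apiVersion: v1
kind: Namespace
metadata:
  name: hardened-apps   # illustrative namespace name
  labels:
    pod-security.kubernetes.io/enforce: restricted
```

Unlike PSP, no RBAC "use" bindings are required; enforcement follows directly from the namespace label.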

I hope you liked the post. You can find more Kubernetes-related topics here. Subscribe to our newsletter or follow us on Twitter and LinkedIn.

Save your privacy, be ethical!
