---
title: Zero trust deployment with Kubernetes
date: 2022-08-11
slug: zero-trust-kubernetes-deployment
authors:
- lunik
description: How to securely deploy applications inside a Kubernetes cluster, especially untrusted applications or open-source software.
tags:
- kubernetes
- security
- firewall
- network
- zero-trust
- namespace
- capabilities
- system
- privileged
---
<!--
# CHANGELOG

-->

![cover](/blog/img/posts/2022-08-11-zero-trust-kubernetes-deployment/cover.jpg)

Using [open-source][opensource-wikipedia] software written by unknown people can be a little scary, even more so when I deploy it in a production environment at my company.
In my case, I have created a brand new [Kubernetes][kubernetes-website] cluster to host some private services on my local network, and I wanted to be sure that they don't do anything malicious on it.

<!-- truncate -->

## Isolate applications and services

The first thing to do is to isolate each service inside its own [Kubernetes namespace][kubernetes-doc-namespace]. This core [Kubernetes][kubernetes-website] resource isolates groups of resources within a single cluster. It also gives finer control over access, permissions and network rules for those resources.

A [Namespace][kubernetes-doc-namespace] can be created with the following command :
```shell
kubectl create namespace <insert-namespace-name-here>
```

Then I can list [Namespaces][kubernetes-doc-namespace] with :
```shell
kubectl get namespaces
```
Result :
```shell
NAME              STATUS   AGE
default           Active   4h52m
kube-system       Active   4h52m
kube-public       Active   4h52m
kube-node-lease   Active   4h52m
tiwabbit-prod     Active   2s
```

*Ignore the [Namespaces][kubernetes-doc-namespace] prefixed with `kube-`, which are used by the [Kubernetes control plane][kubernetes-doc-control-plane].*

I can now create resources inside my [Namespace][kubernetes-doc-namespace]. Let's start with a [Secret][kubernetes-doc-secret] for example :
```shell
kubectl \
  --namespace tiwabbit-prod \
  create secret generic \
  db-credentials \
  --from-literal=username=tiwabbit \
  --from-literal=password=mysecurepassword
```

If I list all [Secrets][kubernetes-doc-secret] in my cluster :
```shell
kubectl get secrets --all-namespaces
```
Result :
```shell
NAMESPACE       NAME                  TYPE                                  DATA   AGE
[...]
default         default-token-m59jl   kubernetes.io/service-account-token   3      4h58m
tiwabbit-prod   default-token-8zkr2   kubernetes.io/service-account-token   3      3m15s
tiwabbit-prod   db-credentials        Opaque                                2      49s
```

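In theory, only [Pods][kubernetes-doc-pod] running inside my [Namespace][kubernetes-doc-namespace] (`tiwabbit-prod`) can mount this [Secret][kubernetes-doc-secret] and read it, because a [Pod][kubernetes-doc-pod] can only reference [Secrets][kubernetes-doc-secret] that live in its own [Namespace][kubernetes-doc-namespace]. Anyone who is granted read access to [Secrets][kubernetes-doc-secret] through the [Kubernetes API][kubernetes-doc-api] can still read it, which is why permissions matter too.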
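To illustrate the point, here is a minimal sketch of a [Pod][kubernetes-doc-pod] mounting that [Secret][kubernetes-doc-secret] as files. The Pod name `secret-reader` and the mount path are assumptions chosen for the example; only `db-credentials` and `tiwabbit-prod` come from the commands above.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: secret-reader          # hypothetical name
  namespace: tiwabbit-prod     # must be the Namespace that holds the Secret
spec:
  containers:
  - name: app
    image: busybox:latest
    command: ["sleep", "infinity"]
    volumeMounts:
    - name: db-credentials
      mountPath: /etc/db-credentials   # hypothetical path, one file per Secret key
      readOnly: true
  volumes:
  - name: db-credentials
    secret:
      secretName: db-credentials
```

The same manifest applied in another [Namespace][kubernetes-doc-namespace] should leave the Pod stuck at creation, since the `db-credentials` Secret cannot be found there.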

## RBAC hardening

By default, every [Pod][kubernetes-doc-pod] without a specific configuration uses the `default` [ServiceAccount][kubernetes-doc-service-account] of the [Namespace][kubernetes-doc-namespace] it is running in.
This account has no permissions on the [Kubernetes API][kubernetes-doc-api] beyond basic API discovery, which is a good thing and should not be changed.

However, in some scenarios my application may need to call the [Kubernetes API][kubernetes-doc-api], for example to run batches using a [Kubernetes Job][kubernetes-doc-job]. Let's use that example to create a [ServiceAccount][kubernetes-doc-service-account] with those permissions and assign it to my application [Pod][kubernetes-doc-pod].

First I need to create a new [ServiceAccount][kubernetes-doc-service-account] :
```shell
kubectl \
  --namespace tiwabbit-prod \
  create serviceaccount \
  my-application
```

Then I need a [Role][kubernetes-doc-rbac] that implements the level of permissions my [Pod][kubernetes-doc-pod] needs. Here is the manifest :
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: my-application-batch-creator
  namespace: tiwabbit-prod
rules:
# "" is the core API group, where Pods live; an empty list would match no group
- apiGroups: [ "" ]
  resources: [ pods, pods/status, pods/log ]
  verbs: [ get, list, watch ]

- apiGroups: [ batch ]
  resources: [ jobs ]
  verbs: [ create, get, list, watch, patch, update, delete ]
```

Finally, I need to assign the [Role][kubernetes-doc-rbac] to my [ServiceAccount][kubernetes-doc-service-account] using a [RoleBinding][kubernetes-doc-rbac] with the following manifest definition :
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: my-application-permissions
  namespace: tiwabbit-prod
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: my-application-batch-creator
subjects:
- kind: ServiceAccount
  name: my-application
  namespace: tiwabbit-prod
```

Now let's create a [Pod][kubernetes-doc-pod] with the [ServiceAccount][kubernetes-doc-service-account] and review the actions allowed by the role. Here is a [Deployment][kubernetes-doc-deployment] manifest :
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-application
  namespace: tiwabbit-prod
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-application
  template:
    metadata:
      labels:
        app: my-application
    spec:
      serviceAccountName: my-application
      containers:
      - command:
        - sleep
        - 1d
        image: alpine:latest
        name: alpine
```

Next, I get a shell inside the [Pod][kubernetes-doc-pod] and follow the [Kubernetes documentation to install `kubectl`][doc-kubectl-install].

Let's check if I can list [Pods][kubernetes-doc-pod] by querying the Kubernetes API : 
```shell
kubectl get pods
```
Result : 
```shell
NAME                             READY   STATUS    RESTARTS   AGE
my-application-df5b5cb75-8twc5   1/1     Running   0          5m21s
```

Then I try to create a batch execution with a [Job][kubernetes-doc-job] :
```shell
kubectl create job my-application-batch --image alpine:latest -- echo "Hello World"
```

Listing all the [Jobs][kubernetes-doc-job] in the [Namespace][kubernetes-doc-namespace] :
```shell
kubectl get jobs
```
Result :
```shell
NAME                   COMPLETIONS   DURATION   AGE
my-application-batch   0/1           2s         2s
```

Listing the [Pods][kubernetes-doc-pod] in the [Namespace][kubernetes-doc-namespace] :
```shell
kubectl get pods
```
Result :
```shell
NAME                             READY   STATUS    RESTARTS   AGE
my-application-df5b5cb75-8twc5   1/1     Running   0          11m
my-application-batch-f2pf6       1/1     Running   0          3s
```

My [Job][kubernetes-doc-job] finished and I can delete it with : 
```shell
kubectl delete job/my-application-batch
```

### Conclusions

If the process in my [Pod][kubernetes-doc-pod] doesn't need to communicate with the [Kubernetes API][kubernetes-doc-api] (to create, query or delete resources), I use the `default` [ServiceAccount][kubernetes-doc-service-account], which gives no useful permissions to my [Pod][kubernetes-doc-pod]. Otherwise, I use a custom [ServiceAccount][kubernetes-doc-service-account] for each of my applications, with one [Role][kubernetes-doc-rbac] and [RoleBinding][kubernetes-doc-rbac] per use case.
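When a [Pod][kubernetes-doc-pod] never talks to the [Kubernetes API][kubernetes-doc-api], a possible extra step is to not mount the [ServiceAccount][kubernetes-doc-service-account] token at all. The sketch below assumes a hypothetical Pod named `my-isolated-pod`; the `automountServiceAccountToken` field is part of the Pod spec.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-isolated-pod        # hypothetical name
  namespace: tiwabbit-prod
spec:
  # No API token is mounted in the containers, so a compromised
  # process has no credentials to present to the Kubernetes API.
  automountServiceAccountToken: false
  containers:
  - name: app
    image: busybox:latest
    command: ["sleep", "infinity"]
```

The same field can also be set on the [ServiceAccount][kubernetes-doc-service-account] itself, in which case it applies to every Pod using that account unless the Pod overrides it.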

## Prevent usage of root user or privilege escalation

Kubernetes provides several security options that can be applied to Pods and their underlying containers. Most of them can be configured with the [SecurityContext][kubernetes-doc-security-context] block, at the Pod level or at the container level, as follows :

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-secure-pod
spec:
  securityContext: {}
  containers:
  - name: my-secure-container
    image: nginx:latest
    securityContext: {}
  - name: my-second-secure-container
    image: busybox:latest
    command: ["sleep", "infinity"]
    securityContext: {}
```

These options can also be set in the Pod template of a [StatefulSet][kubernetes-doc-statefulset] or a [Deployment][kubernetes-doc-deployment].

### Running pod as non root

The easiest way to secure the [Pod][kubernetes-doc-pod] is to override the [UID][linux-uid-wikipedia] and [GID][linux-gid-wikipedia] of the user running the main process. This way, whatever the default user in the image, [Kubernetes][kubernetes-website] will replace it. A [UID][linux-uid-wikipedia] and [GID][linux-gid-wikipedia] greater than or equal to `1000` should be used, since lower values are usually reserved for system accounts.

Three parameters can be used :

- `runAsUser` : sets the [UID][linux-uid-wikipedia] of the container's main process
- `runAsGroup` : sets the primary [GID][linux-gid-wikipedia] of the container's main process
- `fsGroup` : if specified, the container processes also belong to that supplementary group, and mounted volumes that support ownership management, as well as the files created in them, are owned by that [GID][linux-gid-wikipedia].
  - This last parameter can increase the mount time of a large volume, because [Kubernetes][kubernetes-website] ensures that files are owned by the group defined by `fsGroup`. This results in a recursive ownership and permission change (like `chown -R` and `chmod -R`) over the whole volume.
  - `fsGroupChangePolicy` can be used to change this behaviour with those values :
    - `OnRootMismatch` : only check the root directory of the volume, and change the whole filesystem only if its ownership or permissions don't match
    - `Always` : change ownership and permissions every time the volume is mounted

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-secure-pod
spec:
  securityContext:
    runAsUser: 1000
    runAsGroup: 1000
    fsGroup: 2000
    fsGroupChangePolicy: OnRootMismatch
  containers:
  - name: my-secure-container
    image: busybox:latest
    command: ["sleep", "infinity"]
    securityContext:
      # container-level values take precedence over the Pod-level ones above
      runAsUser: 2000
      runAsGroup: 2000
```

Running as a non-`root` user can be enforced with the `runAsNonRoot` parameter. The container is then refused at startup if it would run as [UID][linux-uid-wikipedia] `0` :

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-secure-pod
spec:
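  securityContext:
    runAsNonRoot: true
    # busybox runs as root by default, so an explicit non-root UID is needed for the container to start
    runAsUser: 1000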
  containers:
  - name: my-secure-container
    image: busybox:latest
    command: ["sleep", "infinity"]
    securityContext: {}
```

### Use a read-only filesystem

Preventing all write, update and delete operations on the root filesystem stops a malicious process (or a breached application) from modifying the container environment to its advantage.
The `readOnlyRootFilesystem` option achieves that :

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-secure-pod
spec:
  securityContext: {}
  containers:
  - name: my-secure-container
    image: busybox:latest
    command: ["sleep", "infinity"]
    securityContext:
      readOnlyRootFilesystem: true
```

If the application needs to write temporary or application data at runtime, I can still create an [EmptyDir][kubernetes-doc-empty-dir-volume] volume and mount it inside the container :

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-secure-pod
spec:
  securityContext: {}
  containers:
  - name: my-secure-container
    image: busybox:latest
    command: ["sleep", "infinity"]
    securityContext:
      readOnlyRootFilesystem: true
    volumeMounts:
    - mountPath: /data
      name: my-data
  volumes:
  - name: my-data
    emptyDir: {}
```
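Since this writable volume is the only place left for an untrusted process to fill up, it may be worth bounding it. The variant below is a sketch under assumptions: the `/tmp` mount path, the `my-tmp` volume name and the `64Mi` value are illustrative, while `medium` and `sizeLimit` are existing [EmptyDir][kubernetes-doc-empty-dir-volume] fields.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-secure-pod
spec:
  securityContext: {}
  containers:
  - name: my-secure-container
    image: busybox:latest
    command: ["sleep", "infinity"]
    securityContext:
      readOnlyRootFilesystem: true
    volumeMounts:
    - mountPath: /tmp          # hypothetical path for scratch files
      name: my-tmp
  volumes:
  - name: my-tmp
    emptyDir:
      medium: Memory           # back the volume with RAM (tmpfs) instead of the node disk
      sizeLimit: 64Mi          # assumed limit, to be sized for the application
```

With `medium: Memory`, the data written to the volume counts against the memory usage of the container, so the limit should stay consistent with any memory limit set on it.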

### Prevent privilege escalation

On Linux, a process can gain more privileges than its parent, for example by executing a binary that has the [setuid or setgid][linux-setuid-setgid-wikipedia] bit set, which changes its effective [UID][linux-uid-wikipedia] or [GID][linux-gid-wikipedia].

Setting the `allowPrivilegeEscalation` option to `false` prevents this from happening, by setting the `no_new_privs` flag on the container process.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-secure-pod
spec:
  securityContext: {}
  containers:
  - name: my-secure-container
    image: busybox:latest
    command: ["sleep", "infinity"]
    securityContext:
      allowPrivilegeEscalation: false
```

### Dropping all capabilities

As with privilege escalation, Linux gives [Capabilities][linux-capabilities-manpage] to processes, allowing them to perform specific privileged operations, like binding network ports under `1024`.

We want to give our process only the capabilities it needs to run.
For a process that only needs to bind a port under `1024`, this gives :

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-secure-pod
spec:
  securityContext: {}
  containers:
  - name: my-second-secure-container
    image: busybox:latest
    command: ["sleep", "infinity"]
    securityContext:
      capabilities:
        drop:
        - ALL
        add:
        - NET_BIND_SERVICE
```
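Putting the options of this section together, a hardened [Deployment][kubernetes-doc-deployment] template could look like the sketch below. The name `my-secure-application` and the specific UID/GID values are assumptions for illustration; every field is one of those described above.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-secure-application    # hypothetical name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-secure-application
  template:
    metadata:
      labels:
        app: my-secure-application
    spec:
      securityContext:
        runAsNonRoot: true       # refuse to start containers running as UID 0
        runAsUser: 1000
        runAsGroup: 1000
        fsGroup: 2000
        fsGroupChangePolicy: OnRootMismatch
      containers:
      - name: my-secure-container
        image: busybox:latest
        command: ["sleep", "infinity"]
        securityContext:
          readOnlyRootFilesystem: true
          allowPrivilegeEscalation: false
          capabilities:
            drop:
            - ALL                # add back only what the process really needs
        volumeMounts:
        - mountPath: /data       # only writable location in the container
          name: my-data
      volumes:
      - name: my-data
        emptyDir: {}
```

Starting from the strictest settings and relaxing them one by one when the application fails is usually easier than trying to tighten a permissive configuration afterwards.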

## Network communication hardening

By default, every Pod in every Namespace can talk to every other Pod. If you host multiple applications, or even multiple clients, on the same Kubernetes cluster, you should not allow them to communicate (at least not by default).

That's why [NetworkPolicies][kubernetes-doc-networkpolicy] should be created with a `deny all` rule by default, with point-to-point connections opened only when services or clients need to communicate with each other.

The default `deny all` configuration for incoming traffic looks like this :

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
spec:
  podSelector: {}
  policyTypes:
  - Ingress
```
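The policy above only covers incoming traffic. For an untrusted application, outgoing traffic can be denied as well. The following is a sketch, assuming the `tiwabbit-prod` Namespace used earlier and a cluster DNS running in `kube-system` (an assumption about the cluster); the policy names are hypothetical.

```yaml
# Deny all incoming and outgoing traffic for every Pod of the Namespace
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all         # hypothetical name
  namespace: tiwabbit-prod
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress
---
# Re-open DNS resolution, which the egress deny would otherwise block
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress         # hypothetical name
  namespace: tiwabbit-prod
spec:
  podSelector: {}
  policyTypes:
  - Egress
  egress:
  - to:
    - namespaceSelector:
        matchLabels:
          # label set automatically by Kubernetes on every Namespace
          kubernetes.io/metadata.name: kube-system
    ports:
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53
```

Keep in mind that [NetworkPolicies][kubernetes-doc-networkpolicy] are only enforced when the cluster's network plugin supports them; otherwise the manifests are accepted but have no effect.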

Then, if you have a third-party application that needs to access your application Pods :

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: test-network-policy
  namespace: my-namespace
spec:
  podSelector:
    matchLabels:
      application: my-application
      component: api
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              project: my-friend-namespace
        - podSelector:
            matchLabels:
              application: his-application
      ports:
        - protocol: TCP
          port: 8080
```
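In the policy above, `namespaceSelector` and `podSelector` are two separate items of the `from` list, so traffic is allowed when either one matches. To allow only the `his-application` Pods running inside the friend's Namespace, both selectors can be placed in the same item. A sketch with the same labels, under a hypothetical policy name:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-friend-application   # hypothetical name
  namespace: my-namespace
spec:
  podSelector:
    matchLabels:
      application: my-application
      component: api
  policyTypes:
    - Ingress
  ingress:
    - from:
        # single list item: both selectors must match (logical AND)
        - namespaceSelector:
            matchLabels:
              project: my-friend-namespace
          podSelector:
            matchLabels:
              application: his-application
      ports:
        - protocol: TCP
          port: 8080
```

The only difference is the missing `-` in front of `podSelector`, which is easy to overlook when reviewing a manifest.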

<!-- links -->

[opensource-wikipedia]: https://en.wikipedia.org/wiki/Open-source_model
[kubernetes-website]: https://kubernetes.io
[kubernetes-doc-namespace]: https://kubernetes.io/docs/concepts/overview/working-with-objects/namespaces/
[kubernetes-doc-control-plane]: https://kubernetes.io/docs/concepts/overview/components/#control-plane-components
[kubernetes-doc-secret]: https://kubernetes.io/docs/concepts/configuration/secret/
[kubernetes-doc-pod]: https://kubernetes.io/docs/concepts/workloads/pods/
[kubernetes-doc-service-account]: https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/
[kubernetes-doc-api]: https://kubernetes.io/docs/concepts/overview/kubernetes-api/
[kubernetes-doc-job]: https://kubernetes.io/docs/concepts/workloads/controllers/job/
[kubernetes-doc-rbac]: https://kubernetes.io/docs/reference/access-authn-authz/rbac/
[kubernetes-doc-deployment]: https://kubernetes.io/docs/concepts/workloads/controllers/deployment/
[kubernetes-doc-statefulset]: https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/
[kubernetes-doc-security-context]: https://kubernetes.io/docs/tasks/configure-pod-container/security-context/
[kubernetes-doc-empty-dir-volume]: https://kubernetes.io/docs/concepts/storage/volumes/#emptydir
[kubernetes-doc-networkpolicy]: https://kubernetes.io/docs/concepts/services-networking/network-policies/
[linux-uid-wikipedia]: https://en.wikipedia.org/wiki/User_identifier
[linux-gid-wikipedia]: https://en.wikipedia.org/wiki/Group_identifier
[linux-setuid-setgid-wikipedia]: https://en.wikipedia.org/wiki/Setuid
[linux-capabilities-manpage]: https://manpages.ubuntu.com/manpages/jammy/en/man7/capabilities.7.html
[doc-kubectl-install]: https://kubernetes.io/docs/tasks/tools/#kubectl