Zero trust deployment with Kubernetes
Using open source software written by unknown people can sometimes be a little scary, even more so when I deploy it in a production environment at my company. In my case, I created a brand new Kubernetes cluster to host some private services on my local network, and I wanted to be sure that they don't do anything malicious on that network.
Isolate applications and services
The first thing to do is to isolate each service inside a Kubernetes Namespace. This is a core Kubernetes resource that isolates groups of resources within a single cluster. It then allows finer control over access, permissions and networking for those resources.
A Namespace can be created easily with the following command :
kubectl create namespace tiwabbit-prod
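As a declarative alternative, the Namespace can be described in a manifest and applied with kubectl apply -f, which makes it easier to keep in version control. A minimal sketch, using the tiwabbit-prod name from this article :

```yaml
apiVersion: v1
kind: Namespace
metadata:
  # Same name as the Namespace used throughout this article
  name: tiwabbit-prod
```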
Then I can list Namespaces with :
kubectl get namespaces
Result :
NAME              STATUS   AGE
default           Active   4h52m
kube-system       Active   4h52m
kube-public       Active   4h52m
kube-node-lease   Active   4h52m
tiwabbit-prod     Active   2s
Ignore the Namespaces prefixed with kube-, which are reserved for the Kubernetes control plane.
I can now create resources inside my Namespace. Let's start with a Secret for example :
kubectl \
  --namespace tiwabbit-prod \
  create secret generic \
  db-credentials \
  --from-literal=username=tiwabbit \
  --from-literal=password=mysecurepassword
If I list all Secrets in my cluster :
kubectl get secrets --all-namespaces
Result :
NAMESPACE       NAME                  TYPE                                  DATA   AGE
[...]
default         default-token-m59jl   kubernetes.io/service-account-token   3      4h58m
tiwabbit-prod   default-token-8zkr2   kubernetes.io/service-account-token   3      3m15s
tiwabbit-prod   db-credentials        Opaque                                2      49s
In theory, only Pods running inside my Namespace (tiwabbit-prod) can mount this Secret and read it.
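To illustrate how a Pod in the same Namespace would consume the Secret, here is a minimal sketch. The Pod name db-client, the DB_USERNAME variable and the /etc/db-credentials mount path are hypothetical; only the Secret name and its keys come from the command above.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: db-client              # hypothetical name
  namespace: tiwabbit-prod     # must be the Namespace that holds the Secret
spec:
  containers:
    - name: db-client
      image: busybox:latest
      command: ["sleep", "infinity"]
      env:
        # Expose a single key of the Secret as an environment variable
        - name: DB_USERNAME
          valueFrom:
            secretKeyRef:
              name: db-credentials
              key: username
      volumeMounts:
        # Expose every key of the Secret as a file under this directory
        - name: db-credentials
          mountPath: /etc/db-credentials
          readOnly: true
  volumes:
    - name: db-credentials
      secret:
        secretName: db-credentials
```

A Pod can only reference Secrets of its own Namespace, so the same manifest applied in another Namespace would not find db-credentials.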
Securing access with RBAC
By default, every Pod without specific configuration uses the default ServiceAccount of the Namespace it is running in.
That ServiceAccount doesn't have any rights on the resources of the Kubernetes API, which is a great thing and should not be changed.
However, in some scenarios my application may need to call the Kubernetes API, for example if it needs to run batches using a Kubernetes Job. Let's use that example to create a ServiceAccount with those permissions and assign it to my application Pod.
First I need to create a new ServiceAccount :
kubectl \
  --namespace tiwabbit-prod \
  create serviceaccount \
  my-application
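The ServiceAccount can be declared as a manifest, which keeps it next to the Role and RoleBinding that follow. A minimal sketch, assuming the name my-application that the RoleBinding and the Deployment refer to :

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  # Name referenced by the RoleBinding subject and by serviceAccountName
  name: my-application
  namespace: tiwabbit-prod
```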
Then I need a Role that implements the level of permission my Pod needs. Here is the manifest :
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: my-application-batch-creator
  namespace: tiwabbit-prod
rules:
  # "" is the core API group, where Pods live
  - apiGroups: [ "" ]
    resources: [ pods, pods/status, pods/log ]
    verbs: [ get, list, watch ]
  - apiGroups: [ batch ]
    resources: [ jobs ]
    verbs: [ create, get, list, watch, patch, update, delete ]
Finally, I need to assign the Role to my ServiceAccount using a RoleBinding with the following manifest definition :
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: my-application-permissions
  namespace: tiwabbit-prod
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: my-application-batch-creator
subjects:
  - kind: ServiceAccount
    name: my-application
    namespace: tiwabbit-prod
Now let's create a Pod with the ServiceAccount and review the actions allowed by the role. Here is a Deployment manifest :
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-application
  namespace: tiwabbit-prod
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-application
  template:
    metadata:
      labels:
        app: my-application
    spec:
      serviceAccountName: my-application
      containers:
        - command:
            - sleep
            - 1d
          image: alpine:latest
          name: alpine
I then get a shell inside the Pod and follow the Kubernetes documentation to install kubectl.
Let's check if I can list Pods by querying the Kubernetes API :
kubectl --namespace tiwabbit-prod get pods
Then I should try to create a batch execution with a Job :
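As an illustration of such a Job, a minimal manifest could look like the sketch below. The name my-application-batch is inferred from the Pod listed further down; the image, the command and the file name job.yaml are hypothetical placeholders.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: my-application-batch   # inferred from the Pod name in the listing
  namespace: tiwabbit-prod
spec:
  backoffLimit: 0              # do not retry a failed batch
  template:
    spec:
      restartPolicy: Never     # Jobs require Never or OnFailure
      containers:
        - name: batch
          image: busybox:latest                                # hypothetical image
          command: ["sh", "-c", "echo processing; sleep 30"]   # hypothetical workload
```

From inside the Pod it could be submitted with kubectl create -f job.yaml, which the Role permits through the create verb on jobs.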
Listing all the Jobs in the Namespace :
kubectl --namespace tiwabbit-prod get jobs
Listing the Pods in the Namespace :
kubectl --namespace tiwabbit-prod get pods
Result :
NAME                             READY   STATUS    RESTARTS   AGE
my-application-df5b5cb75-8twc5   1/1     Running   0          11m
my-application-batch-f2pf6       1/1     Running   0          3s
My Job finished and I can delete it with :
kubectl --namespace tiwabbit-prod delete job my-application-batch
Conclusions
If the process in my Pod doesn't need to communicate with the Kubernetes API (to create, query or delete resources), I use the default ServiceAccount, which gives zero permissions to my Pod. Otherwise, I use a custom ServiceAccount for each of my apps, with a Role and a RoleBinding for each of my application use cases.
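When a Pod never talks to the API, the ServiceAccount token does not need to be mounted in it at all. This goes slightly beyond the setup described here, so take it as an optional sketch: automountServiceAccountToken is a standard Pod field, while the Pod name is hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-offline-application   # hypothetical name
  namespace: tiwabbit-prod
spec:
  # No API token is mounted, so the process cannot authenticate to the API
  automountServiceAccountToken: false
  containers:
    - name: my-offline-application
      image: busybox:latest
      command: ["sleep", "infinity"]
```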
Prevent usage of the root user and privilege escalation
Kubernetes offers multiple security options that can be applied to Pods and their underlying containers. Most of them are configured with the securityContext block, at the Pod level and at the container level, as follows :
apiVersion: v1
kind: Pod
metadata:
  name: my-secure-pod
spec:
  securityContext: {}
  containers:
    - name: my-secure-container
      image: nginx:latest
      securityContext: {}
    - name: my-second-secure-container
      image: busybox:latest
      command: ["sleep", "infinity"]
      securityContext: {}
Obviously such options can be added to a StatefulSet or Deployment template.
Running Pods as non-root
The easiest way to secure the Pod is to override the UID and GID of the user running the main process. This way, whatever the default user in the image is, Kubernetes overrides it. A UID and GID greater than or equal to 1000 should be used.
Three parameters can be used :
- runAsUser : changes the UID of the container's main process
- runAsGroup : changes the GID of the container's main process
- fsGroup : if specified, the user also belongs to that group, and all files and directories created in mounted volumes take that GID as group owner.
  - This last parameter can increase the mount time of an external volume, because Kubernetes makes sure that the files are owned by the group defined by fsGroup. This results in a recursive ownership and permission change (similar to chown -R and chmod -R) over the whole volume. fsGroupChangePolicy can be used to change this behaviour with the following values :
    - OnRootMismatch : only check the root directory of the volume, and change the whole filesystem only if its group does not match
    - Always : always change ownership and permissions when the volume is mounted
apiVersion: v1
kind: Pod
metadata:
  name: my-secure-pod
spec:
  securityContext:
    runAsUser: 1000
    runAsGroup: 1000
    fsGroup: 2000
    fsGroupChangePolicy: OnRootMismatch
  containers:
    - name: my-secure-container
      image: busybox:latest
      command: ["sleep", "infinity"]
      securityContext:
        runAsUser: 2000
        runAsGroup: 2000
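fsGroup and fsGroupChangePolicy only come into play when a volume is mounted, and the Pod above has none. Here is a sketch with a persistent volume, assuming a PersistentVolumeClaim named my-data-claim (hypothetical) already exists in the same Namespace. Note that fsGroupChangePolicy has no effect on ephemeral volumes such as emptyDir, Secrets or ConfigMaps.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-secure-pod-with-volume   # hypothetical name
spec:
  securityContext:
    runAsUser: 1000
    runAsGroup: 1000
    fsGroup: 2000                         # group owner applied to the volume content
    fsGroupChangePolicy: OnRootMismatch   # skip the recursive change when the root already matches
  containers:
    - name: my-secure-container
      image: busybox:latest
      command: ["sleep", "infinity"]
      volumeMounts:
        - name: my-data
          mountPath: /data
  volumes:
    - name: my-data
      persistentVolumeClaim:
        claimName: my-data-claim          # hypothetical claim
```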
Running as non-root can be enforced with the runAsNonRoot parameter. With it, the kubelet refuses to start any container that would run as UID 0 :
apiVersion: v1
kind: Pod
metadata:
  name: my-secure-pod
spec:
  securityContext:
    runAsNonRoot: true
  containers:
    - name: my-secure-container
      image: busybox:latest
      command: ["sleep", "infinity"]
      securityContext: {}
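The busybox image runs as root by default, so with runAsNonRoot alone the container is not started and stays in a configuration error state. That is the check doing its job. A sketch that combines the check with an explicit non-root UID, so that the Pod actually starts (the Pod name is hypothetical) :

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-non-root-pod   # hypothetical name
spec:
  securityContext:
    runAsNonRoot: true    # refuse to start a container running as UID 0
    runAsUser: 1000       # explicit non-root UID, overrides the image default
    runAsGroup: 1000
  containers:
    - name: my-secure-container
      image: busybox:latest
      command: ["sleep", "infinity"]
```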
Use a read-only filesystem
Preventing all write, update and delete operations on the root filesystem stops a malicious process (or a breached application) from taking advantage of the container environment and modifying it.
The readOnlyRootFilesystem option achieves that :
apiVersion: v1
kind: Pod
metadata:
  name: my-secure-pod
spec:
  securityContext: {}
  containers:
    - name: my-secure-container
      image: busybox:latest
      command: ["sleep", "infinity"]
      securityContext:
        readOnlyRootFilesystem: true
If the application needs to write temporary or application data at runtime, I can still create an emptyDir volume and mount it inside the container :
apiVersion: v1
kind: Pod
metadata:
  name: my-secure-pod
spec:
  securityContext: {}
  containers:
    - name: my-secure-container
      image: busybox:latest
      command: ["sleep", "infinity"]
      securityContext:
        readOnlyRootFilesystem: true
      volumeMounts:
        - mountPath: /data
          name: my-data
  volumes:
    - name: my-data
      emptyDir: {}
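Prevent privilege escalation
The Linux kernel exposes low-level functions that let a process change its current UID or GID, for example the setuid or setgid primitives.
Setting the allowPrivilegeEscalation option to false prevents this from being abused : the container process is started with the no_new_privs flag, so it cannot gain more privileges than its parent, for instance through a setuid binary.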
apiVersion: v1
kind: Pod
metadata:
  name: my-secure-pod
spec:
  securityContext: {}
  containers:
    - name: my-secure-container
      image: busybox:latest
      command: ["sleep", "infinity"]
      securityContext:
        allowPrivilegeEscalation: false
Dropping all capabilities
Similarly to privilege escalation, Linux grants capabilities to processes, allowing them to make highly privileged system calls, like binding network ports under 1024.
We want to give our process only the capabilities it needs to run. In the case of this example, we can have :
apiVersion: v1
kind: Pod
metadata:
  name: my-secure-pod
spec:
  securityContext: {}
  containers:
    - name: my-second-secure-container
      image: busybox:latest
      command: ["sleep", "infinity"]
      securityContext:
        capabilities:
          drop:
            - ALL
          add:
            - NET_BIND_SERVICE
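Putting the options of this section together in a Deployment template could look like the sketch below. The name my-secure-application is hypothetical, and the values are the ones used in the individual examples; an application that must bind a port under 1024 would add NET_BIND_SERVICE back.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-secure-application   # hypothetical name
  namespace: tiwabbit-prod
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-secure-application
  template:
    metadata:
      labels:
        app: my-secure-application
    spec:
      # Pod-level settings, inherited by every container
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        runAsGroup: 1000
      containers:
        - name: my-secure-container
          image: busybox:latest
          command: ["sleep", "infinity"]
          # Container-level settings
          securityContext:
            readOnlyRootFilesystem: true
            allowPrivilegeEscalation: false
            capabilities:
              drop:
                - ALL
          volumeMounts:
            # Writable scratch space, since the root filesystem is read-only
            - name: my-data
              mountPath: /data
      volumes:
        - name: my-data
          emptyDir: {}
```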
Network communication hardening
In the world of Pods, by default every Pod in every Namespace can talk to every other Pod. If you have multiple applications, or even multiple clients, on the same Kubernetes cluster, you should not allow them to communicate (at least not by default).
That's why NetworkPolicies must be created with deny all by default. Then open point-to-point connections when services or clients need to communicate with each other. Note that NetworkPolicies are only enforced if the cluster's network plugin supports them.
The default deny all config for incoming traffic looks like this :
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
spec:
  podSelector: {}
  policyTypes:
    - Ingress
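The policy above only blocks incoming traffic; Pods can still open connections to anything. A sketch of a stricter variant that blocks both directions, assuming it is applied to the tiwabbit-prod Namespace (the policy name is hypothetical) :

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all   # hypothetical name
  namespace: tiwabbit-prod
spec:
  podSelector: {}          # empty selector: every Pod of the Namespace
  policyTypes:
    - Ingress
    - Egress
```

With egress denied, DNS lookups are blocked as well, so a dedicated rule for the cluster DNS is usually needed before applications can resolve names again.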
Then, if a third-party application needs to access your application Pods :
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: test-network-policy
  namespace: my-namespace
spec:
  podSelector:
    matchLabels:
      application: my-application
      component: api
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              project: my-friend-namespace
        - podSelector:
            matchLabels:
              application: his-application
      ports:
        - protocol: TCP
          port: 8080
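In the policy above, namespaceSelector and podSelector are two separate entries of the from list. Traffic is therefore accepted from any Pod in a Namespace labelled project: my-friend-namespace, or from Pods labelled application: his-application in the policy's own Namespace. If the intent is to allow only the his-application Pods running in my-friend-namespace, both selectors have to sit in the same entry. A sketch of that variant (the policy name is hypothetical) :

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-his-application-only   # hypothetical name
  namespace: my-namespace
spec:
  podSelector:
    matchLabels:
      application: my-application
      component: api
  policyTypes:
    - Ingress
  ingress:
    - from:
        # Single entry: both selectors must match (logical AND)
        - namespaceSelector:
            matchLabels:
              project: my-friend-namespace
          podSelector:
            matchLabels:
              application: his-application
      ports:
        - protocol: TCP
          port: 8080
```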