Formatting NVMe disks on IOPS nodes
Find out how to partition and format NVMe disks on OVHcloud Managed Kubernetes IOPS nodes
Last updated May 4th, 2020.
When you order IOPS nodes for your OVHcloud Managed Kubernetes cluster, the NVMe disks are neither partitioned nor formatted, so they cannot be used to create Kubernetes persistent volumes. In this tutorial we will guide you through partitioning and formatting the NVMe disks on your existing nodes as well as on future ones.
Kubernetes IOPS nodes are very useful for running applications that rely heavily on disk performance, as IOPS nodes have one or several local NVMe SSD disks. Applications like Elasticsearch for indexing or Spark for big data are very good candidates for this.
This tutorial assumes that you already have a working OVHcloud Managed Kubernetes cluster, and some basic knowledge of how to operate it. If you want to know more about those topics, please look at the deploying a Hello World application documentation.
To be sure that all our nodes with NVMe disks are properly prepared, we are going to create a daemonset. This daemonset will create a new pod on every node in the cluster that has the label disktype=nvme. If the node is an IOPS node and the NVMe disks are not already partitioned, the pod will partition and format them.
One huge advantage of this method is that if we add a new IOPS node to the cluster later, its disks will automatically be prepared by the daemonset as soon as we add the disktype=nvme label to the new node.
Let's create a format-nvme-configmap.yaml file:
apiVersion: v1
kind: ConfigMap
metadata:
  name: format-nvme-config
  namespace: kube-system
data:
  partition_number: "3"
In this example we are going to split the NVMe disks into 3 equal-sized partitions. Each partition can later be associated with a Kubernetes persistent volume, which means we will be able to create three local NVMe persistent volumes per node. If you need more or fewer partitions, change the partition_number parameter in the configmap definition.
Create the configmap:
kubectl apply -f format-nvme-configmap.yaml
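If you want, you can check that the configmap was created in the kube-system namespace:
kubectl get configmap format-nvme-config -n kube-system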
Let's create a format-nvme-daemonset.yaml file:
apiVersion: v1
kind: ConfigMap
metadata:
  name: format-nvme-script
  namespace: kube-system
data:
  format-nvme.sh: |
    #!/bin/bash
    set -e
    # Only run if an NVMe disk is present and the nvme volume group has not been created yet
    if [ -b /dev/nvme0n1 ] && [ ! -d /dev/nvme ]
    then
        echo "Installing LVM tools"
        apt-get -qq update
        apt-get -y -qq -o Dpkg::Options::="--force-confold" install lvm2
        echo "Disable LVM lock"
        sed -i 's/locking_type = 1/locking_type = 0/' /etc/lvm/lvm.conf
        pv_list=""
        for disk_name in /dev/nvme[0-9]n1
        do
            echo "Creating physical volume for $disk_name"
            pvcreate "$disk_name"
            pv_list="$pv_list$disk_name "
        done
        echo "Creating volume group nvme"
        vgcreate nvme $pv_list
        # Each logical volume gets an equal share of the volume group
        PERCENT=$(( 100 / PARTITION_NUMBER ))
        for i in $(seq 1 "$PARTITION_NUMBER")
        do
            echo "Creating logical volume nvme$i"
            lvcreate -Zn -n "nvme$i" -l $PERCENT%VG nvme
            while [ ! -b "/dev/nvme/nvme$i" ]
            do
                echo "Wait for logical volume /dev/nvme/nvme$i"
                sleep 1
            done
            echo "Creating filesystem ext4 on nvme$i"
            mkfs -t ext4 "/dev/nvme/nvme$i"
        done
        echo "Enable LVM lock"
        sed -i 's/locking_type = 0/locking_type = 1/' /etc/lvm/lvm.conf
    fi
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: format-nvme
  namespace: kube-system
  labels:
    k8s-app: format-nvme
spec:
  selector:
    matchLabels:
      name: format-nvme
  template:
    metadata:
      labels:
        name: format-nvme
    spec:
      nodeSelector:
        disktype: nvme
      containers:
      - name: format-nvme
        image: ubuntu
        command: ["/bin/sh","-c"]
        args: ["/script/format-nvme.sh; while true; do echo Sleeping && sleep 3600; done"]
        env:
        - name: PARTITION_NUMBER
          valueFrom:
            configMapKeyRef:
              name: format-nvme-config
              key: partition_number
        volumeMounts:
        - name: format-nvme-script
          mountPath: /script
        - name: dev
          mountPath: /dev
        - name: etc-lvm
          mountPath: /etc/lvm
        securityContext:
          allowPrivilegeEscalation: true
          privileged: true
      volumes:
      - name: format-nvme-script
        configMap:
          name: format-nvme-script
          defaultMode: 0755
      - name: dev
        hostPath:
          path: /dev
      - name: etc-lvm
        hostPath:
          path: /etc/lvm
And deploy the daemonset on the cluster:
kubectl apply -f format-nvme-daemonset.yaml
Currently, the daemonset should not have started any pod in our cluster, as none of our nodes has the label disktype=nvme.
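You can verify this by looking at the daemonset itself, which should report 0 desired pods for now:
kubectl get daemonset format-nvme -n kube-system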
Let's fix this by adding the disktype=nvme label to our IOPS nodes. In this example our IOPS nodes are called node-nvme-1 and node-nvme-2; please modify this accordingly.
kubectl label nodes node-nvme-1 node-nvme-2 disktype=nvme
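You can check that the label has been applied by listing the nodes that match it:
kubectl get nodes -l disktype=nvme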
We should now have one format-nvme pod running on each cluster node that has the label disktype=nvme:
kubectl get pod -n kube-system -l 'name=format-nvme' -o wide
Under the hood, all our IOPS nodes should now have three equally sized partitions, /dev/nvme/nvme1, /dev/nvme/nvme2, and /dev/nvme/nvme3, created on their NVMe disks.
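If you want to double-check this on a node, you can list the logical volumes from inside the format-nvme pod running there (the pod name below is only a placeholder; use one of the names returned by the previous command):
kubectl exec -it <format-nvme-pod-name> -n kube-system -- ls -l /dev/nvme/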
First we need to create a new storage class. Let's create a local-nvme-storage-class.yaml file:
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: local-nvme
provisioner: kubernetes.io/no-provisioner
allowVolumeExpansion: true
volumeBindingMode: Immediate
And apply this to our cluster:
kubectl apply -f local-nvme-storage-class.yaml
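You can confirm that the new storage class is now available in the cluster:
kubectl get storageclass local-nvme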
Then let's create a new persistent volume on the first partition of our first IOPS node, using our local NVMe storage class. In this example the node is called node-nvme-1 and the partition used is the first one, /dev/nvme/nvme1; please modify this accordingly. To do this, let's create a local-nvme-persistent-volume.yaml file:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: local-pv
spec:
  capacity:
    storage: 50Gi
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-nvme
  local:
    path: /dev/nvme/nvme1
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - node-nvme-1
And apply this to create the persistent volume:
kubectl apply -f local-nvme-persistent-volume.yaml
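The persistent volume should appear with the Available status until a claim binds to it:
kubectl get pv local-pv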
Then, we need to create a persistent volume claim. Let's create a local-nvme-persistent-volume-claim.yaml file:
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: local-pvc
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: local-nvme
  resources:
    requests:
      storage: 50Gi
And apply this to create a 50Gi claim:
kubectl apply -f local-nvme-persistent-volume-claim.yaml
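As the storage class uses Immediate binding, the claim should be bound to the local-pv volume shortly after creation; you can check it with:
kubectl get pvc local-pvc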
Let's now create an Nginx pod using the persistent volume claim as its webroot folder. For this, let's create a local-nvme-nginx-pod.yaml file:
apiVersion: v1
kind: Pod
metadata:
  name: local-nvme-nginx
spec:
  volumes:
  - name: local-volume
    persistentVolumeClaim:
      claimName: local-pvc
  containers:
  - name: local-volume-container
    image: nginx
    ports:
    - containerPort: 80
      name: "http-server"
    volumeMounts:
    - mountPath: "/usr/share/nginx/html"
      name: local-volume
And deploy the Nginx pod:
kubectl apply -f local-nvme-nginx-pod.yaml
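Because the persistent volume is tied to node-nvme-1 through its node affinity, the pod should be scheduled on that node; you can verify this with:
kubectl get pod local-nvme-nginx -o wide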
Let's enter the Nginx container to create a file on the NVMe persistent volume:
kubectl exec -it local-nvme-nginx -- bash
Create a new index.html file:
echo "NVMe disk!" > /usr/share/nginx/html/index.html
and exit the Nginx container:
exit
Let's try to access our new web page:
kubectl proxy
and open the URL http://localhost:8001/api/v1/namespaces/default/pods/http:local-nvme-nginx:/proxy/
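Alternatively, you can fetch the page from the command line through the same proxy URL (run this in another terminal while kubectl proxy is running):
curl http://localhost:8001/api/v1/namespaces/default/pods/http:local-nvme-nginx:/proxy/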
Now let's try to see if the data is persistent by deleting our current Nginx pod:
kubectl delete -f local-nvme-nginx-pod.yaml
Then, let's try to create a new one:
kubectl apply -f local-nvme-nginx-pod.yaml
We should now check that our web page is still accessible:
kubectl proxy
and browse the same URL http://localhost:8001/api/v1/namespaces/default/pods/http:local-nvme-nginx:/proxy/
Let's remove the Nginx pod used for testing:
kubectl delete -f local-nvme-nginx-pod.yaml
Let's remove the persistent volume claim used for testing:
kubectl delete -f local-nvme-persistent-volume-claim.yaml
And finally let's remove the persistent volume used for testing:
kubectl delete -f local-nvme-persistent-volume.yaml
Although the persistent volume is removed from our Kubernetes cluster, the data is still present on the node's disk.
Let's identify the name of our format-nvme pod running on our node-nvme-1:
kubectl get pods -n kube-system -o wide -l 'name=format-nvme' --field-selector spec.nodeName=node-nvme-1
$ kubectl get pods -n kube-system -o wide -l 'name=format-nvme' --field-selector spec.nodeName=node-nvme-1
NAME                READY   STATUS    RESTARTS   AGE     IP         NODE          NOMINATED NODE   READINESS GATES
format-nvme-ct84s   1/1     Running   0          8m13s   10.2.0.6   node-nvme-1
And enter the container:
kubectl exec -it format-nvme-ct84s -n kube-system -- bash
We can now reformat our /dev/nvme/nvme1 logical volume:
mkfs -t ext4 /dev/nvme/nvme1
And exit the container:
exit
If you don't plan to add more IOPS nodes to your cluster, you should remove the daemonset. It is of course still possible to redeploy it later without automatically reformatting the existing partitions.
kubectl delete -f format-nvme-daemonset.yaml
kubectl delete -f format-nvme-configmap.yaml