Manage etcd
The GreptimeDB cluster requires an etcd cluster for metadata storage by default. Let's install an etcd cluster using Bitnami's etcd Helm chart.
Prerequisites
- Kubernetes >= v1.23
- kubectl >= v1.18.0
- Helm >= v3.0.0
Install
helm upgrade --install etcd \
oci://registry-1.docker.io/bitnamicharts/etcd \
--version 10.2.12 \
--set replicaCount=3 \
--set auth.rbac.create=false \
--set auth.rbac.token.enabled=false \
--create-namespace \
-n etcd-cluster
Wait for etcd cluster to be running:
kubectl get po -n etcd-cluster
Expected Output
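If you prefer to block until the pods are ready instead of polling, you can use kubectl wait. This is a minimal sketch; it assumes the chart's default app.kubernetes.io/name=etcd pod label:
kubectl wait --for=condition=Ready pod \
  -l app.kubernetes.io/name=etcd \
  -n etcd-cluster --timeout=300s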
The etcd initialClusterState parameter specifies the initial state of the etcd cluster when starting etcd nodes. It determines how a node joins the cluster and can take one of two values:
- new: This value indicates that the etcd cluster is new. All nodes will start as part of a new cluster, and any previous state will not be used.
- existing: This value indicates that the node will join an already existing etcd cluster. In this case, you must ensure that the initialCluster parameter is configured with the information of all nodes in the current cluster.
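For context, these chart values map onto etcd's own bootstrap settings. The sketch below is illustrative only and shows roughly what a member's startup flags look like in this deployment (member names and peer URLs follow the headless service naming used above):
# Illustrative only: how the chart values translate to etcd bootstrap flags.
etcd --name etcd-0 \
  --initial-cluster-state existing \
  --initial-cluster etcd-0=http://etcd-0.etcd-headless.etcd-cluster:2380,etcd-1=http://etcd-1.etcd-headless.etcd-cluster:2380,etcd-2=http://etcd-2.etcd-headless.etcd-cluster:2380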
After the etcd cluster is running, we need to set the initialClusterState parameter to existing:
helm upgrade --install etcd \
oci://registry-1.docker.io/bitnamicharts/etcd \
--version 10.2.12 \
--set initialClusterState="existing" \
--set removeMemberOnContainerTermination=false \
--set replicaCount=3 \
--set auth.rbac.create=false \
--set auth.rbac.token.enabled=false \
--create-namespace \
-n etcd-cluster
After the etcd cluster is running, use the following command to check its health status:
kubectl -n etcd-cluster \
exec etcd-0 -- etcdctl \
--endpoints etcd-0.etcd-headless.etcd-cluster:2379,etcd-1.etcd-headless.etcd-cluster:2379,etcd-2.etcd-headless.etcd-cluster:2379 \
endpoint status -w table
Expected Output
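You can also run a cluster-wide health check against the same endpoints; this is a small sketch using etcdctl's endpoint health subcommand:
kubectl -n etcd-cluster \
  exec etcd-0 -- etcdctl \
  --endpoints etcd-0.etcd-headless.etcd-cluster:2379,etcd-1.etcd-headless.etcd-cluster:2379,etcd-2.etcd-headless.etcd-cluster:2379 \
  endpoint health -w table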
Backup
In the Bitnami etcd chart, a shared storage volume, typically Network File System (NFS), is used to store etcd backup data. A Kubernetes CronJob performs etcd snapshot backups and mounts the NFS PersistentVolumeClaim (PVC), so the snapshots are transferred to NFS.
Add the following configuration to a file named etcd-backup.yaml. Note that you need to change existingClaim to the name of your NFS PVC:
replicaCount: 3
auth:
  rbac:
    create: false
    token:
      enabled: false
initialClusterState: "existing"
removeMemberOnContainerTermination: false
disasterRecovery:
  enabled: true
  cronjob:
    schedule: "*/30 * * * *"
    historyLimit: 2
    snapshotHistoryLimit: 2
  pvc:
    existingClaim: "${YOUR_NFS_PVC_NAME_HERE}"
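If you do not already have an NFS-backed PVC, the following is a hypothetical sketch of creating one; the PVC name etcd-backup-nfs, the storage class nfs-csi, and the requested size are assumptions you should adapt to your environment:
# Hypothetical NFS-backed PVC; name, storage class, and size are assumptions.
kubectl apply -n etcd-cluster -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: etcd-backup-nfs
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: nfs-csi
  resources:
    requests:
      storage: 10Gi
EOF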
Redeploy the etcd cluster:
helm upgrade --install etcd \
oci://registry-1.docker.io/bitnamicharts/etcd \
--version 10.2.12 \
--create-namespace \
-n etcd-cluster --values etcd-backup.yaml
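To confirm that the backup-related values were applied to the release, you can inspect its user-supplied values:
helm get values etcd -n etcd-cluster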
You can see the scheduled etcd backup task (CronJob):
kubectl get cronjob -n etcd-cluster
Expected Output
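If you do not want to wait for the next scheduled run, you can trigger a one-off backup from the CronJob. This is a sketch that assumes the CronJob is named etcd-snapshotter, matching the pod names shown below:
kubectl create job --from=cronjob/etcd-snapshotter etcd-snapshotter-manual -n etcd-cluster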
kubectl get pod -n etcd-cluster
Expected Output
kubectl logs etcd-snapshotter-28936038-tsck8 -n etcd-cluster
Expected Output
Next, you can see the etcd backup snapshots on the NFS server:
ls ${NFS_SERVER_DIRECTORY}
Expected Output
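Optionally, you can sanity-check a snapshot before relying on it. This is a sketch that reuses the placeholders from this guide; on newer etcd releases the equivalent subcommand is also available via etcdutl:
etcdctl snapshot status ${NFS_SERVER_DIRECTORY}/${YOUR_ETCD_SNAPSHOT_FILE_NAME} -w table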
Restore
When you encounter etcd data loss or corruption, such as critical information stored in etcd being accidentally deleted, or catastrophic cluster failure that prevents recovery, you need to perform an etcd restore. Additionally, restoring etcd can also be useful for development and testing purposes.
Before recovery, you need to stop writing data to the etcd cluster (stop GreptimeDB Metasrv from writing) and create a fresh snapshot file to use for the recovery.
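For example, a minimal sketch of pausing Metasrv before taking the final snapshot; the Deployment name greptimedb-meta and the default namespace are assumptions based on the example cluster name, so adjust them to your environment:
# Assumption: the Metasrv Deployment is named greptimedb-meta in the default namespace.
kubectl scale deployment greptimedb-meta --replicas=0 -n default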
Add the following configuration to a file named etcd-restore.yaml. Note that existingClaim is the name of your NFS PVC, and snapshotFilename is the name of the etcd snapshot file:
replicaCount: 3
auth:
  rbac:
    create: false
    token:
      enabled: false
startFromSnapshot:
  enabled: true
  existingClaim: "${YOUR_NFS_PVC_NAME_HERE}"
  snapshotFilename: "${YOUR_ETCD_SNAPSHOT_FILE_NAME}"
Deploy the etcd recovery cluster:
helm upgrade --install etcd-recover \
oci://registry-1.docker.io/bitnamicharts/etcd \
--version 10.2.12 \
--create-namespace \
-n etcd-cluster --values etcd-restore.yaml
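You can watch the recovery cluster come up; this sketch assumes the standard Helm app.kubernetes.io/instance label set by the chart:
kubectl get po -n etcd-cluster -l app.kubernetes.io/instance=etcd-recover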
After the etcd recovery cluster is Running, redeploy it, this time without the snapshot settings and with initialClusterState set to existing:
helm upgrade --install etcd-recover \
oci://registry-1.docker.io/bitnamicharts/etcd \
--version 10.2.12 \
--set initialClusterState="existing" \
--set removeMemberOnContainerTermination=false \
--set replicaCount=3 \
--set auth.rbac.create=false \
--set auth.rbac.token.enabled=false \
--create-namespace \
-n etcd-cluster
Next, change the Metasrv etcdEndpoints to the new etcd recovery cluster; in this example it is "etcd-recover.etcd-cluster.svc.cluster.local:2379":
apiVersion: greptime.io/v1alpha1
kind: GreptimeDBCluster
metadata:
  name: greptimedb
spec:
  # Other configuration here
  meta:
    etcdEndpoints:
      - "etcd-recover.etcd-cluster.svc.cluster.local:2379"
Restart GreptimeDB Metasrv to complete the etcd restore.
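A minimal sketch of applying the updated spec and restarting Metasrv; the manifest file name, the Deployment name greptimedb-meta, and the default namespace are assumptions to adapt to your setup:
# File name, Deployment name, and namespace are assumptions.
kubectl apply -f greptimedb-cluster.yaml
kubectl rollout restart deployment greptimedb-meta -n default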