Cluster Maintenance Mode
Maintenance mode is a safety feature in GreptimeDB that temporarily disables automatic cluster management operations.
This mode is particularly useful during:
- Cluster upgrades
- Planned downtime
- Any operation that might temporarily affect cluster stability
When to Use Maintenance Mode
With GreptimeDB Operator
If you upgrade a cluster with GreptimeDB Operator, you don't need to enable maintenance mode manually; the operator handles this automatically.
Without GreptimeDB Operator
When upgrading a cluster without using GreptimeDB Operator, you must manually enable Metasrv's maintenance mode before:
- Rolling upgrades of Datanodes
- Upgrades of Metasrv nodes
- Upgrades of Frontend nodes
- Any operation that might cause temporary node unavailability
Impact of Maintenance Mode
When maintenance mode is enabled:
- Auto Balancing (if enabled) will be paused
- Region Failover (if enabled) will be paused
- Manual region operations are still possible
- Read and write operations continue to work normally
- Monitoring and metrics collection continue to function
Managing Maintenance Mode
Maintenance mode can be enabled and disabled through Metasrv's HTTP interface at http://{METASRV}:{RPC_PORT}/admin/maintenance. Note that this interface listens on Metasrv's RPC_PORT, which defaults to 3002.
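When scripting against Metasrv, it can help to build the endpoint URL in one place so the host and port are not repeated in every command. The following is a minimal POSIX shell sketch; the function name maintenance_url is hypothetical, and its 3002 default simply mirrors the RPC port stated above.

```shell
# Hypothetical helper: print the Metasrv maintenance endpoint URL.
#   $1  Metasrv host (required)
#   $2  Metasrv RPC port (optional, defaults to 3002)
#   $3  optional value for the enable query parameter: true or false
maintenance_url() {
  host="$1"
  port="${2:-3002}"
  url="http://${host}:${port}/admin/maintenance"
  case "$3" in
    true|false) url="${url}?enable=$3" ;;
    "") ;;  # no parameter: the plain URL is used for status checks
    *) echo "enable must be true or false" >&2; return 1 ;;
  esac
  printf '%s\n' "$url"
}
```

For example, curl -X POST "$(maintenance_url localhost 3002 true)" sends the same request as the enable command shown in the next section.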
Enable Maintenance Mode
Enable maintenance mode by sending a POST request to the /admin/maintenance endpoint with the query parameter enable=true.
curl -X POST 'http://localhost:3002/admin/maintenance?enable=true'
The expected output is:
{"enabled":true}
If the request fails or the response differs from this, do not proceed with maintenance operations.
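In an upgrade script, the "do not proceed" rule can be enforced by checking the response before continuing. This is a sketch under assumptions: enable_maintenance and METASRV_ADDR are hypothetical names, and the check assumes the response is exactly the JSON shown above, apart from whitespace.

```shell
# Sketch: enable maintenance mode and stop unless Metasrv confirms it.
# METASRV_ADDR is a hypothetical variable (host:port); it defaults to
# localhost:3002 to match the example above.
enable_maintenance() {
  addr="${METASRV_ADDR:-localhost:3002}"
  # -f turns HTTP error statuses into a non-zero exit code;
  # -sS hides the progress meter but still prints errors.
  if ! resp=$(curl -fsS -X POST "http://${addr}/admin/maintenance?enable=true"); then
    echo "request to Metasrv at ${addr} failed; do not proceed" >&2
    return 1
  fi
  # Drop whitespace so {"enabled": true} and {"enabled":true} compare equal.
  resp=$(printf '%s' "$resp" | tr -d ' \t\r\n')
  if [ "$resp" != '{"enabled":true}' ]; then
    echo "unexpected response: ${resp}; do not proceed" >&2
    return 1
  fi
  echo "maintenance mode enabled"
}
```

A calling script would typically run enable_maintenance || exit 1 so that nothing after it executes unless Metasrv reported {"enabled":true}.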
Disable Maintenance Mode
Before disabling maintenance mode:
- Ensure all components are healthy and operational
- Verify that all nodes have joined the cluster
Then disable maintenance mode by sending a POST request to the /admin/maintenance endpoint with enable=false:
curl -X POST 'http://localhost:3002/admin/maintenance?enable=false'
The expected output is:
{"enabled":false}
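Because the checks before disabling are manual, a script can at least make the operator confirm them explicitly before it sends the request. The sketch below assumes hypothetical names (confirm_checklist, disable_maintenance, METASRV_ADDR); it does not query cluster health itself, it only records the operator's answers and then verifies the response.

```shell
# Sketch: ask the operator to confirm the pre-disable checklist.
# This is a manual confirmation only; it does not inspect the cluster.
confirm_checklist() {
  for item in "All components are healthy and operational" \
              "All nodes have joined the cluster"; do
    printf '%s? [y/N] ' "$item" >&2
    read -r answer || return 1   # end of input counts as "no"
    case "$answer" in
      y|Y|yes|YES) ;;
      *) echo "checklist not confirmed; leaving maintenance mode on" >&2
         return 1 ;;
    esac
  done
}

# Disable maintenance mode only after the checklist is confirmed.
# METASRV_ADDR is a hypothetical variable (host:port).
disable_maintenance() {
  confirm_checklist || return 1
  addr="${METASRV_ADDR:-localhost:3002}"
  resp=$(curl -fsS -X POST "http://${addr}/admin/maintenance?enable=false") || return 1
  resp=$(printf '%s' "$resp" | tr -d ' \t\r\n')
  if [ "$resp" != '{"enabled":false}' ]; then
    echo "unexpected response: ${resp}" >&2
    return 1
  fi
  echo "maintenance mode disabled"
}
```

Answering anything other than y or yes to either question leaves maintenance mode enabled, which errs on the side of keeping automatic failover and balancing paused.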
Check Maintenance Mode Status
Check maintenance mode status by sending a GET request to the /admin/maintenance endpoint.
curl -X GET 'http://localhost:3002/admin/maintenance'
When maintenance mode is disabled, the expected output is:
{"enabled":false}
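For use in scripts, the status check can be turned into an exit code. This sketch assumes the hypothetical function name maintenance_status and variable METASRV_ADDR; curl sends a GET request by default, so -X GET is omitted.

```shell
# Sketch: query the current maintenance state for use in scripts.
# Prints "true" or "false" and sets the exit status:
#   0 = enabled, 1 = disabled, 2 = unreachable or unrecognized response.
# METASRV_ADDR is a hypothetical variable (host:port).
maintenance_status() {
  addr="${METASRV_ADDR:-localhost:3002}"
  resp=$(curl -fsS "http://${addr}/admin/maintenance") || return 2
  resp=$(printf '%s' "$resp" | tr -d ' \t\r\n')
  case "$resp" in
    '{"enabled":true}')  echo true;  return 0 ;;
    '{"enabled":false}') echo false; return 1 ;;
    *) echo "unrecognized response: ${resp}" >&2; return 2 ;;
  esac
}
```

For example, if maintenance_status >/dev/null; then echo "maintenance mode is on"; fi. Keeping "unreachable" separate from "disabled" avoids treating a network failure as a confirmed state.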
Troubleshooting
Common Issues
- Maintenance mode cannot be enabled
  - Verify Metasrv is running and accessible
  - Check that you have the correct permissions
  - Ensure the RPC port is correct
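When a request fails, curl's exit code often points to which of these checks to start with. The sketch below is a heuristic, not an official diagnostic: diagnose_metasrv and METASRV_ADDR are hypothetical names, and it relies on curl's documented exit codes 6 (could not resolve host), 7 (failed to connect), 22 (HTTP error, with -f) and 28 (timeout).

```shell
# Sketch: map common curl exit codes to the troubleshooting checks above.
# METASRV_ADDR is a hypothetical variable (host:port).
diagnose_metasrv() {
  addr="${METASRV_ADDR:-localhost:3002}"
  curl -fsS --max-time 5 -o /dev/null "http://${addr}/admin/maintenance" 2>/dev/null
  rc=$?
  case "$rc" in
    0)  echo "ok: Metasrv answered on ${addr}" ;;
    6)  echo "cannot resolve the host in ${addr}: check the Metasrv address" ;;
    7)  echo "could not connect to ${addr}: is Metasrv running, and is the RPC port correct?" ;;
    22) echo "Metasrv returned an HTTP error: check permissions and the endpoint path" ;;
    28) echo "timed out reaching ${addr}: check network access to Metasrv" ;;
    *)  echo "curl failed with exit code ${rc}" ;;
  esac
  return "$rc"
}
```

The function returns curl's own exit code, so a script can still branch on the exact failure after printing the hint.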
Best Practices
- Always verify the maintenance mode status before and after operations
- Have a rollback plan ready
- Monitor cluster health during maintenance
- Document all changes made during maintenance
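The first and last practices can be combined by wrapping each maintenance step so that the maintenance-mode status is recorded before and after it runs. This is a sketch with hypothetical names (log_status, run_logged_step, MAINT_LOG, METASRV_ADDR); it only records state and does not enable, disable, or roll back anything.

```shell
# Sketch: append a timestamped maintenance-mode status line to a log file.
# MAINT_LOG and METASRV_ADDR are hypothetical variables.
log_status() {
  addr="${METASRV_ADDR:-localhost:3002}"
  # If curl fails, nothing is printed and the state is logged as "unreachable".
  state=$(curl -fsS "http://${addr}/admin/maintenance" 2>/dev/null | tr -d ' \t\r\n')
  printf '%s %s maintenance=%s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$1" \
    "${state:-unreachable}" >> "${MAINT_LOG:-maintenance.log}"
}

# Usage: run_logged_step <description> <command> [args...]
# Records the status before and after the command and returns its exit code.
run_logged_step() {
  desc="$1"; shift
  log_status "before: ${desc}"
  "$@"
  rc=$?
  log_status "after: ${desc} (exit ${rc})"
  return "$rc"
}
```

For example, run_logged_step "upgrade datanode-0" followed by your own upgrade command leaves two log lines that show whether maintenance mode was still enabled on both sides of that step.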