Skip to main content
Version: Nightly

Table Reconciliation

Overview

GreptimeDB uses a layered metadata management architecture in distributed environments:

  • Metasrv: Acts as the metadata management layer, responsible for maintaining metadata information for all tables in the cluster
  • Datanode: Handles actual data storage and query execution, while also persisting some table metadata

Under ideal conditions, table metadata between Metasrv and Datanode should remain perfectly consistent. However, in real production environments, restore from a metadata backup may cause metadata inconsistencies.

Table Reconciliation is a table metadata repair mechanism provided by GreptimeDB that:

  • Detects metadata differences between Metasrv and Datanode
  • Repairs inconsistency issues according to predefined strategies
  • Ensures the system can recover from abnormal states to a consistent, available state

Before starting

Before starting the table reconciliation process, you need to:

  1. Restore the cluster from a specific metadata backup
  2. Set the Next Table ID to the original cluster's next table ID

Repair Scenarios

Table not found Error

After a cluster is restored from specific metadata, write and query operations may encounter Table not found errors.

  • Scenario 1: The original cluster created new tables after the metadata backup, causing the new table metadata to not be included in the backup. This results in Table not found errors when querying these new tables. In this case, the new created table will be lost, and you must to manually set the Next Table ID to ensure that the restored cluster won't fail to create tables due to table ID conflicts when creating new tables.

  • Scenario 2: The original cluster renamed existing tables after the metadata backup. In this case, the new table names will be lost.

Empty region directory Error

After a cluster is restored from specific metadata, starting the Datanode may result in an Empty region directory error. This usually occurs because the original cluster deleted tables (executed DROP TABLE) after the metadata backup, causing the deleted table metadata to not be included in the backup, resulting in this error when starting the Datanode. For this situation, you need to enable Recovery Mode after starting Metasrv when starting the cluster to ensure the Datanode can start normally.

  • Mito Engine tables: Table metadata cannot be repaired, you need to manually execute the DROP TABLE command to delete the non-existent table.
  • Metric Engine tables: Table metadata can be repaired, you need to manually execute the ADMIN reconcile_table(table_name) command to repair the table metadata.

No field named Error

After a cluster is restored from specific metadata, write and query operations may encounter No field named errors. This usually occurs because the original cluster deleted columns (executed DROP COLUMN) after the metadata backup, causing the deleted column metadata to not be included in the backup, resulting in this error when querying these deleted columns. For this situation, you need to manually execute the ADMIN reconcile_table(table_name) command to repair the table metadata.

schema has a different type Error

After a cluster is restored from specific metadata, write and query operations may encounter schema has a different type errors. This usually occurs because the original cluster modified column types (executed MODIFY COLUMN [column_name] [type]) after the metadata backup, causing the modified column type metadata to not be included in the backup, resulting in this error when querying these modified columns. For this situation, you need to manually execute the ADMIN reconcile_table(table_name) command to repair the table metadata.

Missing Specific Columns

After a cluster is restored from specific metadata, write and query operations may run normally but not include some columns. This occurs because the original cluster added columns (executed ADD COLUMN) after the metadata backup, causing the new column metadata to not be included in the backup, making these columns unavailable during queries. For this situation, you need to manually execute the ADMIN reconcile_table(table_name) command to repair the table metadata.

Missing Column Indexes

After a cluster is restored from specific metadata, write and query operations may run normally, but SHOW CREATE TABLE/SHOW INDEX FROM [table_name] shows that certain columns don't include expected indexes. This occurs because the original cluster modified indexes (executed MODIFY INDEX [column_name] SET [index_type] INDEX) after the metadata backup, causing the index change metadata to not be included in the backup. For this situation, you need to manually execute the ADMIN reconcile_table(table_name) command to repair the table metadata.

Repair operations

GreptimeDB provides the following Admin functions to trigger table metadata repair:

Repair All Tables

Repair the metadata inconsistency of all tables in the entire cluster:

ADMIN reconcile_catalog()

Repair a Specific Database

Repair the metadata inconsistency of all tables in a specific database:

ADMIN reconcile_database(database_name)

Repair a Specific Table

Repair the metadata inconsistency of a single table:

ADMIN reconcile_table(table_name)

View Repair Progress

After the Admin function is executed, it will return a ProcedureID, you can use the following command to view the progress of the repair task:

ADMIN procedure_state(procedure_id)

When procedure_state returns Done, it indicates that the repair task has completed.

Important Notes

When performing table metadata repair operations, please note the following:

  • Repair operations are executed asynchronously, and you can check the execution progress through the procedure_id
  • It is recommended to perform repair operations during low-traffic periods to reduce the impact on system performance
  • For large-scale repair operations (such as reconcile_catalog()), it is recommended to validate in a test environment first