Auto Repartition
Auto Repartition is an Autopilot strategy that automatically splits large Regions into smaller Regions. When a table has a large Region that may become a performance bottleneck, Auto Repartition samples data, generates new partition boundaries, and submits a Repartition action.
The split Regions can then be scheduled across multiple Datanodes to distribute potential bottlenecks. Auto Repartition reduces the operational cost of manually identifying large Regions and running Repartition. For manual Repartition, see Repartition.
Prerequisites
Auto Repartition depends on GreptimeDB Repartition. It is only available in distributed clusters and requires:
- Shared object storage, such as AWS S3.
- GC enabled on Metasrv and all Datanodes.
If either requirement is not met, Repartition cannot be executed.
Object storage stores Region files. GC reclaims old files after their references are released, which prevents files still in use from being deleted during Repartition.
When to use Auto Repartition
Auto Repartition is useful in the following scenarios:
- Some large Regions may become performance bottlenecks.
- The original partition rule no longer matches the current data distribution.
- You want to split large Regions into smaller Regions and distribute potential bottlenecks through later scheduling.
- You want to reduce the operational cost of manually identifying large Regions and running Repartition.
Limitations
Auto Repartition only works for partitioned tables, that is, tables that already have partition rules. If a table has no partition rules, Auto Repartition does not generate them automatically.
For more information about table partitioning and Repartition, see Table Sharding and Repartition.
Configuration
Auto Repartition depends on the Autopilot runtime and cluster statistics. The following example includes both shared configuration and Auto Repartition configuration:
[[plugins]]
[plugins.autopilot]
tick_interval = "45s"
[[plugins]]
[plugins.cluster_stat]
sampling_window = "45s"
max_history_windows = 5
ewma_alpha = 0.2
[[plugins]]
[plugins.auto_repartition]
split_trigger_ratio = 1.8
max_split_parts = 3
table_repartition_cooldown_period = "60s"
max_actions_per_tick = 4
max_actions_per_table_per_tick = 2
In this example:
- plugins.autopilot controls the Autopilot scheduling interval.
- plugins.cluster_stat controls sampling and smoothing for Datanode and Region write statistics.
- plugins.auto_repartition controls large Region split trigger conditions, split size, and submission limits.
For details about shared configuration, see Autopilot configuration.
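
To make the smoothing option more concrete, the following is a minimal Python sketch of an exponentially weighted moving average (EWMA) in its conventional textbook form. It assumes that ewma_alpha is the weight given to the newest sample, which this page does not confirm. The function name `ewma` and the sample values are hypothetical and are not part of GreptimeDB.

```python
def ewma(samples, alpha):
    """Return the running EWMA of `samples`, seeded with the first sample.

    Assumption: `alpha` weights the newest sample, so a smaller alpha
    reacts more slowly to spikes. This is the conventional definition,
    not a confirmed description of the cluster_stat plugin.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = []
    current = None
    for x in samples:
        # The first sample seeds the average; later samples are blended in.
        current = x if current is None else alpha * x + (1.0 - alpha) * current
        smoothed.append(current)
    return smoothed


# Hypothetical per-window write rates for one Region, with a single spike.
writes = [100.0, 100.0, 200.0, 100.0]
print(ewma(writes, alpha=0.2))
```

With a small alpha such as 0.2, a single spike moves the smoothed value only part of the way toward the spike, so short bursts are less likely to look like sustained load.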
Core options
| Option | Default | Description |
|---|---|---|
| split_trigger_ratio | 1.8 | The load ratio above which a Region is considered for splitting. For example, with the default value 1.8, split planning starts only when a Region's write load exceeds 1.8 times the target per-Region write load. |
| max_split_parts | 3 | The maximum number of child Regions a single Region can be split into. |
| table_repartition_cooldown_period | "60s" | The table-level Repartition cooldown period. After a Repartition request is submitted successfully for a table, no further Repartition request is submitted for that table during this period. |
| max_actions_per_tick | 4 | The maximum number of Repartition actions submitted in one scheduling cycle. |
| max_actions_per_table_per_tick | 2 | The maximum number of Repartition actions submitted for one table in one scheduling cycle. |
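
The following Python sketch shows one way the core options could interact in a single scheduling cycle. It is an illustration under stated assumptions, not the actual Autopilot implementation. The names `RegionLoad` and `plan_splits`, the heaviest-first ordering, and the way the target load is supplied are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RegionLoad:
    table: str
    region_id: int
    write_load: float  # smoothed write load, arbitrary units


def plan_splits(regions, target_load, now, last_repartition_at,
                split_trigger_ratio=1.8, cooldown_secs=60.0,
                max_actions_per_tick=4, max_actions_per_table_per_tick=2):
    """Pick Regions to split in one tick (hypothetical selection logic)."""
    per_table = {}
    planned = []
    # Assumption: the heaviest Regions are considered first.
    for r in sorted(regions, key=lambda r: r.write_load, reverse=True):
        if len(planned) >= max_actions_per_tick:
            break  # global per-tick cap reached
        if r.write_load <= split_trigger_ratio * target_load:
            continue  # below the split trigger ratio
        last = last_repartition_at.get(r.table)
        if last is not None and now - last < cooldown_secs:
            continue  # table is still in its cooldown period
        if per_table.get(r.table, 0) >= max_actions_per_table_per_tick:
            continue  # per-table cap reached for this tick
        per_table[r.table] = per_table.get(r.table, 0) + 1
        planned.append(r)
    return planned


regions = [
    RegionLoad("a", 1, 250.0), RegionLoad("a", 2, 200.0),
    RegionLoad("a", 3, 190.0), RegionLoad("b", 4, 300.0),
    RegionLoad("c", 5, 170.0),
]
# Table "b" was repartitioned 30 seconds ago, so it is still cooling down.
picked = plan_splits(regions, target_load=100.0, now=1000.0,
                     last_repartition_at={"b": 970.0})
print([(r.table, r.region_id) for r in picked])
```

In this example only two Regions of table a are selected: table b is within its cooldown period, the third Region of table a hits the per-table limit, and the Region of table c stays below 1.8 times the target load.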
Advanced options
The following options usually do not need to be changed. Adjust them only when you understand the table distribution and split-point selection behavior.
| Option | Default | Description |
|---|---|---|
| sampling_budget | "10MB" | The maximum amount of data sampled when computing split points for one Region. A larger budget may improve split-point quality but increases planning cost. |
| split_segment_min_ratio | 0.7 | The minimum allowed segment-size ratio when validating a split recommendation. |
| split_segment_max_ratio | 1.3 | The maximum allowed segment-size ratio when validating a split recommendation. |
| min_samples | 3 | The minimum number of historical samples required to evaluate Region write stability. |
| max_region_history_cv | 0.2 | The maximum coefficient of variation allowed for Region write history. Regions above this value are considered unstable. |
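
As a rough illustration of the stability check, the coefficient of variation (CV) is the standard deviation divided by the mean. The Python sketch below assumes the population standard deviation and treats a Region with too few samples as not yet stable. Both choices are assumptions, and the function name `is_write_history_stable` is hypothetical.

```python
import statistics


def is_write_history_stable(history, min_samples=3, max_region_history_cv=0.2):
    """Return True when the write history has enough samples and a low CV.

    Assumptions: population standard deviation, and a history with fewer
    than `min_samples` entries (or a non-positive mean) is not stable.
    """
    if len(history) < min_samples:
        return False
    mean = statistics.fmean(history)
    if mean <= 0:
        return False  # CV is undefined for a zero mean
    cv = statistics.pstdev(history) / mean
    return cv <= max_region_history_cv


print(is_write_history_stable([100.0, 110.0, 90.0]))  # steady writes
print(is_write_history_stable([10.0, 200.0, 50.0]))   # bursty writes
print(is_write_history_stable([100.0, 100.0]))        # too few samples
```

A steady history such as 100, 110, 90 has a CV of roughly 0.08 and passes the default threshold of 0.2, while a bursty history does not.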