Skip to main content
Version: Nightly

EXPLAIN

EXPLAIN is used to provide the execution plan of a statement.

Syntax

EXPLAIN [ANALYZE] [VERBOSE] SELECT ...

The ANALYZE clause will execute the statement and measure time spent at each plan node and the total rows of the output etc.

The VERBOSE clause will provide more detailed information about the execution plan.

Examples

Explains the following query:

EXPLAIN SELECT * FROM monitor where host='host1'\G

Example:

*************************** 1. row ***************************
plan_type: logical_plan
plan: MergeScan [is_placeholder=false]
*************************** 2. row ***************************
plan_type: physical_plan
plan: MergeScanExec: peers=[4612794875904(1074, 0), ]

The column plan_type indicates whether it's alogical_plan or physical_plan. And the column plan explains the plan in detail.

The MergeScan plan merges the results from multiple regions. The peers array in the MergeScanExec physical plan contains the IDs of the regions that the plan will scan.

Explains the execution of the plan by ANALYZE:

EXPLAIN ANALYZE SELECT * FROM monitor where host='host1'\G

Example:

*************************** 1. row ***************************
stage: 0
node: 0
plan: MergeScanExec: peers=[4612794875904(1074, 0), ] metrics=[output_rows: 0, greptime_exec_read_cost: 0, finish_time: 3301415, first_consume_time: 3299166, ready_time: 3104209, ]

*************************** 2. row ***************************
stage: 1
node: 0
plan: SeqScan: region=4612794875904(1074, 0), partition_count=0 (0 memtable ranges, 0 file 0 ranges) metrics=[output_rows: 0, mem_used: 0, build_parts_cost: 1, build_reader_cost: 1, elapsed_await: 1, elapsed_poll: 21250, scan_cost: 1, yield_cost: 1, ]

*************************** 3. row ***************************
stage: NULL
node: NULL
plan: Total rows: 0

The EXPLAIN ANALYZE command provides metrics about the execution of each stage. The SeqScan plan scans the data from a single region.

Explains the verbose information of the plan execution:

EXPLAIN ANALYZE VERBOSE SELECT * FROM monitor where host='host1';

Example:

*************************** 1. row ***************************
stage: 0
node: 0
plan: MergeScanExec: peers=[4612794875904(1074, 0), ] metrics=[output_rows: 0, greptime_exec_read_cost: 0, finish_time: 3479084, first_consume_time: 3476000, ready_time: 3209041, ]

*************************** 2. row ***************************
stage: 1
node: 0
plan: SeqScan: region=4612794875904(1074, 0), partition_count=0 (0 memtable ranges, 0 file 0 ranges), projection=["host", "ts", "cpu", "memory"], filters=[host = Utf8("host1")], metrics_per_partition: [[partition=0, {prepare_scan_cost=579.75µs, build_reader_cost=0ns, scan_cost=0ns, convert_cost=0ns, yield_cost=0ns, total_cost=789.708µs, num_rows=0, num_batches=0, num_mem_ranges=0, num_file_ranges=0, build_parts_cost=0ns, rg_total=0, rg_fulltext_filtered=0, rg_inverted_filtered=0, rg_minmax_filtered=0, rg_bloom_filtered=0, rows_before_filter=0, rows_fulltext_filtered=0, rows_inverted_filtered=0, rows_bloom_filtered=0, rows_precise_filtered=0, num_sst_record_batches=0, num_sst_batches=0, num_sst_rows=0, first_poll=785.041µs}]] metrics=[output_rows: 0, mem_used: 0, build_parts_cost: 1, build_reader_cost: 1, elapsed_await: 1, elapsed_poll: 17208, scan_cost: 1, yield_cost: 1, ]

*************************** 3. row ***************************
stage: NULL
node: NULL
plan: Total rows: 0

The EXPLAIN ANALYZE VERBOSE command provides the detail metrics about the execution of scan plans.

Scanner Metrics

Scanner nodes, including SeqScan, SeriesScan, and UnorderedScan, can print scanner-specific metrics in EXPLAIN ANALYZE and EXPLAIN ANALYZE VERBOSE output. These metrics help diagnose read-path behavior such as partitioning, SST pruning, index usage, cache behavior, and scanner timing.

Verbose scanner metrics are diagnostic text, not a stable public API. Optional zero-valued fields are usually omitted.

Output Layout

A scanner line can contain three layers of information:

  • Scanner display fields before the trailing metrics=[...] section.
  • metrics_per_partition, which is printed only in verbose mode and keeps each partition's PartitionMetrics snapshot separate.
  • DataFusion aggregate metrics in the trailing metrics=[...] section.

In metrics_per_partition, partition means a DataFusion execution-plan output partition. It is not a GreptimeDB region and not a table partition created by PARTITION ON COLUMNS(...). One scanner metric partition may cover one or more GreptimeDB scan ranges.

Normal mode layout:

<Scanner>: region=<region>, <scanner display fields> metrics=[<aggregate metrics>]

Verbose mode layout:

<Scanner>: region=<region>, {
"partition_count": {...},
"projection": [...],
"filters": [...],
"dyn_filters": [...],
"files": [...],
"flat_format": true,
"metrics_per_partition": [
{
"partition": <partition>,
"metrics": {
<per-partition metrics>,
"fetch_metrics": {...},
"metadata_cache_metrics": {...},
"top_file_metrics": {...}
}
}
]
} metrics=[<aggregate metrics>]

Aggregate Metrics

The trailing metrics=[...] section is DataFusion's aggregate output for the scanner plan node. Scanner partitions record these values through ExecutionPlanMetricsSet, and the plan output aggregates them across partitions.

The exact aggregate fields can vary by version and execution path. The output may include additional runtime fields such as mem_used, elapsed_poll, and elapsed_await.

MetricMeaning
output_rowsRows returned by the scanner plan node.
elapsed_computeAggregate scanner compute time reported to DataFusion.
build_parts_costAggregate time spent building SST file ranges.
build_reader_costAggregate time spent building readers or merge readers.
convert_costAggregate time spent converting batches into Arrow record batches.
scan_costAggregate time spent scanning data from scanner inputs.
yield_costAggregate time spent waiting while yielding batches downstream.

Example with synthetic values:

SeqScan: region=0(1, 0), "partition_count":{"count":2, "mem_ranges":1, "files":1, "file_ranges":1} metrics=[output_rows: 128, elapsed_compute: 12ms, build_parts_cost: 1.2ms, build_reader_cost: 2.1ms, convert_cost: 300µs, scan_cost: 8.4ms, yield_cost: 900µs, ]

Scanner Display Fields

The scanner display fields describe what the scanner will read and which predicates or projections are attached to it. EXPLAIN ANALYZE prints only the non-verbose fields. EXPLAIN ANALYZE VERBOSE adds detailed scanner input and per-partition metrics.

FieldModeMeaningWhen present
Scanner nameAllThe physical scanner, such as SeqScan, SeriesScan, or UnorderedScan.Always.
regionAllRegion ID scanned by this plan node.Always.
partition_count.countAllNumber of partition ranges in the scanner.Always.
partition_count.mem_rangesAllNumber of memtable ranges in all partition ranges.Always.
partition_count.filesAllNumber of SST files in the scan input before range expansion.Always.
partition_count.file_rangesAllNumber of SST file ranges in all partition ranges.Always.
partition_count.other_rangesAllNumber of extension or non-memtable, non-SST ranges.Nonzero only.
selectorAllSeries row selector attached to the scan.When a series row selector is attached.
distributionAllDistribution information attached to the scan.When distribution is attached.
projectionVerboseOutput column names after projection pruning.When the output schema is not empty.
filtersVerboseStatic physical predicate expressions pushed into the scanner. These predicates may drive row-group pruning, index application, and precise filtering.When static predicates exist.
dyn_filtersVerboseDynamic filter expressions attached after plan creation. These can change during execution as upstream operators produce filter values.When dynamic filters exist.
vector_index_kVerboseVector index top-k value used by vector search.When the vector index feature is enabled and the scan uses vector index search.
filesVerboseSST file metadata for files in the scan input.When SST files are present.
flat_formatVerboseWhether the scan input is expected to use flat format.Always in verbose mode.
extension_rangesVerboseEnterprise extension ranges attached to the scan.Enterprise builds only, when extension ranges exist.
metrics_per_partitionVerbosePer-partition scanner metrics collected by PartitionMetrics.Verbose mode after partitions have metrics.

Each files entry contains:

FieldMeaning
file_idRegion file ID of the SST file.
time_range_startInclusive start of the file time range, formatted as value::unit.
time_range_endInclusive end of the file time range, formatted as value::unit.
rowsNumber of rows recorded in the file metadata.
sizeSST file size in bytes.
index_sizeTotal index size in bytes recorded for the file.

Example verbose scanner display with synthetic values:

UnorderedScan: region=0(1, 0), {"partition_count":{"count":4, "mem_ranges":1, "files":2, "file_ranges":3}, "projection": ["host", "ts", "value"], "filters": ["host = Utf8(\"demo\")"], "dyn_filters": ["ts@1 < @runtime_filter"], "files": [{"file_id":"0(1, 0)/00000000-0000-0000-0000-000000000001","time_range_start":"1000::Millisecond","time_range_end":"2000::Millisecond","rows":1024,"size":65536,"index_size":4096}], "flat_format": true, "metrics_per_partition": [...]}

Per-Partition Metrics

metrics_per_partition is a verbose-only list. Each item contains a partition number and that partition's metric snapshot. These values are not the same output layer as the aggregate metrics=[...] section, even when metric names overlap.

Example with synthetic values:

{
"metrics_per_partition": [
{
"partition": 0,
"metrics": {
"prepare_scan_cost": "500µs",
"build_reader_cost": "2ms",
"scan_cost": "8ms",
"yield_cost": "1ms",
"total_cost": "12ms",
"num_rows": 128,
"num_batches": 4,
"num_mem_ranges": 1,
"num_file_ranges": 1,
"build_parts_cost": "1ms",
"sst_scan_cost": "6ms",
"rg_total": 3,
"rows_before_filter": 4096,
"num_sst_record_batches": 2,
"num_sst_batches": 2,
"num_sst_rows": 96,
"first_poll": "600µs",
"convert_cost": "300µs",
"rg_bloom_filtered": 1,
"rows_bloom_filtered": 1024,
"fetch_metrics": {
"total_fetch_elapsed": "2ms",
"page_cache_hit": 2,
"cache_miss": 1
},
"metadata_cache_metrics": {
"metadata_load_cost": "100µs",
"mem_cache_hit": 1
},
"build_ranges_peak_mem_size": 2048,
"num_peak_range_builders": 1,
"stream_eof": true
}
}
]
}

All three scanners use the same PartitionMetrics and ScanMetricsSet structure for verbose per-partition metrics.

Timing and Output

MetricScannerMeaningWhen present
prepare_scan_costAllElapsed time between query start and partition metric creation.Always.
build_reader_costAllTime spent building readers or merge readers in this partition.Always.
scan_costAllTime spent polling scanner inputs in this partition.Always.
yield_costAllTime spent after yielding batches to downstream operators in this partition.Always.
convert_costAllTime spent converting batches to Arrow record batches in this partition.When conversion time is recorded.
total_costAllElapsed time from query start until the partition finishes, or until the metrics object is dropped.Always.
first_pollAllElapsed time from query start until the partition stream is first polled.Always.
num_rowsAllRows returned by this partition.Always.
num_batchesAllBatches returned by this partition.Always.
num_distributor_rowsSeriesScanRows scanned by the series distributor.Nonzero only.
num_distributor_batchesSeriesScanBatches scanned by the series distributor.Nonzero only.
distributor_scan_costSeriesScanTime spent by the series distributor scanning input.Nonzero only.
distributor_yield_costSeriesScanTime spent by the series distributor sending batches to partition channels.Nonzero only.
distributor_divider_costSeriesScanTime spent splitting flat batches into series batches.Nonzero only.

Ranges and SST

MetricScannerMeaningWhen present
num_mem_rangesAllMemtable ranges scanned by this partition.Always.
num_file_rangesAllSST file ranges scanned by this partition.Always.
build_parts_costAllTime spent building SST file ranges in this partition.Always.
sst_scan_costAllTime spent scanning SST readers.Always.
rg_totalAllRow groups considered before pruning.Always.
num_sst_record_batchesAllArrow record batches read from SST readers.Always.
num_sst_batchesAllBatches decoded from SST readers.Always.
num_sst_rowsAllRows decoded from SST readers.Always.

Filters and Pruning

MetricScannerMeaningWhen present
rg_fulltext_filteredAllRow groups filtered by fulltext index.Nonzero only.
rg_inverted_filteredAllRow groups filtered by inverted index.Nonzero only.
rg_minmax_filteredAllRow groups filtered by min-max pruning.Nonzero only.
rg_bloom_filteredAllRow groups filtered by bloom filter index.Nonzero only.
rg_vector_filteredAllRow groups filtered by vector index.Nonzero only.
rows_before_filterAllRows in candidate row groups before row-level filtering.Always.
rows_fulltext_filteredAllRows filtered by fulltext index.Nonzero only.
rows_inverted_filteredAllRows filtered by inverted index.Nonzero only.
rows_bloom_filteredAllRows filtered by bloom filter index.Nonzero only.
rows_vector_filteredAllRows filtered by vector index.Nonzero only.
rows_vector_selectedAllRows selected by vector index search.Nonzero only.
rows_precise_filteredAllRows filtered by precise row-level filters.Nonzero only.
pruner_cache_hitAllPruner builder cache hits while building file ranges.Nonzero only.
pruner_cache_missAllPruner builder cache misses while building file ranges.Nonzero only.
pruner_prune_costAllTime spent waiting for pruners to build file ranges.Nonzero only.

Index Result Caches

MetricScannerMeaningWhen present
fulltext_index_cache_hitAllFulltext index result cache hits.Nonzero only.
fulltext_index_cache_missAllFulltext index result cache misses.Nonzero only.
inverted_index_cache_hitAllInverted index result cache hits.Nonzero only.
inverted_index_cache_missAllInverted index result cache misses.Nonzero only.
bloom_filter_cache_hitAllBloom filter index result cache hits.Nonzero only.
bloom_filter_cache_missAllBloom filter index result cache misses.Nonzero only.
minmax_cache_hitAllMin-max pruning cache hits.Nonzero only.
minmax_cache_missAllMin-max pruning cache misses.Nonzero only.

Memtables

MetricScannerMeaningWhen present
mem_scan_costAllTime spent scanning memtables.Nonzero only.
mem_rowsAllRows read from memtables.Nonzero only.
mem_batchesAllBatches read from memtables.Nonzero only.
mem_seriesAllSeries read from time-series memtables.Nonzero only.
mem_prefilter_costAllTime spent applying memtable prefilters.Nonzero only.
mem_prefilter_rows_filteredAllRows filtered by memtable prefilters.Nonzero only.

SeriesScan Distributor

MetricMeaningWhen present
num_series_send_timeoutTimes the distributor timed out sending to a partition channel.Nonzero only.
num_series_send_fullTimes a non-blocking send found a full partition channel.Nonzero only.

Range Cache and Lifecycle

MetricScannerMeaningWhen present
range_cache_sizeAllBytes added to the range cache during the scan.Nonzero only.
range_cache_hitAllRange cache lookup hits.Nonzero only.
range_cache_missAllRange cache lookup misses.Nonzero only.
build_ranges_peak_mem_sizeAllPeak memory tracked while building file ranges.Always.
num_peak_range_buildersAllPeak number of active file range builders.Always.
stream_eofAllWhether the partition stream reached EOF normally.Always.

Nested Metrics

Verbose per-partition output can include nested metric objects when the related work is recorded.

MetricScannerMeaningWhen present
fetch_metricsAllPage and row-group fetch metrics from Parquet readers.When fetch or prefilter work is recorded.
metadata_cache_metricsAllParquet metadata cache metrics.When metadata load time is recorded.
inverted_index_apply_metricsAllElapsed time and cache/read metrics for inverted index appliers.When inverted indexes are applied.
bloom_filter_apply_metricsAllElapsed time and cache/read metrics for bloom filter index appliers.When bloom filter indexes are applied.
fulltext_index_apply_metricsAllElapsed time and cache/read metrics for fulltext index appliers.When fulltext indexes are applied.
merge_metricsSeqScan, SeriesScanMerge reader metrics.When merge scan cost is recorded.
dedup_metricsSeqScan, SeriesScanDeduplication metrics.When dedup cost is recorded.
top_file_metricsAllUp to ten files with the largest accumulated build_part_cost + build_reader_cost + scan_cost.When per-file metrics are collected in verbose mode.

fetch_metrics fields:

FieldMeaningWhen present
total_fetch_elapsedTotal elapsed time for fetching row groups.Always when fetch_metrics is printed.
page_cache_hitPage cache hits.Nonzero only.
write_cache_hitWrite cache hits.Nonzero only.
cache_missCache misses.Nonzero only.
pages_to_fetch_memPages to fetch from memory cache.Nonzero only.
page_size_to_fetch_memBytes to fetch from memory cache.Nonzero only.
pages_to_fetch_write_cachePages to fetch from write cache.Nonzero only.
page_size_to_fetch_write_cacheBytes to fetch from write cache.Nonzero only.
pages_to_fetch_storePages to fetch from object store.Nonzero only.
page_size_to_fetch_storeBytes to fetch from object store.Nonzero only.
page_size_neededBytes actually needed by the read.Nonzero only.
write_cache_fetch_elapsedElapsed time fetching from write cache.Nonzero only.
store_fetch_elapsedElapsed time fetching from object store.Nonzero only.
prefilter_costElapsed time running row-group prefilters.Nonzero only.
prefilter_filtered_rowsRows filtered by row-group prefilters.Nonzero only.

metadata_cache_metrics fields:

FieldMeaningWhen present
metadata_load_costTime spent loading Parquet metadata.Always when metadata_cache_metrics is printed.
mem_cache_hitParquet metadata memory cache hits.Nonzero only.
file_cache_hitParquet metadata file cache hits.Nonzero only.
cache_missParquet metadata cache misses.Nonzero only.
num_readsMetadata read operations.Nonzero only.
bytes_readMetadata bytes read from storage.Nonzero only.

Index apply metric fields:

Parent metricFieldMeaningWhen present
inverted_index_apply_metricsapply_elapsedTime spent applying inverted indexes.Always when parent is printed.
inverted_index_apply_metricsblob_cache_missIndex blob cache misses.Nonzero only.
inverted_index_apply_metricsblob_read_bytesBytes read for index blobs.Nonzero only.
inverted_index_apply_metricsinverted_index_read_metricsNested inverted index read metrics.Always when parent is printed.
bloom_filter_apply_metricsapply_elapsedTime spent applying bloom filter indexes.Always when parent is printed.
bloom_filter_apply_metricsblob_cache_missIndex blob cache misses.Nonzero only.
bloom_filter_apply_metricsblob_read_bytesBytes read for index blobs.Nonzero only.
bloom_filter_apply_metricsread_metricsNested bloom filter read metrics.Always when parent is printed.
fulltext_index_apply_metricsapply_elapsedTime spent applying fulltext indexes.Always when parent is printed.
fulltext_index_apply_metricsblob_cache_missFulltext index blob cache misses.Nonzero only.
fulltext_index_apply_metricsdir_cache_hitFulltext index directory cache hits.Nonzero only.
fulltext_index_apply_metricsdir_cache_missFulltext index directory cache misses.Nonzero only.
fulltext_index_apply_metricsdir_init_elapsedTime spent initializing fulltext index directory data.Nonzero only.
fulltext_index_apply_metricsbloom_filter_read_metricsNested bloom filter read metrics used by the fulltext path.Always when parent is printed.

Index read metric fields used by inverted_index_read_metrics, read_metrics, and bloom_filter_read_metrics:

FieldMeaningWhen present
total_bytesBytes read for index data.Nonzero only.
cache_hitIndex data cache hits.Nonzero only.
total_rangesRanges read for index data.Nonzero only.
fetch_elapsedElapsed time fetching index data.Nonzero only.
cache_missIndex data cache misses.Nonzero only.

merge_metrics fields:

FieldMeaningWhen present
scan_costTotal scan cost of the merge reader.Always when merge_metrics is printed.
init_costTime spent initializing the merge reader.Nonzero only.
num_fetch_by_batchesNumber of batch-oriented fetches from sources.Nonzero only.
num_fetch_by_rowsNumber of row-oriented fetches from sources.Nonzero only.
fetch_costTime spent fetching batches from sources.Nonzero only.

dedup_metrics fields:

FieldMeaningWhen present
dedup_costTime spent deduplicating rows.Always when dedup_metrics is printed.
num_unselected_rowsRows removed by deduplication or delete filtering.Nonzero only.
num_deleted_rowsDeleted rows removed during deduplication.Nonzero only.

Each top_file_metrics entry is keyed by file ID and can contain:

FieldMeaningWhen present
build_part_costTime spent building this file's ranges or parts.Always for each printed file entry.
num_rangesRanges read from this file.Nonzero only.
num_rowsRows read from this file.Nonzero only.
build_reader_costTime spent building readers for this file.Nonzero only.
scan_costTime spent scanning this file.Nonzero only.

Scanner Differences

SeqScan scans ranges while preserving the ordering required by the read path. It may report merge_metrics when merge readers are used and dedup_metrics when deduplication removes older versions or deleted rows.

UnorderedScan provides no output ordering guarantee. It uses the same per-partition metric structure as SeqScan, but downstream plan nodes may add sort operators when a query needs ordered output.

SeriesScan returns rows grouped by series. It uses the same per-partition metric structure as the other scanners and can also report distributor metrics from the series distributor partition.