Skip to content

Quick Start

What's Remote WAL

WAL(Write-Ahead Logging) is a crucial component in GreptimeDB that persistently records every data modification to ensure no memory-cached data loss. We implement WAL as a module in the Datanode service using a persistent embedded storage engine, raft-engine. When deploying GreptimeDB in the public cloud, we can persistently store WAL data in cloud storage(AWS EBS, GCP persistent disk, etc.) to achieve 0 RPO(Recovery Point Objective). However, the deployment has a significant RTO(Recovery Time Objective) because WAL is tightly coupled with Datanode. Additionally, the rafe-engine can't support multiple log subscriptions, which makes it difficult to implement region hot standby and region migration.

To resolve the above problems, we decided to design and implement a remote WAL. The remote WAL decouples the WAL from the Datanode to the remote service, which we chose to be Apache Kafka. Apache Kafka is widely adopted in stream processing and exhibits excellent distributed fault tolerance and a subscription mechanism based on topics. With the release v0.5.0, we introduced Apache Kafka as an optional storage engine for WAL.

Run Standalone GreptimeDB with Remote WAL

It's very easy to experience remote WAL by using the Docker with the following steps. In this quick start, we will create a Kafka cluster with one broker in KRaft mode and use it as remote WAL for the standalone GreptimeDB.

Step 1: Create a user-defined bridge of the Docker network

The user-defined bridge can help us create a bridge network to connect multiple containers:

docker network create greptimedb-remote-wal
docker network create greptimedb-remote-wal

Step 2: Start the Kafka Service

Use the KRaft mode to start the singleton Kafka service:

docker run \
  --name kafka --rm \
  --network greptimedb-remote-wal \
  -p 9092:9092 \
  -e KAFKA_CFG_NODE_ID="1" \
  -e KAFKA_CFG_PROCESS_ROLES="broker,controller" \
  -e KAFKA_CFG_CONTROLLER_QUORUM_VOTERS="1@kafka:9093" \
  -e KAFKA_CFG_ADVERTISED_LISTENERS="PLAINTEXT://kafka:9092" \
  -e KAFKA_CFG_CONTROLLER_LISTENER_NAMES="CONTROLLER" \
  -e KAFKA_CFG_LISTENER_SECURITY_PROTOCOL_MAP="CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT" \
  -e KAFKA_CFG_LISTENERS="PLAINTEXT://:9092,CONTROLLER://:9093" \
  -e ALLOW_PLAINTEXT_LISTENER="yes" \
  -e KAFKA_BROKER_ID="1" \
  -e KAFKA_CFG_LOG_DIRS="/bitnami/kafka/data" \
  -v $(pwd)/kafka-data:/bitnami/kafka/data \
  bitnami/kafka:3.6.0
docker run \
  --name kafka --rm \
  --network greptimedb-remote-wal \
  -p 9092:9092 \
  -e KAFKA_CFG_NODE_ID="1" \
  -e KAFKA_CFG_PROCESS_ROLES="broker,controller" \
  -e KAFKA_CFG_CONTROLLER_QUORUM_VOTERS="1@kafka:9093" \
  -e KAFKA_CFG_ADVERTISED_LISTENERS="PLAINTEXT://kafka:9092" \
  -e KAFKA_CFG_CONTROLLER_LISTENER_NAMES="CONTROLLER" \
  -e KAFKA_CFG_LISTENER_SECURITY_PROTOCOL_MAP="CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT" \
  -e KAFKA_CFG_LISTENERS="PLAINTEXT://:9092,CONTROLLER://:9093" \
  -e ALLOW_PLAINTEXT_LISTENER="yes" \
  -e KAFKA_BROKER_ID="1" \
  -e KAFKA_CFG_LOG_DIRS="/bitnami/kafka/data" \
  -v $(pwd)/kafka-data:/bitnami/kafka/data \
  bitnami/kafka:3.6.0

The data will be stored in $(pwd)/kafka-data.

Step 3: Start the GreptimeDB with Remote WAL Configurations

Use the Kafka wal provider to start the standalone GreptimeDB:

docker run \
  --network greptimedb-remote-wal \
  -p 4000-4003:4000-4003 -p 4242:4242 \
  -v "$(pwd)/greptimedb:/tmp/greptimedb" \
  --name greptimedb --rm \
  -e GREPTIMEDB_STANDALONE__WAL__PROVIDER="kafka" \
  -e GREPTIMEDB_STANDALONE__WAL__BROKER_ENDPOINTS="kafka:9092" \
  greptime/greptimedb standalone start \
  --http-addr 0.0.0.0:4000 \
  --rpc-addr 0.0.0.0:4001 \
  --mysql-addr 0.0.0.0:4002 \
  --postgres-addr 0.0.0.0:4003 \
  --opentsdb-addr 0.0.0.0:4242
docker run \
  --network greptimedb-remote-wal \
  -p 4000-4003:4000-4003 -p 4242:4242 \
  -v "$(pwd)/greptimedb:/tmp/greptimedb" \
  --name greptimedb --rm \
  -e GREPTIMEDB_STANDALONE__WAL__PROVIDER="kafka" \
  -e GREPTIMEDB_STANDALONE__WAL__BROKER_ENDPOINTS="kafka:9092" \
  greptime/greptimedb standalone start \
  --http-addr 0.0.0.0:4000 \
  --rpc-addr 0.0.0.0:4001 \
  --mysql-addr 0.0.0.0:4002 \
  --postgres-addr 0.0.0.0:4003 \
  --opentsdb-addr 0.0.0.0:4242

We use the environment variables to specify the provider:

  • GREPTIMEDB_STANDALONE__WAL__PROVIDER: Set kafka to use Kafka remote WAL;
  • GREPTIMEDB_STANDALONE__WAL__BROKER_ENDPOINTS: Specify the advertised listeners for all brokers in the Kafka cluster. In this example, we will use the Kafka container name, and the bridge network will resolve it into IPv4;

Step 4: Write and Query Data

There are many ways to connect to GreptimeDB, so let's choose mysql.

  1. Connect the GreptimeDB

    mysql -h 127.0.0.1 -P 4002
    mysql -h 127.0.0.1 -P 4002
  2. Write the testing data

    • Create the table system_metrics

      sql
      CREATE TABLE IF NOT EXISTS system_metrics (
          host STRING,
          idc STRING,
          cpu_util DOUBLE,
          memory_util DOUBLE,
          disk_util DOUBLE,
          ts TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
          PRIMARY KEY(host, idc),
          TIME INDEX(ts)
      );
      CREATE TABLE IF NOT EXISTS system_metrics (
          host STRING,
          idc STRING,
          cpu_util DOUBLE,
          memory_util DOUBLE,
          disk_util DOUBLE,
          ts TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
          PRIMARY KEY(host, idc),
          TIME INDEX(ts)
      );
    • Write the testing data

      sql
      INSERT INTO system_metrics
      VALUES
          ("host1", "idc_a", 11.8, 10.3, 10.3, 1667446797450),
          ("host1", "idc_a", 80.1, 70.3, 90.0, 1667446797550),
          ("host1", "idc_b", 50.0, 66.7, 40.6, 1667446797650),
          ("host1", "idc_b", 51.0, 66.5, 39.6, 1667446797750),
          ("host1", "idc_b", 52.0, 66.9, 70.6, 1667446797850),
          ("host1", "idc_b", 53.0, 63.0, 50.6, 1667446797950),
          ("host1", "idc_b", 78.0, 66.7, 20.6, 1667446798050),
          ("host1", "idc_b", 68.0, 63.9, 50.6, 1667446798150),
          ("host1", "idc_b", 90.0, 39.9, 60.6, 1667446798250);
      INSERT INTO system_metrics
      VALUES
          ("host1", "idc_a", 11.8, 10.3, 10.3, 1667446797450),
          ("host1", "idc_a", 80.1, 70.3, 90.0, 1667446797550),
          ("host1", "idc_b", 50.0, 66.7, 40.6, 1667446797650),
          ("host1", "idc_b", 51.0, 66.5, 39.6, 1667446797750),
          ("host1", "idc_b", 52.0, 66.9, 70.6, 1667446797850),
          ("host1", "idc_b", 53.0, 63.0, 50.6, 1667446797950),
          ("host1", "idc_b", 78.0, 66.7, 20.6, 1667446798050),
          ("host1", "idc_b", 68.0, 63.9, 50.6, 1667446798150),
          ("host1", "idc_b", 90.0, 39.9, 60.6, 1667446798250);
  3. Query the data

    sql
    SELECT * FROM system_metrics;
    SELECT * FROM system_metrics;
  4. Query the Kafka topics:

    # List the Kafka topics.
    docker exec kafka /opt/bitnami/kafka/bin/kafka-topics.sh --list --bootstrap-server localhost:9092
    # List the Kafka topics.
    docker exec kafka /opt/bitnami/kafka/bin/kafka-topics.sh --list --bootstrap-server localhost:9092

    By default, all the topics start with greptimedb_wal_topic, for example:

    docker exec kafka /opt/bitnami/kafka/bin/kafka-topics.sh --list --bootstrap-server localhost:9092
    greptimedb_wal_topic_0
    greptimedb_wal_topic_1
    greptimedb_wal_topic_10
    ...
    docker exec kafka /opt/bitnami/kafka/bin/kafka-topics.sh --list --bootstrap-server localhost:9092
    greptimedb_wal_topic_0
    greptimedb_wal_topic_1
    greptimedb_wal_topic_10
    ...

Step 5: Cleanup

  • Stop the greptimedb and Kafka

    docker stop greptimedb
    docker stop kafka
    docker stop greptimedb
    docker stop kafka
  • Remove the user-defined bridge

    docker network rm greptimedb-remote-wal
    docker network rm greptimedb-remote-wal
  • Remove the data

    The data will be stored in the working directory that runs greptimedb:

    rm -r <working-dir>/greptimedb
    rm -r <working-dir>/kafka-data
    rm -r <working-dir>/greptimedb
    rm -r <working-dir>/kafka-data