Version: 0.14

Quick Start

This guide provides step-by-step instructions for quickly writing and querying logs in GreptimeDB.

GreptimeDB supports a pipeline mechanism to parse and transform structured log messages into multiple columns for efficient storage and querying.

For unstructured logs, you can write them directly into a table without utilizing a pipeline.

Write logs by Pipeline

Pipelines enable automatic parsing and transformation of log messages into multiple columns, as well as automatic table creation and alteration.

Write JSON logs using the built-in Pipeline (Experimental)

warning

The JSON feature is currently experimental and may change in future releases.

GreptimeDB offers a built-in pipeline, greptime_identity, for handling JSON log formats. This pipeline simplifies the process of writing JSON logs.

curl -X POST \
  "http://localhost:4000/v1/events/logs?db=public&table=pipeline_logs&pipeline_name=greptime_identity" \
  -H "Content-Type: application/json" \
  -H "Authorization: Basic {{authentication}}" \
  -d '[
    {
      "name": "Alice",
      "age": 20,
      "is_student": true,
      "score": 90.5,
      "object": { "a": 1, "b": 2 }
    },
    {
      "age": 21,
      "is_student": false,
      "score": 85.5,
      "company": "A",
      "whatever": null
    },
    {
      "name": "Charlie",
      "age": 22,
      "is_student": true,
      "score": 95.5,
      "array": [1, 2, 3]
    }
  ]'

Authorization header.
pipeline_name=greptime_identity specifies the built-in pipeline.
table=pipeline_logs specifies the target table. If the table does not exist, it will be created automatically.

The greptime_identity pipeline automatically creates columns for each field in the JSON log. A successful command execution returns:

{"output":[{"affectedrows":3}],"execution_time_ms":9}

For more details about the greptime_identity pipeline, please refer to the Manage Pipelines document.

Write logs using a custom Pipeline

Custom pipelines allow you to parse and transform log messages into multiple columns based on specific patterns, and automatically create tables.

Create a Pipeline

GreptimeDB provides an HTTP interface for creating pipelines. Here is how to do it:

First, create a pipeline file, for example, pipeline.yaml.

processors:
  - dissect:
      fields:
        - message
      patterns:
        - '%{ip_address} - - [%{timestamp}] "%{http_method} %{request_line}" %{status_code} %{response_size} "-" "%{user_agent}"'
      ignore_missing: true
  - date:
      fields:
        - timestamp
      formats:
        - "%d/%b/%Y:%H:%M:%S %z"

transform:
  - fields:
      - ip_address
      - http_method
    type: string
    index: inverted
    tag: true
  - fields:
      - status_code
    type: int32
    index: inverted
    tag: true
  - fields:
      - request_line
      - user_agent
    type: string
    index: fulltext
  - fields:
      - response_size
    type: int32
  - fields:
      - timestamp
    type: time
    index: timestamp

The pipeline splits the message field using the specified pattern to extract the ip_address, timestamp, http_method, request_line, status_code, response_size, and user_agent. It then parses the timestamp field using the format %d/%b/%Y:%H:%M:%S %z to convert it into a proper timestamp format that the database can understand. Finally, it converts each field to the appropriate data type and indexes it accordingly. It is worth noting that the request_line and user_agent fields are indexed as fulltext to optimize full-text search queries. And there must be one time index column specified by the timestamp.

Execute the following command to upload the configuration file:

curl -X "POST" \
  "http://localhost:4000/v1/events/pipelines/nginx_pipeline" \
     -H 'Authorization: Basic {{authentication}}' \
     -F "[email protected]"

After successfully executing this command, a pipeline named nginx_pipeline will be created, and the result will be returned as:

{"name":"nginx_pipeline","version":"2024-06-27 12:02:34.257312110Z"}.

You can create multiple versions for the same pipeline name. All pipelines are stored at the greptime_private.pipelines table. Please refer to Query Pipelines to view the pipeline data in the table.

Write logs

The following example writes logs to the custom_pipeline_logs table and uses the nginx_pipeline pipeline to format and transform the log messages.

curl -X POST \
  "http://localhost:4000/v1/events/logs?db=public&table=custom_pipeline_logs&pipeline_name=nginx_pipeline" \
  -H "Content-Type: application/json" \
  -H "Authorization: Basic {{authentication}}" \
  -d '[
    {
      "message": "127.0.0.1 - - [25/May/2024:20:16:37 +0000] \"GET /index.html HTTP/1.1\" 200 612 \"-\" \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36\""
    },
    {
      "message": "192.168.1.1 - - [25/May/2024:20:17:37 +0000] \"POST /api/login HTTP/1.1\" 200 1784 \"-\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.96 Safari/537.36\""
    },
    {
      "message": "10.0.0.1 - - [25/May/2024:20:18:37 +0000] \"GET /images/logo.png HTTP/1.1\" 304 0 \"-\" \"Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:89.0) Gecko/20100101 Firefox/89.0\""
    },
    {
      "message": "172.16.0.1 - - [25/May/2024:20:19:37 +0000] \"GET /contact HTTP/1.1\" 404 162 \"-\" \"Mozilla/5.0 (iPhone; CPU iPhone OS 14_0 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.0 Mobile/15E148 Safari/604.1\""
    }
  ]'

You will see the following output if the command is successful:

{"output":[{"affectedrows":4}],"execution_time_ms":79}

Write unstructured logs directly

When your log messages are unstructured text, you can write them directly to the database. However, this method limits the ability to perform high-performance analysis.

Create a table for unstructured logs

You need to create a table to store the logs before inserting. Use the following SQL statement to create a table named origin_logs:

The FULLTEXT INDEX on the message column optimizes text search queries
Setting append_mode to true optimizes log insertion by only appending new rows to the table

CREATE TABLE `origin_logs` (
  `message` STRING FULLTEXT INDEX,
  `time` TIMESTAMP TIME INDEX
) WITH (
  append_mode = 'true'
);

Insert logs

Write logs using the SQL protocol

Use the INSERT statement to insert logs into the table.

INSERT INTO origin_logs (message, time) VALUES
('127.0.0.1 - - [25/May/2024:20:16:37 +0000] "GET /index.html HTTP/1.1" 200 612 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"', '2024-05-25 20:16:37.217'),
('192.168.1.1 - - [25/May/2024:20:17:37 +0000] "POST /api/login HTTP/1.1" 200 1784 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.96 Safari/537.36"', '2024-05-25 20:17:37.217'),
('10.0.0.1 - - [25/May/2024:20:18:37 +0000] "GET /images/logo.png HTTP/1.1" 304 0 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:89.0) Gecko/20100101 Firefox/89.0"', '2024-05-25 20:18:37.217'),
('172.16.0.1 - - [25/May/2024:20:19:37 +0000] "GET /contact HTTP/1.1" 404 162 "-" "Mozilla/5.0 (iPhone; CPU iPhone OS 14_0 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.0 Mobile/15E148 Safari/604.1"', '2024-05-25 20:19:37.217');

The above SQL inserts the entire log text into a single column, and you must add an extra timestamp for each log.

Write logs using the gRPC protocol

You can also write logs using the gRPC protocol, which is a more efficient method.

Refer to Write Data Using gRPC to learn how to write logs using the gRPC protocol.

Differences between using a pipeline and writing unstructured logs directly

In the above examples, the table custom_pipeline_logs is automatically created by writing logs using pipeline, and the table origin_logs is created by writing logs directly. Let's explore the differences between these two tables.

DESC custom_pipeline_logs;

+---------------+---------------------+------+------+---------+---------------+
| Column        | Type                | Key  | Null | Default | Semantic Type |
+---------------+---------------------+------+------+---------+---------------+
| ip_address    | String              | PRI  | YES  |         | TAG           |
| http_method   | String              | PRI  | YES  |         | TAG           |
| status_code   | Int32               | PRI  | YES  |         | TAG           |
| request_line  | String              |      | YES  |         | FIELD         |
| user_agent    | String              |      | YES  |         | FIELD         |
| response_size | Int32               |      | YES  |         | FIELD         |
| timestamp     | TimestampNanosecond | PRI  | NO   |         | TIMESTAMP     |
+---------------+---------------------+------+------+---------+---------------+
7 rows in set (0.00 sec)

DESC origin_logs;

+---------+----------------------+------+------+---------+---------------+
| Column  | Type                 | Key  | Null | Default | Semantic Type |
+---------+----------------------+------+------+---------+---------------+
| message | String               |      | YES  |         | FIELD         |
| time    | TimestampMillisecond | PRI  | NO   |         | TIMESTAMP     |
+---------+----------------------+------+------+---------+---------------+

From the table structure, you can see that the origin_logs table has only two columns, with the entire log message stored in a single column. The custom_pipeline_logs table stores the log message in multiple columns.

It is recommended to use the pipeline method to split the log message into multiple columns, which offers the advantage of explicitly querying specific values within certain columns. Column matching query proves superior to full-text searching for several key reasons:

Performance Efficiency: Column matching query is typically faster than full-text searching.
Resource Consumption: Due to GreptimeDB's columnar storage engine, structured data is more conducive to compression. Additionally, the inverted index used for tag matching query typically consumes significantly fewer resources than a full-text index, especially in terms of storage size.
Maintainability: Tag matching query is straightforward and easier to understand, write, and debug.

Of course, if you need keyword searching within large text blocks, you must use full-text searching as it is specifically designed for that purpose.

Query logs

We use the custom_pipeline_logs table as an example to query logs.

Query logs by tags

With the multiple tag columns in custom_pipeline_logs, you can query data by tags flexibly. For example, query the logs with status_code 200 and http_method GET.

SELECT * FROM custom_pipeline_logs WHERE status_code = 200 AND http_method = 'GET';

+------------+-------------+-------------+----------------------+---------------------------------------------------------------------------------------------------------------------+---------------+---------------------+
| ip_address | http_method | status_code | request_line         | user_agent                                                                                                          | response_size | timestamp           |
+------------+-------------+-------------+----------------------+---------------------------------------------------------------------------------------------------------------------+---------------+---------------------+
| 127.0.0.1  | GET         |         200 | /index.html HTTP/1.1 | Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36 |           612 | 2024-05-25 20:16:37 |
+------------+-------------+-------------+----------------------+---------------------------------------------------------------------------------------------------------------------+---------------+---------------------+
1 row in set (0.02 sec)

Full-Text Search

For the text fields request_line and user_agent, you can use matches_term function to search logs. Remember, we created the full-text index for these two columns when creating a pipeline.
This allows for high-performance full-text searches.

For example, query the logs with request_line containing /index.html or /api/login.

SELECT * FROM custom_pipeline_logs WHERE matches_term(request_line, '/index.html') OR matches_term(request_line, '/api/login');

+-------------+-------------+-------------+----------------------+--------------------------------------------------------------------------------------------------------------------------+---------------+---------------------+
| ip_address  | http_method | status_code | request_line         | user_agent                                                                                                               | response_size | timestamp           |
+-------------+-------------+-------------+----------------------+--------------------------------------------------------------------------------------------------------------------------+---------------+---------------------+
| 127.0.0.1   | GET         |         200 | /index.html HTTP/1.1 | Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36      |           612 | 2024-05-25 20:16:37 |
| 192.168.1.1 | POST        |         200 | /api/login HTTP/1.1  | Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.96 Safari/537.36 |          1784 | 2024-05-25 20:17:37 |
+-------------+-------------+-------------+----------------------+--------------------------------------------------------------------------------------------------------------------------+---------------+---------------------+
2 rows in set (0.00 sec)

You can refer to the Query Logs document for detailed usage of the matches_term function.

Next steps

You have now experienced GreptimeDB's logging capabilities. You can explore further by following the documentation below:

Pipeline Configuration: Provides in-depth information on each specific configuration of pipelines in GreptimeDB.
Managing Pipelines: Explains how to create and delete pipelines.
Writing Logs with Pipelines: Provides detailed instructions on efficiently writing log data by leveraging the pipeline mechanism.
Query Logs: Describes how to query logs using the GreptimeDB SQL interface.

Quick Start

Write logs by Pipeline​

Write JSON logs using the built-in Pipeline (Experimental)​

Write logs using a custom Pipeline​

Create a Pipeline​

Write logs​

Write unstructured logs directly​

Create a table for unstructured logs​

Insert logs​

Write logs using the SQL protocol​

Write logs using the gRPC protocol​

Differences between using a pipeline and writing unstructured logs directly​

Query logs​

Query logs by tags​

Full-Text Search​

Next steps​