
Harnessing Materialized Views and ClickHouse for High-Performance Analytics

Ruslan Shcherbin

Overview

At Inigo, we were seeking a database solution that could handle a high volume of raw data for analytics. After exploring various alternatives, including SQLite, Snowflake, and PostgreSQL, we decided to experiment with Materialized Views in ClickHouse on a relatively slow machine. In this post, we share our findings after loading billions of rows of data and taking measurements.

Database Alternatives Considered

Before arriving at ClickHouse, we tried several other databases:

  1. Snowflake - Too slow and costly for our needs. While it performs well for in-house data processing, it becomes quite expensive when handling real-time customer data within a product, which hurts the product's unit economics. The lack of a local Snowflake emulator further complicates development, as it prevents running continuous unit tests against the database. The user interface also didn't fit how we wanted to work: since each service had its own separate table, we had to obtain and hardcode table names in the UI, which made it impossible to retrieve data from all tables with a single query.
  2. PostgreSQL - An excellent database, but its transactional nature made it unsuitable for large analytical workloads. We got it to work with around 100K rows, but beyond that the indexes grew out of control and the cost of running a Postgres cluster no longer made sense.

What is ClickHouse?

ClickHouse is a columnar database specifically designed to handle high volumes of data while providing fast query execution. Materialized views in ClickHouse serve as pre-aggregated datasets that can significantly improve the performance of analytical queries. Furthermore, ClickHouse's ability to automatically manage and update materialized views as new data is ingested simplifies maintenance, making it an attractive choice for those seeking an efficient and powerful analytical database solution.

Why ClickHouse

We ended up choosing ClickHouse for its generally high performance, its ability to handle large scale, and the fact that we can run it locally for testing purposes.

Inigo’s ClickHouse Setup

Our current ClickHouse setup is a single database where nearly every Inigo service has its own table, with some tables containing billions of rows of data. It runs on 4 vCPUs, 32GB RAM for the development environment and 16 vCPUs, 128GB RAM for the production environment. Setting up ClickHouse was challenging, as understanding the nuances of each index, sharding, and materialized views required considerable effort. However, the end result has been a system capable of loading and querying vast amounts of analytics data in a scalable manner, both in terms of hardware cost and engineering development time.

Materialized Views in ClickHouse

Given that we have billions of rows in some of our raw tables, we strive to minimize joins whenever possible. Instead, we have consolidated the data into several materialized views, each grouping the data into distinct time window buckets. As a result, most of the time, we are querying thousands of aggregated rows instead of millions of raw data rows, resulting in faster real-time filtering.

Creating materialized views

Our raw tables all contain a timestamp column, observed_at. We then aggregate the raw rows into specified intervals for the materialized tables. For example, with a 1-minute interval, we use the toStartOfInterval(observed_at, INTERVAL 1 minute) function to bucket the data into one-minute intervals. Each row in the materialized view then aggregates all of the raw rows that fall within a given minute, as determined by the observed_at column. By creating materialized views with different intervals, such as 6 minutes, 40 minutes, and 3 hours, we can further enhance query performance and enable more efficient data analysis across various timeframes.
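As a rough illustration, a 1-minute rollup might be defined along these lines. The names requests_raw, service, and requests_by_1m are hypothetical and not our actual schema; only observed_at and toStartOfInterval come from the description above, and the calls column is explained under "Grouping data" below.

    -- Hypothetical sketch: roll raw rows up into 1-minute buckets.
    -- requests_raw, service, and requests_by_1m are illustrative names.
    CREATE MATERIALIZED VIEW requests_by_1m
    ENGINE = SummingMergeTree()
    ORDER BY (service, bucket)
    AS SELECT
        service,
        toStartOfInterval(observed_at, INTERVAL 1 minute) AS bucket,
        count(1) AS calls
    FROM requests_raw
    GROUP BY service, bucket;

In this sketch, SummingMergeTree sums the numeric calls column for rows sharing a sorting key when parts are merged in the background, which suits a simple counter like this.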

We exclude all columns containing unique data, such as trace_id, from materialized views. When we need to filter by trace_id, we query the raw data. It is worth noting that raw data queries are reasonably quick in ClickHouse, which can efficiently sift through a large number of rows in real time using the appropriate index for each column.
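Such a lookup goes straight to the raw table. A minimal sketch, reusing the hypothetical requests_raw table from above:

    -- Hypothetical raw-data lookup by a unique column.
    SELECT *
    FROM requests_raw
    WHERE trace_id = 'abc-123'
      AND observed_at >= now() - INTERVAL 1 DAY;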

Grouping data

In our analytics, we primarily use grouped queries to obtain data. Standard queries and materialized view queries share the same SQL files, with one slight modification in how they handle counting. Instead of using count(1) to count the rows, materialized view queries use sum(calls), where calls is a derived column created with count(1) as calls in the materialized view definition. This change allows us to aggregate the pre-counted rows from the materialized view, resulting in more efficient querying and better performance.
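Sketched with the same hypothetical names as above, the two variants differ only in the counting expression:

    -- Against the raw table:
    SELECT service, count(1) AS calls
    FROM requests_raw
    WHERE observed_at >= now() - INTERVAL 1 DAY
    GROUP BY service;

    -- The same question against the 1-minute materialized view,
    -- summing the pre-aggregated calls column instead:
    SELECT service, sum(calls) AS calls
    FROM requests_by_1m
    WHERE bucket >= now() - INTERVAL 1 DAY
    GROUP BY service;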

The State and Merge Suffixes

In ClickHouse, the State and Merge suffixes are used with aggregation functions to handle intermediate aggregation states and merge them efficiently. These suffixes are particularly useful when working with distributed databases or when we need to combine aggregated results from multiple sources.

  • The State suffix: The aggregation functions with the State suffix return an intermediate aggregation state rather than the final result. These intermediate states are stored in a compact binary format that represents the aggregation in progress.

For example, if we want to calculate the 95th (0.95) and 99th (0.99) percentiles of a column called server_process_time, we can use quantilesState(0.95, 0.99)(server_process_time). This returns the intermediate state of the quantiles calculation, which we can store in a separate table or later combine with other intermediate states.

  • The Merge suffix: The aggregation functions with the Merge suffix take the intermediate aggregation states produced by their corresponding State functions and merge them to produce the final aggregation result. This is useful when we have multiple intermediate states that we need to combine into a single aggregated result.

For instance, to obtain the 99th percentile of the aggregated server_process_time, we would use quantileMerge(0.99)(quantiles_server_process_time), where quantiles_server_process_time is the intermediate state resulting from the quantilesState function.
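Putting the two suffixes together, the pattern might look roughly as follows. The view and table names (latency_by_1m, requests_raw) are again illustrative; the quantile levels and the quantilesState/quantileMerge pairing follow the example above.

    -- Hypothetical sketch: store intermediate quantile states in a
    -- materialized view with 1-minute buckets...
    CREATE MATERIALIZED VIEW latency_by_1m
    ENGINE = AggregatingMergeTree()
    ORDER BY bucket
    AS SELECT
        toStartOfInterval(observed_at, INTERVAL 1 minute) AS bucket,
        quantilesState(0.95, 0.99)(server_process_time)
            AS quantiles_server_process_time
    FROM requests_raw
    GROUP BY bucket;

    -- ...and merge the states at query time to extract the final
    -- 99th percentile per bucket.
    SELECT
        bucket,
        quantileMerge(0.99)(quantiles_server_process_time) AS p99
    FROM latency_by_1m
    GROUP BY bucket
    ORDER BY bucket;

AggregatingMergeTree is used here because, unlike a plain counter, the intermediate quantile state must be combined with the aggregate-function merge logic rather than summed.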

Results

By leveraging Materialized Views and ClickHouse, we achieved the following:

  1. The database is now faster than our Golang code, which lets us use pooling and improves our UI responsiveness.
  2. Confidence in query speed, which lets us trace performance issues to components outside the database, such as tracing extensions.

The scalability of this solution depends on various factors, including the scaling setup. ClickHouse officially offers a cloud service with 720 GiB RAM, 180 vCPUs, and 10 TB of data, indicating its potential for handling significant workloads. For our part, we have easily scaled past 1B rows and are still growing.

Conclusion

While ClickHouse does have limitations, such as the absence of transactions and potential inaccuracies in Materialized Views, these drawbacks are minor for our analytics use case. Overall, ClickHouse, when paired with Materialized Views, has proven to be an excellent solution for handling large-scale analytics data with impressive performance and scalability.
