Apache Druid Architecture and core concepts in layman’s terms

Teepika R M
Aug 2, 2022

Apache Druid is a leading database for modern analytics applications. The main features of a modern analytics application are:

  • Interactive data querying — Even with petabytes of data, the application can provide sub-second response times to queries.
  • High concurrency — Thousands of users can execute thousands of concurrent queries affordably.
  • Support for both real-time and historical data — Modern analytics applications can ingest data from real-time sources like Kinesis and Kafka as well as from batch sources like HDFS and S3.

Druid is an open-source, high-performance, real-time analytics database that is flexible, efficient, and resilient.

Fig 1. Druid Architecture. Source: https://druid.apache.org/docs/latest/design/architecture.html

In the architecture, three types of servers and three external dependencies are involved.

Servers involved

  • Master Server — manages data ingestion & availability
  • Query Server — Receives queries, executes them & returns the results
  • Data Server — Stores queryable data & executes ingestion workloads

External dependencies

  • Deep storage — It acts as a backup of the ingested data. As long as Druid processes can access the data stored in deep storage, you will not lose data no matter how many Druid nodes die. If the data in deep storage is lost, it is lost forever. Deep storage is also used to transfer data in the background between Druid processes.

Historical processes on Data Servers store queryable data. While querying, they use an in-memory cache and segments cached on local disk, offering the lowest latency possible. They do not rely on deep storage to serve query requests; deep storage is not used for active querying and serves only as a backup.

  • Metadata storage — Generally an RDBMS like MySQL or PostgreSQL is used as the metadata store; in a single-server deployment, a locally created Derby database is used instead. It stores system metadata such as segment records, rule records, configuration records, task-related tables, and audit records, not the actual data.
  • Zookeeper — It is used for internal service discovery, coordination, and leader election.

Storage Design

In Druid, data is stored in datasources, which are similar to tables in an RDBMS. Each datasource is partitioned by time and, if needed, can be further partitioned (secondary partitioning) by other attributes or dimensions. Each partitioned time range is called a “chunk”, and each chunk holds one or more segments depending on the size of the segment files. Please refer to the diagram below to understand how data is partitioned by time and stored in Druid.

Fig 2. Chunks with Segments
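To make the time partitioning concrete, here is a minimal Python sketch (not Druid code; the event fields and the hourly granularity are assumptions for illustration) that buckets events into hourly chunks, the way a datasource is first split by time before segments are built.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Illustrative events; "timestamp", "domain", and "clicked" are made-up fields.
events = [
    {"timestamp": "2011-01-01T00:15:00Z", "domain": "bieber.com", "clicked": 1},
    {"timestamp": "2011-01-01T00:45:00Z", "domain": "ke$ha.com", "clicked": 0},
    {"timestamp": "2011-01-01T01:10:00Z", "domain": "bieber.com", "clicked": 1},
]

def chunk_key(ts: str) -> str:
    """Floor the event timestamp to the hour -- the primary (time) partition."""
    t = datetime.fromisoformat(ts.replace("Z", "+00:00")).astimezone(timezone.utc)
    return t.strftime("%Y-%m-%dT%H:00Z")

chunks = defaultdict(list)
for event in events:
    chunks[chunk_key(event["timestamp"])].append(event)

for interval, rows in chunks.items():
    # In Druid, each chunk would hold one or more segment files;
    # here we only show the grouping by hour.
    print(interval, "->", len(rows), "events")
```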

Queries from clients are handled by the Broker on Query Server nodes. The Broker finds the segments relevant to a query by filtering on the timestamp and, if secondary partitioning is used, on other attributes, then distributes rewritten subqueries to the relevant Historical / Middle Manager processes on the Data Servers. Those processes return their partial results to the Broker, which merges them and returns the final result to the client.
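The segment-pruning idea can be sketched roughly as follows (again, not Druid's actual implementation; the segment metadata and node names are made up): keep only the segments whose time interval overlaps the query's interval, then fan the subqueries out to the processes serving those segments.

```python
# Hypothetical segment metadata: (datasource, interval_start, interval_end, served_by).
segments = [
    ("clicks", "2011-01-01T00:00", "2011-01-01T01:00", "historical-1"),
    ("clicks", "2011-01-01T01:00", "2011-01-01T02:00", "historical-2"),
    ("clicks", "2011-01-01T02:00", "2011-01-01T03:00", "historical-1"),
]

def overlaps(seg_start, seg_end, q_start, q_end):
    """True if the segment's interval intersects the query's interval."""
    return seg_start < q_end and q_start < seg_end

query_start, query_end = "2011-01-01T00:30", "2011-01-01T01:30"

# The "broker" keeps only the relevant segments and sends subqueries to the
# processes that serve them, then merges the partial results (not shown).
relevant = [s for s in segments if overlaps(s[1], s[2], query_start, query_end)]
for datasource, start, end, node in relevant:
    print(f"send subquery for {datasource} [{start}, {end}) to {node}")
```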

Creating & Publishing segments flow:

First, each segment is created by a Middle Manager process as mutable and uncommitted. Data in a segment is stored column-oriented, which helps retrieve only the required columns while querying. Each column’s storage is further optimized by compressing the data according to its type: string columns are compressed using dictionary encodings, while numeric columns are stored as compressed raw values. Segment data is queryable even while it is still uncommitted. The segments are then prepared by indexing, creating dictionary encodings, and compressing for efficient storage and faster scans and filters. Once prepared, they become committed and are published to deep storage, and entries are made in the metadata store so the cluster knows what data is available. Segments are versioned in order to support batch-mode overwriting: whenever data is overwritten, a new set of segments is created with the same datasource and time interval but a higher version number.
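A minimal sketch of the versioning rule (not Druid code; the datasource name and intervals are invented): for each datasource and time interval, only the highest-versioned segments are treated as queryable, so a batch overwrite shadows the older data.

```python
# Hypothetical segment records, as the metadata store might describe them.
segments = [
    {"datasource": "clicks", "interval": "2011-01-01T00/PT1H", "version": 1},
    {"datasource": "clicks", "interval": "2011-01-01T01/PT1H", "version": 1},
    # A batch re-ingestion of the first hour produces a higher-versioned segment.
    {"datasource": "clicks", "interval": "2011-01-01T00/PT1H", "version": 2},
]

# Keep only the latest version per (datasource, interval).
latest = {}
for seg in segments:
    key = (seg["datasource"], seg["interval"])
    if key not in latest or seg["version"] > latest[key]["version"]:
        latest[key] = seg

for seg in latest.values():
    print(seg)  # only version 2 survives for the overwritten hour
```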

When data is ingested into Druid, it is automatically indexed, partitioned by time, optionally sub-partitioned (secondary partitioning) by other attributes, and optionally pre-aggregated. The indexes created while storing segments enable faster scan and filter operations, and pre-aggregation (roll-up) reduces storage cost. The concepts of search indexes and roll-up are explained below with examples.
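For orientation, below is an abridged sketch of what a native batch ingestion spec can look like, written as a Python dict (it is submitted to Druid as JSON). The field names follow the Druid documentation but may differ across versions, and the datasource, dimensions, and metrics are invented for this example.

```python
# Abridged, illustrative ingestion spec; ioConfig (input location/format) and
# tuningConfig are omitted for brevity.
ingestion_spec = {
    "type": "index_parallel",
    "spec": {
        "dataSchema": {
            "dataSource": "clicks",
            "timestampSpec": {"column": "timestamp", "format": "iso"},
            # Dimensions used for filtering and secondary partitioning.
            "dimensionsSpec": {"dimensions": ["domain", "gender"]},
            # Pre-aggregations computed at ingestion time when roll-up is enabled.
            "metricsSpec": [
                {"type": "count", "name": "count"},
                {"type": "longSum", "name": "clicks", "fieldName": "clicked"},
            ],
            "granularitySpec": {
                "segmentGranularity": "hour",  # primary (time) partitioning
                "queryGranularity": "hour",    # timestamps floored to the hour
                "rollup": True,                # enable pre-aggregation
            },
        },
    },
}
```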

Search Indexes:

Druid uses inverted indexes (in particular, compressed bitmaps) for fast searching and filtering. As mentioned earlier, each chunk contains one or more segments. Within segment files, data is stored partitioned by time. Each segment file has an internal structure with separate data structures laid out for each column.

Fig 3. Internal structure of Data

Timestamp and metric column values are compressed with LZ4. Metric columns usually hold numeric values and are used in aggregations and computations. Dimension columns are different since they support filter and group-by operations. Each dimension column is stored with the help of three data structures.

Shown below are the three data structures for the column “Page” under Dimensions in Fig 3.

  1. A dictionary that maps values to IDs
Dictionary that encodes column values 
{ “Justin Bieber”: 0, “Ke$ha”: 1 }

2. A list of the column’s values, encoded using the dictionary above (each row stores the ID of its value).

Column data [0, 0, 1, 1]

3. A bitmap for each distinct column value, indicating which rows contain that value.

Bitmaps — one for each unique value of the column 
value=”Justin Bieber”: [1,1,0,0]
value=”Ke$ha”: [0,0,1,1]

These bitmaps are inverted indexes that enable faster querying and, through compression, reduced storage space.

For filter queries with conditions like AND/OR/NOT, answers can be retrieved with the help of data structures #1 and #3:

#1 to look up the encoded IDs of the filter’s column values.
#3 to fetch the rows that contain those values (combining bitmaps with bitwise AND/OR/NOT).

For group-by queries with filters, answers can be retrieved with the help of data structures #1, #2, and #3:

#1 to look up the encoded IDs of the filter’s column values.
#3 to fetch the rows that contain those values.
#2 to group those rows by their (dictionary-encoded) column values.

With the help of these indexes, Druid gives faster results for filter/group-by operations by accessing only the relevant rows & columns.
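A toy Python model of these three structures, using the values from the example above (an illustration of the idea, not Druid's implementation):

```python
dictionary = {"Justin Bieber": 0, "Ke$ha": 1}   # 1. value -> encoded ID
column_data = [0, 0, 1, 1]                      # 2. dictionary-encoded row values
bitmaps = {                                     # 3. one bitmap per distinct value
    "Justin Bieber": [1, 1, 0, 0],
    "Ke$ha":         [0, 0, 1, 1],
}

# Filter: page = "Justin Bieber" OR page = "Ke$ha" -- a bitwise OR of the bitmaps.
rows = [a | b for a, b in zip(bitmaps["Justin Bieber"], bitmaps["Ke$ha"])]
matching_rows = [i for i, bit in enumerate(rows) if bit]   # -> [0, 1, 2, 3]

# Group-by: count the matching rows per page value via the encoded column data.
reverse_dictionary = {v: k for k, v in dictionary.items()}
counts = {}
for i in matching_rows:
    value = reverse_dictionary[column_data[i]]
    counts[value] = counts.get(value, 0) + 1

print(counts)  # {'Justin Bieber': 2, 'Ke$ha': 2}
```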

Roll-ups:

Roll-up is a feature that can be enabled or disabled during data ingestion to dramatically reduce the size of the data that needs to be stored. This in turn saves storage resources and speeds up queries.

In many cases, individual events are not needed for analysis because trillions of them may be generated; summarizing such data gives better insights than the individual events. This summarization is made possible in Druid by a process called “roll-up”. An example makes it clearer. Consider the following case, where each event has a timestamp, domain, gender, and whether a click occurred. The left table shows individual events recorded from Jan 1st 2011, 00:00:00 to Jan 1st 2011, 01:00:00. These events, recorded at the second level, do not add much value for analytics on their own, so roll-up can be enabled during ingestion to summarize them at the hour level. The right table holds the summarized data, showing the number of clicks per gender and domain. With roll-up enabled, however, we lose the individual events and they can no longer be queried: the roll-up granularity is the minimum granularity at which you can explore the data, and event timestamps are floored to it. (A minimal sketch of this aggregation follows the table below.)

Left Table — Individual Events. Right Table — Summarized Events. Source: Interactive Exploratory Analytics with Druid | DataEngConf SF ’17.
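Below is a minimal Python sketch of the roll-up idea under stated assumptions (hour-level granularity; made-up events loosely modeled on the left table): events are grouped by the floored timestamp plus the dimensions, and only the pre-aggregated rows are kept.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Illustrative raw events: (timestamp, domain, gender, clicked).
events = [
    ("2011-01-01T00:01:35Z", "bieber.com", "male",   1),
    ("2011-01-01T00:03:03Z", "bieber.com", "male",   1),
    ("2011-01-01T00:04:51Z", "ke$ha.com",  "female", 1),
    ("2011-01-01T00:05:33Z", "ke$ha.com",  "female", 0),
]

def floor_to_hour(ts: str) -> str:
    """Roll-up floors each timestamp to the chosen granularity (here: hour)."""
    t = datetime.fromisoformat(ts.replace("Z", "+00:00")).astimezone(timezone.utc)
    return t.strftime("%Y-%m-%dT%H:00:00Z")

# Group by (floored timestamp, domain, gender) and pre-aggregate the metrics.
rollup = defaultdict(lambda: {"rows": 0, "clicks": 0})
for ts, domain, gender, clicked in events:
    key = (floor_to_hour(ts), domain, gender)
    rollup[key]["rows"] += 1
    rollup[key]["clicks"] += clicked

for key, agg in rollup.items():
    # Only these summarized rows are stored; the individual events are gone.
    print(key, agg)
```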

Druid supports a wide range of programming languages, including Python, R, JavaScript, Clojure, Elixir, Ruby, PHP, Scala, Java, .NET, and Rust. From any of these languages, queries can be fired either as SQL or as native JSON-over-HTTP. Data stored in Druid can be visualized using tools like Apache Superset, Imply Pivot, or Tableau.
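As an illustration, here is a minimal Python example that posts a SQL query to Druid's /druid/v2/sql endpoint with the requests library. The host and port (a Router or Broker assumed at localhost:8888) and the clicks datasource are assumptions carried over from the earlier sketches.

```python
import requests

# SQL-over-HTTP: POST a JSON body containing the query to /druid/v2/sql.
query = {
    "query": """
        SELECT domain, SUM(clicks) AS total_clicks
        FROM clicks
        WHERE __time >= TIMESTAMP '2011-01-01 00:00:00'
        GROUP BY domain
        ORDER BY total_clicks DESC
    """
}

resp = requests.post("http://localhost:8888/druid/v2/sql", json=query, timeout=30)
resp.raise_for_status()

# By default the endpoint returns a JSON array of result rows.
for row in resp.json():
    print(row["domain"], row["total_clicks"])
```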

Conclusion:

This post covered Druid’s architecture components and core concepts: the storage design, how data is ingested and published, how data is retrieved at query time, the indexes that support fast slicing and dicing, and roll-ups that reduce storage cost, all illustrated with examples.

