Russian search giant Yandex announced this week that it has spun out its distributed columnar analytics database, ClickHouse, into its own company. New York City-based ClickHouse Inc. also received $50 million in Series A funding to fuel its business.
Moscow-based Yandex began developing the ClickHouse database in 2009, and a few years later it went live as the OLAP backend for the company's Yandex.Metrica web analytics service. The main advantage of the database was its ability to continuously process large volumes of data at scale with relatively low latency, which remains a technical challenge for companies with big data needs.
By storing data in pre-aggregated columns and using other techniques, including but not limited to compression, vectorized computation, and the ability to scale linearly, ClickHouse was able to achieve the highest levels of performance. According to Yandex, ClickHouse is capable of scanning hundreds of millions of rows (equivalent to tens of gigabytes) per second, which enables users to run SQL queries on datasets in the petabyte range with latencies of less than a second. That’s 100x to 1,000x faster than traditional databases, the company claims.
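To make the columnar idea concrete, here is a minimal sketch in plain Python. It is not ClickHouse code and does not reflect how ClickHouse is implemented. The sample events, the function names, and the use of run-length encoding as the compression step are all assumptions chosen for illustration. It shows why an aggregate over one field touches less data in a column layout, and why a sorted column compresses well.

```python
from array import array

# Hypothetical page-view events in a row-oriented layout (illustrative data only).
rows = [
    {"user_id": 1, "url": "/home", "duration_ms": 120},
    {"user_id": 2, "url": "/docs", "duration_ms": 340},
    {"user_id": 1, "url": "/docs", "duration_ms": 95},
    {"user_id": 3, "url": "/home", "duration_ms": 410},
]


def to_columns(rows):
    """Pivot rows into one contiguous sequence per column."""
    return {
        "user_id": array("q", (r["user_id"] for r in rows)),
        "url": [r["url"] for r in rows],
        "duration_ms": array("q", (r["duration_ms"] for r in rows)),
    }


def total_duration_row_store(rows):
    # Each full row is visited even though only one field is needed.
    return sum(r["duration_ms"] for r in rows)


def total_duration_column_store(columns):
    # Only the one column the query needs is scanned.
    return sum(columns["duration_ms"])


def run_length_encode(values):
    """Collapse runs of equal adjacent values into [value, count] pairs.

    A generic compression technique used here as a stand-in; it is an
    assumption for illustration, not a statement about ClickHouse's codecs.
    """
    encoded = []
    for value in values:
        if encoded and encoded[-1][0] == value:
            encoded[-1][1] += 1
        else:
            encoded.append([value, 1])
    return encoded


columns = to_columns(rows)
```

Both layouts give the same total for `duration_ms`, but the column version reads a single packed array instead of every field of every row. Sorting a column before encoding it, as in `run_length_encode(sorted(columns["url"]))`, groups equal values together so that they collapse into a few pairs.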
In a blog post, ClickHouse co-founder and CTO Alexey Milovidov, the original creator of ClickHouse, discussed the history of the database and the source of its technological advantage.
“The most notable benefit of ClickHouse is its extremely high query processing speed and data storage efficiency,” wrote Milovidov. “In previous generation data warehouses, you cannot run interactive queries without pre-aggregation; or you cannot insert new data in real time while using interactive queries; or you can’t just save all of your data. With ClickHouse you can keep all records for as long as necessary and create real-time interactive reports on the data.”
What is the secret sauce that makes ClickHouse so fast? The “Distinguishing Features” section of the ClickHouse website points to avoiding the storage of additional values alongside the data, as a “real” columnar database does, and storing data sorted by primary key as main aspects of its advantage. (It’s also refreshing to see the company admitting the downsides of its approach, including no full-fledged transactions and no support for updates, other than some batch update and delete functionality to comply with GDPR.)
According to Milovidov, there isn’t a single thing. “…[T]here is not a single ‘silver bullet’ here,” he writes. “The main advantage is the attention to detail in the most extreme production loads.”
Shortly after ClickHouse was implemented at Yandex.Metrica, it was widely adopted across Yandex, the largest Internet company in Europe with more than 14,000 employees. At that point, Milovidov said, he knew the software needed to be distributed more widely.
“Perhaps ClickHouse is too good to run just in Yandex?” he wrote on the blog. “Going open source is hard, but it’s a big win. While maintaining a popular open source product takes a tremendous amount of effort and responsibility, the benefits outweigh any costs for us.”
In 2016, Yandex released ClickHouse as an open source offering under the Apache License 2.0. This led to exponential growth and adoption by thousands of companies around the world, including Uber, Comcast, eBay, and Cisco, according to Yandex.
Some of the customer adoption stories are compelling. For example, Uber has adopted ClickHouse as its central logging platform for processing millions of logs per second from thousands of services, amounting to several petabytes of data. According to its write-up from February 2021, ClickHouse achieved a 10-fold increase in performance compared with its ELK implementation (Elastic, Logstash, Kibana).
Spotify, meanwhile, used ClickHouse to add its A/B testing program to its Google Cloud-based log management system, which replaced a 2,500-node Hadoop cluster. The company needed to be able to run hundreds of queries per second against hundreds of billions of rows per day. Among the reasons it cited for choosing ClickHouse over BigQuery were the simplicity of the architecture, a comprehensive set of built-in functions and aggregations, and the Superset integration.
Deutsche Bank chose ClickHouse as the basis for its data warehouse, which serves a variety of use cases including regulatory compliance, risk, trades, and know-your-customer initiatives. According to this presentation, the bank had tried several other databases, including KDB+, Vertica, Hive, and Spark. It has now settled on a combination of Spark, Alpakka, Kafka, Tableau, RShiny, and ClickHouse to support its queries.
“The variety of ways companies use ClickHouse is incredibly compelling and speaks volumes to the strength of the technology,” said Yury Izrailevsky, ClickHouse co-founder and president of product and engineering, who gave up his position as vice president of engineering at Google and will lead product development at ClickHouse. “By founding ClickHouse Inc. we can concentrate on making the product even more powerful, especially when it is used in cloud environments.”
Milovidov and Izrailevsky will be joined by Silicon Valley veteran Aaron Katz, who is the CEO and a co-founder of the New York company. Mike Volpi, a partner at Index Ventures, who co-led the round with Benchmark, sees something in ClickHouse that reminds him of other high-profile tech companies.
“At Index, we believed and invested in data infrastructure early on, and we’ve been fortunate to have worked with leaders like Elastic, Confluent and Datadog since the beginning,” says Volpi. “It is clear that, given its impressive adoption and community buzz, ClickHouse has an equally exciting development.”