According to Katana Graph, large graph workloads require large cloud hardware


According to Gartner, graph technologies will be used in 80% of data and analytics innovations by 2025, a significant increase from 10% in 2021. One of the companies hoping to capture a piece of this booming market is Katana Graph, which is making its mark by developing a graph database platform that can leverage advances in distributed hardware to handle large graph workloads.

Katana Graph was co-founded in 2020 by two computer science professors at the University of Texas at Austin: CTO Chris Rossbach and CEO Keshav Pingali. Rossbach, previously a member of the VMware Research Group, has focused his academic research on areas such as virtualization, accelerators, and parallel architectures. Pingali, according to his CV, specializes in parallel programming and distributed computing.

Though the Austin-based company is fairly young, the technology underlying Katana Graph’s property graph database has its roots in its co-founders’ research going back decades, says Farshid Sabet, the company’s chief business officer.

“The value of the company is when the data is bigger, when you need to do very deep analysis. As you go through the nodes and jump deeper, the computational intensity grows exponentially,” says Sabet.
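
Sabet’s point about depth can be made concrete with a small experiment. The minimal sketch below counts how many new vertices a breadth-first traversal reaches at each hop. It assumes a synthetic tree in which every vertex has the same fan-out; the function names are hypothetical and this is not Katana Graph code. In real graphs, overlapping neighborhoods slow the growth, while very high-degree vertices can make it steeper.

```python
def build_tree(fanout: int, depth: int) -> dict[int, list[int]]:
    """Build a synthetic tree in which every internal vertex has `fanout` children."""
    adj: dict[int, list[int]] = {0: []}
    frontier = [0]
    next_id = 1
    for _ in range(depth):
        new_frontier = []
        for v in frontier:
            for _ in range(fanout):
                adj[v].append(next_id)
                adj[next_id] = []
                new_frontier.append(next_id)
                next_id += 1
        frontier = new_frontier
    return adj


def vertices_per_hop(adj: dict[int, list[int]], source: int, max_hops: int) -> list[int]:
    """Breadth-first search that records how many new vertices appear at each hop."""
    seen = {source}
    frontier = [source]
    counts = []
    for _ in range(max_hops):
        nxt = []
        for v in frontier:
            for u in adj.get(v, []):
                if u not in seen:
                    seen.add(u)
                    nxt.append(u)
        counts.append(len(nxt))
        frontier = nxt
    return counts


graph = build_tree(fanout=10, depth=4)
print(vertices_per_hop(graph, source=0, max_hops=4))  # [10, 100, 1000, 10000]
```

With a fan-out of 10, each additional hop multiplies the work by roughly ten, which is the kind of growth Sabet describes.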

Distributed Graphs

The Katana Graph distributed parallel computing framework consists of three parts: a streaming partitioner, a graph compute engine, and a communication engine. The partitioner distributes the data across the nodes of the cluster, while the compute engine orchestrates and schedules the work across those nodes. The communication engine, meanwhile, handles the exchange of data between the nodes so they can do their jobs efficiently.
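
The article does not say which partitioning policy Katana Graph’s streaming partitioner uses. As an illustration of the general idea only, the sketch below implements one simple streaming policy, an outgoing edge-cut: each edge is assigned, as it arrives, to the host that owns its source vertex (assumed here to be `vertex % num_hosts`). Hosts that store an edge whose destination they do not own must keep an extra copy ("mirror") of that vertex, and the average number of copies per vertex (the replication factor) is a rough measure of how much the communication layer will have to synchronize. The function name and ownership rule are hypothetical.

```python
from collections import defaultdict


def stream_partition(edges, num_hosts):
    """Assign each streamed (src, dst) edge to the host that owns its source vertex.

    Hypothetical ownership rule: vertex v is owned by host v % num_hosts.
    Returns (edges stored per host, replication factor).
    """
    host_edges = defaultdict(list)  # host -> edges stored on that host
    proxies = defaultdict(set)      # vertex -> hosts holding a copy of it
    for src, dst in edges:          # edges are processed one at a time, as in a stream
        host = src % num_hosts
        host_edges[host].append((src, dst))
        proxies[src].add(host)                 # source is local to its owner
        proxies[dst].add(dst % num_hosts)      # master copy of the destination
        proxies[dst].add(host)                 # mirror copy if host is not the owner
    replication = sum(len(h) for h in proxies.values()) / len(proxies)
    return dict(host_edges), replication


parts, rf = stream_partition([(0, 2), (2, 4), (1, 3), (3, 5), (0, 1)], num_hosts=2)
print(parts)         # {0: [(0, 2), (2, 4), (0, 1)], 1: [(1, 3), (3, 5)]}
print(round(rf, 3))  # 1.167
```

Only edge (0, 1) crosses hosts, so only vertex 1 needs a mirror. Smarter partitioners try to keep this replication factor low as graphs grow.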

Katana Graph uses multiple engines for graph data (Image source: Katana Graph)

The company is taking a fresh look at the problem of how best to build a distributed graph database, says Sabet, who worked at Movidius and Intel before joining Katana Graph. This approach allows Katana Graph to operate at a scale and speed unmatched by its graph competitors, he claims.

“A lot of people take the easy [approach] in terms of graph partitioning,” Sabet tells Datanami. “But as graph sizes get larger and new use cases come in, some of those assumptions aren’t accurate.”

The company’s core IP resides in the graph communication element of the framework, Sabet says. Advances at this level enable Katana Graph to run very large graph workloads at high speed. They also allow the platform to run different workloads concurrently in a dataflow style, similar to how Databricks works, he says.
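
The article does not describe how Katana Graph’s communication engine works internally. To show the kind of problem such a layer solves, the sketch below implements one generic pattern from distributed graph analytics: each vertex has a master copy on its owning host and possibly mirror copies elsewhere, and after a round of local computation the mirrors’ values are reduced into the master (here with `min`, as in shortest-path algorithms) and the result is broadcast back. The function name and data layout are hypothetical, and this is not presented as Katana Graph’s implementation.

```python
def sync_min(local_values, owner):
    """Reduce-then-broadcast synchronization of replicated vertex values.

    local_values: host -> {vertex: value computed locally on that host}
    owner:        vertex -> host holding the vertex's master copy
    Returns (synced values per host, number of inter-host messages).
    """
    master = {}
    messages = 0
    # Reduce: every copy's value is combined at the master, keeping the minimum
    # (e.g. the shortest distance found so far). Mirrors must send a message.
    for host, values in local_values.items():
        for v, val in values.items():
            if host != owner[v]:
                messages += 1
            master[v] = min(master.get(v, val), val)
    # Broadcast: the master pushes the reduced value back to every mirror.
    synced = {}
    for host, values in local_values.items():
        synced[host] = {}
        for v in values:
            if host != owner[v]:
                messages += 1
            synced[host][v] = master[v]
    return synced, messages


owner = {0: 0, 1: 1, 2: 0}
local = {0: {0: 0, 1: 5, 2: 3}, 1: {1: 2, 2: 1}}
synced, msgs = sync_min(local, owner)
print(synced)  # {0: {0: 0, 1: 2, 2: 1}, 1: {1: 2, 2: 1}}
print(msgs)    # 4
```

Each mirror costs two messages per round in this scheme, which is why partitioning quality and communication efficiency are so closely linked at scale.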

Katana Graph provides four ways to work with graph data: Graph Queries (contextual search); Graph Analytics (pathfinding, centrality, and community detection); Graph Mining (pattern recognition); and Graph AI (prediction).
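
For readers new to the analytics category, the sketch below illustrates two of the operations it names, pathfinding and centrality, on a tiny in-memory graph. It uses plain Python with hypothetical function names; it is not Katana Graph’s API, which the article does not describe, and a distributed platform would run such algorithms in parallel across partitions.

```python
from collections import deque


def shortest_path(adj, source, target):
    """Pathfinding: unweighted shortest path found with breadth-first search."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        if v == target:
            # Walk the parent pointers back to the source, then reverse.
            path = []
            while v is not None:
                path.append(v)
                v = parent[v]
            return path[::-1]
        for u in adj.get(v, []):
            if u not in parent:
                parent[u] = v
                queue.append(u)
    return None  # target is unreachable from source


def degree_centrality(adj):
    """Centrality: each vertex's degree divided by the maximum possible (n - 1)."""
    n = len(adj)
    return {v: len(nbrs) / (n - 1) for v, nbrs in adj.items()}


# A small undirected graph stored as adjacency lists.
adj = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["a", "d"],
    "d": ["b", "c", "e"],
    "e": ["d"],
}
print(shortest_path(adj, "a", "e"))   # ['a', 'b', 'd', 'e']
print(degree_centrality(adj)["d"])    # 0.75
```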

Developers can program workflows in Katana Graph using Cypher, a query language originally developed by Neo4j and later open-sourced. Many graph database vendors support Cypher. According to Sabet, Katana Graph also supports Python and C++.

Hardware Boost

Katana Graph can leverage different types of hardware, including CPUs, GPUs, FPGAs, and ARM chips. The software also supports Intel’s Optane memory and accelerators. But it’s the distributed nature of Katana Graph that sets it apart, says Sabet.

Distributed memory communication is an important factor in the efficiency of scale-out graph data environments, says Katana Graph (Gorodenkoff/Shutterstock)

“We’ve done a lot of work over the past nine years… to be able to use distributed memories, even memories of different types,” says Sabet. “Most of these [graph] environments only run on one CPU and its memory. Nvidia has something that runs on a GPU in one machine. If you want to combine this together [for scalability], the only game in town is not only supporting multiple kinds of hardware, but also supporting distributed hardware that addresses the graph uniformly.”

The core technologies underlying Katana Graph were originally developed and tested on high-performance computing (HPC) infrastructure at UT Austin, according to Sabet. These machines had huge storage capacities, which were very expensive ten years ago but necessary to solve sophisticated scientific and engineering problems.

As storage costs have fallen, particularly in public cloud environments, commercial users have gained new opportunities to run analytics and AI workloads that were previously unaffordable. That works in favor of Katana Graph, which has been shown to scale to 256 cluster nodes and to graphs with more than 3.5 billion vertices and 128 billion edges (it’s designed to scale beyond 1 trillion edges, the company says).
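
A little arithmetic shows why graphs of this size call for distributed memory. The back-of-the-envelope sketch below divides the quoted figures across 256 machines and estimates the size of the bare graph topology. Its assumptions are not from Katana Graph: perfectly balanced partitions, no mirrored vertices, no vertex or edge properties, and a compressed sparse row (CSR) layout with one 8-byte destination ID per edge plus one 8-byte offset per vertex.

```python
# Back-of-the-envelope sizing for the scale figures quoted above.
# Assumptions (not from Katana Graph): balanced partitions, no mirrors,
# no properties, CSR layout with 8-byte destination IDs and 8-byte offsets.
VERTICES = 3_500_000_000
EDGES = 128_000_000_000
HOSTS = 256
BYTES_PER_ID = 8

edges_per_host = EDGES // HOSTS
vertices_per_host = VERTICES / HOSTS
# CSR stores one destination ID per edge and (VERTICES + 1) row offsets.
csr_bytes_total = EDGES * BYTES_PER_ID + (VERTICES + 1) * BYTES_PER_ID
csr_bytes_per_host = csr_bytes_total / HOSTS

print(f"edges per host:    {edges_per_host:,}")                 # 500,000,000
print(f"vertices per host: {vertices_per_host:,.0f}")           # 13,671,875
print(f"CSR total:         {csr_bytes_total / 1e12:.2f} TB")    # 1.05 TB
print(f"CSR per host:      {csr_bytes_per_host / 1e9:.2f} GB")  # 4.11 GB
```

Under these assumptions the topology alone is roughly a terabyte before any properties or mirrors are added, so spreading it across the memory of many machines is what makes in-memory traversal practical.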

“Graph is really computationally and memory intensive,” says Sabet. “The supercomputers of 10 years ago, 12 years ago are the servers we have today. That is why the company does very well here.”

A dozen years ago, many developers were looking for a way to fit their applications onto a single CPU using as little memory as possible. “That was the right decision 12 years ago,” says Sabet. “But these guys [Rossbach and Pingali] did not have this limitation. They thought about what we need to be able to solve this problem.”

Growth of GNNs

One of the benefits of Katana Graph is that developers can bring machine learning and AI models they’ve already built with frameworks like XGBoost and PyTorch into the Katana Graph platform, says Sabet.

“We can combine all of these without you having to change anything or re-mod the algorithm. You use these existing frameworks and libraries and add [your] machine learning,” he says. “You want to make sure developers are comfortable with the environments they have.”

Graph neural networks (GNNs) combine the power of deep learning and graph databases and are currently an area of particular interest. Instead of training a convolutional or recurrent neural network to identify patterns in an image or in a sequence of words, developers can train GNNs to recognize and exploit patterns in the connectivity of the data elements that make up the graph.
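
Learning from connectivity is commonly implemented as message passing: each vertex gathers its neighbors’ feature vectors, aggregates them, and transforms the result with weights shared across the whole graph. The minimal pure-Python sketch below shows one such layer in the style of a graph convolutional network (mean over the vertex and its neighbors, a shared linear map, then ReLU). The weights are hand-picked rather than trained, the names are hypothetical, and this is not Katana Graph’s GNN API, which the article does not describe; frameworks such as PyTorch perform the same steps on tensors and learn the weights.

```python
def relu(x: float) -> float:
    return max(0.0, x)


def gnn_layer(adj, features, weights):
    """One message-passing layer: mean-aggregate each vertex's neighborhood
    (including the vertex itself), then apply a shared linear map and a ReLU.

    adj:      vertex -> list of neighbor vertices
    features: vertex -> feature vector (list of floats of length d_in)
    weights:  d_out x d_in matrix given as a list of rows
    """
    out = {}
    for v, nbrs in adj.items():
        group = [v] + list(nbrs)
        d_in = len(features[v])
        # Aggregate: average the feature vectors of v and its neighbors.
        agg = [sum(features[u][i] for u in group) / len(group) for i in range(d_in)]
        # Update: the same weight matrix is applied at every vertex.
        out[v] = [relu(sum(w * a for w, a in zip(row, agg))) for row in weights]
    return out


adj = {0: [1], 1: [0, 2], 2: [1]}
features = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
weights = [[1.0, 1.0], [1.0, -1.0]]
h = gnn_layer(adj, features, weights)
print(h[0], h[2])  # [1.0, 0.0] [1.5, 0.0]
```

Stacking several such layers lets information flow across multiple hops, so a vertex’s output reflects the structure of its wider neighborhood.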

The accuracy, performance, and cost advantages of GNNs are winning them a lot of supporters now, he says. For example, a biomedical researcher could use GNNs running in Katana Graph to identify new proteins, which can be represented as complex collections of molecules in a graph database. “You train it to look for this group of proteins,” says Sabet.

In addition to biomedical researchers, Katana Graph has attracted interest from the financial services space. Fraud detection is a classic use case for graph databases, and Katana Graph has its share of those customers and prospects, says Sabet.

“There are many fraud detection technologies out there. But this one can predict the fraud that could happen with a higher accuracy,” he says. “They want the updated version of machine learning algorithms like XGBoost and other techniques.” GNNs provide that updated version, he says.

Katana Graph’s third area of focus is cybersecurity. With so many cyber signals floating around the internet, graph analytics offers a powerful tool to help the good guys connect the dots and keep the bad guys on their toes. The company began working with DARPA in part to bring those signals together, Sabet says.
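
One simple way to frame "connecting the dots" as a graph problem is to link security signals that share an indicator (an IP address, a domain, an account) and then find the connected components. The sketch below does this with a union-find structure. The signal data and function name are hypothetical illustrations; the article does not describe the methods Katana Graph or DARPA use.

```python
def connected_groups(signals):
    """Group security signals that share any indicator (IP, domain, account, ...).

    signals: signal id -> set of indicator strings
    Returns a list of groups (sets of signal ids) found with union-find.
    """
    parent = {s: s for s in signals}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    first_seen = {}  # indicator -> first signal that mentioned it
    for sig, indicators in signals.items():
        for ind in indicators:
            if ind in first_seen:
                union(first_seen[ind], sig)
            else:
                first_seen[ind] = sig

    groups = {}
    for sig in signals:
        groups.setdefault(find(sig), set()).add(sig)
    return list(groups.values())


signals = {
    "alert1": {"10.0.0.5", "evil.example"},
    "alert2": {"evil.example"},
    "alert3": {"10.0.0.5", "user:bob"},
    "alert4": {"203.0.113.9"},
}
print(sorted(sorted(g) for g in connected_groups(signals)))
# [['alert1', 'alert2', 'alert3'], ['alert4']]
```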

Katana Graph has a handful of paying clients and an active pipeline for many more. The company closed a $28.5 million Series A funding round in 2021. As a result, Sabet says the company has grown from fewer than 20 employees to almost 100 over the course of a year.

“We have experts from different fields [joining the company],” he says. “Most of the employees are on the technical side, but the business side has also grown. We’ve been able to hire very capable people from our competitors [like] TigerGraph, Neo, Google and Microsoft.”

The company’s software is only available in the cloud at this time, and it plans to launch a managed cloud offering soon.
