Alluxio 2.9 scales open source data orchestration


Alluxio 2.9 gives users of the open source data orchestration platform new capabilities designed to scale deployments securely across multiple environments.

The vendor, based in San Mateo, California, develops its eponymous data orchestration platform in both open source community and enterprise editions.

Alluxio enables organizations to access data from multiple locations, including on-premises and cloud environments, and then run queries for data analytics, business operations and machine learning applications.

The new version, which became generally available on Nov. 16, follows the Alluxio 2.8 update, which introduced an improved policy engine for data management in May.

The update builds on the previous version by introducing a cross-environment sync feature that makes it easier for organizations to control and update multiple Alluxio clusters. Alluxio now also has built-in multi-tenant isolation and policy controls, making it easier for different groups to share the same cluster.

Alluxio competes with a number of data orchestration technologies that help companies bring disparate data sources together, including the open source Apache Hop platform, Denodo and K2View.

“Alluxio provides a self-contained solution that connects any computing engine to any data store, anywhere,” said Kevin Petrie, an analyst at Eckerson Group. “This helps organizations run advanced analytics projects across hybrid and multi-cloud environments.”

Multi-tenant isolation can provide a boost for data orchestration

New features in Alluxio 2.9 include multi-tenant isolation, making it easier for different teams to use the same Alluxio instance.

The updates also make it easier for multiple tenants to use their own storage and compute resources while still sharing metadata, Petrie said.

This means a data science team might have a sandbox for training machine learning models, while a business intelligence team has a separate platform for managing operational dashboards, Petrie said. The two teams can isolate resources to simplify chargeback while sharing metadata to help each other.

“It’s like two football games on two fields next to each other. They stay out of each other’s way,” Petrie said. “But the umpires share metadata, which means they enforce the same rules and update each other on the score.”

Cross-cluster synchronization extends data orchestration capabilities

The new version is a significant advance in the scalability of the vendor’s platform, said Adit Madan, director of product management at Alluxio.

The cross-cluster feature marks the first time multiple Alluxio instances can be easily deployed in a way that makes them aware of each other so they can be managed together, Madan said.

Previously, a common Alluxio architecture gave each business unit within an organization its own copy of Alluxio, isolating that unit from neighboring units that might be accessing the same data lake storage. The complexity grew further as Alluxio instances were deployed across multiple cloud environments, Madan said.

With the update, each of the different running instances can be synchronized, be it for different business units of the same company in one cloud or across multiple clouds. With cross-cluster synchronization, companies get a consistent view of data regardless of the environment in which they operate.

To make multi-cloud operations easier, Alluxio is also releasing an open source Kubernetes operator. The new operator includes configuration options that help users deploy and run Alluxio in cloud-based Kubernetes environments.

Looking ahead, Alluxio plans to fill several gaps in the platform, the vendor said.

Alluxio does not currently have a software-as-a-service platform. Instead, companies deploy the technology themselves in the cloud.

Madan said there will likely be an Alluxio Cloud SaaS offering at some point. However, the vendor's next big step will be the release of Alluxio 3.0, expected in the first half of 2023, which will focus on improving the user experience for large data sets.

