8 CNCF projects for cloud-native persistent storage


Data persistence is a bit counterintuitive when it comes to containerization. A container is an ephemeral computing environment, so nothing written inside it is meant to last forever. But you still need to store data somewhere on a physical disk!

The highly volatile nature of containers conflicts with the need for stateful storage, a dilemma that has led to countless workarounds. To enable a stateful Kubernetes approach, teams typically have to rely on external tools and databases to store and transfer this data.

The Cloud Native Computing Foundation (CNCF) has emerged as a premier host of open-source technologies to support our cloud-infused world. And when it comes to persistent data storage, this is no exception – CNCF has a wide range of tools that integrate with Kubernetes to manage the administrative tasks of working with persistent storage volumes. Below we will review some of these tools hosted by CNCF. These packages range from providing cloud-native storage, to offering a standard interface between client applications and storage, to providing data backup and recovery options. Let’s dive in.

1. Rook

Storage orchestration for Kubernetes

Website | GitHub

Persistent storage systems require a lot of maintenance to keep running. Rook is a cloud-native, open-source storage utility for Kubernetes that aims to automate some of a storage administrator’s tasks, such as storage provisioning, migration, disaster recovery, monitoring, and resource management. Rook supports file, block, and object storage types. As its introductory video shows, Rook builds on the underlying architecture of Kubernetes through custom K8s operators. As of 2022, Rook, a graduated CNCF project, supports three storage providers: Ceph, Cassandra, and NFS. Developers can visit the Rook forum to stay up to date on the project and ask questions.

2. Longhorn

Cloud-native distributed storage built on and for Kubernetes

GitHub | website

Longhorn is an open-source distributed block storage tool for Kubernetes. With Longhorn, you can replicate storage across Kubernetes clusters and benefit from built-in incremental backups of persistent volumes. You can make these snapshots recurring and back them up to secondary object storage. According to the documentation, this works by “partitioning a large block storage controller into a number of smaller storage controllers” and thus helps alleviate the problems associated with storing data for many container-based microservices. Longhorn is also compatible with K8s clusters that are not hosted in the cloud and has an elegant graphical management interface that is free to use. Similar to Rook, it is Kubernetes native. Originally developed by Rancher, Longhorn is now an incubating project within the CNCF.
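To make the idea of an incremental backup concrete, here is a minimal Python sketch. It is not Longhorn’s implementation: the block size, the function names, and the use of SHA-256 to compare blocks are all assumptions chosen for illustration. The point is only that, after the first full backup, a backup needs to ship just the blocks that changed since the previous snapshot.

```python
import hashlib

BLOCK_SIZE = 4  # hypothetical, tiny on purpose so the demo is easy to follow


def block_hashes(data, block_size=BLOCK_SIZE):
    """Split a volume image into fixed-size blocks and hash each one."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def changed_blocks(previous, current, block_size=BLOCK_SIZE):
    """Return the indexes of blocks that are new or differ from the previous snapshot."""
    old = block_hashes(previous, block_size)
    new = block_hashes(current, block_size)
    return [i for i, digest in enumerate(new) if i >= len(old) or old[i] != digest]


snapshot_1 = b"aaaabbbbcccc"
snapshot_2 = b"aaaaBBBBccccdddd"  # block 1 rewritten, block 3 appended

to_upload = changed_blocks(snapshot_1, snapshot_2)
print(to_upload)  # [1, 3]
```

Only two of the four blocks would need to be sent to secondary storage in this toy case; the unchanged blocks are already covered by the earlier backup.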

3. CubeFS

Cloud-native distributed file system and object storage.

GitHub | website

CubeFS, formerly known as ChubaoFS (CFS), is a distributed file system designed to support large-scale cloud-native architectures. One study found that CFS is about three times faster than Ceph. CubeFS works by having client applications hosted in a container cluster talk to volumes, which communicate with a metadata subsystem and a data subsystem. These volumes can be mounted in different containers to allow concurrent file sharing between many different clients. CubeFS’ underlying metadata subsystem is itself distributed to increase performance and scalability. CubeFS could be used as a general-purpose storage engine for multi-tenant access or to ensure consistency across replicas of the same file. In particular, as the documentation notes, a distributed file system like CubeFS could support the creation of machine learning models. At the time of writing, CubeFS is an incubating project within the CNCF.

4. K8up

Kubernetes and OpenShift backup operator

GitHub | website

K8up, affectionately called “ketchup” by its developers, is a Kubernetes operator for performing backups. Conveniently distributed as a Helm chart, K8up is easy to deploy and customize for specific cloud-native backup use cases. K8up can be used to automatically back up all PersistentVolumeClaims (PVCs) marked as ReadWriteMany or carrying a custom label. You can also use K8up to initiate on-demand backups, schedule routine backups, schedule long-term archiving, and view and manage backups. K8up works with S3-compatible storage. At the time of writing, K8up is a sandbox project with the CNCF.
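The selection rule described above (back up a PVC if it is ReadWriteMany or carries a custom label) can be sketched in a few lines of Python. This is an illustration of the rule only, not K8up’s code or API: the function name and the label key `example.com/backup` are hypothetical, and the PVCs are plain dictionaries shaped like Kubernetes manifests.

```python
def should_back_up(pvc, label_key="example.com/backup"):
    """Return True if the PVC is ReadWriteMany or carries the (hypothetical) opt-in label."""
    access_modes = pvc.get("spec", {}).get("accessModes", [])
    labels = pvc.get("metadata", {}).get("labels") or {}
    return "ReadWriteMany" in access_modes or labels.get(label_key) == "true"


pvcs = [
    {"metadata": {"name": "shared-data"},
     "spec": {"accessModes": ["ReadWriteMany"]}},
    {"metadata": {"name": "scratch"},
     "spec": {"accessModes": ["ReadWriteOnce"]}},
    {"metadata": {"name": "db", "labels": {"example.com/backup": "true"}},
     "spec": {"accessModes": ["ReadWriteOnce"]}},
]

selected = [pvc["metadata"]["name"] for pvc in pvcs if should_back_up(pvc)]
print(selected)  # ['shared-data', 'db']
```

Consult the K8up documentation for the real mechanism and names it uses to mark volumes for backup.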

5. OpenEBS

Open Source Container Attached Storage (CAS)

GitHub | website

OpenEBS is another open-source project that aims to simplify the process of managing stateful workloads with cloud-native infrastructure. With OpenEBS, developers can use familiar K8s commands and APIs to control the storage of workloads for specific containers. The storage software itself is containerized and orchestrated by Kubernetes. The project calls this structure Container Attached Storage (CAS). Originally created and sponsored by MayaData, OpenEBS is a sandbox project at CNCF at the time of writing.


6. ORAS

OCI Registry As Storage

GitHub | website

This one requires an explanation as it’s a bit more nuanced. You’re probably familiar with the Open Container Initiative (OCI), the group that sets industry-standard formats for containers. One such format is the distribution specification, which defines a standard way of storing, processing, and retrieving container images. Developers have since started using OCI registries to store things other than container images as well, so OCI artifacts were created to define these arbitrary content types. Finally, OCI Registry As Storage (ORAS) is a utility that specifically helps with pushing and pulling these generic OCI artifacts to and from OCI registries. So far, ORAS has seen little adoption: the documentation only mentions the Singularity and Helm projects as current implementations. ORAS is a sandbox project with the CNCF.
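To give a feel for what a generic OCI artifact looks like, the following Python sketch builds a manifest for an arbitrary file using only the standard library. It does not call ORAS or talk to a registry. The manifest fields follow the OCI image manifest layout as best understood here; the artifact type, the layer media type, and the file contents are made-up examples.

```python
import hashlib
import json


def descriptor(media_type, blob):
    """Describe a blob the way OCI manifests do: media type, digest, and size."""
    return {
        "mediaType": media_type,
        "digest": "sha256:" + hashlib.sha256(blob).hexdigest(),
        "size": len(blob),
    }


def artifact_manifest(artifact_type, layers):
    """Build a manifest whose layers are arbitrary files rather than image layers."""
    return {
        "schemaVersion": 2,
        "mediaType": "application/vnd.oci.image.manifest.v1+json",
        "artifactType": artifact_type,
        # Artifacts that need no real config conventionally point at an empty JSON object.
        "config": descriptor("application/vnd.oci.empty.v1+json", b"{}"),
        "layers": [descriptor(media_type, blob) for media_type, blob in layers],
    }


# Hypothetical artifact: a small settings file stored next to container images.
settings = b"replicas: 3\n"
manifest = artifact_manifest(
    "application/vnd.example.settings.v1",        # assumed artifact type
    [("application/vnd.example.settings.layer.v1+yaml", settings)],  # assumed media type
)
print(json.dumps(manifest, indent=2))
```

A tool such as ORAS takes care of uploading the blobs and a manifest like this one to the registry, so users do not have to assemble them by hand.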

7. Piraeus Data Store

Highly available data store for Kubernetes

GitHub | website

Piraeus is an open-source, cloud-native storage system designed to work with local Kubernetes persistent volumes. The utility provides features such as dynamic provisioning, resource management, and high availability, and enables a failover process for stateful workloads. Piraeus is fairly easy to use compared to others on this list, and it only takes a few commands to get started. Piraeus is a good option if your project only works with local storage. At the time of writing, Piraeus is a sandbox project with the CNCF.

8. Vineyard

An immutable in-memory data manager

GitHub | website

Vineyard (v6d) is unique on this list in that it focuses on in-memory data storage. Vineyard is suitable for big data systems because it uses zero-copy data sharing to reduce redundant processing. It provides an abstract way to work with multiple computational frameworks that may use graph databases. At the time of writing, Vineyard is a CNCF sandbox project.
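Zero-copy sharing is easier to picture with a small example. The Python sketch below uses the built-in `memoryview` type, not Vineyard’s API, to show the difference between handing a consumer a view onto existing memory and handing it a copy. The buffer is mutated here only to prove that the view shares memory; Vineyard’s own objects are described as immutable.

```python
# A producer holds some bytes in memory.
buffer = bytearray(b"0123456789")

# Zero-copy: the view refers to the producer's memory, no bytes are duplicated.
window = memoryview(buffer)[2:6]

# Copying: slicing a bytearray and converting it allocates a second, independent buffer.
copied = bytes(buffer[2:6])

# Change the underlying memory to show which consumer shares it.
buffer[2] = ord("X")

shared_result = window.tobytes()
print(shared_result)  # b'X345'  (the view sees the change)
print(copied)         # b'2345'  (the copy does not)
```

For large datasets passed between several processing frameworks, avoiding the second allocation and the copy itself is where the savings come from.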

To implement persistent storage in Kubernetes, you need to define a persistent volume, of which there are many classes for different storage types. For example, you could use local storage and point to a specific folder on the host running Kubernetes, but this is not always a best practice, as you often need to share storage across nodes. Running an NFS server is an option, but most use cases will want cloud storage backing the persistent volume.
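As a rough illustration of the local-folder case, the Python sketch below builds a PersistentVolume that points at a folder on the host, plus a PersistentVolumeClaim that requests storage from it, and prints them as JSON (which kubectl accepts alongside YAML). The object names, the `/mnt/data` path, the sizes, and the `manual` storage class are assumptions for the example, and a hostPath volume like this is generally only suitable for single-node test clusters.

```python
import json


def host_path_volume(name, path, capacity, storage_class):
    """PersistentVolume backed by a folder on the node's filesystem."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolume",
        "metadata": {"name": name},
        "spec": {
            "storageClassName": storage_class,
            "capacity": {"storage": capacity},
            "accessModes": ["ReadWriteOnce"],
            "hostPath": {"path": path},
        },
    }


def volume_claim(name, size, storage_class):
    """PersistentVolumeClaim that a pod can reference to get storage."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "storageClassName": storage_class,
            "accessModes": ["ReadWriteOnce"],
            "resources": {"requests": {"storage": size}},
        },
    }


# All names and values below are placeholders.
pv = host_path_volume("demo-pv", "/mnt/data", "1Gi", "manual")
pvc = volume_claim("demo-pvc", "500Mi", "manual")

for manifest in (pv, pvc):
    print(json.dumps(manifest, indent=2))
```

Swapping the `hostPath` entry for an NFS or cloud-provider volume source, or relying on a storage class with dynamic provisioning, is how the same claim can be served by shared storage instead of a single node’s disk.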

No matter what infrastructure they’re working on, engineers and ITOps need easy access to store and retrieve data. And to reap the full benefits of a cloud-native ecosystem, it’s critical that storage be decoupled from the end node and intelligently orchestrated in a container ecosystem. As we saw above, there are many packages within the CNCF that attempt to streamline the process of merging Kubernetes with persistent, stateful storage.

