Improving service mobility in the 5G edge cloud


What is the best way to implement cloud-native edge services in 5G and beyond?

The edge cloud offloads computing from user devices with lower latency, higher bandwidth, and less network load than centralized cloud solutions. Services that will benefit immensely from edge cloud support in 5G and future 6G include augmented reality, cloud gaming and cooperative vehicle collision avoidance.

With the mobility we expect in cellular networks, a challenge arises: how do we keep a service portable when the end user moves? If the device physically moves, the service should be relocated to whichever edge cloud is now nearest to it.

Endpoint mobility has been supported by several generations of cellular networks – we can move around with our mobile devices, and the network keeps calls and other services running. But now with the edge cloud, server-side mobility is also required.

Regardless of the actual intent or policy behind a service relocation, the service itself may or may not be stateful.

Stateful vs. stateless services

In the cloud-native world, stateless services can be implemented with serverless or function-as-a-service (FaaS) technologies, for example. The relocation of such services can be handled, for example, via load balancers or, in the case of Kubernetes (which we focus on here), via ingress services. However, serverless services that can serve clients based solely on ephemeral input from the client are rare; even serverless services often need to store some state in databases, message queues, or key-value stores. When such a service is moved, its state should move with it and, ideally in a vendor-independent way, be transferred to the service's new vicinity. Otherwise, the service may experience unexpected latencies, for example when accessing its state.
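This state-externalization pattern can be sketched in a few lines. In the sketch below, a plain Python dict stands in for an external key-value store such as Redis; the class and method names are our own illustration, not part of any real framework.

```python
# Sketch: an app that externalizes its state to a key-value store so it
# can be recovered after relocation. A dict stands in for Redis; all
# names here are illustrative, not from a real system.

class RelocatableApp:
    def __init__(self, store):
        self.store = store          # external key-value store (e.g. Redis)
        self.measurements = []      # in-memory working state

    def record(self, latency_ms):
        self.measurements.append(latency_ms)

    def save_state(self):
        # Called before the old instance is stopped.
        self.store["measurements"] = list(self.measurements)

    def restore_state(self):
        # Called after the new instance starts in another cluster.
        self.measurements = list(self.store.get("measurements", []))

# Relocation then amounts to: save on the old instance, replicate the
# store, restore on the new instance.
old_store = {}
app1 = RelocatableApp(old_store)
app1.record(3.2)
app1.record(4.1)
app1.save_state()

new_store = dict(old_store)     # stands in for database replication
app2 = RelocatableApp(new_store)
app2.restore_state()
print(app2.measurements)        # → [3.2, 4.1]
```

With this pattern, the container itself stays disposable; only the store needs to be replicated to the new location.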

While stateless services are ideal in the cloud-native philosophy, some stateful legacy services might be too expensive to rewrite as stateless, and some applications might simply perform better when stateful. In such cases, moving the service in Kubernetes could be managed with container migration. The advantage of this scheme is that it works with unmodified applications. The main disadvantage is that existing connections based on the popular Transmission Control Protocol (TCP) may drop, because the network stack is not migrated along with the container. This can lead to service interruptions that are noticeable even on the end-user device.

Implementing cloud-native edge services

Our approach attempts to strike a balance between these constraints: the application is allowed to be stateful, but it must be able to transfer its state to a database and recover it from there. The underlying framework does the rest. But what would such a system look like?

Figure 1: The system architecture of the proposed implementation prototype

Before we explain how the system works, let's first focus on what the system is supposed to do. The figure above shows four different clouds, each represented as a Kubernetes cluster, with the host cluster at the top managing the three edge clusters below it. The goal is to be able to move, or relocate, a stateful server-side application (gRPC server pod 1) from the second cluster to the third cluster without the application losing any state. To quantify how well the system avoids service disruptions during the move, the server-side application is connected to a test application (gRPC client pod 1, in the first cluster) that constantly measures the latency to the server pod and sends the measurements to the server, which stores them as its "state". The challenge is that this state must remain intact when the system moves the server pod across cluster boundaries. Further, how can this be accomplished with minimal service disruption?
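The test client's behavior boils down to a simple measure-and-report loop. The sketch below mimics it in-process; `call_server` is our own placeholder for the real gRPC round trip, and a list stands in for the server-side Redis database.

```python
import time

# Sketch of the test client's measure-and-report loop. call_server()
# stands in for a gRPC round trip; appending to server_state mimics the
# server persisting each measurement in Redis as its "state".
# All names here are illustrative, not from the actual prototype.

server_state = []               # stands in for the server-side Redis DB

def call_server(latency_ms=None):
    # Placeholder for a gRPC call; a passed value is "stored" server-side.
    if latency_ms is not None:
        server_state.append(latency_ms)

def measure_and_report(rounds, interval_s=0.0):
    for _ in range(rounds):
        start = time.perf_counter()
        call_server()                        # ping to measure latency
        latency_ms = (time.perf_counter() - start) * 1000
        call_server(latency_ms)              # report; server stores it
        time.sleep(interval_s)               # measurement interval

measure_and_report(rounds=3)
print(len(server_state))        # → 3
```

The measurement interval (`interval_s`) corresponds to the 30–100 millisecond intervals varied in the evaluation later in this article.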

Component: Purpose
User interface (UI): Web-based user interface used to visualize the topology
KubeMQ: Publish-subscribe service that simplifies signaling between system components
Service Mobility Controller: Orchestrates the relocation of the running server pod and its state
Federator: An optional wrapper for KubeFed that allows a cluster to join the federation more easily
KubeFed: Federated Kubernetes; supports starting and stopping workloads in a multi-cluster environment
K8s API: Unmodified Kubernetes API available in every cluster
K8s agent: Monitors the status of the pods (for example, running or stopped) and reports to the Service Mobility Controller
App: The actual workload or application running in a container; both the client and server applications communicate via gRPC
Sidecar: A system container, based on the Network Service Mesh (NSM) framework, that runs in the same pod as the app; NSM manages connectivity between the applications
gRPC client/server pod: The pod hosting either the gRPC client or server application
Database (DB): In-memory key-value store based on Redis, used in the server pod to store the latency measurements

Figure 2: Description of the purpose of each component of the prototype

How does the proposed solution work? When the server-side pod needs to be moved, the Service Mobility Controller (SMC) launches replicas of the server-side pod, including the database, in Cluster 3. The SMC then starts synchronizing the database replica in Cluster 3 with the one in Cluster 2. When the database sync is almost complete, the SMC temporarily blocks the server-side pod until the sync has finished. After that, the SMC instructs the test client to re-establish communication with the new server pod. Finally, the SMC cleans up the unused resources in Cluster 2.
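The controller's sequence can be sketched as a short orchestration routine. In the sketch below, dicts stand in for the Redis databases in Clusters 2 and 3, and the function and attribute names are illustrative only, not the prototype's actual API.

```python
# Sketch of the Service Mobility Controller's relocation sequence.
# Dicts stand in for the Redis DBs in Clusters 2 and 3; all names are
# illustrative, not from the actual prototype.

class Client:
    endpoint = "cluster-2"
    paused = False

def relocate(src_db, client):
    dst_db = {}                      # 1. launch replica pod + DB in Cluster 3
    dst_db.update(src_db)            # 2. bulk database synchronization
    client.paused = True             # 3. briefly block the server-side pod
    dst_db.update(src_db)            # 4. final sync of any remaining delta
    client.endpoint = "cluster-3"    # 5. point the client at the new pod
    client.paused = False            #    and resume communication
    src_db.clear()                   # 6. clean up resources in Cluster 2
    return dst_db

client = Client()
src = {"measurements": [3.2, 4.1]}
dst = relocate(src, client)
print(client.endpoint, dst["measurements"])   # → cluster-3 [3.2, 4.1]
```

The key design point is that the pod is blocked only for the final delta sync (step 4), which keeps the disruption window short.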

Evaluating the service mobility prototype

We evaluated the prototype's performance from the standpoint of service disruption, as shown in the figure below. The x-axis shows the interval (in milliseconds) at which the gRPC client measured latency. The y-axis shows how many times the gRPC client had to resend data during the gRPC server move (green bar), with the error bar giving the standard deviation over ten test runs.


Figure 3: Evaluation of prototype performance based on service disruptions

In the figure above, the leftmost bar shows that the gRPC client had to retransmit an average of 3.5 times when it measured the latency every 30 milliseconds. Toward the right side of the figure, the number of retransmissions decreases to a single retransmission at measurement intervals of 90 and 100 milliseconds. It's worth noting that since gRPC uses reliable TCP as its transport, no packets are actually dropped. The measurement environment was also challenging: Kubernetes was running on virtual machines in an OpenStack environment that was also running other workloads, and the connection throughput was limited to 1 Gbps.
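Because gRPC runs over TCP, the failed calls during the move surface as errors that the client simply retries; these retries are what the evaluation counts as retransmissions. The sketch below is our own illustration of that behavior, with a fake server that fails a fixed number of times while "being relocated".

```python
# Sketch: client-side retry behavior during a server pod move. The
# server fails a fixed number of calls while "relocating", then
# recovers; all names here are illustrative only.

class MovingServer:
    def __init__(self, failures_during_move):
        self.remaining_failures = failures_during_move

    def send(self, value):
        if self.remaining_failures > 0:
            self.remaining_failures -= 1
            raise ConnectionError("server pod is being relocated")
        return "ok"

def send_with_retry(server, value, max_retries=10):
    retransmissions = 0
    while True:
        try:
            server.send(value)
            return retransmissions      # number of resends that were needed
        except ConnectionError:
            retransmissions += 1
            if retransmissions > max_retries:
                raise

server = MovingServer(failures_during_move=3)
retries = send_with_retry(server, 42)
print(retries)   # → 3
```

Shorter measurement intervals mean more calls land inside the relocation window, which is consistent with the higher retransmission counts at the 30-millisecond interval.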

Based on our evaluation with the prototype, we believe it’s possible to support relocatable, stateful services in a multi-cloud environment. Additionally, it is possible to achieve this in a cloud-native manner, optimizing the underlying application framework to minimize service disruption. We believe that the proposed solution could be used for Kubernetes to implement relocatable and non-disruptive third-party services within the 3GPP edge computing architecture, more specifically for the application context relocation technique in specification 23.558. Furthermore, an edge computing architecture with support for service mobility could be used as a building block in various scenarios, such as the mentioned use cases for augmented reality, cloud gaming, and cooperative vehicle collision avoidance.

The search for next-generation cloud-native applications

The results shown in this article are preliminary and require further analysis. The prototype can be further optimized and also compared with container migration. Our work is a complementary solution to migration, not a competing one; one could use whichever approach suits the service in question better. In container migration, the application is unaware of the migration, while in our approach the application is aware of its relocation and state transfer procedures, although some parts are hidden from the application.

We’ve barely scratched the surface with our prototyping efforts; this raises the question of how the next generation of cloud-native applications will be written, and how responsibility will be shared between the application, the cloud integration framework, and the underlying cloud platform.

Learn more

Visit Ericsson’s edge computing pages to discover the latest trends, opportunities and insights at the edge.

Find out how edge exposure in the 5G ecosystem can add value beyond connectivity.

Explore our previous work on multi-cloud connectivity for Kubernetes in 5G.
