“Scalability challenges at the edge can be addressed by creating independent failure domains and using Kubernetes’ declarative infrastructure”
December 13, 2024


Keith Basil, General Manager (Edge Business Unit), SUSE

It’s been a decade since Google released Kubernetes. Almost every cloud provider has since embraced this open source container orchestration system. In recent years, there has been a shift from using Kubernetes purely in the cloud to pushing that API out to the edge. In an exclusive conversation with OSFY’s Yashasvini Razdan, Keith Basil, General Manager (Edge Business Unit), SUSE, explains how organisations can meet the stringent flexibility and scalability demands of edge computing with Kubernetes.


Q. Why do we need containers for the edge?

A. Containers became essential at the edge, not only for their lightweight and standardised deployment but also for solving the management-at-scale challenge that edge computing introduces. Kubernetes, which originated in the cloud, standardised infrastructure management for microservices and containerised applications. In 2019, Rancher released K3s, a lightweight version of Kubernetes designed to run on low-resource hardware with a single command, offering a CNCF-certified API. K3s quickly gained traction among hobbyists and developers using Kubernetes on devices like the Raspberry Pi and Intel NUC. Proofs of concept (PoCs) emerged for running Kubernetes on small, remote machines, attracting attention from industries like retail and manufacturing.

As edge use cases grew, the focus shifted from managing a few Kubernetes clusters to supporting thousands. Containers, with Kubernetes’ declarative nature, allowed teams to reuse cloud-based tools, processes like CI/CD pipelines, and developer skills to meet the scaling demands at the edge, while maintaining cloud-like efficiency.
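As a concrete illustration of that declarative reuse, the sketch below (using the official Kubernetes Python client) applies one Deployment manifest to a target cluster selected by kubeconfig context; the same manifest could be pushed unchanged to any number of edge clusters. The deployment name, image, and context name are illustrative, not taken from the interview.

# Minimal sketch: the same declarative manifest, applied to an edge cluster
# exactly as it would be in the cloud. Requires the 'kubernetes' Python package.
from kubernetes import client, config

DEPLOYMENT = client.V1Deployment(
    api_version="apps/v1",
    kind="Deployment",
    metadata=client.V1ObjectMeta(name="edge-app"),
    spec=client.V1DeploymentSpec(
        replicas=2,
        selector=client.V1LabelSelector(match_labels={"app": "edge-app"}),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels={"app": "edge-app"}),
            spec=client.V1PodSpec(
                containers=[client.V1Container(name="web", image="nginx:1.27")]
            ),
        ),
    ),
)

def apply_to_cluster(context_name: str) -> None:
    """Declare the desired state on one cluster; Kubernetes reconciles the rest."""
    config.load_kube_config(context=context_name)   # pick the target cluster
    client.AppsV1Api().create_namespaced_deployment(namespace="default", body=DEPLOYMENT)

if __name__ == "__main__":
    apply_to_cluster("edge-store-001")   # hypothetical kubeconfig context for one edge site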

Q. How is deploying Kubernetes at the edge different from deploying it on the cloud?

A. In the cloud, the infrastructure is preconfigured, so developers only interact with the API, without needing to worry about hardware or infrastructure setup. In contrast, at the edge, you need to build up to that API. The entire process of setting up infrastructure, installing software, and building clusters from the hardware layer to API availability is unique to edge deployments, and a fundamental challenge compared to cloud deployments where everything is already managed for you.
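To make that “build up to the API” step concrete, here is a minimal sketch of the last mile of an edge bootstrap: polling the API server’s health endpoint until the cluster is actually usable. The local address, the default K3s port 6443, and unauthenticated access to /readyz are assumptions about a typical single-node install, not details from the interview.

# Sketch: wait for a freshly bootstrapped edge cluster's API to come up.
import ssl
import time
import urllib.request

API_HEALTH_URL = "https://127.0.0.1:6443/readyz"   # assumed local K3s API endpoint

def wait_for_api(timeout_s: int = 600) -> bool:
    ctx = ssl.create_default_context()
    ctx.check_hostname = False      # bootstrap certificates are not publicly trusted
    ctx.verify_mode = ssl.CERT_NONE
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        try:
            with urllib.request.urlopen(API_HEALTH_URL, context=ctx, timeout=5) as resp:
                if resp.status == 200:
                    return True     # the cluster API is now available
        except OSError:
            pass                    # not up yet; keep waiting
        time.sleep(10)
    return False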

Q. How does SUSE navigate through these challenges?

A. We have software solutions to address these challenges, offering three provisioning models depending on the use case. For mass onboarding, we provide a tool called Edge Image Builder, which allows you to create a customised onboarding image. This image can be burned onto any hardware, shipped, or loaded onto a USB stick, significantly speeding up the process of taking a machine from an off state to a fully running cluster. It handles everything, including phoning home capabilities, security, and other essential configurations. Additionally, we offer adjacent tools that help manage the stack and solve both Day Zero and Day One challenges. These solutions are part of our comprehensive offerings.
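As a purely conceptual sketch of the “phoning home” step baked into an onboarding image (this is not the Edge Image Builder’s actual configuration or protocol), a freshly booted node might register itself with a management endpoint roughly like this; the URL, token, and payload fields are all hypothetical.

# Hypothetical phone-home on first boot: report identity, get registered.
import json
import socket
import urllib.request

REGISTRATION_URL = "https://mgmt.example.com/register"   # hypothetical endpoint
REGISTRATION_TOKEN = "injected-at-image-build-time"      # placeholder value

def phone_home() -> int:
    payload = json.dumps({
        "hostname": socket.gethostname(),
        "token": REGISTRATION_TOKEN,
    }).encode()
    req = urllib.request.Request(
        REGISTRATION_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status   # 200 would mean the node is now known to the management server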

Q. What kind of business model does SUSE follow?

A. We monetise support for the free and open source software. While the Linux technology stack is very mature, Kubernetes operates on a much faster release cycle, with new versions coming out every three months. We reconcile these differing life cycles, providing seamless upgrades for Kubernetes while keeping the entire stack stable. We have validated designs and a CI/CD pipeline that runs thorough validation checks to ensure that when Kubernetes is upgraded, your applications and infrastructure remain intact and functional.
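One example of the kind of check such a validation pipeline can run, sketched here with the Kubernetes Python client and not representing SUSE’s actual pipeline, is confirming that every Deployment still has all of its replicas available after an upgrade.

# Post-upgrade sanity check: are all Deployments fully available?
from kubernetes import client, config

def deployments_healthy() -> bool:
    config.load_kube_config()
    healthy = True
    for dep in client.AppsV1Api().list_deployment_for_all_namespaces().items:
        desired = dep.spec.replicas or 0
        available = dep.status.available_replicas or 0
        if available < desired:
            print(f"{dep.metadata.namespace}/{dep.metadata.name}: "
                  f"{available}/{desired} replicas available")
            healthy = False
    return healthy

if __name__ == "__main__":
    raise SystemExit(0 if deployments_healthy() else 1)   # non-zero exit fails the pipeline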

Our support services help customers maintain smooth operations while dealing with the frequent updates in Kubernetes and slower-moving Linux base layers.

Q. Do you also incorporate the help of the community in offering these support services?

A. Most of our support comes directly from SUSE employees, who contribute upstream and work closely with the community. It’s very rare that we encounter an issue we can’t resolve ourselves and need to reach back to the wider community for help. In most cases, our engineers are the ones who wrote the code, so customers often get direct support from the people who developed the software.

Q. In India, which industry segments are you targeting with your product for the edge?

A. The three primary industries we’ve identified so far are telecom, manufacturing, and banking. While we’re still evaluating, there may also be opportunities in healthcare.

Q. What products do you have for the India market?

A. SUSE offers two key products for the India market. The first is SUSE Edge, which includes all the necessary components for onboarding, setting up, and configuring Kubernetes, supporting multiple architectures.

The second product is the Adaptive Telco Infrastructure Platform (ATIP), which is optimised for the high-performance requirements of telco customers. ATIP is a commercial implementation of the Sylva specification, tailored for 5G deployment. This telco-focused product supports various hardware types and includes optimisations like real-time kernel implementation, Precision Time Protocol (PTP), Single Root I/O Virtualization (SR-IOV), and Data Plane Development Kit (DPDK) to efficiently move packets from network interface cards (NICs) to containers.
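To illustrate how those data-plane optimisations surface to a workload, the sketch below creates a pod that requests one SR-IOV virtual function via the SR-IOV network device plugin and a Multus network attachment. The resource name (‘intel.com/sriov_netdevice’), the network attachment name (‘sriov-net’), and the image are assumptions that depend entirely on how a given cluster is configured; they are not taken from ATIP.

# Illustrative pod requesting an SR-IOV virtual function for a DPDK workload.
from kubernetes import client, config

POD = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {
        "name": "dpdk-app",
        # Multus annotation attaching the pod to an assumed SR-IOV network.
        "annotations": {"k8s.v1.cni.cncf.io/networks": "sriov-net"},
    },
    "spec": {
        "containers": [{
            "name": "dpdk",
            "image": "registry.example.com/dpdk-app:latest",    # hypothetical image
            "resources": {
                "requests": {"intel.com/sriov_netdevice": "1"},  # assumed resource name
                "limits": {"intel.com/sriov_netdevice": "1"},
            },
        }],
    },
}

if __name__ == "__main__":
    config.load_kube_config()
    client.CoreV1Api().create_namespaced_pod(namespace="default", body=POD)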

Q. What’s your strategy for the Indian market?

A. We’ve discovered that in India, customers and CIOs are seeking a complete end-to-end solution—from the bottom of the stack to full application support. To effectively meet this demand, we are partnering with integrators, systems integrators, and value-added resellers who can help deliver these comprehensive solutions.

Q. How does the deployment of the Kubernetes platform differ across various applications?

A. We have three broad deployment methodologies that include all the bare metal management capabilities and cover the edge as well. At SUSE, we manage hardware using BMC (baseboard management controller) interfaces, such as iLO, DRAC, or IPMI, which are the out-of-band management interfaces for data centre-grade hardware.

For instance, if you have five racks of gear, we can spin up two or three of those racks at any time to build clusters from their resources. When demand increases, additional racks can be powered on, providing extra capacity. Conversely, when demand decreases, machines can be powered down, spinning down the cluster. This is akin to the elasticity offered by cloud service providers, but instead, it’s hardware elasticity that optimises energy usage and contributes to green computing.
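A minimal sketch of that hardware elasticity, assuming the BMCs expose the standard Redfish API (the common abstraction over interfaces such as iLO and DRAC): the management layer powers nodes on when capacity is needed and shuts them down gracefully when it is not. The BMC address, credentials, and system ID below are placeholders.

# Power a node on or off through its BMC via Redfish. Requires 'requests'.
import requests

BMC = "https://10.0.0.10"                 # placeholder BMC address
AUTH = ("admin", "password")              # placeholder credentials
SYSTEM = f"{BMC}/redfish/v1/Systems/1"    # system ID varies by vendor

def set_power(reset_type: str) -> None:
    """reset_type is a Redfish ResetType such as 'On' or 'GracefulShutdown'."""
    resp = requests.post(
        f"{SYSTEM}/Actions/ComputerSystem.Reset",
        json={"ResetType": reset_type},
        auth=AUTH,
        verify=False,        # many BMCs ship with self-signed certificates
        timeout=30,
    )
    resp.raise_for_status()

# set_power("On") when demand grows; set_power("GracefulShutdown") when it shrinks.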

The third and simplest deployment method involves shipping hardware or building an image such that when a machine is powered on, the OS boots up, connects to Rancher, and registers for control from there. The deployment infrastructure is different in that case.

Q. What extensions are required when we integrate different edge devices across the Kubernetes platform, and how do they work?

A. Yes, some extensions are required, particularly in the telecom industry, where the primary focus is on passing packets quickly. Kubernetes is designed to run in environments where nodes are elastically scalable and generic—but that’s not always the case in telecom. So we extend the Kubernetes API to create custom resources using custom resource definitions (CRDs) to manage the infrastructure. When sending an upgrade plan to the downstream cluster, these CRDs coordinate the rolling window upgrade for each individual node in the cluster.
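The pattern can be sketched as follows: the upgrade plan is itself a Kubernetes object, created through the API and then executed node by node by a controller in the downstream cluster. The API group, kind, and fields here are hypothetical, chosen only to illustrate the CRD approach; they are not SUSE’s actual schema.

# Sketch: submit an upgrade plan as a custom resource for a controller to execute.
from kubernetes import client, config

UPGRADE_PLAN = {
    "apiVersion": "upgrades.example.com/v1",   # hypothetical API group/version
    "kind": "NodeUpgradePlan",                 # hypothetical kind
    "metadata": {"name": "k8s-1-30-rollout"},
    "spec": {
        "targetVersion": "v1.30.4",
        "concurrency": 1,                      # rolling window: one node at a time
        "drainBeforeUpgrade": True,
    },
}

if __name__ == "__main__":
    config.load_kube_config()
    client.CustomObjectsApi().create_cluster_custom_object(
        group="upgrades.example.com",
        version="v1",
        plural="nodeupgradeplans",
        body=UPGRADE_PLAN,
    )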

Q. How does SUSE equip the user to handle the deployment and management of different functionalities?

A. Our failure domain model allows each remote location to run independently without requiring permanent connectivity. In this model, the remote cluster (or downstream cluster) regularly phones home to Rancher for task updates. It pulls down a declarative manifest that specifies the necessary changes, and Kubernetes then ensures the system aligns with the declared state. If Rancher doesn’t have any tasks, the cluster continues operating locally.
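A conceptual sketch of that pull-based loop (not Rancher’s actual agent or API) looks roughly like this: the downstream cluster periodically checks in, applies any pending declarative manifest, and otherwise carries on unchanged, even when the management server is unreachable. The endpoint and response format are hypothetical.

# Hypothetical phone-home reconciliation loop for a downstream edge cluster.
import time

import requests
from kubernetes import client, config, utils

MGMT_URL = "https://rancher.example.com/api/pending-manifest"   # hypothetical endpoint

def reconcile_once() -> None:
    try:
        resp = requests.get(MGMT_URL, timeout=10)
    except requests.RequestException:
        return                        # offline: keep operating locally
    if resp.status_code != 200:
        return                        # nothing pending
    config.load_kube_config()
    api = client.ApiClient()
    for obj in resp.json().get("objects", []):
        utils.create_from_dict(api, obj)   # Kubernetes reconciles to the declared state

if __name__ == "__main__":
    while True:
        reconcile_once()
        time.sleep(60)                # phone home once a minute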

Q. What do you mean by open standards, and how are these different from open source?

A. The difference between open standards and open source lies in their purpose. Open source is about the collaborative development of software, while open standards refer to the agreed-upon protocols, APIs, or interfaces that emerge from these collaborations. Open standards ensure uniformity and interoperability, allowing anyone to build on top of them, benefiting everyone in the ecosystem. Open source fosters collaboration, and the result is often the creation of open standards that facilitate innovation and widespread adoption.

Q. Could you give us examples of open source projects that have led to the adoption of Kubernetes as a standard?

A. The Sylva Project—a Linux Foundation open source community, of which SUSE is a part—aims to address 5G challenges, and it has chosen cloud-native technologies to do so. Kubernetes was selected as the infrastructure platform for use at the tower and within the core network. Similarly, major players in the industrial IoT (IIoT) space have come together within the Margo initiative (margo.org), independently selecting Kubernetes as the standard API for industrial process automation services. These communities embrace the spirit of open source and collaborate to establish standardised specifications and implementations. This allows anyone globally to build value on top of these standards or APIs.

Q. How does selecting Kubernetes as an open standard impact interoperability across different kinds of edge hardware or platforms?

A. Kubernetes functions as an infrastructure platform that abstracts the underlying hardware. Its biggest value lies in its declarative API, which allows you to define your infrastructure needs, simplifying management by ensuring the system aligns with your configuration without manual intervention.

Beyond the basic layer, in the IIoT space, there are various types of devices—sensors, actuators, IP cameras, etc—that are too small to run Linux or Kubernetes but still operate using different protocols. Some vendors even add proprietary modifications, which makes interoperability across these devices challenging, especially since many of the protocols are industry specific. This is where open standards come into play.

The Margo project and Akri, an initiative started by Microsoft, address these interoperability challenges. Akri standardises the discovery of IIoT devices and unifies the way Kubernetes communicates with them. The Margo project complements this by focusing on firmware management, security, and process automation for these devices. Margo helps with managing credentials and ensuring IIoT devices receive firmware updates. Together, Akri and Margo create a standardisation framework that improves interoperability and reduces fragmentation across diverse edge hardware and platforms.
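One way to see Akri’s effect from the Kubernetes side: discovered devices are typically advertised as extended resources on the node (under Akri’s akri.sh/ prefix), which workloads can then request like any other resource. The sketch below simply lists them; exact resource names depend on the Akri configuration in use.

# List Akri-advertised device resources on each node.
from kubernetes import client, config

config.load_kube_config()
for node in client.CoreV1Api().list_node().items:
    for name, quantity in (node.status.allocatable or {}).items():
        if name.startswith("akri.sh/"):
            print(f"{node.metadata.name}: {name} = {quantity}")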

Q. How can Kubernetes handle any connectivity or integration challenge that arises in hybrid cloud connections?

A. Each on-premises location (on edge) functions as a downstream cluster with its own Kubernetes API. On the cloud side, a cloud-hosted Kubernetes cluster is deployed, providing the same API. In a hybrid environment, tools such as Rancher act as a multi-cluster management tool that bridges the gap between on-premises (edge) clusters and cloud clusters. Rancher allows you to have a control plane in the cloud that manages all the downstream clusters, whether they’re in the cloud or onsite, all using the same Kubernetes API. It provides the connectivity and visibility needed to manage both environments as a unified system, solving the challenges of hybrid cloud connections effectively.
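The practical effect of that “same API everywhere” model can be shown in a few lines: whether a cluster is in the cloud or at an edge site, it is addressed through the identical Kubernetes API, here selected by kubeconfig context. The context names are illustrative.

# Query several clusters, cloud or edge, through the one Kubernetes API.
from kubernetes import client, config

CLUSTERS = ["cloud-control", "edge-site-mumbai", "edge-site-delhi"]   # hypothetical contexts

for ctx in CLUSTERS:
    config.load_kube_config(context=ctx)
    nodes = client.CoreV1Api().list_node().items
    print(f"{ctx}: {len(nodes)} node(s)")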

Q. How do you manage scalability across multiple platforms at the edge?

A. Scalability challenges at the edge can be addressed by creating small, independent failure domains and by relying on Kubernetes’ declarative infrastructure. Automated testing pipelines ensure that once a declaration is validated, it can be replicated across multiple locations without human intervention. This approach avoids errors and ensures that scaling can be handled effectively.

Q. How does one avoid latency in edge deployments?

A. Contrary to popular belief, you don’t always need a fast connection at the edge. While that’s sometimes true, in many cases it’s not. We’ve found that if you design the network correctly, with a failure domain in which most processes can run independently, latency becomes less of an issue and is manageable. One of our customers operates in an environment where edge locations are connected via satellite, which has high latency and low bandwidth. As a result, they’ve adapted their CI/CD system so that major updates are scheduled when these mobile edge ‘locations’ are in an area with stronger and faster connectivity, while smaller updates can still be pushed at other times.

Q. How is security at the edge handled in case of Kubernetes clusters?

A. Security strategy needs to evolve when pushing compute and storage resources to the edge, as these resources are no longer in a hardened data centre but in more vulnerable locations where there is physical access to the machines. The most effective approach is adopting a zero trust security model, where you assume the environment is hostile from the start. Every connection requires mutual identification, implemented via protocols such as mutual TLS (mTLS).
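At the transport level, mutual identification looks like this minimal Python sketch: the client trusts a private CA and also presents its own certificate, so both ends are authenticated before any request is served. The host and file paths are placeholders.

# Minimal mTLS client using Python's standard library.
import http.client
import ssl

ctx = ssl.create_default_context(cafile="/etc/edge/ca.crt")      # trust the private CA
ctx.load_cert_chain(certfile="/etc/edge/node.crt",               # this node's identity
                    keyfile="/etc/edge/node.key")

conn = http.client.HTTPSConnection("mgmt.example.com", 443, context=ctx)
conn.request("GET", "/healthz")
print(conn.getresponse().status)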

SUSE NeuVector is based on zero trust principles and covers three to four of the core zero trust pillars outlined by the Zero Trust Reference Architecture published by security agencies like DISA. It offers container isolation and secures the Kubernetes cluster, while complementing other security measures like identity management and monitoring to round out a complete security strategy at the edge.

Q. How do you ensure that all these deployments meet regulatory and compliance standards?

A. We identify the risk management framework that applies to our specific situation, and then conduct an assessment against our default system to identify how it meets the required technical controls. This process involves checking off security measures for the Linux layer, the Kubernetes layer, and providing guidance for the application layer. After confirming these measures are in place, we codify them into a policy and implement it at the edge to ensure compliance with the entire framework.

While I have simplified this process, the overarching steps are clear: identify compliance needs, perform an assessment, push the findings into production, and continuously reassess and audit the security state. This includes ongoing monitoring to adapt to any policy changes or emerging threats. We assist with this entire process to ensure compliance at every stage.
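As a small example of codifying one technical control as a repeatable check (the kind of assessment that can be re-run as part of the continuous audit described above), the sketch below flags any pod running a privileged container; it is illustrative, not a complete compliance framework.

# One codified control: no privileged containers in the cluster.
from kubernetes import client, config

def find_privileged_containers() -> list[str]:
    config.load_kube_config()
    violations = []
    for pod in client.CoreV1Api().list_pod_for_all_namespaces().items:
        for c in pod.spec.containers:
            sc = c.security_context
            if sc and sc.privileged:
                violations.append(f"{pod.metadata.namespace}/{pod.metadata.name}:{c.name}")
    return violations

if __name__ == "__main__":
    for v in find_privileged_containers():
        print("privileged container:", v)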


