Accelerating the Journey towards Autonomous Telecom Networks
Fri, 06 Jan 2023 14:29:40 -0000
How Dell Technologies is helping communications service providers accelerate automation
Communications service providers (CSPs) are on a journey of digital transformation that lets them offer innovative new services and a better customer experience in an open, agile, and cost-effective manner. Recent developments in 5G, Edge, Radio Access Network disaggregation, and, most importantly, the pandemic have all proven to be catalysts that accelerated this digital transformation. However, these advancements in telecom come with their own set of challenges. New architectures and solutions have made the modern network considerably more complex and difficult to manage.
In response, CSPs are evaluating new ways of managing their complex networks using automation and artificial intelligence. The ability to fully orchestrate the operation of digital platforms is vital for touchless operations and consistent delivery of services. Almost every CSP is working on this today. However, standard automation architectures and tools can't be applied directly by CSPs, because these solutions need to adhere to strict telecom requirements and specifications such as the enhanced Telecom Operations Map (eTOM) and those defined by the TeleManagement Forum (TM Forum), the European Telecommunications Standards Institute (ETSI), and the 3rd Generation Partnership Project (3GPP). CSPs also need to operate many telecom solutions, including legacy physical network functions (PNF), virtual network functions (VNF), and the latest 5G-era containerized network functions (CNF).
Removing barriers with telecom automation
Although many CSPs have built cloud platforms, only a handful have achieved their automation targets. So, what do you do when there is no ready-made industry-standard automation solution? You build one. And that’s exactly what Dell Technologies did with the recent launch of its Dell Telecom Multi-Cloud Foundation. Dell Telecom Multi-Cloud Foundation automates the deployment and lifecycle management of the cloud platforms used in a telecom network to reduce operational costs while consistently meeting telco-grade SLAs. It also supports the leading cloud platforms, giving operators the flexibility to choose the platform that best meets their needs based on workload requirements and cost-to-serve. It streamlines telecom cloud design, deployment, and management with integrated hardware, software, and support.
The solution includes Dell Telecom Infrastructure Blocks. Telecom Infrastructure Blocks are engineered systems that provide foundational building blocks that include all the hardware, software and licenses to build and scale out cloud infrastructure for a defined telecom use case.
Telecom Infrastructure Block releases will be delivered in an agile manner with multiple releases per year to simplify lifecycle management. In 2023, Dell Telecom Infrastructure Blocks will support workloads for Radio Access Network and Core network functions with:
- Dell Telecom Infrastructure Blocks for Wind River, which will support vRAN and Open RAN workloads.
- Dell Telecom Infrastructure Blocks for Red Hat, which will target core network workloads (planned).
The primary goal of Telecom Multi-Cloud Foundation with Telecom Infrastructure Blocks is to deliver telco cloud platforms that are engineered for scaled deployments, providing three core capabilities:
- Integration: All components of the platform, including computing, storage, networking, ancillaries like accelerators, Cloud CaaS software, and management tools, are integrated in Dell’s factories.
- Validation: A solution engineered and validated by our cloud partners and already proven to work in the field. The engineering and validation process includes detailed test cases across both functional and non-functional aspects of the platform.
- Automation: A solution that is fully automated and that can seamlessly integrate with a telco’s existing orchestration and inventory systems.
Dell Technologies Telecom Multi-Cloud Foundation meets Telco automation requirements
Dell Telecom Multi-Cloud Foundation provides communications service providers with a platform-centric solution based on open application programming interfaces (APIs) and consistent tools. This means the platform can deliver outcomes based on a unique use case and workload and then scale out deployments using an API-based approach.
Dell Telecom Multi-Cloud Foundation enables telco-grade automation through the following key capabilities:
- An open API and workflow approach: All the capabilities of the platform are available as declarative APIs, so there is no need to manage each infrastructure component independently. Instead, open APIs and workflows are triggered by northbound orchestration systems. This capability automates not only deployment but also Day 2 operations and lifecycle management.
- Scalable architecture: The automation architecture is fully distributed and federated, so it can scale to hundreds of thousands of sites.
- Data-driven architecture: The automation architecture is data-driven and distributed, so data can be tapped from edge and regional sites, enabling real-time use cases and data-driven automation.
Automation use cases with Dell Technologies Telecom Multi-Cloud Foundation
Telecom automation is not just about Day 0 (design) and Day 1 (deployment) but should also cover Day 2 (operations and lifecycle management). Dell Telecom Multi-Cloud Foundation supports the following use cases:
- Automated Deployment: It includes a fully automated deployment of the cloud infrastructure based on customer specifications.
- O-Cloud as Code: It employs declarative automation using infrastructure data, which includes site data, networking, resources, and credentials, to automate tasks independently of the workflow. This decoupling is crucial for orchestrating the platform.
- Operational fulfillment: Integration with Wind River Studio Conductor delivers a full set of operational tools that provide a single management and observation platform for the operations team. This helps create a unified layer for Network Operations Center (NOC) teams to monitor and manage the platform.
- Staging: The platform is staged in Dell’s factory to reduce the time spent deploying and configuring the system on-site and can be tuned in the field using the built-in automation to meet any unique operator specifications.
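To make the O-Cloud as Code idea more concrete, here is a minimal Python sketch of declarative automation in which the site data is kept separate from the workflow that acts on it. The `SiteSpec` fields and the `plan_actions` helper are hypothetical names chosen for illustration; they are assumptions and not part of any Dell or Wind River API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class SiteSpec:
    """Hypothetical declarative description of one cloud site (illustrative fields only)."""
    name: str
    platform_version: str
    vlans: frozenset = field(default_factory=frozenset)


def plan_actions(desired: SiteSpec, observed: Optional[SiteSpec]) -> List[str]:
    """Compare desired and observed state and return the actions needed to converge."""
    if observed is None:
        # Nothing exists yet, so the only action is a fresh deployment.
        return [f"deploy {desired.name} at version {desired.platform_version}"]
    actions = []
    if observed.platform_version != desired.platform_version:
        actions.append(f"upgrade {desired.name} to {desired.platform_version}")
    for vlan in sorted(desired.vlans - observed.vlans):
        actions.append(f"add vlan {vlan} on {desired.name}")
    for vlan in sorted(observed.vlans - desired.vlans):
        actions.append(f"remove vlan {vlan} on {desired.name}")
    return actions


if __name__ == "__main__":
    desired = SiteSpec("edge-01", "2.0", frozenset({100, 200}))
    observed = SiteSpec("edge-01", "1.0", frozenset({100, 300}))
    for action in plan_actions(desired, observed):
        print(action)
```

Because the workflow only consumes the data, the same `plan_actions` logic could serve Day 1 deployment (no observed state) and Day 2 changes (a diff against observed state) without modification.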
Dell Technologies developed Dell Telecom Multi-Cloud Foundation and Dell Telecom Infrastructure Blocks to accelerate 5G cloud infrastructure transformation. Our current release of Telecom Infrastructure Blocks for Wind River delivers an engineered and factory-integrated system that comes with a fully automated deployment model for CSPs looking to build resilient and high-performance RAN.
To learn more about our solution, please visit the Dell Telecom Multi-Cloud Foundation solutions site.
About the Author: Saad Sheikh
Saad Sheikh is the APJ Lead Systems Architect in the Telecom Systems Business at Dell Technologies. In his current role, he is responsible for driving Telecom Cloud, Automation, and NGOPS transformations in APJ, supporting partners, NEPs, and customers to accelerate Network Transformation for 5G, Open RAN, Core, and Edge using Dell’s products and capabilities. He is an industry leader with over 20 years of experience in the telco industry, having held roles with telcos, system integrators, consulting businesses, and telecom vendors, where he has worked on E2E telecom systems (RAN, Transport, Core, Networks), Cloud platforms, Automation, Orchestration, and Intelligent Networking. As part of the Dell CTO team, he represents Dell in the Linux Foundation, TM Forum, GSMA, and TIP.
Related Blog Posts
Accelerating and Optimizing AI Operations with Infrastructure as Code
Fri, 03 May 2024 12:00:00 -0000
Achieving maturity in a DevOps organization requires overcoming various barriers and following specific steps. The level of maturity attained depends on the short-term and long-term goals set for the infrastructure. In the short term, IT teams must focus on upskilling their resources and integrating tools for containerization and automation throughout the operating lifecycles, from Day 0 to Day 2. Any progress made in scaling up containerized environments and automating processes significantly enhances the long-term economic viability and sustainability of the company. Furthermore, in the long term, it involves deploying these solutions across multicloud, multisite landscapes and effectively balancing workloads.
The optimization of your AI applications and, by extension, other high-value workloads hinges upon the velocity, scalability, and efficacy of your infrastructure, as well as the maturity of your DevOps processes. Before the AI boom, recent survey results indicated that, overall, less than 50% of infrastructure operations workflows were automated; pair that with a twofold increase in application counts, and organizations may struggle against the waves of change[1].
From compute capabilities to storage density and speed, spanning unstructured, block, and file formats, there exist fundamental elements of automation ripe for swift integration to establish a robust foundation. By seamlessly layering pre-built integration tools and a complementary portfolio of products at each stage, the journey toward ramping up AI can be eased.
There are important considerations regarding the various hardware infrastructure components for a generative AI system, including high-performance computing, high-speed networking, and scalable, high-capacity, and low-latency storage, to name a few. The infrastructure requirements for AI/ML workloads are dynamic and dependent on several factors, including the nature of the task, the size of the dataset, the complexity of the model, and the desired performance levels. There is no one-size-fits-all solution when it comes to Gen AI infrastructure, as different tasks and projects may demand unique configurations. Central to the success of generative AI initiatives is the adoption of Infrastructure-as-Code (IaC) principles, which facilitate the automation and orchestration of underlying infrastructure components. By leveraging IaC tools like Red Hat Ansible and HashiCorp Terraform, organizations can streamline the deployment and management of hardware resources, ensuring seamless integration with Gen AI workloads.
At the base of this foundation are the Red Hat Ansible modules for Dell, which speed up the provisioning of servers and storage for quick AI application workload mobility.
Ansible playbooks can automate server configuration, provisioning, deployments, and updates seamlessly while data is being collected. Because Ansible is declarative and its playbooks are mutable, the playbooks can be changed in real time without interruption to processes or end users.
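As a rough illustration of how a changed playbook might be previewed before it is applied, the Python sketch below assembles an `ansible-playbook` command line that uses the standard `--check` and `--diff` options for a dry run. The file names `provision_ai_servers.yml` and `inventory.ini` are hypothetical placeholders, and the wrapper functions are assumptions made for this example rather than anything shipped with the Dell collections.

```python
import subprocess
from typing import List


def build_ansible_command(playbook: str, inventory: str, dry_run: bool = True) -> List[str]:
    """Build an ansible-playbook command line; --check and --diff request a dry run."""
    command = ["ansible-playbook", "-i", inventory, playbook]
    if dry_run:
        command += ["--check", "--diff"]
    return command


def run_playbook(playbook: str, inventory: str, dry_run: bool = True) -> int:
    """Run the playbook and return the exit code (requires Ansible on the PATH)."""
    completed = subprocess.run(
        build_ansible_command(playbook, inventory, dry_run), check=False
    )
    return completed.returncode


if __name__ == "__main__":
    # Hypothetical file names; this only prints the command and does not execute it.
    print(" ".join(build_ansible_command("provision_ai_servers.yml", "inventory.ini")))
```

Keeping command construction separate from execution makes the dry-run behavior easy to review and test without touching any servers.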
Compute
On the compute front, a lot goes into configuring servers for the different AI and ML operations:
- GPU Drivers and CUDA Toolkit Installation: Install the appropriate GPU drivers for the server's GPU hardware. For example, install the CUDA Toolkit and drivers to enable GPU acceleration for deep learning frameworks such as TensorFlow and PyTorch.
- Deep Learning Framework Installation: Install popular deep learning frameworks such as TensorFlow or PyTorch, along with their associated dependencies.
- Containerization: Consider using containerization technologies such as Docker or Kubernetes to encapsulate AI workloads and their dependencies into portable and isolated containers. Containerization facilitates reproducibility, scalability, and resource isolation, making it easier to deploy and manage GenAI workloads across different environments.
- Performance Optimization: Optimize server configurations, kernel parameters, and system settings to maximize performance and resource utilization for GenAI workloads. Tune CPU and GPU settings, memory allocation, disk I/O, and network configurations based on workload characteristics and hardware capabilities.
- Monitoring and Management: Implement monitoring and management tools to track server performance metrics, resource utilization, and workload behavior in real time.
- Security Hardening: Ensure server security by applying security best practices, installing security patches and updates, configuring firewalls, and implementing access controls. Protect sensitive data and AI models from unauthorized access, tampering, or exploitation by following security guidelines and compliance standards.
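Before automating these steps, it can help to know which prerequisites are already present on a host. The following Python sketch is one possible pre-flight check; the `preflight_report` function and its key names are hypothetical, and the check only looks for well-known executables (`nvidia-smi`, `nvcc`, `docker`, `kubectl`) and importable packages, so it is an approximation rather than a full validation.

```python
import importlib.util
import shutil
from typing import Dict


def preflight_report() -> Dict[str, bool]:
    """Report which common AI prerequisites are visible on this host."""
    return {
        # nvidia-smi ships with the NVIDIA driver, so finding it suggests the driver is installed.
        "nvidia_driver": shutil.which("nvidia-smi") is not None,
        # nvcc is the CUDA compiler included in the CUDA Toolkit.
        "cuda_toolkit": shutil.which("nvcc") is not None,
        "docker": shutil.which("docker") is not None,
        "kubectl": shutil.which("kubectl") is not None,
        # find_spec returns None when a top-level package is not installed.
        "pytorch": importlib.util.find_spec("torch") is not None,
        "tensorflow": importlib.util.find_spec("tensorflow") is not None,
    }


if __name__ == "__main__":
    for item, present in preflight_report().items():
        print(f"{item}: {'found' if present else 'missing'}")
```

A report like this could feed an automation tool's inventory so that only the missing pieces are installed on each server.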
The Dell OpenManage Ansible collection offers modules and roles at both the iDRAC/Redfish interface level and the OpenManage Enterprise (OME) level for configuring servers such as the PowerEdge XE 9860, which is designed to collect, develop, train, and deploy large language models (LLMs).
The following is a summary of the OME and iDRAC modules and roles as part of the openmanage collection:
Storage
When it comes to AI and storage, customers rely on scalable and simple access to file systems during data processing and training, as models are trained on ever-growing volumes of data. For AI, unstructured data storage is necessary for the rich context and nuance that will be accessed during the building phase. User access requirements also vary widely, and Ansible automation playbooks can help teams change and adapt quickly.
Dell PowerScale is the world’s leading scale-out NAS platform, and it recently became the first Ethernet storage certified on NVIDIA SuperPOD. When it comes to Ansible automation, PowerScale comes with an extensive set of modules that covers a wide range of platform operations:
Software defined storage
Hyperconverged platforms like PowerFlex offer highly scalable and configurable compute and storage clusters. In addition to common Day 2 tasks like storage provisioning, data protection, and user management, the Ansible collection for PowerFlex can be used for cluster deployment and expansion. Here is a summary of what the Ansible collection for PowerFlex offers:
Conclusion
The one thing agreed upon is that Generative AI tools need scale, repeatability, and reliability beyond anything previously demanded of software and the data center combined. This is precisely what building infrastructure-as-code practices into a multisite operation is designed to deliver. From PowerEdge to PowerScale, the level of capacity and performance is unmatched. This allows AI operations and Generative AI to absorb, grow, and provide the intelligence that organizations need to be competitive and innovative.
[1] Infrastructure-as-code and DevOps Automation: The Keys to Unlocking Innovation and Resilience, September 2023
Other resources:
GenAI Acceleration Depends on Infrastructure as Code
Authors: Jennifer Aspesi, Parasar Kodati
RecoverPoint for VMs Automation – Advanced VM Protection
Wed, 21 Feb 2024 21:39:22 -0000
In the spirit of automating everything, this blog will discuss a new automation solution in the RecoverPoint for VMs (RP4VMs) collection of automation solutions.
We have a variety of automation solutions for RP4VMs, including per-tag and per-cluster VM protection and use-case driven tasks, as well as a complete deployment automation solution. Now, I would like to present a new automation solution – Advanced VM Protection.
Let’s take a closer look at this exciting new solution.
What does the solution do?
The RecoverPoint for VMs advanced VM protection solution automates VM protection in RP4VMs with a wide variety of options:
- Automates VM protection based on pre-defined parameters in a JSON configuration file:
  - VM name
  - RP4VMs cluster name
  - Plugin server IP or FQDN
  - vCenter user/password or path to credentials file
  - Production journal capacity (GB)
  - Replica journal capacity (GB)
  - Required RPO (sec)
  - Failover networks per vNIC
- Performs and monitors mass VM protection
- Protects VMs for a specific RP4VMs cluster (optional)
- Performs VM protection operations on a specific plugin server (optional)
- Includes an option to skip the monitoring of VM protection preparation tasks
- Configures failover networks on a per network adapter basis as a post-protection operation
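To give a feel for what such a configuration file might contain, here is a hedged Python sketch that parses and sanity-checks a JSON document covering the parameters listed above. Every key name (`vm_name`, `rpvm_cluster`, `plugin_server`, and so on) is an assumption made for illustration, as is the `pluginserver.example.com` host; the actual schema is defined by the sample config file that ships with the script, and credentials are deliberately left out of this example.

```python
import json
from typing import Any, Dict, List

# Hypothetical key names; the real ones are defined by the script's sample config file.
REQUIRED_KEYS = {"vm_name", "rpvm_cluster", "plugin_server"}
POSITIVE_NUMBER_KEYS = ("prod_journal_gb", "replica_journal_gb", "rpo_sec")

SAMPLE_CONFIG = """
[
  {
    "vm_name": "prodwebsrv1",
    "rpvm_cluster": "Tel-Aviv",
    "plugin_server": "pluginserver.example.com",
    "prod_journal_gb": 10,
    "replica_journal_gb": 20,
    "rpo_sec": 30,
    "failover_networks": {"Network adapter 1": "dr-network"}
  }
]
"""


def load_vm_entries(text: str) -> List[Dict[str, Any]]:
    """Parse JSON text and check each VM entry for required keys and positive numbers."""
    entries = json.loads(text)
    if not isinstance(entries, list):
        raise ValueError("expected a JSON list of VM entries")
    for entry in entries:
        missing = REQUIRED_KEYS - entry.keys()
        if missing:
            raise ValueError(f"entry is missing keys: {sorted(missing)}")
        for key in POSITIVE_NUMBER_KEYS:
            if key in entry and entry[key] <= 0:
                raise ValueError(f"{key} must be positive for VM {entry['vm_name']}")
    return entries


if __name__ == "__main__":
    for vm in load_vm_entries(SAMPLE_CONFIG):
        print(vm["vm_name"], "->", vm["rpvm_cluster"])
```

Validating the file up front means a typo in one entry is caught before any protection request is sent for the rest of the batch.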
What is the solution?
It is a Python-based script that exclusively leverages the RP4VMs REST API.
Here is the list of prerequisites:
- Python 3.x (The script supports every platform Python is supported on)
- Python requests module, which can be installed using pip with the command:
pip install requests or python -m pip install requests
- RP4VMs 5.3.x and later
- Connectivity from the host running the script to the RP4VMs plugin server(s), specifically on TCP port 443
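The connectivity prerequisite can be verified ahead of time with a few lines of standard-library Python. This is a minimal sketch; the `can_reach` helper is a hypothetical name, and `pluginserver.example.com` is a placeholder for your own plugin server.

```python
import socket


def can_reach(host: str, port: int = 443, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


if __name__ == "__main__":
    # Placeholder host name; replace it with the plugin server's IP or FQDN.
    print(can_reach("pluginserver.example.com"))
```

Note that a successful TCP connection only shows that the port is open; it does not confirm that the REST API is healthy or that credentials are valid.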
How do I use the script?
The script accepts the following parameters:
- One mandatory parameter, file, which takes the full path to the JSON configuration file.
- The optional parameters, rpvmcluster and server, limit script execution to VM protection on a specified RP4VMs cluster and/or plugin server, respectively.
- The optional no-monitor parameter skips monitoring of the VM protection preparation tasks.
Here is the full script syntax:
# python advprotectvm.py -h
usage: advprotectvm.py [-h] -file CONFIG_FILE [-cl RPVM_CLUSTER] [-s SERVER] [-nmonitor]

Scripts advanced VM Protection in RecoverPoint for VMs

options:
  -h, --help            show this help message and exit
  -file CONFIG_FILE, --vm-config-file CONFIG_FILE
                        Path to VM config file
  -cl RPVM_CLUSTER, --rpvmcluster RPVM_CLUSTER
                        Optionally specify the RP4VMs cluster
  -s SERVER, --server SERVER
                        Optionally specify RP4VMs Plugin Server DNS/IP
  -nmonitor, --no-monitor
                        Optionally prevents protection monitoring
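For readers curious how a command-line interface like this could be defined, the sketch below reconstructs the options from the help text above using Python's `argparse`. It is an illustration inferred from that help output, not the script's actual source; the `dest` names and the `build_parser` function are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the options shown in the help text (a reconstruction)."""
    parser = argparse.ArgumentParser(
        prog="advprotectvm.py",
        description="Scripts advanced VM Protection in RecoverPoint for VMs",
    )
    parser.add_argument("-file", "--vm-config-file", dest="config_file",
                        required=True, help="Path to VM config file")
    parser.add_argument("-cl", "--rpvmcluster", dest="rpvm_cluster",
                        help="Optionally specify the RP4VMs cluster")
    parser.add_argument("-s", "--server", dest="server",
                        help="Optionally specify RP4VMs Plugin Server DNS/IP")
    parser.add_argument("-nmonitor", "--no-monitor", dest="no_monitor",
                        action="store_true",
                        help="Optionally prevents protection monitoring")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["-file", "vms.json", "-cl", "Tel-Aviv"])
    print(args.config_file, args.rpvm_cluster, args.server, args.no_monitor)
```

Marking only the config file as required matches the usage line, where every other option appears in square brackets.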
Use Cases and Examples
Let’s look at some common use cases for RP4VMs automated advanced VM protection:
- RP4VMs mass VM protection for onboarding of a new application:
# python advprotectvm.py -file idan-vms.json
- Batch VM protection only for a specific RP4VMs cluster:
# python advprotectvm.py -file idan-vms.json -cl Tel-Aviv
- Mass VM protection for a specific vCenter/ plugin or onboarding of a new datacenter:
# python advprotectvm.py -file vms.json -s pluginserver.idan.dell.com
Script output
# python advprotectvm.py -file vms.json
-> Protecting VM prodwebsrv1
---> Protection of VM prodwebsrv1 initiated
-> Protecting VM prodappsrv1
---> Protection of VM prodappsrv1 initiated
-> Protecting VM proddbsrv1
---> Protection of VM proddbsrv1 initiated
-> VM protection initiated, monitoring
---> Protection of VM: prodwebsrv1, Transaction: d6783e2d-55be-47db-a082-de1d251c2375, Status: RUNNING
---> Protection of VM: prodappsrv1, Transaction: 808ab022-e79a-4ad1-a633-cc86e17644f2, Status: RUNNING
---> Protection of VM: proddbsrv1, Transaction: c7895dce-f3e6-4e70-872e-9d0b104d6273, Status: RUNNING
---> Protection of VM: prodwebsrv1, Transaction: d6783e2d-55be-47db-a082-de1d251c2375, Status: RUNNING
---> Protection of VM: prodappsrv1, Transaction: 808ab022-e79a-4ad1-a633-cc86e17644f2, Status: RUNNING
---> Protection of VM: proddbsrv1, Transaction: c7895dce-f3e6-4e70-872e-9d0b104d6273, Status: RUNNING
---> Protection of VM: prodwebsrv1, Transaction: d6783e2d-55be-47db-a082-de1d251c2375, Status: RUNNING
---> Protection of VM: prodappsrv1, Transaction: 808ab022-e79a-4ad1-a633-cc86e17644f2, Status: RUNNING
---> Protection of VM: proddbsrv1, Transaction: c7895dce-f3e6-4e70-872e-9d0b104d6273, Status: RUNNING
---> Protection of VM: prodwebsrv1, Transaction: d6783e2d-55be-47db-a082-de1d251c2375, Status: COMPLETED
---> Protection of VM: prodappsrv1, Transaction: 808ab022-e79a-4ad1-a633-cc86e17644f2, Status: COMPLETED
---> Protection of VM: proddbsrv1, Transaction: c7895dce-f3e6-4e70-872e-9d0b104d6273, Status: COMPLETED
-> Configuring failover networks
---> Skipping failover network config for VM: prodwebsrv1
---> Failover networks config is not required for VM: prodappsrv1
---> Failover network config is successful for VM: proddbsrv1
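The repeated RUNNING lines in the output suggest a polling loop that checks each protection transaction until it finishes. The Python sketch below shows one way such monitoring could be structured. It is an assumption-based illustration, not the script's implementation: `wait_for_completion` is a hypothetical name, the status lookup is passed in as a callable so that no REST endpoint is invented here, and any status other than RUNNING is treated as final because only RUNNING and COMPLETED appear in the sample output.

```python
import time
from typing import Callable, Dict


def wait_for_completion(
    transactions: Dict[str, str],
    get_status: Callable[[str], str],
    interval: float = 5.0,
    max_polls: int = 120,
) -> Dict[str, str]:
    """Poll each transaction until none is RUNNING; return the final status per VM."""
    final: Dict[str, str] = {}
    pending = dict(transactions)  # maps VM name to transaction ID
    for _ in range(max_polls):
        for vm, transaction_id in list(pending.items()):
            status = get_status(transaction_id)
            print(f"---> Protection of VM: {vm}, Transaction: {transaction_id}, Status: {status}")
            if status != "RUNNING":
                final[vm] = status
                del pending[vm]
        if not pending:
            return final
        time.sleep(interval)
    raise TimeoutError(f"still running after {max_polls} polls: {sorted(pending)}")


if __name__ == "__main__":
    # Stand-in status source for demonstration: each transaction completes on its third check.
    seen: Dict[str, int] = {}

    def fake_status(transaction_id: str) -> str:
        seen[transaction_id] = seen.get(transaction_id, 0) + 1
        return "COMPLETED" if seen[transaction_id] >= 3 else "RUNNING"

    print(wait_for_completion({"prodwebsrv1": "tx-1", "proddbsrv1": "tx-2"}, fake_status, interval=0.1))
```

Bounding the loop with `max_polls` keeps a stuck transaction from blocking the script indefinitely, which matters when protecting VMs in bulk.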
Where can I find it?
The script and the config file can be found on GitHub: https://github.com/IdanKen/Dell-EMC-RecoverPoint4VMs.
Resources
- The Dell developer site provides comprehensive online API documentation, including full API references, tutorials, and use cases for the RP4VMs REST API.
- The RP4VMs REST API offers self-documentation – Swagger UI running on the plugin server itself – https://{plugin-server}/ui
- RecoverPoint for VMs GitHub repository
- RecoverPoint for VMs 5.3 – New RESTful API Demo
How can I get help?
For additional support, you are more than welcome to raise an issue in GitHub or reach out to me by email: Idan.kentor@dell.com
Thanks for reading!
Idan
Author: Idan Kentor