Although the hybrid cloud works well for many typical workloads, there are cases where it falls short, and most of them involve the network. The network is usually the slowest and least reliable component of any complex system, and in data processing it is a critical one.
The original paper on HDFS states, “An important characteristic of Hadoop is the partitioning of data and computation across many (thousands) of hosts, and executing application computations in parallel close to their data.” In other words, you bring the computation to the data, so you do not need to transfer vast amounts of data across the network.
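As an illustration of this principle, the sketch below uses PySpark, one of several frameworks built on this model; the cluster address, HDFS path, and application name are hypothetical placeholders, not part of this reference architecture. Each task is scheduled, where possible, on a node that already holds the HDFS block it reads, and only the small aggregated result travels back to the driver.

```python
# A minimal sketch of compute-to-data scheduling with PySpark.
# The HDFS URI and application name are hypothetical placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("locality-demo").getOrCreate()
sc = spark.sparkContext

# Each partition of this RDD corresponds to an HDFS block. Spark asks the
# NameNode where each block's replicas live and prefers to launch the task
# on one of those hosts, so the bulk data never crosses the network.
lines = sc.textFile("hdfs://namenode:8020/datasets/events.log")

# The map/reduce functions themselves are shipped to the executors as a few
# kilobytes of serialized code; only the aggregated counts return.
counts = (lines.flatMap(lambda line: line.split())
               .map(lambda word: (word, 1))
               .reduceByKey(lambda a, b: a + b))

print(counts.take(10))
spark.stop()
```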
A hybrid cloud network can also be unreliable and expensive. When data is stored in one location and processed in another, throughput suffers and data transfer charges mount. Keeping data close to the processing systems, so that huge data sets can be worked on without moving them across miles of fiber, is therefore vital.
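A rough back-of-envelope calculation makes the cost concrete; the data set size and link speed below are illustrative assumptions, not figures from this architecture:

```python
# Back-of-envelope WAN transfer time; all figures are illustrative assumptions.
dataset_bytes = 100e12       # 100 TB data set (assumed)
link_bits_per_sec = 1e9      # 1 Gbps WAN link (assumed), ignoring protocol overhead

seconds = dataset_bytes * 8 / link_bits_per_sec
print(f"{seconds / 86400:.1f} days")   # ~9.3 days at full, sustained line rate
```

And that figure assumes a dedicated link running at full line rate around the clock; on a shared, metered cloud interconnect, both the calendar time and the egress charges are typically worse.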
Because of security concerns, many companies keep their data on-premises, not wanting to store it on “someone else’s computer.” They also question the use of remote processing services on both security and efficiency grounds.
For these two common reasons, companies are looking to deploy containers and developer tools locally. Integrating common data sources and standard enterprise services then makes it possible to extend the data plane and build an in-house analytics platform without external dependencies.
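In a Kubernetes environment such as Tanzu, the same locality idea can be expressed at scheduling time. The sketch below uses the official Kubernetes Python client to pin an analytics Job to nodes carrying a data-locality label; the label key, container image, and namespace are assumptions for illustration, not values defined by this architecture.

```python
# A minimal sketch: schedule an analytics Job onto the nodes that hold the data.
# The node label, container image, and namespace are hypothetical examples.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside a pod

job = client.V1Job(
    api_version="batch/v1",
    kind="Job",
    metadata=client.V1ObjectMeta(name="local-analytics"),
    spec=client.V1JobSpec(
        template=client.V1PodTemplateSpec(
            spec=client.V1PodSpec(
                # Only nodes labeled as holding the target data set are
                # eligible, so the computation runs next to its storage.
                node_selector={"data-locality/dataset": "events"},
                containers=[client.V1Container(
                    name="analytics",
                    image="registry.example.com/analytics:latest",
                    command=["python", "/app/run_job.py"],
                )],
                restart_policy="Never",
            )
        )
    ),
)

client.BatchV1Api().create_namespaced_job(namespace="analytics", body=job)
```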