Boomi DCP gives organizations a framework and tools to reduce the complexity of large, multisource data environments. The DCP framework enables IT to manage authentication and authorization for the data analyst community, protecting the privacy of personally identifiable information (PII) while granting the right access to the right teams and individuals. The multilevel, role-based access control (RBAC) provides ample flexibility while remaining consistent and manageable for even the largest organization. The use case scenario in this document demonstrates one way the DCP RBAC can give three types of financial organization users the minimum data access rights needed to perform their respective job duties.
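The least-privilege idea behind that use case can be sketched in a few lines. This is a generic illustration only, with hypothetical role and data set names; it is not Boomi DCP's actual policy model or API.

```python
# Illustrative multilevel RBAC check. Roles, data sets, and actions here
# are hypothetical stand-ins, not Boomi DCP's real policy objects.

# Map each role to the (data set, action) privileges it grants.
ROLE_GRANTS = {
    "risk_analyst":  {("loans", "read")},
    "data_engineer": {("loans", "read"), ("loans", "write"),
                      ("customers_masked", "read")},
    "compliance":    {("customers_masked", "read"), ("audit_log", "read")},
}

def can_access(roles, dataset, action):
    """Return True if any of the user's roles grants (dataset, action)."""
    return any((dataset, action) in ROLE_GRANTS.get(r, set()) for r in roles)

# A risk analyst can read loan data but never sees even masked PII.
print(can_access(["risk_analyst"], "loans", "read"))             # True
print(can_access(["risk_analyst"], "customers_masked", "read"))  # False
```

Because privileges are attached to roles rather than individuals, adding an analyst means assigning roles, not auditing per-user grants, which is what keeps the scheme manageable at scale.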
Boomi DCP can be added to a new or existing cluster as easily as adding a node to Hadoop. Preparing to install the platform requires only using the integrated Cloudera wizard to extend a few Hadoop services to the Boomi node. DCP ships with its own installation scripts, which are run once the cluster node is prepared.
Once a user has created a data set or been given access to an existing one, they must understand its size, distributions, and quality. The Dell Technologies use case showed how Boomi DCP provides easy-to-access, easy-to-interpret summaries of the data set size and of each variable. The time savings from having the platform generate this metadata are significant even for small teams, and grow as the supported analytics team expands. Centralizing data summarization also helps eliminate inconsistent conclusions, which can occur when each analyst or team chooses their own tools and methods for data exploration. Any organization can benefit from spending fewer hours developing duplicate exploratory data analysis code and notebooks. Boomi DCP centralizes both externally generated and platform-generated metadata in a single framework.
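To make the duplicated-effort point concrete, the following is the kind of per-variable profile each analyst would otherwise hand-roll. The rows and column names are invented for illustration; the platform computes comparable summaries centrally, so no one writes this code per team.

```python
import statistics

# Toy rows standing in for a cataloged data set (hypothetical columns).
rows = [
    {"balance": 1200.0, "region": "east"},
    {"balance": 830.5,  "region": "west"},
    {"balance": 2100.0, "region": "east"},
]

def summarize(rows):
    """Profile a data set: row count, plus min/max/mean for numeric
    columns and distinct values for categorical ones."""
    profile = {"row_count": len(rows)}
    for col in rows[0]:
        values = [r[col] for r in rows]
        if all(isinstance(v, (int, float)) for v in values):
            profile[col] = {"min": min(values), "max": max(values),
                            "mean": round(statistics.mean(values), 2)}
        else:
            profile[col] = {"distinct": sorted(set(values))}
    return profile

print(summarize(rows))
```

When one service produces these numbers, every analyst starts from the same size, range, and cardinality facts, which is the consistency benefit described above.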
As organizations have brought more digital data under management, an unanticipated consequence is that fewer business analysts have the skills to work with these big data repositories. Professional data engineers and analysts can invest in code-first approaches to working with big data, approaches largely focused on the Hadoop and NoSQL open-source tool sets. Boomi DCP also provides low-code/no-code capabilities that are accessible to the general business analyst community, whose roles leave little room for building professional software development skills.
The Dell Technologies use case demonstrates how someone with a basic understanding of data structures and table joins can define a complete, end-to-end transformation using UI-based wizards and minimal coding. One of the more significant Boomi DCP differentiators is that a data transformation job definition is rendered as highly efficient SQL code, which can be run immediately or scheduled for later execution on a fully featured, highly scalable Spark cluster. Boomi DCP thereby re-engages business analysts who had been limited to spreadsheets or single-threaded Python and R on workstations, letting them generate value from the massive big data stores now available without learning scale-out Spark coding.
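The join-and-aggregate below is a sketch of the kind of SQL such a wizard-defined job might emit. The table and column names are hypothetical, and SQLite stands in here for the Spark SQL engine the generated code actually targets; the point is that the job's output is ordinary, portable SQL.

```python
import sqlite3

# Build two toy tables standing in for cataloged source data sets.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE accounts (account_id INTEGER, region TEXT);
    CREATE TABLE txns     (account_id INTEGER, amount REAL);
    INSERT INTO accounts VALUES (1, 'east'), (2, 'west');
    INSERT INTO txns VALUES (1, 100.0), (1, 50.0), (2, 75.0);
""")

# SQL of the sort a wizard-defined transformation job might generate:
# join transactions to accounts, then aggregate by region.
generated_sql = """
    SELECT a.region, SUM(t.amount) AS total
    FROM accounts a
    JOIN txns t ON a.account_id = t.account_id
    GROUP BY a.region
    ORDER BY a.region;
"""
print(con.execute(generated_sql).fetchall())  # [('east', 150.0), ('west', 75.0)]
```

Because the artifact is plain SQL rather than bespoke application code, it can be reviewed, versioned, and scheduled like any other job, and it scales with the engine that executes it.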
This document has also demonstrated how Boomi DCP derives additional value from new or existing Hadoop investments. The DCP framework lets IT focus on managing business-critical Hadoop infrastructure while meeting the business analyst community's need for access to big data processing resources.