Component validation
Dell Technologies conducted and documented sample workload tests for each component. These tests verified component compatibility and basic functionality with the recommended hardware and software configurations in Cloudera CDP Private Cloud Base.
TeraSuite validation
About this task
TeraSuite is a suite of programs that generate, sort, and validate a large dataset to benchmark the performance of a Hadoop cluster. It consists of TeraGen, TeraSort, and TeraValidate, which are part of the Apache Hadoop examples package. Dell Technologies ran TeraSuite programs to validate the HDFS and MapReduce layers of the Hadoop cluster.
Steps
1. Run TeraGen to generate the input dataset (10 billion 100-byte rows, approximately 1 TB) in the teragen output directory:
yarn jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/\
hadoop-mapreduce-examples.jar teragen 10000000000 teragen
2. Run TeraSort to sort the generated data into the terasort output directory:
yarn jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/\
hadoop-mapreduce-examples.jar terasort teragen terasort
3. Run TeraValidate to verify that the sorted output is globally ordered:
yarn jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/\
hadoop-mapreduce-examples.jar teravalidate terasort teravalidate
TestDFSIO validation
About this task
TestDFSIO is a distributed I/O benchmark tool that measures the read and write throughput of HDFS using MapReduce jobs. Dell Technologies ran TestDFSIO to validate HDFS I/O functionality.
Steps
1. Run the TestDFSIO write test to create 5,000 files of 128 MB each:
yarn jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/\
hadoop-mapreduce-client-jobclient-tests.jar TestDFSIO -write -nrFiles 5000 \
-size 128MB
2. Run the TestDFSIO read test against the same files:
yarn jar /opt/cloudera/parcels/CDH/jars/\
hadoop-mapreduce-client-jobclient-3.1.1.7.1.7.0-551-tests.jar TestDFSIO -read \
-nrFiles 5000 -size 128MB
Hadoop Streaming validation
About this task
Hadoop Streaming lets MapReduce jobs use any executable as the mapper or the reducer. Dell Technologies ran a sample Python mapper and reducer against a text file in HDFS to validate the streaming functionality.
Steps
1. Submit the streaming job with the Python mapper and reducer scripts:
yarn jar /opt/cloudera/parcels/CDH-7.1.7-1.cdh7.1.7.p2000.37147774/jars/\
hadoop-streaming-3.1.1.7.1.7.2000-305.jar -file /root/final_test/mapper.py -mapper \
/root/final_test/mapper.py -file /root/final_test/reducer.py -reducer \
/root/final_test/reducer.py -input /user/root/words.txt -output /user/root/mp_reduce
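The mapper and reducer scripts are not reproduced in this guide; only their paths appear in the command above. A minimal word-count pair along the following lines would work with that command (the file contents are an assumption, not the original scripts):

#!/usr/bin/env python
# mapper.py - emit "<word>\t1" for every word read from standard input
import sys

for line in sys.stdin:
    for word in line.strip().split():
        print("%s\t1" % word)

#!/usr/bin/env python
# reducer.py - sum the counts per word; MapReduce delivers input sorted by key
import sys

current_word, current_count = None, 0
for line in sys.stdin:
    word, count = line.rstrip("\n").split("\t", 1)
    if word == current_word:
        current_count += int(count)
    else:
        if current_word is not None:
            print("%s\t%d" % (current_word, current_count))
        current_word, current_count = word, int(count)
if current_word is not None:
    print("%s\t%d" % (current_word, current_count))

Both scripts must be marked executable (chmod +x) before the job is submitted.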
Spark validation
About this task
Apache Spark is a distributed processing system for big data analytics workloads. It delivers high performance for both batch and streaming data by leveraging in-memory caching and optimized query execution across the cluster.
Steps
1. Submit the SparkPi example application to YARN in cluster deploy mode:
spark-submit --class org.apache.spark.examples.SparkPi --master yarn \
--deploy-mode cluster /opt/cloudera/parcels/CDH/jars/\
spark-examples_2.11-2.4.7.7.1.7.2000-305.jar
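The same functional check can also be scripted in Python. The sketch below is a hypothetical pi.py, not part of the documented validation; it estimates Pi with a Monte Carlo sample and can be submitted with the same spark-submit options:

# pi.py - minimal PySpark job that estimates Pi by Monte Carlo sampling;
# submit with: spark-submit --master yarn --deploy-mode cluster pi.py
import random
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("PiEstimate").getOrCreate()
n = 1000000  # number of random points to sample

def inside(_):
    x, y = random.random(), random.random()
    return x * x + y * y <= 1.0

count = spark.sparkContext.parallelize(range(n), 10).filter(inside).count()
print("Pi is roughly %f" % (4.0 * count / n))
spark.stop()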
Spark3 on GPU validation
About this task
Dell Technologies validated the RAPIDS Accelerator for Apache Spark by launching a Spark 3 shell with GPU task scheduling and RAPIDS SQL execution enabled:
spark3-shell --master yarn --conf spark.task.resource.gpu.amount=1 --conf \
spark.rapids.sql.concurrentGpuTasks=1 --conf spark.sql.files.maxPartitionBytes=256m \
--conf spark.locality.wait=0s --conf spark.sql.adaptive.enabled=true \
--conf spark.rapids.memory.pinnedPool.size=2G --conf "spark.rapids.sql.enabled=true" \
--conf "spark.executor.memoryOverhead=5g" \
--conf spark.sql.adaptive.advisoryPartitionSizeInBytes=1
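spark3-shell starts a Scala session; the quick check below assumes an equivalent PySpark session launched with the same --conf flags (the pyspark3 launcher name and the session itself are assumptions). With the RAPIDS plugin active, GPU-enabled physical plans contain Gpu* operators:

# Run inside a PySpark session started with the same RAPIDS --conf flags
# (launcher name pyspark3 is an assumption). The shell predefines `spark`.
df = spark.range(0, 10000000).selectExpr("id % 10 AS k", "id AS v")
agg = df.groupBy("k").sum("v")
agg.explain()  # a GPU plan shows Gpu* operators, for example GpuHashAggregate
agg.show()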
Hive validation
About this task
Hive is a data warehouse system that enables SQL-like queries on large datasets that are stored on the Hadoop cluster. It uses Apache Tez or MapReduce as the execution engine.
Dell Technologies ran simple table creation, insert, and select queries to provide a functional validation of the Hive service.
Steps
1. Start the Hive shell:
hive
2. Create a test database and table:
CREATE DATABASE TEST;
CREATE TABLE TEST.Sales_Data(StoreLocation VARCHAR(30), Product VARCHAR(30),
OrderDate DATE, Revenue DECIMAL(10,2));
3. Insert sample rows:
INSERT INTO TEST.Sales_Data VALUES
('Bangalore','Nutella','2023-05-16',7455.67),
('Bangalore','Peanut Butter','2023-05-16',5316.89),
('Bangalore','Milk','2023-05-16',2433.76),
('Hyderabad','Bananas','2023-05-16',9456.01),
('Hyderabad','Nutella','2023-05-16',3644.33),
('Hyderabad','Peanut Butter','2023-05-16',8988.64),
('Hyderabad','Milk','2023-05-16',1621.58);
4. Query the table to confirm that the data is accessible:
SELECT * FROM TEST.Sales_Data;
Results
The SELECT query returned the seven inserted rows, confirming basic Hive functionality.
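The same check can be scripted outside the Hive shell. The sketch below uses the PyHive client, which is not part of the documented procedure; the HiveServer2 host and the authentication settings are placeholders that depend on the cluster configuration:

# check_hive.py - functional query against HiveServer2 with PyHive
# (pip install 'pyhive[hive]'); host and user are placeholders.
from pyhive import hive

conn = hive.Connection(host="hs2.example.com", port=10000, username="hive")
cur = conn.cursor()
cur.execute("SELECT StoreLocation, SUM(Revenue) FROM TEST.Sales_Data GROUP BY StoreLocation")
for row in cur.fetchall():
    print(row)
conn.close()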
HBase validation
About this task
HBase is a column-oriented, nonrelational database management system. It uses HDFS as its distributed storage layer and provides a fault-tolerant mechanism for storing sparse datasets.
HBase commands that created tables, wrote rows, and read them back were used to validate the HBase service.
Steps
1. Start the HBase shell:
hbase shell
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 0.94.23, rf42302b28aceaab773b15f234aa8718fff7eea3c, Tue May 16 00:55:22 UTC 2023
hbase(main):001:0>
2. Create a table with two column families:
create 'history', 'home', 'away'
0 row(s) in 1.1300 seconds
=> Hbase::Table - history
3. Write two cells to a row and read the row back:
put 'history','row1','home:name','jim'
put 'history','row1','home:city','Boston'
get 'history','row1'
COLUMN      CELL
 home:city   timestamp=1417524216501, value=Boston
 home:name   timestamp=1417524185058, value=jim
4. Disable and drop the table, and confirm that it no longer exists:
disable 'history'
drop 'history'
0 row(s) in 0.3060 seconds
exists 'history'
Table history does not exist
0 row(s) in 0.0730 seconds
5. For the large-table test, tune the following HBase settings in Cloudera Manager:
Maximum Number of HStoreFiles Compaction: 20
HStore Blocking Store Files: 200
HBase Memstore Block Multiplier: 4
HBase Memstore Flush Size: 256 MB
6. Create a table for the one-billion-row test:
create 'staff', 'id', 'name', 'age', 'city', 'department', 'salary'
0 row(s) in 1.1400 seconds
=> Hbase::Table - staff
7. Load the data into the table with the ImportTsv utility:
/bin/hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.separator=',' \
-Dimporttsv.columns=HBASE_ROW_KEY,<column names> <tablename> \
<location of file from HDFS>
8. Count the rows to verify the load:
count 'staff', INTERVAL => 1000000
Current count: 1000000, row: 100899997
Current count: 2000000, row: 101799997
...
Current count: 999000000, row: 999099999
Current count: 1000000000, row: id1000000000
1000000000 row(s)
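A programmatic spot-check of the table is also possible. The sketch below uses the happybase client, which requires the HBase Thrift server; neither the library nor the Thrift endpoint is part of the documented procedure, and the hostname is a placeholder:

# check_hbase.py - write one row to the staff table and read it back
# (pip install happybase); requires the HBase Thrift server, default port 9090.
import happybase

conn = happybase.Connection(host="hbase-thrift.example.com", port=9090)
table = conn.table("staff")
table.put(b"id1000000001", {b"name:first": b"Jane", b"city:name": b"Austin"})
print(table.row(b"id1000000001"))  # expect the two cells just written
conn.close()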
Hue validation
About this task
Dell Technologies performed the following steps to validate Hue functionality:
Steps
1. Run a SELECT query in the Editor to verify table creation and accessibility.
Ranger validation
About this task
Dell Technologies performed the following steps to validate Ranger functionality:
Steps
Atlas validation
About this task
Dell Technologies performed the following steps to validate Atlas functionality:
Steps
1. In the Hive shell, create a test table:
CREATE TABLE employee (ssn STRING, name STRING, location STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE;
2. Generate a sample data file and copy it into the table's HDFS location:
printf "111-111-111,James,San Jose\n222-222-222,Christian,Santa Clara\n333-333-333,George,Fremont" \
> employeedata.txt
hdfs dfs -copyFromLocal employeedata.txt /warehouse/tablespace/managed/hive/employee
3. Create a second table from the first so that Atlas captures the lineage:
CREATE TABLE employee_alt AS SELECT name, location FROM employee;
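After the CREATE TABLE ... AS SELECT statement runs, Atlas should register the new table and its lineage from employee. One way to confirm that the entity exists is the Atlas v2 basic-search REST endpoint, sketched below; the host and the admin credentials are placeholders:

# check_atlas.py - confirm that Atlas registered the new Hive table
# (host and credentials are placeholders; 21000 is the default Atlas port).
import requests

resp = requests.get(
    "https://atlas.example.com:21000/api/atlas/v2/search/basic",
    params={"typeName": "hive_table", "query": "employee_alt"},
    auth=("admin", "admin"),
    verify=False,  # lab-only: skip TLS verification
)
resp.raise_for_status()
for entity in resp.json().get("entities", []):
    print(entity["attributes"]["qualifiedName"])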
Atlas and Ranger integration validation
About this task
Dell Technologies implemented tag-based access control policies on sample data to validate the functionality provided by the integration of Atlas and Ranger.
Dell Technologies performed the following steps to validate the Atlas and Ranger integration:
Steps
Ozone validation
About this task
HDFS has a single NameNode that manages both the namespace and the block metadata. Ozone, by contrast, separates namespace management from block space management through the Ozone Manager (OM) and the Storage Container Manager (SCM), so it can theoretically handle far more files and objects than HDFS.
Dell Technologies performed the following steps to validate Ozone:
Steps
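The validation steps themselves are not reproduced in this extract. As an illustration only, one common way to exercise Ozone is through its S3-compatible gateway; the sketch below uses boto3 against a placeholder endpoint with placeholder credentials, and assumes the S3 Gateway role is running (default port 9878):

# check_ozone.py - functional check through the Ozone S3 Gateway with boto3
# (pip install boto3); endpoint and credentials are placeholders.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="http://ozone-s3g.example.com:9878",
    aws_access_key_id="testuser",
    aws_secret_access_key="testsecret",
)
s3.create_bucket(Bucket="validation")
s3.put_object(Bucket="validation", Key="hello.txt", Body=b"hello ozone")
obj = s3.get_object(Bucket="validation", Key="hello.txt")
print(obj["Body"].read())  # expect b'hello ozone'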