The testing objective was to run sample workloads to validate the functionality of the different components and services that Dell Technologies installed as part of Cloudera CDP Private Cloud Base.
The TeraSuite workload tool combines testing of the HDFS and MapReduce layers of a Hadoop cluster. Its goal is to generate, sort, and validate a configurable amount of data as quickly as possible, exercising the compute and local storage configurations with concurrent HDFS access. In this validation, teragen produces 10 billion 100-byte rows (approximately 1 TB), which terasort then sorts and teravalidate verifies.
time hadoop jar hadoop-mapreduce-examples-3.1.1.7.1.7.0-551.jar teragen \
-Ddfs.blocksize=536870912 -Dmapreduce.job.maps=240 -Dmapreduce.job.reduces=120 \
-Dmapreduce.map.speculative=true -Dmapreduce.map.output.compress=true 10000000000 \
hdfs://pvcmaster1.orange.local:8020/user/root/teragen1
time hadoop jar hadoop-mapreduce-examples-3.1.1.7.1.7.0-551.jar terasort \
-Ddfs.blocksize=536870912 -Dmapreduce.job.maps=240 -Dmapreduce.job.reduces=120 \
-Dmapreduce.map.speculative=true -Dmapreduce.map.output.compress=true \
/user/root/teragen1 /user/root/terasort1
time hadoop jar hadoop-mapreduce-examples-3.1.1.7.1.7.0-551.jar teravalidate \
-Ddfs.blocksize=536870912 -Dmapreduce.job.maps=240 -Dmapreduce.job.reduces=120 \
-Dmapreduce.map.speculative=true -Dmapreduce.map.output.compress=true \
/user/root/terasort1 /user/root/teravalidate
TestDFSIO is a distributed I/O benchmark in which each map task writes or reads one file; it was used to validate HDFS throughput.
yarn jar /opt/cloudera/parcels/CDH/jars/hadoop-mapreduce-client-jobclient-3.1.1.7.1.7.0-551-tests.jar \
TestDFSIO -write -nrFiles 5000 -size 128MB
yarn jar /opt/cloudera/parcels/CDH/jars/hadoop-mapreduce-client-jobclient-3.1.1.7.1.7.0-551-tests.jar \
TestDFSIO -read -nrFiles 5000 -size 128MB
About this task
Hadoop MapReduce is a programming model that is used to process bulk data. MapReduce programs run in parallel, delivering high-performance, large-scale data analysis across the cluster.
Step
hadoop jar /opt/cloudera/parcels/CDH/jars/hadoop-streaming-3.1.1.7.1.7.0-551.jar \
-file /root/final_test/mapper.py -mapper /root/final_test/mapper.py \
-file /root/final_test/reducer.py -reducer /root/final_test/reducer.py \
-input /user/root/words.txt -output /user/root/mp_reduce
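The guide does not list the contents of mapper.py and reducer.py. A minimal word-count pair along the following lines would exercise the streaming path; the scripts are our sketch, assuming plain-text input such as words.txt.

#!/usr/bin/env python
# mapper.py - illustrative word-count mapper for Hadoop Streaming
# (our sketch; the guide does not include the script contents).
import sys

for line in sys.stdin:
    for word in line.strip().split():
        # Emit each word with a count of 1, tab-separated for the shuffle
        print("%s\t%d" % (word, 1))

#!/usr/bin/env python
# reducer.py - illustrative word-count reducer for Hadoop Streaming.
# Streaming delivers keys sorted, so equal words arrive contiguously.
import sys

current_word, current_count = None, 0
for line in sys.stdin:
    line = line.strip()
    if not line:
        continue
    word, count = line.rsplit("\t", 1)
    if word == current_word:
        current_count += int(count)
    else:
        if current_word is not None:
            print("%s\t%d" % (current_word, current_count))
        current_word, current_count = word, int(count)
if current_word is not None:
    print("%s\t%d" % (current_word, current_count))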
Results
At the conclusion of this test, Dell Technologies had validated the functionality of the MapReduce service.
About this test
Apache Spark achieves high performance for both batch and streaming data, using a state-of-the-art DAG scheduler, a query optimizer, and a physical execution engine.
Step
spark-submit --class org.apache.spark.examples.SparkPi --master yarn \
/opt/cloudera/parcels/CDH/jars/spark-examples_2.11-2.4.7.7.1.7.0-551.jar 10
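The SparkPi class estimates Pi by Monte Carlo sampling; the trailing argument 10 sets the number of partitions. For reference, a functionally similar PySpark job (our sketch; the file name and sample count are illustrative, not from the guide) could be submitted the same way with spark-submit --master yarn pi_estimate.py.

# pi_estimate.py - minimal PySpark sketch mirroring the SparkPi example above.
import random
from operator import add
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("PiEstimate").getOrCreate()
partitions = 10
n = 100000 * partitions  # total random samples

def inside(_):
    # Draw a point in the unit square; count it if it lands in the quarter circle
    x, y = random.random(), random.random()
    return 1 if x * x + y * y <= 1 else 0

count = spark.sparkContext.parallelize(range(n), partitions).map(inside).reduce(add)
print("Pi is roughly %f" % (4.0 * count / n))
spark.stop()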
Results
At the conclusion of this test, Dell Technologies had validated the functionality of the Spark service.
Hive is a data warehouse software project that is built on top of Hadoop to provide data query and analysis. Hive provides a SQL-like interface to query data that is stored in various databases and file systems that integrate with Hadoop.
The following example queries were used for functional validation of the Hive service.
!connect jdbc:hive2://<Namenode>:10000/default
CREATE DATABASE TEST;
CREATE TABLE TEST.Sales_Data(StoreLocation VARCHAR(30), Product VARCHAR(30),
  OrderDate DATE, Revenue DECIMAL(10,2));
INSERT INTO TEST.Sales_Data VALUES
  ('Bangalore','Nutella','2021-07-16',7455.67),
  ('Bangalore','Peanut Butter','2021-07-16',5316.89),
  ('Bangalore','Milk','2021-07-16',2433.76),
  ('Hyderabad','Bananas','2021-07-16',9456.01),
  ('Hyderabad','Nutella','2021-07-16',3644.33),
  ('Hyderabad','Peanut Butter','2021-07-16',8988.64),
  ('Hyderabad','Milk','2021-07-16',1621.58);
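As a read-back check (our addition, not part of the guide), the inserted rows can also be queried programmatically over the same HiveServer2 endpoint, for example with the PyHive client.

# read_back.py - illustrative PyHive sketch (our addition; assumes the
# pyhive package is installed and HiveServer2 listens at <Namenode>:10000).
from pyhive import hive

conn = hive.Connection(host="<Namenode>", port=10000, database="test")
cur = conn.cursor()
# Aggregate revenue per store to confirm the inserted rows are queryable
cur.execute("SELECT StoreLocation, SUM(Revenue) FROM Sales_Data GROUP BY StoreLocation")
for store, total in cur.fetchall():
    print(store, total)
conn.close()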
HBase is a column-oriented, nonrelational database management system that runs on top of HDFS. HBase provides a fault-tolerant way to store sparse datasets, which are common in many big data use cases.
The following example HBase commands were used for functional validation of the HBase service.
cd /usr/localhost/
cd Hbase
./bin/hbase shell
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 0.94.23, rf42302b28aceaab773b15f234aa8718fff7eea3c, Mon Jan 31 00:55:22 UTC 2022
hbase(main):001:0>
create 'history', 'home', 'away'
0 row(s) in 1.1300 seconds
=> Hbase::Table - history
put 'history','row1','home:name','jim'
put 'history','row1','home:city','Boston'
scan 'history'
ROW     COLUMN+CELL
 row1   column=home:city, timestamp=1417524216501, value=Boston
 row1   column=home:name, timestamp=1417524185058, value=jim
disable 'history'
drop 'history'
0 row(s) in 0.3060 seconds
exists 'history'
Table history does not exist
0 row(s) in 0.0730 seconds
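Beyond the shell, the same basic operations can be driven programmatically. A sketch with the HappyBase client follows (our addition, not part of the guide; it assumes the HBase Thrift server is running on its default port 9090).

# hbase_check.py - illustrative HappyBase sketch (our addition; assumes
# the HBase Thrift server is reachable at <Thrift host>:9090).
import happybase

conn = happybase.Connection("<Thrift host>")
conn.create_table("history", {"home": dict(), "away": dict()})
table = conn.table("history")
table.put(b"row1", {b"home:name": b"jim", b"home:city": b"Boston"})
for key, data in table.scan():
    print(key, data)
conn.delete_table("history", disable=True)
conn.close()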
The following HBase settings were applied for this test:
Maximum Number of HStoreFiles Compaction: 20
HStore Blocking Store Files: 200
HBase Memstore Block Multiplier: 4
HBase Memstore Flush Size: 256 MiB
create 'staff', 'id', 'name', 'age', 'city', 'department', 'salary'
0 row(s) in 1.1400 seconds
=> Hbase::Table - staff
./bin/hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.separator=',' \
-Dimporttsv.columns=HBASE_ROW_KEY,<column names> <tablename> \
<location of file from HDFS>
count 'staff', INTERVAL => 1000000
Current count: 1000000, row: 100899997
Current count: 2000000, row: 101799997
. . .
Current count: 999000000, row: 999099999
Current count: 1000000000, row: id
1000000000 row(s)
Took 16498.6309 seconds
=> 1000000000
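The input file for the ImportTsv bulk load is not shown in the guide. A generator along the following lines (our sketch; the column order and value ranges are assumptions, with the first field serving as HBASE_ROW_KEY) produces compatible comma-separated data.

# gen_staff_csv.py - illustrative input generator for the ImportTsv bulk load
# (our addition; column order and value ranges are assumptions).
import csv
import random

CITIES = ["Bangalore", "Hyderabad", "Boston"]
DEPTS = ["sales", "engineering", "support"]

with open("staff.csv", "w", newline="") as f:
    w = csv.writer(f)
    for i in range(1, 1001):  # scale the row count up for a full-size test
        w.writerow([
            100000000 + i,                  # row key (HBASE_ROW_KEY)
            "emp%d" % i,                    # name
            random.randint(21, 65),         # age
            random.choice(CITIES),          # city
            random.choice(DEPTS),           # department
            random.randint(30000, 200000),  # salary
        ])

# Copy to HDFS before running ImportTsv, for example:
#   hdfs dfs -put staff.csv /user/root/staff.csv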