This section describes how we generated TPCH data, cleaned it, and loaded it into different data sources: Hadoop (HDFS), Oracle Database, Microsoft SQL Server, and Oracle NoSQL.
The following table describes how we distributed TPCH data among different data sources:
Data source | Table name | Data size | Table type |
HDFS | SUPPLIERS | 10 GB | Small |
HDFS | LINEITEM | 2.4 TB | Large |
Oracle | ORDERS | 595 GB | Medium |
Oracle | PART | 75 GB | Medium |
Microsoft SQL Server | CUSTOMER | 75 GB | Medium |
Microsoft SQL Server | PARTSUPP | 400 GB | Medium |
Oracle NoSQL | NATION | 2 GB | Small |
Oracle NoSQL | REGION | 3 GB | Small |
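As a quick sanity check, the sizes listed in the table can be totalled per data source and overall. The following is a minimal sketch that uses only the values from the table; it assumes decimal units (1 TB = 1,000 GB), and the variable names are illustrative.

```python
# Table sizes in GB, copied from the distribution table.
# Assumption: 2.4 TB is treated as 2,400 GB (decimal units).
sizes_gb = {
    ("HDFS", "SUPPLIERS"): 10,
    ("HDFS", "LINEITEM"): 2400,
    ("Oracle", "ORDERS"): 595,
    ("Oracle", "PART"): 75,
    ("Microsoft SQL Server", "CUSTOMER"): 75,
    ("Microsoft SQL Server", "PARTSUPP"): 400,
    ("Oracle NoSQL", "NATION"): 2,
    ("Oracle NoSQL", "REGION"): 3,
}

# Sum the sizes per data source.
per_source = {}
for (source, _table), gb in sizes_gb.items():
    per_source[source] = per_source.get(source, 0) + gb

total_gb = sum(sizes_gb.values())

for source, gb in per_source.items():
    print(f"{source}: {gb} GB")
print(f"Total: {total_gb} GB ({total_gb / 1000:.2f} TB)")
```

Under these assumptions, most of the data volume sits in HDFS because of the LINEITEM table.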
We generated TPCH data at a scale factor of 3,000 using the DBGEN toolkit. This scale factor produced nearly 3.5 TB of data as text files with “|” as the column delimiter. We cleaned the text files by correcting the row and column formatting with “vi” scripting, and then loaded the resulting CSV files into the different data sources. The detailed steps for data loading are shown in the next section.
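The cleaning step can also be scripted outside an editor. The sketch below is not the “vi” procedure used in this guide; it is an illustrative Python alternative, assuming each input row holds “|”-separated fields and ends with a trailing “|”, as DBGEN output typically does. The function name `tbl_to_csv` and the sample rows are hypothetical.

```python
import csv
import io


def tbl_to_csv(src, dst, delimiter="|"):
    """Convert "|"-delimited rows to CSV and return the number of rows written.

    src and dst are text file-like objects. A trailing delimiter at the
    end of a row is dropped so that it does not create an empty last column.
    """
    writer = csv.writer(dst, lineterminator="\n")
    count = 0
    for line in src:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        if line.endswith(delimiter):
            line = line[:-1]  # remove the trailing "|"
        # csv.writer quotes any field that contains a comma.
        writer.writerow(line.split(delimiter))
        count += 1
    return count


# Illustrative input only; these are not actual generated rows.
sample = io.StringIO(
    "0|AFRICA|sample comment, with a comma|\n"
    "1|AMERICA|plain comment|\n"
)
out = io.StringIO()
rows = tbl_to_csv(sample, out)
print(out.getvalue(), end="")
```

For real files, the same function could be called with handles opened through `open(path, newline="")`. At multi-terabyte scale, processing the files in chunks or in parallel would likely be needed.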