Parquet file
Apache Parquet is an open-source, column-oriented data file format designed for efficient data storage and retrieval. It provides efficient compression and encoding schemes that improve performance when handling complex data in bulk. Parquet is designed to be a common interchange format for both batch and interactive workloads.
In SQL Server 2022, T-SQL can convert a .csv file into Parquet format by using CREATE EXTERNAL TABLE AS SELECT (CETAS) with the OPENROWSET syntax. This is a powerful option for joining relational data in SQL Server with non-relational data on object storage, such as Dell ECS.
CETAS can also write external datasets directly in Parquet format, without the data ever landing in SQL Server tables.
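As a minimal sketch of the CETAS with OPENROWSET pattern described above, the following Python helper assembles the T-SQL statement as a string. The helper name `build_cetas_csv_to_parquet`, the object names (`ext_sales_parquet`, `s3_ds`, `ParquetFileFormat`), the bucket paths, and the column list are all hypothetical placeholders, not values from this paper; the statement shape should be checked against the SQL Server 2022 documentation before use.

```python
def build_cetas_csv_to_parquet(table, location, data_source, file_format,
                               csv_path, columns):
    """Render a CETAS statement that reads a .csv file and writes Parquet.

    All arguments are trusted placeholders chosen by the administrator;
    this helper performs no quoting or validation.
    """
    # Column definitions for the OPENROWSET WITH clause, e.g. "sale_id INT".
    column_defs = ",\n    ".join(f"{name} {sql_type}" for name, sql_type in columns)
    lines = [
        f"CREATE EXTERNAL TABLE {table}",
        "WITH (",
        f"    LOCATION = '{location}',",      # target folder for the Parquet output
        f"    DATA_SOURCE = {data_source},",
        f"    FILE_FORMAT = {file_format}",
        ") AS",
        "SELECT *",
        "FROM OPENROWSET(",
        f"    BULK '{csv_path}',",            # source .csv on object storage
        f"    DATA_SOURCE = '{data_source}',",
        "    FORMAT = 'CSV',",
        "    FIRSTROW = 2",                   # assumes the file has one header row
        ") WITH (",
        f"    {column_defs}",
        ") AS src;",
    ]
    return "\n".join(lines)


# Hypothetical example values; replace with your own objects and paths.
statement = build_cetas_csv_to_parquet(
    table="ext_sales_parquet",
    location="/<bucket>/parquet/sales/",
    data_source="s3_ds",
    file_format="ParquetFileFormat",
    csv_path="/<bucket>/csv/sales.csv",
    columns=[("sale_id", "INT"), ("amount", "DECIMAL(10, 2)")],
)
print(statement)
```

The generated statement assumes that the external data source and file format already exist and that the `allow polybase export` server option is enabled, which CETAS generally requires.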
Users can also use other analytics data processing engines, such as Apache Spark, to convert .csv files into Parquet format.
Figure 21 shows two examples for data conversion into Parquet format: using PySpark and using CETAS with OPENROWSET.
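The PySpark route can be sketched as follows. This is an illustrative example rather than a reproduction of the figure: the function name `csv_to_parquet` and the paths are hypothetical, and the function expects an already configured `SparkSession` (S3A endpoint and credentials for ECS are assumed to be set elsewhere).

```python
def csv_to_parquet(spark, csv_path, parquet_path):
    """Read a .csv file with a header row and rewrite it as Parquet.

    `spark` is expected to be a pyspark.sql.SparkSession.
    """
    df = (
        spark.read
        .option("header", "true")       # first line holds the column names
        .option("inferSchema", "true")  # let Spark guess the column types
        .csv(csv_path)
    )
    # Overwrite any previous output in the target folder.
    df.write.mode("overwrite").parquet(parquet_path)
    return df


# Example usage (requires pyspark; the paths are hypothetical placeholders):
#   from pyspark.sql import SparkSession
#   spark = SparkSession.builder.appName("csv-to-parquet").getOrCreate()
#   csv_to_parquet(spark, "s3a://<bucket>/csv/sales.csv",
#                  "s3a://<bucket>/parquet/sales/")
```

Schema inference is convenient for a quick conversion; for production data, supplying an explicit schema avoids an extra pass over the file and unexpected type choices.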
Working with data outside SQL Server can be simplified by using a SQL Server external table. An external table uses PolyBase to access data stored outside SQL Server; in this case, the data resides on ECS object storage.
The following configuration must be in place before the external table is created:
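As a hedged illustration of what such prerequisites typically look like for S3-compatible object storage in SQL Server 2022, the sketch below renders T-SQL that enables PolyBase, creates a database scoped credential, and creates an external data source. The choice of these specific objects is an assumption, not taken from this paper, and the helper name, object names, endpoint, and keys are hypothetical placeholders.

```python
def build_s3_prerequisites(credential, data_source, endpoint,
                           access_key_id, secret_key):
    """Return T-SQL statements that prepare PolyBase access to S3 storage.

    Assumes a database master key already exists in the target database.
    Arguments are trusted placeholders; no quoting is performed.
    """
    return [
        # Turn on the PolyBase feature at the instance level.
        "EXEC sp_configure @configname = 'polybase enabled', @configvalue = 1;",
        "RECONFIGURE;",
        # Credential holding the S3 access key pair.
        f"CREATE DATABASE SCOPED CREDENTIAL {credential}\n"
        f"WITH IDENTITY = 'S3 Access Key',\n"
        f"     SECRET = '{access_key_id}:{secret_key}';",
        # Data source pointing at the S3-compatible endpoint.
        f"CREATE EXTERNAL DATA SOURCE {data_source}\n"
        f"WITH (\n"
        f"    LOCATION = 's3://{endpoint}',\n"
        f"    CREDENTIAL = {credential}\n"
        f");",
    ]


# Hypothetical example values; replace with your own endpoint and keys.
statements = build_s3_prerequisites(
    credential="s3_dc",
    data_source="s3_ds",
    endpoint="<ecs-host>:<port>",
    access_key_id="<access-key-id>",
    secret_key="<secret-key>",
)
print("\n\n".join(statements))
```

Each statement can then be run in order from a query window or through a database driver.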
The following screenshots show how to create the external file format for Parquet files.
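In text form, the external file format definition is short. The sketch below renders it from Python; the helper name `build_parquet_file_format` and the format name `ParquetFileFormat` are hypothetical, and the optional compression codec is an assumption rather than a setting taken from the screenshots.

```python
def build_parquet_file_format(name, compression_codec=None):
    """Render a CREATE EXTERNAL FILE FORMAT statement for Parquet files."""
    options = ["FORMAT_TYPE = PARQUET"]
    if compression_codec is not None:
        # Optional codec, e.g. 'org.apache.hadoop.io.compress.SnappyCodec'.
        options.append(f"DATA_COMPRESSION = '{compression_codec}'")
    return f"CREATE EXTERNAL FILE FORMAT {name}\nWITH ({', '.join(options)});"


# Hypothetical format name; any valid identifier works.
file_format_sql = build_parquet_file_format("ParquetFileFormat")
print(file_format_sql)
```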
Create an external table that points to the Parquet files on S3 storage by providing the file location, data source, and file format.
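The three inputs named above (file location, data source, and file format) map directly onto the WITH clause of the statement. A minimal sketch, with a hypothetical helper name, table name, columns, and paths:

```python
def build_external_table(table, columns, location, data_source, file_format):
    """Render a CREATE EXTERNAL TABLE statement over existing Parquet files.

    Arguments are trusted placeholders; no quoting is performed.
    """
    # Column names and types must match the schema of the Parquet files.
    column_defs = ",\n    ".join(f"{name} {sql_type}" for name, sql_type in columns)
    lines = [
        f"CREATE EXTERNAL TABLE {table} (",
        f"    {column_defs}",
        ")",
        "WITH (",
        f"    LOCATION = '{location}',",      # folder or file on object storage
        f"    DATA_SOURCE = {data_source},",  # existing external data source
        f"    FILE_FORMAT = {file_format}",   # existing Parquet file format
        ");",
    ]
    return "\n".join(lines)


# Hypothetical example values; replace with your own objects and paths.
external_table_sql = build_external_table(
    table="dbo.ext_sales",
    columns=[("sale_id", "INT"), ("amount", "DECIMAL(10, 2)")],
    location="/<bucket>/parquet/sales/",
    data_source="s3_ds",
    file_format="ParquetFileFormat",
)
print(external_table_sql)
```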
Select data from the external table as you would from any other table in a SQL Server database.
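From client code, querying the external table looks the same as querying a regular table. The sketch below assumes a Python DB-API cursor (for example, one obtained from a driver such as pyodbc, which is an assumption and not part of this paper); the function name, table name, and columns are hypothetical.

```python
def select_from_external_table(cursor, table, columns):
    """Run a plain SELECT against an external table and return all rows.

    `cursor` is any DB-API 2.0 cursor connected to the SQL Server database.
    Table and column names are trusted placeholders; no quoting is performed.
    """
    column_list = ", ".join(columns)
    cursor.execute(f"SELECT {column_list} FROM {table};")
    return cursor.fetchall()


# Example usage (hypothetical names):
#   rows = select_from_external_table(cursor, "dbo.ext_sales",
#                                     ["sale_id", "amount"])
```

Because the query is ordinary T-SQL, the external table can also be joined with local tables in the same statement.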