Many use cases in large enterprises follow a pattern of high-velocity, high-volume data streaming from various sources. These data streams:
The resulting data fetch activity can involve thousands of users across business units, all of whom want the latest calculated data. Kinetica DB was created with this use case in mind. Kinetica can ingest data using all of the cluster's nodes, with multiple processes on each node, and can process analytic workloads at an impressive rate thanks to its memory-first vectorized engine, which offloads compute-heavy workloads to GPUs. It also gives users a way to fetch keyed data items from multiple nodes, with multiple processes on each node serving requests.
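To make the multi-node, multi-process keyed fetch idea concrete, here is a minimal sketch of deterministic key routing. The function name, node count, and process count are assumptions for illustration, not Kinetica's actual sharding API:

```python
import hashlib

# Hypothetical illustration (not Kinetica's real sharding scheme): route a
# record key to one of N nodes, each running P worker processes, the way a
# distributed keyed fetch might be spread across a cluster.
def route_key(key: str, num_nodes: int = 2, procs_per_node: int = 4):
    """Deterministically map a key to a (node, process) pair."""
    digest = int(hashlib.sha256(key.encode()).hexdigest(), 16)
    node = digest % num_nodes
    proc = (digest // num_nodes) % procs_per_node
    return node, proc

# The same key always lands on the same node/process, so thousands of
# concurrent users fetching keyed items spread evenly across the cluster.
print(route_key("order-12345"))
```

Because the mapping is deterministic, any client can compute the owning node locally, avoiding a central lookup step.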
Among the stages discussed above (ingest, analyze, and egress), ingest is the most dependent on the characteristics and speed of the underlying disk storage. For example, when the working dataset is much larger than the database cluster's system memory, fetching data from disk during calculation updates becomes sensitive to disk performance. More generally, ingest speed is primarily limited by disk write speed. This is the aspect we set out to test with Kinetica running on Dell PowerFlex systems.
Dell PowerFlex is a software-defined storage (SDS) system. Conducting the experiment in a lab environment addressed two primary concerns: 1) that Kinetica installs and runs out of the box with both the CPU and GPU build distributions, and 2) that PowerFlex SDS can maintain sustained ingest speeds. We ingested data with multi-node, multi-process synthetic data generators into two-node Kinetica clusters running on Dell PowerFlex systems.
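A multi-process synthetic data generator of the kind described above can be sketched with Python's `multiprocessing` module. The record schema, batch sizes, and the commented-out ingest call are assumptions for illustration; in the actual experiment each batch would be handed to Kinetica's ingest API:

```python
import multiprocessing as mp
import random
import time

# Sketch of one synthetic-generator worker: it builds batches of keyed
# records and returns how many it produced. In a real ingest test the
# batch would be sent to the database instead of being discarded.
def generate_batches(worker_id: int, batches: int, batch_size: int) -> int:
    produced = 0
    for _ in range(batches):
        batch = [
            {"key": f"k{random.randrange(10_000)}",
             "ts": time.time(),
             "value": random.random()}
            for _ in range(batch_size)
        ]
        # table.insert_records(batch)  # hypothetical ingest call
        produced += len(batch)
    return produced

if __name__ == "__main__":
    # Run four generator processes in parallel, mirroring the
    # multi-process setup used against the two-node cluster.
    with mp.Pool(processes=4) as pool:
        counts = pool.starmap(generate_batches,
                              [(w, 10, 1000) for w in range(4)])
    print(f"generated {sum(counts)} records")  # 4 workers x 10 batches x 1000
```

Running several such scripts on separate client machines gives the multi-node, multi-process load pattern described in the test.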