Big Data as-a-Service (BDaaS) Use Cases on Robin Systems
Wed, 24 Apr 2024 15:27:10 -0000
Do you have a Big Data mess? Do you have separate infrastructure for NoSQL databases like Cassandra, MongoDB, Neo4j, and Riak? I’ll bet that Kafka, Spark, and Elasticsearch are on separate gear too. Let’s throw in PostgreSQL, MariaDB, MySQL, Greenplum, and another database or two. And we don’t want to forget machine learning with scikit-learn and Dask, nor deep learning with TensorFlow and PyTorch.
What if I told you that you could run all of them, including dev/test, QA, and prod, perhaps with multiple instances and different versions, all on the same multi-tenant, containerized platform?
Enter Robin Systems and their cloud native platform. Some of the features I find useful include:
- Similar to BlueData (HPE) but way better
- Multi-tenant
- Low cost
- Easy to manage
- Containerized via Kubernetes
- Compact and dense
- Disaggregated compute and storage or hybrid
- One platform and set of BOMs for all tenants
- Can also run Oracle, Hadoop, Elasticsearch, and more
- Can be delivered direct or via partner
- Infrastructure flexibility (compute-only, storage only, and/or hybrid nodes)
- Infrastructure + application / service / storage level monitoring and visibility via integrated ELK/Grafana/Prometheus (out of the box templates and customizable)
- QoS at the CPU, memory, disk, and network level + storage IOPs guarantees
- App-store enables deployment of new app instances (or entire app pipelines) in minutes
- Support for multiple run-time engines (LXC, Docker, KVM)
- Templates to customize with deep workload knowledge
- Application / storage / service thin cloning
- Native, application-aware backups and snapshots
- Scale up / scale down application / storage / service
- Can use optional VMs
- SAN storage via CSI is possible
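Application-aware backup means more than a raw volume snapshot: the application is quiesced before its state is captured, so the copy is consistent. A minimal conceptual sketch in Python (the `Database` class and its hooks are hypothetical stand-ins, not Robin's actual API):

```python
import contextlib

class Database:
    """Hypothetical stand-in for a database that supports quiescing."""
    def __init__(self):
        self.quiesced = False
        self.data = {"users": 42}

    def quiesce(self):
        # Flush buffers and pause writes.
        self.quiesced = True

    def resume(self):
        # Allow writes again.
        self.quiesced = False

    def snapshot(self):
        # A consistent snapshot is only safe while the app is quiesced.
        assert self.quiesced, "snapshot taken without quiescing!"
        return dict(self.data)

@contextlib.contextmanager
def app_aware_snapshot(db):
    """Quiesce -> snapshot -> resume, resuming even if the snapshot fails."""
    db.quiesce()
    try:
        yield db.snapshot()
    finally:
        db.resume()

db = Database()
with app_aware_snapshot(db) as snap:
    print(snap)        # consistent copy: {'users': 42}
print(db.quiesced)     # False -- writes have resumed
```

The context manager guarantees the resume step runs no matter what, which is the property an application-aware backup needs.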
As for use cases, some ideas:
- Just Oracle, dense: 500 databases on 18 servers, SAN for storage, RAC or not
- MariaDB + Cassandra + MongoDB
- Just Hadoop, all containerized, with multiple clusters including test/prod/QA
- Hadoop + Oracle
- Kafka, Hadoop, Elasticsearch, Cassandra, Oracle
- ML data pipelines
- Deep learning, such as TensorFlow with GPUs
- Spark
- Any NoSQL database
- RDBMSs such as MySQL, MariaDB, PostgreSQL, Greenplum, Oracle, etc.
- Streaming analytics, as with Kafka or Flink
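At its core, streaming analytics is continuous aggregation over time windows. A toy tumbling-window counter in pure Python (no Kafka or Flink dependency; the event tuples and window size are illustrative):

```python
from collections import defaultdict

def tumbling_window_counts(events, window_secs):
    """Group (timestamp, key) events into fixed, non-overlapping windows
    and count occurrences of each key per window -- the core operation a
    Flink job or Kafka Streams topology performs continuously."""
    windows = defaultdict(lambda: defaultdict(int))
    for ts, key in events:
        window_start = (ts // window_secs) * window_secs
        windows[window_start][key] += 1
    return {w: dict(counts) for w, counts in sorted(windows.items())}

events = [(0, "click"), (3, "click"), (7, "buy"), (12, "click")]
print(tumbling_window_counts(events, 5))
# {0: {'click': 2}, 5: {'buy': 1}, 10: {'click': 1}}
```

Real streaming engines add incremental state, watermarks for late data, and fault tolerance on top of this same windowing idea.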
Contact info for Mike King, Advisory System Engineer for DA / AI / Big Data, Dell Technologies | NA Data Center Workload Solutions
- https://itsavant.wordpress.com
- https://twitter.com/MikeDataKing
- http://www.linkedin.com/in/mikedataking/
Links
- https://infohub.delltechnologies.com/p/removing-the-barriers-to-hybrid-cloud-flexibility-for-data-analytics/ by Phil Hummel & Raj Narayanan
- https://itsavant.wordpress.com/2021/04/30/big-data-as-a-service-with-robin-systems/
- "Five Reasons to Choose Dell and Robin CNP for AI/ML" by Mike King and Raj Narayanan
Related Blog Posts
Live Optics is Your Friend
Tue, 05 Oct 2021 19:01:57 -0000
It’s rare that a free tool exists that can help profile customer workloads to the mutual benefit of all. Live Optics (previously DPACK) is a diamond in the rough that is truly a win-win proposition for customers and vendors such as Dell. I’ve been using it for years, and it’s a rare day that I don’t learn something of use.
The tool is like SAR on steroids. Data is collected for each host; hosts can be VMs, and servers can be from any manufacturer. The data collected covers IOPS (counts and I/O sizes), memory usage, CPU usage, and network activity. It can be run in local mode, where the data doesn’t go anywhere else, or the data can be stored in a Dell private cloud. The latter is more beneficial, as it can then be accessed by folks in many roles for various assessments. The data may also be mined to help Dell make better decisions about current and future products based on actual observed user profiles.
I use Live Optics to profile database workloads like Greenplum and Vertica, Hadoop, and NoSQL databases like MongoDB, Cassandra, MarkLogic, and more.
Upon inspection of the workload, the collected data helps facilitate more meaningful discussions with various SMEs and right-size future designs. In one case I found a customer that was using less than half of their memory during peak periods, so we suggested new server BOMs with much less memory, as they didn’t need what they had.
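The right-sizing logic is simple arithmetic over the collected samples: find peak usage, add headroom, and round up to a realistic memory configuration. A sketch (the sample values and the 25% headroom figure are illustrative assumptions, not Live Optics output):

```python
def recommend_memory_gb(samples_gb, headroom=0.25,
                        configs=(64, 128, 256, 384, 512, 768, 1024)):
    """Pick the smallest standard memory config that covers peak observed
    usage plus headroom, given per-interval usage samples (in GB)."""
    peak = max(samples_gb)
    needed = peak * (1 + headroom)
    for size in configs:
        if size >= needed:
            return size
    return configs[-1]

# Host with 512 GB installed but peaking at 210 GB (illustrative samples)
samples = [120, 180, 210, 150, 95]
print(recommend_memory_gb(samples))  # 384 -- well under the installed 512 GB
```

The same pattern applies to cores, IOPS, and network: size to observed peaks plus headroom rather than to what happens to be installed today.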
Can we help you with assessing your workloads of interest on our servers or those of our competitors?
Some links of interest
- https://www.delltechnologies.com/en-us/live-optics/index.htm#pdf-overlay=//www.delltechnologies.com/asset/en-us/solutions/business-solutions/briefs-summaries/cloud-live-optics-data-for-it-decisions-ebrochure.pdf
- https://www.liveoptics.com/
- https://www.starwindsoftware.com/resource-library/understand-your-it-environment-with-dell-live-optics/
How about SingleStore for your database on 15G Dell PE Servers?
Fri, 02 Dec 2022 04:58:29 -0000
SingleStore is a distributed relational database that was previously called MemSQL. It is well suited to analytics workloads. There are two data structure constructs available: the first is the columnstore, which resides on disk (typically SSDs); the second is the rowstore, which is held in memory and is essentially a key-value store. Yes, you can have both types in the same database and join across the two different table types. Data is distributed across leaf nodes, which store the low-level detail, and aggregator nodes, which route queries and combine results. Clients issue SQL queries through the aggregators.
SingleStore speaks the MySQL wire protocol, which makes it compatible with anything that can connect to MySQL.
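The aggregator/leaf split is classic scatter-gather: each leaf computes a partial aggregate over its own partition of the data, and the aggregator merges the partials. A toy sketch of the idea (the partitioning and row values are illustrative, not SingleStore internals):

```python
def leaf_partial_sum(partition):
    """Each leaf node computes a partial aggregate over its own rows."""
    total, count = 0, 0
    for value in partition:
        total += value
        count += 1
    return total, count

def aggregator_avg(partitions):
    """The aggregator merges partial results -- it never scans raw rows."""
    partials = [leaf_partial_sum(p) for p in partitions]
    grand_total = sum(t for t, _ in partials)
    grand_count = sum(c for _, c in partials)
    return grand_total / grand_count

# Rows sharded across three hypothetical leaf nodes
leaves = [[10, 20], [30], [40, 50, 60]]
print(aggregator_avg(leaves))  # 35.0
```

Because only small partial results cross the network, this design scales query throughput with the number of leaves, which is why aggregators can be much lighter on RAM and cores than leaf nodes.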
Customers choose this database when they have demanding, high-performance analytics needs. We have many large financial customers that are very happy with it.
So what does it look like on the latest 15G Ice Lake servers from Dell?
Although it could run on almost any server, the leading candidate would be a Dell PowerEdge R650 for database sizes up to 400TB usable. Environments with larger database needs would use a Dell PowerEdge R750.
Rules of thumb (ROTs):
- Aggregators use a single 25GbE NIC
- Leaf nodes use a single 10GbE NIC
- Aggregator nodes use about ¼ the RAM and ¼ the cores of leaf nodes
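The ¼ rule of thumb turns a leaf spec directly into an aggregator spec. A sketch (the rounding rules, power-of-two RAM and even core counts, are my assumptions; actual BOMs will round to orderable configs):

```python
import math

def aggregator_spec(leaf_ram_gb, leaf_cores, ratio=0.25):
    """Rule of thumb: aggregators get ~1/4 the RAM and cores of a leaf.
    RAM rounds up to the next power-of-two config, cores to an even count."""
    ram_gb = 2 ** math.ceil(math.log2(leaf_ram_gb * ratio))
    cores = math.ceil(leaf_cores * ratio / 2) * 2
    return ram_gb, cores

# Leaf node like the 100TB environment below: 512 GB RAM, 2 x 24 cores
print(aggregator_spec(512, 48))  # (128, 12)
```

Which matches the 128GB aggregators in the environments below; in practice you would round the cores up to whatever CPU SKU is actually available.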
Other items
- RAID is optional, but most customers elect it. The figures below assume RAID10.
- Use an M.2 BOSS card with a RAID1 pair of 480GB RI SSDs for the OS and software. They are now hot-swappable.
- For durability and cost reasons, read-intensive (RI) value SAS SSDs will be the right fit 99.99% of the time.
5TB Env
- 2 aggregators w/ 4 x 480GB RI SSD, 128GB RAM, 2 x 8c, 25GbE NIC
- 4 leaf nodes w/ 4 x 960GB RI SSD, 256GB RAM, 2 x 12c, 10GbE NIC
100TB Env
- 3 aggregators w/ 2 x 960GB RI SSD, 128GB RAM, 2 x 8c, 25GbE NIC
- 7 leaf nodes w/ 8 x 3.84TB RI SSD, 512GB RAM, 2 x 24c, 10GbE NIC
400TB Env
- 4 aggregators w/ 2 x 960GB RI SSD, 128GB RAM, 2 x 8c, 25GbE NIC
- 14 leaf nodes w/ 8 x 7.68TB RI SSD, 1024GB RAM, 2 x 28c, 10GbE NIC
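You can sanity-check each environment's usable capacity with simple arithmetic: under RAID10 mirroring, half the raw leaf capacity is usable. A quick check in Python (drive counts and sizes taken from the configurations above):

```python
def usable_tb(leaf_nodes, drives_per_leaf, drive_tb, raid10=True):
    """Usable leaf capacity: raw capacity is halved under RAID10 mirroring."""
    raw = leaf_nodes * drives_per_leaf * drive_tb
    return raw / 2 if raid10 else raw

print(usable_tb(4, 4, 0.96))    # 5TB env   -> ~7.68 TB usable
print(usable_tb(7, 8, 3.84))    # 100TB env -> ~107.52 TB usable
print(usable_tb(14, 8, 7.68))   # 400TB env -> ~430.08 TB usable
```

Each environment comes in above its nominal database size, which leaves working room for temp space, replicas, and growth.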
If you need your SingleStore database on Dell PE servers, do let us know.