The challenges of distributed data
A microservices architecture can help transform legacy applications into modern, scalable ones, but application modernization poses a second challenge: designing a distributed database model that can span data centers, provide high availability, and deliver high-performance data processing.
Apache Hadoop, a scalable data-storage and analytics system, provides a model for storing and processing large, distributed datasets. Its architecture is distributed: data is spread across multiple nodes that together form a cluster. A cluster of Apache Hadoop nodes can reside in a single data center, or it can span multiple data centers.
While Apache Hadoop paved the way for large-scale analytics across large, distributed datasets, it is better suited for batch processing than for real-time analysis of unstructured data. NoSQL databases, such as Apache Cassandra, fill the gap for real-time storage and analysis of high-velocity unstructured, semistructured, and structured data.
Like Apache Hadoop, NoSQL databases can scale horizontally, and they provide fast read/write access to various data types. The two differ in that Apache Hadoop is not intended to be a database-management system (DBMS); rather, it is designed to store and analyze massive amounts of unstructured data. NoSQL databases share Apache Hadoop's ability to store and analyze large, distributed datasets, but they are designed to handle semistructured and structured data as well as unstructured data.
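As a rough illustration of what horizontal scaling involves, the sketch below spreads keys across nodes with a toy consistent-hash ring, one common partitioning technique in distributed databases. It is a simplified sketch under stated assumptions, not the partitioner of any particular product: the `HashRing` class, the `node-a` through `node-d` names, and the `sensor-N` keys are all hypothetical, and real systems add replication and multiple tokens per node.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a deterministic integer position on the ring."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Toy consistent-hash ring: a key belongs to the next node clockwise."""

    def __init__(self, nodes):
        # Sort nodes by ring position so lookups can use binary search.
        self._ring = sorted((token(n), n) for n in nodes)
        self._tokens = [t for t, _ in self._ring]

    def node_for(self, key: str) -> str:
        # First node token greater than the key's token; wrap around at the end.
        i = bisect.bisect_right(self._tokens, token(key)) % len(self._ring)
        return self._ring[i][1]


# Hypothetical keys and node names, for illustration only.
keys = [f"sensor-{i}" for i in range(1000)]
small = HashRing(["node-a", "node-b", "node-c"])
large = HashRing(["node-a", "node-b", "node-c", "node-d"])

# Keys whose owner changes when a fourth node joins the cluster.
moved = [k for k in keys if small.node_for(k) != large.node_for(k)]
print(f"{len(moved)} of {len(keys)} keys moved to the new node")
```

In this sketch, adding a node moves only the keys that fall on the new node's arc of the ring; every other key keeps its owner, which is what lets a cluster grow without reshuffling the whole dataset.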
The flexibility and variety of NoSQL databases and data models provide an answer to the inherent limitations of relational databases. Data models designed for relational databases have historically relied on rigid schemas. While these rigid schemas have served enterprises well for decades, new data sources are pushing the limits of relational database designs. Data sources can now generate unstructured data, which does not follow any specification, data model, or organization. Examples of unstructured data include raw sensor data from IoT devices and handwritten patient information created by medical professionals. Semistructured data typically contains tags or markers that identify pieces of information within the data, but the data does not comply with a specific relational database schema. Examples of semistructured data include Extensible Markup Language (XML) documents, JavaScript Object Notation (JSON) documents, and email messages.
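The difference between a rigid schema and semistructured data can be sketched in a few lines of Python. This is a minimal illustration, not a description of any specific database: the `PATIENT_COLUMNS` schema, the `fits_schema` helper, and the sample records are all hypothetical.

```python
import json

# Hypothetical fixed column set, standing in for a rigid relational schema.
PATIENT_COLUMNS = ("patient_id", "name", "heart_rate")


def fits_schema(record: dict) -> bool:
    """Return True only if the record has exactly the fixed columns."""
    return set(record) == set(PATIENT_COLUMNS)


# Semistructured JSON documents: the keys describe the data, but the
# shape varies from one document to the next (sample data is made up).
raw_documents = [
    '{"patient_id": 1, "name": "A. Example", "heart_rate": 72}',
    '{"patient_id": 2, "name": "B. Example", "allergies": ["penicillin"]}',
    '{"patient_id": 3, "device": {"type": "wearable", "firmware": "1.4"}}',
]
documents = [json.loads(raw) for raw in raw_documents]

# Only the first document matches the rigid schema.
rigid_fit = [fits_schema(doc) for doc in documents]
print(rigid_fit)  # [True, False, False]

# A schemaless store keyed by patient_id accepts every document as-is.
document_store = {doc["patient_id"]: doc for doc in documents}
print(sorted(document_store))  # [1, 2, 3]
```

Only the first record fits the fixed columns, while the dictionary-based store keeps all three documents unchanged, including the nested `device` object that has no natural place in a flat table.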
To avoid the limitations of relational databases, structured and unstructured data must be modeled differently when using a NoSQL database. NoSQL data models include:
Enterprises with unstructured, semistructured, and structured data can now rely on NoSQL databases to power modern, microservices-based applications that meet the needs of diverse organizations and customers with many different data models.