YugabyteDB is designed for fault tolerance and high availability. By maintaining at least three copies of data across multiple data regions or multiple clouds, it ensures that no data is lost if a single node or a single data region becomes unavailable. Even with this resilience, backups are still required for:
- Recovery from a user or software error, such as an accidental table drop or another unintended destructive operation.
- Recovery from a disaster scenario, like a full cluster failure or a simultaneous outage of multiple data regions.
- Maintaining a remote copy of data, as required by data protection regulations.
- Creating database copies for test and development environments.
YugabyteDB has three features that allow users to set up a data protection strategy:
- Export and Import: Export and import data using SQL or CQL scripts. While export/import operations provide a way to reconstruct the data, they are not as efficient as the two methods that follow (see the first sketch after this list).
- Distributed snapshots: Back up and restore using distributed snapshots. Distributed snapshots use operating system hard links to efficiently preserve or restore past images of the data. However, because they reside next to the current data, they can be used only if the infrastructure integrity is preserved, and only on the same host as the primary data. One of the advantages described in this paper is the ability to combine YugabyteDB distributed snapshots with PowerFlex snapshots so that they can be preserved and used on other hosts (see the second sketch after this list).
- Point-in-time recovery (PITR): Restore data to a particular point in time. PITR is a powerful YugabyteDB feature that combines distributed snapshots with the database flashback capability to bring the database to any point in the past for which distributed snapshot and flashback data are available. Using PITR can incur performance and capacity overhead, depending on the frequency and retention settings. PowerFlex snapshots can be used in combination with PITR to allow its use on different hosts (including the original), and to reduce the associated overhead, because data that is preserved by a PowerFlex snapshot does not also need to be preserved as YugabyteDB distributed snapshots and flashback data (see the third sketch after this list).
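As a first sketch of the export/import approach, the following commands use the ysql_dump and ysqlsh utilities bundled with YugabyteDB, plus ycqlsh for the YCQL API. The host address, user, database, keyspace, and file paths are placeholder assumptions, not values from this solution:

```bash
# Export a YSQL database to a SQL script; ysql_dump is YugabyteDB's
# derivative of pg_dump and ships in the postgres/bin directory.
./postgres/bin/ysql_dump -h 127.0.0.1 -U yugabyte -d ybdemo -f /backup/ybdemo.sql

# Reconstruct the database on a target cluster by replaying the script.
./bin/ysqlsh -h 127.0.0.1 -U yugabyte -d ybdemo_copy -f /backup/ybdemo.sql

# For YCQL, ycqlsh supports per-table COPY export and import.
./bin/ycqlsh -e "COPY demo_ks.orders TO '/backup/orders.csv' WITH HEADER = true"
./bin/ycqlsh -e "COPY demo_ks.orders FROM '/backup/orders.csv' WITH HEADER = true"
```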
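The second sketch drives the distributed snapshot workflow with the yb-admin utility. The master addresses, database name, and snapshot ID are placeholders:

```bash
MASTERS=master-1:7100,master-2:7100,master-3:7100   # placeholder addresses

# Create an in-cluster distributed snapshot of the ybdemo database.
./bin/yb-admin -master_addresses $MASTERS create_snapshot ysql.ybdemo

# List snapshots to obtain the snapshot ID and verify it is complete.
./bin/yb-admin -master_addresses $MASTERS list_snapshots

# Export the snapshot metadata to a file; the snapshot's data files
# remain as hard links alongside the live data on each node, which is
# why PowerFlex snapshots are needed to use them on other hosts.
./bin/yb-admin -master_addresses $MASTERS export_snapshot <snapshot-id> ybdemo.snapshot

# Restore the database to the state captured by the snapshot.
./bin/yb-admin -master_addresses $MASTERS restore_snapshot <snapshot-id>
```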
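The third sketch configures PITR through a snapshot schedule, again with yb-admin. The interval (60 minutes), retention (1,440 minutes), and names are placeholder assumptions; higher frequency and longer retention increase the overhead noted above:

```bash
MASTERS=master-1:7100,master-2:7100,master-3:7100   # placeholder addresses

# Snapshot the ybdemo database every 60 minutes and retain 24 hours
# of history for point-in-time restores.
./bin/yb-admin -master_addresses $MASTERS create_snapshot_schedule 60 1440 ysql.ybdemo

# List schedules to obtain the schedule ID.
./bin/yb-admin -master_addresses $MASTERS list_snapshot_schedules

# Roll the database back to a point within the retention window,
# here 10 minutes ago (an absolute timestamp can also be given).
./bin/yb-admin -master_addresses $MASTERS restore_snapshot_schedule <schedule-id> minus 10m
```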
In a traditional RDBMS, it is possible to “clone” a database instance by “quiescing” it (stopping writes) for a short period to create a disk snapshot, so that administrators can create another instance of the database on another system without impacting the production environment. This is useful for investigations, testing, and blue/green upgrade strategies. Most such databases run on a single node or in a shared-disk environment, where this approach is relatively easy to implement. YugabyteDB, a distributed database that spans multiple nodes and fault domains, presents more of a challenge, because it encodes the IP/DNS addresses of its nodes in the internal structures it writes to disk. The solution is to address the nodes by generic names and use local files (such as /etc/hosts) to override the addresses those names resolve to, so that the volumes can be cloned and mounted in a new environment, as illustrated in the sketch below.
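As an illustration, the /etc/hosts entries below (all names and addresses hypothetical) let the same generic node names resolve to the production hosts in one environment and to the hosts where the cloned volumes are mounted in the other, so the names recorded in YugabyteDB's on-disk structures remain valid in both:

```bash
# /etc/hosts on the production hosts
10.0.1.11   yb-node-1
10.0.1.12   yb-node-2
10.0.1.13   yb-node-3

# /etc/hosts on the clone hosts: the same names point at the new
# hosts, so the cloned volumes can be mounted and started unchanged.
10.0.2.21   yb-node-1
10.0.2.22   yb-node-2
10.0.2.23   yb-node-3
```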
The following use cases are covered in this solution:
- Distributed snapshots
- Distributed snapshots with PowerFlex snapshots
- Point-in-time recovery with PowerFlex snapshots