In a data-driven world, innovation changes are forcing a new paradigm
Wed, 19 Aug 2020 22:15:07 -0000
For the last two decades, I’ve enjoyed working at Dell Technologies focusing on customer big-picture ideas. Not just focusing on hardware changes, but on a holistic solution including hardware, software, and services that achieves a business objective by addressing customer goals, problems, and needs. I’ve also partnered with my clients on their transformation journey. The concepts of digital transformation and IT transformation have been universal themes and turning these ideas into realities is where the rubber meets the road.
Now as I engage with customers and partners about Microsoft solutions, an incremental awareness of the idea of “data”, and how data is accessed and leveraged, has become evident. A foundational shift around data has occurred.
We are now living in a new era of data management, but many of us were not aware this change was developing. This has crept up on us without the fanfare you might see from a new technology launch. When you take a step back and look at these shifts in their entirety, you see that these changes aren’t just isolated updates, but instead are amplifying one another’s benefits. This is a fundamental transformation in the industry, similar to when virtualization was first adopted 15 years ago.
For many, this change started to become apparent with the end of support for SQL Server 2008 in July 2019 (along with support for all previous versions of the product). This deadline, coupled with the large install base that still exists on this platform, is helping the conversation along, but it’s not just a point-by-point swap of the old for the new. The doors opened in this new era force a completely different view and approach. We no longer need to have a SQL, Oracle, SAP, or Hadoop conversation; instead it becomes a holistic “data” point of view.
In our hybrid/multi-cloud world, there is not just one answer for managing data. The word “data” can encompass a great deal: every type of data, wherever it resides, along with all the diverse data languages and methods of control.
Emerging technologies including IoT, 5G, AI and ML are generating greater amounts and varied types of data. How we access that data and derive insight from it becomes critical, but we have been limited by people, processes, and technology.
People have become stuck in the rut of, “I want it to be this way because it has always been this way.” Therefore, replacing dated or expired architectures becomes a swap-out story versus a re-examine story, and new efficiencies are completely missed. Processes within the organization become rigid with that same mindset (and, dare I say, politics), and access to that data becomes path-limited. Technology is influenced by both people and process: “the old way is good enough, right?”
The value and importance of “data” really point back to the insight that you derive from it. Having a bunch of ones and zeros on a hard drive is nice, but what you derive from that data is critically important. The conversations I have with customers are not so much, “Where is my data and how is it stored?” The conversation is more commonly, “I have a need to get business analytics from my proprietary data so I can impact my customers in a way I never did before.”
To put my Stephen Covey hat on, we are in a paradigm change. What is occurring is incredibly impactful for how customers should view and treat data. There are three key areas that we will examine with the new paradigm today and we’ll start with data gravity.
Data Gravity
Data gravity is the idea that data has weight. Wherever data is created, it tends to remain. Data stores are getting so big that moving data around is becoming expensive, time-constrained, and detrimental to database performance. This, in turn, results in silos of data by location and type. Versioning and a lack of upgrade, migration, and consolidation of databases also perpetuate these silo challenges.
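To put rough numbers on that weight, here is a minimal back-of-the-envelope sketch in Python. The dataset sizes and the 10 Gb/s link are hypothetical, and the calculation ignores protocol overhead and any throttling on the source or target, so treat the results as a best case.

```python
def transfer_hours(dataset_tb: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Best-case hours to copy dataset_tb (decimal terabytes) over a link_gbps link.

    efficiency is the hypothetical fraction of the link that is actually usable.
    """
    bits = dataset_tb * 1e12 * 8                 # terabytes -> bits
    usable_bits_per_second = link_gbps * 1e9 * efficiency
    return bits / usable_bits_per_second / 3600  # seconds -> hours


if __name__ == "__main__":
    # Hypothetical data store sizes moved over a dedicated 10 Gb/s link.
    for size_tb in (1, 10, 100):
        print(f"{size_tb:>4} TB over 10 Gb/s: {transfer_hours(size_tb, 10):.1f} hours")
```

Even with a dedicated link running at full speed, a 100 TB data store takes roughly 22 hours to move, which is why applications and analytics tend to move toward the data rather than the other way around.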
As with physical gravity, we understand that data’s mass encourages applications and analytics to orbit that data store where it resides. Then, application dependency upon the data’s language version cements the silo requirement even further. We have witnessed the proliferation of intelligent core and edge devices, as well as bringing applications to that place where the data resides – at the customer location.
Silos of data based on language, version, and location can’t be readily accessed from a common interface. If I am a SQL user, how do I get that Oracle data I need? I cannot just pull all my data together into a huge common dataset – it’s just too big. We see these silos in almost every customer environment.
Data Virtualization
This is where data virtualization comes into the story. Please note this is not a virtual machine (a common confusion on the naming). Think instead of this being data democratization: the ability to allow all the people access to all the data – within reason, of course. Data virtualization allows you to access the data where the data is stored without a massive ETL event. You can see and control the data regardless of language, version, or location. The data remains where it is, but you have real-time source access to this data. You can access data from remote or diverse sources and perform actions on that data from one common point of view.
Data virtualization allows access into the silos that, in the past, have been very rigid, blocking the ability to effectively use that data. From a non-SQL Server point of view, having unstructured data or structured data in a different format (like Oracle), required you to hire a specialized person with a specific skill set to access that data. With data virtualization, that is no longer a barrier as these silo walls are reduced. Data virtualization becomes data democratization, meaning that all people (with appropriate permissions) can access and do things with that data.
From a Microsoft point of view, that technology became a reality with PolyBase. PolyBase with SQL Server allows access with T-SQL, one of the most widely used database languages. I started using this resource with the Analytics Platform System (APS) many years ago. Microsoft placed this tool into SQL Server in 2016 and greatly expanded its functionality in SQL Server 2019, so we can now reach Hadoop, Oracle, and other disparate data sources, and use orchestrators like Spark alongside them. To visualize this, think of PolyBase with SQL Server 2019 as a wrapper around these diverse silos of data. You can now access all these disparate data sources within one common interface: T-SQL using PolyBase.
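As a rough illustration of that common interface, the Python sketch below renders the kind of T-SQL that exposes an Oracle table through PolyBase in SQL Server 2019. Every object name here (OracleSales, OracleCred, orahost.example.com, the XE service name, the table and columns) is hypothetical, and the sketch assumes a database scoped credential already exists. It only builds the statements as text; it does not connect to anything.

```python
def oracle_data_source_ddl(source_name, host, port, credential):
    """T-SQL that registers an Oracle instance as a PolyBase external data source.

    Assumes `credential` was created earlier with CREATE DATABASE SCOPED CREDENTIAL.
    """
    return (
        f"CREATE EXTERNAL DATA SOURCE {source_name}\n"
        f"WITH (LOCATION = 'oracle://{host}:{port}', CREDENTIAL = {credential});"
    )


def external_table_ddl(table, columns, remote_object, source_name):
    """T-SQL that maps a remote object to a local external table definition."""
    column_lines = ",\n".join(f"    {name} {sql_type}" for name, sql_type in columns)
    return (
        f"CREATE EXTERNAL TABLE {table} (\n{column_lines}\n)\n"
        f"WITH (LOCATION = '{remote_object}', DATA_SOURCE = {source_name});"
    )


if __name__ == "__main__":
    # All names below are placeholders for illustration only.
    print(oracle_data_source_ddl("OracleSales", "orahost.example.com", 1521, "OracleCred"))
    print(external_table_ddl(
        "dbo.OracleOrders",
        [("ORDER_ID", "DECIMAL(38) NOT NULL"), ("ORDER_TOTAL", "DECIMAL(15, 2)")],
        "[XE].[SALES].[ORDERS]",
        "OracleSales",
    ))
```

Once the external table exists, a SQL Server user can query or join dbo.OracleOrders with ordinary T-SQL while the rows stay in Oracle, which is the "data stays where it is" point made above.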
Holistic Solution
The final tenet of this fundamental change is the advent of containerization. This enablement technology allows abstraction beyond virtualization and runs just about anywhere. Data becomes nimble and you can move it where needed.
It’s amazing how pervasive containers have become. It’s no longer a science experiment, but is quickly becoming the new normal. In the past, many customers had a forklift perception that when a new technology comes into play, it requires a lift and replace. I’ve heard, “What I am doing today is no longer good, so I have to replace it with whatever your new product is, and it will be painful.”
I’ve been using the phrase that containerization enables “all the things”. Containerization has been adopted by so many architectures that it’s easier to talk about where you can’t do it versus where you can. Traditional SAN, converged, hyperconverged, hybrid cloud: you can place this just about anywhere. There is not just one right path here, so do what makes sense for you. It becomes a holistic solution.
There are multiple ways to address the business need that customers have even if it’s leveraging existing designs that they’ve been using for years. Dell Technologies has published details of several architectures supporting SQL Server and has just recently published the first of many papers on SQL Server in containers.
The answer is, you can do all these things with all these architectures. By the way, this isn’t specific to Microsoft and SQL Server. We see similar architectures being created in other databases and technology formats.
These three tenets reinforce one another within the new paradigm. Data gravity is supported by data virtualization and containerization. Data virtualization allows silos to remain where they are needed (gravity) and is enabled by containerization. Containerization gives access to silos (as a wrapper) and is the mechanism that activates data virtualization.
From a Dell Technologies point of view, we are aggressively embracing these tenets. Our enablement technologies to support this paradigm are called out in three discrete points – accelerate, protect, and reuse. We will review these points in a separate blog.
There is much more to come as we continue this journey into the new era of data management. Dell Technologies has deeply invested in resources around this topic with several recent publications and reference designs embracing this paradigm change. Our leadership on this topic is the result of our 30+ year relationship with Microsoft and our continuing “better together” story. A detailed white paper that further expands the ideas within this blog is available here.
Related Blog Posts
Time to Rethink your SQL Backup Strategy – Part 2
Wed, 10 May 2023 15:17:38 -0000
A while back, I wrote a blog about changes to backup/restore functionality in SQL Server 2022: SQL Server 2022 – Time to Rethink your Backup and Recovery Strategy. Now, more exciting features are here in PowerStoreOS 3.5 that provide additional options and enhanced flexibility for protecting, migrating, and recovering SQL Server workloads on PowerStore.
Secure your snapshots
Backup copies provide zero value if they have been compromised when you need them the most. Snapshot removal could happen accidentally or intentionally as part of a malicious attack. PowerStoreOS 3.5 introduces a new feature, secure snapshot, to ensure that snapshots can't be deleted prior to their expiration date. This feature is a simple checkbox on a snapshot or protection policy that protects snapshots until they expire and can't be turned off. This ensures that your critical data will be available when you need it. Secure snapshot can be enabled on new or existing snapshots. Here’s an example of the secure snapshot option on an existing snapshot.
Once this option is selected, a warning is displayed stating that the snapshot can’t be deleted until the retention period expires. To make the snapshot secure, ensure that the Secure Snapshot checkbox is selected and click Apply.
Secure snapshot can be applied to individual snapshots of volumes or volume groups. The secure snapshot option can also be enabled on one or more snapshot rules in a protection policy to ensure that snapshots taken as part of the protection policy have secure snapshot applied.
Since existing snapshots can be marked as secure, this option can be used on snapshots taken outside of PowerStore Manager or even snapshots taken with other utilities such as AppSync. Consider enabling this option on your critical snapshots to ensure that they are available when you need them!
There's no such thing as too many backups!
If you're responsible for managing and protecting SQL Server databases, you quickly learn that it's valuable to have many different backups and in various formats, for various reasons. It could be for disaster recovery, migration, reporting, troubleshooting, resetting dev/test environments, or any combination of these. Perhaps you’re trying to mitigate the risk of failure of a single platform, method, or tool. Each scenario and workflow has different requirements. PowerStoreOS 3.5 introduces direct integration with Dell PowerProtect DD series appliances, including PowerProtect DDVE which is the virtual edition for both on-premises and cloud deployments. This provides an agentless way to take crash consistent, off-array backups directly from PowerStore and send them to PowerProtect DD.
To enable PowerStore remote backup, you need to connect the PowerProtect DD appliance to your PowerStore system as a remote system.
Next, you add a remote backup rule to a new or existing protection policy for the volume or volume group you want to protect, providing the destination, schedule, and retention.
Once a protection policy is created with remote backup rules and assigned to a PowerStore volume or volume group, a backup session will appear.
Under Backup Sessions, you can see the status of all the sessions. To back up immediately, select a session and click Backup.
Once a remote backup is taken, it will appear under the Volume or Volume Group Protection tab as a remote snapshot.
From here, you can retrieve it and work with it as a normal snapshot on PowerStore or enable Instant Access whereby the contents can be accessed by a host directly from PowerProtect DD. You can even retrieve remote snapshots from other PowerStore clusters!
This is yet another powerful tool included with PowerStoreOS 3.5 to enhance data protection and data mobility workflows.
For more information on this feature and other new PowerStore features and capabilities, be sure to check out all the latest information on the Dell PowerStore InfoHub page.
Author: Doug Bernhardt
Sr. Principal Engineering Technologist
https://www.linkedin.com/in/doug-bernhardt-data/
SQL Server 2022 – Time to Rethink your Backup and Recovery Strategy
Mon, 19 Sep 2022 14:06:43 -0000
Microsoft SQL Server 2022 is now available in public preview, and it’s jam-packed with great new features. One of the most exciting is the Transact-SQL snapshot backup feature. This is a gem that can transform your backup and recovery strategy and turbocharge your database recoveries!
The power of snapshots
At Dell Technologies we have known the power of storage snapshots for over a decade. Storage snapshots are a fundamental feature in Dell PowerStore and the rest of the Dell storage portfolio. They are a powerful feature that allows point-in-time volume copies to be created and recovered in seconds or less, regardless of size. Since the storage is performing the work, there is no overhead of copying data to another device or location. This metadata operation performed on the storage is not only fast, but it’s space-efficient as well. Instead of storing a full backup copy, only the delta is stored and then coalesced with the base image to form a point-in-time copy.
Starting with SQL Server 2017, SQL Server is also supported on Linux and in containers, including on platforms such as Kubernetes, in addition to Windows. Kubernetes recognized and embraced the power of storage-based snapshots and provided support a couple of years ago. For managing large datasets in a fast, efficient manner, they are tough to beat.
Lacking full SQL Server support
Unfortunately, prior to SQL Server 2022, there were limitations around how storage-based snapshots could be used for database recovery. Before SQL Server 2022, there was no supported method to apply transaction log backups to these copies without writing custom SQL Server Virtual Device Interface (VDI) code. This limited storage snapshot usage for most customers that use transaction log backups as part of their recovery strategy. Therefore, the most common use cases were repurposing database copies for reporting and test/dev use cases.
In addition, in SQL Server versions earlier than SQL Server 2022, the Volume Shadow Copy Service (VSS) technology used to take these backups is only provided on Windows. Linux and container-based deployments are not supported.
SQL Server 2022 solves the problem!
The Transact-SQL (T-SQL) snapshot backup feature of SQL Server 2022 solves these problems and allows storage snapshots to be a first-class citizen for SQL Server backup and recovery.
There are new options added to T-SQL ALTER DATABASE, BACKUP, and RESTORE commands that allow either a single user database or all user databases to be suspended, allowing the opportunity for storage snapshots to be taken without requiring VSS. Now there is one method that is supported on all SQL Server 2022 platforms.
T-SQL snapshot backups are supported with full recovery scenarios. They can be used as the basis for all common recovery scenarios, such as applying differential and log backups. They can also be used to seed availability groups for fast availability group recovery.
Time to rethink
SQL Server databases can be very large and have stringent recovery time objectives (RTOs) and recovery point objectives (RPOs). PowerStore snapshots can be taken and restored in seconds, where traditional database backup and recovery can take hours. Now that they are fully supported in common recovery scenarios, T-SQL snapshot backup and PowerStore snapshots can be used as a first line of defense in performing database recovery and accelerating the process from hours to seconds. For Dell storage customers, many of the Dell storage products you own support this capability today since there is no VSS provider or storage driver required. Backup and recovery operations can be completely automated using Dell storage command line utilities and REST API integration.
For example, the Dell PowerStore CLI utility (PSTCLI) allows powerful scripting of PowerStore storage operations such as snapshot backup and recovery.
Storage-based snapshots are not meant to replace all traditional database backups. Off-appliance and/or offsite backups are still a best practice for full data protection. However, most backup and restore activities do not require off-appliance or offsite backups, and this is where time and space efficiencies come in. Storage-based snapshots accelerate the majority of backup and recovery scenarios without affecting traditional database backups.
A quick PowerStore example
Backup
The overall workflow for a T-SQL snapshot backup is:
- Issue the T-SQL ALTER DATABASE command to suspend the database:
    ALTER DATABASE SnapTest SET SUSPEND_FOR_SNAPSHOT_BACKUP = ON
- Perform storage snapshot operations. For PowerStore, this is a single command:
    pstcli -d MyPowerStoreMgmtAddress -u UserName -p Password volume_group -name SQLDemo snapshot -name SnapTest-Snapshot-2208290922 -description "s:\sql\SnapTest_SQLBackupFull.bkm"
- Issue the T-SQL BACKUP DATABASE command with the METADATA_ONLY option to record the metadata and resume the database:
    BACKUP DATABASE SnapTest TO DISK = 's:\sql\SnapTest_SQLBackupFull.bkm' WITH METADATA_ONLY,COPY_ONLY,NOFORMAT,MEDIANAME='Dell PowerStore PS-13',MEDIADESCRIPTION='volume group: SQLDemo',NAME='SnapTest-Snapshot-2208290922',DESCRIPTION='f85f5a13-d820-4e56-9b9c-a3668d3d7e5e';
Since Microsoft has fully documented the SQL Server backup and restore operations, let’s focus on step 2 above, the PowerStore CLI command. It is important to understand that when taking a PowerStore storage snapshot, the snapshot is being taken at the volume level. Therefore, all volumes that contain data and log files for your database require a consistent point-in-time snapshot. It is a SQL Server Best Practice for PowerStore to place associated SQL Server data and log volumes into a PowerStore volume group. This allows for simplified protection and consistency across all volumes in the volume group. In the PSTCLI command above, a PowerStore snapshot is taken on a volume group containing all the volumes for the database at once.
Also, a couple of tips for making the process a bit easier. The PowerStore snapshot and the backup metadata file need to be used as a set. The proper version is required for each because the metadata file contains information such as SQL Server log sequence numbers (LSNs) that need to match the database files. Therefore, I’m using several fields in the PowerStore and SQL Server snapshot commands to store information on how to tie this information together:
- When the PowerStore snapshot is taken in step 2 above, in the name field I store the database name and the datetime that the snapshot was taken. I store the path to the SQL Server metadata file in the description field.
- In step 3, within the BACKUP DATABASE command, I put the PowerStore friendly name in the MEDIANAME field, the PowerStore volume group name in the MEDIADESCRIPTION field, the snapshot name in the NAME field, and the PowerStore volume group ID in the DESCRIPTION field. This populates the metadata file with the necessary information to locate the PowerStore snapshot on the PowerStore appliance.
- The T-SQL command RESTORE HEADERONLY will display the information added to the BACKUP DATABASE command as well as the SQL Server name and database name.
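The naming tips above lend themselves to scripting. The following is a minimal Python sketch, not a Dell or Microsoft utility: the function names are hypothetical, it only builds the T-SQL text for steps 1 and 3, and it leaves the storage snapshot in step 2 to the PSTCLI command shown earlier. The timestamp format is inferred from the example snapshot name.

```python
from datetime import datetime


def snapshot_name(database, taken_at):
    """Database name plus a yymmddHHMM timestamp, e.g. SnapTest-Snapshot-2208290922."""
    return f"{database}-Snapshot-{taken_at:%y%m%d%H%M}"


def suspend_statement(database):
    """Step 1: suspend the database so a storage snapshot can be taken."""
    return f"ALTER DATABASE {database} SET SUSPEND_FOR_SNAPSHOT_BACKUP = ON"


def metadata_backup_statement(database, metadata_file, appliance, volume_group,
                              snap_name, snapshot_ref, copy_only=True):
    """Step 3: record the backup metadata and resume the database.

    The MEDIANAME, MEDIADESCRIPTION, NAME, and DESCRIPTION values tie the
    metadata file back to the storage snapshot it belongs with.
    """
    options = ["METADATA_ONLY"]
    if copy_only:
        options.append("COPY_ONLY")  # keeps other backup chains untouched
    options += [
        "NOFORMAT",
        f"MEDIANAME='{appliance}'",
        f"MEDIADESCRIPTION='volume group: {volume_group}'",
        f"NAME='{snap_name}'",
        f"DESCRIPTION='{snapshot_ref}'",
    ]
    return (f"BACKUP DATABASE {database} TO DISK = '{metadata_file}' WITH "
            + ",".join(options) + ";")


if __name__ == "__main__":
    name = snapshot_name("SnapTest", datetime(2022, 8, 29, 9, 22))
    print(suspend_statement("SnapTest"))
    # Step 2 (the storage snapshot) would run here, using `name` as the snapshot name.
    print(metadata_backup_statement(
        "SnapTest", r"s:\sql\SnapTest_SQLBackupFull.bkm", "Dell PowerStore PS-13",
        "SQLDemo", name, "f85f5a13-d820-4e56-9b9c-a3668d3d7e5e"))
```

Generating the snapshot name once and passing it to both the storage command and the BACKUP DATABASE statement keeps the snapshot and the metadata file identifiable as a matched set.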
Recovery
The overall workflow for a basic recovery is:
- Drop the existing database.
- Offline the database volumes. This can be done through PowerShell, as follows, where X is the drive letter of the volume to take offline:
    Set-Disk (Get-Partition -DriveLetter X | Get-Disk | Select number -ExpandProperty number) -isOffline $true
- Restore the database snapshot using PowerStore PSTCLI:
  - List volume groups.
    pstcli -d MyPowerStoreMgmtAddress -u UserName -p Password! volume_group show
  - Restore the volume group where f85f5a13-d820-4e56-9b9c-a3668d3d7e5e is a volume group ID from above.
    pstcli -d MyPowerStoreMgmtAddress -u UserName -p Password! volume_group -name SQLServerVolumeGroup restore -from_snap_id f85f5a13-d820-4e56-9b9c-a3668d3d7e5e
- Online the database volumes. The following PowerShell command will online all offline disks:
    Get-Disk | Where-Object IsOffline -Eq $True | Select Number | Set-Disk -isOffline $False
- Issue the T-SQL RESTORE DATABASE command referencing the backup metadata file, using the NORECOVERY option if applying log backups:
    RESTORE DATABASE SnapTest FROM DISK = 's:\sql\SnapTest_PowerStore_PS13_SQLBackup.bkm' WITH FILE=1,METADATA_ONLY,NORECOVERY
- If applicable, apply database log backups:
    RESTORE LOG SnapTest FROM DISK = 's:\sql\SnapTest20220829031756.trn' WITH RECOVERY
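The last two recovery steps depend on whether log backups will be applied: the database restore uses NORECOVERY only if logs follow, and only the final log restore uses RECOVERY. Here is a minimal Python sketch of that ordering, assuming the volumes have already been restored and brought online. The function name is hypothetical, and it only builds the T-SQL text rather than running it.

```python
def restore_statements(database, metadata_file, log_backups):
    """Build the T-SQL restore sequence for a snapshot-based recovery.

    log_backups is an ordered list of transaction log backup files (may be empty).
    """
    # Leave the database restoring only if log backups still need to be applied.
    base_mode = "NORECOVERY" if log_backups else "RECOVERY"
    statements = [
        f"RESTORE DATABASE {database} FROM DISK = '{metadata_file}' "
        f"WITH FILE=1,METADATA_ONLY,{base_mode}"
    ]
    for index, log_file in enumerate(log_backups):
        # Only the last log restore brings the database online.
        is_last = index == len(log_backups) - 1
        mode = "RECOVERY" if is_last else "NORECOVERY"
        statements.append(
            f"RESTORE LOG {database} FROM DISK = '{log_file}' WITH {mode}"
        )
    return statements


if __name__ == "__main__":
    for statement in restore_statements(
        "SnapTest",
        r"s:\sql\SnapTest_PowerStore_PS13_SQLBackup.bkm",
        [r"s:\sql\SnapTest20220829031756.trn"],
    ):
        print(statement)
```

With a single log backup, this produces the same two statements shown in the steps above.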
Other items of note
A couple of other items worth discussing are COPY_ONLY and differential backups. You might have noticed above that the BACKUP DATABASE command contains the COPY_ONLY parameter, which means that these backups won’t interfere with another backup and recovery process that you might have in place.
It also means that you can’t apply differential backups to these T-SQL snapshot backups. I’m not sure why one would want to do that; I would just take another T-SQL snapshot backup with PowerStore at the same time, use that for the recovery base, and expedite the process! I’m sure there are valid reasons for wanting to do that, and, if so, you don’t need to use the COPY_ONLY option. Just be aware that you might be affecting other backup and restore operations, so be sure to do your homework first!
Stay tuned
There will be a lot more information and examples coming from Dell Technologies on how to integrate this new T-SQL snapshot backup feature with Linux and Kubernetes on PowerStore as well as on other Dell storage platforms. Also, look for the Dell Technologies sessions at PASS Data Community Summit 2022, where we will have more information on this and other exciting new Microsoft SQL Server 2022 features!
Author: Doug Bernhardt
Sr. Principal Engineering Technologist
https://www.linkedin.com/in/doug-bernhardt-data/