Home > Workload Solutions > SQL Server > Guides > Dell ObjectScale and Integrated Systems for Data Analytics using Microsoft Azure HCI and SQL 2022 > Use case 4: Exploring new T-SQL functions
In this use case, we explore data analytics related to new T-SQL functions in SQL Server 2022 by using the weather dataset.
We have used both Wide World Importers and external weather dataset for exploring T-SQL functions.
Date buckets allow data to be grouped according to the ranges of dates, which is especially useful. The DATE_BUCKET function supports parameters that allow for different increments of time to be selected, including hourly, monthly, and yearly. It is possible to allow for periods of several weeks or years to fall into the same bucket by modify the unit of time’s scalar. The start and end dates of the range can be specified, allowing for business data to be precisely analyzed.
This can be helpful for any organization to analyze weekly, monthly, or yearly profitability patterns.
Example of Monthly DATE_BUCKET Query:
Query description: This query returns the average temperature for each city every month beginning on the first day of the month. It groups all the values that occur within the range of a month and then these groups can be aggregated.
SELECT DATE_BUCKET(MONTH, 1, DateOf) AS DayTemp, City,
AVG(Temperature) AS AvgMonthlyTemp
FROM Temperature_View
GROUP BY DATE_BUCKET(MONTH, 1, DateOf), City
ORDER BY DayTemp;
Figure 29. Result of monthly average temperature for the cities
Example Weekly DATE_BUCKET Query:
Query description: This query returns the Weekly counts of customers for each invoice. It demonstrates how custom time periods can be specified by users to acquire the preferred information.
SELECT DATE_BUCKET(WEEK, 1, InvoiceDate) AS InvoiceWeek,
COUNT(CustomerId) AS CustomerCount
FROM WideWorldImporters.Sales.Invoices
GROUP BY DATE_BUCKET(WEEK, 1, InvoiceDate)
ORDER BY InvoiceWeek;
Figure 30. Result of weekly customer count
The MIN and MAX functions do not always permit the retrieval of data in the preferred format. The GREATEST and LEAST T-SQL functions introduced for SQL 2022 allow for the retrieval of the maximum or minimum value across a row, respectively. This is useful for data analysts who could previously only find these values for columns. These functions can be used in SQL stored procedures and functions to select specific parameters.
Example GREATEST Query:
Query description: This query selects the maximum temperature value from a list of columns for the row entry at a specified DateTime which is useful for perform comparisons between the values in a row. The relevant result can then be added as a new column in the resulting table.
SELECT DateTime, 32 + 9/5 * (
GREATEST([San_Antonio], [San_Diego], [San_Francisco], [Kansas_City], [Saint_Louis],
[Las_Vegas], [Los_Angeles], [New_York], Seattle)-273.15) AS HighestTemp
FROM dt_temperature
WHERE DateTime = '2012-10-01 13:00:00:000';
Figure 31. Highest temperature recorded
Example LEAST Query:
Query description: This query selects the minimum temperature from a list of columns for each row entry at a specified Date Time.
SELECT DateTime, 32 + 9/5 * (
LEAST([San_Antonio], [San_Diego], [San_Francisco], [Kansas_City], [Saint_Louis],
[Las_Vegas], [Los_Angeles], [New_York], Seattle)-273.15) AS LowestTempf
FROM dt_temperature
WHERE DateTime = '2012-10-01 13:00:00:000';
Figure 32. Lowest temperature recorded
The GENERATE_SERIES function introduced in SQL Server 2022 allows for the creation of a series of numeric values. This is helpful for querying specific values or creating a new data on fly through joins. It can be used to create completely new data through string concatenation or mathematical operations. It is a powerful function for transforming data and has many use cases.
New String Creation with GENERATE_SERIES:
Query description: This example is the result of the query would be a list of strings representing employee identifiers, each formed by appending a number from the generated series to the string 'employee'. The TOP 20 clause ensures that only the first 20 entries from this series are selected in the output.
WITH Series AS (
SELECT value AS SeriesId FROM GENERATE_SERIES(1,1000)
)
SELECT TOP 20 'employee'+(CONVERT (varchar (10), SeriesId)) AS Employees
FROM Series;
Figure 33. Output for new string creation using GENERATE_SERIES
Finding the First Occurrence of a Repeated Character Sequence Using GENERATE_SERIES:
Query description: GENERATE_SERIES can be used to provide additional information regarding the indexing of values. In this case, the string’s repeat character sequences are associated with their ordinal position and the first occurrence of the character in a sequence of identical characters has its index retained to condense string information.
DECLARE @S VARCHAR(8000) = 'Aarrrggghiiiifffff!';
SELECT value, S
FROM (
SELECT value=1, S=LEFT (@S, 1)
UNION ALL
SELECT value,
CASE
WHEN SUBSTRING(@S, value - 1, 1) <> SUBSTRING(@S, value, 1)
THEN SUBSTRING(@S, value, 1)
END
FROM GENERATE_SERIES(1, 100)
WHERE value BETWEEN 2 AND LEN(@S)
) AS A
WHERE S IS NOT NULL;
Figure 33. Location of first occurrence of a repeated character sequence with help of GENERATE_SERIES
The STRING_SPLIT function can split a string into substrings by using a delimiter. The function creates a column containing the substring values and another column with their ordinal position if the ordinal option is enabled. This can be used to analyze frequent words within a column of strings. It can also be used to select and clean substrings to them in an understandable format. For example, each string in a row might contain an employee’s name and badge number. A string split could be used for all the strings in a column and only retrieve the names of employees to display in a view to a user who should not be privileged to see both the employees’ badge numbers and names by using the ordinal option.
Substring Extraction Example with STRING_SPLIT:
Query description: The city name of each value stored in the customer column is added to the row in this example query. The function splits strings and uses the ordinal property to extract the required substring. This allows data transformation entirely in Microsoft SQL Server.
SELECT D.[Primary Contact], D.Customer, D.FirstName, value City
FROM (
SELECT C2.[Primary Contact], C2.Customer, C2.FirstName, value C2v, ordinal C2o
FROM (
SELECT B.[Primary Contact], B.Customer, B.FirstName, value Bv, ordinal Bo
FROM (
SELECT *
FROM (
SELECT value AS FirstName, C.[Primary Contact], C.Customer
FROM Customer C
CROSS APPLY STRING_SPLIT(C.[Primary Contact], ' ',1)
WHERE ordinal = 1
) AS NS
WHERE Customer <> 'Unknown' AND Customer NOT LIKE '%Head Office%'
) B CROSS APPLY STRING_SPLIT(B.Customer, '(', 1) WHERE ordinal = 2 ) B2
CROSS APPLY STRING_SPLIT(Bv, ')', 1)
WHERE ordinal = 1) D CROSS APPLY STRING_SPLIT(C2v, ',', 1)
WHERE ordinal = 1;
Figure 35. Output of the city name of each value stored in the customer column