This is the second in the three part series: Database? How?
In Part 1 of this series, we discovered that using a Microsoft SQL Server (MS SQL or SQL Server) database will provide tools to handle large data sets, small data sets, complex data sets, dynamic data sets and even data that we share with other systems. But just what are those tools? How do I leverage SQL Server to processing these types of data with DriveWorks? What keywords should I put into my search engine to learn more More MORE? That’s what we’ll answer in this column.
Why Use a Database?
Performance is one good reason to utilize a database. And we’ve established that SQL Server can run data crunching tasks very quickly. But the price that you pay comes in the form of the time required to have DriveWorks connect to MS SQL, give information to your SQL Server, and receive information back from the SQL Server. The ways to minimize these time penalties address those three specific steps.
To reduce the amount of time to connect to MS SQL (and transfer data), place your instance SQL Server on the network very close to the DriveWorks machine that will be making the requests. That will be the machine that is running your DriveWorks Service and your DriveWorks Live Server and your DriveWorks Autopilot. This could be to the point of even having them on the same machine, in some cases. To reduce the amount of time required to pass information back and forth between DriveWorks and MS SQL, look ahead and plan the way that you store your data and then try to bring back only the information that you need with as few connections as you can.
Use QueryDataValues Function
The QueryData function will return a list of values in a single column of a database result table. The QueryDataValues function, by contrast, will return the entire result table. In many cases, you can utilize QueryDataValues to bring back all of the information that you need instead of bringing back each column or value with a separate QueryData rule. At the same time, a smart query will only bring back the columns and rows that we need. Let SQL Server do the filtering and the column manipulation and even some of the calculation and string work.
And this is where the tricky bit comes in, getting MS SQL to perform the work for you before returning your results. There are only three things that you will ever want to do with tabular data. You can read data from your database to get lists of options or find values for certain combinations of parameters. You can write information to your database for DriveWorks or any other system to read. And you can update information that is already in the database when something changes (we’ll let that include deleting data). And all of this must be done with a special database language called SQL (Structured Query Language).
(Note: SQL syntax can be difficult to remember, if it is not something that you use frequently. This is why sites like w3schools are on so many peoples’ bookmark bars.)
Reading information from SQL uses the SELECT statement. QueryData and QueryDataValues are typically used to send SELECT queries to SQL Server and present the results back to DriveWorks. The other SQL statements can be called with the DBExecute function, and their names are pretty predictable. An INSERT statement inserts a row into a table, an UPDATE statement updates existing data, and a DELETE statement removes rows from a table. Note that these SQL statements are portrayed to act on a row as a whole. Database people say that a row is returned, added, updated or deleted. One thing that is important to note about DBExecute is that it returns the number of rows that were affected (deleted, added, updated) by the query. No data is returned with DBExecute.
The Power of MS SQL and Driveworks…
The power of MS SQL and DriveWorks lies in having SQL Server sort, filter, cross-reference, cleanse and augment the data before it gets back to DriveWorks. The QueryData wizard will walk you through the creation of a simple SELECT statement. One step in the process allows you to use a “Grouping”. These include Min (smallest value), Max (largest value), Count (returns only a number of rows that match the criteria) and Distinct (removes duplicates). SQL also has other groupings, like TOP (select the first n or n% of results), AVG (averages the resulting values) and SUM (totals the resulting values) to allow you to bring over only what you need.
SQL also provides string and arithmetic functions to process your data before it comes into DriveWorks. These functions are performed on each and every row or value before the values are returned. So your SELECT statement can manipulate strings to concatenate, grab portions of the string, replace parts of the string, change cases, reformat, trim, pad, and more. Your SELECT statement can also perform math on the value in a single column (ex. absolute value, rounding) or with values from multiple columns within a row.
The concept of a column alias, means that rather than just returning the values in a column named ProductSeries, you can create your own column during the query and return that to DriveWorks. In a recent DriveWorks implementation, Razorleaf used SQL SELECT queries to compile part numbers by pulling values from different columns (and different tables). So rather than returning columns with the product series number, and the configuration code, and the material code, and the color code, and the finish code, and the diameter and the length, we had SQL Server do all of the work (formatting the dimensions to three digits, adding dashes, looking up the finish codes for the given material and finish, etc.) and return a single column called PartNumber or a list of values to a QueryData.
Use the JOIN clause
Combining multiple tables of data is another forte of SQL Server. The JOIN clause allows SQL server to take multiple queries (tables or other SELECT statements) and merge them in a variety of ways. Joins can be used to pull extra information from another table. For example, we could have SQL Server retrieve each part in the table of parts, look at its material, go to the material properties table and find the density of that material, then return all of that information as one table.
JOIN clauses can also be used to combine similar tables. For example, I could have SQL Server return a table of all of the material names and their densities from the Steels table, the Aluminums table and the Plastics table, also returning the Alloy Designations from the Steels and Aluminums table and the mold style for the materials in the Plastics table. SQL Server will return a table with four columns for the material name, density, alloy designation, and mold style. When a table is missing a column, that row will simply return a NULL value for that column.
Most famously, JOIN clauses can be used to overlap tables. In some cases, this can be used to create a list of combinations between tables. For example, I can use a JOIN to bring a table of 20 materials and 42 coatings to create a table with 840 rows, all the unique combinations of material and coating. JOIN clauses can return common values, merge tables based on common values, or return unique rows. Any search for the term “SQL JOIN” will return a series of Venn diagrams showing all of the ways that a JOIN can be used.
A powerful way to use JOIN clauses lies in the creation of database views. A database view is simply a table that you have SQL create for you using a SELECT statement. This could be a simple SELECT statement that gives you a smaller view, only showing certain columns and filtering for certain conditions. Or this could be a large or complex view that uses many JOIN clauses to collect data from many tables, relating them and building a single table for DriveWorks to query. The primary benefit of a view is that you build the table ahead of time, in the database itself, and any changes made to any of the tables that the view references are automatically reflected in the view. This means that a complex query that you run often can be ready and waiting for you with no computational time required.
Database views can also be used to build very large data tables that you would otherwise not consider building or maintaining. Thinking back to the idea of compiling part numbers, people often wish that they had one table with all of their available part numbers. But you would never consider doing that because there are 893,742 of them. Remember that MS SQL is good at handling large data sets. And with just a few JOIN clauses in a view, you can have a Part Number column that combines all of the material codes in the materials table, all of the handle types from the hardware table (where the hardware type = “handle”), all of the caster types also from the hardware table (where the hardware type = “caster”), all of the fabric types from the fabrics table, and all of the colors in the colors table (but only when the value in the manufacturer column of the fabric matches a value in the available manufacturers column of the colors table). SQL Server can build hundreds of thousands of combinations in seconds. But even if it takes a few minutes, once that table is created, SQL Server will maintain it. So now DriveWorks has access to a complete, automatically updated table with every part number in your catalog.
Microsoft SQL Server is designed to manage, manipulate, aggregate and serve large amounts of data. That’s what it is designed to do. Structured Query Language (SQL) holds the key to making SQL Server mold our data into a more usable, more optimal form. In this part of our series, we looked at how we can leverage just a few of the functions inside of SQL to massage our data before bringing it into DriveWorks. And with database views, SQL Server allows us to create new forms of tables that we can use to simplify our DriveWorks queries even more. In the next part, we will take a look at how we can structure and filter our tables.