In part 1 of this series, we looked at the reasons behind choosing Microsoft SQL Server to manage data with DriveWorks. And in part 2 of the series, we looked at the mechanisms available within DriveWorks to pull data from SQL. This installment will look at how to get information from DriveWorks into SQL, and we will get a bit more technical and open-ended, exploring some of the more advanced tools available within Microsoft SQL Server.
The SQL Server Data Export Output Document
DriveWorks provides a perfectly workable solution for writing information to a SQL table in the form of the SQL Server Data Export Output Document. Found in Stage 4: Output Rules > Documents – Data section, this output document provides the ability to write one or multiple rows to a single table. Rows can be inserted, updated, or skipped depending upon the results of the rules in the control fields. We can now dynamically drive the server, database, and authentication with rules.
There are a few important things to note about the output document. The first is that it is very dependent upon the schema (i.e. column design) of the table. If the table in SQL changes, you must update the document, so that it can adjust to the new columns or new column settings. Another important fact is that it tries its best to work with SQL to manage data types, but ultimately, it’s up to you to make sure that the data that you are sending is good. This means that you may need to use formatting functions like DateTime(), Value(), MRound(), etc. to ensure that you are sending valid data to SQL. SQL is very particular about its data types. When you create a field that is a string, you specify how long the string is. When you create a number column in SQL, you specify the size and precision of the number. If you try to send a number that is too large or has too many decimal places, the entire write will fail.
And this is where we run into the biggest hurdle with the output document. When an output document fails to write to SQL (typically from no fault of DriveWorks), you may get some information in the specification report, but you don’t get any immediate feedback that would allow you to react and rectify the situation. The Release Documents specification task does not currently support status output navigation, so the only way to tell if an output document has written to the database is to try to read the information back in.
It’s All Relative
The biggest change in the history of databases came with the introduction of relational databases. The concept of relational databases boils down to storing data in a number of smaller tables, rather than in one large table. This allows your data to be stored very efficiently, without repetition. It is definitely worthwhile to search out a more complete explanation of “database normalization”, and I highly recommend that you do so, at some point (just not now, because you’re reading this). But for simplicity’s sake, we can look at an example.
Consider something as simple as customer information. We have a customer and they have a company name and a contact name and phone number and an address. This makes for a handy table until we start to realize that we have multiple contacts for Razorleaf and each of those contacts has multiple phone numbers. And, in the other direction, many of those contacts have the same address. When we look at our table, we see the same company name and address repeated over and over. There are a few problems with this. First, when we discussed bringing in information to DriveWorks, we noted that we want to bring over as little information as we can because that is what causes our users to wait. This means that bringing over the same address again and again is a waste of our users’ time. Secondly, when we need to change the address, we need to change it in multiple places. This means that it will be more effort, and introduces the possibility of those addresses becoming desynchronized, when they should always be the same. If we take the address out and put it into another table, then we get a small table with each unique address only stored once. The price that we pay lies in complexity. Now, in order to retrieve our customer address list, we need to pull information from multiple tables. We also get a load of complexities in that we must have a record in the customer table before we can have an address in the address table, which makes it more complicated to add, edit and delete companies and addresses.
Working with fully or partially normalized databases will make things a lot more complicated. Let’s be really up front about that. But the reality is that it is the proper way to maintain your data. There are certainly arguments to be made for simpler, wide tables, but I will leave that decision in your hands to be argued on a case-by-case basis.
Dealing with Normality
Once you get into the world of normalized databases, you realize that many tasks require multiple steps. For example, in order to add a new customer, you would have to add a record to the customer table, then another record to the contacts table, then one to the address table, then a record the phone table. And deleting requires just as many steps, but in reverse.
This brings up two issues. The first issue is that your writes are typically dependent upon one another. You can’t add an address until there’s a company to which you can link that address. And in order to write to the address table, you need to know the correct link, i.e. key, for that company. And worse, if the write to the company table fails, then there really is no point in going on to write to any other tables.
So, when it comes to working with normalized databases, it turns out that the best practice lies in DriveWorks allowing SQL to do as much of the data handling as possible. We do this by having DriveWorks provide the information to SQL in one command, and allowing SQL to perform all of the steps that it needs to do to read, write, edit and delete records from whichever tables it requires.
This is all handled through the SQL stored procedure. Stored procedures are, in all senses of the word, programming. As a result, if your company has a SQL resource, the best scenario is to let that person handle writing of the stored procedures. There are plenty of online sources to help you write stored procedures, but they can be a bit tricky to troubleshoot.
Apart from, “let someone else write them”, the best practice that we’ve developed in our use of stored procedures is to make sure that your stored procedures return something. With the DriveWorks DBExecute command, you can have DriveWorks run a stored procedure, but all DBExecute returns is the number of records affected. Running DBQuery, instead, will allow your stored procedure to return a table of data.
We craft our stored procedures to return a table of data that we use to determine the success of the operation. SQL stored procedures support transactions and Try…Catch blocks, so in the unfortunate event of a problem, our stored procedures will ensure that the database is not partially updated, and information is returned about the error. While in a fortunate event, the changes are all applied at once, then the stored procedure returns a happy “error” code and the result of the operation. This result could be a new key created in one of the tables or the number of records deleted from another. Since DBQuery returns a DriveWorks array, we can use all of our standard DriveWorks table functions to work with the return data. In the interest of full disclosure, we actually do not use DBQuery. Instead, we have written a specification task for executing a stored procedure that returns a table of information. But regardless of how you get information back from SQL, you will have the ability to react to the success or failure of the SQL interaction before moving on.
Normalized databases are a powerful way to manage large and complex data. The important aspect of working with normalized databases is to let SQL do as much of the work as possible. This will reduce the number of calls that DriveWorks has to make to SQL, boosting performance. Plus, it reduces the number of areas where a problem can occur. SQL has a host of other tools, that I encourage you to explore, for automating or performing these steps, like triggers and functions and views (oh my!). All of these allow you to reduce the number of calls and the information that you have to pass from DriveWorks, boosting the performance of your DriveWorks implementation.
If you have any questions about DriveWorks, send us a note and we will connect you with our DriveWorks team!