An Introduction to DB2 for OS/390 Version 7 >>
 
By Craig S. Mullins

Well, IBM is hard at work on the next version of mainframe DB2 — namely DB2 for z/OS Version 8. Although V8 has been out for several months now on Linux, Unix, and Windows platforms, it was just recently announced in January 2003 for mainframe platforms. There is a reason for this — DB2 for z/OS Version 8 is a huge and complex beast.

This article will provide you with an overview of the major features and functions of DB2 for z/OS V8. Keep in mind, though, that this short article is merely an overview of some of the great features you can expect in DB2 V8. IBM has made available a Version 8 Technical Preview document that is almost 300 pages long. With that in mind, let’s take a look at some of the major components of this new version.

Architecture

One of the biggest impacts of V8 will be the requirement to be running a zSeries machine and z/OS v1.3 (or greater). DB2 V8 will not support old hardware nor will it support OS/390. Additionally, DB2 customers must migrate to V7 before converting to V8. There will be no IBM-supported capability to jump from V6 (or an older version) directly to V8 without first migrating to V7.

Owing to these architectural requirements, DB2 will have the ability to support large virtual memory. This next version of DB2 will be able to surmount the limitation of 2GB real storage that was imposed due to S/390’s 31-bit addressing. Theoretically, with 64-bit addressing DB2 could have up to 16 exabytes of virtual storage addressability to be used by a single DB2 address space. Now there is some room for growth!

Broader usage of Unicode is another architectural highlight of DB2 V8. V7 delivered support for Unicode-encoded data, but V8 forces its use. If you do not use Unicode today, you will when you move to V8. This is so because the DB2 system catalog will be implemented using Unicode. In fact, the DB2 catalog has some dramatic changes including some table spaces with larger page sizes and long names.

Actually, support of long DB2 object names is another significant architectural change in V8. DB2 V8 significantly increases the maximum length of most DB2 object names. For example, instead of being limited to 18 byte table names, you will be able to use up to 128 bytes to name your DB2 tables; the same limit applies to most DB2 objects and special registers including views, aliases, indexes, collections, schemas, triggers, and distinct types. The limit for columns is 30 bytes, a table space is still 8 bytes, and packages are still 8 bytes, unless it is a trigger package, which can be 128 bytes. This brings a lot of flexibility, but also a lot of reworking of the DB2 catalog tables.

One such reworking requires the use of table spaces with 8K, 16K, and 32K page sizes. Therefore, the system catalog in DB2 V8 will require use of the BP8K0, BP16K0, and BP32K buffer pools.

Administration

As with each new version, DB2 V8 offers new functionality that helps DBAs administer and manage their databases and subsystems. This release contains many enhancements to the DB2 objects that DBAs must manage including sequence objects, variable length index keys, expanded partitions, new types of partitioned indexes, new partition management, and materialized query tables (also known as automated summary tables). Also, index keys can comprise up to 2000 bytes — so more data can be indexed using a single index. Each of these features delivers more functionality but also presents implementation and maintenance challenges.

Another useful administration feature of DB2 V8 is known as Online Schema Evolution. Today, there are many types of DB2 changes that require the DBA to DROP and then re-CREATE the object in order to implement the change. Schema evolution enables the DBA to make more types of changes to database objects using native DB2 features. For example, DBAs will be able to add and rotate partitions of partitioned table spaces and to expand the length of numeric and character columns using the ALTER statement. Basically, schema evolution provides more support for a variety of changes to be made directly using ALTER statements.

Other administration highlights include support for up to 4096 partitions per partitioned table space, row-level security, session variables (for global security), and Data Partitioned Secondary Indexes (usually shortened to DPSI and pronounced “dipsy”). DPSIs are significant because they are geared to resolve one of the biggest management headaches encountered by DB2 DBAs — dealing with non-partitioned indexes (NPIs) on tables in a partitioned table space. A DPSI is basically a partitioned NPI.

So, with a DPSI the index will be partitioned based on the data rows. The number of parts in the index will be equal to number of parts in the table space — even though the DPSI is created based on columns different from those used to define the partitioning scheme for the table space. Therefore, partition 1 of the DPSI will be for the same rows as partition 1 of the table space and partition 1 of the partitioning index, and so on.

These changes to DB2 V8 provide many benefits including:

      • The ability to cluster by a secondary index
      • The ability to drop and rotate partitions easily
      • Potentially less overhead in data sharing

Non-partitioning indexes (NPIs) historically have caused DB2 performance and availability problems, especially with utilities. But DPSIs help to solve many of these problems. With DPSIs there is an independent index tree structure for every partition. So utilities do not have to share pages or index structures. And logical drains on indexes are now physical at the partition level. This helps utility processing in several quite useful ways. For example, you can run a LOAD by partition with no contention because the DPSI is partitioned the same way as the data and the partitioning index. And when reorganizing with DPSIs there BUILD2 phase is not needed. Even your recovery procedures may be aided because you can copy and recover a single partition of a DPSI.

But DPSIs are not magical objects that solve all problems. Indeed, changing an NPI to a DPSI may cause some queries to perform worse than before. Because the DPSI partitions the data by the partitioning key, some queries will need to examine multiple partitions of the DPSI as opposed to the single NPI it previously used. On the other hand, if the query has predicates that reference columns in a single partition only then performance will likely improve because only that one DPSI partition needs to be probed.

Of course, not every index on a partitioned table should be a DPSI. So, as we will need to begin analyzing our data access and utility processing requirements to determine when to use NPIs versus when to use DPSIs. A single partitioned table can have both DPSIs and NPIs defined on it. You will get better utility processing with DPSIs. But you will have to examine your queries to determine predicate usage and the potential performance impact of DPSIs before using them.

The new security features are interesting, too! With row-level security, DB2 can support applications that need a more granular security scheme. For example, you might want to set up an authorization scenario such that employees can see their own data but no one else’s. To complicate matters somewhat, you might also want each employee’s immediate manager to be able to see his payroll information as well as all of his employee’s data, and so on up through the org chart. Setting up such a security scheme is next to impossible with current DB2 versions, but it is straightforward using row level security in DB2 V8.

Finally, Version 8 delivers clustering and partitioning independence. This means that you can cluster data in a different order and based on different columns than the data is partitioned on. In other words, the clustering index key can be different than the partitioning index key. Actually, you will not need a partitioning index because a partitioned table space can be partitioned without specifying an index to accomplish the partitioning.

Programming and Development

Numerous SQL and programming features are being added to DB2 V8 that make the job of programming both easier, but at the same time, more complex. This may sound like a paradox, but it is true. Great new features will make programming simpler once they are learned, but it will take time and effort to train the legions of DB2 developers on this new functionality, and when and how best to use it.

For example, some of the V8 SQL improvements, will include the ability to get diagnostic information, true SEQUENCEs, dynamic scrollable cursors, scalar fullselect, multiple DISTINCT clauses, qualified column names on the SET clause of INSERT and UPDATE statements, the ability to mix EBCDIC, ASCII, and Unicode columns in a single SQL statement, and the ability to SELECT from an INSERT statement.

In addition, V8 will offer significant changes to the SQL system limits. Firstly, as we have already mentioned, DB2 will now offer long name support for database objects. But it does not stop there. DB2 V8 expands the maximum length of SQL statements to support up to 2 megabytes. This is a major change that permits much more complex SQL statements to be written, optimized, and run within DB2. Additionally, V8 increases the length of literals and predicates to 32K and will support joining up to 255 tables in a single SQL statement.

Actually, that last one has caused some confusion, so let’s clarify things. Prior to DB2 V6, the limit for tables in a SQL statement was 15. DB2 V6 actually increased the limit for tables in an SQL statement from 15 to 225, but in a restricted manner: each SQL statement could consist of up to 15 query blocks each directly or indirectly identifying 15 base table references. The grand total number of tables for a single SQL statement was 225, however no query block can exceed 15 base table references whether direct or indirect. DB2 V8 removes this restriction and each SQL statement can reference up to 225 tables in total. The limit has been raised to such a high number to accommodate ERP vendors such as Peoplesoft and SAP, whose applications were designed originally for other RDBMS packages, such as Oracle, that have higher limits than DB2. Just because the limit has been increased does not mean you should write queries that access such a large number of tables. The performance of such queries will likely be poor and difficult to manage.

The ability to SELECT from an INSERT statement is an intriguing new feature. To understand why we first need to present some background data. In some cases, it is possible today to perform actions on an inserted row before it gets saved to disk. For example, a BEFORE TRIGGER might change data before it is even recorded to disk. But the application program will not have any knowledge of this change that is made in the trigger. Identity columns and user-defined defaults have similar effects. What if the program needs to know the final column values? Today, this is difficult and inefficient to implement. The SELECT FROM INSERT syntax in DB2 V8 solves this problem. It allows you to both insert the row and retrieve the values of the columns with a single SQL statement. It performs very well because it performs both the INSERT and the SELECT as a single operation. Consider the following example:

     SELECT COL5
     FROM
     INSERT (COL1, COL2, COL5, COL7) INTO SAMPLE_TABLE
     VALUES('JONES', 'CHARLES', CURRENT DATE, 'HOURLY');

The data is inserted as specified in the VALUES clause, and retrieved as specified in the SELECT. Without the ability to select COL5, the program would have no knowledge of the value supplied to COL5, because it was assigned using CURRENT DATE. With this new syntax the program can retrieve the CURRENT DATE value that was just inserted into COL5 without adding overhead.

One of the most intriguing new features of DB2 for z/OS V8 is the ability to code recursive SQL. A recursive query is one that refers to itself. The best way to quickly grasp the concept of recursion is to think about a mirror that is reflected into another mirror and when you look into it you get never-ending reflections of yourself. This is recursion in action.

Recursion is implemented in DB2 using common table expressions (CTEs), which also are new to DB2 V8. A CTE can be thought of as a named temporary table within a SQL statement that is retained for the duration of that statement. There can be many CTEs in a single SQL statement but each must have a unique name. A CTE is defined at the beginning of a query using the WITH clause.

Let’s look at an example to clarify this topic. Consider the following data that represents a management hierarchy:

MGR_ID EMP_ID EMP_NAME
-1 1 DAN
1 2 BILL
1 3 CRAIG
1 4 NEIL
2 5 BECKY
3 6 TERRY
6 7 CHRIS
6 8 LARRY

The MGR_ID for the top-most node is set to some value indicating that there is no parent for this row, in this case –1 is used. Now that we have loaded the data we can code a query to walk the hierarchy using recursive SQL. Suppose we need to report on the entire organizational structure under CRAIG. The following recursive SQL using a CTE will do the trick:

WITH EXPL (MGR_ID, EMP_ID, EMP_NAME) AS

(

SELECT ROOT.MGR_ID, ROOT.EMP_ID, ROOT.EMP_NAME

FROM   ORG_CHART   ROOT

WHERE  ROOT.MGR_ID = 3
 

UNION ALL
 

SELECT CHILD.MGR_ID, CHILD.EMP_ID, CHILD.EMP_NAME
 FROM EXPL PARENT, ORG_CHART CHILD
 WHERE PARENT.EMP_ID = CHILD.MGR_ID
 )
 

SELECT DISTINCT MGR_ID, EMP_ID, EMP_NAME
 FROM EXPL
 ORDER BY MGR_ID, EMP_ID;

The results of running this query would be:

MGR_ID EMP_ID EMP_NAME
1 3 CRAIG
3 6 TERRY
6 7 CHRIS
6 8 LARRY

Let’s break this somewhat complex query down into its constituent pieces to help understand what is going on. First of all, a recursive query is implemented using the WITH clause (using a CTE). The CTE is named EXPL. The first SELECT primes the pump to initialize the “root” of the search. In our case, to start with EMP_ID 3, that is LIL BOSS.

The next SELECT is an inner join combining the CTE with the table upon which the CTE is based. This is where the recursion comes in. A portion of the CTE definition refers to itself. Finally, we SELECT from the CTE. Similar queries can be written to completely explode the hierarchy to retrieve all the descendants of any given node.

Recursive SQL can be very elegant and efficient. However, because of the difficulty developers can have understanding recursion, it is sometimes thought of as “too inefficient to use frequently.” But, if you have a business need to walk or explode hierarchies in DB2, recursive SQL will likely be your most efficient option. What else are you going to do? You can create pre-exploded tables, but this requires denormalization and a lot of pre-processing which will not be efficient. Or you might write your own code to walk a hierarchy. This, too, is fraught with potential problems. You will probably retrieve more data than you need, causing inefficient I/O. And how will you assure that your code is more efficient than DB2?

If every row processed by the query is required in the answer set (“find all employees who work for LIL BOSS”), then recursion will most likely be quite efficient. If only a few of the rows processed by the query are actually needed (“find all flights from Houston to Pittsburgh, but show only the three fastest”) then a recursive query can be quite costly. The bottom line is that you should consider recursive SQL when business requirements call for it. But be sure that suitable indexes are available and always examine your access paths.

Also, as noted in the initial architecture section, 64-bit virtual addressing will greatly increase the amount of memory available to DB2. And IBM is making major enhancements to the internal SQL control block structures, so that DB2 will use memory more efficiently. So more memory, used more efficiently, should translate into the more efficient execution of DB2 SQL.

And, as with every previous new DB2 version, IBM is making significant enhancements to improve application performance. DB2 V8 optimization enhancements are scheduled to include sophisticated query rewrite capabilities to support materialized query tables, sparse indexing to improve star join performance, support for parallel sort, and better support for queries with data type and length mismatches which would have caused less efficient access paths in previous releases.

For Java programmers DB2 V8 offers expanded functionality in the form of support for both Type 2 and Type 4 Java drivers. Both will be updated to support the JDBC/SQLJ 3.0 standard which brings enhanced support for things like SAVEPOINTs and WITH HOLD cursors, as well as improvements to connection pooling, and a long list of other expanded features.

And finally, in DB2 V8 more XML support is being pushed into the DB2 engine. This should include support for some built-in XML publishing functions such as XMLELEMENT and XML2CLOB (among others).

Synopsis

DB2 V8 is the most significant new version of DB2 ever in terms of new functionality and the amount of new engine code that IBM is writing. This should result in a bevy of improvements that will make our databases and applications more reliable, more available, and more efficient. And as users of DB2 that should excite us all! Stay tuned to z/Journal for more in-depth coverage of DB2 V8 over the course of this year.

--

Craig S. Mullins is a data management strategist and consultant with Mullins & Associates. He has 20 years of experience dealing with data and database technologies. He is the author of the books Database Administration: The Complete Guide to Practices and Procedures and the best-selling DB2 book, DB2 Developer’s Guide (currently in its fifth edition). Craig can be reached via his Web site at www.craigsmullins.com or at craig@craigsmullins.com.


Contributors : Craig S. Mullins
Last modified 2005-04-25 11:21 AM
Home