If you are one of the many who have not yet seriously embraced journaling but who now have a need to do so, it can be terrifying to take the first step. Why is that? Perhaps it stems from the stories that abound about those who have jumped into the deep water without adequate planning. For years, the only advice to the timid was simply to turn journal protection on, monitor how rapidly your journal receivers filled and your disk space was consumed, and then revise your strategy accordingly. What was missing was a way to dip your toes in the water and get a sense of the journal behavior, consumed disk space, and overhead before you jumped. Such advice is missing no longer.
A new workload estimating tool gives you a sense of the quantity of additional disk writes, disk traffic, and even Remote Journal communication line traffic before you actually enable journal protection for your files. Armed with this insight, you're in a better position to plan for the anticipated journal overhead.
The Pseudo Journal tool is not a capacity planning tool; it does not make specific hardware recommendations. Rather, it is intended to help you gain insight into the increased workload that will ensue when you enable journaling. You can feed the tool's estimated increase in disk writes into a planning tool, and then take the next step and verify your plans with a formal benchmark. The IBM System i Benchmark Center can help you with sizing, capacity, and performance planning (ibm.com/servers/eserver/iseries/benchmark/cbc/index.html).
Let's take a closer look at the fear that grips many new journal users. At its heart, journaling is a means by which i5/OS monitors the actions that your applications take against a set of database files and then beats your application to the punch. That is, the journal grabs a copy of the row image you are about to modify, tucks away an instance of that image, and sends the image to a journal receiver. To ensure that this copy is a reliable starting point for recovery, the row image (constituted as a journal entry) is written to disk before the matching database change takes place. This is traditional write-ahead logging and is the bread-and-butter of so-called "local" journaling. In short, the journal receiver houses a copy of the row image before the database does!
What is it that makes this behavior the source of performance fear and disk-consumption concern? The answer is simple: The journal receiver resides on a disk, which is a mechanical device. Its platter rotates, and any time spent waiting for disk rotation is idle time that slows your application. As a consequence, journaling introduces additional idle time and consumes extra space. It is that simple.
There are, however, steps you can take to help curb idle time. By reducing the quantity of additional disk writes, you reduce the amount of idle time. The first, and often most important, planning question to answer is how many additional disk writes journaling will introduce to your application. The second question, especially for a batch job, is how much additional elapsed time these extra disk writes will cost.
Because the primary source of journal-related disk writes is often the reaction to database operations, such as adding new rows to a table, updating existing rows within a table, or deleting rows, you can get a fairly accurate estimate of both the quantity of additional disk writes and the related quantity of disk traffic by counting the number of database adds, updates, and deletes an application performs. IBM has built a tool that does just that. The tool is called the Pseudo Journal tool, and it comes with a pair of matching CL commands: one to enable the tool (Start Pseudo Journal, STRPSJRN) and one to interpret and display the results (Display Pseudo Journal Data, DSPPSJDTA). If you prompt the STRPSJRN command, you see a screen that looks similar to Figure 1.
The tool is the result of work performed by a very special set of college students who spent 11 weeks in Rochester, Minnesota, working at the IBM development laboratory in the summer of 2007. They helped refine this tool and turn it from a concept into a working set of commands.
To formulate estimates about the quantity of additional disk writes and the related quantity of disk traffic, you have to tell the tool which files you want it to monitor. Thereafter, it simply wakes up periodically and takes a peek at odometers maintained deep within your files by the i5/OS microcode layer. That is, every physical file or SQL table has an underlying set of odometers: one for adds, one for updates, and one for deletes. Each time a row is added, updated, or deleted, the matching odometer value climbs. By comparing the current reading with the former reading, the tool can easily calculate the delta.
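The delta calculation is simple enough to sketch in a few lines of Python. This is only an illustration, not the tool's actual code: the `Odometers` type, the `delta` function, and the sample readings are hypothetical names and values chosen for the example.

```python
from typing import NamedTuple


class Odometers(NamedTuple):
    """One reading of a file's three counters (hypothetical type for illustration)."""
    adds: int
    updates: int
    deletes: int


def delta(previous: Odometers, current: Odometers) -> Odometers:
    """Return how far each counter climbed between two readings."""
    return Odometers(
        adds=current.adds - previous.adds,
        updates=current.updates - previous.updates,
        deletes=current.deletes - previous.deletes,
    )


# Made-up readings taken a few seconds apart for a single file.
earlier = Odometers(adds=100, updates=250, deletes=30)
later = Odometers(adds=112, updates=270, deletes=31)
activity = delta(earlier, later)  # 12 adds, 20 updates, 1 delete
```

Because the counters only ever climb, two readings are all you need to know how much database activity occurred between them.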
By default, once journal protection is enabled, every new add, update, or delete of a row deposits a matching journal entry, and every journal entry hightails it out to disk lickety-split. Knowledge of this behavior makes it easy to estimate the projected quantity of extra disk writes.
Similarly, armed with information regarding the row width for each physical file being monitored, and with some good hunches regarding how much extra descriptive metadata each journal entry will drag along, you can perform some simple arithmetic and predict the likely quantity of disk traffic (in bytes). Combine that with a gauge of the elapsed time between odometer readings, and you can predict an average quantity of bytes per unit of time. Add some insight regarding how remote journaling works and how many bytes of descriptive data it appends per set of journal entries, and you can similarly derive a good estimate of the ensuing remote journal communication traffic. Make a few assumptions about rotational delay for disks and response time for I/O Adapter (IOA) write cache, and you are on your way to estimating what the journal overhead and traffic volume will be for your application.
The tool takes these factors into account and makes these estimations for you. As you can see, the tool really is not magic, but rather a matter of physics and mathematics. Figure 2 shows a high-level flow diagram of how the tool operates.
The tool consists of two commands. You can find the two CL commands (one to identify the files of interest and the other to summarize the resulting observations), the matching software, as well as an installation and a usage tutorial at ibm.com/systemi/db2/journalperfutilities.html (scroll to the section called "Journal Planning and Sizing Tool Pseudo Journal"). The installation guide contains two tutorials, and you should take a peek at both. The installation tutorial takes you through the process of downloading the save file and installing the matching commands. Similarly, the usage tutorial walks you through an example of how to display the results.
With the tool in place, your next step is to identify the physical files you are contemplating for journal protection. In many cases, these are the files that your application modifies. Armed with an application that you suspect is the most aggressive modifier of such files, tell the Pseudo Journal tool the identities of the files it should monitor. Then run the application as the Pseudo Journal tool looks over your shoulder.
How many files should you monitor? Asking the tool to monitor a dozen or so of your most frequently modified files might be sufficient to help you get a handle on what the journal overhead and behavior will likely be. On the other hand, there might be applications for which you want the tool to monitor every one of your physical files residing within a designated library. It can do that! One caution: The more files you ask the tool to monitor, the more times it must reach down and extract odometer readings from every file in the list. That is, the tool awakes on a periodic basis. When it does, it performs a thorough inventory of all the files you asked it to monitor; it skips none!
That's why in addition to telling the tool which files to watch, you can also instruct the tool how often to wake up and take a peek. Trying to extract odometer readings from 10,000 files takes more elapsed time than doing so for only 10 files. Therefore, if you have a lot of files, you should lengthen the sampling interval so that the tool has a fighting chance of completing its work from the first engagement before it's asked to take a second set of readings.
As a rough rule of thumb, for every additional 50 files you ask the tool to monitor, give it a full second to sample all the odometers. Hence, wake it up no more often than every five seconds if you are monitoring 250 files.
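If you would rather compute the interval than work it out in your head, the rule of thumb reduces to one division. The sketch below assumes only the 50-files-per-second figure quoted above; the function name is made up for illustration, and rounding up to a whole second is a choice made here, not something the tool prescribes.

```python
import math

FILES_PER_SECOND = 50  # rule of thumb from the text, not a measured constant


def min_sampling_interval_seconds(file_count: int) -> int:
    """Smallest whole-second interval that gives the tool time to read every file."""
    if file_count < 1:
        raise ValueError("file_count must be at least 1")
    return math.ceil(file_count / FILES_PER_SECOND)


interval_for_250_files = min_sampling_interval_seconds(250)        # 5 seconds
interval_for_10000_files = min_sampling_interval_seconds(10_000)   # 200 seconds
```

Treat the result as a floor: a longer interval is always safe, while a shorter one risks asking for the next set of readings before the previous pass has finished.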
The next step is easy; you simply kick off your application and run it as usual. The Pseudo Journal tool monitors the database activity (i.e., rows added, rows updated, rows deleted) for the time periods you designate.
Assume that a 10-minute examination of database activity is representative of normal traffic. You then kick off the data collection phase and advise the Pseudo Journal tool how often to pop its head up, how many samples to take, and how to collect statistics. For example, if you want to plot disk journal traffic in five-second increments across a 10-minute range, you want 12 samples per minute for 10 minutes, for a total of 120 data points (Figure 3). It's just that simple to take an interactive measurement.
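The sample count is just the measurement window divided by the sampling interval. Here is a minimal sketch of that arithmetic; `sample_count` is a hypothetical helper, not a parameter or command of the tool, and the overnight figures are illustrative values only.

```python
def sample_count(duration_seconds: int, interval_seconds: int) -> int:
    """Number of whole sampling intervals that fit in the measurement window."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return duration_seconds // interval_seconds


# Ten minutes sampled every five seconds, as in the example above.
interactive_samples = sample_count(10 * 60, 5)        # 120 data points

# A hypothetical eight-hour overnight run sampled once a minute.
overnight_samples = sample_count(8 * 60 * 60, 60)     # 480 data points
```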
On the other hand, if you want to get a sense of the potential journal traffic generated by an overnight batch job, but you don't want to stick around until 2 a.m. to start collecting data, you can submit the tool to batch and let it work while you snooze (Figure 4). Overnight data collection starts at the designated time, and when you return the next morning, you have to do only the summarize/display step.
Under the covers, whether the Pseudo Journal tool is enabled or not, the operating system constantly maintains some statistics about the arrival rate of new rows, updated rows, and deleted rows. The Pseudo Journal tool is merely waking up at the intervals you designate and making a copy of these constantly growing counters. Each physical file keeps a separate set of counters. Therefore, the tool can break out the results on a per-file basis or summarize the results on an application basis.
For example, if at the time you kick off the Pseudo Journal tool, the counters for PF #1 are 12 adds, 14 updates, and four deletes seen thus far, and assuming that your application tends to add three new rows, update five rows, and delete one row every second, the first trio of measurements recorded is <12, 14, 4>. Five seconds later, the new statistics harvested are <12+(5*3)=27, 14+(5*5)=39, 4+(5*1)=9>. And 10 seconds into the run, you probably harvest values such as <42, 64, 14>.
Armed with a series of similar sampled settings of these internal counters, the Pseudo Journal tool can estimate the matching quantity of disk writes for a write-ahead logging behavior. In this example, after a 10-second run, the resulting estimate reveals that the traditional journal behavior needed to schedule 90 disk writes ([42 + 64 +14] - [12 + 14 + 4]= 90). Hence, for the application that you asked the tool to monitor, you can expect the system to average nine (90/10 = 9) additional disk writes per second with journaling enabled.
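The same arithmetic can be written out as a short Python sketch. It uses the three samples from the example and assumes, as the default journal behavior described earlier implies, one disk write per add, update, or delete. The function and variable names are made up for illustration and are not part of the tool.

```python
# (adds, updates, deletes) harvested at 0, 5, and 10 seconds into the run.
samples = [(12, 14, 4), (27, 39, 9), (42, 64, 14)]
INTERVAL_SECONDS = 5


def estimated_disk_writes(first, last):
    """One journal entry, and so one disk write, per add, update, or delete."""
    return sum(last) - sum(first)


elapsed_seconds = INTERVAL_SECONDS * (len(samples) - 1)    # 10 seconds
writes = estimated_disk_writes(samples[0], samples[-1])    # 120 - 30 = 90
writes_per_second = writes / elapsed_seconds               # 9.0
```

Only the first and last samples matter for the total; the samples in between become useful when you want to see how the rate varies over time.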
Can you handle this additional disk traffic? How much will it slow your application? If you have sufficient write cache in your disk controllers such that the IOA write cache can comfortably absorb this extra disk traffic, you can probably expect disk write response times (think of this as idle time for your application) of 90 * 0.3 microseconds during this 10-second period.
The Pseudo Journal tool performs these calculations under the covers for you. It also attempts to estimate the extra CPU load that occurs because of the presence of journal activity. For example, assume that the model of machine on which you're executing the tool concludes that the extra path length associated with depositing a journal entry requires 12 microseconds of CPU consumption. The tool then multiplies your 90 additional disk writes (one per journal entry produced) times 12 microseconds. The tool then compares that total against the total duration of the run to estimate the extra percentage of load placed on your CPU.
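To see how small that percentage is for the running example, here is a sketch of the calculation. The 12-microsecond figure is the illustrative value from the paragraph above, and the function is a hypothetical stand-in for whatever the tool does internally; it also assumes the whole run duration counts as available time on a single processor.

```python
def cpu_overhead_percent(journal_entries: int,
                         cpu_microseconds_per_entry: float,
                         run_seconds: float) -> float:
    """Extra CPU time spent depositing journal entries, as a percentage of the run."""
    cpu_seconds = journal_entries * cpu_microseconds_per_entry / 1_000_000
    return 100.0 * cpu_seconds / run_seconds


# 90 journal entries at 12 microseconds each during a 10-second run:
# 1,080 microseconds of CPU, or roughly 0.0108 percent of the run.
overhead = cpu_overhead_percent(90, 12, 10)
```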
In addition, if each row image were 100 bytes wide, and if your journal settings elected to collect matching descriptive data for each journal entry (e.g., who made the change, when did it occur, what program was used), each matching journal entry would probably consume 170 bytes on disk (100 for your row image, 70 for the metadata). During this 10-second run, your application would tend to populate 15,300 bytes (170 * 90) within your journal receiver. Again, the Pseudo Journal tool handles all these calculations for you. Now that you have a feel for anticipated journal behavior during the designated 10 seconds (and if you believe those 10 seconds were representative), you can extrapolate and estimate the total disk space for a longer run.
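Here is the byte arithmetic and the extrapolation step as a sketch. The 100-byte row and 70-byte metadata figures are the illustrative values from the paragraph above; the function names and the one-hour target are assumptions made for the example, and the extrapolation holds only if the 10-second sample really is representative.

```python
ROW_BYTES = 100       # row image width used in the example
METADATA_BYTES = 70   # descriptive data per journal entry used in the example


def receiver_bytes(entries: int,
                   row_bytes: int = ROW_BYTES,
                   metadata_bytes: int = METADATA_BYTES) -> int:
    """Bytes deposited in the journal receiver for a given number of entries."""
    return entries * (row_bytes + metadata_bytes)


def extrapolate(sample_bytes: int, sample_seconds: int, target_seconds: int) -> int:
    """Scale a representative sample up to a longer run (assumes a steady rate)."""
    return sample_bytes * target_seconds // sample_seconds


ten_second_bytes = receiver_bytes(90)                     # 170 * 90 = 15,300
one_hour_bytes = extrapolate(ten_second_bytes, 10, 3600)  # 5,508,000
```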
Although short measurement periods, such as the one outlined in this example, are nonintrusive and can give you a starting point for estimating journal overhead, real applications often have peaks and valleys. An application may add new rows for the first 10 minutes, turn its attention to updating rows for the next 15 minutes, and finally conclude by deleting some rows. The application may have periods when it primarily reads existing rows and updates very few. As a consequence, a mere 10-second sample may not suffice. In such an instance, whereas calculating total disk space consumption for the journal receiver may need only a starting and ending sample of the add, update, and delete counters, the effort to properly size disk arms and IOA write cache to be able to absorb peaks and valleys may justify collecting a large quantity of samples and then graphing the results. That way, you aren't sizing merely for averages but for peak times as well.
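The difference between sizing for the average and sizing for the peak is easy to see with a handful of samples. The sketch below uses made-up cumulative counter totals (adds plus updates plus deletes) taken every five seconds; the names and numbers are hypothetical and serve only to show the calculation.

```python
INTERVAL_SECONDS = 5

# Cumulative adds + updates + deletes at each sample (made-up values).
totals = [0, 50, 400, 420, 430]


def writes_per_interval(cumulative):
    """Difference between each pair of consecutive samples."""
    return [later - earlier for earlier, later in zip(cumulative, cumulative[1:])]


per_interval = writes_per_interval(totals)                   # [50, 350, 20, 10]
average_rate = sum(per_interval) / (INTERVAL_SECONDS * len(per_interval))  # 21.5 per second
peak_rate = max(per_interval) / INTERVAL_SECONDS             # 70.0 per second
```

In this made-up run, the peak rate is more than three times the average, which is exactly the kind of burst that a start-and-end measurement hides.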
The Pseudo Journal tool has a graphing option to let you do just that (Figure 5). By "sizing" your machine for the peak periods, you can rest comfortably knowing that you are ready to absorb the ensuing highs and the lows.
Unless you are already a savvy journal user, your first attempt to use the Pseudo Journal tool will likely leave you troubled and frightened. How can you possibly absorb all the extra disk writes the tool tells you to plan for? The short answer is that it's tough unless you know about the secret weapon.
Yes, there are indeed many instances in which the journal is tempted to hurry off to disk with a single journal entry tucked under its wing, but it doesn't have to be that way! There is another option, and a more efficient approach, known as Journal Caching, and this option is enabled by installing i5/OS Option 42. Savvy journal users know it well, and this option has saved their bacon on many occasions. i5/OS Option 42 is the chargeable option for enabling the secret weapon: Journal Caching.
For experienced journal users, a well-known performance tuning practice is to customize the journal environment to make each trip to disk highly productive. Journal Caching lets you achieve that objective.
There are two primary considerations during the planning stage:
Both principles are essential for optimal journal performance, and you need to consider both during the planning stage. But how can the Pseudo Journal tool help you with this analysis?
The Pseudo Journal tool estimates the rate of arrival of new bytes headed to disk (which may help you size your IOA write cache), but it also performs a "what if" calculation to compare behavior (in terms of disk writes) with and without the journal caching feature installed. By examining this portion of the tool output, you are better equipped to decide whether the extra cost of i5/OS Option 42 is justified for your shop. For many applications, the answer is often a resounding yes!
You can use the raw rate of arrival of journal image bytes (as revealed by the Pseudo Journal tool) during the sizing exercise to help select sufficient IOA write cache. Each disk arm serviced by the IOA can cache up to 100 MB, and 128 KB can be written from the cache to the disk surface with each disk rotation. It's primarily a matter of physics to calculate, based upon the rotational speed of the disk, how many disk arms and how much IOA write cache to configure.
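As a rough illustration of that physics, the sketch below turns the 128 KB-per-rotation figure into an idealized drain rate per disk arm and then into a minimum arm count. The rotational speed, the incoming byte rate, and the function names are assumptions chosen for the example, and the result is a best-case ceiling: it ignores seek time, competing disk traffic, and everything else a real sizing exercise must consider.

```python
import math

BYTES_PER_ROTATION = 128 * 1024  # 128 KB per rotation, taking 1 KB as 1,024 bytes (assumption)


def arm_bytes_per_second(rpm: int) -> float:
    """Idealized rate at which one disk arm can drain the write cache."""
    rotations_per_second = rpm / 60
    return BYTES_PER_ROTATION * rotations_per_second


def arms_needed(journal_bytes_per_second: float, rpm: int) -> int:
    """Minimum number of arms whose combined drain rate covers the arrival rate."""
    return max(1, math.ceil(journal_bytes_per_second / arm_bytes_per_second(rpm)))


# Hypothetical 15,000 rpm drives: 250 rotations per second, about 32.8 MB/s per arm.
drain_rate = arm_bytes_per_second(15_000)

# A hypothetical peak of 100 million journal bytes per second needs at least 4 arms.
arms = arms_needed(100_000_000, 15_000)
```

Feed the peak arrival rate, not the average, into a calculation like this one, so that the configuration can absorb the bursts.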
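It can be harder to wrap your head around the matter of employing main memory as a staging area, and that is why the Pseudo Journal tool returns two values: the estimated quantity of disk writes without journal caching and the estimated quantity with journal caching enabled.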
The magnitude of difference between these two styles of behavior (with and without journal caching enabled) is staggering for some environments. For example, Figure 6 shows an instance of output from the Pseudo Journal tool for an application that helps a movie rental store track its inventory. Notice the estimated benefit for this application when journal caching is enabled.
There might be times when the challenge is to "prune the roses." If you discover that far too much anticipated journal traffic is destined for journal A, you might be worried that the applications you are monitoring will soon feed the single journal far more rapidly than it can absorb the many disk writes. What should you do?
Turn to the Pseudo Journal tool, of course! Because it tracks the projected data arrival rate on a per-physical-file basis, you can use its collected data to identify which patch of roses is the most productive to isolate. Essentially, you are trying to reroute some of that traffic to journal B.
This approach makes sense when you conclude that a single journal simply cannot absorb new disk writes and matching journal traffic as rapidly as your application can produce them. When you reach this conclusion, it makes sense to employ two journals rather than one. But which physical files should you isolate from the rest? Though many factors may ultimately influence your decision (e.g., which files are tightly coupled, which files are pummeled at different times of the day), it can be helpful to narrow your choices from many files to just the one or two that generate the majority of the journal traffic. But which ones are they? Here, the Pseudo Journal tool can come to your aid as well. The secret is to tell the Pseudo Journal tool how you want the results rolled up.
The Pseudo Journal tool can summarize its findings in many ways. The *SUMMARY option on the DSPPSJDTA command rolls all collected data into one set of findings, oblivious to which file the journal traffic came from. This option might be the simplified view that you want to start with when you are initially sizing your total journal environment. But if you are engaging in a more sophisticated planning exercise, the *LIST option lets you break out the estimates of journal traffic on a per-file basis (i.e., it produces a list of files that shows the contributions of each separately). With this option, you can easily compare the estimates for how many bytes and how many disk writes file #1 contributes with those for file #2. The *LIST option makes the most sense when you're searching for a bully: a file that hogs the journal. Figure 7 shows the files sorted in descending order, beginning with the most aggressive contributor.
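If you copy the per-file estimates out of the tool, the hunt for the bully is a simple sort. The sketch below is not the tool's output format; the file names, byte counts, and function names are invented to show the idea of ranking files by their share of the journal traffic.

```python
# Hypothetical per-file estimates of journal bytes, keyed by file name.
estimates = {
    "ORDERS": 9_000_000,
    "CUSTOMERS": 400_000,
    "INVENTORY": 2_500_000,
}


def rank_by_journal_bytes(per_file):
    """Return (file, bytes) pairs, most aggressive contributor first."""
    return sorted(per_file.items(), key=lambda item: item[1], reverse=True)


def share_of_total(per_file, name):
    """Fraction of all estimated journal bytes contributed by one file."""
    return per_file[name] / sum(per_file.values())


ranked = rank_by_journal_bytes(estimates)
top_file, top_bytes = ranked[0]                     # "ORDERS", 9,000,000
top_share = share_of_total(estimates, top_file)     # about 0.76 of the traffic
```

A file that contributes most of the traffic on its own, as the hypothetical ORDERS file does here, is the natural first candidate for a journal of its own.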
If your planning objective is to establish a logical-replication-flavored high availability (HA) relationship between a source/production and target machine, knowing about disk writes on the source side is only your first step. Your whole motivation for turning on journal protection may actually be driven by a desire for a target replica that you can switch to if the production copy of your critical files becomes inaccessible.
In such an instance, what you have seen from the Pseudo Journal tool is only the first step. What you really want to see is an estimate of the corresponding remote journal traffic (i.e., the bytes that flow from the source side to the target side in a logical-replication-driven HA environment). The good news is that the Pseudo Journal tool anticipated your needs and can provide an estimate of both the average bytes per unit of time and the anticipated peak flow across this remote journal connection (Figure 8).
The Pseudo Journal tool represents a new tool for your toolbox. Equipped with this tool, you can approach journal-sizing engagements with confidence. No more back-of-the-envelope calculations for you.
Larry would like to acknowledge the assistance provided by the summer students who served as tool builders: Edgar Castro, Rogelio Ceja, Jorge del Rio, Tyler Kaye, and Aldo Ulises Vega.
After more than 30 years of experience leading the design efforts for System i journal support at IBM, Larry Youngren recently retired from IBM and now lectures and consults on high availability issues. You can e-mail Larry at journal_guru@yahoo.com.