Write Your Own Custom CSV Export Utility--Without APIs

Article ID: 57060

Every day, I get questions about CPYTOIMPF. For many situations it seems, CPYTOIMPF is almost, but not quite, the right tool for the job. I keep wanting to tell people, "If it doesn't work the way you want it to work, write your own tool that works your way." But of course I can't do that, at least not until I'm ready to teach them how to write their own!

This article attempts to do exactly that--show you a quick and dirty way to write your own tool for generating CSV files. I write the entire tool as a QShell script, which is an easy way to do it and obviates the need to delve into complex APIs. You can, of course, take my code and modify it to your heart's content and thereby create your own program for generating CSV (or similar) files exactly the way you want them to work.

The bulk of this article deals with how the QShell script works, so you can understand what it does and how to write your own. If you're not interested in understanding the inner workings, feel free to skip down to the section titled "Trying the Utility" near the end of the article.

The Basic Premise

This tool works on the premise that you can pretty easily write an SQL statement that generates a CSV document. For example, consider a physical file that's named CUSTMAS and has three fields, CUSTNO, NAME, and SALESREP. To generate a CSV from that file, you could run the following SQL statement:

Select Trim(Char(CustNo)
       ||',"'||trim(Name)||'","'||trim(SalesRep)||'"'
  from CUSTMAS

In SQL, the || operator means to concatenate data together. This relatively simple SQL statement outputs only one field from SQL's perspective, but that field is created by concatenating the customer number (after converting it to character and trimming any extra spaces) followed by a comma, followed by the customer's name in quotes, followed by another comma, followed by the sales rep's name in quotes.

The result of running that statement might be something like this:

400,"ACME INC","ROBERTS, BOB"
455,"CANDLE MARKETS","KLEMENT, SCOTT"
98114,"FREDONIA LIQUIDS","FRYE, EGGBERT"
... etc ...

To build your own tool that can generate and output a CSV file by using this SQL support therefore involves the following steps:

  1. Get a file name as a parameter.

  2. Get a list of the field names in that file.

  3. Generate an SQL statement from the list of fields. For character fields, add quotes around the field. For all fields, insert commas between the fields.

  4. Run the SQL statement so that the output goes to an IFS file.

QShell and the DB2 Utility

I chose QShell for this tool for several reasons:

  • It has a script language, which is typically used for "job control" purposes, similar to the way we use CL programs in traditional programming on the System i or the way you'd use a .bat file in MS-DOS.

  • Because it's more of a job control language, it's more easily understood by system managers or operations folks. If I wrote the same thing in RPG or C, you'd have to be a programmer to understand it.

  • The language syntax is well-suited to string manipulation.

  • It can easily run dynamic SQL statements with Qshell's db2 utility.

  • Working with IFS data from QShell is easy, and that task is very difficult from native CL.

My Qshell script starts like this:

#!/usr/bin/qsh

LIB=$1
FILE=$2
FIELDLIST=

QShell treats any line that starts with a # symbol as a comment. Anything, that is, except for the characters #! when they are the first two characters in the script. So what does #! do? Well, you see, Unix doesn't have well-defined object types like the System i does. All disk objects are stored in "files." They do not have a *PGM object type--so how do they know whether something is program code to be executed or a script to be executed? The answer is in the first two bytes of each disk object. The Unix environment reads the first two bytes and looks for a "magic number." Different types of programs start with different magic numbers. The magic number for interpreted shell scripts happens to be the same as the hex values that make up the characters #! -- so all shell scripts should start with the #! characters.

Following the #! is the path name (in IFS format) to the script interpreter. You can have different programs that interpret different scripts. The System i supports QShell scripts, of course. The PASE environment also comes with three different scripting languages. Perl and PHP are also script languages that can be run on the System i. So the path name that follows the #! at the top of the script tells the system which script interpreter to run so that it will be the right interpreter for the language you wrote your script in. In this case, /usr/bin/qsh signifies QShell.

If you're familiar with OCL, you might think of the $1, $2, $3, and so forth variables as analogous to the ?1?, ?2?, ?3? variables. When you code ?1? in OCL, it inserts the value of the first parameter passed to the OCL procedure. The same is true of $1 in QShell.

Anytime you see a dollar sign ($) character in a Unix shell script, it means "insert a variable here." The $1 and $2 variables are special variables that insert the first parameter from the command line and the second parameter from the command line, respectively. Consequently, the following code sets the LIB variable to the value of the first parameter, and the FILE variable to the value of the second parameter.

LIB=$1
FILE=$2

Notice that I didn't precede FILE or LIB with a dollar sign. That's because I didn't want to retrieve the current value of the LIB or FILE variable. I didn't want to write the output of those variables to that point in the script. Instead, I wanted to retrieve the values of the first and second parameters and store those values into the FILE and LIB variables.

Next, I do this:

FIELDLIST=

This creates the FIELDLIST variable, if it doesn't already exist, and it sets it to nothing (an empty variable).

When I'm writing code, I like to write a little bit of it and then test it to make sure I got that part right. That makes it easier for me to determine bugs as I go along, because I know which parts of my program work properly, and know which parts I've changed (and therefore are most likely to have problems). So, in this case, I'd write my code that loads the parameters as part of a simple script that just prints the parameter values on the screen. To do that, I type EDTF '/home/klemscot/mkcsv.sh'. This creates a file named mkcsv.sh in my home directory. Inside mkcsv.sh, I type the following code:

#!/usr/bin/qsh

LIB=$1
FILE=$2
FIELDLIST=

echo "You've selected $LIB/$FILE"

To try it out, I type STRQSH to start Qshell. I switch to the directory in which I placed the mkcsv.sh file and at the prompt, I type this:

./mkcsv.sh parm1 parm2 parm3

The output from this test should look like this:

You've selected MYLIB/CUSTMAS

Retrieving the List of Fields

You'll recall that I want to build an SQL statement that contains all the fields of a given file. To do that, I need to know what those fields are! Fortunately, the system contains a file called SYSCOLUMNS, which contains all the field names in all the files on the system. It also contains their data types, which I need in order to determine whether quotes should or shouldn't be placed around a given field in the CSV output.

From the Qshell command prompt try the following command:

db2 "select sys_cname, coltype from qsys2.syscolumns where sys_dname='your lib here'
and sys_tname='your file here'

The output of this command should look something like this:

 SYS_CNAME  COLTYPE      
 ---------- --------     
 CUSTNO     NUMERIC
 NAME       CHAR
 SALESREP   CHAR

    3 RECORD(S) SELECTED.

As you can see, this has the field names and their data types. But unfortunately, it also has some unnecessary extra stuff. It has headings above the returned fields, and it has a record count that would only be in my way. To get rid of this extra stuff, I change the SQL statement to insert a ! character at the start of each record returned. Then I use another tool, called grep, to specify that it return only lines that start with that ! character. Finally, I use a third QShell tool, cut, to chop off the extraneous ! character from the start of the record.

The output from the db2 command is merely written to the stdout of the program, so if you run this script now, as-is, it should show the results on the screen.

Here's my QShell script so far:

#!/usr/bin/qsh

LIB=$1
FILE=$2
FIELDLIST=

get_fields() {
   db2 "select '!'||sys_cname||' '||coltype from qsys2.syscolumns \
          where sys_dname='$LIB' and sys_tname='$FILE'" \
    | grep '^!' \
    | cut -b 2-
}

get_fields

The phrase get_fields() { indicates the start of a "function" or "subprocedure" or "subroutine" type of tool. Everything between the { symbol and the next } symbol is considered part of the get_fields routine.

It's also worth noting that lines that end in the \ character are continued on the next line, much like when you end a line of CL code with the + character.

When I use get_fields at the bottom of my script, I'm actually running the subroutine that precedes it. So without the cut, the preceding script should output the following:

!CUSTNO NUMERIC
!NAME CHAR
!SALESREP CHAR

The cut utility cuts characters out of the input strings. In this case, I've specified -b 2-, which tells cut that it should output only byte 2 and higher from each line of text. Effectively, I've just removed byte 1 (the exclamation point) from my SQL output.

Whenever you see multiple QShell commands separated by a single | character, it means that the output of the previous tool will be sent as input to the next tool. This is called a pipe. So, in the preceding example, the output of the db2 tool is piped to the grep tool, which removes any records on which the start of the line isn't a ! symbol. The grep tool's output is then piped to the cut tool, which chops off the starting ! character from each line. Any data left is printed on the screen, which lets me debug my script at this point--I can see what it's outputting. At this point, the output should be the list of fields and data types and nothing else.

Building an SQL Statement

Now I add another subroutine to my QShell script. This subroutine reads the list of fields and types (from the preceding section) and uses them to build the field list used in the SQL statement.

To do that, I use the QShell read tool to read the output of the get_fields routine. I do that in a loop that looks like this:

    while read field type; do
       # add field to list of SQL fields
    done

The while statement is a loop. The "while" calls a program (in this case, the read utility), and if that program exits normally, it executes the code between the do and the done, then calls the program again, and if it still exits normally, it executes the code between the do and the done again. It continues this loop until the read utility does not end normally but instead returns a failure code. With that in mind, you can see how the preceding code will read all the data from this routine's input source, one record at a time.

The words "field" and "type" in the preceding loop are parameters passed to the read utility. This tells the read utility that the first thing it reads from each record should be stored in a variable named field. The second thing it reads should be stored in a variable named type. The idea here is that the output of the get_fields() routine that lists the field and type will be read, one record at a time, into these two variables.

Here's the script so far. If I run it, it shows me the list of fields that I want to use in an SQL statement:

#!/usr/bin/qsh

LIB=$1
FILE=$2
FIELDLIST=

get_fields() {
   db2 "select '!'||sys_cname||' '||coltype from qsys2.syscolumns \
          where sys_dname='$LIB' and sys_tname='$FILE'" \
    | grep '^!' \
    | cut -b 2-
}


build_field_list() {

    FIELDLIST="'!'"

    while read field type; do

       case "$type" in
       CHAR)
         QUOTE=Y
         ;;
       WCHAR)
         QUOTE=Y
         ;;
       VARCHAR)
         QUOTE=Y
         ;;
       *)
         QUOTE=N
         ;;
       esac

       if [ "$QUOTE" = "Y" ]; then
          FIELDLIST="${FIELDLIST}||','||'\"'||trim($field)||'\"'"
       else
          FIELDLIST="${FIELDLIST}||','||trim(char($field))"
       fi

    done
}

get_fields | build_field_list
echo ${FIELDLIST}

As you can see, I've piped the output of the get_fields routine to the build_field_list routine. That way, the list of fields and their types becomes the input of build_field_list, and therefore the while loop will read that list of fields.

Also, you may have noticed that I sometimes have ${FIELDLIST} in my code. When I code ${FIELDLIST} it does the same thing as $FIELDLIST. Remember, the dollar sign always inserts the value of a variable--so $FIELDLIST would insert the value of the FIELDLIST variable. Likewise, ${FIELDLIST} does the same thing; it inserts the value of the FIELDLIST variable.

If they do the same thing, why is there an alternative syntax? Consider the following example:

$day_of_week=mon
echo "$day_of_weekday"

The intention here was to output the string "monday" by inserting the value of the day_of_week variable immediately before the word "day." Unfortunately, it didn't work, because the system is now looking for a variable named day_of_weekday. That's where the braces come in. With the alternate syntax, I can write it like this:

$day_of_week=mon
echo "${day_of_week}day"

In this case, "monday" is successfully output, because the {} characters make it clear where the variable name stops and where the literal text begins. So now it's looking for a variable named day_of_week and is following it with the string "day".

Anyway . . . back to the CSV script. If you run the script that I've shown so far, using my sample CUSTMAS file, you get the following result:

'!'||','||trim(char(CUSTNO))||','||'"'||trim(NAME)||'"'||','||'"'||trim(SALESREP)||'"'

It's not pretty, but it's exactly what I need to insert into my SQL statement to make it generate the CSV formatted output. Now that I have that data loaded into my FIELDLIST variable, all that's needed is to run an SQL statement that uses that field list to format an actual database file into CSV format.

make_csv() {
   RUNSQL="select ${FIELDLIST} from ${LIB}.${FILE}"
   echo $RUNSQL
   db2 "$RUNSQL" \
   | grep "^!" \
   | cut -b 3-
}

This subroutine takes the FIELDLIST variable and uses it in an SQL statement. The SQL output is piped to grep to remove the column headings and record count and then to the cut utility to remove the leading exclamation point, as I explained earlier.

After adding that subroutine, I change the last two lines of the script so that they call this subroutine instead of simply echoing the field list. Those two lines now look like this:

get_fields | build_field_list
make_csv

Trying the Utility

You can download my script, named mkcsv.sh, from the following link:
http://www.pentontech.com/IBMContent/Documents/article/57060_633_MkCsv.zip

Once you've downloaded and unzipped it, you need to upload the mkcsv.sh code to a location in your IFS. For testing purposes, I put it in my home directory, which is /home/klemscot, but in production, I'd install it in the /usr/local/bin directory of my IFS.

Now enter QShell by typing the following command:

STRQSH

Then, add /usr/local/bin to your QShell PATH by typing the following:

export PATH=$PATH:/usr/local/bin

If you use a directory other than /usr/local/bin, make sure you use your directory name in the PATH statement.

Now that you've installed the mkcsv.sh script, you can run it from within QShell like this:

mkcsv.sh my-lib my-file > my-ifs-output-file

For example, if I have a file named CUSTMAS in the ACCTLIB library and I want to output it to a file named /tmp/acct-custmas.csv, I type:

mkcsv.sh ACCTLIB CUSTMAS > /tmp/acct-custmas.csv

If you want to append to the end of an existing IFS file (instead of replacing it), you can replace > in these examples with >>.

If you want to create the CSV file with a particular CCSID, make sure the IFS file doesn't already exist, and then type the following:

touch -C 1252 /tmp/acct-custmas.csv

The touch utility creates an empty file that's already marked with a CCSID. In the preceding example, I used CCSID 1208. When the mkcsv.sh script writes data to this file, it automatically translates it to 1252 as the data is written.

If you'd like to run this QShell script (or any other QShell tool) from a CL program, you can do so with the CMD parameter of the STRQSH command. For example:

STRQSH CMD('/usr/local/bin/mkcsv.sh ACCTLIB CUSTMAS > +
            /tmp/acct-custmas.csv')

I hope that with this script as an example, you'll be able to generate your CSV files exactly the way you want them, every time!

ProVIP Sponsors

ProVIP Sponsors