Table of Contents
DFsqlload provides a convenient method for migrating data from the proprietary DFdiscover storage to storage in a relational database. Although this link is not maintained in real time, the relational side can be synchronized with the proprietary side whenever required. Data integrity is maintained in that only changes made to the data on the validated DFdiscover side are kept. These changes are reflected in the relational copy only after synchronizing with DFsqlload. DFsqlload can be scheduled using cron or run interactively as required. In this way, snapshots of the database at a known time can be generated and used for whatever purpose the user deems important, much in the same way as DFsas provides snapshots of the database in SAS format.
DFsqlload is only useful in situations where one of the following four database products are being used or are planned on being used in a particular computing environment.
MySQL - a common, multi-platform, open source relational database product.
PostgreSQL - another multi-platform, open source relational database product that has more sophisticated features than MySQL, which may make it better suited for server environments.
Oracle - commercial relational database product.
Microsoft SQLServer - commercial relational database product.
All four products are network-aware, and provide native as well as JDBC and ODBC-based connectivity from client applications running on any platform.
DFsqlload is one of three data export utilities available in DFdiscover. DFsqlload provides an easy to use export facility for relational databases. DFexport.rpc is a flat-file data export utility, and DFsas provides DFdiscover export capability for the SAS environment.
All methods can be used at any time and produce a true equivalent to the data stored in DFdiscover at the time the export is performed. This chapter will focus on how this is accomplished with relational databases.
There are many applications written for relational databases for many purposes. The two main applications benefiting from DFsqlload are reporting and analysis. Writing custom reports for DFdiscover is possible, but can take effort compared to the many report-writing and spreadsheet applications with relational database connectivity on the market today. For Windows users, there is Microsoft Office, Crystal Reports, Visual Basic, to name just a few of the applications available. UNIX users have OpenOffice, many java-based report writers, and Perl at their disposal. Mac users in most cases can pick the best from both worlds. There are hundreds of applications out there, some of which you may already be using.
DFdiscover is a validated system. The ways to get data into DFdiscover have been rigorously tested and proven accurate. Providing for one-way movement of data out of DFdiscover allows DFdiscover to maintain the integrity of the DFdiscover database by limiting the ability of outside processes to compromise the validity of the system.
Plates: DFdiscover keeps all data from a given form type together as one record type or plate. There may be, for example, blocks of information on a particular plate that repeat and would be better modeled with separate tables. DFsqlload makes no attempt to do this. When DFsqlload is run, each plate becomes its own relational table.
Fields: DFdiscover plates are made up of fields, each field storing data according to the layout of the plate on the paper CRF form. When a DFdiscover study is setup, you will recall that each plate is imported into the setup tool as a PS or PDF representation of the printed page that will ultimately be used to collect data for the study you are designing. The same fields defined during DFdiscover setup are used to define column names for the relational tables DFsqlload will create. There are some differences in the rules for naming columns in relational tables, but these differences are handled by DFsqlload as it has its own set of rules for doing so. DFsqlload provides two options for the creation of tables. One is to export all data as if it were character string data. The other option is to create columns of the same data type as the source fields.
Coding: DFdiscover has the ability to use multiple missing value codes for different purposes. Data values with corresponding labels may be used to represent different discreet conditions. Dates may be incomplete and as such, have rules for imputation. None of these concepts are an integral part of relational databases and require different ways to handle these situations as they arise. DFsqlload provides options for passing both coded values and labels to the relational side.
Schemas and Tablespaces: DFdiscover keeps the information for any given study together under one study number. There is no sharing of data between studies, so in cases, for example, where two studies share the same sites, the sites database is duplicated for the two studies. Each study database is independent of the others. Some relational database products allow for multiple groups of potentially similar data in the same database through use of the concept of the database schema. DFsqlload is aware of these possibilities and supports multiple schemas if the relational database product supports them. Tablespaces provide for another layer of hierarchy in data modeling and are supported as well.
It is very simple to create a relational database equivalent to a DFdiscover study once a working relational database environment is established on your network. On the relational database side, you will need to know the following:
What relational database product am I using - MySQL, PostgreSQL, MS SQLServer or Oracle
What is the name of the server hosting the relational database
In the case of Oracle, what tablespace has been assigned to me for this purpose by my DBA.
In the case of Oracle and Postgres and MS SQLServer, what database will be used to store my DFdiscover schemas
What are my login credentials for the relational database server I am using.
With the answers to these questions, simply run DFsqlload as shown and a relational database equivalent to your DFdiscover study will be ready to use. Using Oracle (the default) as an example, the command would be:
% DFsqlload bluto:mystudies:foo.bar:olive:popeye 254
which will create a snapshot of study 254 in the mystudies database,
in the foo schema with the tablespace name bar
on the server bluto with user credentials username
olive and password popeye.
By default, all columns are typed as closely as possible to the data types used by DFdiscover during setup. All dates are imputed. No coded values are translated. No missing codes are translated. No optional tables are created. If these defaults are acceptable, then this is all you need to know.
The default settings for DFsqlload should be adequate for most users. However, if the purpose of using DFsqlload is to provide a relational database view of a study to a set of specialized users with different requirements, then you might want to get as much information as possible from the DFdiscover side to the relational database side. For this scenario, you will need to use some of the options DFsqlload offers.
Answers to the following questions will affect how DFsqlload is used. See also the reference page for DFsqlload elsewhere in this guide.
Q: | What is the target database type? |
A: |
DFsqlload supports four popular SQL server platforms - Oracle,
PostgreSQL, MS SQLServer and MySQL. Your target
database will need to be one of these four types. Once this is known,
DFsqlload is directed to create
a set of relational tables for a given database type using the
|
Q: | Is it important to have the data type of each column in my relational tables match as closely as possible the data types for each field in my DFdiscover plates? |
A: |
The default behavior of DFsqlload is to preserve data typing. The older
version of DFsqlload that was shipped with DFdiscover 3.7 and 3.7.001 did not
preserve data typing and all user fields were converted to type
To preserve Data Typing - omit this option or use |
Q: | My DFdiscover setup makes extensive use of codes and value labels for these codes. How can I make this information accessible from relational tables? |
A: |
Relational tables rely on other relational tables when a code has a
corresponding label and it is the label that you want to report. Rather than
create a separate table for every coded variable, DFsqlload offers two options
which may be used together or separately depending on the specific requirements.
The |
Q: | My DFdiscover setup includes subject aliases. How can I access this information from relational tables? |
A: |
Use the
The |
Q: | In my DFdiscover setup, I have defined a number of missing value codes that are important for correct interpretation of the data. How do I make those codes available in relational tables? |
A: |
How missing codes are handled depends upon the |
Q: | How are dates handled by DFsqlload? |
A: |
By default, DFsqlload creates date columns using the correct data type for the
target system. A string representation of a date can be output in a separate
column using the option |
Q: | How do I find out if something went wrong? |
A: |
DFsqlload writes extensive logging information. When DFsqlload encounters a
problem, the problem is written to
If you use the Complete records will be rejected if the following conditions are encountered.
These cases will also appear in the .drf file if the |
Q: | Can I add tables to my SQL database outside DFdiscover? |
A: | Yes. DFsqlload will ignore them. |
Q: | What if the DFdiscover study schema is changed? |
A: | DFsqlload recreates the tables it needs each time it is run. Be careful when changing DFsqlload program options from run to run. SQL tables created from a previous run are not recreated if they are not required by the current run. |
Q: | What happens if I modify the definition of one of the SQL tables used by DFdiscover? |
A: |
DFsqlload will drop the table (and all of your changes) and recreate it from
the current |
Q: | How does DFsqlload update my SQL tables? |
A: | If there have been no changes to the data definitions, the data is dropped from the SQL table and reloaded from DFdiscover. This occurs even if there have been no data changes for that DFdiscover plate since the last time DFsqlload was run. |