Chapter 9. DFsqlload: DFdiscover to Relational Database Tables



9.1. Introduction

9.1.1. Overview

9.1.2. About DFsqlload

9.2. DFsqlload and Relational Database Concepts

9.2.1. Why Relational Databases?

9.2.2. Why is DFsqlload a one-way street?

9.2.3. Relational Database Concepts

9.3. Using DFsqlload

9.3.1. DFsqlload defaults - a quick tutorial

9.3.2. DFsqlload in Detail

9.3.2.1. DFsqlload Options

Q:	What is the target database type?
A:	DFsqlload supports four SQL database platforms: Oracle, PostgreSQL, Microsoft SQL Server, and MySQL. Your target database must be one of these four types. The `-flavor` option tells DFsqlload which database type to create the relational tables for. Oracle is the default, so the option can be omitted when the target database is Oracle.

Target is a PostgreSQL database: use `-flavor postgresql`
Target is a MySQL database: use `-flavor mysql`
Target is a Microsoft SQL Server database: use `-flavor mssql`
Target is an Oracle database: omit this option, or use `-flavor oracle`
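The mapping from database type to `-flavor` value can be captured in a small helper. The following is a minimal Python sketch, not part of DFsqlload: the function name `flavor_args` and its `explicit` parameter are hypothetical, and it builds only the `-flavor` portion of a command line, since the remaining DFsqlload arguments are not covered in this answer.

```python
# Hypothetical helper: builds only the -flavor arguments described above.
FLAVORS = {"oracle", "postgresql", "mysql", "mssql"}


def flavor_args(db_type, explicit=False):
    """Return the -flavor arguments for a target database type.

    Oracle is the DFsqlload default, so no arguments are returned for it
    unless explicit=True is requested.
    """
    flavor = db_type.strip().lower()
    if flavor not in FLAVORS:
        raise ValueError(f"unsupported database type: {db_type!r}")
    if flavor == "oracle" and not explicit:
        return []
    return ["-flavor", flavor]


print(flavor_args("PostgreSQL"))
print(flavor_args("oracle"))
```

Returning an empty list for Oracle keeps the generated command line minimal while still allowing the explicit form when a script should document its target.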
Q:	Is it important to have the data type of each column in my relational tables match as closely as possible the data types for each field in my DFdiscover plates?
A:	By default, DFsqlload preserves data typing. The older version of DFsqlload shipped with DFdiscover 3.7 and 3.7.001 did not: it converted all user fields to type `VARCHAR`. Applications written against tables created by those older versions will probably not work with typed tables, so use the `-type` option if you need to retain the untyped behavior. If you are writing new applications but also need your old applications to keep working, DFsqlload can create both typed and untyped tables in the target relational database.

To preserve data typing: omit this option, or use `-type typed`
To provide both typed and untyped tables: use `-type both`
To provide only untyped tables: use `-type untyped`
Q:	My DFdiscover setup makes extensive use of codes and value labels for these codes. How can I make this information accessible from relational tables?
A:	In a relational database, reporting a label instead of its code normally means looking the code up in another table that maps codes to labels. Rather than create a separate table for every coded variable, DFsqlload offers two options, which may be used together or separately depending on your requirements. The `-coding` option controls the inclusion of label data. By default, you get only the codes. To create a column for the code and a column for its label, use `-coding both`. If you want only the labels, use `-coding label`. The other way to make codes and labels available in your relational tables is to create the optional `DFCODING` table with the `-table dfcoding` option. This table contains all codes and labels for all DFdiscover fields of type `CHOICE`. See the DFsqlload reference page for more details.
Q:	My DFdiscover setup includes subject aliases. How can I access this information from relational tables?
A:	Use the `-table dfsubjectalias` option to request creation of the optional `DFSUBJECTALIAS` table, which contains two columns, `DFpid` and `DFalias`. A simple SQL join then returns the subject alias together with the subject ID.

The `DFSUBJECTALIAS` table is also created by default when loading of all tables is requested and subject aliases are defined.
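The join mentioned above might look like the following sketch. It uses Python's built-in `sqlite3` module only so that the example is self-contained; SQLite is not one of the platforms DFsqlload targets. The `DFSUBJECTALIAS` table and its `DFpid` and `DFalias` columns come from the answer above, while the `PLATE001` table, its columns, and all sample values are hypothetical stand-ins for a study data table that is assumed to carry a `DFpid` column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DFSUBJECTALIAS with columns DFpid and DFalias, as described above.
conn.execute("CREATE TABLE DFSUBJECTALIAS (DFpid INTEGER PRIMARY KEY, DFalias TEXT)")
# PLATE001 and its columns are hypothetical stand-ins for a study data table.
conn.execute("CREATE TABLE PLATE001 (DFpid INTEGER, visit INTEGER, weight REAL)")

conn.executemany(
    "INSERT INTO DFSUBJECTALIAS VALUES (?, ?)",
    [(1001, "ABC-01"), (1002, "ABC-02")],
)
conn.executemany(
    "INSERT INTO PLATE001 VALUES (?, ?, ?)",
    [(1001, 1, 70.5), (1002, 1, 82.0), (1003, 1, 64.2)],
)

# LEFT JOIN keeps subjects that have no alias defined; their alias is NULL.
rows = conn.execute(
    """
    SELECT p.DFpid, a.DFalias, p.visit, p.weight
    FROM PLATE001 AS p
    LEFT JOIN DFSUBJECTALIAS AS a ON a.DFpid = p.DFpid
    ORDER BY p.DFpid
    """
).fetchall()

for row in rows:
    print(row)
```

A `LEFT JOIN` is used here on the assumption that some subjects may have no alias; an inner join would silently drop their records from the result.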
Q:	In my DFdiscover setup, I have defined a number of missing value codes that are important for correct interpretation of the data. How do I make those codes available in relational tables?
A:	How missing value codes are handled depends on the `-type` option. If the target tables are untyped, the codes are output in the relational tables as is by default. To output the label corresponding to a missing value code instead, use the option `-missing label`. If the target tables are typed, all missing value codes are converted to `NULL` and logged as such, regardless of how the `-missing` option is used. If the `-table dfnullvalue` option is used, a record of each substitution is created in the optional `DFNULLVALUE` table.
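The decision rules in this answer can be summarized as a small function. This is an illustrative Python sketch of the documented behavior, not DFsqlload code: the function name, its parameters, and the placeholder value `"code"` for the default `-missing` behavior are all hypothetical, and `type_mode` describes the single table being written.

```python
def missing_output(code, label, type_mode="typed", missing="code"):
    """Return (value, logged) for a field that holds a missing value code.

    type_mode mirrors the -type setting of the table being written
    ("typed" or "untyped"). missing="label" mirrors -missing label;
    "code" is a placeholder name for the default behavior.
    """
    if type_mode == "typed":
        # Typed tables: always NULL and logged, whatever -missing says.
        return None, True
    if type_mode == "untyped":
        # Untyped tables: no substitution takes place, so nothing is logged.
        return (label if missing == "label" else code), False
    raise ValueError(f"unknown type mode: {type_mode!r}")


print(missing_output("*", "Not done"))
print(missing_output("*", "Not done", type_mode="untyped"))
print(missing_output("*", "Not done", type_mode="untyped", missing="label"))
```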
Q:	How are dates handled by DFsqlload?
A:	By default, DFsqlload creates date columns using the correct data type for the target system. A string representation of a date can be output in a separate column using the option `-date both`. If only the string representation is required, use the `-date untyped` option. Partial dates (that is, dates where the day or month is missing) are imputed by default, according to the rules specified in the DFdiscover setup. If imputed dates are not desired, turn imputation off with the `-noimpute` option, in which case any partial dates are converted to `NULL`.
Q:	How do I find out if something went wrong?
A:	DFsqlload writes extensive logging information. When DFsqlload encounters a problem, it writes the problem to `stderr` by default, unless overridden with the `-q` option. In either case, problems are recorded in the DFsqlload log file for that run. If DFsqlload encounters problems with DFdiscover data, it replaces the problem data with a `NULL` value, writes the substitution to the log file, and optionally creates a record for the substitution in the `DFNULLVALUE` table. The following are typical problems and how DFsqlload handles them.

Any field that is blank or contains only white space (space, tab) is converted to `NULL`. These substitutions are not logged.
All missing value codes are converted to `NULL` if the target tables are typed (the default). This is applied consistently, even if the missing value code happens to be legal for some field types.
Any value wider than the storage width defined for the field in the DFdiscover schema is converted to `NULL`.
If a field has a format defined in the DFdiscover schema, values are checked for adherence to the format and are converted to `NULL` if they do not conform.
Invalid dates are converted to `NULL`.
If date imputation is not defined in the DFdiscover schema, partial dates are converted to `NULL`. Any imputed dates that are not legal dates are also converted to `NULL`.
If a check or choice box contains an undefined code, it is converted to `NULL`.

If you use the `-d drfname` option, a .drf file is created containing a reference to each DFdiscover record that has one or more non-blank substitutions to `NULL`.

Complete records are rejected if any of the following conditions is encountered.

the record does not contain the correct number of fields
any of the DFdiscover fields are blank or invalid
a record with the same keys has already been imported

These cases also appear in the .drf file if the `-d drfname` option is used.
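Some of the field-level rules listed in this answer can be sketched as a single check. The following Python function is only an illustration of those rules, not DFsqlload's implementation: the name `check_field`, its parameters, and the use of a regular expression as a stand-in for a DFdiscover field format are assumptions. Date and missing value handling are left out.

```python
import re


def check_field(value, width, pattern=None, codes=None):
    """Return (stored_value, logged) for one field value.

    width   : storage width defined for the field
    pattern : regular expression standing in for a field format (assumption)
    codes   : set of defined codes for a check or choice field
    """
    if value.strip(" \t") == "":
        return None, False  # blank or white space only: NULL, not logged
    if len(value) > width:
        return None, True   # wider than the storage width
    if pattern is not None and re.fullmatch(pattern, value) is None:
        return None, True   # does not conform to the format
    if codes is not None and value not in codes:
        return None, True   # undefined check or choice code
    return value, False     # value is stored unchanged


print(check_field(" \t", 5))
print(check_field("123456", 5))
print(check_field("2", 1, codes={"1", "2"}))
```

The second element of the result distinguishes the silent blank-to-`NULL` conversion from the substitutions that are written to the log.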
Q:	Can I add tables to my SQL database outside DFdiscover?
A:	Yes. DFsqlload will ignore them.
Q:	What if the DFdiscover study schema is changed?
A:	DFsqlload recreates the tables it needs each time it is run. Be careful when changing DFsqlload program options from run to run. SQL tables created from a previous run are not recreated if they are not required by the current run.
Q:	What happens if I modify the definition of one of the SQL tables used by DFdiscover?
A:	DFsqlload will drop the table (and all of your changes) and recreate it from the current `DFschema` and data in the DFdiscover study database.
Q:	How does DFsqlload update my SQL tables?
A:	If there have been no changes to the data definitions, the data is dropped from the SQL table and reloaded from DFdiscover. This occurs even if there have been no data changes for that DFdiscover plate since the last time DFsqlload was run.
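One practical consequence of this full reload is that edits made directly to rows in the SQL tables do not survive the next run. The sketch below illustrates the idea with Python's built-in `sqlite3` module, used only to keep the example self-contained. The `PLATE001` table, its columns, the sample values, and the `reload_table` helper are hypothetical, and the SQL statements DFsqlload actually issues are not described here.

```python
import sqlite3


def reload_table(conn, table, rows):
    """Empty the table, then load the current source rows, in one transaction."""
    with conn:  # commits on success, rolls back on error
        conn.execute(f"DELETE FROM {table}")
        conn.executemany(f"INSERT INTO {table} VALUES (?, ?)", rows)


conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for a table holding data from one plate.
conn.execute("CREATE TABLE PLATE001 (DFpid INTEGER, weight REAL)")

source = [(1001, 70.5), (1002, 82.0)]  # stands in for the study data
reload_table(conn, "PLATE001", source)

# A manual edit made directly in the SQL table...
conn.execute("UPDATE PLATE001 SET weight = 0 WHERE DFpid = 1001")

# ...is overwritten by the next reload, even though the source is unchanged.
reload_table(conn, "PLATE001", source)
result = conn.execute("SELECT * FROM PLATE001 ORDER BY DFpid").fetchall()
print(result)
```

Corrections therefore belong in the DFdiscover study database, and the relational tables are best treated as a read-only copy.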