DFsqlload

DFsqlload
Prev	3.4. Alphabetical Listing	Next

DFsqlload — Create table definitions and import all data into a relational database

Synopsis

DFsqlload [-flavor oracle|postgresql|mysql|mssql] [-d drfname] [-q] [ [-ignore_mssql_priv] | [-ignore_mysql_priv] ] [-type typed|untyped|both] [-coding code|label|both] [-missing code|label] [-date typed|untyped|both] [-noimpute] [-missed] [-table all|DFcoding,DFnullvalue,DFsubjectalias]] {param} {study}

Options

-flavor oracle\|postgresql\|mysql\|mssql	Type of target database. The default is `-flavor oracle`.
-d drfname	A DFdiscover retrieval file to use to record problems encountered during data import.
-q	Quiet mode. Instructs the program to suppress all warning messages. The default, without this option, is to write warning messages to standard error.
-ignore_mssql_priv\|ignore_mysql_priv	Ignore administrator privileges. For MS SQLServer or MySQL, allows tables to be loaded without requiring administrator privileges on the database.
-type typed\|untyped\|both	Type of SQL tables to be created. If set to `typed` (or by default) then tables use field data types as defined in `DFschema`. Table names used are `DFTABLE_###` for regular plates, `DFQC` for plate 511 and `DFREASON` for plate 510. The field names use unique names for user data fields (8 ~ `NF`-3, where `NF` is the total number of fields defined for a given plate), and generic names for DFdiscover fields. If set to `untyped`, all user data fields for regular plates are converted to VARCHAR. Table names used are `DFPLATE_###` for regular plates, `DFQC` for plate 511 and `DFREASON` for plate 510. The field names use unique names for user data fields, and generic names for DFdiscover fields. If set to `both`, both typed and untyped tables are created for regular plates. Only one `DFQC` for plate 511 and one `DFREASON` for plate 510 are created. Both of these tables adhere to typed rules.
-coding code\|label\|both	Coded field specification. If set to `code` (or by default), then there is no translation for coded fields. if set to `label`, then the label corresponding to a given code is imported, including QCs and REASONs. If no label is defined for a given value, then the value is used. If both are requested, then two columns are imported for the coded field, one for the value and the other for the corresponding label.
-missing code\|label	Missing value specification only applies to untyped tables. If set to `code` (or by default), then no translation is done for missing codes. If set to `label`, then codes are translated to their corresponding label. If no label is defined for a given code, the code itself is imported. If the target tables are typed, missing value codes are automatically replaced with `NULL` values in all cases.
-date typed\|untyped\|both	Date type specification for user date fields. If set to `typed`, then the true type defined in `DFschema` for dates is used (the default setting). If set to `untyped`, then dates are converted to strings and imported as type `VARCHAR`. If set to `both` then two columns are imported for the date field, one for the typed date and one for the untyped string representation.
-noimpute	No imputation for partial dates replaces partial dates with NULL values for typed dates only. The default is to impute partial dates for typed dates according to the imputation method specified in `DFschema`.
-missed	Include missed records from the database in the output. Typically missed records are not included as they contain no actual data.
-table all\|DFcoding,DFnullvalue,DFsubjectalias]	List of optional tables to create. If set to `DFcoding`, the optional table `DFCODING` is created. Value and label pairs are imported to this table. If set to `DFnullvalue`, the optional table `DFNULLVALUE` is created. All problem fields that are converted to `NULL` are imported to this table. If the option `-d drfname` is specified, this table is created by default. If set to `DFsubjectalias`, the optional table `DFSUBJECTALIAS` is created. Columns `DFpid` and `DFalias` are imported to this table. If set to `all`, then all optional tables are created and imported.

Required Options

The following two options are required and must appear in order at the end of the option list:

param	A set of parameters of the form `server:database:schema[.tablespace][:username:password]`. The value of these parameters vary depending on the target database, and are detailed in Table 3.12, “Database Parameters”.
study	the DFdiscover study number, from which data records are to be exported.

Table 3.12. Database Parameters

Parameter Description Postgres MySQL Oracle MS SQLServer

server The server name to connect to required required a place holder. Oracle will lookup tnsnames.ora based on database name required

database The database name required see Schema required required

schema The DFdiscover study name required required required required (must be the same as the database name)

tablespace The alternative storage tablespace for Oracle ignored ignored see Schema ignored

username:password

The database login user name and password

optional.

If both username and password specified, login as specified username with specified password.
If only username specified, login as specified username, lookup the the specified user's password from file ~/.pgpass.
If neither username nor password specified, lookup OS user's name and OS user's password from file ~/.pgpass.
If only password is specified, this is an error.

optional.

If both username and password specified, login as specified username with specified password.
If only username specified, discard the specified username and using OS user's name and lookup OS user's password from file ~/.my.cnf. The same applies if neither username nor password are specified.
If only password is specified, this is an error.

optional.

If both username and password specified, login as specified username with specified password.
If only username specified, discard the specified username, using OS user's name and lookup the password from file ~/.orapass identified by database name. If ~/.orapass does not exist or cannot find the password, use Oracle external credential. The same applies if neither username nor password are specified.
If only password is specified, this is an error.

Oracle external credentials

The user must have an OS account. In the Oracle initialization file, parameter initDB_NAME.ora must be set as follows

remote_os_authent = true

optional.

If both username and password are specified, login as the specified username with the specified password.
If only username is specified, login using the OS user's name and lookup the OS user's password from the file ~/.mssqlpass.
If neither username nor password is specified, login using the OS user's name and lookup the OS user's password from the file ~/.mssqlpass.
If only password is specified, login using the OS user's name and the specified password.

Exit Status

DFsqlload exits with one of the following statuses:

0	DFsqlload was able to (re-)create the schema, create all of the tables, and import all of the data without error.
1	The schema and tables were created and the data were imported, but one or more errors were encountered in the imported data.
2	A more serious error was encountered which prevented some or all data from being imported.

Description

DFsqlload is a command-line solution that creates all of the table definitions and imports all of the data into a relational database. One SQL table is created for each DFdiscover plate. In addition, three tables are added to record logging information, qc and reason for change data. Optional tables are created to note any problem fields or store value and label pairs for coded fields.

When run repeatedly, existing SQL tables are compared with the current DFschema file. Unchanged tables are truncated i.e., the data is removed but the table definitions remain. If any changes were made, the existing DFdiscover-defined table is dropped and re-created. DFsqlload calls DFexport.rpc to export all primary records to a temporary file, and then loads the database tables plate by plate.

SQL Database Setup

The following operations are performed on the target database.

At least one and optionally three meta tables are created in the target database.

DFSQLLOAD: Each DFsqlload run creates an entry in this log table. This table is also the lock table that prevents more than one DFsqlload process from operating on the same schema. A NULL value for column DFFINISH indicates that a DFsqlload process is running. If DFsqlload terminates abnormally, this record must be removed manually before starting a new DFsqlload process. The following SQL statements are used internally to create this table. The statements use syntax specific to PostgreSQL. Similar tables are created for MySQL, MS SQLServer and Oracle implementations of DFsqlload.

CREATE TABLE <schema>.dfsqlload
(
  dfuser varchar(30) NOT NULL,   -- the database user
  dfstart timestamp(0) NOT NULL, -- the start date and time
  dffinish timestamp(0),         -- the finish date and time (NULL if process running)
  dfoption varchar(500),         -- the options used with the DFsqlload command
  dfnull int4,                   -- the total number of problem fields converted to null
  dferror int4,                  -- the total number of records discarded or rejected
  dfstatus int2,                 -- the status: 0=process completed, 1=running
  CONSTRAINT dfsqlload_pk PRIMARY KEY (dfstart)
) WITH OIDS;

DFNULLVALUE: This is an optional table where all problem fields that are converted to NULL are recorded. The following SQL statements are used internally to create this table.

CREATE TABLE <schema>.dfnullvalue
(
  dfpid int4 NOT NULL,            -- study subject id
  dfplate int2 NOT NULL,          -- plate
  dfseq int4 NOT NULL,            -- sequence/visit number
  dftable varchar(30) NOT NULL,   -- name of sql table where substitution was made
  dffield varchar(30) NOT NULL,   -- name of sql field where substitution was made
  dfvalue varchar(###),           -- the original value exported from DFdiscover
  dfproblem varchar(###),         -- the reason the value was converted to NULL
  dfraster varchar(12),           -- the image id of the source CRF page
  CONSTRAINT dfnullvalue_pk PRIMARY KEY (dfpid, dfplate, dfseq, dftable, dffield)
) WITH OIDS;

The ### above represents the maximum length encountered in the data source.

DFCODING: This is an optional table where all value and label pairs for coded fields are stored. The following SQL statements are used internally to create this table.

CREATE TABLE <schema>.dfcoding
(
  dfplate int2 NOT NULL,          -- plate
  dffield varchar(30) NOT NULL,   -- sql table column name
  dfcode varchar(###) NOT NULL,   -- code value for this column
  dflabel varchar(###),           -- code label for this column
  CONSTRAINT dfcoding_pk PRIMARY KEY (dfplate, dffield, dfcode)
) WITH OIDS;

The ### above represents the maximum length encountered in the data source.

Verify any existing SQL tables. The following tasks are performed in the verification of existing SQL tables.
1. All tables with the prefix DF are treated as DFsqlload-defined tables. For any given run of DFsqlload, only the tables relevant to the current job are used.
2. Existing SQL table definitions are compared to the DFschema file. DFsqlload will truncate any unchanged tables and re-create changed tables. Changes that may cause DFsqlload to drop a table are as follows:
  - add, delete, reorder fields
  - rename field
  - change data type
  - any change to field size or format causing storage changes in the target database
  - missing code or label changes
  - code or label changes to coded fields
  - SQL tables that have been directly user-modified
3. All privileges (if any) for tables create by DFsqlload are backed up and restored.
4. If other tables reference any tables created by DFsqlload, the foreign keys will either be dropped (Postgresql, MS SQLServer) or disabled (MySQL, Oracle) and saved to the log file in the form of a database-specific SQL statement. The user can choose to execute these statements to restore the foreign keys manually, as DFsqlload does not do it automatically.
5. Other database objects - triggers, views, procedures, etc., which depend upon DFsqlload-defined tables are ignored.

Files used by DFsqlload

All DFsqlload created files are in the study/working_dir/DFsqlload_logs directory by default. If this directory does not exist, DFsqlload will create it at the default location. The file names use time stamp in the format yymmdd_hhmiss as a prefix, where mm is two-digit month and mi is two-digit minute. The time stamp, which is also the value of DFSQLLOAD.DFSTART, is the unique identifier of this DFsqlload process.

Log file: yymmdd_hhmiss.log. This file records the detailed loading progress including schema setup, SQL table definitions, record count, and rejected records with error messages in the following formats:

Regular plates: Record (id, seq, plate, image): error message

QCs and REASONs: Record (id, seq, plate, image, field): error message
DFdiscover Retrieval File: If path is included, this file will be in the specified location. If the file already exists, the existing file will be removed. If the file cannot be created or written, DFsqlload will report an error and continue the loading process. The file is in standard DRF format. The first four fields are Id, Visit, Plate, and Image. The combination of id, visit, and plate is unique. The 5th field records the list of problems. The record level errors appear first, followed by the field level problems. The field level problems are derived from the DFNULLVALUE table, which is created by default if a .drf file is requested. Problem types are summarized below.
1. Record level errors
  - error - incorrect number of fields: The number of fields of a record in plt###.dat does not match the DFschema definition.
  - error - invalid DFdiscover field: The value of DFdiscover leading (1~7) or trailing (NF-2~NF) field is blank or invalid.
  - error - duplicate primary record: Two or more records have the same id, seq, plate combination.
  - error - rejected by database: Any field error that was not identified by DFsqlload, but rejected by the database.
2. Field level problems
  - missing value
  - too wide
  - bad format
  - invalid date
  - partial date
  - undefined code
  - data/type conversion
  The format used in the .drf file is Table1Name: FieldName (problem), FieldName (problem),..., Table2Name: F ieldName (problem),...
3. QC or REASON specific error
  - error - primary record was rejected.
Data file: yymmdd_hhmiss_plt### (for regular plates)

yymmdd_hhmiss_DFreason (for plate 510)

yymmdd_hhmiss_DFqc (for plate 511)

This is the data source of current loading plate created by DFexport.rpc, and is removed automatically after the data is loaded into the SQL database.
Error files: yymmdd_hhmiss_plt###.err (for untyped table)

yymmdd_hhmiss_tbl###.err (for typed table)

yymmdd_hhmiss_DFreason.err (for plate 510)

yymmdd_hhmiss_DFqc.err (for plate 511)

These files record the records rejected by the SQL database along with database generated error messages.
Rejected keys: yymmdd_hhmiss.tmp. This file records the keys (plate, seq, id) of primary records discarded by DFsqlload or rejected by the SQL database. If a REASON or QC record matches one of the key combinations in this file, that REASON or QC record will not be loaded into the SQL database and a message will be written to the log file:

Record (id#,seq#,plate#,image,field#): discarded by DFsqlload.

DFRECORD (error - primary record was rejected).

This file is removed automatically.

Database login credentials files. The following database-specific files may be used to store login credentials.

Postgres: ~/.pgpass. This is a Postgres standard file located in the user's home directory and the file permission must be 600; otherwise, Postgres will ignore it. This file specifies the database login credentials in the format:
```
host:port:database:username:password
```
The username can be any valid database user, for example:
```
parkcity:5432:test_db_name:user_a_name:user_a_password
parkcity:5432:test_db_name:user_b_name:user_b_password
```
MySQL: ~/.my.cnf. This is MySQL standard file located in user's home directory and the file permission should be 600. This file specifies the program groups and options for each group. A typical group is [client]. In this group the database password for OS user can be specified:
```
[client]
password=my_pass
...
```
The global options can be specified in the global option file /etc/my.cnf or DATADIR/my.cnf.
Oracle: tnsnames.ora, ~/.orapass. tnsnames.ora is the Oracle standard net services file located in ORACLE_HOME/network/admin directory by default. Oracle client will lookup net service names from this file. This file can also be in ANY_DIR/network/admin and the environment variable ORACLE_HOME is set to ANY_DIR.

~/.orapass is NOT an Oracle standard file. It is provided for convenience for DFdiscover users to store their database password. This file should be in the user's home directory and file permission should be 600. The format is:
```
db_name:password
```
for example,
```
test_db_name:test_db_password
production_db_name:production_db_password
```
MS SQLServer: ~/.mssqlpass. This is not an MS SQLServer standard file. It is provided as a convenience to DFdiscover users for storing their database password. This file needs to be kept in the user's home directory and the file permission needs to be 600. The format of this file is:
```
server_name:password
```
There are two types of entries that can be used. Both the server name and the password to use can be specified as follows:
```
my_server:my_server_password
```
or a password can be specified for any server this user connects to as in the following example:
```
*:any_server_password
```

Tables

The following details apply to tables and fields created by DFsqlload.

Table Names. Table names follow these rules. DFSQLLOAD is a log table and is used to store the details of a given DFsqlload for a particular schema. The table for plate 510 is named DFREASON. The table for plate 511 is named DFQC. All other plates are named DFPLATE_nnn (untyped) or DFTABLE_nnn (typed), where nnn is the plate number. Optional tables created are DFNULLVALUE for the storage of any problem field data conversions and DFCODING for the storage of value/value label pairs. All table names are in uppercase letters. Note that the table names are case sensitive for MySQL on UNIX platforms only.
Table Types. Table types used for each of the supported database products are as follows: For PostgreSQL and Oracle, all tables are type relational table. For MySQL, all tables are type InnoDB table.
Field Names for study tables. Field naming follows these rules. For fields 1 to 7 and the last three fields for a plate, generic variable names are used. For all fields in between, unique variable names are used. Any field names matching an SQL keyword get an _ (underscore) appended. This is target product dependent. Refer to the documentation for the product you are using for a complete list of the relevant keywords. Any non-alphanumeric characters are replaced with _ (underscore). If a field name starts with a digit, DF_ is prepended. Field names are truncated to 30 chars. A sequence number is appended to each non-unique field name.

DFdiscover type to SQL type mapping: DFsqlload will map DFdiscover types to SQL types according to Data typing when creating SQL tables for typed columns. Untyped columns are mapped to VARCHAR (PostgreSQL, MySQL) and VARCHAR2 (Oracle).

Table 3.13. Data typing

DFdiscover type	PostgreSQL	Oracle	MySQL	MS SQL Server
string	VARCHAR(n)	VARCHAR2(n)	VARCHAR(n)^[a]	VARCHAR(n)
check	INT2	NUMBER(p)	INT2	SMALLINT
choice	INT2	NUMBER(p)	INT2	SMALLINT
integer	INT4	NUMBER(p)	INT4	INT
float	NUMERIC(p,s)	NUMBER(p,s)	DECIMAL(p,s)	DECIMAL(p,s)
vas	NUMERIC(p,s)	NUMBER(p,s)	DECIMAL(p,s)	DECIMAL(p,s)
number(nn:nn)	TIME	VARCHAR2(n)	TIME	VARCHAR(n)
date	DATE	DATE	DATE	DATETIME
timestamp	TIMESTAMP	DATE	DATETIME	DATETIME
^[a] MySQL converts VARCHAR(n) to CHAR(n) (n < 4) or TEXT (n > 255).

SQL column naming convention: DFsqlload will follow the rules defined in Fields for typed SQL tables and Fields for untyped SQL tables for DFdiscover field name to SQL column naming convention when creating SQL tables.

Table 3.14. Fields for typed SQL tables

Field	Field Number	Field Name	Field Type	Missing Code/Label	Partial Date	Impute
Coded field: Code	1~7,NF-2~NF	Generic	True Type	No
Coded field: Label	1~7,NF-2~NF	Generic	VARCHAR	No
Coded field: Code ^[a]	1~7,NF-2~NF	Generic	True Type	No
Coded field: Label ^[b]	1~7,NF-2~NF	U_Generic	VARCHAR	No
Coded field: Code	8~NF-3	Unique	True Type	No
Coded field: Label	8~NF-3	Unique	VARCHAR	Yes
Coded field: Code^[a]	8~NF-3	Unique	True Type	No
Coded field: Label^[b]	8~NF-3	U_Unique	VARCHAR	Yes
Date field: Typed	8~NF-3	Unique	True Type	No	No	Yes
Date field: Untyped	8~NF-3	Unique	VARCHAR	Yes	Yes	No
Date field: Typed^[a]	8~NF-3	Unique	True Type	No	No	Yes
Date field: Untyped^[b]	8~NF-3	U_Unique	VARCHAR	Yes	Yes	No
Other fields	8~NF-3	Unique	True Type	No
Other fields	1~7,NF-2~NF	Generic	True Type	No	No	No
^[a] The first field, if `-coding both` or `-date both` is specified. ^[b] The second field, if `-coding both` or `-date both` is specified.

Table 3.15. Fields for untyped SQL tables

Field	Field Number	Field Name	Field Type	Missing Code/Label	Partial Date	Impute
Coded field: Code	1~7,NF-2~NF	Generic	True Type	No
Coded field: Label	1~7,NF-2~NF	Generic	VARCHAR	No
Coded field: Code^[a]	1~7,NF-2~NF	Generic	True Type	No
Coded field: Label^[b]	1~7,NF-2~NF	U_Generic	VARCHAR	No
Coded field: Code	8~NF-3	U_Unique	VARCHAR	Yes
Coded field: Label	8~NF-3	U_Unique	VARCHAR	Yes
Coded field: Code^[a]	8~NF-3	Unique	VARCHAR	Yes
Coded field: Label^[b]	8~NF-3	U_Unique	VARCHAR	Yes
Date field: Typed	8~NF-3	U_Unique	True Type	No	No	Yes
Date field: Untyped	8~NF-3	U_Unique	VARCHAR	Yes	Yes	No
Date field: Typed^[a]	8~NF-3	Unique	True Type	No	No	Yes
Date field: Untyped^[b]	8~NF-3	U_Unique	VARCHAR	Yes	Yes	No
Other fields	8~NF-3	U_Unique	VARCHAR	Yes
Other fields	1~7,NF-2~NF	Generic	True Type	No	No	No

Locking: DFsqlload uses the table DFSQLLOAD to prevent two processes from working on the same database schema. A NULL entry in DFSQLLOAD.DFFINISH indicates that another process is running or terminated abnormally. If DFSQLLOAD does not exist, DFsqlload will create it. Upon completion, DFsqlload updates DFSQLLOAD.DFFINISH to the finish time and the DFSQLLOAD.DFSTATUS to zero (normal exit process).
DFdiscover Retrieval File: The first line in the DRF contains a two-field comment. The first field contains the username and creation timestamp. The second field identifies the creator of the DRF as DFsqlload and lists the parameters used to access the SQL database. The next line is a comment describing the format of the DRF records to follow. One DRF data record is created for each DFdiscover record with data import problems. Each DRF data record is identified by Id, Visit, Plate and Image. The 5th field records the SQL table name, followed by the field name and problem description for each problem encountered. Multiple field/problem descriptions are separated by commas.
Date fields: DFsqlload converts two-digit years to four-digits based on the cut off year specified for each date field in DFschema, or the year the study began (%B) if not specified. DFsqlload imputes partial dates according to the imputation method specified for each date field.
Number of fields: DFsqlload creates a minimum of 10 fields (1~7,NF-2~NF) for each SQL table. The maximum number of fields is limited by the database: PostgreSQL 1600, MySQL 1000, Oracle 1000, MS SQLServer 1024. DFsqlload will report errors for DFdiscover records that do not meet these field requirements and will continue the loading process.

Schema

Postgres. A schema is a namespace that contains DFdiscover study tables. Schema names are DFdiscover study names. If the schema name specified in the DFsqlload command line does not exist in the Postgres database, DFsqlload will create the schema as specified. DFsqlload will never drop existing schemas.

Oracle. A schema is a database user who is also the owner of DFdiscover tables that belong to one study. The schema specified in the command line must already exist in Oracle, unless the tablespace name (must exist in database) is also specified. If the schema does not exist, DFsqlload will create the schema and assign the specified tablespace as the schema's default tablespace. The DFdiscover study tables will be created, if specified, in the specified tablespace, otherwise in the schema's default tablespace. DFsqlload will check the schema's quota privilege on storage tablespace and grant unlimited quota privilege on that tablespace. DFsqlload will never drop existing schemas.

MySQL. There is no schema in MySQL. A MySQL database maps to a DFdiscover study database. In the command line, the schema name must be the same as the database name. Note that the database name is case sensitive whether MySQL is running on a UNIX or Windows platform. If the specified database does not exist in the MySQL database, DFsqlload will create the database as specified. DFsqlload will never drop existing MySQL databases.

MS SQLServer. There is no schema in MS SQLServer in version 2000 and older. A SQLServer database maps to a DFdiscover study database. In the command line, the schema name must be the same as the database name.

Examples

Example 3.72. Import study 254 into mySQL

Import the validation study val254 into mySQL on host talisman.

% DFsqlload -flavor mysql -q talisman:val254:val254:root:mysql 254

Prev	Up	Next
DFsendfax	Home	DFstatus