DFexport.rpc

DFexport.rpc — Export data records from one or multiple plates from a study data file

Synopsis

DFexport.rpc [-A] [ [-w] | [-a] ] [ [-c] | [-j] ] [-d] [-e] [-J] [-m] [-p] [-q] [-h] [-k] [-z] [-s status_list] [-v #, #-#] [-n #, #-#] [-I #, #-#] [-V #, #-#] [-C yy/mm/dd-yy/mm/dd] [-M yy/mm/dd-yy/mm/dd] [-L lostcode] [ [-U fieldname_list] | [-G fieldname_list] ] [-f fieldnum_list] [-H header_list] {study} {plate(s)} {outfile}

Description

DFexport.rpc can be used to export whole data records or selected fields, from specified plates of a specified study database. Data records are exported in ASCII text format. DFexport.rpc does not provide support for joining fields from different plates or studies. To use DFexport.rpc users must have 'Export Data' permission, which is granted by an administrator on a study-by-study basis (see System Administrator Guide, Server). Users with export permission will only receive records for which they have get permission, which may be restricted by level, site, subject, assessment and plate. DFexport.rpc provides no warning that some records cannot be exported as this would itself provide information about the existence of restricted data records.

DFexport.rpc may appear in a shell script used to create reports stored in the study reports directory, to be run in DFexplore from the Reports View, or in a shell script run using dfexecute in an edit check. In these cases the permissions of the DFexplore user running the report or edit check will be applied. Thus the behavior and output may differ depending on the permissions of the user who runs the script.

If the same script is run from the UNIX shell, the permissions associated with the UNIX login name apply. Thus, if a user has different UNIX and DFexplore login accounts with different permissions, the user may get different results depending on whether they run the script in the UNIX shell or DFexplore environments.

Options

-A

output the subject alias, if defined, in place of the subject ID. If there is no subject alias, output the subject ID.

-w, -a

output mode. The output is written (-w) to or appended (-a) to the output file. If the output file does not exist, it is created and written to for either output mode. If the output file already exists, it is overwritten with the -w mode or appended to with the -a mode. The output mode is ignored if the output file is standard output (indicated by -).

-c, -j

default date format. Without this option, date values are output in the format specified by their variable definition, and for partial date values, without any date imputation. With this option, date values are output in calendar (-c) format or julian (-j) format. In calendar format, the existing variable format is preserved except that 4-digit years are substituted for occurrences of 2-digit years. If the date value is not present, a blank value, "", is output, or if the date value cannot be imputed, the value * is output. In julian format, the date value is output as the number of days since a fixed point in the past. The fixed point in the past is the same for all invocations of the command allowing for date arithmetic. If the date value is not present, the value 0 is output, or if the date value is illegal, the value -1 is output. In both types of formatting, date rounding is also performed as specified by the variable definition.

-d

decode coded variables. With this option, all requested variables that are coded are output with their decoded labels. The default behavior, without this option, is to output the code itself. This option can be invoked at the individual variable level by appending the :d qualifier to the variable.

-e

outfile extension. With this option, the output filename specified will have a text file extension (.txt) added to the end of the user defined outfile name. Outfile extension is ignored if the output file is standard output indicated by '-'. The outfile extension is also ignored if only a single plate is specified. In the case of a single plate, the outfile name will be exactly as specified.

If this option is used along with the csv (-z) option, the csv extension (.csv) is added to the end of the specified output instead of the text extension (.txt).

-J

Outer join with requested fields that may not exist for current plate. With this option, if the user specifies one or more fields that do not exist for a plate that is requested, data for that plate will still be exported but non-existent fields will have a data field containing the missed subsitution code and a header containing "NOT_DEFINED". If this option is omitted, plates that do not contain one or more of the fields requested will not be exported.

-m

match data record validation level. This option is only relevant when exporting metadata records: reasons (plate 510) and queries (plate 511); and is ignored when exporting data records.

Meta-data records may have different validation levels from the underlying data records to which they are attached. When metadata records are exported using the -m option, the validation level of each exported metadata record is changed to match the validation level of the data record to which it is attached.

-p

trailing pipe. For backwards compatibility, exported data records can be terminated by a trailing |, if one is not already present. This allows exported records to maintain the original DFdiscover record format so that they can be, for example, easily imported. A trailing pipe is only needed to make data records compatible.

[Important]Important

DFdiscover does not expect a trailing pipe to be present in query records (plate 511) or reason for change records (plate 510). Hence, the -p option should not be used when exporting query records or data change records if the intention is to re-import them into the database using DFimport.rpc.

-q

quiet mode. This option instructs the program to execute in quiet mode, silencing all warning messages. The default, without this option, is to write warning messages to standard error. Warning messages are generated upon a request to export a field that does not exist in the record.

-h

header. Include as the first record in the output file, a delimited record of variable names for the columns in the data records. The variable names are, by default, the alias variable names defined in the study data dictionary combined with -H. If fields are extracted using -G, the header contains the variable names combined with -H.

This option cannot be used with an export of the new record queue, plate 0, as the variable names are not fixed. It can however, be used with the export of query records (plate 511) and reason for data change records (plate 510). The headers for query and reason for data change records are defined in the study schema.

The trailing character is different in data records and queries. Data records are terminated by a final | while queries are not. This also applies to the header records. Thus, when exporting data records the header record is terminated by a trailing | while this is omitted when exporting queries.

If multiple plates are requested and the -h option is used, a warning message will appear if the output is written to standard output. The warning message will notify users that the per plate headers will be interleaved with multiple plate output data.

-k

keys only. Output only the key fields for each data record. This includes, in order: id, plate, visit, status, and validation level, as | delimited fields. This option is a shorthand notation for the equivalent option -f "7,5,6,1,2". This option cannot be used in conjunction with -f, -G, or -U.

-z

Comma Separated Variables (CSV) format. This option exports all records so that they are compliant with this popular format. If this option is used in conjunction with the file extension option, the extension added to the end of the outfile name is the csv extension (.csv).

Required Options

The following three options are required and must appear in order at the end of the option list:

study

the DFdiscover study number, from which data records are to be exported.

plate(s) #, #-# all

plate numbers of the data files to export. A single plate can be specified or a list of plates can be specified to run in one invocation. If the same plate is listed more than once, the plate data is only exported once. The keyword 'all' can be used to export all existing plates within the specified study including the special reserved plates.

Special reserved plate numbers include: 0 for new records (received but not yet reviewed), 501 for returned Query reports, 510 for reason for data change records, and 511 for queries. If plate 511 is exported, the validation level of all exported records is updated to reflect the current validation level of the record that the notes are attached to.

outfile

output file that the exported records are written to. The user executing DFexport.rpc must have permission to create or modify this file. In the case of multiple plates being exported, the outfile name is used as the base filename from which a unique filename is constructed by appending each three digit, zero-padded plate number. If the output file is given as -, exported records are written to standard output rather than a file.[8]

Record Selection Options

The following options allow specific records to be selected from the output. With no options specified, all records are output. With options specified, only records matching the selection criteria are output. If multiple options are specified, only those records that match all criteria are output.

-s status

record status. The recognized status keywords consist of the legacy terminology: clean, dirty, error, CLEAN, DIRTY, ERROR, missed, primary, secondary, all and the current terminology: final, incomplete, pending, missed, all. Any combination of legacy and current terminology can be given as a quoted string. The default is records of all statuses.

If plate 511 is exported, the available status keywords are the same but their meaning (as related to query status) is translated by the table:

Table 3.8. Record and query status equivalence

record statusdeletefinalincompletependingsecondary [a] secondary [b] secondary [c]
query statusdeletenewin unsent reportresolved NAresolved irrelevantresolved correctedin sent report

[a] For backwards compatibility, old keyword CLEAN is supported.

[b] For backwards compatibility, old keyword DIRTY is supported.

[c] For backwards compatibility, old keyword ERROR is supported.


-v #, #-#

validation level. By default, records at all validation levels are exported, independent of the maximum validation level defined for the user. Specifying this argument causes only those records that match a validation level or are within a validation level to be exported.

-I #, #-#

subject ID. If this option is specified, only those records which have a subject ID matching the selection criteria are output. It is not valid to select records using -I in the same invocation as -n (selection by site). If this is done, DFexportrpc; will exit with an error message.

-n #, #-#

site ID. This option allows selection by site ID, site range or any combination thereof. However, it is not valid to specify -n in the same invocation as -I (subject ID). If this is done, DFexport.rpc will exit with an error message.

-V #, #-#

visit/sequence number. Specifying this option selects records to output by the visit number. Only those records matching the visit selection criteria are output.

-C yy/mm/dd-yy/mm/dd

creation date. If this option is specified, only those records which have a creation date matching the selection criteria are output.

Note that the selection criteria apply only to the creation date of data records and not queries.

-M yy/mm/dd-yy/mm/dd

modification date. If this option is specified, only those records which have a modification date matching the selection criteria are output.

Note that the selection criteria apply only to the modification date of data records and not queries.

For both -C and -M, the term today can be used to represent today's date.

-L lostcode

missing value code for missed records.

Missed records are exported if the keywords 'missed', 'lost', or 'all' are specified with the status option, -s. Missed records can be exported in 2 different formats. The first format is comprised of the missed data fields (i.e. including the missed category code and the text field containing the reason the record was classified as missed). The second format is comprised of all of the user defined data fields that appear on the CRF for that plate. The first format is the default and will be used whenever missed records are requested and the -L option is not used. If missed records are requested and the -L option is specified, missed records are exported with the lostcode value inserted into each user data field after field 7 (subject ID) - the trailing 3 system fields are exported with status 0, and the missed record creation and modification timestamps. The lostcode will also be used if the user specified fields using the -f and/or -U|G options along with the -J option and if one or more of those fields do not exist.

The only exception to this is when missed records are requested along with one of the field selection options (-f, -G, or -U). In this case, the second format is always used. If a substitution code has not been specified with the -L option, the generic DFdiscover default missing value code (*) is used, and a warning message is written to standard error, each time the substitution is made.

The lostcode specified with the -L option may consist of one or more alphanumeric characters. It is not permissible to use control characters. Also, the DFdiscover delimiter (|) cannot be used, unless data is being exported in CSV format with the -z option.

Note that specifying the -L lostcode option will not, by itself, cause missed records to be exported. This decision is controlled entirely by the status option. The -L option only indicates that the second format is desired and is to be used with the specified substitution code.

[Note]Argument parsing

Argument parsing in DFexport.rpc ignores extra, repeated delimiters. Any combination of space and comma is collapsed to one delimiter. For example:

DFRASTER,,,DFVALID = DFRASTER,DFVALID
DFRASTER  DFVALID = DFRASTER,DFVALID
DFRASTER ,,  ,,DFVALID = DFRASTER,DFVALID
,,DFRASTER ,,  ,,DFVALID = DFRASTER,DFVALID

are equivalent.

Field (variable) Selection Criteria

The following options allow fields (variables) to be selected from the output. With no options specified, all fields are output, unless -k has already been specified. With options specified, only the requested fields are output. If multiple options are specified, all of the requested fields are output, in the order requested.

-f #, #-#

field number. Specifying this option selects for output from each record only those fields requested by their number. Output fields can be re-ordered by specifying the field numbers in the desired order. Fields can be repeated within the option. This option cannot be repeated in the command-line.

Command-line referencing of field numbers using NF (last field) or relative to NF is also possible. Some examples of NF usage are as follows:

  • 2-NF.  include fields 2 to the last field inclusive

  • NF-1.  include the second last field

  • 5-NF-1.  include fields 5 to the second last field, inclusive

  • NF-5-NF-1.  include fields fifth from the last to the second last field, inclusive

-U fieldname_list, -G fieldname_list

field name. Specifying this option selects for output from each record only those fields requested by their field name. If -U is used, fields are selected by their alias variable name. If -G is used, fields are selected by their variable name. Field names can be repeated within the option. These options cannot be mixed or repeated.

-H header_list

field name headers. This option specifies the variable names to appear in the header (-h must also be specified) for new fields that are created as a result of one or more split modifiers.

Wherever a field number or field name is expected in field selection criteria, a constant value may also be referenced by inserting it in single quotes, as in 'AB'. The purpose of this is to insert the constant value in the specified field location into the output records. The 'value' notation allows constant values to be exported in much the same way that a variable value can be. Since it is possible for a constant value and variable name to be the same, a constant value must always use the 'value' notation. For example, the specification:

-G "date,'AD',DFRASTER"

requests the variable with the name date, followed by the constant value AD and the raster image ID number. The output might appear as:

99/06/17|AD|9924/00001001

If a blank field is being exported, it must be represented by 2 single quotes with nothing inside.

DFexport.rpc will ignore the special meaning of any characters that would otherwise be delimiters. For example, 'a,b' is the constant value a,b, not two separate constants.

If -h is present, the column name for each constant value will be the value itself, as in the following specification and output:

% DFexport.rpc -h -G "date,'AD',DFRASTER" 254 1 - | head -2
date|AD|DFRASTER
99/06/17|AD|9924/0001001

If -H is also present, the column name for each constant value will be replaced with the next -H value, if one is available. For example:

% DFexport.rpc -h -G "date,'AD',DFRASTER" -H "when" 254 1 - | head -2
date|when|DFRASTER
99/06/17|AD|9924/0001001

Date Modifiers

By default, date fields are output in the format defined for the field in the study data dictionary. If -c or -j is specified, the default output format is changed to 4-digit year format with imputation (-c) or julian format (-j). It is possible to override the default format for selected fields by following the field number or field name specification with a :c, :j or :o modifier. For example, the specification: -G "VisitDate, VisitDate:c, VisitDate:j" requests the variable with name VisitDate to be output in its default format, in calendar date format, and finally in julian format. The :c modifier applied to a field that already uses a 4 digit year format applies date imputation only. When using -c or -j to change the default output format for dates, note that there is no field level modifier that restores the original date format defined in the study data dictionary.

The :o modifier can be used in an analogous fashion to :c and :j. The effect of the :o modifier will be to cause the original date value to be output without any imputation.

A modifier cannot be applied to a non-date variable nor can it be applied to a range of field numbers or field names; it must be applied to a single field number or field name at a time.

Variable Decoding

It is possible to output the decoded label for a variable's value by appending the :d modifier to any variable that is defined with coding.

Example 3.31. Output the coded and decoded values for the sex variable

-G "sex, sex:d"

Any coded value that cannot be decoded is output as is.

The modifier cannot be applied to a non-coded variable nor can it be applied to a range of field numbers or field names; it must be applied to a single field number or field name at a time.

String Splitting

It is possible to split a string field into multiple, shorter string fields by appending a :numxcharscw modifier to the field number or field name specification. The modifier includes a specification of the number of fields to split the input string into (num), the maximum width in characters of each output field (chars), and whether or not splitting should be done on character (c) boundaries or (w) boundaries. If word boundaries are used, a word is delimited by any combination of space or tab characters. Word splitting will always split at the word boundary closest to but less than the maximum width. If there is no word boundary in the substring, the string will be split at the character boundary.

Example 3.32. Split a string field into 5 fields

This example splits the comments field into 5 200-character segments at character boundaries:

-G "comments:5x200c"


If a field to be split contains a missing value code, the first output field will contain the complete missing value code and the remaining fields will be blank.

If a field is split to create additional fields and a header record is requested with -h, field names for the new fields are identical to the original field name unless -H is specified. If -H is given, the field names for the new fields are assigned in order from the option list. If the option list contains fewer names than there are new fields, the remaining fields are assigned names identical to their original field names.

Extracting Sub-Strings

New fields can be created consisting of substrings of database fields. The following example creates a data field from the 4th and 5th characters of field 22.

-f 22:x4.2

Substrings can be extracted from any field type: strings, dates, or numbers. Substrings are extracted from the string value of the field. Numeric fields will be zero-padded to their store width before the substring extraction is done. In the following example, the first three digits of the subject ID field are extracted.

-f 7:x1.3

Comma Separated Variables (CSV) Format

CSV is a popular format used for sharing data records between different software programs. By selecting the -z option, DFexportrpc; will generate output records that are compliant with this format. The requirements of the format are:

  • The record delimiter is a newline character (this is also true of records exported in the default, non-CSV format).

  • Each field within a record is separated by a comma.

  • Leading and trailing spaces adjacent to comma separators are ignored.

  • Fields containing commas as part of their value are enclosed in double quotes.

  • Fields containing double quotes are enclosed within double quotes, and the embedded double quotes themselves are each represented by a pair of consecutive double quotes.

  • Fields containing embedded line breaks are enclosed within double quotes.

  • Fields with leading or trailing spaces are enclosed in double quotes.

The first record in the CSV output file may consist of a header record containing field names. Each field in the header will also be comma-delimited and follow the requirements above.

Exit Status

DFexport.rpc exits with one of the following statuses:

0

The command was successful.

24

The user does not have the necessary permission to execute the command.

31

The requested plate does not exist in the database.

36

The required command-line arguments were not present or were incorrectly specified.

> 0

The command failed because the database server could not be contacted, or communication with the database server failed.

Examples

Example 3.33. Export all records from plate 1 of study 255 to standard output

% DFexport.rpc -s all -h 255 1 -
DFSTATUS|DFVALID|DFRASTER|DFSTUDY|DFPLATE|DFSEQ|PID|INIT|VDATE|DFSCREEN|DFCREATE|DFMODIFY|
1|1|9807/1234567|255|1|0|99001|SCL|98/01/25|1|98/02/10 12:34:12|98/02/12 12:34:12|
2|4|9811/0005001|255|1|1|99002|RRN|98/02/12|2|98/02/10 15:03:34|98/03/01 11:23:14|
5|2|9831/0004012|255|1|1|99002|RRN|98/12/12|2|98/07/02 13:45:20|98/07/05 09:21:44|
1|3|9809/0044002|255|1|0|99003|*|98/02/03|1|98/02/10 14:23:01|98/02/10 14:23:01|

Example 3.34. Export, with header, all primary records at validation levels 1-2 from plate 1 of study 255 to standard output

% DFexport.rpc -h -s primary -v "1-2" 255 1 -
DFSTATUS|DFVALID|DFRASTER|DFSTUDY|DFPLATE|DFSEQ|PID|INIT|VDATE|DFSCREEN|DFCREATE|DFMODIFY|
1|1|9807/1234567|255|1|0|99001|SCL|98/01/25|1|98/02/10 12:34:12|98/02/10 12:34:12|
1|3|9809/0044002|255|1|0|99003|*|98/02/03|1|98/02/10 14:23:01|98/02/10 14:23:01|

Example 3.35. Export the first 3 and the 7th fields from plate 1 to standard output

% DFexport.rpc -f "1-3,7" 255 1 -
1|1|9807/1234567|99001
2|4|9811/0005001|99002
5|2|9831/0004012|99002
1|3|9809/0044002|99003

Example 3.36. Export the VDATE date field in 4-digit year format and export the subject initials in 3 single-character fields. Include the header record and define names for the newly created fields

% DFexport.rpc -c -G "VDATE,INIT:3x1c" -H "middle,last" 255 1 -
VDATE|INIT|middle|last
1998/01/25|S|C|L
1998/02/12|R|R|N
1998/12/12|R|R|N
1998/02/03|*||

Example 3.37. Export the primary records for subject IDs 99001 and 99002 selecting the fields from DFSTUDY to VDATE inclusive

% DFexport.rpc -s primary -I "99001,99002" -G "DFSTUDY-VDATE" 255 1 -
255|1|0|99001|SCL|98/01/25
255|1|1|99002|RRN|98/02/12

Example 3.38. Show all forms of manipulation for a partial date value in the variable DateCompleted1 for study 251

% DFexport.rpc -j -G "DateCompleted1,DateCompleted1:c,DateCompleted1:o" 251 1 -
2450845|1998/02/01|98/02/00
2451297|1999/04/29|99/04/29

Example 3.39. Create a DFdiscover Retrieval File (ID99001_plate10.drf) for study 254, all primary records for plate 10, for subject ID 99001

% DFexport.rpc -f "7,6,5" -I 99001 254 10 ID9901_plate10.drf

The following is the contents of the file ID9901_plate10.drf.

99001|1|10
99001|2|10
99001|3|10
99001|6|10
99001|9|10
99001|12|10

Example 3.40. Export all primary records for plates 1-3, 7 and 8 to standard output

% DFexport.rpc -f "7,6,5,3" 254 "1-3,7,8" -
99001|0|1|0915/000T001
99003|0|1|0915R000S001
99004|0|1|0915/000V001
99001|1|2|0915/000T002
99004|1|2|0915/000V002
99001|1|3|0915/000T003
99004|1|3|0915/000V003
99005|1|3|0000/0000000
99001|30|7|0915/000T009
99004|30|7|0915/000V009
99001|51|8|0915/000T010

When exporting multiple plates in one invocation, the outputs are always sorted in ascending order of the plate numbers, for example if the plate argument is changed to "7,8,3-1", the same output will be seen.


Example 3.41. Export all primary records for all plates in a study and create separate text files for each set of plate data.

% DFexport.rpc -e -f "7,6,5,3" 254 all Study254_

The following are the output files created by the above command

Study254_000.txt
Study254_001.txt
Study254_002.txt
Study254_003.txt
Study254_004.txt
Study254_005.txt
Study254_006.txt
Study254_007.txt
Study254_008.txt
Study254_009.txt
Study254_010.txt
Study254_011.txt
Study254_020.txt
Study254_501.txt
Study254_510.txt
Study254_511.txt

As seen above, all plates are exported in one invocation of the command. Special reserved plates are also included when the keyword 'all' is specified for the plates argument. A three digit, zero padded, plate number is also appended to the end of the outfile name, and is optionally followed by a file extension.


Example 3.42. Export, in CSV format, a subset of fields for plate 3 of study 254 and write the results to standard output

% DFexport.rpc -s all -z -f 1-7,59,63-66 254 3 -
2,1,9807/0047003,254,3,1,99001,,0,0,"knee surgery, hip replacement",2
1,1,0347R0012001,254,3,1,99101,"""other"" surgery",1,1," carotid   endarterectomy  ",2

Note that field values containing commas (knee surgery, hip replacement) are enclosed within double quotes. Fields containing double quotes (such as "other" surgery) are enclosed within double quotes and the embedded double quotes themselves are represented by a pair of consecutive double quotes.


Example 3.43. Export only missed records for plate 1, inserting the missing value code "NA" into the fields of each missed record exported. Export the results to standard output.

% DFexport.rpc -s missed -L "NA" 254 1 -

0|7|0000/0000000|254|1|0|20100|NA|NA|NA|NA|NA|NA|NA|NA|NA|NA|NA|NA|NA|NA|NA|NA|NA|NA|NA|NA|NA|NA|NA|NA|NA|0|2018/01/15 12:35:23|2018/01/15 12:35:23
0|7|0000/0000000|254|1|0|20101|NA|NA|NA|NA|NA|NA|NA|NA|NA|NA|NA|NA|NA|NA|NA|NA|NA|NA|NA|NA|NA|NA|NA|NA|NA|0|2018/01/15 12:35:23|2018/01/15 12:35:23

Note that missing value code insertion is only performed for user-defined fields after field 7 (subject ID).


Limitations

DFexport.rpc and DFexport are the recommended ways to export data records from the study database for subsequent examination or analysis. Never read the data files directly as they may contain old copies of records scheduled for removal.

Since DFexportrpc; may only be run server-side by an authenticated user, it does not enforce user permissions as strictly as DFexport does. For example, DFexportrpc; ignores the Hidden/Masked field property while exporting field values.



[8] Data written to standard output will also include "inline" header metadata if either of the -h or -x options are specified.