DFexport

DFexport — Client-side, command-line interface for exporting data by plate, field or module; exporting change history; or exporting components of study definition

Synopsis

DFexport [-S server] [-U username] [-C password] [ [-Mname moduleName] | [-Mnum moduleID] | [-Pnum [ [#] | [qc] | [reason] | [new] | [missed] ] ] ] [-status #, #-#] [-level #, #-#] [-visit #, #-#] [ [-site #, #-#] | [-subject #, #-#] | [-alias string] ] [-plate #, #-#] [-create list] [-modify list] [-resolve list] [ [-Pattern string] | [-pattern string] ] [-expr string] [ [-ALL criteria] | [-ANY criteria] ] [ [-Fnum #, #-#] | [-Fname fieldname_list] | [-Falias fieldalias_list] ] [ [-joinALL plt#[moduleFields]...] | [-joinANY plt#[moduleFields]...] ] [-usealias] [ [-c] | [-j] ] [-d] [-p] [-z] [-h] [-D] [-H header_list] [-L lostcode] [ [-w] | [-a] ] [-o [ [outfile] | [-] ] ] [-e errlog] {study#}

DFexport also has a descriptive mode for the output of schema information.

DFexport [-S server] [-U username] [-C password] [-listfields] { [ [-listmodules] | [-listplates] ] [ [plt#,plt#~plt#] | [all] ] } {study#}

DFexport has a history mode for the output of change history related to a subject, visit, or plate.

DFexport [-S server] [-U username] [-C password] {-history} {-subject #} [-visit #] [-plate #] {study#}

Options and Description

DFexport exports data from an individual study database. Minimally, the user must specify the study database and also at least one of the plates or modules containing the data of interest. Data records are exported in ASCII plain text format to a specified file, or the command-line output.

Differences and Similarities compared to DFexport.rpc

DFexport and DFexport.rpc share many features yet they are also unique in several important ways.

  • DFexport is available as both a server-side and client-side application; DFexport.rpc is server-side only.

  • DFexport is able to export descriptive information from a study schema with the -listfields, -listmodules and -listplates options.

  • DFexport is able to export, with the -history option, the history of changes for a subject, visit within subject, or plate within visit of a subject.

  • DFexport includes support for joining data fields from the same module definition that are instanced over one or more plates.

  • DFexport is able to export all records marked as missed in one command execution using the -Pnum missed option. In DFexport.rpc this would require looping through each of the plates individually, selecting missed records at each step of the loop.

  • User permissions for selecting by validation level are enforced in DFexport.

  • Selecting queries and reasons by creation and modification date is supported in DFexport; in DFexport.rpc only data records may be selected in this way.

  • Many options are specified using the same notation as DFexport.rpc, while others are specified using unique notation.

  • It is possible for DFexport and DFexport.rpc to produce different results for the same selection criteria. DFexport enforces user permissions for the user specified with the -U username option or the DFUSER environment variable. DFexport.rpc uses the UNIX login name of the command-line.

General Approach

DFexport can export data in a plate-centric manner, like DFexport.rpc, using -Pnum, or in a module-centric manner that is unique to DFexport using -Mname or -Mnum. Many options are available to create data sets for a wide variety of needs. To get the most out of DFexport it can be useful to approach it's use with the following in mind:

  1. Preparation.  Access to exported data is available only to authenticated users with appropriate database permissions, Authentication and Database Permissions. Plate, module and field name information is available from DFexport itself, Schema Listings.

  2. Where is the data?  Specifying the source of the data, whether it is plate-based or module-based, is required, Data Source.

  3. Do the data records need to be filtered?  In many cases not all of the rows in a data set are of interest and need to be filtered further, Record Selection by Keys and Filters.

  4. Data Fields.  By default, all of the data fields in the specified data source will be exported. It is possible to specify a subset of the fields to export, Extracting/Combining/Concatenating Fields for Output. It is also possible to add field, module, plate, visit, image, site, and study metadata (including user-defined properties) to the export data, Including Metadata in Output.

  5. Output Formatting.  For export, the appearance of data fields can be modified without modification of the data source, Output Formatting.

  6. Output Destination.  Finally, the export data is written to a destination, Output Options.

Authentication and Database Permissions

Authentication.  To authenticate, DFexport requires the username and password for connection to a specific database server. These may be supplied as:

  1. command-line options, the user can specify -S servername, -U username and -C password when the program is run, or

  2. environment variables, the user can set the variables DFSERVER servername, DFUSER username and DFPASSWD password before the program is run.

Specification of command-line options takes priority if both command-line options and environment variables are supplied. Refer to User Credentials for more information.

Database Permissions.  Subsequent to authentication with a specific database server, use of DFexport is allowed (or disallowed) by the Server - Export Data permission, or the intersection of both DFexplore - List view and DFexplore - Print/Save Data permissions. These permissions are granted to a role (and then user) by an administrator on a study-by-study basis (see System Administrator Guide, Server). An authenticated user with one of these role permissions will then be able to export data records from the set of site and subject IDs granted by their user permissions, which may further be limited by visit, plate and level restrictions associated with the user's role. DFexport provides no warning that some records cannot be exported due to permission restrictions as this would itself provide information about the existence of restricted data records. For roles without 'Show Hidden Fields' permission and fields with the Hidden or Masked property enabled, DFexport will output an empty (NULL) value when exporting such data fields, and it will also skip over reasons and queries attached to such fields.

Schema Listings

This feature is unique to DFexport and is not available in DFexport.rpc. DFexport is able to output database schema information. The available information includes plate numbers and module names, and may optionally also include field number, name, alias and data type for data fields in the requested plates or modules. This descriptive information can be useful reference material when a user wishes to further refine an export.

To obtain a listing of all defined plate numbers, use -listplates all. [1] To obtain a listing of all defined module names, use -listmodules all. In both cases:

  • the output is a space-delimited list of values (either plate numbers or module names) matching the requested criteria,

  • a subset can be specified by replacing all with individual, lists, or ranges of plate numbers in the format #, #-#.

Example 3.15. List all module names used in plates 100 through 200

%  DFexport -listmodules 100-200 10
Chemistry Eligibility Enrollment Header LabData PhysicalExam SocialImpact Urinalysis

The output is sorted by module name and each module is listed exactly once, even when instanced more than once. Modules that are not instanced on one of the selected plates are not included in the output.


Information about data fields in the specified plates or modules can be requested by additionally including the -listfields option. This option may only be used in conjunction with the -listmodules and -listplates options.

Example 3.16. List schema information for data fields of all plates

%  DFexport -listfields -listplates all 253
Plate 001: Blood Pressure Screening Visits
 Field  Name                       Alias                      Type
 -----  -------------------------  -------------------------  -------
    1   DFSTATUS                   DFstatus1                  Choice
    2   DFVALID                    DFvalid1                   Choice
    3   DFRASTER                   DFraster1                  String
    4   DFSTUDY                    DFstudy1                   Number
    5   DFPLATE                    DFplate1                   Number
    6   DFSEQ                      DFseq1                     Number
    7   ID                         ID001                      Number
    8   PINIT                      PINIT001                   String
    9   AGE                        AGE                        Number
   10   SEX                        SEX                        Choice
   11   RACE                       RACE                       Choice
   12   RACEOTH                    RACEOTH                    String
   13   S1DATE                     S1DATE                     Date
...

This is the beginning of the output - actual output would be much longer as the user has requested this information for all defined plates. Field information is included both for user-defined fields and the fixed DFdiscover fields.


Example 3.17. List schema information for data fields of all modules

%  DFexport -listfields -listmodules all 253
Module: BloodPressure (ID=5022): Blood Pressure Readings
 Field  Name                  Alias                 Type     Used in Plates
 -----  --------------------  --------------------  -------  ------------------
    1   DFSTATUS              DFSTATUS              Choice   
    2   DFVALID               DFVALID               Choice   
    3   DFRASTER              DFRASTER              String   
    4   DFSTUDY               DFSTUDY               Number   
    5   DFPLATE               DFPLATE               Number   
    6   DFSEQ                 DFSEQ                 Number   
    7   DFPID                 DFPID                 Number   
    8   DFMNAME               DFMNAME               String   
    9   DFMID                 DFMID                 Number   
   10   DFMREF                DFMREF                Number   
   11   SBP1                  SystolicReading1      Number   1,5
   12   SBP2                  SystolicReading2      Number   1,5
   13   SBP3                  SystolicReading3      Number   1
   14   DBP1                  DiastolicReading1     Number   1,5
   15   DBP2                  DiastolicReading2     Number   1,5
   16   DBP3                  DiastolicReading3     Number   1
   17   SDAT                  ReadingDate           Date     1
   18   BPARM                 BPwhichArm            Choice   5
...

This is the beginning of the output - actual output would be much longer as the user has requested this information for all defined modules.

The module listing provides the module name, BloodPressure (useful in conjunction with -Mname), the module number, 5022 (useful in conjunction with -Mnum) and the field numbers and names defined in the module, DFSTATUS, ... (useful in conjunction with -Fname or -Fnum).


History of all Changes

This feature is unique to DFexport and is not available in DFexport.rpc. DFexport is able to output the history of all changes made to the data for a specific subject, visit or plate. The available information includes when the change was made, who made the change, what was changed, any reason associated with the change and any queries related to the data. The output is always to a file, in Excel format, and hence option -o outfile.xlsx is required.

To obtain a history listing use -history. The history of changes is specific to a single subject, and includes all data, query and reason changes for all data elements in records idenitfied by the subject ID. The output can be further filtered by visit, -visit #, and plate, -plate # .

DFexport is designed to export history of changes for exactly one subject. To export history for multiple subjects, DFaudittrace is available.

Data Source

Specifying the data source for the records to export is required. Data can be exported by plate number, using the -Pnum option, or by module, using either the -Mname or -Mnum option. Within any of these data sources, the default behavior is to export all of the defined fields. It is possible to further select the exported fields using one of the -Fname, -Falias or -Fnum options. Field name, alias and number information can be obtained from a previous invocation that includes the -listfields option.

-Pnum #

Plate number. This is equivalent to the required plate number in DFexport.rpc. Exactly one plate number value may be specified. The value is the actual plate number or from this list of keywords:

  • qc: export all queries

  • reason: export all reason records

  • new: export all records awaiting first validation

  • missed: export all records marked as missed across the entire database

If a plate selector is not given, the data source must be specified by module name or module number.

-Mname or -Mnum

Module name or module number. Export data from the specified module. Exactly one module name or module number may be specified. Schema listing options -listfields -listmodules may be used to determine the module name or number of interest.

If a module selector is not given, the data source must be specified by plate number.

Record Selection by Keys and Filters

There are many options available for selecting subsets of data records for export. Several of these options are similar or identical to their counterparts in DFexport.rpc. [2] In all cases, record selection and filtering is from the set of records allowed by the user's database permissions.

The available selection mechanisms and filters are:

-level #, #-#

validation level. By default, records at all validation levels permitted by the user permissions are exported. This option further selects a subset of those records, namely those having a validation level that falls within the specified range.

-site #, #-#

site number. This option selects records by site number, site numbers, site number range or any combination thereof. The site number of any data record is derived by dereferencing the subject ID in the study's sites database.

This option may not be combined with -subject #, #-# or -alias string - doing so will cause DFexport to exit with an error message.

-subject #, #-#

subject ID. This option selects records by subject ID, subject IDs, subject ID range or any combination thereof.

This option may not be combined with -site #, #-# or -alias string - doing so will cause DFexport to exit with an error message.

-alias string

subject alias. This option selects records for a single subject alias.

This option may not be combined with -subject #, #-# or -site #, #-# - doing so will cause DFexport to exit with an error message.

-visit #, #-#

visit/sequence number. This option selects records by visit number, visit numbers, visit number range or any combination thereof.

-status status

record status. Select records by status keyword or status numeric value. The recognized status keywords are final, incomplete, pending, missed, all. [3] The default is all statuses, using keyword all, or by excluding this selector. A single status keyword or multiple, comma-delimited status keywords can be specified but it is not possible to specify a range of statuses. It is also possible to reference a status by it's numeric value, and in that case a range of numeric values may be specified.

If plate 511 is exported, the available status keywords are the same but their meaning (as related to query status) is translated by the table:

Table 3.7. Numeric value, record status keyword, query status and reason status equivalence

Numeric valueRecord status keywordQuery status equivalentReason status equivalent
1finalnewapproved
2incompletein unsent reportrejected
3pendingresolved NApending
4secondary [a] resolved irrelevant-
5secondary [b] resolved corrected-
6secondary [c] in sent report-

[a] For backwards compatibility, old keyword CLEAN is supported.

[b] For backwards compatibility, old keyword DIRTY is supported.

[c] For backwards compatibility, old keyword ERROR is supported.


-plate #, #-#

plate number. Select from module instances, queries, new, missed or reason records by plate number.

It is important to note that this is not the same as the required plate number selector in DFexport.rpc. In that environment, the plate number selects the data file source for the export. In DFexport, this option filters the data source which may be modules, complete data records, queries, reason, new or missed records.

Example 3.18. Export all missed records in plates 1 through 10

%  DFexport -S server -U username -Pnum missed -plate 1-10 -o - 254

Example 3.19. Export all ESIG modules in plates 100 through 110 and 200

%  DFexport -S server -U username -Mname ESIG -plate 100-110,200 -o esigmodules.txt 123

The output, from study 123, is written to the file esigmodules.txt in the current directory.


-create yy/mm/dd-yy/mm/dd

creation date. Select data, query or reason records by their creation date. The keyword today can be used to include records that were created today.

-modify yy/mm/dd-yy/mm/dd

modification date. Select data, query or reason records by their last modification date. The keyword today can be used to include records that were last modified today.

-resolve yy/mm/dd-yy/mm/dd

resolution date. Select query records by their resolution date. Queries that are not yet resolved will not have a resolution date and hence will always be excluded if this selection filter is used. The keyword today can be used to include query records that were resolved today.

-Pattern|pattern string

Filter records to export only those that match the specified pattern string. For inclusion in the export, a fragment of the record must exactly match the entire pattern string. The -Pattern option requests a case sensitive match, while -pattern requests a case insensitive match. [4]

-expr

Include only records that match the expression. The expression has the same format and meaning as the expression builder in DFexplore. For a complete description of the available expression syntax refer to DFexplore User Guide, List ViewSearching Data Records SearchingDataRecordsSearching Data Records.

-ALL, -ANY

Specify one or more, simplified filter criteria. Each filter references fields from one plate and multiple filters may be combined to include fields from other plates. The result of these selection criteria is a set of subject IDs matching all of the criteria (-ALL) or at least one of the criteria (-ANY). That set of subject IDs is then used to further filter the output.

Each criterion must be specified using the notation:

plate#:$(fieldname) op value

where:

  • plate# is a plate number from the set of plate numbers defined for the study, followed by a : separator,

  • $(fieldname) is the name of a user data field defined for the plate number,

  • op is an operator from the set of operators: >, >=, ==, !=, <, <=, and

  • value is a single value consistent with the data type of the field (string and data values should be enclosed in "" to prevent interpretation by the command environment).

To specify multiple criteria use | as a delimiter. The -ANY and -ALL options specify the resulting match behavior: with the -ALL option, every criterion must evaluate to true for a match to occur; otherwise, at least one criterion must evaluate to true.

Example 3.20. Export eSignature records for eligible, african-american males

The SEX (1=male) and RACE (2=african-american) are defined on plate 1 while ELIG (1=yes) is defined on plate 2.

-Mname eSignature -ALL '1:$(SEX) == 1 | 1:$(RACE) == 2 | 2:$(ELIG) == 1'

The fields used in the criteria are, by default, not part of the output, nor are they required to be. In fact, unless the criteria are all from the same source, this is not possible with this option.

Extracting/Combining/Concatenating Fields for Output

Frequently, only a subset of the fields in the data source are of interest. These can be selected using one of the -Fnum, -Fname or -Falias options. Special handling of data fields that are defined in a module which is then instanced across more than one plate is required.

-Fnum #, #-# Select by field number. From each exported record, select for output only those fields requested by their number. Output fields can be re-ordered by specifying the field numbers in the desired order. Field numbers can be repeated. It is possible to request field numbers using NF notation which selects by field number relative to the last field of the record. Some examples of NF usage are:

  • 2-NF.  field numbers 2 to the last field inclusive

  • NF-1.  the second last field

  • 5-NF-1.  fields 5 to the second last field, inclusive

  • NF-5-NF-1.  fields fifth from the last to the second last field, inclusive

This option may not be combined with -Fname or -Falias - doing so will cause DFexport to exit with an error message.

-Falias fieldname_list, -Fname fieldname_list Select by field name or field alias. From each exported record, select for output only those fields requested by their name or alias. Output fields can be re-ordered by specifying the field names or aliases in the desired order. Field names/aliases can be repeated. This option may not be combined with -Fnum - doing so will cause DFexport to exit with an error message. Similarly, only one of -Falias or -Fname may be specified - any other specification or combination will cause DFexport to exit with an error message.

-joinANY, -joinALL The default behavior of DFexport is to export data from one source, a plate or a module. However special handling is needed in the cases where a module instance is distributed across multiple plates. With the -joinALL and -joinANY options it is possible to combine and export data from a data source that is defined across plates in one invocation and into one output file. The join keys (the "join key triple") are always the subject ID, the sequence number and the module reference instance number and must match across data sources. These keys are always exported as the first three fields in each output record when joining occurs. Multiple data sources are specified using | as a delimiter. The specification for a single data source uses the notation:

plate#[fieldlist]

where:

  • plate# is a plate number from the set of plate numbers defined for the study,

  • fieldlist is a list of one or more comma-separated field numbers or names, enclosed with [ and ].

The -joinANY and -joinALL options specify the resulting output behavior: with the -joinALL option, the database must contain a data record for every data source; otherwise, no data is output for that join key triple. Conversely, specifying -joinANY will export data so long as the join key triple appears in at least one data source.

Example 3.21. Re-construct the demographics module for output

The DEMOGRAPHICS module is instanced with fields on plates 1 and 2. Export the data, with a specific field ordering and some output formatting.

-Mname DEMOGRAPHICS -joinANY '1[PINIT, SEX] | 2[RACE, RACE:d, RACEOTH] | 1[AGE, DOB:c]'


Concatenation with + By default, each selected field is output with optional formatting (see the section called “Output Formatting” and is separated from adjacent selected fields by the delimiter. Alternatively, selected fields can be concatenated, output with no delimiter at all. To specify concatenation, use the special concatenation symbol, +, between fields to concatenate. For example, to concatenate the site identifier and the subject identifier, use

DFSITE_ID+SUBJID

or to concatenate fields 9 and 12, use

9+12

It is also possible to concatenate a range of fields. In this case, concatenation has precedence over field range so,

8+9-12

or

8-11+12

or

8+9+10+11+12

are all equivalent. Note that there is no notation to concatenate a range of fields by specifying only the minimum and maximum field numbers; at least one of the numbers must be inidividually specified so that the concatenation operator can be used.

Including Metadata in Output

Study metadata is available for export as columns in each output record. Metadata is specified for inclusion in the output by inserting metadata keywords in the list of fields for export. In all cases, the exported value has the string data type.

The available metadata keywords are grouped by 6 categories: field, page, visit, image, site and study.

  1. field.  These metadata keywords reference field definition properties of a data field. For these keywords only, the keyword is appended to a field name to request a specific property of that specific data field. For example, MHENDAT.DFVAR_PROMPT requests the DFVAR_PROMPT property (the field prompt from the study definition) of the MHENDAT data field. User-defined

    The field properties that are available are:

    • DFVAR_DESC: field description

    • DFVAR_TYPE: field type

    • DFVAR_PROMPT: field prompt

    • DFVAR_UNITS: field units

    • DFVAR_COMMENT: field comment

    • DFVAR_MODNUM: module instance number containing field (Module Instance in setup definition)

    • DFVAR_MODNAME: module name containing field (Module Name in setup definition)

    • DFVAR_MODDESC: module description containing field (Module Description in setup definition)

    • DFVAR_USER#: field user-defined property value, where # is 1-20. User-defined property tags may be used in place of the default name

  2. module.  These metadata keywords reference properties of the module instance.

    • DFMODULE_NAME: module name (Module Name in setup definition)

    • DFMODULE_DESC: module description (Module Description in setup definition)

    • DFMODULE_USER#: module user-defined property value, where # is 1-20. User-defined property tags may be used in place of the default name

  3. plate.  Metadata keywords for a plate include DFPLATE_DESC (plate description) and DFPLATE_USER# (plate user-defined property value, where # is 1-20. User-defined property tags may be used in place of the default name).

  4. page.  These metadata keywords reference page properties of the plate associated with the current data record. There is currently one page property available: DFPAGE_LABEL. The value is the descriptive label of the page. This label is taken from DFpage_map if there is a matching entry for the combination of visit and plate (field number substitution is also performed if requested with the # or #:d notation); otherwise, it is the plate description taken from the study database definition.

  5. visit.  These metadata keywords reference visit properties of the visit associated with the current visit. The current visit is determined by the visit number of the data record, even if the visit number is not included in the export. The visit properties that are available are: DFVISIT_DATE, DFVISIT_TYPE, DFVISIT_ACRONYM, DFVISIT_LABEL, DFVISIT_DUE, DFVISIT_OVERDUE, DFVISIT_REQUIREDPLATES, DFVISIT_OPTIONALPLATES, DFVISIT_ORDERPLATES and DFVISIT_MISSEDPLATE. The value of each property is detailed in Study Setup User Guide, Visit Map.

  6. image.  Each data record has an image attribute, which will have a value if there is an image associated with the data record. For those records, the image properties that are available are:

    • DFIMAGE_ARRIVAL: arrival date/time of the image

    • DFIMAGE_FIRSTARRIVAL: arrival date/time of the image; if there were multiple images over time, the chronologically first image

    • DFIMAGE_LASTARRIVAL: arrival date/time of the image; if there were multiple images over time, the chronologically last (most recent) image

    • DFIMAGE_FORMAT: image format

    • DFIMAGE_SENDER: sender ID for the document that included this image

    • DFIMAGE_PAGES: number of pages in the document that included this image

  7. site.  The site metadata associated with each data record can be determined by connecting the subject ID of the data record with the site record that includes the subject ID in the list of subjects. The site properies that are available are: DFSITE_ID, DFSITE_NAME, DFSITE_CONTACT, DFSITE_ADDRESS, DFSITE_FAX, DFSITE_PHONE, DFSITE_INVESTIGATOR, DFSITE_SUBJECTS, DFSITE_TEST, DFSITE_COUNTRY, DFSITE_BEGINDATE, DFSITE_ENDDATE, DFSITE_ENROLL, DFSITE_PROTOCOL1, DFSITE_PROTOCOLDATE1, DFSITE_PROTOCOL2, DFSITE_PROTOCOLDATE2, DFSITE_PROTOCOL3, DFSITE_PROTOCOLDATE3, DFSITE_PROTOCOL4, DFSITE_PROTOCOLDATE4, DFSITE_PROTOCOL5, DFSITE_PROTOCOLDATE5 and DFSITE_REPLYTO. The value of each property is detailed in Programmer Guide, DFcenters - sites database.

  8. study.  The study metadata is static for the entire study, it is defined at the study database level and does not change for different data record keys. The study properies that are available are: DFSTUDY_NUMBER (study number), DFSTUDY_NAME (study name) and DFSTUDY_YEAR (4-digit year of study start, defined by study admin as a global setting). In addition, DFSTUDY_USER# provides the study user-defined property value, where # is 1-20. User-defined property tags may be used in place of the default name.

Output Formatting

Date Modifiers.  By default, date fields are output "as is" in their defined format. The options -c and -j may be used to cause imputation and to alter/normalize that defined format for all date fields that are exported. The -c (calendar format) option requests that any 2-digit year be replaced with a 4-digit year and imputation be applied. The -j (julian format) option requests that imputation be applied and dates be exported as their equivalent julian value - this is primarily useful for mathematical calculations and comparisons with other date values. On an individual field basis, it is also possible to override the defined format with field level date modifiers. By following the field number, name or alias specification with a :c or :j modifier, it is possible to export the same calendar or julian format result for a single field. The :o modifier requests the "original" date value, and can be useful where -c or -j has been previously specified. DateField and DateField:o produce the same result. Similarly, DateField:c also produces the same result if the field's format already specifies 4-digit years and no imputation is needed.

Example 3.22. Calendar and Julian Date Modifiers

For the field named VisitDate, export the value in default, original, calendar and julian formats.

-Fname "VisitDate, VisitDate:o VisitDate:c, VisitDate:j"

If VisitDate is defined with "imputation to the beginning" and an instance of that field in a data record has the data value 16/01/00, this specification will produce 16/01/00|16/01/00|2016/01/01|2456963.


A field level date modifier cannot be applied to a non-date field nor can it be applied to a range of field numbers, names or aliases; it may only be applied to a single field number, name or alias at a time.

Label Decoding.  For any field that is defined with coding, it is possible to output the decoded label for a field's value by appending the :d modifier to the field number, name or alias.

Example 3.23. Output coded and decoded values

-Falias "sex, sex:d"
2|female


Any coded value that cannot be decoded is output as is. The modifier cannot be applied to a field that has no coding nor can it be applied to a range of field numbers, names or aliases; it must be applied to a single field number, name or alias at a time.

Header Record.  Specifying -h forces the inclusion of a descriptive record, the header, as the first row of output. If -h is specified with -D, the header contains each field's description. Otherwise, if -h is specified with -Fname, the header contains each field alias. If -h is combined with -Fnum, or no field selection criteria are needed:

  • the header contains each field alias if the data source is a plate, as specified with -Pnum

  • the header contains each field name if the data source is a module, as specified with -Mname or -Mnum.

During export it is possible to create one or more new data fields by including options to string split (see String Splitting), extract substrings (see Extracting Sub-Strings) or include fixed, literal values (see Literal, constant values). If -h is specified then these new fields also need names to appear in the header record. This is done with the -H header_list option. This option specifies the variable names to appear in the header record for any new fields. If header_list contains fewer names than there are new fields, the remaining fields are assigned names identical to their original field names.

String Splitting.  It is possible to split a string field into multiple, shorter string fields by appending a :numxcharscw modifier to a field number, name or alias selection. The modifier includes a specification of the number of fields to split the input string into (num), the literal x as a separator, the maximum width in characters of each output field (chars), and whether or not splitting should be done on character (c) boundaries or word (w) boundaries. If word boundaries are used, a word is delimited by any combination of space or tab characters. Word splitting will always split at the word boundary closest to but less than the maximum width. If there is no word boundary in the substring, the string will be split at the character boundary.

Example 3.24. Split a string field into 5 fields

This example splits the comments field into 5 200-character fields at character boundaries:

-Fname "comments:5x200c"


If a field to be split contains a missing value code, the first output field will contain the same missing value code and the remaining fields will be blank. If a field is split to create additional fields and a header record is requested with -h, field names for the new fields are identical to the original field name unless -H is also specified. If -H header_list is given, the field names for the new fields are assigned in order from the header_list.

Extracting Sub-Strings.  New fields can be created from substrings while exporting fields. The modifier notation :xS.L indicates that a substring is being extracted (x), starting at character position S (the first position is 1) and having length L characters. The modifier can be applied to any field number, name or alias selection.

New fields can be created from substrings while exporting fields. The modifier notation :xS.L indicates that a substring is being extracted (x), starting at character position S (the first position is 1) and having length L characters. The modifier can be applied to any field number, name or alias selection. Substrings can be extracted from any field type: string, date, time or numeric. Substrings are extracted from the string value equivalent of the field. Numeric field values are leading zero-padded to their store width before substring extraction is performed. [5]

Example 3.25. Digit extraction from subject ID

In some settings, the subject ID is a concatenation of site ID and a numeric subject identifier. In this example, the leading three digits are the site ID and the trailing four digits. Given PID values of 50001, 60401 and 1232003, the extraction would yield these results.

-Fname "PID:x1.3,PID:x4.4"
005|0001
006|0401
123|2003


Literal, constant values.  A constant value may be inserted into the data export by including it in one of the -Fnum, -Fname or -Falias specifications. The value must always be enclosed with single-quotes, as in 'AB', to prevent any confusion with a matching field name or alias. To insert a blank field into the export, it must be represented by 2 adjacent single quotes, specifically ''.

Example 3.26. Insert AD into the data export

-Falias "EnrollDate,'AD',SubjInit"
2014/12/12|AD|STN
2013/06/15|AD|PRP
2015/03/12|AD|KKW

The constant value AD is inserted between the EnrollDate and SubjInit fields in every exported record.


Enclosed by single-quotes, DFexport ignores the special meaning of any characters that would otherwise be delimiters. For example, 'a,b' is the constant value a,b, not two separate constants. If -h is present, the name used in the header record is the value itself. If -H header_list is also present, the name used in the header record for each constant value will be the next value from header_list, if one is available; otherwise, the value itself will again be used.

Example 3.27. Specify a field name for the header record

-h -H "When" -Falias "EnrollDate,'AD',SubjInit"
EnrollDate|When|SubjInit
2014/12/12|AD|STN
2013/06/15|AD|PRP
2015/03/12|AD|KKW


Trailing Field Delimiter.  For backwards compatibility, exported data records can be terminated by a trailing | if one is not already present. This is specified with the -p option. [6] With the trailing delimiter, exported data records maintain the original DFdiscover record format so that they can be, for example, easily re-imported into DFdiscover in a subsequent step.

Missed Record Handling.  Missed records are include in the export if either -status all or -status missed are specified. To export missed records, there are two output options:

  1. Internal to DFdiscover, missed records are handled in a different structure than other data records (see plt###.dat - missed records and Table 2.7, “Missed record field descriptions” for further information). Exporting them in this structure, when intermingled with plate- or module-based data records, does not result in normalized records - most records have the data structure, while some have the missed record structure. The result is that these records can be difficult to work with post-export.

  2. By specifying the -L lostcode option, DFexport will export missed records by matching the structure of the corresponding plate and inserting lostcode into all of the user-defined data fields. This results in exported records that are normalized and all share the same structure. Post-export, missed records can be identified by the missed status in the status field, or the lostcode in all of the user-defined data fields.

When missed records are exported along with one of the field selection options -Fnum, -Fname, or -Falias, the second output option is always used. If the option -L lostcode has not been specified, the standard DFdiscover missing value code (*) is automatically inserted in each user-defined field of the exported record.

The lostcode specified with -L must be comprised of one or more alphanumeric characters. It is not permissible to use control characters. Also, the DFdiscover delimiter (|) may not be used, unless data is also being exported in CSV format with the -z option. Note that specifying the -L lostcode option will not, by itself, cause missed records to be exported. That is controlled entirely by the -status option. The -L option only indicates that the second, "normalized" output format is desired and is to be used with the specified substitution code.

Output Options

Output File.  By default, output from DFexport is written to standard output (the command-line output). To write the output to a specific file, specify the filename with -o outfile. The user must have filesystem permission to create or modify the file. If outfile is specified as a relative pathname, the file is created relative to the DFexport command invocation directory.[7] It is also possible to be explicit that standard output is the output destination by specifying -o - although this is not necessary.

Output Mode.  Options -w and -a control the output mode. The output is written (-w) or appended (-a) to the output file. This option is ignored if exported records are written to standard output rather than a file. If the output file does not exist, it is first created. If the output file already exists, it is either overwritten or appended to, depending upon the output mode.

Error Output.  Re-direct any error messages to a specific file with the -e errorfile option. By default, error messages are written to standard error and hence would get intermixed with data if the data is being exported to standard output.

Comma-Separated Values (CSV) Format.  CSV is a popular format used for sharing data records between different software programs. By including the -z option, DFexport will generate output that is compliant with this format. The unique requirements of the format are:

  • Each field within a record is separated by a comma.

  • Leading and trailing spaces adjacent to comma separators are ignored.

  • Fields containing commas as part of their value are enclosed in double quotes.

  • Fields containing double quotes are enclosed within double quotes, and the embedded double quotes themselves are each represented by a pair of consecutive double quotes.

  • Fields with leading or trailing spaces are enclosed in double quotes.

  • The record delimiter is a newline character (this is also true of records exported in the default, non-CSV format).

Exported records, in CSV or traditional (non-CSV), format are always delimited by a newline character.

If the -h option is also specified, the CSV requirements are applied to the header record and all of the values in the header.

Exit Status

DFexport exits with status 0 if the command was successful. Exit status 1 indicates that an error occurred - errors are written to standard error or the errlog file if -e was specified.

Examples

In the following examples, only the options unique to the example are specified. For clarity, required options such as authentication credentials, output destination and study number are omitted.

Example 3.28. Select and filter using data from two different plates

Export complete data records from plate 6. Filter those records to include only instances where records with matching keys on plate 1 have a date variable, S1DATE with a value of 01/01/06.

-Pnum 6 -ANY "1:$(S1DATE) == 01/01/06"


Example 3.29. Module-based export

The structure of data exported from a module varies dependent upon whether all of the fields are instanced on a single plate, or on multiple plates.

Consider a study containing a module, BASE, that collects baseline data in fields HEIGHT, WEIGHT, the data across two records as in:

-Mname BASE
73|180||
||120|70
65|145||
||110|65

To generate the previous single record output requires a little more work, specifically:

-Mname BASE -joinALL "1[HEIGHT,WEIGHT] | 2[SYSTOLIC,DIASTOLIC]"
73|180|120|70
65|145|110|65


Example 3.30. Export eSignature records with a specific name

The eSignature name is stored in the eSignature module. Export records signed by 'Jack'. Several specifications are possible as illustrated here.

-Mname eSignature -expr '$(SigName) == "Jack"'

This is the most specific filter - the signature name field must contain exactly "Jack" and nothing else.

-Mname eSignature -expr 'tolower($(SigName)) ~ "jack"'

This is a less specific filter - the signature name field can contain a case insensitive variation of "Jack" somewhere in the value. Note that this also selects values like "Jackson", "Alex Jack" and "Wojack".

-Mname eSignature -pattern "jack"

This is the least specific filter and selects each eSignature module record that contains a case insensitive match for "jack" anywhere in the record.




[1] This output format matches the output from DFlistplates.rpc.

[2] DFexport typically uses a word to specify an option while DFexport.rpc uses an option letter.

[3] For backwards compatibility, the legacy status keywords (clean, dirty, error, CLEAN, DIRTY, ERROR, missed, primary, secondary, all) are currently supported but will be removed in a future release. Do not rely upon the availability of the legacy status keywords.

[4] Matching occurs before any output formatting (such as variable decoding or string splitting) is applied.

[5] Choice and check field values are NOT zero-padded and hence substring extraction for those data types may yield unexpected or unreliable results.

[6] This option has no effect when applied to exported query or reason records.

[7] If DFexport is invoked from a cron facility, absolute pathnames are highly recommended.