DF_stats

DF_stats — Display simple variable statistics for a single plate

Synopsis

DF_stats {study} {-P #} [-n #, #-#] [-S #, #-#] [ [-F #, #-#] | [-N #, #-#] ] [-G visit plate field code1 label1 code2 label2 ] [-z] [-L lines] [-in file] [-ex file]

Description

DF_stats displays descriptive statistics for fields on a specified plate. It includes all final, incomplete and pending records for the specified plate; it does not include secondary records.

The descriptive statistics displayed depend on the field type and include:

  • Choice.  number of cases for each choice code

  • Check.  number of cases checked and not checked

  • Numeric (including VAS).  mean, standard deviation, and range

  • Date.  date range [9]

  • Text.  minimum and maximum length

For all fields, missing data counts are displayed:

  • Missing Values.  fields marked with a missing value code

  • Illegal Values.  fields that fail the legal range specification from the study schema

  • Blank Values.  fields that contain nothing. This does not include choice or check fields where the response is no choice or not checked

  • Missing Records.  cases (in a group specification) for which data is not available

For fields of date type, DF_stats also tabulates the number of invalid values, where invalid means that one or more of the day, month, or year parts is impossible or nonsensical.
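As an illustration only, the notion of an impossible date can be sketched in Python. The function name is_possible_date is hypothetical, and the check relies on the calendar rules of Python's datetime module; how DF_stats itself validates dates (including partial or imputed dates) is not described here.

```python
from datetime import date

def is_possible_date(year, month, day):
    """Return True if the three parts form a real calendar date.

    Hypothetical helper: a sketch of the 'invalid date' idea,
    not the check that DF_stats actually performs.
    """
    try:
        date(year, month, day)  # raises ValueError for impossible parts
    except ValueError:
        return False
    return True
```

Under this sketch, a value such as 30 February or month 13 would be tallied as invalid.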

The specification of field numbers uses study schema field numbers which include the DFdiscover-defined fields (DFstatus, DFvalid, etc.) in addition to user-defined data fields. Use DF_SSschema or DF_SSvars to display field numbers, variable names, description, coding, etc.

When using -G, the code and label to be used for each group are required. Exactly two groups can be specified. Only subjects meeting the grouping specifications are included in the report. Each of the two groups must contain at least one subject, or the report will fail.

Example 2.129. Group by the code in field 9 of plate 1, visit 0

-G 0 1 9 1 male 2 female

This specification will result in 2 groups based on the grouping variable found in field 9 of plate 1. Cases with the value 1 will go into the male group and cases with code 2 will go into the female group. Any cases that have neither of these codes will not be included.


It is legal to specify a group made up of more than one code. The codes that make up each group are specified as a list of values and ranges, using the standard format #, #-#.

Example 2.130. Group by range of ages

-G 0 1 9 0~15 child 16~99 adult

This example creates 2 groups on the basis of age (which can be found on visit 0, plate 1, field 9). Subjects 15 and under are grouped as children, while those 16 and older are grouped as adults. Notice that there is no syntax for specifying an open-ended range. In this example, the upper limit of 99 on adults excludes subjects whose age is 100 or greater from the reported results.
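The grouping rule described above can be sketched in Python as follows. This is an illustration of the logic only, not DF_stats code: the names in_spec and assign_group are hypothetical, and each group's codes are represented here as a list of single values and inclusive (low, high) pairs rather than parsed from the command line.

```python
def in_spec(value, spec):
    """Return True if value matches any single code or inclusive range in spec."""
    for item in spec:
        if isinstance(item, tuple):
            low, high = item
            if low <= value <= high:
                return True
        elif value == item:
            return True
    return False

def assign_group(value, groups):
    """Return the label of the first matching group, or None if no group matches.

    groups is a list of (label, spec) pairs; a case that returns None
    would be left out of the report.
    """
    for label, spec in groups:
        if in_spec(value, spec):
            return label
    return None

# The age example: 0 to 15 is "child", 16 to 99 is "adult".
age_groups = [("child", [(0, 15)]), ("adult", [(16, 99)])]
```

With these groups, an age of 100 matches neither specification, so that subject would not appear in either group.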


If -G is used to display statistics by a grouping variable, it should be used in conjunction with -S. The percentages shown can then be interpreted as the percentage of subjects in each group. Otherwise, if more than one visit/sequence number is tabulated, some subjects may have more records in the plate than others, and the percentages will be based on the total number of records rather than the total number of subjects in each group.
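The difference between record-based and subject-based percentages can be seen in a small Python sketch. The function group_percentages and the sample data are hypothetical and serve only to illustrate the arithmetic; they do not reproduce DF_stats output.

```python
from collections import Counter

def group_percentages(records, by_subject):
    """Compute the percentage of units in each group.

    records is a list of (subject_id, group_label) pairs, one per data record.
    If by_subject is True, each subject is counted once per group;
    otherwise every record is counted.
    """
    units = set(records) if by_subject else records
    counts = Counter(group for _, group in units)
    total = sum(counts.values())
    return {group: 100.0 * n / total for group, n in counts.items()}

# Subject 1 has three visits on the plate, subject 2 has one.
sample = [(1, "male"), (1, "male"), (1, "male"), (2, "female")]
```

Counting records, the sample is 75% male and 25% female; counting subjects, it is 50% each. Restricting the report to a single visit with -S avoids the first situation.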

It is possible, for purposes of tabulation, to coerce specified string fields to numeric fields with -N. This allows DF_stats to report the mean, standard deviation, etc. Note, however, that string values which cannot be coerced to numeric are counted as legal values but are not included in the calculation of mean, standard deviation, or range.
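A minimal Python sketch of this counting rule follows. The function summarize_coerced is hypothetical, it uses Python's float() as a stand-in for whatever coercion DF_stats applies, and it assumes blank and missing values have already been set aside.

```python
def summarize_coerced(values):
    """Summarize string values that are being treated as numbers.

    Every value counts as legal, but only values that convert to a
    number contribute to the mean and range.
    """
    numeric = []
    for text in values:
        try:
            numeric.append(float(text))
        except ValueError:
            pass  # still counted as legal, just not used in the statistics
    return {
        "legal": len(values),
        "used": len(numeric),
        "mean": sum(numeric) / len(numeric) if numeric else None,
        "range": (min(numeric), max(numeric)) if numeric else None,
    }
```

For the values "10", "12", and "n/a", three legal values would be counted, but the mean and range would be based on two of them.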

Specific subjects can be included in or excluded from the statistics by listing their IDs in an inclusion or exclusion file. In both cases, the file format is simply one ID number per line. It is valid to use an inclusion file and an exclusion file together and specify them both as options, as in:

-in inclusionfile -ex exclusionfile

In such a case, the included subject IDs are those from the inclusion file that do not appear in the exclusion file.
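In set terms, this is the inclusion list minus the exclusion list. The Python sketch below illustrates that rule; parse_ids and effective_ids are hypothetical names, and the file contents are passed in as text to keep the example self-contained.

```python
def parse_ids(text):
    """Parse file contents holding one subject ID per line, skipping blank lines."""
    return {int(line) for line in text.splitlines() if line.strip()}

def effective_ids(included, excluded):
    """Return the IDs in the inclusion set that are not in the exclusion set."""
    return included - excluded

included = parse_ids("1001\n1002\n1003\n")
excluded = parse_ids("1002\n2001\n")
selected = effective_ids(included, excluded)
```

Here subject 1002 is dropped because it appears in both files, and 2001 has no effect because it was never included.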

Options

-P #

Select the plate number (required)

-n #, #-#

Select only subjects from specified site IDs

-S #, #-#

Select only specified sequence (or visit) numbers

-F #, #-#, -N #, #-#

Select only specified field numbers. The specified fields are coerced to numbers if -N is used.

-G visit plate field code1 label1 code2 label2

Report stats by group. The visit plate field values select the grouping variable, and code1 label1 code2 label2 specify the code and label for each of the two required groups.

-L lines

Maximum number of lines per report page

-z

Exclude any coded categories that have zero counts

-in file

Include subject only if ID is listed in file

-ex file

Exclude subject if ID is listed in file

Examples

Example 2.131. Stats for all fields on plate 7 (for all visits)

-P 7

Example 2.132. Stats for all fields on plate 7 at visit 0

-P 7 -S 0

Example 2.133. Stats for fields 8 through 12 inclusive on plate 22 (for all visits)

-P 22 -F 8~12

Example 2.134. Group plate 6 observations by sex (which is visit 0 plate 1 field 9)

-P 6 -G 0 1 9 1 male 2 female

Limitations

DF_stats is limited to a maximum of 100 fields at a time. For plates that have more than 100 fields, use -F iteratively to specify a set of 100 fields or fewer for each execution.
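One way to plan the repeated runs is to split the field numbers into consecutive batches. The Python sketch below is an illustration under stated assumptions: field_batches is a hypothetical helper, and each (first, last) pair it returns would be supplied as the -F range for one execution.

```python
def field_batches(first, last, size=100):
    """Split the inclusive field range first..last into batches of at most size fields."""
    batches = []
    start = first
    while start <= last:
        end = min(start + size - 1, last)
        batches.append((start, end))
        start = end + 1
    return batches
```

For a plate with 250 fields, this yields three batches: fields 1 to 100, 101 to 200, and 201 to 250.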

See Also

DF_SSschema
DF_SSvars


[9] The date range displayed for legal dates always uses imputed values where imputation is defined.