dmanip is a program which can perform generalized linear operations on ERP data. It employs an ASCII file, termed a data manipulation command file, containing the specific operations to perform on input data files. The output data are written to a new file, and contain the results of the operations. dmanip can be used to form the following:
- grand averages over subjects
- averages across experimental treatments and waveforms
- difference waveforms
- re-ordering bins with additions and/or deletions
- re-referencing data to another recording channel
- forming "bipolar" derivations
- deleting and/or re-ordering channels
dmanip is easy to use, and its command files are easy to generate. Some of the enhancements over its predecessor, manip, include:
- Scaling of input data by an arbitrary constant or the number of sums in the input bin.
- Semi-automatic bookkeeping for polarity and size (pp10uv).
- Header operations using mnemonics rather than numerical offsets.
- Pre-writing of the output data file to ensure sufficient space.
- Extensive error reporting.
- Use of locked swapped devices to prevent inadvertent destruction of data being averaged and to allow sharing of intermediate swap space with other programs.
- Use of standard program libraries to improve program maintainability.
- Linear operations on input channels to allow average references or current source density analyses after averaging.
The actual use of dmanip is simple; most of the work lies in the generation of the data manipulation command file. dmanip is invoked thus:
dmanip commandfile outputfile [options] infile1 infile2 ...
where commandfile is the name of a file containing data manipulation commands (see below, Data Manipulation Command Files ), outputfile is the name of a file (to be created) which will contain the results of the operations, and infile1 infile2 ... represent the input data files which will be used with the command file to generate the massaged data. The options available are:
-g #_bins
This "option" is required whenever the data manipulation command file contains only wild-card specifications (e.g. for a grand average). dmanip needs to know the total number of bins prior to allocation of swap space; the #_bins following the -g should be a positive decimal integer one greater than the largest bin or subcondition number in the input data files, since bin 0 is counted too.
-d desired_pp10uv
This option can be employed to alter the resolution with which the data are stored. The argument "desired_pp10uv" following the -d should be a decimal integer representing the desired "points per 10 microvolts" in the output file. Since dmanip normally forces data to be "positive up" (i.e. positive numbers refer to positive voltages with respect to the reference), this option can also be used to force the data to be written to the output file with the opposite polarity ("negative up") by supplying a negative integer for desired_pp10uv. In any case, the actual polarity of the data is tracked in the header (in verpos), so one need not be concerned with the manner in which the data are stored, assuming they were properly normalized to begin with. By the same token, dmanip flips the polarity of input files where necessary so that "appropriate" operations are performed on those data as specified in the command file. Again, however, if the data were not normalized properly and the data header for one file claims that the data are "positive up" but the data are actually "negative up", these data will corrupt any output files they are used to form.
This option lets one list the names of the input files in a separate file and specify that file on the command line, rather than giving all of the input filenames separately. This allows the user to circumvent the 256 character line length limit DOS imposes on the command line. Input file names are listed in "in_files", one per line.
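A minimal Python sketch of preparing the text of such a list file follows. The helper name format_input_list and the subject file names are hypothetical; the only thing taken from the description above is the layout, one input file name per line.

```python
def format_input_list(input_files):
    """Return the text of a list file: one input file name per line."""
    return "".join(name + "\n" for name in input_files)


# Hypothetical subject files; the printed text is what the list file would hold.
subjects = ["s01.avg", "s02.avg", "s03.avg"]
print(format_input_list(subjects), end="")
```

The resulting text can be saved with any editor or script and named on the command line in place of the individual input files.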
Upon invocation, dmanip reads the command file and encodes it into an internal, easily executed form. If any errors are detected, a message describing the problem and the offending line in the command file is printed. Then, dmanip acquires the swap space needed for the intermediate calculations. Next, dmanip checks to see if the outfile already exists. If it does, a message to that effect is given, and dmanip exits, else the outfile is created and pre-written with all zeroes. The pre-writing is done to ensure that the final data can be written after what can be a lengthy processing period, and checking for the prior existence of the output file prevents inadvertent destruction of valuable data or writing onto input data or command files.
The operations specified in the command file are then performed on each input file in turn. If any errors arise during the data processing phase of dmanip (e.g. bin or channel out of range), a message containing a description of the problem and the line of the command file is printed. When no more input files remain, the final processing is performed, the data are written, and dmanip exits normally.
A Data Manipulation Command File (dmcf) is an ASCII file created using an editor, and comprises three major sections. The first of these is the channel processing (abbreviated cp) section. Since all bins in the output data file are constrained to have the same number of channels as well as the same order of channels, the specifications in the cp segment of the dmcf define the number and nature of the data channels in the output file. The cp segment is executed every time new input data are acquired from the current input data file, and this is the point at which the polarity and gain of the data are standardized. Commands in the cp segment involve scaling and then adding or subtracting particular input data channels from one another to form the specified output data channels.
The next major section in a dmcf is termed the repetitive processing (abbreviated rp) section. The specifications in the rp segment are executed once for every input data file. Generally, the commands in the rp segment involve scaling and then adding or subtracting an input data bin (after channel processing has been performed) from an intermediate accumulation area for a specified output bin. These intermediate areas are pre-set to zero prior to the processing of any of the input data files.
The final section of a dmcf, the final processing (abbreviated fp) section, contains operations which are best performed after all the input data files have been processed in the rp section of the dmcf. Since there are no input data during this processing, arithmetic operations are restricted to division of intermediate accumulation areas by the number of input files (when forming grand averages), division by the number of sums (when weighted averaging is being performed), or division by a constant. The fp section is also the best place to do housekeeping on the data headers for the output bins. Header commands can be used to attach ASCII descriptors to the data describing their new meaning and/or the way in which they were formed, or to set any of the integer variables (e.g. sums, rejection counts) to new values.
The cp, rp, and fp sections of the dmcf are delimited by their abbreviations, in lower case, alone on a line. The three sections must appear in the order cp, rp, and finally fp. Note that it is permissible to precede any specification (command) or delimiter by blanks or tabs; indeed, the use of indentation is encouraged to improve readability.
When the dmcf is encoded, any lines whose first non-blank and non-tab character is not a digit (0-9), an asterisk (*), or a c, r, or f are ignored (this includes blank lines). Thus, comments can be incorporated in a command file by simply making sure they are not valid specifications. The capability of including comments has its drawbacks, though. It implies that lines which are badly formed or have typographical errors at the beginning may be ignored without warning. Hence, it is recommended that comments have a consistent format in a dmcf to facilitate visual inspection and verification of the file. For instance, one might introduce each comment with a / or ;.
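The rule above can be sketched in Python as follows. This is only an illustration of the first-character test as described, not dmanip's actual encoder; the function name would_be_encoded is hypothetical.

```python
def would_be_encoded(line):
    """Return True if the line is NOT ignored under the rule described above.

    A line is considered only when its first character, after leading
    blanks and tabs, is a digit, an asterisk, or one of c, r, f.
    """
    stripped = line.lstrip(" \t")
    return bool(stripped) and stripped[0] in "0123456789*crf"


for line in ["; a comment", "  cp", "0 = 0", "channel notes go here"]:
    print(repr(line), would_be_encoded(line))
```

The last line shows the drawback mentioned above from the other side: a comment that happens to begin with c, r, or f is not skipped, which is one more reason to introduce every comment with a character such as ; or /.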
All three sections of a dmcf employ similar syntax to specify operations on the data or the data headers. The meaning of the specifications, however, differs from section to section and will be spelled out in the detailed description of the cp, rp, and fp sections below. Every valid specification line consists of a sequence of arguments (terms, tokens) separated by spaces, commas, or tabs. The first argument must be a non-negative decimal integer, or in some cases an asterisk "*" (see below). This is termed the "output number" here, for purposes of discussion, and refers to either an output bin or output channel. The second argument can be either a solitary equal sign "=", or a mnemonic for a header variable or description. Hence, the second argument of a valid specification line determines whether data operation(s) or a header operation is to be performed. Header operations are described in detail below; we now concentrate on data operations and their syntax.
Following the equal sign on a data operation line are one or more terms indicating the operations to perform. If a term begins with a minus sign "-" or a plus sign "+", the specified input data will be subtracted from or added to the intermediate accumulation area, respectively. A term beginning with a slash "/" indicates that the intermediate area should be divided by the integer which immediately follows the slash. Any term beginning with a digit is deemed to have an implicit "+" preceding it; hence, those data are added into the accumulation area.
Besides the (optional) initial operation character, terms are just non-negative integers referring to input data (either input channels or input bins) or non-negative integers preceded by a scaling factor and a scaling symbol, the colon ":". Scaling factors are optional, and their presence is recognized by the colon. Scaling factors consist of positive decimal numbers of magnitude less than 32 having up to three fractional decimal digits. Scaling involves multiplying the input data by the scaling factor prior to its use in other operations. Here are some examples of valid terms:
    1.:3   4:4   31.999:43   .25:8   7
while these are invalid:
    32.000:10   0.000:3   :4   .:7
Remember, the number to the right of the ":" refers to an input data bin or channel which is to be multiplied by the scaling factor, and that there can be no embedded blanks in a term.
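The term rules above can be checked with a short Python sketch. It is an illustration under stated assumptions, not dmanip's parser: the name is_valid_term is hypothetical, the division ("/") and sums ("^") forms are not covered, and allowing a leading "+" or "-" in front of a scaled term is an assumption.

```python
import re
from decimal import Decimal

# optional sign, optional "factor:" prefix, then the input bin or channel number
_TERM = re.compile(r"^([+-]?)(?:(\d*\.?\d{0,3}):)?(\d+)$")


def is_valid_term(term):
    """Check an add/subtract term such as '7', '-23' or '.25:8'."""
    match = _TERM.match(term)
    if match is None:
        return False
    factor = match.group(2)
    if factor is None:          # plain term with no scaling factor
        return True
    if not any(ch.isdigit() for ch in factor):
        return False            # rejects ':4' and '.:7'
    # positive, and of magnitude less than 32
    return Decimal(0) < Decimal(factor) < Decimal(32)


for term in ["1.:3", "4:4", "31.999:43", ".25:8", "7",
             "32.000:10", "0.000:3", ":4", ".:7"]:
    print(term, is_valid_term(term))
```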
All data addition, subtraction, and division operations are performed in the order they appear on the command lines (after scaling has been performed) from left to right. Thus, scaling has the highest precedence, while addition, subtraction, and division are of equal and lower precedence. If there are too many operations to fit on one line, one may continue on the subsequent line by simply introducing the same output number as the first argument on the line. In fact, one can have as many lines as one needs for each output number; they are all concatenated during command file encoding.
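The concatenation of continuation lines can be pictured with a small Python sketch. It handles data operation lines only, and the function name merge_continuations is hypothetical; how dmanip stores the encoded terms internally is not described here.

```python
def merge_continuations(lines):
    """Group the terms of data operation lines by output number, in order."""
    merged = {}
    for line in lines:
        output, _, rhs = line.partition("=")
        # arguments may be separated by spaces, commas, or tabs
        terms = rhs.replace(",", " ").split()
        merged.setdefault(output.strip(), []).extend(terms)
    return merged


spec = ["2 = 14 24", "2 = 35 47 /4", "3 = 1"]
print(merge_continuations(spec))
```

Because the terms keep their order, the division by 4 on the second line for output number 2 still applies to the sum of all four bins.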
In the channel processing (cp) section of a dmcf, the output numbers refer to output channels, while the terms to the right of the equal "=" sign represent input data channels. No output channels may be skipped, and the output channel specifications must appear in the cp section in ascending numerical order. Here is an example of a simple cp section:
    ;
    ; simply copy the input channels to the output channels
    ;
    cp
    0 = 0
    1 = 1
    2 = 2
    3 = 3
    4 = 4
    5 = 5
    6 = 6
    7 = 7
    8 = 8
    9 = 9
    10 = 10
    11 = 11
The comment explains the operation and meaning of this channel processing specification. Note that the current implementation of dmanip does not allow "wild card" constructions as are possible in the rp section (see below), hence it is necessary to list all the input and output channels, even if one wishes only to copy them. Note also that it is possible to delete channels, add new channels, and re-order channels in the cp section.
Suppose we have a data file consisting of 8 channels, and we wish to re-reference the data to the eighth channel (i.e. calculate the ERPs that would have been obtained by using the electrode corresponding to channel 7 as the reference) and delete both it and the seventh channel. We might write:
    ;
    ; re-reference these babies to the nose electrode
    ; and drop the toe electrode (nose-> 7, toe->6)
    ;
    cp
    0 = 0 -7
    1 = 1 -7
    2 = 2 -7
    3 = 3 -7
    4 = 4 -7
    5 = 5 -7
    ;
The output data file would have 6 channels, and the chans header variable will be set appropriately for 6 channels.
As a final example of the possibilities for channel processing, assume we have five channels of input data from Fz, Cz, Pz, C3, and C4 respectively, and we want the output data, after rp and fp processing, to have three channels composed of bipolar derivations (Fz - Cz) and (Cz - Pz), and a CSD (current source density) channel at Cz. We can specify the output channels in the cp section as follows:
    ;
    ; get Fz-Cz, Cz-Pz, and csd at Cz
    ;
    cp
    0 = 0 -1
    1 = 1 -2
    2 = .25:0 .25:2 .25:3 .25:4 -1
Note the use of scaling in this example. The CSD channel (2) could have been calculated without using the scaling feature but using instead the ability to divide by constants, thus:
    ;
    ; get Fz-Cz, Cz-Pz, and csd at Cz
    ;
    cp
    0 = 0 -1
    1 = 1 -2
    2 = 0 2 3 4 -1 -1 -1 -1 /4
Note that only terms involving addition or subtraction have a semantic association with input channels; the number following the / is interpreted as the integer 4, and is used to divide the sum of the previous operations by four.
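That the two CSD specifications agree can be checked with a small Python sketch of left-to-right evaluation. This is an illustration only: the function name eval_data_line and the sample values are made up, and exact fractions are used for clarity, whereas dmanip itself works on integer data.

```python
from decimal import Decimal
from fractions import Fraction


def eval_data_line(line, inputs):
    """Evaluate one data operation line; inputs maps a number to its samples."""
    n_points = len(next(iter(inputs.values())))
    acc = [Fraction(0)] * n_points
    rhs = line.split("=", 1)[1]
    for term in rhs.replace(",", " ").split():
        if term.startswith("/"):
            # division applies to everything accumulated so far
            divisor = int(term[1:])
            acc = [a / divisor for a in acc]
            continue
        sign = -1 if term.startswith("-") else 1
        body = term.lstrip("+-")
        scale = Fraction(1)
        if ":" in body:
            # scaling is applied to the input before it is added or subtracted
            factor, body = body.split(":")
            scale = Fraction(Decimal(factor))
        samples = inputs[int(body)]
        acc = [a + sign * scale * s for a, s in zip(acc, samples)]
    return acc


# Hypothetical two-point waveforms for Fz, Cz, Pz, C3, C4 (channels 0-4).
chans = {0: [4, 8], 1: [2, 6], 2: [0, 4], 3: [8, 0], 4: [4, 4]}
scaled = eval_data_line("2 = .25:0 .25:2 .25:3 .25:4 -1", chans)
divided = eval_data_line("2 = 0 2 3 4 -1 -1 -1 -1 /4", chans)
print(scaled == divided)
```

In real use the two forms may differ slightly in rounding and in their risk of overflow, since the data are stored as integers.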
The specifications contained in the repetitive processing section are of similar form to those in the channel processing section, but the meanings of the terms and output numbers are different: they now refer to input bins (after channel processing has been performed) and the intermediate accumulation areas, respectively. In addition, it is possible to use an asterisk "*" in a term as an input bin number or as an output number to denote the number of the current output bin. This construction is known as a "wild-card" specification. It allows simple command files to be used to form grand averages.
The output numbers (preceding the "=") must again be in ascending sequence without any jumps. Wild-card specifications should precede the specifications for output bin 0, if any such specific command lines are present, although wild-card commands will be executed after any commands with specific output numbers. There are no restrictions on the input bins, except that they be present in the input data at run-time. Here are some valid rp specifications:
    0 = 4
simply adds input bin 4 of the current file to the running sum for output bin 0, while
    1 = 10 -23
adds bin 10 and then subtracts bin 23 of the current input data from the accumulation area for output bin 1. Consider:
    2 = 14 24 35 47 /4
This specification might be used to lump together four input bins (14, 24, 35, and 47). Note that division involves a constant operand as in the cp section. This calculation could also have been done using scaling rather than division, thus:
    2 = .250:14 .250:24 .250:35 .250:47
Note that the meaning or function of the rp section depends to some extent on the number of input data files that are supplied on invocation. Since the intermediate accumulation areas are cleared before the first file is processed, if only one input file is specified the examples above will indeed simply copy input bin 4 into output bin 0, form a difference wave of input bin 10 minus input bin 23 for output bin 1, and lump together input bins 14, 24, 35, and 47 to form output bin 2, respectively. On the other hand, if many input files are supplied upon invocation, a specification such as
    0 = 4
will lump input bin 4 across input files (assuming the appropriate division takes place in fp) and place it in output bin 0. There is actually nothing different about the two cases as far as dmanip is concerned, but the command files are usually different when one specifies multiple input files versus a single input file. Multiple input files are used almost exclusively for grand averages; single input files are used when one wishes to do arithmetic on ERPs on a subject by subject basis.
As mentioned, it is possible to employ "wild-card" specifications in the rp (as well as the fp) sections of a dmcf. These are used almost exclusively to form grand averages, although they can also be useful for setting a header variable in all output bins in one crack. The "*" stands for current output bin, and can be used in place of a bin number on either the input or output side of a data operation specification. As an example, consider this rp section of a dmcf which could be used to form a grand average:
    ;
    ; grand average rp section is simple
    ;
    rp
    * = *
    ;
    ;
Whenever a wild-card output bin is specified, dmanip sequentially runs through all the intermediate accumulation areas and performs the specified action, in this case adding in the same numbered input bin from the current input file. Wild-card constructions can be employed in conjunction with specific processing of specific output bins; this is most often done with header operations. There is an idiosyncratic constraint in this case: any wild-card construction with the "*" as an output number must appear in the rp or fp section prior to any specific processing, but they will be executed after the specific operations when input files are processed.
There is another ad hoc feature of dmanip which can be employed in the rp section of a dmcf; this is scaling by the number of sums in the current input data bin prior to addition. There is a special syntax associated with this specification as follows:
4 = ^4 ^36
meaning scale input data bin 4 by the number of sums in its header, and add it into the accumulation area for output bin 4, then scale the data from input bin 36 (in the current input file) by its number of sums and add it also. This subterfuge has been included to allow one to lump across conditions which might have been averaged together in the first place. Note that this operation cannot be used with subtraction (the "^", or up-arrow, implies addition) or scalar scaling, and has implications for the bookkeeping of the sums for the output bin with which it is used (see below, The Output Header: sums, chans, pp10uv, and verpos). The scaling by sums should be used with the special division by the number of sums in the fp section, as detailed below.
The final processing section is executed after all the input files have been processed. Hence, no arithmetic operations other than division of the intermediate areas are allowed. This is a good place to perform header operations, since they will only be performed once, rather than (possibly needlessly) repetitively. Nonetheless, one often wishes to place the header operation specifications in the rp section near the data operations that form the output data for that bin to improve readability. This is O.K.; in fact, the extra time it takes to repetitively perform the header operations is negligible.
Division in the final processing section
can employ constants, as usual. However, since one might wish to use a dmcf
for more than one set of data, or one may be lumping across conditions
after scaling by the number of sums in each, two special symbols can be
used after the "/" in the fp section. The first, "n", represents the number
of input files that were processed, while the second, "s", represents the
total sums for those intermediate areas whose rp operations were exclusively
scaling by the number of sums and addition. For example, the grand average
rp section above might employ this fp section:
    ;
    ; divide and conquer
    ;
    fp
    * = /n
Note that the meaning of the elements in an fp specification line is somewhat different yet from that in the cp and rp sections. The data to be operated on are implied, and are those of the output number for the particular line under consideration. Thus, the example above causes dmanip to divide each intermediate area by the number of input files. Again, one should not employ lines of the type:
* = * /n
in the fp section. This is not really inconsistent with the notation used in the rp section, since all operations are implied to be performed on the current contents of the accumulation area whose bin number is that of the left hand side of the "=".
When the fp specifications are exhausted, the intermediate areas are written to the output file, and dmanip terminates.
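Taken together, the wild-card rp line "* = *" and the fp line "* = /n" amount to the following computation, sketched here in Python. The function name grand_average and the nested-list data layout are hypothetical, and floating-point division stands in for dmanip's integer arithmetic.

```python
def grand_average(files):
    """files: one entry per input file, each a list of bins (lists of samples)."""
    n_bins = len(files[0])
    n_points = len(files[0][0])
    # intermediate accumulation areas are pre-set to zero
    acc = [[0] * n_points for _ in range(n_bins)]
    # rp section, "* = *": executed once for every input file
    for bins in files:
        for b in range(n_bins):
            for i in range(n_points):
                acc[b][i] += bins[b][i]
    # fp section, "* = /n": divide each area by the number of input files
    n = len(files)
    return [[value / n for value in area] for area in acc]


subject_1 = [[2, 4], [10, 20]]   # two bins of two points each (made-up data)
subject_2 = [[4, 8], [30, 40]]
print(grand_average([subject_1, subject_2]))
```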
In some cases all the processing will be completed in the rp section of the dmcf; this can arise when one is performing arithmetic on ERPs from a single subject, and any desired header operations have been performed in that rp section. In these cases it is permissible to have a null fp section consisting of only the fp on a line by itself.
Header operations are most often performed to redescribe the data in the output
file. One should be very careful when performing header operations, especially
on the integer parameters, as they control the operation of many programs
which process ERP data in the EPL format. Header operations are often specified
in the fp section, but can also appear in the rp section of a dmcf. Unlike
data operations, header operations are specified one per line, and in the
case of header descriptor variables, two lines are required. Like data operations,
the first argument on the line must be the output number. This must appear
in the appropriate ordinal position of the rp or fp section, and can employ
the "*" for a wild-card operation. The second argument should be a lower
case ASCII name. This name should be the mnemonic for the particular header
variable or descriptor which one wishes to alter. Since different mnemonics
and their header variables require different argument formats, they are
grouped here in that manner for reference. The first group consists of header
variables which are single integer values. The mnemonics for this group are
    chans sums tpfuncs pp10uv verpos odelay totevnt condcode presampling trfuncs totrawrecs totrejects binnumber cprecis
and are set thus:
    bin hdr_var_name value
where bin is the number of the output bin whose header one wishes to alter (* does them all), hdr_var_name is one of those integer header mnemonics listed above, and value is the integer value to which one wishes the header variable set. As a rule of thumb (though not an absolute one), if the name is not familiar, one is probably best off not messing with that variable.
There is one header variable which consists of an array of integers. This is,
mnemonically, "rejcounts", and is the number of trials which were rejected
on various bases during averaging. An ASCII description of the rejection
categories appears in the "rejdesc" array of descriptors (see below). The
"rejcounts" can be altered by a line of the form:
    bin rejcounts index value
where bin is the output bin number, index is the number of the rejection count variable (0-7) and value is the desired replacement integer.
Next come the ASCII descriptors. The first type consists of the group:
    subdesc bindesc condesc expdesc
which stand for subject description, bin description, condition description, and experiment description. These are all up to 39 characters in length, but one need not count the characters as any description supplied will be truncated appropriately. These descriptions are set thus:
    bin hdr_var_name
    ASCII description on a line by itself
where hdr_var_name is one of the four above, and the description immediately follows the header operation specification line. Note that the header description operations are the only dmcf commands which employ a line without the output number on the left - the description itself. The description may, however, be preceded by any number of blanks and/or tabs which will be deleted prior to being placed in the header.
The final type of header variable are the
arrays of descriptors. These too employ an ASCII description alone on the
subsequent line, but also require an index to specify the particular element
in the array which is to be altered. Here are the mnemonics for the arrays
chndesc rejdesc prfdesc
standing for channel descriptions, rejection count descriptions, and processing
function descriptions. The command format for these is:
    bin hdr_var_name index
The index should be between 0 and 15 for chndesc, 0 and 7 for rejdesc. One will probably never need to alter prfdesc. These descriptors are up to 7 characters long. One may need multiple header operations to set all the chndesc or rejdesc elements desired.
The header attached to a particular output bin is copied by default from the first input bin which participates in the formation of the data for that particular output bin. This is usually a desirable result, especially for grand averages, and one in any case has the ability to go in and change any specific header information.
There are a certain number
of variables in the header which are changed to reflect the processing
performed by dmanip. The output file will have the number of channels (chans)
set to the # of output channels. In addition, dmanip sets pp10uv and verpos
in the output file according to the input data and any specified changes
(via the desired points per 10 microvolts request). Nonetheless these values
may not be what one really needs. For instance, data processed by a dmcf whose sole rp section is
    * = -*
will have their polarity inverted without any corresponding change in the header variable verpos. One should thus be careful about allowing dmanip to keep track of pp10uv and verpos if strange manipulations are being performed on the data.
The sums variable in the data headers is the least well maintained
of those which dmanip alters. In the standard case all the sums variables
are set to zero when the swap areas (intermediate accumulation areas) are
zeroed. Then, whenever an input data bin is added or subtracted (regardless
of scaling) to the intermediate accumulation area, the sums variable in
the output header for that area is incremented. Thus, the sums variable
is really the number of input data bins which were involved in forming
those output data. This is convenient when one forms a grand average with a
    * = *
rp section, since sums will then represent the number of subjects which were included in the grand average. In most other cases the sums variable is probably meaningless and may need to be forced using a header operation, if one wants to bother.
There is one other situation, however, in which
the sums variable can be made to do the "right thing"; this is during scaling
by the number of sums and adding. Whenever dmanip encounters an rp command
involving scaling by the number of sums, the sums variable associated with
that particular output area is augmented by the number of sums. Thus, an
rp entry of
    7 = ^7 ^52
will set the sums variable for area 7 to the sum of the number of sums of input bins 7 and 52. If one employs
    7 = /s
in the fp section, the data will have been lumped together (weighted by the number of sums), with the appropriate total number of sums in the header for output bin 7. Note that this type of weighted lumping can be intermixed with other specific operations; dmanip maintains separate sums for each bin and the resulting sums in the header will correspond to the operations that were performed on only that bin.
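The arithmetic behind this weighted lumping can be sketched in Python as follows. The function name lump_weighted and the data are made up, and floating-point division is used for clarity; this illustrates the scale-by-sums, add, and divide-by-total-sums sequence rather than dmanip's own code.

```python
def lump_weighted(bins):
    """bins: list of (sums, samples) pairs taken from the input bin headers and data.

    Mimics an rp line such as '7 = ^7 ^52' followed by the fp line '7 = /s'.
    Returns the total sums and the weighted average waveform.
    """
    total_sums = 0
    acc = [0] * len(bins[0][1])
    for sums, samples in bins:
        total_sums += sums                      # sums header is augmented
        acc = [a + sums * s for a, s in zip(acc, samples)]
    return total_sums, [a / total_sums for a in acc]


# Bin 7 averaged over 30 trials, bin 52 over 10 trials (made-up values).
print(lump_weighted([(30, [10, 20]), (10, [20, 40])]))
```

The result is the average one would have obtained had all 40 trials been averaged together in the first place, which is why the bin with more sums dominates.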
Errors which occur during the encoding of a dmcf are generally self explanatory and include the line of the offending item in the dmcf. In any case, there are too many to be listed and explained.
Errors can occur during the processing of the data which could not be detected by encoding the dmcf. These include data overflow errors, hardware read or write errors, or incompatibility of the dmcf with the input data specified. Any of these errors produce a "run-time error message" which includes a short description of the problem, the current output bin, the name of the current input file, and the line in the original dmcf which was being executed when the error arose.
Data overflow can occur at just about any point in the massaging of the data; however, it is most likely to occur during scaling operations or divisions. This pattern arises because in both cases the data are being compressed from long (double precision integers) to single precision integers (16 bits). Often these errors can be eliminated by specifying a smaller desired pp10uv in the invocation.
Beware, however: DATA OVERFLOW ERRORS THAT ARE NOT DETECTED BY DMANIP ARE POSSIBLE WITH THE CURRENT IMPLEMENTATION. Hence, be sure to check the results of your operations for very strange data. Note also that there are probably some bugs in dmanip, too!
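Why a smaller desired pp10uv helps can be seen from a rough Python sketch. It assumes that a stored value equals the voltage in microvolts times pp10uv divided by 10, which is one reading of "points per 10 microvolts" and not a statement about dmanip's internals; the function names and the voltages are hypothetical.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def to_points(microvolts, pp10uv):
    """Convert a voltage to stored points (assumed: points = uV * pp10uv / 10)."""
    return round(microvolts * pp10uv / 10)


def fits_int16(values):
    """True if every value can be stored in a 16-bit signed integer."""
    return all(INT16_MIN <= v <= INT16_MAX for v in values)


waveform_uv = [-120, 35, 500]   # made-up voltages in microvolts
for pp10uv in (1000, 100):
    points = [to_points(v, pp10uv) for v in waveform_uv]
    print(pp10uv, points, fits_int16(points))
```

A range check of this kind on the output data is also a cheap way to look for the "very strange data" that an undetected overflow would leave behind.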
It is quite possible to forget the origins of a dmcf, and apply it to data which do not have the proper number of input channels or input bins. This is a fatal error, and a message specifying the problem will be printed. Unfortunately these types of errors are not easily distinguished from hardware read errors; actual read errors are so improbable, though, that this possibility can usually be dismissed out of hand.
One useful function of dmanip is to average together
a number of subjects who have participated in the same experiment to form
what is colloquially known as a "Grand Average". This is particularly easy
if one employs the "wild-card" constructions available in the rp section.
There is one tedious point which prevents a "universal" grand averaging
dmcf: there can be no wild-card channel specifications. Hence, one must type
in a dmcf which contains the proper cp section corresponding to the data
at hand. Consider a set of data which have 12 data channels. We might employ
this dmcf to form a grand average over subjects:
    ;
    ; Grand average - 12 channels
    ;
    cp
    0 = 0
    1 = 1
    2 = 2
    3 = 3
    4 = 4
    5 = 5
    6 = 6
    7 = 7
    8 = 8
    9 = 9
    10 = 10
    11 = 11
    rp
    * = *
    fp
    * = /n
    * subdesc
    Grand Average
Not too bad, eh? The cp section simply copies the input channels to the output channel slots, and the rp section adds together like bins. The fp processing simply divides all data by the number of input files (i.e. the number of subjects, hopefully) and replaces the subject description by the phrase "Grand Average". Other header variables, however, will retain values of the first subject in the invocation list of input files. For instance, all the rejection counts will be those of the first subject. Also, since this dmcf contains only wild specifications for the rp section, dmanip will balk and complain if the -g #_bins "option" is not employed to specify the number of bins in the data.
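Since the cp section has to be typed out for each channel count, one might generate the file instead. The following Python sketch produces the text of a grand average dmcf like the one above for any number of channels; the function name grand_average_dmcf is hypothetical, and the output should be inspected before use.

```python
def grand_average_dmcf(n_channels, label="Grand Average"):
    """Return the text of a grand average dmcf that copies n_channels channels."""
    lines = [";", f"; Grand average - {n_channels} channels", ";", "cp"]
    # no wild-card channel specifications: list every channel explicitly
    lines += [f"{ch} = {ch}" for ch in range(n_channels)]
    lines += ["rp", "* = *", "fp", "* = /n", "* subdesc", label]
    return "\n".join(lines) + "\n"


print(grand_average_dmcf(12), end="")
```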
Another common use of dmanip is to form difference waves. This type of function differs from the grand average above in that it is basically an arithmetic operation on a single subject at a time. Hence, if one wants difference waves for each subject, dmanip must be run once for each subject.
Suppose we have a very simple attention experiment with four input channels, and
we wish to form attended minus inattended difference waves based on high
tones, low tones, and the lumped difference wave for high and low tones.
Assuming the bin #’s here are appropriate, the form of the dmcf might be:
    ;
    ;
    ;
    cp
    0 = 0
    1 = 1
    2 = 2
    3 = 3
    ;
    ;
    rp
    0 = 12 -13
    0 bindesc
    High difference wave.
    1 = 24 -25
    1 bindesc
    Low difference wave.
    2 = 12 24 -13 -25 /2
    2 bindesc
    Lumped difference wave.
    fp
In this case we have chosen to place the header operations next to the data operations with which they are associated in the rp section of the dmcf. This is acceptable; they could also have appeared in the fp section. Note the "null" fp section and the use of blank lines and comments to improve readability of the command file.
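Because such a dmcf is applied to one subject at a time, the invocations are easily generated. The Python sketch below only builds the argument lists, following the invocation form "dmanip commandfile outputfile infile"; the function name, the file names, and the ".dif" output extension are all hypothetical.

```python
import os


def difference_wave_commands(command_file, subject_files):
    """Build one dmanip argument list per subject input file."""
    commands = []
    for infile in subject_files:
        stem = os.path.splitext(infile)[0]
        # the output file must not already exist, or dmanip will exit
        commands.append(["dmanip", command_file, stem + ".dif", infile])
    return commands


for argv in difference_wave_commands("attn.dmc", ["s01.avg", "s02.avg"]):
    print(" ".join(argv))
```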
Let’s pretend we have run
a simple experiment involving three EEG channels and originally planned
to compare the first and second halves of the experiment (which consist
of the same experimental treatments) to examine the effects of fatigue
on the ERPs. Now, however, we realize the number of sums is going to be
too low to really demonstrate any effects, and we wish to massage the data
to form the averages we would have obtained if all the same treatments
from the first and second halves of the experiment had been averaged together
in the first place. To keep it short, let’s also pretend there were only
four experimental treatments in both the first and second halves giving
a total of 8 bins in the input data files. This is one (perhaps the only)
situation in which one might want to scale by the number of sums in the
data. So be it:
    ;
    ;
    cp
    0 = 0
    1 = 1
    2 = 2
    3 = 3
    rp
    0 = ^0 ^4
    1 = ^1 ^5
    2 = ^2 ^6
    3 = ^3 ^7
    fp
    0 = /s
    1 = /s
    2 = /s
    3 = /s
This should do it, assuming the experimental treatments are assigned to the bins in the "proper" way. This is a once per subject file.