cdbl_manual - cdbl User’s Manual (ERP manuals)
cdbl - A Program to Generate Binlists of Sorted Single Trials
for avg
In the continuous data digitization and averaging system it is
necessary to sort individual trials into groups (termed bins or
subconditions) so that they can be averaged together. The program
"cdbl", for "Continuous Data BinList", can automatically sort
individual events into categories on the basis of type of event,
behavioral response, sequence, or combinations of the above. The
particular contingencies are contained in an input file to cdbl, the
bdf, or "Bin Descriptor File", which is an ASCII file created using an
editor. The bdf, in combination with the log file for each subject, is
processed by cdbl to produce a binlist file. The binlist file is
simply a list of individual trials and the corresponding bin(s)
in which avg should include the trial.
Some caveats regarding cdbl: bin descriptor files can be difficult to
write and debug, and cdbl is far from elegant. One should always check
the binlist file created by cdbl to ensure compliance with one's
intentions.
As usual, running cdbl is simple; preparing the input files is
where
the real work occurs. If one has forgotten how to invoke
cdbl, it is possible
to simply type
cdbl
and the proper invocation line will be printed. It should resemble:
cdbl bin_desc_file log_file bin_list_file srate [-c]opt
where
bin_desc_file is the name of the bin descriptor file, which should
already exist and is hence an input to cdbl.
log_file is the name of a logging file created by the digitization
program (cddg). This file is also an input to cdbl.
bin_list_file is the name of the binlist file to be created by
cdbl, and is thus an output of cdbl.
srate is the effective real time sampling rate employed at the time
of digitization, in Hz. This is an input, and is used in calculating
time differences between stimuli.
[-c]opt is an optional flag which, when present, causes cdbl to clear
the logging file flags prior to actual processing of the log and bin
descriptor files. It is necessary to include this flag if cdbl has
previously processed this logging file.
There are a couple of pitfalls that can cause serious loss of data or
incorrect results. First, if one accidentally types the arguments to
cdbl in the wrong order, it is possible to clobber one's log file or
bin descriptor file. This occurs when the file in question
inadvertently becomes the third argument to cdbl (the output binlist
file) because cdbl attempts to create that file. If it already exists,
it is deleted!
Note that an
accidental extra space in the bin descriptor file name
can cause the logging
file to become the third argument to cdbl;
hence .....
There
are other ways
to blow it, so be careful. The positive cure is to
back up the log and
bin descriptor files and retain these backups until one is
sure everything
worked properly.
Second, if one reruns cdbl using the same logging file that was
used
in the prior run, one must be sure to include the -c flag to
cdbl, or an
incorrect binlist will be created.
The only drawback to always including the -c flag is a slightly
longer execution time the very first time the log file is processed.
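Both pitfalls can be guarded against before cdbl is ever started. The following is a minimal Python sketch, not part of cdbl; the helper name build_cdbl_command and its checks are assumptions made for illustration. Only the argument order and the -c flag are taken from the invocation line above.

```python
import os


def build_cdbl_command(bdf, log, binlist, srate, clear_flags=True):
    """Return the argument list for a cdbl run, refusing risky output names.

    Hypothetical helper: it only builds the command, it does not run cdbl.
    """
    inputs = {os.path.abspath(bdf), os.path.abspath(log)}
    if os.path.abspath(binlist) in inputs:
        raise ValueError("binlist file would overwrite an input file")
    if os.path.exists(binlist):
        # cdbl deletes an existing third argument, so stop here instead
        raise ValueError("binlist file already exists and would be deleted")
    cmd = ["cdbl", bdf, log, binlist, str(srate)]
    if clear_flags:
        cmd.append("-c")  # required whenever the log was processed before
    return cmd
```

The returned list could then be handed to whatever mechanism is used to launch programs, once backups of the log and bin descriptor files have been made.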
At the conclusion of the processing, cdbl prints a summary of all
bins
in each condition code and the number of events assigned to
these bins.
This allows the user to roughly check the operation
of cdbl by comparing
the numbers to expected values.
Since the summary of the sorting process can be useful at
later stages
of data analysis, it is often desirable to retain
this output in a file.
This can be done by redirecting the standard
output thus:
cdbl bdf log blf srate -c > bloutfil
As mentioned, cdbl processes a log file from cddg and a bin descriptor
file (which is created using an editor) to produce a binlist file.
Knowledge
of the basic mechanism of the processing of these two files is important
in writing bin descriptor files,
since the order of operations can influence
the binlist file produced.
The reason for creating the binlist file is to allow the
averaging program
to know which bins each trial is associated
with at the time it is encountered
in the raw file.
This is desirable because the magtape is a sequential
access device,
and the overall processing time can be improved by only
moving the
tape forward.
Hence,
the processing
of items in the log file
also proceeds in sequence from first to last.
Each item in the log file,
when under consideration, is compared
to every bin described in the bin
descriptor file having the
proper condition code. This comparison is performed
sequentially
from the first to the last bin corresponding to that condition
code.
Whenever the log item matches a bin, the number of that bin is
appended
to the list for that log item (trial, epoch, single record) in the binlist
file.
When the process is complete, the binlist file will consist of
a
sequential list of ordinal log items and all the bins to which
the corresponding
raw trial should be added when averaging is
performed.
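The processing order just described can be sketched in a few lines of Python. This is an illustration only, not the cdbl source: the log is assumed to be a list of dictionaries with a "cond" field, and each bin is assumed to be a (bin number, condition code, test function) triple listed in bin descriptor file order.

```python
def make_binlist(log, bins):
    """Return [(log index, [bin numbers])] in log order.

    log  : list of dicts, each with a "cond" key (assumed layout)
    bins : list of (bin_number, condition_code, matches) in bdf order,
           where matches(log, index) returns True or False
    """
    binlist = []
    for index, item in enumerate(log):          # first to last, like the tape
        assigned = []
        for number, cond, matches in bins:      # first to last bin in the bdf
            if item["cond"] == cond and matches(log, index):
                assigned.append(number)         # an item may land in several bins
        binlist.append((index, assigned))
    return binlist


log = [{"cond": 1, "event": 1}, {"cond": 1, "event": 2}, {"cond": 2, "event": 1}]
bins = [
    (1, 1, lambda log, i: log[i]["event"] == 1),
    (2, 1, lambda log, i: log[i]["event"] in (1, 2)),
    (3, 2, lambda log, i: log[i]["event"] == 1),
]
print(make_binlist(log, bins))  # prints [(0, [1, 2]), (1, [2]), (2, [3])]
```

Because the inner loop always runs in bin descriptor file order, any side effect of an earlier bin's test is visible to the later bins tested against the same log item.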
It should be
mentioned at this point that cdbl is designed to be a "splitting"
program, that is, the sorting done by cdbl should generally
result in
the most detailed averages one needs for analyses. Once
trials are averaged
together there is no way to separate them into
more detailed averages
without reaveraging.
It is possible, however,
to lump averages together
at a later date using "dmanip" or its congeners.
The list of bins proper in the binlist file is preceded by ASCII
descriptors
for each of the bins and condition codes. These are
extracted from the
bin descriptor file and are included in
the header to the data after it
is averaged (the header
originates in the mdh program). It is thus useful
to describe the data in each bin or condition code concisely when
entering the descriptors.
Both the raw data file and log file produced by cddg can be considered
as a list of events occurring at various times. In addition to
the event
number and the time of occurrence, each log entry (or
raw EEG trial, for
that matter) possesses two other attributes:
a condition code and a set
of
flags.
To learn the basic function of the condition code and the event
numbers,
refer to the "Overview of the Continuous Data Digitization and
Averaging System". The function
of the flags is mostly restricted to the
operation of cdbl, and is
discussed here.
The log file flags are eight separately manipulable binary bits contained
in a single
byte in each logging file entry. In the continuous data system,
none
have any preordained function and can be used in any way desired.
One common use of the flags is to mark events as "used", so they
are not
used again later.
This is useful, for instance, in the designation of
responses as hits. This is treated a little more fully in the example
at
the end of this document; for the present just think of the
flags as eight
little semaphores which can be tested, set, or cleared
in the process
of sorting events into bins.
So, there are four pieces of information that can be used by
cdbl in
sorting events: the event number, the time of occurrence,
the condition
code, and the flags. Since conditions and their
corresponding codes form
major divisions in the logical structure
of an experiment, both the format
of the bin descriptor file and
the processing performed by cdbl reflect
this structure.
Most averages occur within a condition code. Hence, the bin
descriptor file is separated into sections with the condition code as
the major heading. The implication is that an event must fulfill the
contingencies specified for a particular bin and have the appropriate
condition code in order to be sorted into that bin. This is an
implicit constraint that should not be overlooked.
Another general constraint on the format of bin descriptor files
is the
order of the condition codes and the bins. Both must begin
at 0 (zero)
and ascend without skipping any numbers. This is not
to say that the conditions
must be run in that order, just that
they must be described in the bin
descriptor file in ascending order.
One condition code must always be present and has an implicit
meaning.
This is Condition Code 0 (zero), and is the calibration
condition. Note
that it receives special treatment in a number
of processing stages, not
being subject to artifact rejection as well as
being the primary and only
mandatory condition.
To begin a section of the bin descriptor file for a particular
condition
code (as well as delimit the end of the previous one,
if present), a "cd
n" is typed on a line followed by another line
containing the ASCII description
of the condition, thus:
cd 0
Calibration 200 msec. 10 uvolt verpos.
Note that cd must be lower case, and the cd and the 0 must be
separated by a space or a tab.
It is permissible to indent entries in the bin descriptor
files in order to improve their readability. This can be accomplished
by
using tabs or spaces in front of the entries. Even the ASCII
descriptors
can be indented, since any leading blanks or tabs
will be ignored.
Because
the condition descriptions
will be carried around in the header to the
data, it is useful
to make them meaningful. Up to 40 characters will be
accepted
after stripping off any preceding blanks or tabs.
Following a condition code heading (as above) should be specifications
for all the bins (averages) desired in that condition code. These
are discussed
next...
Because of certain customs of the heathen Druids and Celts, a bin
specifier is introduced using: sd mm, followed by a line containing an
ASCII description of the bin. This is almost identical to the
condition delimitation described above, except for the sd in place of
the cd, and that mm is the bin number, or subcondition number (hence
the sd), rather than the condition code. Following the line containing
the ASCII description of the bin is a line containing the bin
specifier proper, which constitutes the actual contingencies for the
bin in a coded format. Here is an example of an entry describing a
particular bin:
sd 39
high standards
.{3}
The .{3} specifies that event number 3 is averaged into bin
39; presumably
events with event numbers of 3 are high standards
but this need not be
the case, since the descriptors are arbitrary.
This is about as simple
as a bin specifier can be; any event
numbered 3 in the appropriate condition
code is included in this
bin (remember that the appropriate
condition
code is determined by the position of the bin specifier
in the bin descriptor
file).
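The layout rules given so far (a cd line plus description, then sd entries of three lines each, with numbers starting at 0 and ascending without gaps) can be checked mechanically. The sketch below is a hypothetical checker written for this manual, not a description of how cdbl parses its input. It assumes that blank lines may be ignored and that bin numbers run on across condition codes, as they do in the example file later in this document.

```python
def parse_bdf(text):
    """Return [(cond, cond_description, [(bin, bin_description, spec), ...])]."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    conditions = []
    next_cond = next_bin = 0
    i = 0
    while i < len(lines):
        parts = lines[i].split()
        if len(parts) == 2 and parts[0] == "cd":
            if i + 1 >= len(lines) or int(parts[1]) != next_cond:
                raise ValueError("bad condition entry: " + lines[i])
            conditions.append((next_cond, lines[i + 1], []))
            next_cond += 1
            i += 2                      # heading line + description line
        elif len(parts) == 2 and parts[0] == "sd":
            if not conditions or i + 2 >= len(lines) or int(parts[1]) != next_bin:
                raise ValueError("bad bin entry: " + lines[i])
            conditions[-1][2].append((next_bin, lines[i + 1], lines[i + 2]))
            next_bin += 1
            i += 3                      # heading + description + specifier
        else:
            raise ValueError("unexpected line: " + lines[i])
    return conditions
```

A file that skips a condition code or bin number, or that places an sd entry before the first cd entry, raises ValueError under these assumptions.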
In general, the bin specifier is a sequence of symbols specifying the
conditions which must be satisfied to include an event in that bin. No
blanks, commas (,), or tabs can occur on the line of the bin specifier
proper, and there must be a period (.) somewhere. The period is
referred to as the time-lock point. There must also be what is called
an item specifier to the right of the time-lock point; this is
referred to as the home item. The home item is associated with the log
entry currently being processed and which is under consideration for
assignment to the bin. It is always the item corresponding to the EEG
trial (the event which initiated post-sampling) which will be added to
the bin at averaging time; hence the terms home item and time-lock
point. Every bin specifier must have a time-lock point and a home
item; all tests involving time-relations or sequence are relative to
the home item.
As can be seen from the simple example above,
every item specifier is
composed of a sequence of characters
enclosed by curly brackets (set symbol
signs, { and } ). A general
bin specifier is a sequence of item specifiers
and the time-lock
point thus:
{2}{3}.{3}
for example. Each of {2}, {3}, and {3} is an item specifier.
Item specifiers
to the left of the time-lock point denote events
which precede the home
item in time. For the bin specifier to be
matched or fulfilled, all item
specifiers must be matched in the
sequence they are written. In this example,
the home item of 3
will be included in the bin only if it is preceded
by a 2 and a
3 in that order.
Likewise, item specifiers following the
home item must also be matched
in the order specified for the bin specifier
to be fulfilled.
These correspond to events occurring after the home item.
Note
again that it is the home item that is being considered for
inclusion;
any other item specifications entail only a test
of their matching the
specifications, not an assignment to
a bin (until they too come under
consideration as a home item).
While the order of events must match that in the bin specifier
in order
for the home item to be included in the bin, the actual
sequence in which
the testing takes place is as follows. First,
the home item is tested. If
it matches the log item, item specifiers
preceding the home item are compared
sequentially to the log entries
until a failure to match occurs or the
end of the bin specifier
is encountered.
Item specifiers closest to the
home item are processed first. In a
similar manner, if the item specifiers
preceding the home item
are matched (or there aren’t any), the item specifiers
following
the home item are tested, starting with those closest to the
home
item.
This order of execution can become important in complex bin
descriptor files, as will be discussed later.
To summarize these conventions, let’s consider another example.
Suppose
in condition code 3 one wishes to average events numbered
7 which are
preceded by a 4 and followed by a 4
or a 5. Somewhere in the list of bins
under the condition code
3 header one might have a bin specification including
this line:
{4}.{7}{4;5}
Note the subtle way a smidgen of syntax was introduced. Item specifiers
can contain a list of events separated with ;’s to denote the inclusive
or of the events. That is, an item specifier containing event numbers
separated
by ;’s is matched if any of the events in the list occur
at the indicated
point.
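The sequence conventions summarized above can be illustrated with a small matcher for bin specifiers made only of plain event lists. This is a sketch under simplifying assumptions, not the cdbl implementation: the log is reduced to a list of event numbers, and the function names are invented for the example. The testing order follows the text: home item first, then preceding items nearest the home item first, then following items.

```python
import re


def parse_simple_spec(spec):
    """Split a bin specifier into (items before, home item, items after).

    Each item becomes the set of event numbers that satisfies it.
    Only plain event lists such as {4} or {4;5} are handled here.
    """
    left, right = spec.split(".")       # exactly one time-lock point expected

    def items(text):
        return [{int(n) for n in body.split(";")}
                for body in re.findall(r"\{([^{}]*)\}", text)]

    before, after = items(left), items(right)
    if not after:
        raise ValueError("no home item to the right of the time-lock point")
    return before, after[0], after[1:]


def spec_matches(spec, events, i):
    """True if the log item at index i fulfills the bin specifier."""
    before, home, after = parse_simple_spec(spec)
    if events[i] not in home:                       # home item first
        return False
    for k, wanted in enumerate(reversed(before), start=1):
        if i - k < 0 or events[i - k] not in wanted:
            return False
    for k, wanted in enumerate(after, start=1):
        if i + k >= len(events) or events[i + k] not in wanted:
            return False
    return True


events = [4, 7, 5, 7, 4, 2]
hits = [i for i in range(len(events)) if spec_matches("{4}.{7}{4;5}", events, i)]
print(hits)  # prints [1]
```

Only the first 7 qualifies: the second 7 is followed by a 4 but preceded by a 5 rather than a 4.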
Item specifiers can include dependencies on times of event occurrences,
the status of their flags, an ad-hoc method of diddling the flags,
and
a few other tidbits.
An item specifier is basically a list of events and conditions which
must be satisfied by the items in the log file to fulfill the item
specifier.
The list can involve event numbers and flag conditionals
only, and is
then termed a
simple event list.
It is also possible to pre-empt the strict
one-to-one sequential
dependencies implied by the list of item specifiers
constituting
a bin specifier and employ a
time-conditioned event list.
In this case any event in the log file occurring within a specified
time
window from the home item can match the event list, rather than
just the
ordinally appropriate log entry. In any case, event lists
are scanned from
left to right, with processing terminating as soon
as a match is obtained.
This has certain implications for the flag
testing, setting, and clearing
operations discussed below.
Event lists are sequences of event numbers with optional preceding
negation (the tilde, ~) signs and/or flag test, set, or clear
suffixes, separated by semicolons. Remember, no spaces, commas, or
tabs are allowed anywhere in a bin specifier proper. Here are some
event lists in their item specifier curly brackets and their
meaning:
event list    meaning
{34}          Matched by event # 34.
{2;7}         Matched by event # 2 or 7.
{~9}          Matched by anything but 9.
{*}           Matched by any event.
{~*}          Never matched.
In an event list it is not necessary to allow negation of individual
events, as in {4;~5}: read that way, "event 4 or not event 5" would be
the same as {~5}. It is useful, however, to be able to express "not
event 4 and not event 5", i.e. anything but a four or a five. This is
the actual meaning of {~5;4}, because a ~ (tilde) as the first
character of an item specifier negates the entire event list or
time-conditioned event list. That is, the item specifier is matched if
the event list is not matched; conversely, if the event list is
matched, the item specifier is not matched.
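The effect of the leading tilde can be made concrete with a short Python sketch. The function name is invented for this illustration and flag suffixes are ignored; only the matching rules stated above and in the table are modeled.

```python
def item_specifier_matches(body, event):
    """Match one event number against the text between { and }.

    body  : e.g. "34", "2;7", "~9", "*", "~5;4" (no flag suffixes here)
    event : event number of the log item under test
    """
    negate = body.startswith("~")
    event_list = body[1:] if negate else body
    matched = event_list == "*" or event in {int(n) for n in event_list.split(";")}
    return matched != negate        # a leading ~ inverts the whole list


for event in (4, 5, 6):
    print(event, item_specifier_matches("~5;4", event))
```

For events 4, 5, and 6 this prints False, False, and True: {~5;4} rejects both a four and a five and accepts anything else.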
A time-conditioned event list (tcel) is used to specify a window in
time over which to examine events in the log file. It consists of a
window specification (with an optional preceding item specifier
negation) prefixed to an event list thus:
{t<200-1200>256}
This particular tcel is true if an event number 256 is found within
200
to 1200 milliseconds of the home item. Time sense is inverted
if the tcel
appears before the time lock point. In this case log
entries are examined
sequentially starting (in this example) 200
msec before the home item
and ending 1200 msec before the home item.
Processing of a tcel stops on
the first match to the event list
within the given time window.
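The window scan can be sketched as follows. The details are assumptions made for illustration and are not taken from the cdbl source: each log entry is assumed to carry its time as a sample count, which is converted to milliseconds using srate, and the window limits are treated as inclusive.

```python
def tcel_match(log, home, lo_ms, hi_ms, event, srate, before=False):
    """Return the index of the first matching log entry, or None.

    log    : list of (sample_tick, event_number), in time order (assumed)
    home   : index of the home item
    before : True if the tcel is written before the time-lock point
    """
    home_tick = log[home][0]
    if before:
        candidates = range(home - 1, -1, -1)      # walk backwards in time
    else:
        candidates = range(home + 1, len(log))    # walk forwards in time
    for j in candidates:
        dt_ms = abs(log[j][0] - home_tick) * 1000.0 / srate
        if dt_ms > hi_ms:
            break                                 # past the far edge of the window
        if dt_ms >= lo_ms and log[j][1] == event:
            return j                              # stop on the first match
    return None


# 250 Hz sampling: one tick is 4 msec
log = [(0, 1), (100, 2), (150, 256), (500, 256)]
print(tcel_match(log, 1, 200, 1200, 256, 250))    # prints 2
```

Here the 256 at tick 150 lies 200 msec after the home item at tick 100, so {t<200-1200>256} is matched by it and the later 256 is never examined.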
Any event in an event list can be further contingent upon
the state of
eight separate flags in the log entry. These all
are initialized to zero
(untrue) if the -c flag was included
in the invocation of cdbl. They also
are all zero when the log
file is first created by cddg. The flags are
denoted by their
octal representation enclosed in <> (as with a time window).
Note that flag representations are the only octal numbers employed
in a bin descriptor file. Octal numbers are used because it is easier
to combine specific patterns of bits without having to propagate
carries from one place to another.
Here is a list of
the flags and their octal
representations:
flag    octal rep.
1       1
2       2
3       4
4       10
5       20
6       40
7       100
8       200
A flag test operation is appended to an event using colon glue,
thus:
256:f<200> or 256:~f<200>
The first event in this example is matched if a 256 with flag
8 set is
encountered in the log file; the second is matched if
a 256 without flag
8 set is found.
It is possible to test more than one flag at a time. In detail, the
bits set in the octal number are anded with the flags in the log
entry. If the result is nonzero, the flag test is true (unless
preceded by the optional negation tilde (~), in which case it is
false). In other words, the test f<203> is true if flag 8 or flag 1 or
flag 2 (or any combination of these) is set.
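The arithmetic behind the flag table and the flag test is ordinary bit masking, as this short Python sketch shows. The function names are invented for the illustration; the octal values and the nonzero-and rule come from the text above.

```python
def flag_mask(*flags):
    """Combine flag numbers 1..8 into one mask (flag n is bit n-1)."""
    mask = 0
    for n in flags:
        mask |= 1 << (n - 1)
    return mask


def flag_test(entry_flags, octal_text, negate=False):
    """Evaluate f<octal_text>, or ~f<octal_text> when negate is True."""
    result = (entry_flags & int(octal_text, 8)) != 0
    return result != negate


print(oct(flag_mask(8, 2, 1)))            # prints 0o203
print(flag_test(flag_mask(2), "203"))     # prints True
print(flag_test(flag_mask(3), "203"))     # prints False
```

Since each octal digit covers exactly three flags, masks for several flags can be written down by adding digits place by place, with no carries.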
It is further possible to specify a situation such as "flag 1 set and
flag 4 not set" by appending more than one flag test to the event
number. This expression implies the "flag 1 and not flag 4"
contingency when appended to an event number in an event list:
f<1>:~f<10>
Any number of flag tests can be concatenated in this manner.
As usual,
evaluation stops at the first failure with implications
for the set and
clear flag operations.
One can further append a flag set or clear operation to a flag test
using
more colon glue in the following manner:
256:~f<1>:s<1>
93:f<2>:c<2>
256:~f<0>:s<3>
The first example indicates that if a 256 is found without flag
1 set,
set flag 1. The second means that if a 93 is found with flag
2 set, clear
it. The third example indicates how one can circumvent
the necessity of
having a flag test preceding a set or clear operation.
Since ~f<0> is always
true, 256:~f<0>:s<3> sets flags 1 and 2
on every event numbered 256, assuming
all preceding contingencies
were fulfilled.
As with flag tests, flag set and clear operations can be strung
together
using "colon glue". Consider this mess:
256:f<3>:~f<70>:s<300>:c<10>
This particular specification is evaluated and executed as follows:
If
    the event number is 256, and
    either flag 1 or 2 (or both) is set, and
    none of flags 4, 5, and 6 are set,
Then
    set flags 7 and 8 and clear flag 4.
Yikes!!
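The same evaluation can be traced in code. The following Python sketch processes one event with its colon-glued suffixes from left to right; it is an illustration with an invented function name, not the cdbl source. It assumes that evaluation stops at the first failed test and that any set or clear operations already performed before that point remain in effect.

```python
import re


def apply_event_term(term, event, flags):
    """Evaluate e.g. "256:f<3>:~f<70>:s<300>:c<10>" against one log entry.

    Returns (matched, new_flags); flags is the entry's flag byte.
    """
    parts = term.split(":")
    if event != int(parts[0]):
        return False, flags
    for op in parts[1:]:
        m = re.fullmatch(r"(~?)([fsc])<([0-7]+)>", op)
        if m is None:
            raise ValueError("bad flag operation: " + op)
        negated, kind, digits = m.groups()
        mask = int(digits, 8)                 # flag masks are octal
        if kind == "f":
            if ((flags & mask) != 0) == bool(negated):
                return False, flags           # test failed: stop here
        elif kind == "s":
            flags |= mask                     # set
        else:
            flags &= ~mask                    # clear
    return True, flags


matched, flags = apply_event_term("256:f<3>:~f<70>:s<300>:c<10>", 256, 0o1)
print(matched, oct(flags))  # prints True 0o301
```

Starting with only flag 1 set, both tests pass, flags 7 and 8 are set, and clearing flag 4 changes nothing, leaving octal 301.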
An important consideration in the setting, clearing, and testing of
flags is the order of operations and the point where the operation
takes place. Flag setting or clearing occurs during tests applied to
log entries when they match the event and flag conditions preceding
them. This occurs whether they are home items or not.
Since many operations in cdbl are performed
in a sequential
self-terminating manner, the order of bin specifiers in
a condition
code as well as the order of events in an event list are very
important.
An easy way to grasp the notation used in bin descriptor
files is to
analyze an example.
Here is a simple bin descriptor file to sort epochs
in
an attention experiment. The subject is presented 300 and 700 Hz tone
pips both of short duration and long duration. Short duration 300 Hz tones
are associated with event 1; 300 Hz longs with event 2; 700 Hz shorts with
3; and 700 Hz longs with 4.
In one experimental condition (condition code 1), the subject listens
to the
300 Hz tones and is instructed to press a button as fast as possible,
whenever they detect a long 300 Hz pip (RT).
In condition code 2, the same
stimuli
are presented.
In this condition, however, the subject is instructed
to respond to
the 700 Hz longs (targets).
A button press is event 256
in the logging file. The experimenter
determines that a button press within
the 200-800 milliseconds
following an attended target should be considered
a hit, for both the
stimulus and the response events, and that all these
should be averaged
separately from stimulus misses. Here is a bin descriptor
file
which will do the job:
cd 0
Calibration Pulses
    sd 0
        cals.
        .{1;2;3;4}
cd 1
Attend 300Hz Tones
    sd 1
        300Hz Standards
        .{1}
    sd 2
        700Hz Standards
        .{3}
    sd 3
        700Hz Targets
        .{4}
    sd 4
        300Hz Target Misses
        .{2}{~t<200-800>256:~f<2>}
    sd 5
        300Hz Target Hits
        .{2}{t<200-800>256:~f<2>:s<2>}
    sd 6
        Response Hits
        .{256:f<2>}
    sd 7
        Response Misses
        .{256:~f<2>}
cd 2
Attend 700Hz Tones
    sd 8
        300Hz Standards
        .{1}
    sd 9
        300Hz Targets
        .{2}
    sd 10
        700Hz Standards
        .{3}
    sd 11
        700Hz Target Misses
        .{4}{~t<200-800>256:~f<2>}
    sd 12
        700Hz Target Hits
        .{4}{t<200-800>256:~f<2>:s<2>}
    sd 13
        Response Hits
        .{256:f<2>}
    sd 14
        Response Misses
        .{256:~f<2>}
Note how tabs have been used to indent the different major
divisions to
enhance readability. This is perfectly O.K., and
highly recommended.
Perhaps
the most difficult part of this bin descriptor file
to understand is the
usage of the flags. In this case flag 2 has been
used to indicate that
a response event has been assigned to a stimulus
target so that if two
targets occur very close together in time and are
followed by only one
response which falls within the response windows of
both targets,
only
one target will be counted as a hit,
while the other will be
regarded
as a miss.
It is also important to note the order of the statements which test,
set and clear flags. This is a consequence of the fact that cdbl
compares log file events to the bin descriptor file statements in
ascending order. Thus, in the above bin descriptor file it would be an
error to place the bin specifier statement for target hits prior to
the bin specifier statement for target misses. The logic goes
something like this. Suppose cdbl is scanning condition code one (1)
of the log file and the event number of the current item it is
checking is 2. If the target hit and miss bin specifier statements
were reversed, cdbl would first check the hit case. If it found a
response event (256) within 200 to 800 msec post stimulus it would
count the stimulus event as a hit and set flag 2 of the response
event. It would now go on to the next bin specifier statement, which
in this hypothetical case is the miss specifier. It now checks to see
that there is no response event within 200 to 800 msec post stimulus
which does not have flag 2 set. Since the response event in the window
has just had its flag 2 set in the previous bin specifier statement,
this condition is satisfied and the stimulus event is now counted as a
miss. Thus the same event has been counted as both a hit and a miss,
which is not desirable.
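The double-counting argument can be replayed in a small simulation. The Python sketch below hard-codes the two target bins of condition code 1 as functions rather than parsing the bin specifier syntax. The log layout (times in milliseconds, a flags field) and all names are assumptions made for this example.

```python
FLAG_2 = 0o2  # octal mask for flag 2, as in f<2> and s<2>


def unclaimed_response(log, home):
    """Index of the first 256 without flag 2, 200-800 msec after home."""
    t0 = log[home]["ms"]
    for j in range(home + 1, len(log)):
        dt = log[j]["ms"] - t0
        if dt > 800:
            break
        entry = log[j]
        if dt >= 200 and entry["event"] == 256 and not entry["flags"] & FLAG_2:
            return j
    return None


def miss(log, home):   # .{2}{~t<200-800>256:~f<2>}
    return log[home]["event"] == 2 and unclaimed_response(log, home) is None


def hit(log, home):    # .{2}{t<200-800>256:~f<2>:s<2>}
    if log[home]["event"] != 2:
        return False
    j = unclaimed_response(log, home)
    if j is None:
        return False
    log[j]["flags"] |= FLAG_2       # mark the response as used
    return True


def sort_events(log, bins):
    """Test every log item against the bins in the order given."""
    return [[name for name, test in bins if test(log, i)] for i in range(len(log))]


def fresh_log():
    return [{"ms": 0, "event": 2, "flags": 0},
            {"ms": 400, "event": 256, "flags": 0}]


good = sort_events(fresh_log(), [("miss", miss), ("hit", hit)])
bad = sort_events(fresh_log(), [("hit", hit), ("miss", miss)])
print(good[0], bad[0])  # prints ['hit'] ['hit', 'miss']
```

With the miss bin tested first the target is sorted into the hit bin only; with the order reversed it lands in both bins, exactly as described above.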
The above bin descriptor file deals only with events which occur after
the time lock point. It is, however, a simple matter to construct
averages based on sequential events by placing the appropriate event
lists before the time lock point. For example, in condition code 1 of
the above bin descriptor file, if we wished to generate an average of
all 300 Hz standards immediately preceded by two or more 300 Hz
stimuli, the bin specifier would look something like this:
sd n
Doubly Preceded 300Hz Standards
{1;2}{1;2}.{1}
Combinations of event lists and time conditional event lists are
allowed
on either side of the time lock point.
Remember that in order to accommodate
causality in the real world, cdbl evaluates events before the time lock
point before it evaluates events after the time lock point.
Very complex conditional averaging strategies are possible with
prudent use of this program. You must, however, pay very close
attention to the order of statements and the use of flags. It is
foolish to attempt to use this program without checking the binlist
file which it produces for compliance with your intentions.
There are a large number of possible errors when running cdbl. In most
cases, the line number where the offense occurred as well as a short
description of the problem is printed. These messages are meant to be
self-explanatory, but difficulties can arise. In some cases the error
message printed doesn’t correspond to the actual underlying cause.
This can occur when the primary problem is accepted as a parameter,
etc. and then causes processing of subsequent input to be in error.
Usually close inspection can pick up some of the usual problems,
such
as a comma instead of a ;, spaces or omitted letters in
bin specifiers,
etc. If one encounters a difficult and esoteric
problem in a
bin descriptor
file, try some experimental treatments to attempt
to isolate the problem.
For example, start removing entire
condition codes until the error disappears,
and then try to
further isolate the bin specifier. When the "essence" of
the
malfunction has been isolated, the error usually becomes more
apparent.
If this diligent application of common sense and the scientific
method
fails, one might try approaching jch. Good Luck!