Name

cdbl_manual - cdbl User’s Manual (ERP manuals)

Description

cdbl - A Program to Generate Binlists of Sorted Single Trials for avg
In the continuous data digitization and averaging system it is necessary to sort individual trials into groups (termed bins or subconditions) so that they can be averaged together. The program "cdbl", for "Continuous Data BinList", can automatically sort individual events into categories on the basis of type of event, behavioral response, sequence, or combinations of the above. The particular contingencies are contained in an input file to cdbl, the bdf, or "Bin Descriptor File", which is an ASCII file created using an editor. The bdf, in combination with the log file for each subject, is processed by cdbl to produce a binlist file. The binlist file is simply a list of individual trials and the corresponding bin(s) in which avg should include each trial.
Some caveats regarding cdbl: bin descriptor files can be difficult to write and debug, and cdbl is far from elegant. One should always check the binlist file created by cdbl to ensure that it complies with one's intentions.

Using Cdbl


As usual, running cdbl is simple; preparing the input files is where the real work occurs. If one has forgotten how to invoke cdbl, it is possible to simply type
    cdbl

and the proper invocation line will be printed. It should resemble:
    cdbl bin_desc_file log_file bin_list_file srate [-c]opt

where

bin_desc_file is the name of the bin descriptor file, which should already exist and hence is an input to cdbl.

log_file is the name of a logging file created by the digitization program (cddg). This file is also an input to cdbl.

bin_list_file is the name of the binlist file to be created by cdbl, and is thus an output of cdbl.

srate is the effective real time sampling rate employed at the time of digitization, and is in Hz. This is an input, and is used in calculating time differences between stimuli.

[-c]opt - This is an optional flag which, when present, causes cdbl to clear the logging file flags prior to actual processing of the log and bin descriptor files. It is necessary to include this flag if cdbl has previously processed this logging file.


There are a couple of pitfalls that can cause serious loss of data or incorrect results. First, if one accidentally types the arguments to cdbl in the wrong order, it is possible to clobber one's log file or bin descriptor file. This occurs when the file in question inadvertently becomes the third argument to cdbl (the output binlist file), because cdbl attempts to create that file. If it already exists, it is deleted! Note that an accidental extra space in the bin descriptor file name can cause the logging file to become the third argument to cdbl; hence ..... There are other ways to blow it, so be careful. The positive cure is to back up the log and bin descriptor files and retain these backups until one is sure everything worked properly.
Second, if one reruns cdbl using the same logging file that was used in a prior run, one must be sure to include the -c flag to cdbl, or an incorrect binlist will be created. The only drawback to always including the -c flag is a slightly longer execution time the very first time the log file is processed.
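Both pitfalls can be guarded against by wrapping the invocation in a small script. The following Python sketch is illustrative only: the helper name build_cdbl_command, its checks, and the ".bak" backup suffix are assumptions and not part of cdbl. Only the argument order and the -c flag are taken from the invocation line above.

```python
import os
import shutil


def build_cdbl_command(bdf, log, binlist, srate, clear_flags=True):
    """Return the argv list for cdbl after some sanity checks (hypothetical helper)."""
    for path in (bdf, log):
        if not os.path.isfile(path):
            raise FileNotFoundError(f"input file not found: {path}")
    # Compare resolved paths so the output can never be one of the inputs.
    inputs = {os.path.realpath(bdf), os.path.realpath(log)}
    if os.path.realpath(binlist) in inputs:
        raise ValueError("binlist file must differ from the input files")
    if os.path.exists(binlist):
        raise FileExistsError(f"refusing to overwrite existing file: {binlist}")
    # Keep backups of the inputs until the run has been verified.
    for path in (bdf, log):
        shutil.copy2(path, path + ".bak")
    argv = ["cdbl", bdf, log, binlist, str(srate)]
    if clear_flags:
        argv.append("-c")  # always safe; only costs a little time on a first run
    return argv
```

The returned list could then be handed to subprocess.run, with standard output redirected to a file if the summary is to be kept.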

Output Summarization


At the conclusion of the processing, cdbl prints a summary of all bins in each condition code and the number of events assigned to these bins. This allows the user to roughly check the operation of cdbl by comparing the numbers to expected values.

Hints on Using cdbl


Since the summary of the sorting process can be useful at later stages of data analysis, it is often desirable to retain this output in a file. This can be done by redirecting the standard output thus:
    cdbl bdf log blf srate -c > bloutfil


Basic Operation of Cdbl


As mentioned, cdbl processes a log file from cddg and a bin descriptor file (which is created using an editor) to produce a binlist file. Knowledge of the basic mechanism of the processing of these two files is important in writing bin descriptor files, since the order of operations can influence the binlist file produced.
The reason for creating the binlist file is to allow the averaging program to know which bins each trial is associated with at the time it is encountered in the raw file. This is desirable because the magtape is a sequential access device, and the overall processing time can be improved by only moving the tape forward. Hence, the processing of items in the log file also proceeds in sequence from first to last. Each item in the log file, when under consideration, is compared to every bin described in the bin descriptor file having the proper condition code. This comparison is performed sequentially from the first to the last bin corresponding to that condition code. Whenever the log item matches a bin, the number of that bin is appended to the list for that log item (trial, epoch, single record) in the binlist file. When the process is complete, the binlist file will consist of a sequential list of ordinal log items and all the bins to which the corresponding raw trial should be added when averaging is performed.
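The sequential pass described above can be sketched in a few lines of Python. The data layout is an assumption made purely for illustration: log items are (condition_code, event_number) tuples and bins are (bin_number, condition_code, predicate) tuples. Real bin specifiers are considerably richer than a one-event predicate, and this is not the actual cdbl implementation.

```python
def make_binlist(log_items, bins):
    """Return, for each log item in order, the list of bins it falls into."""
    binlist = []
    for ordinal, (cond, event) in enumerate(log_items):
        matched = []
        # Bins are tried in the order they appear in the bin descriptor file,
        # and only bins under the item's own condition code are considered.
        for bin_no, bin_cond, predicate in bins:
            if bin_cond == cond and predicate(event):
                matched.append(bin_no)
        binlist.append((ordinal, matched))
    return binlist


bins = [
    (0, 0, lambda ev: ev in (1, 2, 3, 4)),  # calibration pulses
    (1, 1, lambda ev: ev == 1),
    (2, 1, lambda ev: ev == 3),
]
log_items = [(0, 1), (1, 1), (1, 3), (1, 4)]
print(make_binlist(log_items, bins))
```

Note that a log item may end up in several bins or in none; the last item above matches no bin in its condition code.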
It should be mentioned at this point that cdbl is designed to be a "splitting" program, that is, the sorting done by cdbl should generally result in the most detailed averages one needs for analyses. Once trials are averaged together there is no way to separate them into more detailed averages without reaveraging. It is possible, however, to lump averages together at a later date using "dmanip" or its congeners.
The list of bins proper in the binlist file is preceded by ASCII descriptors for each of the bins and condition codes. These are extracted from the bin descriptor file and are included in the header to the data after it is averaged (the header originates in the mdh program). It is thus useful to describe the data in each bin or condition code concisely when entering the descriptors.


Creating a Bin Descriptor File


Introduction and Basic Data Formats


Both the raw data file and log file produced by cddg can be considered as a list of events occurring at various times. In addition to the event number and the time of occurrence, each log entry (or raw EEG trial, for that matter) possesses two other attributes: a condition code and a set of flags. To learn the basic function of the condition code and the event numbers, refer to the "Overview of the Continuous Data Digitization and Averaging System". The function of the flags is mostly restricted to the operation of cdbl, and is discussed here.
The log file flags are eight separately manipulable binary bits contained in a single byte in each logging file entry. In the continuous data system, none have any preordained function and can be used in any way desired. One common use of the flags is to mark events as "used", so they are not used again later. This is useful, for instance, in the designation of responses as hits. This is treated a little more fully in the example at the end of this document; for the present just think of the flags as eight little semaphores which can be tested, set, or cleared in the process of sorting events into bins.
So, there are four pieces of information that can be used by cdbl in sorting events: the event number, the time of occurrence, the condition code, and the flags. Since conditions and their corresponding codes form major divisions in the logical structure of an experiment, both the format of the bin descriptor file and the processing performed by cdbl reflect this structure. Most averages occur within a condition code. Hence, the bin descriptor file is separated into sections with the condition code as the major heading. The implication is that an event must fulfill the contingencies specified for a particular bin and have the appropriate condition code in order to be sorted into that bin. This is an implicit constraint that should not be overlooked.
Another general constraint on the format of bin descriptor files is the order of the condition codes and the bins. Both must begin at 0 (zero) and ascend without skipping any numbers. This is not to say that the conditions must be run in that order, just that they must be described in the bin descriptor file in ascending order.
One condition code must always be present and has an implicit meaning. This is Condition Code 0 (zero), and is the calibration condition. Note that it receives special treatment in a number of processing stages, not being subject to artifact rejection as well as being the primary and only mandatory condition.
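The numbering constraint is easy to check mechanically before running cdbl. This is a minimal sketch; the function name check_ascending is hypothetical, and it assumes the condition codes (or bin numbers) have already been collected in the order they appear in the bin descriptor file.

```python
def check_ascending(numbers, what):
    """Raise ValueError unless numbers are exactly 0, 1, 2, ... in that order."""
    if not numbers:
        raise ValueError(f"no {what} entries found; numbering must start at 0")
    for expected, found in enumerate(numbers):
        if found != expected:
            raise ValueError(f"{what} {found} found where {expected} was expected")


check_ascending([0, 1, 2], "condition code")   # passes silently
```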

Format of a Bin Descriptor File


To begin a section of the bin descriptor file for a particular condition code (as well as delimit the end of the previous one, if present), a "cd n" is typed on a line followed by another line containing the ASCII description of the condition, thus:
    cd 0
    Calibration 200 msec. 10 uvolt verpos.

Note that cd must be lower case, and the cd and the 0 must be separated by a space or a tab. It is permissible to indent entries in the bin descriptor files in order to improve their readability. This can be accomplished by using tabs or spaces in front of the entries. Even the ASCII descriptors can be indented, since any leading blanks or tabs will be ignored. Because the condition descriptions will be carried around in the header to the data, it is useful to make them meaningful. Up to 40 characters will be accepted after stripping off any preceding blanks or tabs.
Following a condition code heading (as above) should be specifications for all the bins (averages) desired in that condition code. These are discussed next...


Bin Specifiers and Specifying Bins


Because of certain customs of the heathen Druids and Celts, a bin specifier is introduced using sd mm, followed by a line containing an ASCII description of the bin. This is almost identical to the condition delimitation described above, except for the sd in place of the cd, and that mm is the bin number, or subcondition number (hence the sd) rather than the condition code. Following the line containing the ASCII description of the bin is a line containing the bin specifier proper, which constitutes the actual contingencies for the bin in a coded format. Here is an example of an entry describing a particular bin:
    sd 39
    high standards
    .{3}


The .{3} specifies that event number 3 is averaged into bin 39; presumably events with event numbers of 3 are high standards but this need not be the case, since the descriptors are arbitrary. This is about as simple as a bin specifier can be; any event numbered 3 in the appropriate condition code is included in this bin (remember that the appropriate condition code is determined by the position of the bin specifier in the bin descriptor file).
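The layout of cd and sd entries can be made concrete with a small reader. This Python sketch is a simplification and not cdbl's own parser: the name parse_bdf is hypothetical, blank lines are skipped, and descriptions longer than 40 characters are silently truncated, all of which are assumptions. The bin specifier is stored as text and not interpreted.

```python
def parse_bdf(text):
    """Parse cd/sd entries into a list of condition dictionaries."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    conditions = []
    i = 0
    while i < len(lines):
        fields = lines[i].split()  # cd/sd and the number are separated by space or tab
        if len(fields) != 2 or fields[0] not in ("cd", "sd") or not fields[1].isdigit():
            raise ValueError(f"expected 'cd n' or 'sd n', got {lines[i]!r}")
        number = int(fields[1])
        needed = 2 if fields[0] == "cd" else 3
        if i + needed > len(lines):
            raise ValueError(f"entry {lines[i]!r} is incomplete")
        if fields[0] == "cd":
            # A condition heading is followed by one description line.
            conditions.append({"code": number, "desc": lines[i + 1][:40], "bins": []})
        else:
            if not conditions:
                raise ValueError("sd entry found before any cd entry")
            # A bin heading is followed by a description and the bin specifier.
            conditions[-1]["bins"].append(
                {"bin": number, "desc": lines[i + 1][:40], "spec": lines[i + 2]}
            )
        i += needed
    return conditions


sample = """
cd 0
Calibration Pulses
    sd 0
        cals.
        .{1;2;3;4}
"""
print(parse_bdf(sample))
```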
In general, the bin specifier is a sequence of symbols specifying the conditions which must be satisfied to include an event in that bin. No blanks, commas (,), or tabs can occur on the line of the bin specifier proper, and there must be a period (.) somewhere. The period is referred to as the time-lock point. There must also be what is called an item specifier to the right of the time-lock point; this is referred to as the home item. The home item is associated with the log entry currently being processed and which is under consideration for assignment to the bin. It is always the item corresponding to the EEG trial (the event which initiated post-sampling) which will be added to the bin at averaging time; hence the terms home item and time-lock point. Every bin specifier must have a time-lock point and a home item; all tests involving time-relations or sequence are relative to the home item.
As can be seen from the simple example above, every item specifier is composed of a sequence of characters enclosed by curly brackets (set symbol signs, { and } ). A general bin specifier is a sequence of item specifiers and the time-lock point thus:
    {2}{3}.{3}

for example. Each of {2}, {3}, and {3} is an item specifier. Item specifiers to the left of the time-lock point denote events which precede the home item in time. For the bin specifier to be matched or fulfilled, all item specifiers must be matched in the sequence they are written. In this example, the home item of 3 will be included in the bin only if it is preceded by a 2 and a 3 in that order. Likewise, item specifiers following the home item must also be matched in the order specified for the bin specifier to be fulfilled. These correspond to events occurring after the home item. Note again that it is the home item that is being considered for inclusion; any other item specifications entail only a test of their matching the specifications, not an assignment to a bin (until they too come under consideration as a home item).
While the order of events must match that in the bin specifier in order for the home item to be included in the bin, the actual sequence in which the testing takes place is as follows. First, the home item is tested. If it matches the log item, item specifiers preceding the home item are compared sequentially to the log entries until a failure to match occurs or the end of the bin specifier is encountered. Item specifiers closest to the home item are processed first. In a similar manner, if the item specifiers preceding the home item are matched (or there aren’t any), the item specifiers following the home item are tested, starting with those closest to the home item. This order of execution can become important in complex bin descriptor files, as will be discussed later.
To summarize these conventions, let’s consider another example. Suppose in condition code 3 one wishes to average events numbered 7 which are preceded by a 4 and followed by a 4 or a 5. Somewhere in the list of bins under the condition code 3 header one might have a bin specification including this line:
    {4}.{7}{4;5}

Note the subtle way a smidgen of syntax was introduced. Item specifiers can contain a list of events separated with ;’s to denote the inclusive or of the events. That is, an item specifier containing event numbers separated by ;’s is matched if any of the events in the list occur at the indicated point.
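The conventions so far (home item, preceding and following items, and ; for "or") can be sketched as follows. This is an illustrative Python sketch, not the cdbl algorithm itself: it handles only plain event numbers, assumes all events belong to the same condition code, and the function names are hypothetical.

```python
import re


def parse_spec(spec):
    """Split a bin specifier into item sets before and after the time-lock point."""
    before, _, after = spec.partition(".")

    def items(part):
        # Each {...} becomes the set of event numbers that satisfy it.
        return [set(map(int, body.split(";"))) for body in re.findall(r"\{([^}]*)\}", part)]

    return items(before), items(after)


def matches(events, home, spec):
    """True if events[home] satisfies spec; events is a list of event numbers."""
    pre, post = parse_spec(spec)
    home_item, following = post[0], post[1:]
    if events[home] not in home_item:
        return False
    # Preceding items are tested next, nearest the home item first.
    for offset, allowed in enumerate(reversed(pre), start=1):
        idx = home - offset
        if idx < 0 or events[idx] not in allowed:
            return False
    # Following items are tested last, again nearest the home item first.
    for offset, allowed in enumerate(following, start=1):
        idx = home + offset
        if idx >= len(events) or events[idx] not in allowed:
            return False
    return True


events = [4, 7, 5, 7, 4]
print(matches(events, 1, "{4}.{7}{4;5}"))
print(matches(events, 3, "{4}.{7}{4;5}"))
```

In this log the first 7 qualifies (preceded by a 4, followed by a 5), while the second 7 does not, because it is preceded by a 5.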

More on Item Specifiers


Item specifiers can include dependencies on times of event occurrences, the status of their flags, an ad-hoc method of diddling the flags, and a few other tidbits.
An item specifier is basically a list of events and conditions which must be satisfied by the items in the log file to fulfill the item specifier. The list can involve event numbers and flag conditionals only, and is then termed a simple event list. It is also possible to pre-empt the strict one-to-one sequential dependencies implied by the list of item specifiers constituting a bin specifier and employ a time-conditioned event list. In this case any event in the log file occurring within a specified time window from the home item can match the event list, rather than just the ordinally appropriate log entry. In any case, event lists are scanned from left to right, with processing terminating as soon as a match is obtained. This has certain implications for the flag testing, setting, and clearing operations discussed below.

Event Lists


Event lists are sequences of event numbers, separated by semicolons, with optional preceding negation signs (the tilde, ~) and/or flag test, set, or clear suffixes. Remember, no spaces, commas, or tabs are allowed anywhere in a bin specifier proper. Here are some event lists in their item specifier curly brackets and their meaning:
event list        meaning

  {34}        Matched by event # 34.

 {2;7}        Matched by event # 2 or 7.

  {~9}        Matched by anything but 9.

   {*}        Matched by any event.

  {~*}        Never matched.


Negation of Item Specifiers and Events


In an event list it is not necessary to allow lists such as
    {4;~5}

or

    {~5;4}

since, read as "4 or not 5", these would mean the same as {~5}. It might be useful, however, to be able to express "not event 4 and not event 5", i.e. anything but a four or a five. This is indeed the actual meaning of {~5;4}. A ~ (tilde) as the first character of an item specifier negates the entire event list or time-conditioned event list. That is, the item specifier is matched if the event list is not matched; conversely, if the event list is matched, the item specifier is not matched.
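The negation rule can be checked with a few lines of Python. This sketch covers only simple event lists (event numbers, the * wildcard, and a leading ~ that negates the whole list); the function name item_matches is hypothetical.

```python
def item_matches(item, event):
    """Evaluate a simple item specifier such as '{2;7}', '{~9}' or '{*}'."""
    body = item.strip("{}")
    negate = body.startswith("~")  # a leading ~ negates the entire list
    if negate:
        body = body[1:]
    hit = any(tok == "*" or int(tok) == event for tok in body.split(";"))
    return hit != negate


for ev in (4, 5, 6):
    print(ev, item_matches("{~5;4}", ev))
```

Only event 6 satisfies {~5;4}, since the specifier means "neither a 5 nor a 4".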

Time-Conditioned Event Lists


A time-conditioned event list (tcel) is used to specify a window in time over which to examine events in the log file. It consists of a window specification (with an optional preceding item specifier negation) prefixed to an event list thus:
    {t<200-1200>256}

This particular tcel is true if an event number 256 is found within 200 to 1200 milliseconds of the home item. Time sense is inverted if the tcel appears before the time lock point. In this case log entries are examined sequentially starting (in this example) 200 msec before the home item and ending 1200 msec before the home item. Processing of a tcel stops on the first match to the event list within the given time window.
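A window test of this kind might look like the following Python sketch. Several things here are assumptions and not documented behavior: log entries are taken to be (tick, event) pairs with times in sample ticks that are converted to milliseconds using srate, the window endpoints are treated as inclusive, and the function name tcel_matches is hypothetical.

```python
def tcel_matches(log, home, lo_ms, hi_ms, wanted, srate, before=False):
    """True if an event in `wanted` lies lo_ms..hi_ms from the home item.

    With before=True the window is scanned backwards in time, as for a
    tcel written to the left of the time-lock point.
    """
    home_tick = log[home][0]
    step = -1 if before else 1
    idx = home + step
    while 0 <= idx < len(log):
        tick, event = log[idx]
        delta_ms = abs(tick - home_tick) * 1000.0 / srate
        if delta_ms > hi_ms:
            break  # the log is time-ordered, so nothing further can match
        if delta_ms >= lo_ms and event in wanted:
            return True  # processing stops at the first match in the window
        idx += step
    return False


# At 250 Hz one tick is 4 msec, so tick 100 is 400 msec after tick 0.
log = [(0, 2), (100, 256), (500, 1)]
print(tcel_matches(log, 0, 200, 1200, {256}, srate=250))
```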

Testing, Setting, and Clearing Flags


Any event in an event list can be further contingent upon the state of eight separate flags in the log entry. These all are initialized to zero (untrue) if the -c flag was included in the invocation of cdbl. They also are all zero when the log file is first created by cddg. The flags are denoted by their octal representation enclosed in <> (as with a time window). Note that flag representations are the only octal numbers employed in a bin descriptor file. Octal numbers are used because it is easier to combine specific patterns of bits without having to propagate carries from one place to another. Here is a list of the flags and their octal representations:
flag        octal rep.

  1             1
  2             2
  3             4
  4            10
  5            20
  6            40
  7           100
  8           200

A flag test operation is appended to an event using colon glue, thus:
    256:f<200>    or    256:~f<200>

The first event in this example is matched if a 256 with flag 8 set is encountered in the log file; the second is matched if a 256 without flag 8 set is found.
It is possible to test more than one flag at a time. In detail, the bits set in the octal number are ANDed with the flags in the log entry. If the result is nonzero, the flag test is true (unless preceded by the optional negation tilde (~), in which case it’s false). In other words, the test f<203> is true if flag 8 or flag 1 or flag 2 (or any combination of these) is set.
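The arithmetic behind the octal representation is a plain bit mask: flag n corresponds to bit n-1. A short Python sketch (the helper names flag_mask and flag_test are hypothetical) shows how a mask is built and how a flag test is evaluated.

```python
def flag_mask(*flags):
    """Return the mask for flags numbered 1..8, e.g. flag_mask(1, 2, 8) == 0o203."""
    mask = 0
    for n in flags:
        if not 1 <= n <= 8:
            raise ValueError("flags are numbered 1 through 8")
        mask |= 1 << (n - 1)
    return mask


def flag_test(entry_flags, octal_text, negate=False):
    """Evaluate f<octal_text> (or ~f<...> when negate is True) against a flag byte."""
    result = (entry_flags & int(octal_text, 8)) != 0
    return result != negate


print(oct(flag_mask(1, 2, 8)))       # the mask written as 203 in a bin specifier
print(flag_test(0o200, "203"))       # flag 8 alone is enough to satisfy f<203>
```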
It is further possible to specify a situation such as "flag 1 set and flag 4 not set" by appending more than one flag test to the event number. This expression implies the "flag 1 and not flag 4" contingency when appended to an event number in an event list:

    f<1>:~f<10>

Any number of flag tests can be concatenated in this manner. As usual, evaluation stops at the first failure with implications for the set and clear flag operations.
One can further append a flag set or clear operation to a flag test using more colon glue in the following manner:

    256:~f<1>:s<1>

    93:f<2>:c<2>

    256:~f<0>:s<3>

The first example indicates that if a 256 is found without flag 1 set, set flag 1. The second means that if a 93 is found with flag 2 set, clear it. The third example indicates how one can circumvent the necessity of having a flag test preceding a set or clear operation. Since ~f<0> is always true, 256:~f<0>:s<3> sets flags 1 and 2 on every event numbered 256, assuming all preceding contingencies were fulfilled.
As with flag tests, flag set and clear operations can be strung together using "colon glue". Consider this mess:


    256:f<3>:~f<70>:s<300>:c<10>

This particular specification is evaluated and executed as follows:

If
    the event number is 256 and
    either flag 1 or 2 (or both) is set and
    none of flags 4, 5, and 6 are set,
Then
    set flags 7 and 8 and clear flag 4.
Yikes!!
An important consideration in the setting, clearing, and testing of flags is the order of operations and the point where the operation takes place. Flag setting or clearing occurs during tests applied to log entries when they match the event and flag conditions preceding them. This occurs whether they are home items or not. Since many operations in cdbl are performed in a sequential self-terminating manner, the order of bin specifiers in a condition code as well as the order of events in an event list are very important.
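The left-to-right, self-terminating evaluation of a chain of flag operations can be sketched in Python as follows. This is an illustration under stated assumptions and not cdbl's code: the function name apply_chain is hypothetical, and the sketch assumes that set and clear operations already executed remain in effect when a later test in the same chain fails.

```python
import re


def apply_chain(chain, event, flags):
    """Evaluate e.g. '256:f<3>:~f<70>:s<300>:c<10>' against one log entry.

    Returns (matched, new_flags), where flags is the entry's 8-bit flag byte.
    """
    number, *ops = chain.split(":")
    if int(number) != event:
        return False, flags
    for op in ops:
        m = re.fullmatch(r"(~?f|s|c)<([0-7]+)>", op)
        if m is None:
            raise ValueError(f"bad flag operation: {op!r}")
        kind, mask = m.group(1), int(m.group(2), 8)  # masks are octal
        if kind.endswith("f"):
            is_set = (flags & mask) != 0
            if is_set == kind.startswith("~"):
                return False, flags  # the first failed test ends evaluation
        elif kind == "s":
            flags |= mask
        else:
            flags &= ~mask & 0o377
    return True, flags


matched, flags = apply_chain("256:f<3>:~f<70>:s<300>:c<10>", 256, 0o1)
print(matched, oct(flags))
```

Starting from an entry with only flag 1 set, the chain matches and leaves flags 1, 7, and 8 set.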

An Example


An easy way to grasp the notation used in bin descriptor files is to analyze an example. Here is a simple bin descriptor file to sort epochs in an attention experiment. The subject is presented 300 and 700 Hz tone pips both of short duration and long duration. Short duration 300 Hz tones are associated with event 1; 300 Hz longs with event 2; 700 Hz shorts with 3; and 700 Hz longs with 4.
In one experimental condition (condition code 1), the subject listens to the 300 Hz tones and is instructed to press a button as fast as possible whenever they detect a long 300 Hz pip (RT). In condition code 2, the same stimuli are presented. In this condition, however, the subject is instructed to respond to the 700 Hz longs (targets). A button press is event 256 in the logging file. The experimenter determines that a button press within the 200-800 milliseconds following an attended target should be considered a hit, for both the stimulus and the response events, and that all these should be averaged separately from stimulus misses. Here is a bin descriptor file which will do the job:
 cd 0
 Calibration Pulses
     sd 0
         cals.
         .{1;2;3;4}
 cd 1
 Attend 300Hz Tones
     sd 1
         300Hz Standards
         .{1}
     sd 2
         700Hz Standards
         .{3}
     sd 3
         700Hz Targets
         .{4}
     sd 4
         300Hz Target Misses
         .{2}{~t<200-800>256:~f<2>}
     sd 5
         300Hz Target Hits
         .{2}{t<200-800>256:~f<2>:s<2>}
     sd 6
         Response Hits
         .{256:f<2>}
     sd 7
         Response Misses
         .{256:~f<2>}
 cd 2
 Attend 700Hz Tones
     sd 8
         300Hz Standards
         .{1}
     sd 9
         300Hz Targets
         .{2}
     sd 10
         700Hz Standards
         .{3}
     sd 11
         700Hz Target Misses
         .{4}{~t<200-800>256:~f<2>}
     sd 12
         700Hz Target Hits
         .{4}{t<200-800>256:~f<2>:s<2>}
     sd 13
         Response Hits
         .{256:f<2>}
     sd 14
         Response Misses
         .{256:~f<2>}

Note how tabs have been used to indent the different major divisions to enhance readability. This is perfectly O.K., and highly recommended. Perhaps the most difficult part of this bin descriptor file to understand is the usage of the flags. In this case flag 2 has been used to indicate that a response event has been assigned to a stimulus target, so that if two targets occur very close together in time and are followed by only one response which falls within the response windows of both targets, only one target will be counted as a hit, while the other will be regarded as a miss.
It is also important to note the order of the statements which test, set and clear flags. This is a consequence of the fact that cdbl compares log file events to the bin descriptor file statements in ascending order. Thus, in the above bin descriptor file it would be an error to place the bin specifier statement for target hits prior to the bin specifier statement for target misses. The logic goes something like this. Suppose cdbl is scanning condition code one (1) of the log file and the event number of the current item it is checking is 2. If the target hit and miss bin specifier statements were reversed, cdbl would first check the hit case. If it found a response event (256) within 200 to 800 msec post stimulus it would count the stimulus event as a hit and set flag 2 of the response event. It would now go on to the next bin specifier statement, which in this hypothetical case is the miss specifier. It now checks to see that there is no response event within 200 to 800 msec post stimulus which does not have flag 2 set. Since the response event in the window has just had its flag 2 set in the previous bin specifier statement, this condition is satisfied and the stimulus event is now counted as a miss. Thus the same event has been counted as both a hit and a miss, which is not desirable.
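The effect of statement order can be reproduced with a small simulation. The Python sketch below is a hand-written model of just the two target specifiers from condition code 1; it is not cdbl, and the log layout (dictionaries with times in msec) and all function names are assumptions. The log holds two targets 100 msec apart followed by a single response.

```python
FLAG2 = 0o2  # octal mask for flag 2


def unused_response(log, home, lo=200, hi=800):
    """Index of the first unflagged 256 within lo..hi msec after log[home], else None."""
    t0 = log[home]["t"]
    for i in range(home + 1, len(log)):
        dt = log[i]["t"] - t0
        if lo <= dt <= hi and log[i]["ev"] == 256 and not log[i]["flags"] & FLAG2:
            return i
    return None


def is_miss(log, home):  # models .{2}{~t<200-800>256:~f<2>}
    return unused_response(log, home) is None


def is_hit(log, home):  # models .{2}{t<200-800>256:~f<2>:s<2>}
    i = unused_response(log, home)
    if i is None:
        return False
    log[i]["flags"] |= FLAG2  # mark the response as used
    return True


def classify(bin_order):
    """Sort the two targets, trying the (label, test) pairs in the given order."""
    log = [{"t": 0, "ev": 2, "flags": 0},
           {"t": 100, "ev": 2, "flags": 0},
           {"t": 400, "ev": 256, "flags": 0}]
    result = {}
    for home, entry in enumerate(log):
        if entry["ev"] == 2:
            result[home] = [label for label, test in bin_order if test(log, home)]
    return result


print(classify([("miss", is_miss), ("hit", is_hit)]))  # order used in the example file
print(classify([("hit", is_hit), ("miss", is_miss)]))  # reversed order
```

With the miss statement first, the first target is a hit and the second a miss. With the order reversed, the first target lands in both bins.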
The above bin descriptor file deals only with events which occur after the time-lock point. It is, however, a simple matter to construct averages based on sequential events by placing the appropriate event lists before the time-lock point. For example, in condition code 1 of the above bin descriptor file, if we wished to generate an average of all 300 Hz standards immediately preceded by two or more 300 Hz stimuli, the bin specifier would look something like this:
    sd n
        Doubly Preceded 300Hz Standards
        {1;2}{1;2}.{1}


Combinations of event lists and time-conditioned event lists are allowed on either side of the time-lock point. Remember that in order to accommodate causality in the real world, cdbl evaluates events before the time-lock point before it evaluates events after the time-lock point.
Very complex conditional averaging strategies are possible with prudent use of this program. You must, however, pay very close attention to the order of statements and the use of flags. It is foolish to attempt to use this program without checking the binlist file which it produces for compliance with your intentions.


Errors


There are a large number of possible errors when running cdbl. In most cases, the line number where the offense occurred as well as a short description of the problem is printed. These messages are meant to be self-explanatory, but difficulties can arise. In some cases the error message printed doesn’t correspond to the actual underlying cause. This can occur when the primary problem is accepted as a parameter, etc. and then causes processing of subsequent input to be in error.
Usually close inspection can pick up some of the usual problems, such as a comma instead of a ;, spaces or omitted letters in bin specifiers, etc. If one encounters a difficult and esoteric problem in a bin descriptor file, try some experimental treatments to attempt to isolate the problem. For example, start removing entire condition codes until the error disappears, and then try to further isolate the bin specifier. When the "essence" of the malfunction has been isolated, the error usually becomes more apparent.
If this diligent application of common sense and the scientific method fails, one might try approaching jch. Good Luck!

