Table of Contents


erp_overview - overview of ERP data collection and analysis (ERP manuals)


An Overview of the Continuous Data Digitization and Averaging System
Loosely speaking, an ERP experiment involves a number of stages between the hypothesis and the advancement of the frontiers. These include designing the experiment (including the analysis and the interpretation of various outcomes), stimulus production and coding, running subjects and digitizing data, averaging and massaging data, measuring and analyzing data, and finally interpreting and writing up the data. This overview focuses on the digitization and averaging aspects of the process using the continuous data system; more specifically, the relevant hardware, software, and data structures used in transforming analog data from the polygraph into digital averages around the epochs of interest.



The process of digitization refers to the conversion of analog data from the polygraph into a sequence of numbers stored in digital form. These numbers represent samples of the original data, and are taken at a fixed rate, termed the sampling rate, in the continuous data system. This sequence of numbers, acquired for each channel and stored on magtape or as a disk file, forms the raw file.
Digitization is usually used to denote more than just the analog to digital conversion of the EEG; to further process the data more information is needed. In particular, the stage of digitization entails the creation of a sufficient record of the experiment so as to allow further analyses of the data. Part of this extra information is included in a header to the raw file, such as the subject’s name, the date, the experiment description, the number of channels, the channel descriptions, etc. Another part is the times of occurrence of various events (stimuli). These are recorded in an additional, hidden channel called the mark track, which the user need not be concerned about. Since the raw data are on a sequential access device (magtape), a summary file, termed the log file, is also produced; it contains a record of each event, its time of occurrence, the condition code, and a set of flags.
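The per-event record in the log file can be pictured as a small fixed set of fields. The Python sketch below is a hypothetical in-memory view only; the field names are illustrative and do not reflect the actual binary layout of a log file.

```python
from collections import namedtuple

# Hypothetical view of one log-file record; field names are
# illustrative, not the actual on-disk format.
LogEntry = namedtuple("LogEntry", "event_number time condition_code flags")

# e.g. event code 10 occurring at sample offset 12345 in condition 1:
entry = LogEntry(event_number=10, time=12345, condition_code=1, flags=0)
```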
What is a condition code? Basically, a condition code signifies a particular experimental treatment in further analyses. Often one wishes to present the same sequence of stimuli to the subject while the subject performs different tasks. In this case, the condition code enables one to employ the same stimuli (more importantly, the same event numbers) in both treatments and yet later sort the events into different categories. One condition code has a special meaning: condition code 0 (zero) is reserved for the calibration condition, and must always be present. The calibration pulses and event numbers recorded in this condition enable normalization of between channel differences in gain as well as calibration of the absolute magnitude and polarity of the data.
Well then, what are events? Why don’t you just call them stimuli? As alluded to earlier, events usually are the stimuli presented to the subject, and event numbers are used to represent different stimuli. There can be other types of events, however. One important class is subject responses. Pauses in the digitization as well as "delete marks" form another class. Hence, the term event is used to subsume all these instances.
Hopefully, further considerations and details will become apparent in this section as well as in the man page describing the digitization program.


Continuous recording of the data on magtape was chosen over digitizing peri-stimulus epochs for a number of reasons. Most of these derive from the ability to perform later processing on the continuous data. These include:

1) Capability of altering epoch length in subsequent analyses without redigitizing.

2) Ability to alter the presampling interval in following analyses without redigitization.

3) Possibility of performing operations on the continuous data prior to or without separation into discrete epochs. These include spectral analyses of the ongoing signals and digital filtering/processing of the data without losing segments near the ends of an epoch.

4) Ability to extract overlapping epochs of data offline, thus considerably reducing the realtime resource drain on the machine. This is especially useful for attention experiments.

5) Ability to recreate the signals using d/a converters and to display and plot interesting sections of the continuous data.
On the minus side is the need for more tapes and extra tape changes in situations where the data of interest comprise only a small part of the recording. (Note, though, that in situations where discrete epochs of interest overlap, this scheme uses less tape than recording of individual epochs!) At 1600 bits per inch this is not too heinous. Assuming 16 channels on a 2400 foot reel at 1600 bpi, we have about 6 inches per 256 point record (all channels included here). Thus, there are about 4800 records on each tape. The total time per tape also depends on the sampling rate. Here are some results (approximate) for 8, 16, and 32 channels at various sampling rates:

Total Time in Minutes for a 2400 foot Reel recorded at 1600 BPI

                  Sampling Rate

              64 Hz   128 Hz   256 Hz   512 Hz

  32 chans      160       80       40       20
  16 chans      320      160       80       40
   8 chans      640      320      160       80

Note that these are actual recording times, which generally are not as long as an experiment. Note also that currently dig is capable of sampling at a maximum rate of 750 Hz.
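The table entries follow directly from the figures above (about 6 inches of tape per 16-channel, 256-point record at 1600 bpi, with record length scaling linearly with channel count). A quick sketch of that arithmetic; the function name and parameters are illustrative, not part of any EPL program:

```python
def reel_minutes(n_channels, srate_hz,
                 reel_feet=2400, bpi=1600, points_per_record=256):
    """Approximate recording time in minutes for one reel of tape.

    Assumes, as in the text, that a 256-point record of 16 channels
    occupies about 6 inches at 1600 bpi, and that record length
    scales linearly with the number of channels.
    """
    inches_per_record = 6.0 * (n_channels / 16.0) * (1600.0 / bpi)
    records = (reel_feet * 12.0) / inches_per_record
    seconds = records * points_per_record / srate_hz
    return seconds / 60.0

print(reel_minutes(16, 64))   # 320.0, matching the table
```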

Sampling Rate

The sampling rate employed in recording the data needs further consideration. A number of issues are important here. The first is the bandwidth of the signal being digitized. This must be less than one half the sampling rate, but since extremely good presampling filters are expensive, a compromise is reached using good filters and a sampling rate of about 4 to 6 times the filter bandwidth. This is a topic unto itself and cannot be pursued here.
Another important consideration is the epoch length to be employed. Although this can be altered later, current programs are limited to a 256 point epoch. Thus, 64 Hz corresponds to a 4 second epoch of 256 points, 128 Hz to a 2 second epoch, 256 Hz to a 1 second epoch, and 512 Hz to a 500 msec epoch of 256 points. Currently, the upper limit on sampling rate is 750 Hz.
With a fixed 256 points per epoch (currently) the following formulae allow a quick conversion between sampling rate in Hz and the epoch length in msec:
    Epoch length (msec) = 256000 / srate (Hz)

    Srate (Hz) = 256000 / epoch length (msec)

Note also that the number of milliseconds per point can be found by dividing 1000 by the sampling rate in Hz.
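These conversions are trivial to express in code. A small Python sketch, assuming the fixed 256-point epoch noted above (function names are illustrative):

```python
POINTS_PER_EPOCH = 256  # current programs are limited to 256-point epochs

def epoch_length_msec(srate_hz):
    """Epoch length in msec for a given sampling rate in Hz."""
    return 1000.0 * POINTS_PER_EPOCH / srate_hz

def sampling_rate_hz(epoch_msec):
    """Sampling rate in Hz for a given epoch length in msec."""
    return 1000.0 * POINTS_PER_EPOCH / epoch_msec

def msec_per_point(srate_hz):
    """Milliseconds between successive samples."""
    return 1000.0 / srate_hz

print(epoch_length_msec(64))   # 4000.0 msec, i.e. a 4 second epoch
```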


The hardware employed in the execution of an experiment includes:

1) Stimulus presentation computer, colloquially referred to as the stim machine. This is an IBM PC/AT or compatible computer used to present visual or auditory stimuli. This stim machine is equipped with an EGA (enhanced graphics adapter) and EGA compatible monitor, a Hercules compatible monochrome adapter and monitor, and a DT2821 Data Translation card. It is connected to an SPI box (Stimulus Presentation Interface), which facilitates the transmission of stimulus event codes to the digitization computer. The SPI also provides circuitry for output of audio stimuli. Generally one of two programs is run on the stim machine for stimulus presentation: vspres for visual stimulus presentation, and aspres for auditory stimulus presentation. Detailed descriptions of the SPI and the stimulus presentation programs are available in separate documents which can be found in the notebook labeled "Stimulus Presentation".

2) Response boxes or "button" boxes. These are devices that provide digital event codes to the digitization computer via DB25 connector. Currently each response box is hard-wired to supply a particular event code when a button is pressed. Each response box should be labeled with the event code that it produces.

3) The polygraph. The subject must be hooked up to the polygraph to record good data. The polygraph is in turn hooked up to the digitization computer.

4) The digitization computer system, colloquially referred to as the dig machine. This is an IBM PC/AT or compatible computer equipped with hardware and software for digitizing and displaying data acquired from the polygraph. Currently the output of the polygraph is connected to a DT2801 Data Translation card in the dig machine which performs the analog to digital conversion of the data. The dig machine also has a Scientific Solutions Baseboard which has four digital i/o ports that are used for acquiring event codes (including subject responses) from up to four different sources. The program dig is used to control the digitization process and is discussed in another document. As data are digitized, they are stored either on mag tape or in a disk file, and displayed graphically on a monitor attached to the dig machine.

Programs Used in the Digitizing Process

There are basically two programs used in digitization. The program "dig" stands for "digitization", and actually performs the A/D conversion, the creation of the log and raw files, etc. Another program, "mdh", or "make digitization header", is employed to specify the various parameters required during digitization. The header file created by mdh is then used by dig. For more detailed information on actually using these programs, consult their man pages.

Examining Log Files

Often one wishes to examine log files. Since these are not stored in the form of ASCII characters, a special program, "logexam", is used to accomplish this. logexam is relatively simple to use, and displays all information in a log file. Further information on logexam can be found in its manual entry; since it is so simple to use one can also simply invoke it and type "help".

Examining Raw Files

In a similar vein, it is useful to be able to peruse the digitized EEG contained in the raw file. Currently this is accomplished using "garv" which is outlined below. For more information consult the documentation on garv.



Once digitization is complete, one will probably want to average the data (among other tasks). At this stage, one will have a raw data file (either on disk or magtape), a log file, and (hopefully) an idea about the averages desired. The log and raw files can be viewed as a sequence of events and associated epochs, since the averaging program, "avg", will extract overlapping epochs (if necessary). The central problem in averaging the data is the sorting of events into categories of interest.
Since the magtape is basically a sequential access device (and even though it can be programmed to retrieve random records, it is slow), it is useful to know what bin(s) to average each event into when it is encountered in the raw file (a bin is a set of channels corresponding to a single averaged ERP when averaging is complete). The general solution to this problem is to make a sequential list of events in the raw and log files, and the bins to which they should be added when encountered on the raw tape. Hence, the events in the log and raw files are given ordinal designations termed item numbers, running from 0 to one less than the total number of events. If the terms item number and event number are confusing, it may help to think of the event number as an event code, or trigger number, and the item number as simply the ordinal position of the event in the long string of stimuli (which is what it is). Hence, the seventh event would be item number 6 (item numbers start at 0), but its event number could be anything (a stimulus, a response, a pause or delete mark, etc.).
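The item/event distinction can be made concrete with a toy list of events. The event numbers below are made-up codes; item numbers are just positions:

```python
# Hypothetical event numbers as they occur in a log file
# (stimuli, responses, pauses -- all intermixed):
log_events = [10, 10, 201, 10, 255, 10, 47]

# Item numbers are simply ordinal positions, starting at 0:
items = list(enumerate(log_events))

# The seventh event is item number 6, whatever its event number is:
print(items[6])   # (6, 47)
```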
The next step is to literally create a sequential list of item numbers and the proper bin number(s) associated with each, termed a binlist file. Since binlists are ASCII files, they can be created using an editor, allowing any conceivable averaging strategy. Generally, however, they are generated using the "cdbl" program. If the cdbl program is going to be used, then an additional step is required to create the binlist file. cdbl uses the log file and a bin descriptor file in generating the sequential list of events and the corresponding bins to which they should be added (the binlist file). The bin descriptor file is simply a description of how to decide which bins a particular event should be averaged into, using the event numbers, condition code, flags, time windows, and sequential dependencies of the events.
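The logic of this step can be sketched as a single pass over the log: each event is tested against a set of bin criteria, and the matching bin numbers are written out alongside its item number. The predicate-based rules below are a toy stand-in for a real bin descriptor file (which also handles flags, time windows, and sequential dependencies); none of these names come from cdbl itself.

```python
def make_binlist(log, rules):
    """Toy binlist generator.

    log   -- sequence of (event_number, condition_code) pairs
    rules -- list of (bin_number, predicate) pairs; a predicate takes
             (event_number, condition_code) and returns True/False
    Returns a list of (item_number, [bin_numbers]) pairs.
    """
    binlist = []
    for item, (evt, cond) in enumerate(log):
        bins = [b for b, pred in rules if pred(evt, cond)]
        binlist.append((item, bins))
    return binlist

# e.g. bin 0: event 10 in condition 1; bin 1: event 10 in condition 2
rules = [(0, lambda e, c: e == 10 and c == 1),
         (1, lambda e, c: e == 10 and c == 2)]
log = [(10, 1), (20, 1), (10, 2), (10, 1)]
print(make_binlist(log, rules))
# [(0, [0]), (1, []), (2, [1]), (3, [0])]
```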
At some point, we will have this binlist file, in addition to the log and raw files. If artifact rejection is not required, averaging can commence immediately. Should one want to employ artifact rejection as well, yet another file, the arf file ("artifact rejection function") is needed. This is created with an editor and calibrated using the program "garv" ("get artifact rejection values").
Finally, the averaging program "avg" can be used to process the raw, log, binlist, and optional arf files and produce the desired averages. These averages form a sequential concatenation of the bins mentioned above, containing discrete segments of averaged EEG corresponding to sets of events (i.e. ERPs), and can be further analyzed or massaged after they are normalized for differences in between channel gains.
The program "normerp" performs this function. As well as normalizing between channel gains, normerp correctly fixes the polarity and absolute magnitude of the data by using the calibration pulses in condition code 0.

Creating a Binlist File

There are at least three ways to create the binlist files needed for averaging raw data. The first is to write a special program to sort the trials in the manner desired. A second method is to create the binlist file using an editor. These two methods are quite general, and "anything goes". The format of a binlist file is described in the section "Program Implementation and Data Formats". The third, most frequently employed method is to run the cdbl program using a bin descriptor file. This is treated more fully in the "cdbl Users’ Manual", and will not be discussed further here.

Averaging the Data

Averaging the data is perhaps the simplest part of the whole procedure. As such, there is not much to be said; one should already have a log file, raw file, binlist file, and artifact rejection function file (if desired). Running the program is simple; it is described in the avg program documentation.
One general point is that proper averaging of the data entails a precise matching of the binlist, log and raw files. Because of this, great pains are taken to check the correspondence between these files, and recover from errors. If one consistently encounters "Raw - Log mismatch" errors, or any other type of errors, consult the system administrator.
Averaging the data is basically a process of separating each epoch from the raw data file, determining which bins to average it into, swapping each of the appropriate intermediate accumulation areas into main memory, adding in the trial (if not rejected by the artifact rejection routines), and repeating.
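Stripped of the memory-swapping and tape-handling details, that accumulation loop can be sketched as follows (names and data layout are illustrative, not avg's actual implementation):

```python
def average_bins(epochs, binlist, n_bins):
    """Toy accumulation loop: add each epoch into its bins, then
    divide each bin by its trial count.

    epochs  -- epochs[i] is the list of samples for item i
    binlist -- (item_number, [bin_numbers]) pairs; an artifact-rejected
               trial would simply carry an empty bin list
    """
    n_points = len(epochs[0])
    sums = [[0.0] * n_points for _ in range(n_bins)]
    counts = [0] * n_bins
    for item, bins in binlist:
        for b in bins:
            for j, v in enumerate(epochs[item]):
                sums[b][j] += v
            counts[b] += 1
    averages = [[v / counts[b] for v in sums[b]] if counts[b] else None
                for b in range(n_bins)]
    return averages, counts

epochs = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
binlist = [(0, [0]), (1, [0]), (2, [1])]
averages, counts = average_bins(epochs, binlist, 2)
print(averages)   # [[2.0, 3.0], [5.0, 6.0]]
```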

Normalizing the Averaged Data

The program "normerp" performs the normalization functions described above. Basically, it involves placing two cursors on calibration pulses in such a way that the voltages at the cursors correspond to the magnitude of the pulse. Using this information (obtained for each channel) and user supplied parameters describing the polarity of the data, the voltage of the calibration pulses, and a resolution factor, scaling is performed to equalize between channel differences and transform the digitized averages into a certain number of points per microvolt. A more detailed description of this process can be found in the documentation for normerp.
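The scaling itself reduces to computing one factor per channel from the cursor readings on the calibration pulse. A minimal sketch of the idea, assuming the formula below; names and formula are illustrative, not normerp's actual computation:

```python
def cal_scale_factor(pre_cursor, pulse_cursor, cal_microvolts,
                     points_per_microvolt, polarity=1):
    """Per-channel scale factor from two cursor values placed so
    that their difference spans the calibration pulse (raw A/D units)."""
    measured = pulse_cursor - pre_cursor
    return polarity * cal_microvolts * points_per_microvolt / measured

# A channel where a 10 uV cal pulse measured 40 raw units, scaled
# so the output has 2 points per microvolt:
factor = cal_scale_factor(0.0, 40.0, 10.0, 2.0)
scaled = [v * factor for v in [40.0, 20.0, -40.0]]
print(scaled)   # [20.0, 10.0, -20.0]
```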

A Summary of Programs Used in Averaging and Digitizing

Here is a list of the programs performing the functions described above. These are available in the EPL program documentation notebook.

mdh(2) - make digitization header.

dig(2) - digitization.

logexam(1) - examine a logging file.

garv(1) - get artifact rejection values.

cdbl(1) - continuous data binlist.

avg(1) - continuous data averager.

normerp(1) - normalize averaged data at the cursor values.
