Table of Contents
erp_overview - overview of ERP data collection and analysis (ERP manuals)
An Overview of the Continuous Data Digitization and Averaging System
Loosely speaking, an ERP experiment involves a number of stages between
the hypothesis and the advancement of the frontiers. These include
designing the experiment (including planning the analysis and the
interpretation of outcomes), stimulus production and coding, running
subjects and collecting data, averaging and massaging data, measuring
and analyzing data, and finally interpreting and writing up the results.
This overview focuses on the digitization and averaging aspects of the
process using the continuous data system; more specifically, the
relevant hardware, software, and data structures used in transforming
analog data from the polygraph into digital averages around the epochs
of interest.
The process of digitization refers to the conversion of analog data from
the polygraph into a sequence of numbers stored in digital form. These
numbers represent samples of the original data, and are taken at a fixed
rate, termed the sampling rate, in the continuous data system. This
sequence of numbers, acquired for each channel and stored on magtape or
as a disk file, forms the raw data file.
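As a small illustration of what sampling means, here is a sketch in
Python (the 256 Hz rate and the sine-wave "analog" signal are invented
for illustration; real data come from the A/D converter, not a formula):

```python
import math

def digitize(analog, srate_hz, n_points):
    """Sample a continuous signal analog(t) at a fixed rate,
    returning a list of n_points samples (one channel)."""
    return [analog(i / srate_hz) for i in range(n_points)]

# Simulated 10 Hz "analog" signal standing in for one EEG channel.
signal = lambda t: math.sin(2 * math.pi * 10 * t)

srate = 256                       # samples per second (the sampling rate)
samples = digitize(signal, srate, 256)

# 256 points at 256 Hz span exactly one second of data.
print(len(samples), len(samples) / srate)   # 256 1.0
```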
Digitization is usually used to denote more than just the analog to
digital conversion of the EEG; to further process the data, more
information is needed. In particular, the stage of digitization entails
the creation of a sufficient record of the experiment so as to allow
later processing of the data. Part of this extra information is included
in a header to the raw file, such as the subject's name, the date,
the number of channels, the channel descriptions, etc. Another part is
the times of occurrence of various events (stimuli). These are recorded
in an additional, hidden channel called the mark track, which the user
need not be concerned about. Since the raw data are on a sequential
access device (magtape), a summary file, termed the log file, containing
a record of each event, its time of occurrence, and a set of flags, is
also produced.
What is a condition code? Basically, a condition code signifies a particular
experimental treatment in further analyses. Often one wishes to present
the same sequence of stimuli to the subject while the subject performs
different tasks. In this case, the condition code enables one to employ
the same stimuli (more importantly, the same event numbers) in both
conditions and yet later sort the events into different categories.
One condition code has a special meaning: code 0 (zero). Condition code
0 is reserved for the calibration condition. The calibration pulses and
event numbers recorded in this condition allow normalization of between
channel differences in gain as well as calibration of the absolute
magnitude and polarity of the data.
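To make the role of condition codes concrete, here is a small sketch
(the specific codes and event numbers are made up for illustration) of
sorting identical event numbers recorded under two different condition
codes into separate categories:

```python
# Each recorded event carries (condition_code, event_number).
# The same event numbers (1 and 2) appear under both condition
# codes 1 and 2; code 0 is reserved for calibration.
events = [(0, 99), (1, 1), (1, 2), (2, 1), (2, 2), (1, 1)]

categories = {}
for cond, evnum in events:
    if cond == 0:
        continue                  # calibration events handled separately
    categories.setdefault((cond, evnum), 0)
    categories[(cond, evnum)] += 1

print(categories)   # {(1, 1): 2, (1, 2): 1, (2, 1): 1, (2, 2): 1}
```

The same event number thus ends up in a different category depending on
which condition it occurred in.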
Well then, what are events? Why don't you just call them stimuli? As
alluded to earlier, events usually are the stimuli presented to the
subject, and event numbers are used to represent different stimuli.
There can be other types of events, however. One important class is
subject responses. Pauses in the digitization as well as "delete marks"
form another class. The term "event" is used to subsume all these
instances.
Hopefully, further considerations and details will become apparent in
this section as well as in the man page describing the digitization
program.
Continuous recording of the EEG data on magtape was chosen over
digitizing peri-stimulus epochs for a number of reasons. Most of these
derive from the ability to perform later processing on the continuous
data. These include:
1) Capability of altering epoch length in subsequent analyses
2) Ability to alter the presampling interval in subsequent
analyses without redigitization.
3) Possibility of performing various operations on the continuous
data prior to or without separation into discrete epochs. These
include spectral analyses of the ongoing signals and
filtering/processing of the data without losing segments
near the ends of an epoch.
4) Ability to extract overlapping epochs of data, thus
considerably reducing the realtime resource drain on the
machine. This is especially useful for attention experiments.
5) Ability to recreate the signals using d/a converters
and to display and plot interesting sections of the continuous
data.
On the minus side are the need for more tapes and extra tape changes
in situations where the data of interest comprise only a small part of
the recording. (Note, though, that in situations where discrete epochs
of interest overlap, this scheme uses less tape than recording of
individual epochs!) At 1600 bits per inch this is not too heinous.
Assuming 16 channels on a 2400 foot reel at 1600 bpi, we have about 6
inches per 256 point record (all channels included here). Thus, there
are about 4800 records on each tape.
The total time per tape also depends on the sampling rate. Here are
some results (approximate) for 8, 16, and 32 channels at various
sampling rates:
Total Time in Minutes for a 2400 foot Reel recorded at 1600 BPI

              64 Hz   128 Hz   256 Hz   512 Hz
    32 chans    160       80       40       20
    16 chans    320      160       80       40
     8 chans    640      320      160       80
Note that these are actual recording times, which generally are not as
long as an experiment. Note also that currently dig is capable of
sampling at a maximum rate of 750 Hz.
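The figures in the table follow from simple arithmetic. A sketch of the
calculation, using the assumptions stated above (2400 foot reel at 1600
bpi, about 6 inches of tape per 16-channel record of 256 points, record
length scaling with channel count):

```python
def minutes_per_tape(n_channels, srate_hz,
                     reel_feet=2400, inches_per_16ch_record=6.0):
    """Approximate recording time for one reel at 1600 bpi."""
    inches = reel_feet * 12
    # Record length on tape scales with the number of channels.
    inches_per_record = inches_per_16ch_record * (n_channels / 16)
    records = inches / inches_per_record        # 4800 for 16 channels
    seconds_per_record = 256 / srate_hz         # 256 points per record
    return records * seconds_per_record / 60

print(minutes_per_tape(16, 64))    # 320.0
print(minutes_per_tape(32, 512))   # 20.0
```

These reproduce the 320 and 20 minute entries in the table above.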
The sampling rate employed in recording the data needs further comment.
A number of issues are important here. The first is the bandwidth of
the signal being digitized. This must be less than one half the
sampling rate, but since extremely good presampling filters are
expensive, a compromise is reached using good filters and a sampling
rate of about 4 to 6 times the filter bandwidth. This is a topic unto
itself and cannot be pursued here.
Another important consideration is the epoch length to be employed.
Although this can be altered later, current programs are limited to a
256 point epoch. Thus, 64 Hz corresponds to a 4 second epoch of 256
points, 128 Hz to a 2 second epoch, 256 Hz to a 1 second epoch, and
512 Hz to a 500 msec epoch of 256 points. Currently, the upper limit
on sampling rate is 750 Hz.
With a fixed 256 points per epoch (currently) the following formulae
allow a quick conversion between sampling rate in Hz and the epoch
length in milliseconds:

                               256,000
    Epoch Length (msec.) =   ----------
                             Srate (Hz)

                          256,000
    Srate (Hz)       = -------------------
                       Epoch Length (msec)

Note also that the number of milliseconds per point can be found by
dividing 1000 by the sampling rate in Hz.
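These conversions are trivial to express in code; a sketch assuming the
fixed 256-point epoch:

```python
POINTS_PER_EPOCH = 256

def epoch_length_msec(srate_hz):
    # Epoch Length (msec) = 256,000 / Srate (Hz)
    return POINTS_PER_EPOCH * 1000 / srate_hz

def srate_hz(epoch_msec):
    # Srate (Hz) = 256,000 / Epoch Length (msec)
    return POINTS_PER_EPOCH * 1000 / epoch_msec

def msec_per_point(srate):
    return 1000 / srate

print(epoch_length_msec(128))   # 2000.0 (a 2 second epoch)
print(srate_hz(500))            # 512.0
print(msec_per_point(256))      # 3.90625
```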
The hardware employed in execution of an experiment includes:
1) The stimulus presentation computer, colloquially referred to as the
stim machine. This is an IBM PC/AT or compatible computer used to
present visual and auditory stimuli. The stim machine is equipped with
an EGA (Enhanced Graphics Adapter) and EGA compatible monitor, a
Hercules adapter and monitor, and a DT2821 Data Translation card. It is
connected to an SPI box (Stimulus Presentation Interface), which
facilitates the transmission of stimulus event codes to the
digitization computer. The SPI also provides circuitry for output of
audio stimuli. Generally one of 2 programs is run on the stim machine
for stimulus presentation: vspres for visual stimulus presentation,
and aspres for auditory stimulus presentation. Detailed descriptions of
the SPI and the stimulus presentation programs are available in
separate documents which can be found in the notebook labeled
"Stimulus Presentation".
2) Response boxes or "button" boxes. These
are devices that provide
digital event codes to the digitization computer
via DB25 connector.
Currently each response box is hard-wired to supply
a particular event
code when a button is pressed. Each response box should
be labeled with
the event code that it produces.
3) The polygraph. The
subject must be hooked up to the polygraph to
record good data. The polygraph
is in turn hooked up to the digitization computer.
4) The digitization
computer system, colloquially referred to as the dig
machine. This is
an IBM PC/AT or compatible computer equipped with
hardware and software
for digitizing and displaying data acquired from
the polygraph. Currently
the output of the polygraph is connected to a
DT2801 Data Translation card
in the dig machine which performs the analog
to digital conversion of
the data. The dig machine also has a
Scientific Solutions Baseboard which
has four digital i/o ports that are
used for acquiring event codes (including
subject responses) from up to
four different sources. The program dig
is used to control the
digitization process and is discussed in another
document. As data is
digitized, it is stored either on mag tape or in a
disk file, and
displayed graphically on a monitor attached to the dig machine.
There are basically two programs used in digitization. The program
"dig", which stands for "digitization", actually performs
the A/D conversion, the
creation of the log and raw files, etc. Another
program, "mdh", or "make
digitization header", is employed to specify
the various parameters required
during digitization. The header file
created by mdh is then used by dig.
For more detailed information
on actually using these programs, consult
their man pages.
Often one wishes to examine log files. Since these are not stored in
the form of ASCII characters, a special program, "logexam", is used
for this. logexam is relatively simple to use, and displays the events
recorded in a log file. Further information on logexam can be found in
its manual entry; since it is so simple to use, one can also simply
invoke it and experiment.
In a similar vein, it is useful to be able to peruse the digitized EEG
contained in the raw file. Currently this is accomplished using "garv",
which is outlined below. For more information consult the documentation
for garv.
Once digitization is complete, one will probably want to average the
data (among other tasks). At this stage, one will have a raw data file
(either on disk or mag tape), a log file, and (hopefully) an idea about
the averages desired. The log and raw files can be viewed as a sequence
of events and associated epochs, since the averaging program "avg" will
extract overlapping epochs (if necessary). The central problem in
averaging the data is the sorting of events into categories of
interest.
Since the magtape is basically a sequential access device (and even
though it can be programmed to retrieve random records, it is slow), it
is useful to know which bin(s) to average each event into when it is
encountered in the raw file (a bin is a set of channels corresponding
to a single averaged ERP when averaging is complete).
The general solution to this problem is to make a sequential list of
events in the raw and log files, and the bins to which they should be
added when encountered on the raw tape. Hence, the events in the log
and raw files are given ordinal designations termed item numbers,
running from 0 to one less than the total number of events.
If the terms item numbers and event numbers are confusing, maybe it
would help to think of the event number as an event code, or trigger
number, and the item number as simply the ordinal position of the event
in the long string of stimuli (which is what they are). Hence, the
seventh event would be item number 6 (item numbers start at 0), but the
event itself could be anything (a stimulus, a response, a pause or
delete mark, etc.)
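A small illustration of the distinction (the event codes here are
invented): the item number is just the position in the recorded
sequence, while the event number is whatever code was recorded there.

```python
# Recorded sequence of event numbers: stimuli (1, 2), a response
# (code 100), and a pause mark (code 255) -- codes invented here.
event_numbers = [1, 2, 1, 100, 2, 255, 1]

# Item numbers are simply ordinal positions, starting at 0.
for item, event in enumerate(event_numbers):
    print(f"item {item}: event number {event}")

# The seventh entry is item number 6, whatever its event number is.
print(event_numbers[6])   # 1
```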
The next step is to literally create a sequential list of item numbers
and the proper bin number(s) associated with each: the binlist file.
Since binlists are ASCII files, they can be created using an editor,
allowing any conceivable averaging strategy. Generally, however, they
are generated using the "cdbl" program.
If the cdbl program is going to be used, then an additional step is
required: creating a bin descriptor file. cdbl uses the log file and
the bin descriptor file in generating the sequential list of events and
the corresponding bins to which they should be added (the binlist
file).
The bin descriptor file is simply a description of how to decide which
bins a particular event should be averaged into, based on event
numbers, condition code, flags, time windows, and sequential
dependencies of the events.
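Conceptually, what cdbl does can be sketched as follows. This is a
hypothetical, simplified stand-in (the actual bin descriptor file
format is documented in the cdbl Users' Manual); here each bin is just
a predicate on event number and condition code, whereas the real
descriptors also support flags, time windows, and sequential
dependencies:

```python
# The log as a sequence of (item_number, event_attributes) pairs;
# values invented for illustration.
log = [(0, {"event": 1, "cond": 1}),
       (1, {"event": 2, "cond": 1}),
       (2, {"event": 1, "cond": 2})]

# One membership test per bin (a simplified "bin descriptor").
bin_criteria = {
    0: lambda e: e["event"] == 1 and e["cond"] == 1,
    1: lambda e: e["event"] == 2,
    2: lambda e: e["cond"] == 2,
}

# Walk the log in order, emitting each item with its matching bins:
# a sequential list analogous to a binlist.
binlist = []
for item, event in log:
    bins = [b for b, crit in bin_criteria.items() if crit(event)]
    if bins:
        binlist.append((item, bins))

print(binlist)   # [(0, [0]), (1, [1]), (2, [2])]
```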
At some point, we will have this binlist file, in addition to the log
and raw files. If artifact rejection is not required, averaging can
commence immediately. Should one want to employ artifact rejection as
well, yet another file, the arf file ("artifact rejection function"),
is needed. This is created with an editor and calibrated using the
program "garv" ("get artifact rejection values").
Finally, the averaging program "avg" can be used to process the raw, log,
binlist, and optional arf files and produce the desired averages. These
averages form a sequential concatenation of the bins mentioned above, containing
discrete segments of averaged EEG corresponding to sets of events,
which can be further analyzed or massaged after they are normalized for
differences in between channel gains. The program "normerp" performs
this function. As well as normalizing between channel gains, normerp
correctly fixes the polarity and absolute magnitude of the data by
using the calibration pulses in condition code 0.
There are at least three ways to create the binlist files needed for
averaging raw data. The first is to write a special program to sort the
trials in the manner desired. A second method is to create the binlist
file using the editor. These two methods are quite general, and "anything
goes". The format of a binlist file is described
in the section "Program
Implementation and Data Formats".
The third, most frequently employed method is to run the cdbl program
using a bin descriptor file. This is treated in the "cdbl Users'
Manual", and will not be discussed further here.
Averaging the data is perhaps the simplest part of the whole procedure.
As such, there is not much to be said; one should already have a log
file, raw file, binlist file, and artifact rejection function file (if
desired). Running the program is simple; it is described in the avg
manual entry.
One general point is that proper averaging of the data entails a precise
matching of the binlist, log and raw files. Because of this, great pains
are taken to check the correspondence between these files, and recover
from errors. If one consistently encounters "Raw - Log mismatch" errors,
or any other type of errors, consult the system administrator.
Averaging the data is basically a process of separating each epoch from
the raw data file, determining which bins to average it into, swapping
each of the appropriate intermediate accumulation areas into main
memory, and adding in the trial (if not rejected by the artifact
rejection routines).
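In outline, the averaging step can be sketched as below. This is a
simplified illustration only (the real avg program streams epochs from
tape and swaps accumulation areas to conserve memory; the epochs,
assignments, and rejection threshold here are invented):

```python
def average(epochs, assignments, reject):
    """epochs: list of sample lists; assignments: item -> list of bins;
    reject: artifact-rejection predicate on an epoch."""
    sums, counts = {}, {}
    for item, epoch in enumerate(epochs):
        if reject(epoch):
            continue                      # drop artifact trials
        for b in assignments.get(item, []):
            acc = sums.setdefault(b, [0.0] * len(epoch))
            for i, v in enumerate(epoch):
                acc[i] += v               # accumulate into the bin
            counts[b] = counts.get(b, 0) + 1
    # Divide each bin's accumulator by its trial count.
    return {b: [s / counts[b] for s in acc] for b, acc in sums.items()}

epochs = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]   # third is artifact
avgs = average(epochs, {0: [0], 1: [0], 2: [0]},
               reject=lambda e: max(abs(v) for v in e) > 50)
print(avgs)   # {0: [2.0, 3.0]}
```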
The program "normerp" performs the normalization functions described above.
Basically, it involves placing two cursors on calibration pulses in such
a way that the voltage difference between
the cursors corresponds to the magnitude of
the pulse. Using this information
(obtained for each channel) and user
supplied parameters describing the
polarity of the data, the voltage of
the calibration pulses, and a resolution factor,
scaling is performed
to equalize between channel differences and transform
the digitized averages
into a certain number of points per microvolt. A more detailed
description of this process can be found in the documentation for
normerp.
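The per-channel arithmetic involved can be sketched as follows (the
cursor values, calibration voltage, and resolution factor are invented
for illustration; consult the normerp documentation for the actual
procedure):

```python
def normalize(channel_data, cal_cursor_diff, cal_uv, points_per_uv,
              invert=False):
    """Scale one channel so that a cal_uv microvolt calibration pulse
    maps to cal_uv * points_per_uv points; optionally flip polarity.
    cal_cursor_diff is the measured size of the calibration pulse
    (the difference between the two cursor values) in raw units."""
    scale = (cal_uv * points_per_uv) / cal_cursor_diff
    if invert:
        scale = -scale
    return [v * scale for v in channel_data]

# A channel whose 10 uV calibration pulse measured 40 raw units;
# target resolution: 2 points per microvolt.
print(normalize([40.0, 20.0], cal_cursor_diff=40.0,
                cal_uv=10.0, points_per_uv=2.0))   # [20.0, 10.0]
```

Applying the same computation to every channel, each with its own
measured calibration pulse, is what equalizes between channel gains.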
Here is a list of the programs performing the functions described
above; these are available in the EPL program documentation notebook.

mdh     - make digitization header.
dig     - digitize continuous data, producing a raw file and a logging
          file.
logexam - examine a logging file.
garv    - get artifact rejection values.
cdbl    - create a binlist file from a bin descriptor file.
avg     - continuous data averager.
normerp - normalize averaged data at the cursor values.