Table of Contents
erp_overview - overview of ERP data collection and analysis (ERP manuals)
An Overview of the Continuous Data Digitization and Averaging System
Loosely speaking, an ERP experiment involves a number of stages between the hypothesis and the advancement of the frontiers. These include designing the experiment (including the analysis and the interpretation of various outcomes), stimulus production and coding, running subjects and digitizing data, averaging and massaging data, measuring and analyzing data, and finally interpreting and writing up the data. This overview focuses on the digitization and averaging aspects of the process using the continuous data system; more specifically, the relevant hardware, software, and data structures used in transforming analog data from the polygraph into digital averages around the epochs of interest.
The process of digitization refers to the conversion of analog data from the polygraph into a sequence of numbers stored in digital form. These numbers represent samples of the original data, and are taken at a fixed rate, termed the sampling rate, in the continuous data system. This sequence of numbers, acquired for each channel and stored on magtape or as a disk file, forms the raw file.
Digitization is usually used to denote more than just the analog to digital conversion of the EEG; to further process the data more information is needed. In particular, the stage of digitization entails the creation of a sufficient record of the experiment so as to allow further analyses of the data. Part of this extra information is included in a header to the raw file, such as the subject's name, the date, the experiment description, the number of channels, the channel descriptions, etc. Another part is the times of occurrence of various events (stimuli). These are recorded in an additional, hidden channel called the mark track, which the user need not be concerned about.
Since the raw data are on a sequential access device (magtape), a summary file, termed the log file, is also produced; it contains a record of each event, its time of occurrence, the condition code, and a set of flags.
What is a condition code? Basically, a condition code signifies a particular experimental treatment in further analyses. Often one wishes to present the same sequence of stimuli to the subject while the subject performs different tasks. In this case, the condition code enables one to employ the same stimuli (more importantly, the same event numbers) in both treatments and yet later sort the events into different categories. One condition code has a special meaning: code 0 (zero). Condition code 0 is reserved for the calibration condition, and must always be present. The calibration pulses and event numbers recorded in this condition enable normalization of between-channel differences in gain as well as calibration of the absolute magnitude and polarity of the data.
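To make the role of condition codes concrete, here is a small sketch of sorting log records by condition code. The in-memory record layout and the field names are hypothetical; the actual log file is a binary format described elsewhere in these manuals.

```python
from collections import namedtuple

# Hypothetical in-memory form of one log file record.
LogEntry = namedtuple("LogEntry", ["event", "time", "condition", "flags"])

log = [
    LogEntry(event=1,  time=100, condition=0, flags=0),  # calibration pulse
    LogEntry(event=40, time=480, condition=1, flags=0),  # stimulus, task 1
    LogEntry(event=40, time=960, condition=2, flags=0),  # same stimulus, task 2
]

# Condition code 0 is reserved for calibration; the same event number
# under different condition codes can be sorted into different treatments.
calibration = [e for e in log if e.condition == 0]
task_one    = [e for e in log if e.condition == 1]

print(len(calibration), len(task_one))  # -> 1 1
```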
Well then, what are events? Why don't you just call them stimuli? As alluded to earlier, events usually are the stimuli presented to the subject, and event numbers are used to represent different stimuli. There can be other types of events, however. One important class is subject responses. Pauses in the digitization as well as "delete marks" form another class. Hence, the term event is used to subsume all these instances.
Hopefully, further considerations and details will become apparent in this section as well as in the man page describing the digitization program.
Continuous recording of the data on magtape was chosen over digitizing peri-stimulus epochs for a number of reasons. Most of these derive from the ability to perform later processing on the continuous data. These include:

1) Capability of altering epoch length in subsequent analyses without redigitizing.

2) Ability to alter the presampling interval in following analyses without redigitization.

3) Possibility of performing operations on the continuous data prior to or without separation into discrete epochs. These include spectral analyses of the ongoing signals and digital filtering/processing of the data without losing segments near the ends of an epoch.

4) Ability to extract overlapping epochs of data offline, thus considerably reducing the realtime resource drain on the machine. This is especially useful for attention experiments.

5) Ability to recreate the signals using d/a converters and to display and plot interesting sections of the continuous data.
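Point 4 above is easy to see in a sketch: each epoch is just a slice of the continuous record, so two events closer together than one epoch length pose no problem. The epoch length, presample interval, and event positions below are made-up miniature values, not dig's actual parameters.

```python
# Illustrative sketch of extracting overlapping epochs from a
# continuous record.
EPOCH_LEN = 8       # points per epoch (256 in the real system)
PRESAMPLE = 2       # points before the event

continuous = list(range(100))        # one channel of continuous samples
event_points = [10, 14]              # sample indices of two nearby events

epochs = [continuous[p - PRESAMPLE : p - PRESAMPLE + EPOCH_LEN]
          for p in event_points]

# The two epochs overlap in the continuous record:
print(epochs[0])   # -> [8, 9, 10, 11, 12, 13, 14, 15]
print(epochs[1])   # -> [12, 13, 14, 15, 16, 17, 18, 19]
```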
On the minus side are the need for more tapes and extra tape changes in situations where the data of interest comprise only a small part of the recording. (Note, though, that in situations where discrete epochs of interest overlap, this scheme uses less tape than recording of individual epochs!) At 1600 bits per inch this is not too heinous. Assuming 16 channels on a 2400 foot reel at 1600 bpi, we have about 6 inches per 256 point record (all channels included here). Thus, there are about 4800 records on each tape. The total time per tape also depends on the sampling rate. Here are some approximate results for 8, 16, and 32 channels at various sampling rates:
Total Time in Minutes for a 2400 foot Reel Recorded at 1600 BPI

                      Sampling Rate
              64 Hz   128 Hz   256 Hz   512 Hz
   32 chans     160       80       40       20
   16 chans     320      160       80       40
    8 chans     640      320      160       80
Note that these are actual recording times, which generally are not as long as an experiment. Note also that currently dig is capable of sampling at a maximum rate of 750 Hz.
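The figures in the table follow directly from the geometry just described: about 4800 records fit on a reel at 16 channels, the record count scales inversely with the channel count, and each 256 point record spans 256/srate seconds of data. A minimal sketch of that arithmetic:

```python
# Reproduces the approximate reel-time arithmetic described in the text.
def minutes_per_reel(nchans, srate_hz):
    records = 4800 * 16 / nchans           # records on one 2400 ft reel
    seconds = records * 256 / srate_hz     # seconds of data per reel
    return seconds / 60

print(minutes_per_reel(16, 64))    # -> 320.0
print(minutes_per_reel(32, 512))   # -> 20.0
print(minutes_per_reel(8, 128))    # -> 320.0
```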
The sampling rate employed in recording the data needs further consideration. A number of issues are important here. The first is the bandwidth of the signal being digitized. This must be less than one half the sampling rate, but since extremely good presampling filters are expensive, a compromise is reached using good filters and a sampling rate of about 4 to 6 times the filter bandwidth. This is a topic unto itself and cannot be pursued here.
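As an illustration of that compromise, one might pick the lowest standard rate that is at least 4 to 6 times the presampling filter bandwidth. The list of candidate rates and the factor of 5 (a midpoint of the stated 4-to-6 range) are assumptions for the sketch, not a rule enforced by the system.

```python
# Illustrative only: choose a sampling rate from the filter bandwidth.
STANDARD_RATES = [64, 128, 256, 512]

def choose_srate(filter_bandwidth_hz, factor=5):
    for rate in STANDARD_RATES:
        if rate >= factor * filter_bandwidth_hz:
            return rate
    raise ValueError("bandwidth too high for available rates")

print(choose_srate(20))    # -> 128
print(choose_srate(100))   # -> 512
```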
Another important consideration is the epoch length to be employed. Although this can be altered later, current programs are limited to a 256 point epoch. Thus, 64 Hz corresponds to a 4 second epoch of 256 points, 128 Hz to a 2 second epoch, 256 Hz to a 1 second epoch, and 512 Hz to a 500 msec epoch of 256 points. Currently, the upper limit on sampling rate is 750 Hz.
With a fixed 256 points per epoch (currently) the following formulae allow a quick conversion between sampling rate in Hz and the epoch length in msec:

    Epoch length (msec) = 256000 / srate (Hz)

likewise,

    Srate (Hz) = 256000 / Epoch length (msec)

Note also that the number of milliseconds per point can be found by dividing 1000 by the sampling rate in Hz.
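These conversions translate directly into code; the function names below are merely descriptive, and the 256 point epoch is assumed fixed as stated above.

```python
# Direct translations of the conversion formulae, assuming the
# fixed 256 point epoch.
def epoch_length_msec(srate):
    return 256000 / srate

def srate_from_epoch(epoch_msec):
    return 256000 / epoch_msec

def msec_per_point(srate):
    return 1000 / srate

print(epoch_length_msec(128))   # -> 2000.0  (a 2 second epoch)
print(srate_from_epoch(500))    # -> 512.0
print(msec_per_point(256))      # -> 3.90625
```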
The hardware employed in execution of an experiment includes:

1) Stimulus presentation computer, colloquially referred to as the stim machine. This is an IBM PC/AT or compatible computer used to present visual or auditory stimuli. The stim machine is equipped with an EGA (enhanced graphics adapter) and EGA compatible monitor, a Hercules compatible monochrome adapter and monitor, and a DT2821 Data Translation card. It is connected to an SPI box (Stimulus Presentation Interface), which facilitates the transmission of stimulus event codes to the digitization computer. The SPI also provides circuitry for output of audio stimuli. Generally one of two programs is run on the stim machine for stimulus presentation: vspres for visual stimulus presentation, and aspres for auditory stimulus presentation. Detailed descriptions of the SPI and the stimulus presentation programs are available in separate documents which can be found in the notebook labeled "Stimulus Presentation".
2) Response boxes or "button" boxes. These are devices that provide digital event codes to the digitization computer via a DB25 connector. Currently each response box is hard-wired to supply a particular event code when a button is pressed. Each response box should be labeled with the event code that it produces.
3) The polygraph. The subject must be hooked up to the polygraph to record good data. The polygraph is in turn hooked up to the digitization computer.
4) The digitization computer system, colloquially referred to as the dig machine. This is an IBM PC/AT or compatible computer equipped with hardware and software for digitizing and displaying data acquired from the polygraph. Currently the output of the polygraph is connected to a DT2801 Data Translation card in the dig machine which performs the analog to digital conversion of the data. The dig machine also has a Scientific Solutions Baseboard which has four digital i/o ports that are used for acquiring event codes (including subject responses) from up to four different sources. The program dig is used to control the digitization process and is discussed in another document. As the data are digitized, they are stored either on mag tape or in a disk file, and displayed graphically on a monitor attached to the dig machine.
There are basically two programs used in digitization. The program "dig" stands for "digitization", and actually performs the A/D conversion, the creation of the log and raw files, etc. Another program, "mdh", or "make digitization header", is employed to specify the various parameters required during digitization. The header file created by mdh is then used by dig. For more detailed information on actually using these programs, consult their man pages.
Often one wishes to examine log files. Since these are not stored in the form of ASCII characters, a special program, "logexam", is used to accomplish this. logexam is relatively simple to use, and displays all information in a log file. Further information on logexam can be found in its manual entry; since it is so simple to use one can also simply invoke it and type "help".
In a similar vein, it is useful to be able to peruse the digitized EEG contained in the raw file. Currently this is accomplished using "garv", which is outlined below. For more information consult the documentation on garv.
Once digitization is complete, one will probably want to average the data (among other tasks). At this stage, one will have a raw data file (either on disk or mag tape), a log file, and (hopefully) an idea about the averages desired. The log and raw files can be viewed as a sequence of events and associated epochs, since the averaging program, "avg", will extract overlapping epochs (if necessary). The central problem in averaging the data is the sorting of events into categories of interest.
Since the magtape is basically a sequential access device (it can be programmed to retrieve random records, but this is slow), it is useful to know what bin(s) to average each event into when it is encountered in the raw file (a bin is a set of channels corresponding to a single averaged ERP when averaging is complete). The general solution to this problem is to make a sequential list of the events in the raw and log files, and the bins to which they should be added when encountered on the raw tape. Hence, the events in the log and raw files are given ordinal designations termed item numbers, running from 0 to one less than the total number of events.
If the terms item numbers and event numbers are confusing, maybe it would help to think of the event number as an event code, or trigger number, and the item number as simply the ordinal position of the event in the long string of stimuli (which is what they are). Hence, the seventh stimulus would be item number 6 (they start at 0) but the event number could be anything (a stimulus, a response, a pause or delete mark, etc.).
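The item/event distinction is just ordinal position versus code, which a two-line sketch makes plain; the event numbers below are invented for illustration.

```python
# Item number is ordinal position (starting at 0); the event number
# (code) can be anything.
events = [1, 40, 41, 40, 129, 40, 41]   # event numbers, in order

items = list(enumerate(events))          # (item number, event number)

# The seventh event is item number 6, whatever its event code:
print(items[6])   # -> (6, 41)
```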
The next step is to literally create a sequential list of item numbers and the proper bin number(s) associated with each, termed a binlist file. Since binlists are ASCII files, they can be created using an editor, allowing any conceivable averaging strategy. Generally, however, they are generated using the "cdbl" program.
If the cdbl program is going to be used, then an additional step is required to create the binlist file. cdbl uses the log file and a bin descriptor file in generating the sequential list of events and the corresponding bins to which they should be added (the binlist file).
The bin descriptor file is simply a description of how to decide which bins a particular event should be averaged into, using the event numbers, condition code, flags, time windows, and sequential dependencies of the events.
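A minimal sketch of the kind of mapping cdbl produces: for each item, the bins its event should be added to. The hard-coded rules below merely stand in for a real bin descriptor file, and the event numbers, condition codes, and bin numbers are hypothetical.

```python
# Hypothetical miniature log: (item number, event number, condition code).
log = [
    (0, 40, 1),
    (1, 41, 1),
    (2, 40, 2),
]

def bins_for(event, condition):
    """Stand-in for bin descriptor rules: which bins get this event?"""
    bins = []
    if event == 40 and condition == 1:
        bins.append(0)      # bin 0: event 40 under task 1 only
    if event == 40:
        bins.append(1)      # bin 1: event 40 under any task
    return bins

# The binlist: each item paired with its destination bins.
binlist = [(item, bins_for(ev, cc)) for item, ev, cc in log]
print(binlist)   # -> [(0, [0, 1]), (1, []), (2, [1])]
```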
At some point, we will have this binlist file, in addition to the log and raw files. If artifact rejection is not required, averaging can commence immediately. Should one want to employ artifact rejection as well, yet another file, the arf file ("artifact rejection function"), is needed. This is created with an editor and calibrated using the program "garv" ("get artifact rejection values").
Finally, the averaging program "avg" can be used to process the raw, log, binlist, and optional arf files and produce the desired averages. These averages form a sequential concatenation of the bins mentioned above, containing discrete segments of averaged EEG corresponding to sets of events (i.e. ERPs), and can be further analyzed or massaged after they are normalized for differences in between-channel gains. The program "normerp" performs this function. As well as normalizing between-channel gains, normerp correctly fixes the polarity and absolute magnitude of the data by using the calibration pulses in condition code 0.
There are at least three ways to create the binlist files needed for averaging raw data. The first is to write a special program to sort the trials in the manner desired. A second method is to create the binlist file using the editor. These two methods are quite general, and "anything goes". The format of a binlist file is described in the section "Program Implementation and Data Formats". The third, most frequently employed method is to run the cdbl program using a bin descriptor file. This is treated more fully in the "cdbl Users' Manual", and will not be discussed further here.
Averaging the data is perhaps the simplest part of the whole procedure. As such, there is not much to be said; one should already have a log file, raw file, binlist file, and artifact rejection function file (if desired). Running the program is simple; it is described in the avg program documentation. One general point is that proper averaging of the data entails a precise matching of the binlist, log and raw files. Because of this, great pains are taken to check the correspondence between these files, and recover from errors. If one consistently encounters "Raw - Log mismatch" errors, or any other type of errors, consult the system administrator.
Averaging the data is basically a process of separating each epoch from the raw data file, determining which bins to average it into, swapping each of the appropriate intermediate accumulation areas into main memory, adding in the trial (if not rejected by artifact rejection routines), and repeating.
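That loop can be sketched in a few lines for a single channel. This is only an in-memory illustration of the accumulate-and-divide logic; the real avg program also swaps accumulation areas to and from disk and applies artifact rejection, which are omitted here.

```python
# Sketch of the averaging loop: accumulate each epoch into its bins,
# then divide each bin by its trial count.
def average(epochs, binlist, nbins, npoints):
    sums   = [[0.0] * npoints for _ in range(nbins)]
    counts = [0] * nbins
    for (item, bins), epoch in zip(binlist, epochs):
        for b in bins:                       # add the trial into each bin
            counts[b] += 1
            for i, v in enumerate(epoch):
                sums[b][i] += v
    return [[s / counts[b] for s in sums[b]] if counts[b] else sums[b]
            for b in range(nbins)]

epochs  = [[1, 2], [3, 4], [5, 6]]           # three 2-point trials
binlist = [(0, [0]), (1, [0]), (2, [1])]     # items 0,1 -> bin 0; item 2 -> bin 1
print(average(epochs, binlist, nbins=2, npoints=2))
# -> [[2.0, 3.0], [5.0, 6.0]]
```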
The program "normerp" performs the normalization functions described above. Basically, it involves placing two cursors on calibration pulses in such a way that the voltages at the cursors correspond to the magnitude of the pulse. Using this information (obtained for each channel) and user supplied parameters describing the polarity of the data, the voltage of the calibration pulses, and a resolution factor, scaling is performed to equalize between-channel differences and transform the digitized averages into a certain number of points per microvolt. A more detailed description of this process can be found in the documentation for normerp.
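The arithmetic behind that scaling can be sketched simply: the measured deflection of the calibration pulse on each channel, together with its known voltage and the chosen resolution, yields a per-channel scale factor. The channel names and all numbers below are made up for illustration.

```python
# Sketch of the normalization arithmetic (not normerp's actual code).
CAL_MICROVOLTS = 10.0      # known calibration pulse size
RESOLUTION     = 10.0      # desired points per microvolt

measured_deflection = {"Fz": 205.0, "Cz": 190.0}   # raw A/D points

scale = {ch: CAL_MICROVOLTS * RESOLUTION / d
         for ch, d in measured_deflection.items()}

# After scaling, one calibration pulse spans the same number of
# points (CAL_MICROVOLTS * RESOLUTION) on every channel:
print(round(measured_deflection["Fz"] * scale["Fz"]))   # -> 100
print(round(measured_deflection["Cz"] * scale["Cz"]))   # -> 100
```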
Here is a list of the programs performing the functions described above; these are available in the EPL program documentation notebook.

    mdh(2)     - make digitization header.
    dig(2)     - digitization.
    logexam(1) - examine a logging file.
    garv(1)    - get artifact rejection values.
    cdbl(1)    - continuous data binlist.
    avg(1)     - continuous data averager.
    normerp(1) - normalize averaged data at the cursor values.