rawfile - EPL system raw digitized data format (ERP files)
Jonathan C. Hansen
EPL System Raw Digitized Data Format
Under the EPL Continuous Data Digitization and Averaging System,
multiple channels of EEG data are recorded continuously and stored on a
digital magtape. These files, containing the raw, continuous, EEG data,
are termed raw data files. Although the data could be stored on a disk
instead, current hardware limitations preclude this luxury. In addition,
most investigators will want to retain their data for possible further
analyses, and hence the data will end up being stored on tape anyway. The
rationale for recording continuous data is described in the document entitled
"An Overview of the Continuous Data Digitization and Averaging System";
refer to this document for the advantages and disadvantages of this approach.
Currently only the program "cddg" produces these raw data files. Besides
a raw file, cddg also produces a disk file termed the log file, which
contains a summary of the experimental events, their times of occurrence,
and the condition code in effect at the time. This file is essential to
further processing of the raw data file and is described further in the
document on Log File Format. The present document describes the format in
which the raw, continuous data are stored.
A raw data file, whether on magtape or disk, consists of a
header of length 512 bytes followed by multiple records of data. The header
contains certain information such as the sampling rate, the number of data
channels, and various ASCII descriptions of the experiment and subject.
The data records, while of a fixed length within a particular raw data
file, can vary in length between raw data files depending on the number
of channels of data. These data records are not split into 512 byte sub-records,
even though they are always a multiple of 512 bytes in length. Instead,
they are single magtape records; this saves a considerable amount of tape
that would otherwise be used to form inter-record gaps. If stored on magtape,
the end of a raw data file is denoted by an EOF mark.
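
As an illustration of this layout for a file stored on disk, the following
Python sketch computes the record length and the byte offset of each record.
It assumes the (1+nchans)*512 byte record length given in the description of
the data records; the function names are hypothetical and do not belong to
any EPL program.

```python
HEADER_BYTES = 512   # fixed header length stated in the text
BLOCK_BYTES = 512    # every data record is a multiple of this length


def record_bytes(nchans):
    """Length of one data record: a mark track plus nchans channels."""
    if nchans < 1:
        raise ValueError("nchans must be at least 1")
    return (1 + nchans) * BLOCK_BYTES


def record_offset(nchans, index):
    """Byte offset of data record `index` (counted from 0) in a disk file."""
    return HEADER_BYTES + index * record_bytes(nchans)


def count_records(file_size, nchans):
    """Number of whole data records in a disk file of file_size bytes."""
    body = file_size - HEADER_BYTES
    if body < 0 or body % record_bytes(nchans) != 0:
        raise ValueError("file size does not match header plus whole records")
    return body // record_bytes(nchans)
```

For 16 channels each record is 8704 bytes, so a disk file holding three
records would be 26624 bytes long.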
The first record of any valid raw data file is 512 bytes in length and
is termed the header. It uses the same "C" language data structure as do
averaged ERP data in the EPL system, but only a few of the slots are or
need be filled. The details of the header structure are described in the
header document; only those structure elements which need be filled in
a raw data file will be described here. The DGMAGIC constant is an arbitrary
number which is placed in the first word of a header, and indicates that
this header can be used by the digitization program. This number remains
in the header when it is written on the raw tape, and can be checked by
programs which wish to verify that a file is indeed a raw data file, or
that a header has been verified as having parameters that are within a
reasonable range for use in digitization. The header element for the magic
number is evtno; this slot at one time held the event number in the discrete
epoch digitization system. Its name has been retained for possible future
reincarnation of that method of digitization. The header elements which
absolutely must be filled in a raw data header include: nchans (# of channels),
odelay (trigger to stimulus delay in msec), and ctickt (clock period in
tens of microseconds). The maximum and minimum values of these structure elements
are determined by the particular hardware/software system being used to
perform the digitization and thus are specific to each installation.
In addition to the above elements, it is recommended that the following ASCII
descriptor elements be filled at digitization:
1) chndes (channel descriptions). This array of 128 characters allows room
   for 16 eight-character channel descriptors, or 32 four-character
   descriptions (may be changed to the latter in the future).
2) subdes (subject description). Up to 40 characters of ASCII describing
   the subject and date should be placed here.
3) expdes (experiment description). The highest level description of the
   experiment, up to 40 characters, should be placed here.
4) rawname (raw data file name). This optional 16 character slot can be
   used to hold an ASCII raw data file name for use with the rawtape
   program. Note that no current programs rely on this name; the ordinal
   position of the file on the tape is definitive.
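
Two of these header elements lend themselves to a short illustration. The
Python sketch below converts ctickt to a sampling rate and splits chndes into
its fixed-width descriptors. It assumes that the fields have already been
extracted from the header (their byte offsets are given in the header
document, not here), that ctickt is the period of the sample clock, and that
unused descriptor characters are padded with spaces or NUL bytes. The padding
convention and the function names are assumptions made for this example.

```python
def sampling_rate_hz(ctickt):
    """Sampling rate implied by ctickt, a period in tens of microseconds."""
    if ctickt <= 0:
        raise ValueError("ctickt must be positive")
    return 100000.0 / ctickt  # one second is 100000 units of 10 microseconds


def split_channel_descriptors(chndes, width=8):
    """Split the 128-character chndes array into fixed-width descriptors."""
    if len(chndes) != 128:
        raise ValueError("chndes must be exactly 128 bytes")
    if width not in (4, 8):
        raise ValueError("width must be 8 (current) or 4 (possible future)")
    fields = [chndes[i:i + width] for i in range(0, 128, width)]
    # Assumption: trailing spaces or NUL bytes are padding, not content.
    return [f.decode("ascii", errors="replace").rstrip(" \x00") for f in fields]
```

A ctickt of 400, for example, corresponds to a 4 msec period, that is, 250
samples per second.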
In the EPL Continuous Data system, the lone header record
is followed by multiple data records of a specific format. As mentioned,
these are large, single magtape records to help conserve tape that would
normally be used for inter-record gaps. If the data are not stored on magtape,
the data records can be conceptualized as the units described here. The
first 256 words (512 bytes) of a data record constitute what is termed
a mark track. The remainder of the record holds 256 times nchans words
(512 times nchans bytes) of raw EEG data. Hence, each data record is
(1+nchans)*512 bytes long. Currently the data points have 12-bit precision
and are therefore stored one per 16-bit word; thus there are 256 points
for each channel in the data segment of a data record. The data points are
written in multiplexed order so that the first point from channel 0 is
followed by the first point from channel 1, ..., to the first point from channel
(nchans-1). Then, the second points from all nchans channels are written,
and so on, until 256 samples from all channels have been written. The mark track
requires further explanation. This additional "channel" of data holds event
numbers of events which occur during digitization of that record. Any event
occurring between sample n and n+1 has its event number placed in the mark
track at n+1. Thus there is a one sample-interval resolution of times of
event occurrences. All other words in the mark track contain 0 (except
for the very first entry; see below), thus eliminating 0 as a valid event
number. Theoretically, this information should be sufficient to uniquely
specify the time at which events occurred during digitization, since time
is recorded in units of the sample clock and all pauses and exits occur
"between" full 256 point data records. The time of an event (in units of
sample clock ticks) should be 256*(record #)+n, where n is the offset of
the event in the mark track of record (record #). Practically, however,
the raw data are so voluminous and the error rate of the magtape so high
(especially when used for digitization) that it is necessary to include
some timing information to allow re-synchronization in cases of error. Hence,
the very first element of the mark track contains the record number. This
subterfuge, along with a valid log file, allows one to recover from data
errors with only the loss of a few events. For further information on the
relation of log files to raw data files, see the log file format document.
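
The record layout described above can be sketched in Python as follows. The
function takes the bytes of one data record and returns the record number,
the events with their times in sample clock ticks, and the demultiplexed
channels. The byte order (little-endian), the treatment of mark words as
unsigned and of samples as signed, and the function name are assumptions
made for this example; none of them is stated in this document.

```python
import struct

POINTS = 256  # words in the mark track, and samples per channel, per record


def parse_record(record, nchans):
    """Split one raw data record into record number, events, and channels."""
    if len(record) != (1 + nchans) * POINTS * 2:
        raise ValueError("record length does not match nchans")
    # Assumption: little-endian words; mark words unsigned, samples signed.
    mark = struct.unpack_from("<%dH" % POINTS, record, 0)
    data = struct.unpack_from("<%dh" % (POINTS * nchans), record, POINTS * 2)

    record_number = mark[0]  # the first mark slot holds the record number
    # Each nonzero mark word at offset n is an event at tick 256*record + n.
    events = [(POINTS * record_number + n, mark[n])
              for n in range(1, POINTS) if mark[n] != 0]
    # Undo the multiplexing: channel ch owns every nchans-th data word.
    channels = [list(data[ch::nchans]) for ch in range(nchans)]
    return record_number, events, channels
```

With this layout, an event stored at mark offset 10 of record 3 falls at
tick 256*3+10 = 778.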
When data are recorded on magtape, one often reads back data that are in
error but that the tape controller did not detect as being in error.
Hence, one expected modification of the
raw data format is to append a longitudinal parity check word and a "checksum"
check word to each raw data record. This additional layer of error detection
should enhance the reliability of the data regardless of the hardware in
use. In addition, the only information present in the log file that is
not in the raw file is the condition code. Since this can change from record
to record, it too may be appended to the raw record. Finally, if these changes
are made, it is likely that the record number will be placed in this suffix
so that the first mark slot can be used for event numbers, thus extending
the consistency of the format.
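
Since the two check words are only a proposed change, their exact definition
is not given here. As one possible reading, the Python sketch below computes
a longitudinal parity word as the exclusive-OR of all 16-bit words in a
record and a checksum as their sum modulo 65536. Both definitions, the
little-endian byte order, and the function name are assumptions made for
this example.

```python
import struct


def check_words(record):
    """Return (parity, checksum) over the 16-bit words of a record."""
    if len(record) % 2 != 0:
        raise ValueError("record must contain a whole number of words")
    # Assumption: little-endian, unsigned 16-bit words.
    words = struct.unpack("<%dH" % (len(record) // 2), record)
    parity = 0
    checksum = 0
    for w in words:
        parity ^= w                         # longitudinal parity: bitwise XOR
        checksum = (checksum + w) & 0xFFFF  # sum, truncated to one word
    return parity, checksum
```

A reader would recompute both words over each record and compare them with
the stored values; a mismatch in either would mark the record as suspect.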