Table of Contents

Name

rawfile - EPL system raw digitized data format (ERP files)

Jonathan C. Hansen

EPL System Raw Digitized Data Format

The Genesis of Raw Data Files

Under the EPL Continuous Data Digitization and Averaging System, multiple channels of EEG data are recorded continuously and stored on a digital magtape. These files, containing the raw, continuous EEG data, are termed raw data files. Although the data could be stored on a disk instead, current hardware limitations preclude this luxury. In addition, most investigators will want to retain their data for possible further analyses, and hence the data will end up on tape anyway. The rationale for recording continuous data is described in the document entitled "An Overview of the Continuous Data Digitization and Averaging System"; refer to that document for the advantages and disadvantages of this approach. Currently only the program "cddg" produces these raw data files. Besides a raw file, cddg also produces a disk file termed the log file, which contains a summary of the experimental events, their times of occurrence, and the condition code in effect at the time. This file is essential to further processing of the raw data file and is described in the document on Log File Format. The present document describes the format in which the raw, continuous data are stored.

Structure of a Single Raw Data File

General

A raw data file, whether on magtape or disk, consists of a header of length 512 bytes followed by multiple records of data. The header contains certain information such as the sampling rate, the number of data channels, and various ASCII descriptions of the experiment and subject. The data records, while of a fixed length within a particular raw data file, can vary in length between raw data files depending on the number of channels of data. These data records are not split into 512 byte sub-records, even though they are always a multiple of 512 bytes in length. Instead, they are single magtape records; this saves a considerable amount of tape that would otherwise be used to form inter-record gaps. If stored on magtape, the end of a raw data file is denoted by an EOF mark.
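The layout above can be made concrete with a little arithmetic. The following is a minimal sketch in C, assuming the record length of (1+nchans)*512 bytes given under The Data Records, a disk-resident file with no framing between records, and records counted from 0. The names RAW_BLOCK_BYTES, raw_record_bytes, and raw_record_offset are hypothetical and do not come from the EPL sources.

```c
#include <stddef.h>

/* 512 bytes: the header size, and also the size of the mark track and of
 * each channel's share of a data record. (Hypothetical name.) */
#define RAW_BLOCK_BYTES 512

/* Bytes in one data record: one mark track plus nchans channels of data. */
size_t raw_record_bytes(int nchans)
{
    return (size_t)(1 + nchans) * RAW_BLOCK_BYTES;
}

/* Byte offset of data record `recno` in a disk-resident raw data file.
 * Assumption: records are counted from 0 and follow the header directly. */
size_t raw_record_offset(int nchans, size_t recno)
{
    return RAW_BLOCK_BYTES + recno * raw_record_bytes(nchans);
}
```

For a 16-channel file, for example, each data record occupies 17 * 512 = 8704 bytes. On magtape no offset arithmetic is needed, since each data record is read as one tape record.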

The Header Record

The first record of any valid raw data file is 512 bytes in length and is termed the header. It uses the same "C" language data structure as do averaged ERP data in the EPL system, but only a few of the slots are or need be filled. The details of the header structure are described in the header document; only those structure elements which need be filled in a raw data file are described here.

The DGMAGIC constant is an arbitrary number which is placed in the first word of a header and indicates that this header can be used by the digitization program. This number remains in the header when it is written on the raw tape, and can be checked by programs which wish to verify that a file is indeed a raw data file, or that a header has been verified as having parameters within a reasonable range for use in digitization. The header element for the magic number is evtno; this slot at one time held the event number in the discrete epoch digitization system. Its name has been retained for possible future reincarnation of that method of digitization.

The header elements which absolutely must be filled in a raw data header are: nchans (number of channels), odelay (trigger to stimulus delay in msec), and ctickt (clock period in tens of microseconds). The maximum and minimum values of these structure elements are determined by the particular hardware/software system being used to perform the digitization and thus are specific to each installation.

In addition to the above elements, it is recommended that the following ASCII descriptor elements be filled at digitization:

1) chndes (channel descriptions). This array of 128 characters allows room for 16 eight-character channel descriptors, or 32 four-character descriptors (may be changed to the latter in the future).

2) subdes (subject description). Up to 40 characters of ASCII describing the subject and date should be placed here.

3) expdes (experiment description). The highest level description of the experiment, up to 40 characters, should be placed here.

4) rawname (raw data file name). This optional 16 character slot can be used to hold an ASCII raw data file name for use with the rawtape program. Note that no current programs rely on this name; the ordinal position of the file on the tape is definitive.
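As an illustration of how the chndes array divides into per-channel descriptors, here is a minimal sketch in C. It assumes the current layout of 16 eight-character descriptors and assumes that short descriptors are padded with blanks or NULs; the padding convention is not stated in this document. The function name chndes_get and the CHNDES_* constants are hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define CHNDES_BYTES 128                            /* size of chndes          */
#define CHNDES_WIDTH 8                              /* characters per channel  */
#define CHNDES_COUNT (CHNDES_BYTES / CHNDES_WIDTH)  /* 16 descriptors          */

/* Copy the descriptor for channel `chan` into `out` as a C string, with
 * trailing blanks removed. Returns 0 on success, -1 if `chan` is out of
 * range. Assumption: unused characters are blanks or NULs. */
int chndes_get(const char chndes[CHNDES_BYTES], int chan,
               char out[CHNDES_WIDTH + 1])
{
    size_t len;

    if (chan < 0 || chan >= CHNDES_COUNT)
        return -1;

    memcpy(out, chndes + chan * CHNDES_WIDTH, CHNDES_WIDTH);
    out[CHNDES_WIDTH] = '\0';

    len = strlen(out);                 /* stops at the first NUL, if any */
    while (len > 0 && out[len - 1] == ' ')
        out[--len] = '\0';
    return 0;
}
```

If the layout is changed to 32 four-character descriptors, only CHNDES_WIDTH would need to change in this sketch.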

The Data Records

In the EPL Continuous Data system, the lone header record is followed by multiple data records of a specific format. As mentioned, these are large, single magtape records to help conserve tape that would normally be used for inter-record gaps. If the data are not stored on magtape, the data records can be conceptualized as the units described here.

The first 256 words (512 bytes) of a data record constitute what is termed a mark track. The remainder of the record holds 256 times nchans words (512 times nchans bytes) of raw EEG data. Hence, the length of each data record is (1+nchans)*512 bytes. Currently the data points have 12-bit precision and are hence stored as words; thus there are 256 points for each channel in the data segment of a data record. The data points are written in multiplexed order, so that the first point from channel 0 is followed by the first point from channel 1, and so on through the first point from channel (nchans-1). Then the second points from all nchans channels are written, and so on, until 256 samples from all channels have been written.

The mark track requires further explanation. This additional "channel" of data holds the event numbers of events which occur during digitization of that record. Any event occurring between sample n and n+1 has its event number placed in the mark track at n+1. Thus the times of event occurrences are resolved to one sample interval. All other words in the mark track contain 0 (except for the very first entry, see below), which eliminates 0 as a valid event number.

Theoretically, this information should be sufficient to uniquely specify the time at which events occurred during digitization, since time is recorded in units of the sample clock and all pauses and exits occur "between" full 256 point data records. The time of an event (in units of sample clock ticks) should be 256*(record #)+n, where n is the offset of the event in the mark track of record (record #).

Practically, however, the raw data are so voluminous and the error rate of the magtape so high (especially when used for digitization) that it is necessary to include some timing information to allow re-synchronization in cases of error. Hence, the very first element of the mark track contains the record number. This subterfuge, along with a valid log file, allows one to recover from data errors with only the loss of a few events. For further information on the relation of log files to raw data files, see the log file format document.
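The record layout described in this section can be sketched in C as follows. This is an illustration only, under several assumptions not stated in this document: a word is 16 bits, is treated as signed, and has already been read into memory in the host's byte order. The names RAW_POINTS, raw_sample, raw_event_time, and raw_scan_marks are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define RAW_POINTS 256   /* words in the mark track; points per channel */

/* Sample `point` (0..255) of channel `chan` (0..nchans-1) in one data
 * record. The data segment follows the 256-word mark track and is
 * multiplexed: all channels for point 0, then all channels for point 1. */
int16_t raw_sample(const int16_t *record, int nchans, int chan, int point)
{
    const int16_t *data = record + RAW_POINTS;   /* skip the mark track */
    return data[(size_t)point * nchans + chan];
}

/* Time of an event in sample clock ticks: 256*(record #)+n. */
long raw_event_time(long recno, int n)
{
    return RAW_POINTS * recno + n;
}

/* Collect the nonzero entries of the mark track. Slot 0 is skipped because
 * it holds the record number, not an event. Returns the number of events
 * stored in offsets[] and events[] (at most max_events). */
int raw_scan_marks(const int16_t *record, int *offsets, int16_t *events,
                   int max_events)
{
    int found = 0;
    int n;

    for (n = 1; n < RAW_POINTS && found < max_events; n++) {
        if (record[n] != 0) {
            offsets[found] = n;
            events[found] = record[n];
            found++;
        }
    }
    return found;
}
```

A caller would typically read one (1+nchans)*512 byte record into a buffer, take the record number from record[0], and pass each offset returned by raw_scan_marks to raw_event_time to place the event on the sample clock.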

Expected Modifications

It turns out that when data are recorded on magtape one often reads data that are in error but which were not detected as being in error by the tape controller. Hence, one expected modification of the raw data format is to append a longitudinal parity check word and a "checksum" check word to each raw data record. This additional layer of error detection should enhance the reliability of the data regardless of the hardware in use. In addition, the only information present in the log file that is not in the raw file is the condition code. Since this can change from record to record, it too may be appended to the raw record. Finally, if these changes are made, it is likely that the record number will be placed in this suffix so that the first mark slot can be used for event numbers, thus extending the consistency of the format.
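This document does not define how the two check words would be computed. As a hedged sketch of one conventional construction, the C code below computes a longitudinal parity word as the exclusive-OR of all words in a record and a checksum as their sum modulo 65536. Both definitions, the 16-bit word size, and the names lrc_word and checksum_word are assumptions made for illustration, not part of the EPL format.

```c
#include <stddef.h>
#include <stdint.h>

/* Longitudinal parity: bit k of the result is the parity of bit k taken
 * across all n words (assumed definition). */
uint16_t lrc_word(const uint16_t *words, size_t n)
{
    uint16_t parity = 0;
    size_t i;

    for (i = 0; i < n; i++)
        parity ^= words[i];
    return parity;
}

/* Additive checksum: the sum of all n words, modulo 65536
 * (assumed definition). */
uint16_t checksum_word(const uint16_t *words, size_t n)
{
    uint16_t sum = 0;
    size_t i;

    for (i = 0; i < n; i++)
        sum = (uint16_t)(sum + words[i]);
    return sum;
}
```

The two checks complement each other: the parity word catches any odd number of flipped bits in a given bit position, while the additive checksum can catch some paired errors that cancel under exclusive-OR.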

