Name

ranova - analysis of variance (ERP statistics)

Synopsis

ranova [-options] [inputfile1 inputfile2 inputfile3...]

Description

ranova is an analysis of variance program conceived as a special case of the general linear model. ranova does not allow nested designs or unequal cell sizes, but is otherwise completely backwards compatible with the anovae program, and uses the same design specification format. It offers several additional options as well, such as Greenhouse-Geisser and Huynh-Feldt adjustments to degrees of freedom and a more rational format for specifying descriptive statistics. Additionally, the design specifications and data can be split up into as many separate files as desired, allowing the user to keep the design information and data separate and to split up or combine data. If no input files are specified, the standard input is read.

Design Specifications

The first set of information required by ranova is a set of design specifications. The specifications should be in the following order: Title (Optional); Factors (Required); Sample Size (Required); Model (Optional); Names (Optional); Levels (Optional); Options (Optional). Comment statements and blank lines can be interspersed anywhere within this information. These statements have the following syntax:
Title titlestring
Factors #factors levels1 levels2 levels3...
Sample Size samplesize
Model modeldescription
Names Factor1name Factor2name Factor3name...
Levels Factor1name Level1name Level2name...
Levels Factor2name Level1name Level2name...
Levels Factor3name Level1name Level2name...
Options optionname [parameters]
Comment commentstring

The keywords that begin each line can be abbreviated with the first letter of the keyword (or any string beginning with the same first letter) and can be in upper or lower case. As an example of these statements, consider a mixed model design in which factors A and B are between groups, factors C and D are within groups, and there are 6 subjects per group. Factor A has 3 levels, B has 2 levels, and C and D each have 4 levels. A typical set of design specifications would be:

Title ANOVA with 2 between factors and 2 within factors
Factors 5 3 2 4 4 6
Sample Size 1
Model Mixed Design AxBx(CxDxS)
Names Afactor Bfactor Cfactor Dfactor
Options Means A B C
Options Epsilon
C This is a comment field.

The first thing to note is that subjects are considered a factor in any design with repeated measures. In the Factors statement, the first number specifies that there are 5 factors (including subjects) and the 5 following numbers indicate the number of levels in each factor, beginning with factor A and ending with the subjects factor. The sample size is always 1 in designs with repeated measures, but the actual number of subjects per cell would be indicated in the Sample Size statement for a between subjects design. The Model statement specifies a mixed design and indicates which factors are between subjects and which are within subjects (mixed design is specified even for a completely within subjects design). The word "split-plot" may be used instead of "mixed design" (actually any word beginning with "m" or "s" is fine). The factors are always indicated by letters in alphabetical order, with the subjects factor denoted as S. The between subjects factors must precede the within subjects factors, and the within subjects factors are enclosed within parentheses. It is therefore not possible to have model specifications like "AxCx(BxS)" or "(AxBxS)xC". If no Model statement is used, a between subjects design is assumed. A between subjects design can be specified explicitly by using "between" as the model type. For example, "model between AxB" would be used for a two factor between subjects design.
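
As a rough illustration of how these statements fit together, the following Python sketch reads lines like those in the example specification. It is not ranova's own parser: the function names (keyword_of, parse_factors, parse_model) are hypothetical, and the sketch assumes that a keyword is recognized purely by its first letter and that the model layout is the last whitespace-separated token on the Model line.

```python
# Hypothetical helpers for reading ranova-style design statements.
# Assumption: a statement's keyword is identified by its first letter only.
KEYWORDS = {
    "t": "Title", "f": "Factors", "s": "Sample Size", "m": "Model",
    "n": "Names", "l": "Levels", "o": "Options", "c": "Comment",
}

def keyword_of(line):
    """Return the full keyword for a specification line, or None if unrecognized."""
    stripped = line.strip()
    if not stripped:
        return None
    return KEYWORDS.get(stripped[0].lower())

def parse_factors(line):
    """Parse 'Factors #factors levels1 levels2 ...' into a list of level counts."""
    numbers = [int(tok) for tok in line.split()[1:]]
    count, levels = numbers[0], numbers[1:]
    if count != len(levels):
        raise ValueError(f"expected {count} level counts, got {len(levels)}")
    return levels

def parse_model(line):
    """Split a layout such as 'AxBx(CxDxS)' into (between, within) factor lists."""
    layout = line.split()[-1]                 # e.g. "AxBx(CxDxS)"
    outside, _, inside = layout.partition("(")
    between = [f for f in outside.split("x") if f]
    within = [f for f in inside.rstrip(")").split("x") if f]
    return between, within

levels = parse_factors("Factors 5 3 2 4 4 6")
between, within = parse_model("Model Mixed Design AxBx(CxDxS)")
print(levels)            # [3, 2, 4, 4, 6]
print(between, within)   # ['A', 'B'] ['C', 'D', 'S']
```

In this sketch the subjects factor S appears in the within list because it sits inside the parentheses, which mirrors the way the Factors statement counts subjects as the fifth factor.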

In addition to the designation of factors as A, B, C, etc., it is also possible to give the factors names, via the "Names" command, which will be used in the anova table. The names of the factors follow the "Names" command in order, beginning with the first factor. If fewer factor names are specified than the total number of factors, then the factors without specified names will revert to the default names (A, B, C, etc.). If a factor name contains non-standard characters such as spaces or commas, then the name should be enclosed in double quotes. In the anova table, interaction effects are named on the basis of the first letters of the constituent factors. If redundant first letters exist, ranova will choose alternative abbreviations for the factors.

The "Levels" command can be used in conjunction with the -L command line option to generate level names for a factor instead of numbers, making the final cleave output much more readable. The command begins with the keyword "Levels", followed by the name of one of the factors listed in the "Names" command (it must match exactly; case is significant), followed by a name for each level of that factor. There can be as many "Levels" commands as there are factors named with the "Names" command, but not all (in fact none) are required; any factors without specified level names will default to having numbers generated to differentiate the levels. "Levels" commands can be given in any order; however, a "Names" command must precede the "Levels" commands. The "Levels" command has no visible effect without the use of the -L command line option.

Options

There are 4 types of Options statements. "Options Epsilon" will cause calculation of the Greenhouse-Geisser and Huynh-Feldt epsilon values to adjust for heterogeneity of variance and covariance. The degrees of freedom for each F are multiplied by the epsilon values, and a new p-value is calculated. Note that the adjusted degrees of freedom are rounded to the nearest integer, which produces slightly different results than would be obtained with non-rounded values. Adjusted values are only determined for effects with repeated measures and only when there is more than 1 numerator degree of freedom. This option can be emulated by placing ’-e’ on the command line.
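
The degrees-of-freedom adjustment can be sketched in a few lines of Python. This is only an illustration of the rule described above, not ranova's code: the function name adjust_df is hypothetical, the sketch assumes that both the numerator and denominator degrees of freedom are scaled, and it assumes that ties round upward, since the rounding of exact halves is not documented here.

```python
import math

def adjust_df(epsilon, df_num, df_den):
    """Scale degrees of freedom by epsilon and round to the nearest integer.

    Effects with a single numerator degree of freedom are left unadjusted.
    """
    if df_num <= 1:
        return df_num, df_den

    def round_nearest(x):
        # Assumption: exact halves round upward.
        return math.floor(x + 0.5)

    return round_nearest(epsilon * df_num), round_nearest(epsilon * df_den)

# Illustrative values: epsilon = 0.62 for an effect with 3 and 45 degrees of freedom.
print(adjust_df(0.62, 3, 45))   # (2, 28); the non-rounded values are 1.86 and 27.9
print(adjust_df(0.62, 1, 15))   # (1, 15); no adjustment for 1 numerator df
```

The adjusted p-value would then be the upper tail of the F distribution evaluated with the rounded degrees of freedom, which requires a statistics library and is not shown.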

"Options List" produces a list of the input data with the grouping indices so that proper group assignment of each case can be verified. This can be emulated by placing ’-l’ on the command line.

"Options Means" causes means and standard errors to be printed. This statement can be followed by a number, and the statistics will be printed for all cells with at least that number of samples. Alternatively, a list of factor names can be used, as in the above example (the default names, A, B, C, etc., must be used). Statistics will then be printed for all levels of the listed factors and their interactions. This is most useful when one wishes to average across subjects, in which case all of the factors except S would be listed. This option can be emulated by placing ’-m minsamples’ on the command line, where ’minsamples’ is the minimum number of samples necessary for a cell’s means to be printed. There is no way to specify factor names on the command line. The means printed by ranova can be formatted into tables by the table program.

"Options Compress" causes ranova to compress the output, eliminating unnecessary blank lines, the model summary, and epsilon values that equal 1.0. This option is handy when one wants to reduce the size of the file for printing. It can be emulated by placing ’-c’ on the command line.

Data Format

The design specifications must be completed before the data begin. The data must be ordered such that factor A changes slowest, and subjects change fastest. For example, in a 2-factor design with 3 samples per cell, the data would be in this order: A1B1S1 A1B1S2 A1B1S3 A1B2S1 A1B2S2 A1B2S3 A2B1S1 A2B1S2 A2B1S3 A2B2S1 A2B2S2 A2B2S3. There can be as many values per line as desired as long as there are no more than 255 characters per line. Blank lines are ignored, but comment lines are not allowed.
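
The required ordering is the same as that of nested loops with factor A outermost and subjects innermost. The Python sketch below reproduces the ordering for the 2-factor example above; the function name data_order is hypothetical, and the labels are only a way of checking the order of a data file, not something ranova reads.

```python
from itertools import product

def data_order(levels, samples_per_cell):
    """Yield cell labels with the first factor changing slowest and subjects fastest."""
    names = "ABCDEFGHIJKLMNOPQR"   # default factor letters; S denotes subjects
    ranges = [range(1, n + 1) for n in levels]
    ranges.append(range(1, samples_per_cell + 1))
    # itertools.product advances its last argument fastest, like nested loops.
    for *factor_levels, subject in product(*ranges):
        cell = "".join(f"{names[i]}{lvl}" for i, lvl in enumerate(factor_levels))
        yield f"{cell}S{subject}"

order = list(data_order([2, 2], 3))
print(" ".join(order))
print(len(order))   # 12 values expected in the data
```

The length of the generated list is also the number of data values the design calls for, which is a convenient check before running the analysis.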

Output Format

ranova writes to the standard output. After every 60-or-so lines, a form-feed character is printed so that the output will look nicer when printed. The top of every page after the first contains the page number and title.

All of the command information, including comment fields, is printed first. A summary of the design is then printed, followed by the data list or means if requested. These are followed by an ANOVA summary table. In the summary table, effects are grouped according to which error term is used. For each effect, the source, sum of squares, degrees of freedom (unadjusted), mean square, F-value, and p-value are printed. P-values less than .05 are indicated with a "*" symbol. F-values that are smaller than expected by chance at the .05 level are indicated with a "#" symbol. Significantly small F-values usually indicate a violation of the assumptions of ANOVA or the absence of some counterbalanced factor from the design specification. If the epsilon values were calculated, then the adjusted p-values are also printed. There is one epsilon value for each error term in the design, and the epsilon values are printed in a table following the ANOVA summary table.

Command Line Options

-e
Calculates the Greenhouse-Geisser and Huynh-Feldt epsilon values as described above.
-m cellcount
Calculates means and standard errors for all cells with at least cellcount samples.
-l
Lists the input data with the grouping indices so that proper group assignment of each case can be verified.
-L
Sorts the input data into column format and lists them in a form suitable for input to the analysis of variance program cleave(1).
-c
Compresses the output.

See Also

table(1), cleave(1), cleaver(1)

Diagnostics

Most of the error messages occur when ranova encounters incorrect design specifications or the wrong number of input data. If the problem is too large for the available memory, ranova will print "Error: Not enough memory for this problem." Insufficient memory can usually be overcome by not calculating the epsilon coefficients.

Author

Steve Luck

Bugs

The calculation of epsilon values can be molasses-slow for large problems due to the matrix multiplication involved.

