HOW TO USE CLEAVE
by Tim Herron
January 30, 2005 Version

-----------------------------------------------------
TABLE OF CONTENTS

   INTRODUCTION
0. THE BASICS OF USING CLEAVE
      CALLING CLEAVE
      ANOVA DESIGNS
      SPECIFYING INPUT FILES
1. OPTIONAL FEATURES AND HOW TO INVOKE THEM
2. SAMPLE CLEAVE OUTPUT
3. NOTES ON USING CLEAVE'S FEATURES
      USING VARIOUS ANOVA DESIGNS AND PRODUCING F VALUES
      USING BOX-GEISSER-GREENHOUSE CORRECTION VALUES
      USING SOURCE TREATMENT MAGNITUDES
      USING POWER/SUBJECT TABLES
      USING POST-HOC AND PAIRWISE TESTS
      DETECTION OF OUTLIERS
      DETECTING AND CORRECTING ANOVA DATA SET DESIGNS
4. PROGRAM LIMITATIONS
5. REFERENCES
6. ERROR AND INFORMATIONAL MESSAGES
7. CONTACT INFORMATION
-----------------------------------------------------

INTRODUCTION

CLEAVE is a UNIX-style program which performs Analysis of Variance (ANOVA) computations on text files containing experimental data. The program should work for balanced and proportional designs with crossed data sets of up to 15 fixed and/or random factors, each having arbitrarily many levels, and any factor can be of the "between" or "within" (repeated measures) variety.

CLEAVE is intended to be a very fast ANOVA program which can handle very large data sets. As such, the program is rather streamlined and does not spend effort on a user-friendly interface. Nonetheless, we aim for elegant design and flexibility.

Creation of the CLEAVE program was inspired by the desire to remove two particular ANOVA assumptions which often do not hold in experiments: (a) equality of factor variances, and (b) sphericity of factor covariances. Much effort has been taken to adjust for situations when these two assumptions do not hold. A secondary goal for the program is to provide the user with important complementary information which augments the basic ANOVA F test, which by itself does not give the user much information about the effects of an experiment's factors.
Information such as magnitude of effect, power calculations, and pairwise significance tests helps the user understand the importance of each factor or factor combination in determining the experimental output. Reporting this kind of auxiliary information is quickly becoming a requirement in many fields in order to ensure the publication of an experiment's results.

CLEAVE (or "cleave.exe" for DOS/Windows) uses a configuration file "cleave.cnf" to decide which of several optional computations to perform on the experimental data. In addition to computing the usual sums of squares, F and probability values on the data, CLEAVE can perform the following useful algorithms:

(1) CLEAVE computes corrections for factor (co)variance anomalies.
(2) CLEAVE can compute treatment magnitude effects.
(3) Some post-hoc test statistics can be computed.
(4) CLEAVE computes post-hoc power values.
(5) CLEAVE can handle some designs with random factors.

Finally, we include the source code for the CLEAVE program so that the user can modify the program to include more features (or just to squash bugs).

This document has 8 parts (not including this intro):

0) The basics of how to use CLEAVE.
1) A very brief discussion of the optional features and how to invoke them using the configuration file.
2) A walk-through of an example output which highlights the optional features' output.
3) Notes on using CLEAVE features and their use in analyzing statistical data.
4) Computational and statistical limitations of CLEAVE.
5) References which could help you get the most from CLEAVE.
6) Error and informational messages in CLEAVE.
7) Contact information so that you can complain to me.

Probably the fastest way of reading this document is to read sections 0) and then 2) to understand how to run the program and read its output. Then one can skim section 1) and read section 3) to learn how and why to invoke CLEAVE's special features. Finally, 4)-7) are available as reference.

0.
THE BASICS OF USING CLEAVE

CALLING CLEAVE

CLEAVE takes as input text files containing experimental data in column format and produces as output a text file containing the results of the ANOVA analysis. Ordinary text editors can be used to produce the input and likewise to edit the output. A typical invocation of CLEAVE looks as follows on your command line:

/home/tjherron>cleave <anova_data >anova_output

where "anova_data" is a space- or tab-delimited file containing the experimental data (one outcome per row), and "anova_output" is a text file containing the output. Note that CLEAVE uses the standard input and output streams, and so if one wants to use files on a hard disk (or some other media source), one needs to specify that using the command-line redirection characters "<" and ">". Of course, one can also use pipes ("|") to chain CLEAVE's input and output through other programs. E.g. if one is happy to allow the output to whiz by on the computer screen, one can type:

/home/tjherron>cleave <anova_data

but it would be more pleasant to type

/home/tjherron>cleave <anova_data | more

in order to view the output one screen at a time.

ANOVA DESIGNS

The following kinds of ANOVA designs can be processed by CLEAVE's algorithms:

1) Fixed, Random and Mixed designs
2) Highly Multi-Way ANOVA designs
3) Between, Within (Repeated Measures), and Mixed designs
4) Balanced and Proportional Designs

Any combination of these designs can be fully handled, with a few exceptions. The following two types of designs cannot have their F statistics computed by CLEAVE. First, proportional designs cannot have random factors in them (i.e. Random or Mixed designs), not counting the subject factor (= the first column, which is always random). Second, unbalanced designs of any type will not have F statistics computed by CLEAVE, except for certain kinds of very slightly unbalanced designs.
For these two types of experimental design, the only output that CLEAVE produces is the pairwise comparisons that are scheduled by "cleave.cnf", because those computations are independent of the F statistic that CLEAVE is designed to compute.

CLEAVE can correctly detect whether a factor is a within factor or a between factor even when there are certain types of errors in the data set which cause the data set to be imbalanced. Further, if the data set is nearly balanced or proportional (perhaps due to a typo in the data set or in a script which generates a data set file), CLEAVE can attempt to correct the flaw as well as inform the user which data is missing or in excess.

Finally, and unfortunately, CLEAVE cannot currently handle nested designs of any sort, nor can it properly detect that a design is nested. The pairwise comparison results that it prints out, however, may well be accurate, but whether they are always good is not known at present (CLEAVE attempts to guard against those cases in which the pairwise comparison results may be misleading - but it is not foolproof).

SPECIFYING INPUT FILES

Specifying the input file is straightforward. The user needs to produce a text file which contains the same number of columns on each line, in the following format. The final column must contain the experimental outcomes, the first column must contain the identifiers for the experimental subjects, and each of the middle columns contains the levels for one unique factor being tracked in the experiment. Each line of the input file contains information about one experimental outcome, e.g.

14 Left 3 75.4

records that the 14th experimental subject had an outcome of 75.4, where "Left" was the level of the first factor and "3" was the level of the 2nd factor. Columns can be separated using spaces or tabs in any combination, and the separators do not have to be consistent from line to line (or even within a line, for that matter).
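CLEAVE's own column parser is written in C, but the tolerant splitting just described (any mix of spaces and tabs) - together with the quoting convention covered below - behaves like Python's standard shlex module. As a rough illustration (the function name parse_line is ours, not part of CLEAVE):

```python
import shlex

def parse_line(line):
    """Split one CLEAVE input line into fields: whitespace-separated,
    with double or single quotes protecting embedded spaces."""
    fields = shlex.split(line)
    # First field is the subject id, last is the numeric outcome,
    # and everything in between is a factor level.
    subject, *levels, outcome = fields
    return subject, levels, float(outcome)

print(parse_line("14 Left 3 75.4"))
# ('14', ['Left', '3'], 75.4)
```

Tabs and runs of spaces collapse into single separators, matching the "separators do not have to be consistent" rule above.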
If the user wishes to include whitespace in a level's name, then quotes (double or single) can be used to include it in the level's name, e.g.:

Jim "First Group" Sleep -4.2

Note that one can include a single quote inside a pair of delimiting double quotes (and vice versa). We have included the following example text files for illustration of what a text input file can look like:

proportional - a 2x4x3 proportional ANOVA design with one varying factor
bogartz6     - a 3x3x3 ANOVA design with one repeated measures factor (and 2 "between" factors)
repeats      - a 4x5 repeated measures ANOVA design which relies on CLEAVE to create a REPEATS factor

CLEAVE reads in the input files and figures out the following properties of the experimental design:

a) how many factors there are
b) the number and names of each factor's levels
c) which factors to ignore (any factor with only 1 level)
d) whether a factor is a repeated measures factor
e) whether a factor is proportionally varying or is 'flat'
f) whether the ANOVA design is balanced or unbalanced
g) whether to introduce a new factor for duplicate data lines (a "REPEATS" factor)

The user needs to specify only two basic things about the experiment. First, the user can specify the factor names on the command line before the input and output files are specified, e.g.

/home/tjherron>cleave Side Violations <anova_data >anova_output

specifies that the first factor is called "Side" and the second factor is named "Violations". If the user does not specify factor names, CLEAVE assumes that the first factor is called "A", the second, "B", etc. Second, the user needs to specify which factors are random factors and which are fixed factors. The default is to assume that all non-subject factors are fixed.
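The random-versus-fixed specification just mentioned is given to CLEAVE as a bitmask in "cleave.cnf" (the exact configuration line is shown next). The arithmetic behind the mask can be sketched as follows; the function name is ours, and it takes 1-based file column numbers as in the manual's own example:

```python
def random_factor_mask(random_file_columns):
    """cleave.cnf bitmask = sum of 2^(c-1) over every 1-based file
    column c that holds a random factor.  Column 1 (subjects) is
    always random, so the mask is always odd."""
    return sum(2 ** (c - 1) for c in set(random_file_columns) | {1})

# Subjects (column 1) plus the first and third factors
# (file columns 2 and 4):  1 + 2 + 8 = 11
print(random_factor_mask([2, 4]))  # 11
```

With no random factors beyond the subjects column the mask is simply 1, which is the default value shipped in "cleave.cnf".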
One edits the (text) file "cleave.cnf" and puts the appropriate bitmask in the following line(s) appearing in "cleave.cnf":

---
1   Indicates which factors are random variables
    = Sum_{c = random factor column number} 2^{c}
    This should always be odd since the subjects
    column (column 1) is always a random factor
---

The default above, "1", assumes that all factors are fixed except for the initial subject factor (which is always random). If, e.g., the user knows that the first and third factors (i.e. the second and fourth columns in the input file) are the only random factors out of the 4 appearing in the input file, then s/he would place 11 (= 1*1 + 1*2 + 0*4 + 1*8 + 0*16) in place of 1 above in "cleave.cnf". The other parts of the file "cleave.cnf" control the options that can be turned on in CLEAVE, and these will be described in the following sections.

1. OPTIONAL FEATURES AND HOW TO INVOKE THEM

In addition to having the cleave program in your working path, you will want to have the "cleave.cnf" file accessible to the program in your current directory. Inside the configuration file, which you can edit with any text editor, there are many program options. The following sections inside the configuration file are general program options useful for even the most basic ANOVA:

---
General Parameter Section
10    Maximum number of ANOVA Factors [Can use a maximum of 15
      unless you change variable MAXFACTOR in cleave.h and recompile]
1024  Maximum number of Levels for each Factor

Section concerning Significance Levels
0.10  Lowest Significance "p" Level "Sig[0]"
0.05  Middle Significance "p" Level "Sig[1]"
0.01  Highest Significance "p" Level "Sig[2]"
---

The first section is useful in managing the amount of memory that CLEAVE uses (the smaller both parameters are, the better). The second section is useful for telling CLEAVE when to alert you to statistically significant results by indicating them with "*"s in various places. More specialized CLEAVE options are outlined below.
First, CLEAVE can make adjustments to deal with (co)variance variations within factor levels in repeated measures designs. To invoke these adjustments, go into the "cleave.cnf" file and look for the lines:

---
Section concerning Lack of Uniform Data Variance/Covariance
1   Between-Factors Box Correction Computed? Yes = 1; No = 0
1   G-G Computed only When Needed? Yes = 1; No = 0
4   Compute Geisser Greenhouse Epsilons for Interactions of
    at Most This Order (0 = None)
---

These allow the program to compute Geisser-Greenhouse and related epsilon factors - these factors attempt to take into account distortion produced by correlation amongst factor levels. The program does this by reducing the effective degrees of freedom used by the F values that CLEAVE computes. The first parameter above allows the user to compute Geisser-Greenhouse-like epsilons for non-repeated-measures factors. Two such epsilons are computed for each factor, one to correct the F's numerator degrees of freedom and the other to correct the denominator df.

The parameter labelled "G-G Computed only When Needed" is a way to restrict degree-of-freedom adjustments to the times when the unadjusted "p" values are judged to be significant. Note that in the "cleave.cnf" file we have noted where the user can select 3 levels of significance, and the above parameter restricts Geisser-Greenhouse adjustments to being computed only when the unadjusted "p" value is less than the largest level of significance (= "Sig[0]").

The third parameter above allows the user to restrict adjustments of degrees of freedom to those main and interaction terms of a certain degree or less. This is useful because it can take a very long time to compute the Geisser-Greenhouse adjustments. (See the Limitations section of this document for more information on the speed of computing Geisser-Greenhouse epsilons.) The results of the computational adjustments appear in the second half of the CLEAVE output where the "F values" are computed.
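CLEAVE's own epsilon code lives in its C sources; as a rough pure-Python sketch of what a Geisser-Greenhouse epsilon measures, the classic sample formula of Greenhouse & Geisser (1959) for a k x k covariance matrix of the repeated-measures levels can be written as below. The function name and example matrices are ours:

```python
def gg_epsilon(S):
    """Greenhouse-Geisser epsilon for a k x k sample covariance matrix S
    of the repeated-measures levels.  Ranges from 1/(k-1) (maximal
    nonsphericity) up to 1 (sphericity holds, no correction needed)."""
    k = len(S)
    grand = sum(sum(row) for row in S) / k ** 2          # grand mean of S
    diag = sum(S[i][i] for i in range(k)) / k            # mean of diagonal
    row_means = [sum(row) / k for row in S]
    num = k ** 2 * (diag - grand) ** 2
    den = (k - 1) * (sum(v ** 2 for row in S for v in row)
                     - 2 * k * sum(m ** 2 for m in row_means)
                     + k ** 2 * grand ** 2)
    return num / den

# A compound-symmetric matrix (equal variances, equal covariances)
# satisfies sphericity exactly, so epsilon comes out as 1:
S = [[2.0, 0.5, 0.5], [0.5, 2.0, 0.5], [0.5, 0.5, 2.0]]
print(round(gg_epsilon(S), 4))  # 1.0
```

Multiplying both the numerator and denominator degrees of freedom of the F test by this epsilon is exactly the "reducing the effective degrees of freedom" described above.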
Second, CLEAVE enhances the ANOVA output by computing treatment magnitude effects - specifically partial omega squared and partial eta squared - within each of the source sections where F values are computed. In addition, if one of the above Geisser-Greenhouse options has been selected and computed for a given source, then the partial omega squared treatment magnitude will be recomputed taking into account the Geisser-Greenhouse correction for nonspherical covariances. One can choose to compute treatment magnitudes by changing "cleave.cnf" at the section:

---
Section concerning Treatment Magnitudes and Power
1   Treatment Magnitudes Computed  On = 1; Off = 0
1   Treatment Magnitudes List  On = 1; Off = 0
2   Compute Power/# of Subjects Table  On = 1; Off = 0
    (to do only when F is significant = 2)
---

In addition, after the F values for each source factor (main effects and all interactions) along with the new additions have been printed, CLEAVE can generate a convenient listing of each of the source factors by partial omega squared treatment magnitude and the significance level that has been exceeded. And the program indicates whether the Geisser-Greenhouse correction has been used in computing both values. See the Example section of this document to see what a treatment magnitude list looks like.

Third, note that in the above section, one can choose to compute a power/subjects table from the treatment magnitudes. This table provides a post-hoc analysis that can be used in experiment trial runs to help determine how many subjects are likely to be needed to run a successful experiment (one where statistical significance is attained if there is a real experimental effect). The output format of the table is easy to use (see the Example section), and the table's output is adjusted using the Geisser-Greenhouse factor if the latter parameter is computed for the source factor in question.

Fourth, there are a number of post-hoc tests and values that can be computed.
The options are detailed in "cleave.cnf" as:

---
Section concerning Post-Hoc Tests
1   Scheffe Post-Hoc Test Computed  On = 1; Off = 0
1   Compute Pairwise Significance Comparisons for Interaction
    Terms of at most this Order
2   Pairwise Comparison Type: 0 base code w/ options:
    Control is +1, All-Pairwise is +2, and to use Plain
    Joint Factors is +4 (default value is 2)
0   Use Bonferroni (=0) or Sidak (=1) Pairwise Probabilities
    inside Pairwise Tests
0   Perform Simultaneous (=0) or Sequential (=1) Pairwise
    Significance Tests
1   Indicate which Significance Level to use (Sig[x]) in
    determining pairwise familywise significance
1   Indicates whether or not to use one pooled error in
    computing all pairwise tests: 1 = Yes, 0 = No
---

One post-hoc test value that can be computed for each of the source sections is the Scheffe Test, which controls experimentwise error for any number of specific linear combination tests on the source levels. As such it provides a very high barrier against making experimentwise errors. As with treatment magnitudes, if one of the Geisser-Greenhouse options has been selected by the user and those epsilons are computed for a given source, the Scheffe value is recomputed in light of the source factor's covariance effect.

The other options tell CLEAVE to compute one or two of 16 different pairwise "t" tests that can help identify a pair of factor levels which are significantly different from one another. First, one can tell CLEAVE to compute all pairwise differences between a control level and all other levels, or to compute all possible pairwise differences (or you can compute both). Second, one can choose to use Bonferroni or Sidak probability corrections with which to control the familywise error. Third, one can use a sequential (= step-wise, "Holm") analysis to improve the chance of finding significant pairwise level differences.
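The Bonferroni, Sidak, and sequential (Holm) choices above differ only in the per-test cutoff they impose on the pairwise "t" probabilities. A small sketch of all three standard rules (the function names are ours, not CLEAVE's):

```python
def per_comparison_alpha(alpha, m, sidak=False):
    """Per-test significance level that keeps the familywise error at
    alpha over m simultaneous pairwise tests."""
    if sidak:
        return 1.0 - (1.0 - alpha) ** (1.0 / m)   # Sidak
    return alpha / m                               # Bonferroni

def holm_reject(pvalues, alpha):
    """Sequential (step-wise, "Holm") rule: sorted p-values are tested
    against alpha/m, alpha/(m-1), ... stopping at the first failure.
    Returns the indices of the rejected (significant) comparisons."""
    m = len(pvalues)
    rejected = set()
    for rank, (p, idx) in enumerate(sorted((p, i) for i, p in enumerate(pvalues))):
        if p > alpha / (m - rank):
            break
        rejected.add(idx)
    return rejected

# Three factor levels give m = 3 pairwise tests; at familywise alpha = 0.05
# the Bonferroni per-test cutoff is the 0.0167 seen in the sample output:
print(round(per_comparison_alpha(0.05, 3), 4))              # 0.0167
print(round(per_comparison_alpha(0.05, 3, sidak=True), 4))  # 0.017
```

Sidak is very slightly less conservative than Bonferroni, and Holm's sequential rule rejects at least as many comparisons as either simultaneous rule at the same familywise level.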
Fourth, one can choose to compute pairwise differences on the ordinary joint factor level means or using the ANOVA model's interaction level means. Finally, as with the Geisser-Greenhouse correction, one can restrict CLEAVE's computation of pairwise differences to interaction source terms of a certain degree. The last two parameters that the user can set to control pairwise comparison tests choose which of the three significance values to use as the familywise significance level, and whether or not to use a pooled variance error in all of the pairwise comparisons (rather than using the variances computed directly from each level of the source term pairs).

The next section of the configuration file deals with random factors that might appear in the experimental design.

---
Section concerning Factor Types
1   Indicates which factors are random variables
    = Sum_{c = random factor column number} 2^{c}
    This should always be odd since the subjects
    column (column 1) is always a random factor
0   Satterthwaite F Computation Type
    Use approximations of effect and error = 0
    Use approximations in denominator only = 1
0   Compute Pairwise Post-Hoc Results for Sources w/
    Random Components. 0 = No, 1 = Yes
---

We already saw in the previous section how to use the first subsection entry to specify which factors are random factors. The second parameter tells CLEAVE exactly how to compute the quasi-F values which sometimes must be computed in order to compare a source term's effect to the appropriate error term. The default "0" requests that CLEAVE use quasi-Fs in both the numerator and denominator, and has the advantage that the F value cannot be negative, while a "1" tells CLEAVE to use quasi-F calculations only in the denominator (error) term.
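The quasi-F machinery named in the "Satterthwaite F Computation Type" entry rests on Satterthwaite's approximation: when an error (or effect) term must be assembled from a combination of mean squares, its effective degrees of freedom are approximated as below. This is a generic sketch of that standard formula for a plain sum of mean squares, not a transcription of CLEAVE's actual code:

```python
def satterthwaite_df(mean_squares, dfs):
    """Effective degrees of freedom for a sum of independent mean
    squares (Satterthwaite approximation), as used for the denominator
    (and optionally the numerator) of a quasi-F ratio."""
    return sum(mean_squares) ** 2 / sum(ms ** 2 / df
                                        for ms, df in zip(mean_squares, dfs))

# Two equal mean squares on 10 df each combine to an effective 20 df:
print(satterthwaite_df([2.0, 2.0], [10, 10]))  # 20.0
```

When the component mean squares are unequal, the effective df drops below the simple sum of the individual dfs, which is what makes quasi-F tests more conservative.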
The third parameter in the Factor Types section allows the user to compute pairwise post-hoc tables for source terms with random components - which ordinarily the user will not want, since the levels of such a source will not be all that interesting (they were just randomly selected, after all).

---
Section concerning Repairing/Detecting Problematic Data Sets
0.01  Percentage (in %) of missing data that can be interpolated
      to balance a design (use 0.0 to turn this feature off)
0.01  Percentage (in %) of excess "entangled" data that can be
      excised to balance a design (use 0.0 to turn this feature off)
24    Maximum Number of bins for the data histogram for detecting
      outliers (minimum of 1)
0     Simply average repeated data values (=1) or create a new
      REPEATS factor (=0)
---

This penultimate section of the "cleave.cnf" file helps deal with data sets which are deficient in some way. The first two parameters help the user deal with slightly imbalanced data sets - ones with a little too much data or too little data. In both, the user selects a percentage value which tells CLEAVE to interpolate or delete data points provided that the number of missing or extra points, respectively, is less than the specified percentage (compared to the amount of present data). E.g. in the template data file above, 0.01% is the threshold selected, and so if CLEAVE detects that a data file has fewer than 1 in 10,000 data points missing from being a balanced data set, CLEAVE will attempt to interpolate the missing points by using the cell means and standard deviations.

The third parameter above ("24") tells CLEAVE the maximum size of the histogram which appears at the start of the output file. E.g. one might wish to increase this size to help detect the exact placement of an outlier data point. Finally, the fourth parameter listed above instructs CLEAVE on how to treat duplicate data lines - those lines appearing in the data set which are identical save for the outcome value in the last column.
CLEAVE can either simply average all identical duplicates' values together or create a new factor to divide the duplicate data into separate cells.

Finally, the last section of parameters helps the user control the output by selecting factor interaction orders:

---
Section concerning Restricting data output via interaction orders
6   Order of Interaction Levels of Basic Statistics to Output
    (0 = output them all [default])
0   Order of Interaction (and above) to classify as part of
    the error term (=0 to turn off)
4   Order of Interaction at which to compute GG epsilons using
    an approximation routine.
---

The first parameter can be used to shorten output considerably when dealing with highly multi-way ANOVA designs - ones where the joint factor means are of little interest. The second parameter is used to specify that the ANOVA model you desire chops off fixed effects at a certain interaction order and thereby pools those sums of squares into the appropriate error terms. The final parameter is used to speed up the computation of high-order interaction GG epsilons by using an approximate routine that is exact only for uniformly covarying levels.

In the next section we see how most of these options come together to form the output of CLEAVE.

2. SAMPLE CLEAVE OUTPUT

The enhanced output of CLEAVE is demonstrated in the following example from the output file using the command "cleave Position <repeats >output" with the "cleave.cnf" file configured as in the previous section:

*****************************************************************************
CLEAVE: Duplicate data lines...creating new REPEATS factor.
CLEAVE (c) Timothy Herron January 30, 2005 Version

                    Data Histogram
----------------------------------------------------
Bin Left Edge    # of Data Points    Bin Right Edge
----------------------------------------------------
     2                   3                3.5
     3.5                 7                5
     5                  18                6.5
     6.5                 7                8
     8                  12                9.5
     9.5                 4               11
    11                   7               12.5
    12.5                 2               14
----------------------------------------------------

Source: Grand Mean
Positio Repeats      N      Mean    Std Dev   Normed Ranges
                    60    7.2500     2.7778     **|**
Subjects:
S1                  12    6.3333     2.8391     **|**
S2                  12    7.1667     2.7579     **|**
S3                  12    7.0833     2.8431      *|**
S4                  12    7.5000     2.7136      *|**
S5                  12    8.1667     2.8868      *|**

Source: Position
Positio Repeats      N      Mean    Std Dev   Normed Ranges
Center              20    7.3000     2.6378     **|**
Left                20    6.1000     2.2688     **|**
Right               20    8.3500     3.0310      *|**

Source: Repeats
Positio Repeats      N      Mean    Std Dev   Normed Ranges
        1           15    4.2667     1.0998     **|**
        2           15    6.4000     1.2984     **|**
        3           15    7.4000     1.3522     **|**
        4           15    10.933     1.7099     **|**

Source: Position Repeats
Positio Repeats      N      Mean    Std Dev   Normed Ranges
Center  1            5    4.2000    0.83666      *|*
Center  2            5    6.4000    0.54772      *|*
Center  3            5    7.6000     1.1402      *|*
Center  4            5    11.000    0.70711      *|*
Left    1            5    3.6000     1.1402      *|*
Left    2            5    5.2000    0.83666      *|*
Left    3            5    6.4000    0.89443     **|*
Left    4            5    9.2000    0.83666      *|*
Right   1            5    5.0000     1.0000      *|*
Right   2            5    7.6000     1.1402      *|*
Right   3            5    8.2000     1.4832      *|*
Right   4            5    12.600     1.3416      *|*

FACTOR      LEVELS   TYPE     VARIABLE   DIMENSION   BALANCE
----------------------------------------------------------------------------
SUBJECTS       5              Random     Crossed
Position       3     Within   Fixed      Crossed     Uniform
Repeats        4     Within   Fixed      Crossed     Uniform
Values        60     Data
----------------------------------------------------------------------------

                     SS            df    Eta^2 (R^2)
Total Sum Squared:   455.25        59
S/                    21.3333333    4    0.0469

SOURCE      SS            df    MS          F        p
Position    50.7           2    25.35       33.99    0.0001 ***
PS/          5.96666667    8    0.7458333

Partial Omega^2: 0.8919    Scheffe Test p=0.050:  8.92
Partial Eta^2:   0.8947    Eta^2 (R^2):        0.1114

Lower Bound Epsilon:            0.5000    0.0043 ***
Box-Geisser-Greenhouse Epsilon: 0.5376    0.0033 ***
Huynh-Feldt Epsilon:            0.5772    0.0025 ***

Scheffe
p=0.050 (GG): 14.42

Power(GG)=>   0.50  0.70  0.80  0.90  0.95  0.99
Sig=0.100        2     2     2     3     3     3   subjects
Sig=0.050        3     3     3     3     3     4   subjects
Sig=0.010        4     4     4     5     5     5   subjects

Pairwise Comparisons; Familywise Error: 0.0500 ; Bonferroni Prob.: 0.0167
 0.0500  1    1
-1.15    2    0.0044*  2
 1.10    3    0.0088*  0.0001*  3
              1        2        3
              Cente    Left     Right

SOURCE      SS             df    MS           F         p
Repeats     348.183333      3    116.0611     125.85    0.0000 ***
RS/          11.0666667    12    0.9222222

Partial Omega^2: 0.9690    Scheffe Test p=0.050: 10.47
Partial Eta^2:   0.9692    Eta^2 (R^2):        0.7648

Lower Bound Epsilon:            0.3333    0.0004 ***
Box-Geisser-Greenhouse Epsilon: 0.4687    0.0000 ***
Huynh-Feldt Epsilon:            0.6465    0.0000 ***

Scheffe p=0.050 (GG): 17.24

Power(GG)=>   0.50  0.70  0.80  0.90  0.95  0.99
Sig=0.100        2     2     2     2     2     2   subjects
Sig=0.050        2     2     2     2     2     2   subjects
Sig=0.010        2     3     3     3     3     3   subjects

Pairwise Comparisons; Familywise Error: 0.0500 ; Bonferroni Prob.: 0.0083
-2.98    1    1
-0.850   2    6e-06*   2
 0.150   3    9e-08*   0.0036*  3
 3.68    4    2e-11*   1e-09*   2e-08*   4
              1        2        3        4
              1        2        3        4

SOURCE      SS             df    MS           F       p
PR           5.96666667     6    0.9944444    1.98    0.1080
PRS/        12.0333333     24    0.5013889

Partial Omega^2: 0.1973    Scheffe Test p=0.050: 15.05
Partial Eta^2:   0.3315    Eta^2 (R^2):        0.0131

--------------------------------------------------------------------------
TREATMENT EFFECTS IN ORDER OF SIGNIFICANCE AND THEN SIZE
            Partial   Significance   Error     Error      Eta
Source      Omega^2   Levels         Level     Types      Squared
--------------------------------------------------------------------------
Repeats     0.9690    0.0100*        0.9222    Subjects   0.76482
Position    0.8919    0.0100*        0.7458    Subjects   0.11137
PR          0.1973    1.0000         0.5014    Subjects   0.01311

Cumulative R^2 (Eta^2) Due to All Source Terms: 0.8893
* = Significance Levels Modified by Box-Geisser-Greenhouse Epsilons
*****************************************************************************

We will analyze the above output for the factor labelled "Position" to see CLEAVE's features. Each output file divides into three main parts: basic information, then statistical information, and then a summary list at the end.
CLEAVE: Duplicate data lines...creating new REPEATS factor.

Note that CLEAVE alerts us to the fact that it has detected and generated a new factor to take care of data rows with duplicate factor levels (including the subject levels). CLEAVE then prints out basic information about the data, starting with a histogram of all of the input data:

                    Data Histogram
----------------------------------------------------
Bin Left Edge    # of Data Points    Bin Right Edge
----------------------------------------------------
     2                   3                3.5
     3.5                 7                5
     5                  18                6.5
     6.5                 7                8
     8                  12                9.5
     9.5                 4               11
    11                   7               12.5
    12.5                 2               14
----------------------------------------------------

The histogram's purpose is primarily to help the user detect outliers, and it also provides some information on the distribution of the data error. In this case we see that most of the data points lie in the 4 bins spanning the range from 3.5 to 9.5, and that there is a rough symmetry to the data distribution.

The second section of the output prints out some basic information about each of the main and interaction effects in the ANOVA design. In the example above, the main effect "Position" is seen to have 3 levels, "Center", "Left", and "Right", and each level's outcomes' mean and standard deviation are reported:

Source: Position
Positio Repeats      N      Mean    Std Dev   Normed Ranges
Center              20    7.3000     2.6378     **|**
Left                20    6.1000     2.2688     **|**
Right               20    8.3500     3.0310      *|**

The Normed Ranges column provides information on the normalized (in std. dev. units) maximum and minimum data values in the Center, Left, and Right positions, respectively. The Ranges are intended to help the user detect the presence of outliers, which will likely make the graph lopsided (e.g. **|*****, which would indicate that the maximum data value is greater than 5 standard deviations from the mean). In the above Normed Ranges, no data point in any of the 3 positions is more than 2 standard deviations above or below the mean ("|" indicates the mean).
After basic information is printed about both main and interaction terms, the statistical info is printed. The first part of CLEAVE's statistical output is the summary information about the factors in the design:

FACTOR      LEVELS   TYPE     VARIABLE   DIMENSION   BALANCE
----------------------------------------------------------------------------
SUBJECTS       5              Random     Crossed
Position       3     Within   Fixed      Crossed     Uniform
Repeats        4     Within   Fixed      Crossed     Uniform
Values        60     Data
----------------------------------------------------------------------------

This informs the user whether each factor is a within or a between factor and how many levels the program detected within each factor. CLEAVE also tested the data and found that the levels are crossed. Finally, CLEAVE figures out whether a factor is uniformly balanced, is nonuniformly proportional (i.e. the data is a so-called proportional design experiment), or is totally unbalanced.

Next, CLEAVE prints out the total sums of squares and degrees of freedom for this particular ANOVA design:

                     SS            df    Eta^2 (R^2)
Total Sum Squared:   455.25        59
S/                    21.3333333    4    0.0469

In a pure repeated measures design, as is the case here, the total SS and df should be equal to the summed SS's and df's of all of the following factor combinations, respectively (and we see this is true: 455 = 21+51+6+348+11+6+12 and 59 = 4+2+8+3+12+6+24). CLEAVE also computes the SS and df due to the subjects, as well as printing the Eta^2 value of the subjects (21.3333333/455.25 = 0.0469) for comparison to later Eta^2 values. CLEAVE then produces various statistics for main effects and then interactions in increasing order of interaction size (second order, then third order, etc.).
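The additivity claim above is easy to verify directly from the numbers in the sample output; the dictionaries below simply transcribe the SS and df of each source and error term:

```python
import math

# Sums of squares and degrees of freedom from the sample output:
ss = {"S/": 21.3333333, "Position": 50.7, "PS/": 5.96666667,
      "Repeats": 348.183333, "RS/": 11.0666667,
      "PR": 5.96666667, "PRS/": 12.0333333}
df = {"S/": 4, "Position": 2, "PS/": 8,
      "Repeats": 3, "RS/": 12, "PR": 6, "PRS/": 24}

# In a pure repeated measures design the terms partition the total:
assert math.isclose(sum(ss.values()), 455.25, abs_tol=1e-4)
assert sum(df.values()) == 59
print("SS and df decompositions check out")
```

This kind of cross-check is a quick way to confirm that no source term was dropped or double-counted when reading a CLEAVE output file.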
The omnibus F test of the main effect of factor "Position" is recorded in the lines:

SOURCE      SS            df    MS          F        p
Position    50.7           2    25.35       33.99    0.0001 ***
PS/          5.96666667    8    0.7458333

and is recognizable from any ANOVA program - the lines indicate the sum of squares, degrees of freedom and mean squares of the main effect "Position" and its proper error term "PS/" (the slash "/" is the usual sign which indicates nesting of between factors inside of the subject factor - in this case there are no between factors nested). We see that the above F value is highly significant (shown by the "***" beside the p value, which indicates that the p value is less than "Sig[2]").

The next two lines:

Partial Omega^2: 0.8919    Scheffe Test p=0.050:  8.92
Partial Eta^2:   0.8947    Eta^2 (R^2):        0.1114

record some of the treatment effects and one post-hoc test value associated with the particular source named above ("Position"). Partial Omega^2 and Eta^2 values can be used to estimate the effect that the indicated source factor combination (or main effect) has on the experiment output value. Partial omega squared is the ratio of the variance due to the source level means to the sum of the source level mean variance plus the variance due to the error term. Partial eta squared is a simpler value and is just the ratio of the sum of squares due to the source factors to the sum of the sums of squares of the source and error factors. Finally, R^2 is the ratio of the source sum of squares (SS) to the total ANOVA sum of squares. See later in the documentation for ways to use these three treatment magnitude values. In the above example it appears that our factor "Position" produces treatment magnitude effects which are quite strong: 1.0 is the maximum value for both partial values, and the 0.1114 R^2 says that more than 11% of the total variation is due to the Position factor. In addition, the Scheffe post-hoc values were computed for source "Position" above.
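The partial eta squared and R^2 ratios just defined can be reproduced straight from the "Position" source table (partial omega squared additionally involves estimated variance components, so it is not recomputed in this sketch):

```python
ss_effect = 50.7         # SS for Position
ss_error = 5.96666667    # SS for its error term PS/
ss_total = 455.25        # Total Sum Squared

# Partial eta^2: source SS over source-plus-error SS.
partial_eta_sq = ss_effect / (ss_effect + ss_error)
# R^2: source SS over the total ANOVA SS.
r_sq = ss_effect / ss_total

print(round(partial_eta_sq, 4))  # 0.8947 (matches "Partial Eta^2")
print(round(r_sq, 4))            # 0.1114 (matches "Eta^2 (R^2)")
```

Reproducing these two ratios by hand is a useful sanity check when comparing CLEAVE's output against another ANOVA package.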
This critical F value can be used to conduct any number of linear post-hoc tests on the factor levels to check which levels, if any, significantly affect the outcome of the experiment. The way to use Scheffe values is to run CLEAVE again on an altered experimental data set, a process which we explain later in this documentation. The next three lines:

     Lower Bound Epsilon:            0.5000    0.0043 ***
     Box-Geisser-Greenhouse Epsilon: 0.5376    0.0033 ***
     Huynh-Feldt Epsilon:            0.5772    0.0025 ***

give estimates of the various Box correction factors for the immediately preceding F value. Each of the values can be used to reduce both of the degrees of freedom of the sums of squares in order to more accurately estimate the probability that the null hypothesis is true. The Lower Bound estimate is the most conservative value, while the Geisser-Greenhouse epsilon is the maximum likelihood estimate of the Box correction. Finally, the Huynh-Feldt estimate attempts to correct the bias that the Geisser-Greenhouse epsilon has when the "true" Box correction is near 1 [H-F should only be used when G-G > 0.8, if used at all]. For example, if we want to be very conservative, we can use the Lower Bound estimate in the above example to reduce the effect ("POSITION") and the error ("PS/") degrees of freedom to 0.5*2 = 1 and 0.5*8 = 4, respectively. In this case, the computed F value of 33.99 gives a probability of maintaining the null hypothesis of 0.0043, which we print at the right of the Lower Bound Epsilon line for convenience (use this "p" value instead of the "0.0001" that you originally obtained in the above example). The next source-associated line reads:

     Scheffe p=0.050 (GG): 14.42

CLEAVE recomputes the Scheffe post-hoc value by factoring in the Geisser-Greenhouse correction.
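An epsilon-adjusted p value can be checked by hand: multiply both degrees of freedom by the epsilon and re-evaluate the upper tail of the F distribution. A self-contained Python sketch (stdlib only; the incomplete-beta continued fraction follows the standard Lentz form, and the example numbers are taken from the sample output above):

```python
import math

def _betacf(a, b, x):
    # Continued-fraction evaluation used by the regularized incomplete
    # beta function (modified Lentz's method).
    FPMIN, EPS = 1e-30, 3e-10
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = FPMIN if abs(d) < FPMIN else d
    d = 1.0 / d
    h = d
    for m in range(1, 200):
        m2 = 2 * m
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = FPMIN if abs(d) < FPMIN else d
        c = 1.0 + aa / c
        c = FPMIN if abs(c) < FPMIN else c
        d = 1.0 / d
        h *= d * c
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = FPMIN if abs(d) < FPMIN else d
        c = 1.0 + aa / c
        c = FPMIN if abs(c) < FPMIN else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < EPS:
            break
    return h

def _betai(a, b, x):
    # Regularized incomplete beta function I_x(a, b).
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    ln_bt = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
             + a * math.log(x) + b * math.log(1.0 - x))
    bt = math.exp(ln_bt)
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b

def f_prob(f, df1, df2):
    # Upper-tail probability P(F(df1, df2) > f); df may be fractional.
    return _betai(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * f))

eps = 0.5   # Lower Bound epsilon for a 3-level within factor
print(round(f_prob(33.99, 2, 8), 4))              # 0.0001 (unadjusted)
print(round(f_prob(33.99, 2 * eps, 8 * eps), 4))  # 0.0043 (adjusted)
```

Fractional degrees of freedom are handled naturally, which also makes this usable for checking Geisser-Greenhouse and Satterthwaite-adjusted p values.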
Using this value is preferred over using the uncorrected value computed above because this new value takes into account the distortion of the source distribution introduced by covariance effects (and unequal variance effects). Following this we find that CLEAVE prints out a power table:

     Power(GG)=>   0.50   0.70   0.80   0.90   0.95   0.99
     Sig=0.100        2      2      2      3      3      3   subjects
     Sig=0.050        3      3      3      3      3      4   subjects
     Sig=0.010        4      4      4      5      5      5   subjects

The interpretation of this power table is simple: the top row lists the desired power that the user wishes to achieve in a future experimental run, say 0.80 or 80%. And the first column lists the three possible levels of significance that the user might use in that future run, say 0.05. Then, CLEAVE predicts that the user will need to run 3 subjects in the experiment to have an 80% chance of attaining a 0.05 significance level, assuming that factor "POSITION"'s effect in future runs of the experiment is approximately as it was in this experimental run (where we used 5 subjects - so we perhaps wasted some time by testing too many subjects). Note that the "(GG)" in the above table indicates that the table has been corrected for source covariance anomalies by using the Box-Geisser-Greenhouse epsilon. CLEAVE makes this correction by using the corrected (GG) degrees of freedom above. The next section of the output is where the post-hoc pairwise factor level comparisons are listed:

Pairwise Comparisons; Familywise Error: 0.0500 ; Bonferroni Prob.: 0.0167

      0.0500   1                         1
     -1.15     2   0.0023*               2
      1.10     3   0.0049*   4e-05*      3
                   1         2           3
                   Center    Left        Right

What we see is a simple table which lists the t-test probabilities run on all pairs of the three factor levels of factor "POSITION". The Bonferroni correction is used to control the familywise error, and we find that each of the factor levels is significantly different from each other level (all comparisons performed simultaneously).
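The Bonferroni per-comparison probability in the table header is just the familywise error divided by the number of pairwise comparisons; a quick Python check:

```python
# k levels give k*(k-1)/2 pairwise comparisons; Bonferroni divides the
# familywise error rate evenly among them.
k = 3                       # levels of "Position"
familywise = 0.05
n_pairs = k * (k - 1) // 2  # 3 comparisons
print(round(familywise / n_pairs, 4))  # 0.0167
```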
We can see this by noting that all normalized level means (the values 0.05, -1.15, and 1.10) differ from one another by more than the approximate Bonferroni distance, which is 0.824, and that the more precisely computed p values in the table (0.0023, 0.0049, and 0.0000) indicate that every pair is significantly different from every other. Lastly, CLEAVE produces a new section summarizing important data values at the end of the program:

--------------------------------------------------------------------------
TREATMENT EFFECTS IN ORDER OF SIGNIFICANCE AND THEN SIZE

            Partial    Significance   Error     Error       Eta
Source      Omega^2    Levels         Level     Types       Squared
--------------------------------------------------------------------------
Repeats     0.9690     0.0100*        0.9222    Subjects    0.76482
Position    0.8919     0.0100*        0.7458    Subjects    0.11137
PR          0.1973     1.0000         0.5014    Subjects    0.01311

Cumulative R^2 (Eta^2) Due to All Source Terms: 0.8893
* = Significance Levels Modified by Box-Geisser-Greenhouse Epsilons
--------------------------------------------------------------------------

This is an ordered list of ANOVA sources which tells us the estimated treatment magnitudes (source effects) and significance levels achieved by each source. The only sources listed are those whose partial omega squared is greater than zero, and the sources are listed in order of significance level, and then in partial omega squared order. Note that the significance levels are divided into 4 classes: highest significance (0.01), medium significance (0.05), lesser significance (0.10), and not significant (1.0). Also, CLEAVE indicates whether or not each significance value was computed by taking into account the Box-Geisser-Greenhouse factor. The Error Level column records the value of the error term's mean square that was used to compute the row's partial omega squared.
These values provide a good check on whether the list of partial omega squared values can really be used for comparisons: see whether the error level values are reasonably constant (which is an assumption of the standard ANOVA procedure). At the bottom of the list CLEAVE displays the cumulative R^2 value as computed from all three source terms for which R^2 values were computed. Nearly 89% of the SS values are accounted for in these three source terms - as opposed to being contained in error terms (or due to the subjects themselves). The intent of this list is to provide the CLEAVE user a convenient summary of treatment magnitudes so that the user can gain some idea of which main and interaction terms really matter to the experimental outcome. Here we see that the main effects are both important, while the interaction effect is not so important - though an effect of 0.0895 is not so trivial and might be significant with a few more subjects in the experiment. Finally, we again note that 89% of the total sum of squares is accounted for by the main and interaction terms, indicating that the signal to noise ratio in the data produced by the experiment is rather high.

3. NOTES ON USING CLEAVE'S FEATURES

USING VARIOUS ANOVA DESIGNS AND PRODUCING F VALUES

For most ANOVA designs that the program CLEAVE can handle, the basic computations are the same - compute the sums of squares (SS) and degrees of freedom for all relevant factor combinations by using the proportional design equations found in many ANOVA references. Then mean squares (MS) and F ratios can be computed - each F is the ratio of the MSs for the source and its error term - and thus "p" values computed using standard equations for the F distribution's cdf. The only relevant exception to this story is when there are multiple random factors other than the SUBJECTS factor in the design.
In the case where there are two or more random factors not in a source effect under consideration, then in order to test the hypothesis that the effect's variation implies no real effect, CLEAVE must use at least 4 MS values to compute a quasi-F value and a subsequent p value. In fact, if there are n>1 random factors not included in the source term, then CLEAVE uses the classic Satterthwaite formulas to compute a quasi-F value which uses 2^n MS/df pairs. Quasi-F statistics are not true F statistics, but when used along with the degrees of freedom as specified by Satterthwaite, they approximate the correct distribution by using an F distribution which matches the correct distribution's first two moments. The user can tell when CLEAVE makes such a computation by seeing the header:

SOURCE     Satterthwaite df    MS    Quasi-F    p

In this case no SS values are displayed, since at least 4 MS values go into computing the MS (and df) values of the numerator and denominator. The user has a choice of computing one of two quasi-F values. The first is to let MS(effect) be the numerator of the quasi-F and put the other MSs in the denominator. The other is to use (2^n)/2 MSs in both the numerator and the denominator of the quasi-F. The advantage of the latter scenario is that only addition is used to combine the appropriate MSs. Most of the theoretical work that has been done concerns the second, addition-only quasi-F computation, and it is the best one to use according to some authors (it provides "p" values that are closer to being correct). However, some authors recommend the first Satterthwaite method - modifying the denominator only - because its computed "p" values are competitive with the second method and because the source MS is not tampered with. However, the first method can end up with a negative denominator - which makes little numerical sense for a mean square. In that case CLEAVE will switch to using the addition-only method.
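The Satterthwaite degrees of freedom for any signed combination of mean squares follow the classic formula df = (sum of c_i*MS_i)^2 / sum((c_i*MS_i)^2 / df_i). A Python sketch (the MS/df values below are made up for illustration; they are not CLEAVE output):

```python
# Satterthwaite degrees of freedom for a linear combination of mean
# squares, as used to build quasi-F numerators and denominators.
def satterthwaite_df(terms):
    """terms: list of (coefficient, MS, df) triples."""
    combined = sum(c * ms for c, ms, _ in terms)
    return combined ** 2 / sum((c * ms) ** 2 / df for c, ms, df in terms)

# Hypothetical additive denominator built from two MS/df pairs
# (made-up values, for illustration only).
den = [(1.0, 0.90, 12), (1.0, 0.30, 24)]
print(round(satterthwaite_df(den), 1))  # 20.2 -- fractional df are normal here
```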
The output of the Satterthwaite equation looks like the following:

SOURCE                  Satterthwaite df    MS           Quasi-F    p
A       Numerator:      2.5                 1.215278     1.29       0.3377
->ABC   Denominator:    7.9                 0.9444444

where factors "B" and "C" are both random factors. Notice that there can be fractional degrees of freedom in the denominator (and in the numerator when using the second method). Note that the first column says that the source factor is "A", and "->ABC" indicates that the highest order interaction term used by Satterthwaite is "ABC". The second method is the default method for CLEAVE, but the first method can be chosen by altering the cleave.cnf lines:

0    Satterthwaite F Computation Type
         Use approximations of effect and error = 0
         Use approximations in denominator only = 1

USING BOX-GEISSER-GREENHOUSE CORRECTION VALUES

The Box correction should be computed whenever there is significant unevenness in the variance or covariance of factor levels, as is typical of repeated measures designs. The Box correction value is used to compute both the Geisser-Greenhouse and Huynh-Feldt epsilons, which are parameters for approximating the distorted test distribution using an F distribution. In fact, uneven (co)variance values cause the mean square statistics of a main or interaction effect to stray from an ideal chi-squared distribution, and the two epsilons above are ways of approximating this deviant distribution by adjusting the degrees of freedom of the chi-squared distribution (they do this by trying to match the first two moments of a true chi-squared distribution to those of the distorted distribution). The F distribution which results when using the Geisser-Greenhouse epsilon (adjusting the degrees of freedom of both the effect and the error mean squares) is usually a conservative approximation and thus can be trusted for use in identifying significant factors.
A widely recommended way to use the Geisser-Greenhouse options is as follows (this is sometimes called the Geisser-Greenhouse algorithm):

1) See if the null hypothesis is refuted using the unadjusted F value probability (at the desired level of significance). If not, you are done (using any of the epsilons only makes the null hypothesis more likely true). But if the null hypothesis IS refuted, go to step 2.

2) Use the Lower Bound estimate to see if the null hypothesis can be refuted. If so, you are done, since using this epsilon assumes the worst-case correlation. If not, go to step 3.

3) Use the Geisser-Greenhouse epsilon to see if the null hypothesis can be refuted. Either way, you are done.

4) Optionally, after step 3, if the Geisser-Greenhouse epsilon value is greater than, say, 0.7, and you can't quite get step 3 to refute the null hypothesis (e.g. p = 0.057), then use the Huynh-Feldt epsilon to see whether the null hypothesis can be refuted with it. If so, then be prepared to convince your paper referee that the H-F epsilon is a kosher method to use to wring significance out of your borderline-significant experimental data.

However, the 4 steps just mentioned were formulated back in the day when computation was more expensive than it is today, and so the idea was to avoid computing the GG epsilon values, if one could, even for a significant F result. Given the demonstrated reliability (though it is a bit overly conservative) of the Geisser-Greenhouse modification to the degrees of freedom for any F, a reasonable approach is to simply always use the GG epsilons - in effect you are dispensing with the ANOVA assumption of homogeneity of (co)variances.
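The stepwise procedure above can be sketched as a small decision function (a Python sketch; the function name and return labels are our own, not part of CLEAVE):

```python
# Stepwise "Geisser-Greenhouse algorithm" as described above. The p
# values are read off one source term's lines in CLEAVE's output.
def gg_algorithm(p_unadjusted, p_lower_bound, p_gg, alpha=0.05,
                 p_hf=None, eps_gg=None):
    # Step 1: not significant even without any correction -> stop.
    if p_unadjusted > alpha:
        return "not significant"
    # Step 2: survives the worst-case (Lower Bound) correction -> done.
    if p_lower_bound <= alpha:
        return "significant"
    # Step 3: fall back on the Geisser-Greenhouse estimate.
    if p_gg <= alpha:
        return "significant (GG-corrected)"
    # Step 4 (optional): try Huynh-Feldt when the GG epsilon is large.
    if p_hf is not None and eps_gg is not None \
            and eps_gg > 0.7 and p_hf <= alpha:
        return "significant (HF-corrected, borderline)"
    return "not significant"

# Values from the "Position" example earlier in this document:
print(gg_algorithm(0.0001, 0.0043, 0.0033))  # significant
```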
Ignoring the Huynh-Feldt epsilon is reasonable, too: though it corrects rather large, but overly conservative, GG epsilons, the actual effect on the resulting "p" value is not that great (so the user loses little power by just sticking with the Geisser-Greenhouse epsilons), and there have been studies in which the H-F epsilon produces a liberal p value test. The algorithm which computes the Geisser-Greenhouse epsilon relies on one assumption which may be of interest to the user: in the case where there are "between" variables not included in the source factor, the algorithm simply averages together the covariance matrices within each separate group (of all the auxiliary between factors) in order to compute the Box correction factor. If, in fact, one has reason to believe that there may be significant differences in these averaged covariance matrices, then the G-G epsilon might not provide a fair adjustment to the "p" values. However, the lower bound epsilon is what it claims to be even in this case, so it provides a conservative "p" value to the user. In the case that the user has non-subject random variables included in the experimental design, CLEAVE is still able to use Geisser-Greenhouse and Lower Bound adjustments to the p values. This requires computing a separate epsilon (G-G or L-B) for each of the MS/df pairs used in computing the unadjusted F value. Thus, we end up with separate epsilons for the numerator and denominator of the computed F value, which are used to adjust - separately in this case - the degrees of freedom of the numerator and denominator of the F or quasi-F.
CLEAVE shows the user the results in the following format:

SOURCE     SS            df    MS           F        p
B          6.88888889     2    3.444444     12.40    0.0193 **
AB         1.11111111     4    0.2777778
     Lower Bound Epsilons:            (num: 0.500, den: 0.250)    0.1762
     Box-Geisser-Greenhouse Epsilons: (num: 0.625, den: 0.324)    0.1344

Here factor "A" is a random variable, and so there are separate epsilons for source term "B" and error term "AB" (numerator and denominator, respectively). In the case where a quasi-F value is computed, the numerator epsilon and denominator epsilon are both composites of the epsilons used to adjust each "df" appearing in the Satterthwaite df equation. In addition, the user can compute Geisser-Greenhouse-like epsilons for randomized ANOVA designs (non-repeated measures experiments containing pure between factors). The theoretical underpinning is the same as it is for within factors designs - see Box (1954) - so epsilons are used to correct degrees of freedom and thus to compensate for variance heterogeneity. And, in fact, we have done Monte Carlo simulations which show that the corrected degrees of freedom, under various heterogeneity scenarios, do well in making corrected "p" values reflect their stated significance levels with balanced ANOVA designs, which are CLEAVE's specialty. See the file "randbox.txt" for a brief description of the simulations and the results.
If the user chooses to compute Box epsilons for pure between factors main effect and interaction terms, CLEAVE will produce the following kind of result (the following example is reproduced in the file bogartz6.out and concerns the between main effect labeled "B"):

SOURCE     SS            df    MS          F        p
B          75.2839506     2    37.64198    17.03    0.0001 ***
S/BC       39.7777778    18    2.209877
     Lower Bound Epsilons:            (num: 0.500, den: 0.111)    0.0540 *
     Box-Geisser-Greenhouse Epsilons: (num: 0.988, den: 0.423)    0.0016 ***

As with repeated measures designs and Geisser-Greenhouse epsilons, CLEAVE produces multiple epsilons: one a "worst case" pair of epsilons, and the other a best estimate (Box) pair of epsilons. But as in the random factors case, for between factors the Box epsilons are different for the numerator degrees of freedom and the denominator degrees of freedom. So, for example, the Lower Bound Epsilon "p" value of 0.0540 was computed by using the F value of 17.03 along with the adjusted degrees of freedom 2*0.500 (numerator) and 18*0.111 (denominator). But the above factor "B" had little likely heterogeneity of variance, as indicated by the Box epsilon of 0.988 for the numerator and the subsequent adjusted "p" value of 0.0016. One can use the between factor's Box epsilons in the same way that one uses Geisser-Greenhouse corrections in the repeated factors case: as part of the Geisser-Greenhouse algorithm, or by always using them. The latter policy is made more attractive by the fact that only variance vectors, not covariance matrices, need to be computed for pure between factor terms. Similar independent epsilons are computed when the error term is constituted, in part, of high-order interaction terms (e.g. when cells contain only 1 value and the highest order interaction term is the error). This can happen because epsilons are never computed for such high-order interactions used in the error term, even when the epsilons are computed for the effect terms. One last comment.
In our lab, we have noticed when processing highly multi-way data using CLEAVE that many high-order interaction factors show significant GG correction epsilons (i.e. they are closer to the lower bound value than to 1 [1 = a true "F" dist]). To us, this implies that when analyzing highly factorial experiments, we would be remiss not to use the Box-Geisser-Greenhouse option when looking for significant experimental factors. This is especially true when one realizes that with multi-way and/or large data sets, it is hard to get a feel for whether ANOVA's standard "equal-variance" or "covariance sphericity" assumptions hold for all factor interaction terms. If you always use the Box options as standard policy, then you have to worry less about those two assumptions (don't worry - you've still got normality and other linearity assumptions to keep you on your toes...).

USING SOURCE TREATMENT MAGNITUDES

When perusing the usual results of a multi-way ANOVA, an important question to ask is whether the significant (or nearly significant) results one finds using an omnibus F test have large or small effects on the outcome of the experiment. This is not an obvious question to answer from the F or p values alone, because with enough subjects in the experiment, even a trifling effect can be made to pass even a severe significance test. CLEAVE includes three parameters which can help answer this question. Partial omega squared is the ratio of the variance estimate due to the effect divided by that same variance plus the error variance. And partial eta squared is the ratio of the effect sum of squares to that same sum of squares plus the error term's sum of squares. Partial omega squared provides a direct look at the source term's effect within the linear ANOVA model equation. Partial eta^2 is an analogue of the regression coefficient which estimates how important a coefficient is to the multilinear regression equation.
Third, Eta squared, which is often called R squared, is the ratio of the source SS (sum of squares) to the total SS and shows the user what fraction of the total variation is due to the source term being considered. The two "partial" values are only ratios of the source (effect) term to the effect plus (local) noise, whereas the R squared value is a ratio of the source magnitude to the total sum of squares magnitude. All three treatment magnitude values are valuable to a researcher because they strip away the degrees of freedom which are intrinsic in the "F" and "p" values computed for each source, and the presence of those df's makes comparing source terms' "F" or "p" values a *very* bad idea. In the linear model generally assumed when performing an ANOVA, partial omega^2 is a relative estimate of the variance (sum of squares of effect) of each of the factors or factor interactions. Thus, the larger this value is, the less likely the null hypothesis is true. Further, it is sometimes kosher to compare treatment magnitude terms of different interaction terms to one another. If one source term's partial omega^2 is an order of magnitude larger than that of another source term, then it is likely that the first source term has a larger effect on the experimental value than does the latter. Similar remarks apply to partial eta squared. In contrast to the two "partial" magnitude values, R^2 tells the user in a simple way how the amount of variation due to the effect compares with the amount of variation due to other source effects. And at the end of the treatment magnitude list, the user can see how much of the total sum of squares variance is due to all of the source terms whose F values were computed - the balance is sum of squares variance due to error terms, which are computed earlier but do not always appear in the column of error terms. It is not as straightforward to use these values as it is to use the "F" and "p" values computed for each source term.
However, it appears to be the case that in some fields it is known that a researcher should look for treatment magnitude effects of a certain size in order for those factors to be claimed significant to the outcome of the experiment. For example, Keppel states that in psychological experiments, decades of experience show that partial omega^2 values can be loosely categorized into small, medium, and large effects by looking for values in the vicinity of 0.01, 0.06, and 0.11, respectively. The size of partial omega squared (and partial eta^2) depends upon the ratio of "signal to noise" that you see in the experiments that you are running. If there is a lot of random fluctuation in your experiments compared to those in another field, then your treatment effect values will, in general, be uniformly smaller. That is why CLEAVE makes no attempt to classify the absolute size of treatment magnitudes and instead just treats them as comparative values in its ordering of source effects at the end of the output file. There is a reason for caution even with this approach, however. Using partial omega^2 values comparatively requires the standard ANOVA assumption that the error level stays "constant" across subjects and factors (= the epsilon that appears in the usual linear ANOVA model). That is why it pays to look at the "Error Level" column in the ordering of treatment magnitudes. One should look to see if these values are distributed in a vaguely normal manner. If not, then the ordering of partial omega squared values might be of little use, and, in fact, the omnibus ANOVA results should be taken with a grain of salt, since there might be some subject-factor interaction lurking in the error terms. However, there is one way in which the absolute values of the treatment magnitudes can make a big difference.
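Keppel's rough benchmarks can be turned into a simple classifier (a Python sketch; treating the cited 0.01/0.06/0.11 landmarks as hard cutoffs is our own loose reading, and the labels are not CLEAVE output):

```python
# Loose small/medium/large categorization of partial omega^2 values,
# after Keppel's benchmarks for psychological experiments as cited
# above. The cutoffs are field-dependent conventions, not statistics.
def effect_size_label(partial_omega_sq):
    if partial_omega_sq >= 0.11:
        return "large"
    if partial_omega_sq >= 0.06:
        return "medium"
    if partial_omega_sq >= 0.01:
        return "small"
    return "negligible"

print(effect_size_label(0.8919))  # large (the "Position" effect earlier)
print(effect_size_label(0.02))    # small
```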
Because CLEAVE's "power/number of subjects" table is computed directly from the value of partial omega squared, the smaller this value (regardless of the field of study you are in), the more subjects you are going to have to run through an experiment to find significance - assuming you are confident enough that the effect you are looking for is really there. Note that because of the way in which partial omega squared is computed, it is possible for the partial omega squared estimate to be negative, despite the fact that a variance can never be negative. However, statisticians recommend that when partial omega squared values are used comparatively, negative values be reported, lest differences in reported values (between different factor combinations) be biased. CLEAVE does not, however, list negative magnitudes in the treatment list table. Finally, we note that in the case where a random factor not in the effect term appears in the error term instead of the usual subject-inclusive error term, partial omega squared uses the subject-inclusive error term in its computation instead of the previous error term, in order to help make the partial omega squared terms more meaningful. The column "Error Types" records which error term is being used: "Subjects" indicates that the subject-inclusive error term is being used (which is the right one to use if there are only fixed factors outside of the effect term), while "Effects" indicates that the partial omega squared uses the effect's proper error term's mean square (the one including random factors) when the subject-inclusive error type is not available for some reason.

USING POWER/SUBJECT TABLES

Given the intuitive display format of the power/subject table - that being the number of subjects that it would take to have a specified chance of attaining the specified significance level - the power table is easy to use.
We wished to make it easy to use so that users will opt to keep this feature turned on as often as possible. This is particularly important to experimenters for at least two reasons. First, reducing the number of subjects saves money, so trial runs can be used effectively to gauge how many subjects will be needed to demonstrate any effect, or to figure out that the experimental setup just doesn't have the requisite power. Second, many funding organizations (e.g. the US National Institutes of Health and the US Veterans Administration) have ethics regulations which urge researchers to estimate the experimental power of their setup in order to help reduce the number of subjects that are needed in the experiment. This helps reduce the number of subjects exposed to the experimental treatment's side effects, if any. Finally, one very interesting use of the power/subjects table as presented in CLEAVE is as a measure of the strength of effect of the source term. If the user fixes a significance level (e.g. 0.05) and a reliability level (say 90%) and reports the number of subjects needed to reach that level with that reliability, this is an intuitive and quite visceral measure of effect strength. No interpretation is needed, compared to partial omega squared or R^2 or other measures (like Cohen's f or d'), because experimenters in a field generally understand approximately how many subjects are needed to attain significance in experiments with strong or weak effects. Note that when there are non-subject random factors in a design, whenever such random factors appear in the error term (the denominator) of the computed F, a subject table cannot be computed for the source term, because the number of subjects does not come into play directly in computing the F value or testing it for significance. CLEAVE skips the computation in that case.
USING POST-HOC AND PAIRWISE TESTS

Scheffe Post Hoc Test

Using the Scheffe post-hoc value is not too hard but takes a little effort on the part of the person using CLEAVE. The main idea is that the user reruns CLEAVE on an altered experimental data set in order to test the analogous source F value against the Scheffe value produced with the original data set. This is most useful when the user wants to test 3-or-more-level contrasts of a given factor. The Scheffe post-hoc value is one of the most conservative post-hoc tests available when one wants to follow up an omnibus ANOVA with more specific interaction or main effect hypothesis tests. It is conservative because it allows one to perform any number of linear hypothesis tests on the source factors while keeping the familywise error under the specified level (i.e. the chance of making even one error across all of the tests is kept below 0.05). An efficient way to use the test is as follows. First, run the omnibus test on your data and observe the Scheffe post-hoc value for each source term you wish to apply a post-hoc test to. Then, edit the input to the CLEAVE program in order to convert the data to perform the test you wish to see, likely one factor at a time. For example, suppose that factor "A" has 4 levels: CONTROL, DRUG1, DRUG2, and DRUG3. If you wish to test the hypothesis that the average of the drugs combined is significant against the control, then use a text editor and change all of the DRUG1, DRUG2, and DRUG3 labels appearing in the data set into the "DRUG" label (a search and replace should make this easy), and then rerun CLEAVE and see if the computed F value for the specified source exceeds the Scheffe value from the omnibus test.
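The relabeling step can be scripted rather than done in an editor. A hypothetical Python sketch (the file names and the whitespace-separated file layout are assumptions on our part; adapt them to your actual CLEAVE input format):

```python
# Collapse several factor levels into one (e.g. DRUG1/DRUG2/DRUG3 ->
# DRUG) so the altered data set can be rerun through CLEAVE for a
# Scheffe contrast. Assumes whitespace-separated label/value fields.
def collapse_levels(in_path, out_path, merged, new_label):
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            fields = [new_label if f in merged else f
                      for f in line.split()]
            dst.write(" ".join(fields) + "\n")

# Hypothetical usage:
# collapse_levels("drugs.dat", "drugs_pooled.dat",
#                 {"DRUG1", "DRUG2", "DRUG3"}, "DRUG")
```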
As another example, if you wish to test the hypothesis that DRUG1's effect is significantly different from DRUG3's effect, then use the text editor on the original data set and remove all of the lines of data containing the labels DRUG2 and CONTROL (this can be done reasonably fast with an editor macro). Then rerun CLEAVE and see if the computed F for the source exceeds the omnibus Scheffe value. However, in this case of a pairwise comparison, it is easier and more accurate to use the pairwise comparison tables as described later in this subsection. The main problem with the Scheffe post-hoc critical F value is that it is a low power test - meaning that there is a good chance that you may not get significant results (via the Scheffe test) even though some other post-hoc test would tell you that the comparison you are looking at is significant. However, a good thing about the Scheffe test is that its values are easy to compute, and it is easy to adjust when there are significant covariance effects appearing in the source factors. In that case the user should use the Geisser-Greenhouse "(GG)" corrected Scheffe critical F value when rerunning the ANOVA with altered factor levels. See the next section for limitations on using Scheffe's F test value.

Pairwise Comparison Tests

Using the pairwise comparison tests is more straightforward. The pairwise tests use a generalized t test to decide whether the hypothesis that two level outcome averages are the same is likely. This is performed for every combination of levels. The t test is generalized to take into account differences in sample size and level variances - we use a version of Welch's V as recommended by many statisticians. And the tests use various fairly conservative algorithms to control familywise error.
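Welch's unpooled t statistic and its fractional degrees of freedom can be computed as follows (a Python sketch of the generalized t test described above; CLEAVE's exact internal implementation may differ):

```python
import math

# Welch's two-sample t statistic with unpooled variances, plus the
# Welch-Satterthwaite degrees of freedom.
def welch_t(xs, ys):
    nx, ny = len(xs), len(ys)
    mx, my = sum(xs) / nx, sum(ys) / ny
    vx = sum((x - mx) ** 2 for x in xs) / (nx - 1)  # sample variances
    vy = sum((y - my) ** 2 for y in ys) / (ny - 1)
    se_sq = vx / nx + vy / ny
    t = (mx - my) / math.sqrt(se_sq)
    df = se_sq ** 2 / ((vx / nx) ** 2 / (nx - 1)
                       + (vy / ny) ** 2 / (ny - 1))
    return t, df

t, df = welch_t([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
print(round(t, 3), round(df, 2))  # -1.549 2.94
```

Note the fractional df; as with the Satterthwaite quasi-F, the degrees of freedom need not be integers.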
In all cases what CLEAVE outputs are tables which list a "p" value for each pair of source term levels under the hypothesis that they produce the same outcome. In the extensive example analyzed in a previous section we saw that a triangular table is produced if the user wishes to inspect all pairwise comparisons (the default case). If the user chooses to inspect all comparisons to one control, then the table that is produced is a square with the diagonal missing, such as the one below (this is for the factor "Repeats" in the "repeats" file, where the "control comparison" option was chosen in cleave.cnf):

Control Comparisons; Familywise Error: 0.0500 ; Bonferroni Prob.: 0.0167

     -2.98     1             0.0001*   0e+00*    0e+00*    1
     -0.850    2   0.0001*             0.0146*   0e+00*    2
      0.150    3   0e+00*    0.0146*             0e+00*    3
      3.68     4   0e+00*    0e+00*    0e+00*              4
                   1         2         3         4
                   1         2         3         4

Here, if one wants to use, say, level 2 as the control, then the second row (or column, because of diagonal symmetry) should be consulted to see which other levels' effects are likely to be different from level 2's effect. Other rows can be ignored. One should use the "cleave.cnf" file to select a control table to be produced (likewise in choosing the other options listed below), because using the all-pairwise table for the purpose of making control comparisons reduces the power of the user's pairwise tests. Also, if we are using pooled error in the above comparisons, we can use the Sidak Distance, 0.971 above, to provide significance intervals for each factor level. E.g. factor level 3 above has its mean in the interval [0.150-0.971, 0.150+0.971] while keeping the familywise error below 0.05. We now discuss the other pairwise comparison options.
The difference between choosing the Bonferroni correction or the Sidak correction is one of assumptions: the Sidak correction assumes just that your factor levels are distributed according to a multinormal distribution, while the Bonferroni correction makes not even that assumption - it just uses basic laws of probability to make its "familywise" correction. In practice, however, both corrections give nearly the same results, but the Sidak correction is slightly higher in value - so it provides a more powerful test while remaining conservative with respect to any significant factor level covariation. Likewise, whether you use a simultaneous test or a sequential (step-wise) test depends upon whether you are comfortable with the quasi-Bayesian assumption that the sequential ("Holm") test makes: every time you find a pair which passes the corrected probability test, you can run the next test with a corrected probability assuming one fewer pair to test (you are updating on the information that the previous pair has significantly different outcomes). Thus, the sequential tests are slightly easier tests to pass and are similar to tests like Newman-Keuls and Duncan (the latter have more assumptions attached to them, however, which is why we do not include them). Another major option that you can change is whether or not to use a pooled error when computing the pairwise comparisons. The default is to not use a pooled error; the reason is that if one does use a pooled error variance, the user runs the risk of the computed p values not being a reliable indicator of significance in one of two senses. If there are significant differences in the variances of two levels being compared, then using a pooled error variance can cause the specified familywise error to be exceeded - or it can cause the pairwise tests to lose power.
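The corrections just discussed are easy to state numerically. The following sketch (illustrative code, not CLEAVE's) computes the per-comparison alpha under the Bonferroni and Sidak corrections for k comparisons, and applies a Holm-style sequential (step-down) test to a set of hypothetical p values:

```python
# Familywise-error corrections for k pairwise comparisons (sketch).

def bonferroni_alpha(familywise, k):
    # Union bound: makes no distributional assumptions at all.
    return familywise / k

def sidak_alpha(familywise, k):
    # Assumes multinormality/independence; slightly larger than the
    # Bonferroni alpha, hence a slightly more powerful test.
    return 1.0 - (1.0 - familywise) ** (1.0 / k)

def holm_reject(pvals, familywise):
    # Sequential Bonferroni: test the smallest p value first, shrinking
    # the divisor by one after each rejection; stop at the first failure.
    order = sorted(range(len(pvals)), key=lambda i: pvals[i])
    rejected = [False] * len(pvals)
    for rank, i in enumerate(order):
        if pvals[i] > familywise / (len(pvals) - rank):
            break
        rejected[i] = True
    return rejected

print(bonferroni_alpha(0.05, 6))   # 0.00833...
print(sidak_alpha(0.05, 6))        # 0.00851... (slightly higher)
print(holm_reject([0.001, 0.04, 0.03, 0.005], 0.05))
```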
However, if a user knows that the variances of the different levels are quite similar (or discovers this by looking at the first part of CLEAVE's output), then using a pooled error can increase the pairwise comparisons' power, and so it would be recommended to change the default value. The final major option one can choose in doing post-hoc pairwise comparisons is whether to compare ANOVA-model interaction levels, which is what was done in bulk by the ANOVA F statistic, or to simply compare plain joint factor levels via paired comparisons. The former is the default case, but the latter could be used in the case that joint factor levels are more interesting than the ANOVA model's interactions, in which overlapping lower-order averages are subtracted out for the sake of having disjoint sums of squares. Plain joint factor levels can be chosen to be analyzed by adding 4 to the value in cleave.cnf which chooses whether to look at all-pairwise comparisons or control comparisons. In the case that ANOVA model interaction levels (the default) and pooled errors (also the default) are selected, CLEAVE will attempt to boost significance by using logical relations amongst the different pairs. These logical relations occur in interaction terms where at least one of the term's factors is binary. E.g.
if we choose to process interaction terms on the included "proportional" data file using the two assumptions listed above, then the pairwise comparison table for an all-pairwise comparison of the first two factors' interaction looks as follows:

Pairwise Comparisons; Familywise Error: 0.0500 ; Bonferroni Prob.: 0.0031

 0.491  1
 0.366  2  0.9364
 -1.22  3  0.2515  0.2283
  1.12  4  0.7010  0.6098  0.0936
-0.491  5  0.5673  0.5844  0.6250  0.3244
-0.366  6  0.5844  0.6015  0.5161  0.3141  0.9364
  1.22  7  0.6250  0.5161  0.0466  0.9415  0.2515  0.2283
 -1.12  8  0.3244  0.3141  0.9415  0.1476  0.7010  0.6098  0.0936
           1       2       3       4       5       6       7
           1       1       1       1       2       2       2
           1       2       3       4       1       2       3

Note that even though there are 28 possible pairwise comparisons in the table, the familywise Bonferroni probability is 0.0031, which is 0.05/16 and not 0.05/28, which is what it would be had CLEAVE not used the logical relations inherent in ANOVA model interaction terms having one binary factor (the first in the above example).

DETECTION OF OUTLIERS

There are two parts of CLEAVE's output which assist the user in detecting outlying data values. Detection of rogue values is important as their presence is one of the major ways in which the results of an ANOVA can be misleading. Given the fact that ANOVA and pairwise comparisons rely on sums of squares, extreme values can easily reduce the effectiveness of the resulting statistics. The initial Data Histogram allows the user to see whether the data is roughly generated by a linear model with Gaussian-like noise added to it. The two most troubling things to see in the histogram would be, first, to have the bulk of the data compressed in relatively few bins with a few outlying values scattered at the edges, and second, to see a very asymmetric or non-unimodal distribution of the experimental data - one which is not at all consistent with a linear model. The Normed Ranges that appear with the basic factor information can also be useful in tracking down multiple outlying data values.
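The arithmetic behind a normed range is simple. A hedged sketch (made-up data and function name, not CLEAVE's code) that expresses a cell's extremes as multiples of the cell standard deviation:

```python
# Express a cell's minimum and maximum as multiples of the cell's
# standard deviation around the cell mean (a "normed range" sketch).
import statistics

def normed_range(values):
    mean = statistics.mean(values)
    sd = statistics.stdev(values)   # sample standard deviation
    return (min(values) - mean) / sd, (max(values) - mean) / sd

# The 14.0 here is deliberately extreme relative to the other values.
lo, hi = normed_range([4.0, 5.0, 5.5, 6.0, 14.0])
print(round(lo, 2), round(hi, 2))   # -0.72 1.76
```

Note that with only n data values, a single point can sit at most (n-1)/sqrt(n) standard deviations from the mean, so in small cells even one outlier is pinned near that bound.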
Each normed range line diagram displays the maximum and minimum values appearing in each line's factor-level combination data by displaying the range as multiples of the combination's data standard deviation. Thus, one can use the line diagrams to detect single data values which are aberrant relative to the rest of the values. This works especially well when one inspects the Normed Ranges of the cells of the ANOVA design - when all of the factors in a dataset (sans the subjects) take on levels and are not averaged over. One can then see whether or not the cell data appears to have extreme values in accord with Gaussian shaped noise added to a constant data value. Note that when one is viewing the Normed Ranges of main effects or lower-order interactions, it is often the case that one will see maximum or minimum values at high multiples of the standard deviation even though there are relatively few data values taking on the particular line's factor levels. This can be consistent with the data values being produced by a linear model with Gaussian noise when the other factors being averaged over have significant effects via the linear model. E.g. if the user sees the Normed Range diagram "****|***" on a line where only 35 data points are in the range of a main effect's line, the user should not find the result surprising on the grounds that one should rarely see a data point 4 standard deviations below the mean (and 3 above), because the variation is not due to Gaussian noise alone but also to linear factor effects.

DETECTING AND CORRECTING ANOVA DATA SET DESIGNS

The detection of the particular design of an ANOVA data set is intended to help the program automatically perform the appropriate statistical procedures on the data values.
The most important decision the program makes is in classifying factors as "within" (repeated measures) or "between" factors, where the latter factors have one level assigned to each subject (and the assignment is ordinarily done via randomization). The detection routine used can pick out "within" and "between" factors even when there are some errors in the data set, such as there being too few or too many data values, or minor errors in the factor level labels. Second, the detection routine performs a thorough check on the proportionality of the ANOVA data set to tell the user whether or not a factor is balanced, and whether a balanced factor has varying numbers of subjects assigned to each level or is evenly balanced. Finally, CLEAVE can "correct" data sets which are nearly balanced, but can only do so for data imbalanced in certain limited ways. One way in which a defective data set can be corrected is if some subject performs the experiment repeatedly with most but not all of the possible combinations of levels of all of the within factors. CLEAVE can interpolate data values based upon the present data cells' means and standard deviations in order to force the design into balance (at least the within factors will be balanced). The other main correction that CLEAVE can make is to find subjects which do an experiment multiple times with different levels of a between factor. In this case CLEAVE can (randomly) delete some of the data points to make sure that each subject is associated with only one level of every between factor. In both cases of data set modification, the intent is to have CLEAVE modify only very small departures from a proportional design, which is why the configuration file "cleave.cnf" allows the user to place a cap on both the percentage of missing data that needs to be added and the percentage of excess data that needs to be deleted (defaults are set at 0.1% for both).
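To make the first kind of correction concrete, here is a hedged, mean-only sketch of interpolating a missing within-factor cell from the data that is present (CLEAVE also uses the cells' standard deviations; the data and names below are made up for illustration):

```python
# Fill a missing (subject, level) cell from the other subjects' data at
# the same level (a simplified, mean-only version of the repair above).
import statistics

def fill_missing(cells):
    """cells maps (subject, level) -> outcome, or None if missing."""
    filled = dict(cells)
    for (subject, level), value in cells.items():
        if value is None:
            peers = [v for (s, l), v in cells.items()
                     if l == level and v is not None]
            filled[(subject, level)] = statistics.mean(peers)
    return filled

data = {("s1", 1): 2.0, ("s2", 1): 4.0, ("s3", 1): None}
print(fill_missing(data)[("s3", 1)])   # 3.0
```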
The algorithm used is justified by numerical considerations - guessing reasonable values for the absent data will not change the computed statistics' highest (3 or more) significant digits (similarly for deleted data). More importantly, if data set modification is attempted by CLEAVE, in the basic data output one will find the following messages:

--------------------------------------------------------
CLEAVE: Excise some data to create a proportional design.
Deleted Cell Data
SUBJECT  A  B  C  Deleted Data Value
      5  1  3  3  3
CLEAVE: Inserting missing data to create a proportional design.
Inserted Cell Data
SUBJECT  A  B  C  Inserted Data Value
      8  1  3  3  0.88776571
CLEAVE: Start over with, perhaps, a modified dataset.
--------------------------------------------------------

which inform the user which data points were deleted and which data points were interpolated into the data set. This can be important in the case that the user has fixable defects in the data set file because it pinpoints precisely where the problem is.

4. PROGRAM LIMITATIONS

1) CLEAVE is constantly being tested in vivo and in vitro. It has passed all of the tests thus far thrown at it (see "cleave.tst" for details), but further testing is being done on real trial data and ongoing corrections are likely to be needed.

2) Using Box corrections with higher interaction term degrees might cause the program to work for a long time to compute the G-G values. When using CLEAVE on large, multi-way data sets, it is advisable to test how long the program will take using a degree value like 2 in order to estimate how long it might take to compute higher order interaction Box coefficients. Computing the Box corrections only when needed will also speed up execution to a certain extent without costing any interesting results [the good news is that the more significant your experimental results are, the longer this option takes].
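For reference, the Geisser-Greenhouse (Box) epsilon that these computations produce can be written with Box's standard trace formula. The sketch below (using numpy; this is the textbook formula, not CLEAVE's actual code) double-centers a k x k covariance matrix of the repeated-measures levels and takes a ratio of traces:

```python
# Box/Geisser-Greenhouse epsilon from a covariance matrix (sketch).
import numpy as np

def gg_epsilon(S):
    k = S.shape[0]
    C = np.eye(k) - np.ones((k, k)) / k   # double-centering projector
    A = C @ S @ C
    return np.trace(A) ** 2 / ((k - 1) * np.trace(A @ A))

# Compound symmetry (equal variances, equal covariances) satisfies
# sphericity and gives epsilon = 1; the lower bound is 1/(k-1).
S = np.array([[4.0, 1.0, 1.0],
              [1.0, 4.0, 1.0],
              [1.0, 1.0, 4.0]])
print(gg_epsilon(S))   # approximately 1.0
```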
Also, a computer with memory limitations might not be able to hold the covariance matrices which are computed in the course of computing the G-G epsilons. E.g. with a 7-way ANOVA run on fragmentary fMRI data common in our lab (9 x 3 x 2 x 2 x 15 x 2 x 7 x 5, where the 9 is the random factor and all factors are crossed within factors), it takes the following amount of time to run CLEAVE on a 900 MHz Athlon computer (512 Megs RAM) running Windows 2000:

Computed Only When Needed?   Time           Interaction Degree
--------------------------   -------------  ------------------
No                           9 seconds      0
No                           10 seconds     1
No                           20 seconds     2
No                           100 seconds    3
No                           1500 seconds   4
No                           35000 seconds  5
Yes                          10 seconds     1
Yes                          15 seconds     2
Yes                          50 seconds     3
Yes                          800 seconds    4
Yes                          10300 seconds  5

Unfortunately, it does appear that the amount of time it takes to compute interactions of one order higher increases exponentially (this makes sense if one looks at the number of computations that the program does to compute the Box factors). This problem can be lessened by using the cleave.cnf option that has CLEAVE compute approximate GG epsilon values for interaction terms at or above a specified order. The approximate epsilon is identical to the true epsilon under certain simplifying assumptions, speeds up the computation greatly, and appears to be a fairly good approximation to the true GG epsilon in testing thus far.

3) The input file to CLEAVE is written to a temporary file so that it can be reread multiple times during processing. Occasionally, if CLEAVE is interrupted, this file can be left on the disk - it has the name "CLEAxxxx.tmp" - and can be deleted without harm when you find it in the current directory or the tmp directory.

4) Our GG adjustment of the Scheffe critical value, unfortunately, is not the recommended one [see Miller, e.g.] - we simply adjust the Scheffe F value using the same moment-matching degree of freedom adjustment that the omnibus F test relied upon to estimate significance.
However, this is a sensible approximation which allows the user to run CLEAVE again as recommended while providing a margin of safety which the lack of equal covariance values demands. The standard adjustment of the Scheffe value requires knowing which linear combinations of levels are to be tested - something which would be hard for the user to incorporate into successive runs of CLEAVE (this is a post-hoc test, after all). Additionally, the G-G correction used to adjust the Scheffe F test value is an average value which may not reflect the variance/covariance inhomogeneities of the specific post-hoc factor level comparisons the user is interested in. However, one can check to see if the latter are the same as the former by running the post-hoc test with the Geisser-Greenhouse option turned on and looking to see whether the post-hoc epsilon is reasonably close to the original (omnibus) F test's computed G-G epsilon.

5) We only include the rather conservative Bonferroni and Sidak pairwise tests because those two tests use few assumptions about the data when controlling experimentwise error, and also because the two best other tests - Tukey for all-pairwise tests and Dunnett for control-pairwise tests - are expensive to compute if we want to correct (and we do) for (co)variance heterogeneity.

6) The input file parser has the following quirks. First, one can arrange the data in "quote-delimited" format without the use of any whitespace, e.g. 'Sub1'"medial""posterior"'high'"4567.89" is a perfectly valid input line. However, one wants to avoid having input files with an empty set of quotes: "" or '' - either of these will mess up the parsing (a level name must contain at least 1 character).
Finally, one can record the absence of an experimental outcome by placing the string "NA" in the final column - this will undoubtedly cause the input file to be an unbalanced design, but a record will be made of how many such experimental outcomes are missing from the input file.

7) The Bonferroni-Sidak post-hoc tables are not very easy to read when there are more than 13 or so factor levels. It helps to have a text editor in which one can turn off "wrapping" and easily scroll left and right.

8) In the computation of partial omega squared in the case where there are repeated measures factors, we assume that the ANOVA model is a so-called "additive model". Essentially, this assumes that there is no substantial interaction between the subjects and the repeated factors (each subject was subjected to all of the levels of said factors). This is a very important assumption when one is computing omega squared, but as we are computing only partial omega squared - so that we do not need to know the total experimental variance, but only a factor's error variance - it is less of an issue. However, it remains to be computed how partial omega squared is affected in clearly non-additive models (which can be detected by the "Tukey test" - see Vaughn and Corballis).

9) The data insertion/deletion feature cannot in general correct a nearly proportional design when the design has two or more varying between factors and the lack of proportionality is due to there being a lack or excess of interaction instances of those varying between factors.

5. REFERENCES

Box, G.E.P., "Some Theorems on Quadratic Forms Applied in the Study of ANOVA Problems, I. Effect of Inequality of Variance in the One-Way Classification", Annals of Mathematical Statistics, v. 25, 1954, p. 290.

Games, P.A., H.J. Kesselman and J.C. Rogan, "Simultaneous Pairwise Multiple Comparison Procedures for Means when Sample Sizes are Unequal", Psychological Bulletin, v. 90, #3, pp. 594-598, 1981.

Geisser, S.
and Greenhouse, S., "An Extension of Box's Results on the Use of F Distributions in Multivariate Analysis", Annals of Mathematical Statistics, v. 29, pp. 885-891, 1958.

Keppel, G., "Design and Analysis: A Researcher's Handbook", 3rd Ed., Prentice Hall, 1991.

Kirk, Roger E., "Experimental Design: Procedures for the Behavioral Sciences", 3rd Ed., Brooks/Cole, 1995.

Kirk, Roger E., "Practical Significance: A Concept Whose Time Has Come", Educational and Psychological Measurement, v. 56, #5, pp. 746-759, Oct. 1996.

Miller, Rupert, "Simultaneous Statistical Inference", 2nd Ed., Springer, 1981.

Pendleton, O.J., "Influential Observations in the Analysis of Variance", Communications in Statistics - Theory and Methods, v. 14, #3, pp. 551-565, 1985.

Sahai, Hardeo and Mohammed I. Ageel, "The Analysis of Variance: Fixed, Random and Mixed Models", Birkhauser, Boston, 2000.

Shaffer, Juliet Popper, "Modified Sequentially Rejective Multiple Test Procedures", JASA, v. 81, #395, Sept. 1986.

Vaughn, Graham and Michael Corballis, "Beyond Tests of Significance: Estimating Strength of Effect in Selected ANOVA Designs", Psychological Bulletin, v. 22, #3, 1969.

For more references used in the design of this program see the cleave.hst file.

6. ERROR AND INFORMATIONAL MESSAGES

The following is an alphabetized list of the messages you might see when you run CLEAVE, along with some suggestions on how to make those messages disappear the next time you run CLEAVE on the offending data set.

--- "CLEAVE: ___ is not a data value"

CLEAVE has encountered a non-numeric string in the final column that is always dedicated to holding the outcome values. Check the data file to see if there are extra columns or stray characters in the final column.

--- "CLEAVE: Cannot exceed ___ many levels/factor!"

This message can be gotten rid of by going into cleave.cnf and increasing the maximum number of levels/factor so that your data set fits into the allocated memory space.
--- "CLEAVE: Cannot find a temporary file name!" "CLEAVE: Cannot open a temporary write file!" "CLEAVE: Cannot reopen the temporary data storage file!" CLEAVE is having problems locating space for or opening a disk file which is to hold the data file so that CLEAVE can reread it again. This file is either in the current directory or is in the /tmp/ directory (*NIX systems), and it is important to be able to reread the file as that is how CLEAVE is able to perform the ANOVA without having to load the entire data set into memory. --- "CLEAVE: Data rows must have from one to ___ factors" This probably means that you need to go into cleave.cnf and increase the number of factors that it can hold, but it may indicate that you data set is rather trivial - it has only one cell, and CLEAVE doesn’t have anything to do then. --- "CLEAVE: Duplicate data lines...creating new REPEATS factor." "CLEAVE: Duplicate data lines...averaging into existing factors." CLEAVE is informing you that it found rows with indentical factor levels (including the subject name) and is saying that it is going to either average them into the factors listed in the data file or create a new (repeated measure = within) factor which separates either of these. --- "CLEAVE: Excise some data to create a proportional design" "CLEAVE: Inserting missing data to create a proportional design" These messages can appear when CLEAVE tries to alter the input data set to make it a balanced (proportional) ANOVA design. They appear in the output directly above the indicators of the data points inserted or deleted. --- "CLEAVE: Input file has varying columnar structure" CLEAVE has detected that the number of columns in your data file doesn’t have a uniform number of columns - you have to fix that in the data set. --- "CLEAVE: Interrupted by user (Ctrl-C)... temporary file removed" what you see when you hit the Ctrl-C key combination while CLEAVE is running. 
The temporary file may or may not be removed (as indicated) depending upon whether it still exists or not.

--- "CLEAVE: Invalid a or b or NUMERICS_ITMAX too small in betacf"
    "CLEAVE: Invalid betai value"
    "CLEAVE: Invalid degree of freedom in cdfF denominator."
    "CLEAVE: Invalid degree of freedom in cdfF numerator."
    "CLEAVE: Invalid degree of freedom in cdfNCF denominator."
    "CLEAVE: Invalid degree of freedom in cdfNCF numerator."
    "CLEAVE: Invalid gammaln value"

These errors are internal, mathematical function errors which you should never see unless there is some kind of bug in CLEAVE. CLEAVE keeps going past these problems, but the values printed in the section immediately following the warnings should be ignored as they are likely bogus.

--- "CLEAVE: Invalidly small DF: ___ in source: ___"
    "CLEAVE: Large F value due to zero errorms value"
    "CLEAVE: Negative DF: ___ in source denominator: ___"
    "CLEAVE: Negative DF: ___ in source numerator: ___"
    "CLEAVE: Negative MS: ___ in source denominator: ___"
    "CLEAVE: Negative MS: ___ in source numerator: ___"
    "CLEAVE: Negative SS: ___ in source: ___"
    "CLEAVE: Negative SS: ___ in source pure error term: ___"

CLEAVE is having problems calculating sums of squares, mean squares, degrees of freedom, or an appropriate F value. Seeing the "Large F value" warning might be a sign that something is wrong with your data file - there is just so little noise in the data set that something is probably fishy. The other warnings/errors should rarely be seen unless there is a bug in CLEAVE or your data set is badly (and obviously) messed up.

--- "CLEAVE: No free memory for A Data Means Insertion Array"

These memory allocations are used when CLEAVE tries to interpolate data into a data set which is not balanced due to a lack of data.
--- "CLEAVE: No free memory for Aux Means" "CLEAVE: No free memory for Aux Counts" "CLEAVE: No free memory for Aux (Co)Variance" "CLEAVE: No free memory for Aux Interaction Covariance Matrix Or t Pairs" "CLEAVE: No free memory for Aux t Pairs" "CLEAVE: No free memory for Aux Variance" These memory allocation errors say that you are running out of memory for performing Geisser-Greenhouse computations and/or pairwise post-hoc comparisons. Either you need to turn off those options or use the other suggestions appearing in the other memory warnings in this section. --- "CLEAVE: No free memory for Box Approximations" "CLEAVE: No free memory for Box Trace" These memory allocation errors say that you are running out of memory for performing Geisser-Greenhouse computations. Either you need to turn off this option or use the other suggestions appearing in the other memory warnings in this section. --- "CLEAVE: No free memory for Bracket Computations" "CLEAVE: No free memory for Duplication Array" "CLEAVE: No free memory for File Column Level Name Parsing" "CLEAVE: No free memory for Level Names" "CLEAVE: No free memory for Level Tracking Index" "CLEAVE: No free memory for Multidimensional Array" "CLEAVE: No free memory for Number of Levels per Factor" "CLEAVE: No free memory for Proportionality Single Factor Counts" "CLEAVE: No free memory for Repeats Instance Factor" "CLEAVE: No free memory for Temporary Duplication Array" "CLEAVE: No free memory for Text Column Mask" Seeing any of the above memory allocation errors means that you will not be able to perform basic ANOVA tasks without getting or freeing up more memory somehow. Make sure that you see if cleave.cnf contains the lowest number of factors and levels/factor that your data set requires. 
--- "CLEAVE: No free memory for Treatment Effects" "CLEAVE: No free memory for Treatment Errors" "CLEAVE: No free memory for Treatment List Flags" These memory allocation errors say that you are running out of memory for building a treatment magnitude list. Either you need to turn off this option or use the other suggestions appearing in the other memory warnings in this section. --- "CLEAVE: No room for new factor - using only last data instances!" CLEAVE is informing you that it found rows with indentical factor levels (including the subject name) and is saying that it cannot create a new factor to handle it because you are already at the limit in the number of factors you have. Thus, it will only pay attention to the last instance of each duplicated factor. --- "CLEAVE: Not enough subjects in data (column 1)!" CLEAVE only detects that there is one subject, which makes it kind of hard to do good analysis of variance. Check to see if column 1 contains all of your subject labels. --- "CLEAVE: Said lack of separation explains the ANOVA design imbalance" "CLEAVE: Seems unable to balance the repeated measures design" "CLEAVE: Some subjects fail to keep between factors separated" "CLEAVE: Some within factors are missing some subject instances" "CLEAVE: Subjects missing in design: cannot fix data imbalance" These warnings can appear when the data set that CLEAVE reads in is not a proportional or balanced design. The problems that can be detected include a lack of separation amongst "between" factors (which suggests that proper randomization has not been done?) as well as too few instantiations of within factors, indicating that some subjects did not do the entire experiment. Also some of the warnings say whether the design can be balanced or not. --- "CLEAVE: Switching to default Satterthwaite approximation due to negative MS" This appears when a negative MS denominator value is computed when using the (non-default) Satterthwaite method which involves subtraction. 
The default method is then automatically invoked to compute a quasi-F value.

--- "CLEAVE: Too little input data"

CLEAVE detects that you have zero multi-level, non-subject factors. Check your data set.

--- "CLEAVE: Zero or negative MSerror => Will not use pooled variance."

CLEAVE usually gives this warning in one of two cases: either your data set has (suspiciously) zero noise, or there are multiple random variables and you are using the version of Satterthwaite's approximation which only approximates the denominator. Either switch to the (default) Satterthwaite approximation which approximates both the numerator and denominator of the F computation, or switch to not using the pooled variance when computing pairwise post-hoc tests.

--- "CLEAVE: Cannot open 'cleave.cnf'! Using default values..."

This appears when the configuration file cleave.cnf cannot be found in the current (working) directory. CLEAVE still works in this case, but it uses default values.

--- "Completely imbalanced design => F values will not be computed"

CLEAVE cannot handle unbalanced designs when computing F, so all you can expect to get is the pairwise comparisons.

--- "Presence of within factors => Pairwise comparisons might be misleading"
    "Within factor interaction terms will use pooled error for pairwise comparisons"

CLEAVE automatically tries to use pooled error in any pairwise comparisons it does whenever there are within variables in the ANOVA design. If it cannot compute pooled errors, then it will still do the pairwise comparisons, but it informs the user of the potential problem (which is an excess of degrees of freedom in the pairwise comparison error terms).

--- "Random, mixed between-within design => F values will not be computed"

When there is a design with random elements and both between and within (repeated measures) factors, CLEAVE can only compute the pairwise comparisons.
--- "Random unbalanced proportional design => F values will not be computed" When there is an unbalanced proportional design with some non- subject random elements in it, CLEAVE cannot compute the F values, but only the pairwise comparison. --- "Some Geisser-Greenhouse sub-epsilons not computed: interaction levels too high" This is presented when there are random variables and in attempting to compute the GG adjusted p value CLEAVE has to compute a GG value for an interaction term whose order exceeds that specified in cleave.cnf. It is advised to change the maximal order in cleave.cnf to at least the order of the term appearing at the head of the line just below the original computed F value for this source term. --- 7. CONTACT INFORMATION Please send any bugs or suggestions to me at the email and/or realmail addresses below. Timothy Herron Staff Statistical Curmudgeon Human Cognitive Neurophysiology Laboratory Department of Neurology UC Davis and VANCHCS, 150 Muir Road Martinez, CA USA 94553 tjherron@ebire.org 1-925-372-2000-x4119