Home > matlabmk > mxt_perm2.m

mxt_perm2

PURPOSE ^

function [pval, t_orig, tmx_ptile, seed_state, est_alpha]=mxt_perm2(dataA,dataB,n_perm,alph,tail,verblevel,seed_state,freq_domain)

SYNOPSIS ^

function [pval, t_orig, tmx_ptile, seed_state, est_alpha]=mxt_perm2(dataA,dataB,n_perm,alph,tail,verblevel,seed_state,freq_domain)

DESCRIPTION ^

function [pval, t_orig, tmx_ptile, seed_state, est_alpha]=mxt_perm2(dataA,dataB,n_perm,alph,tail,verblevel,seed_state,freq_domain)

 mxt_perm2-Two sample permutation test based on a t-statistic and null
 hypothesis of a mean difference of zero.  For one-tailed tests, the null distribution
 is derived from the most extreme t-score of all permutations.  For two-tailed
 tests, the null distribution is derived from the maximum absolute t-score of
 all permutations.  Can handle multiple electrodes and time
 points. According to Hemmelammnn et al. (2004), the t_max permutation
 test has relatively good power for multivariate data whose dimensions are
 highly correlated (like EEG/ERP data).  See Maris (2004) for an
 explanation of how permutation tests like this control for multiple
 comparisons.  Note, Maris uses Hotelling's multivariate T^2 statistic in
 his paper (instead of the t_max used here) and he advocated performing
 the tests over multiple time points (e.g., a 10 ms time window).
 However, I found that the T^2 statistic did not have much power when the
 sample size was near the number of electrodes and the results of the test
 could vary greatly depending on the size of the time window.

 Required Inputs:
  dataA  - 3D matrix of ERPs from Group A (Channel x Time x Participant)
  dataB  - 3D matrix of ERPs from Group B (Channel x Time x Participant)

 Optional Inputs:
  n_perm - number of permutations {default=1000}.  Manly (1997) suggests
           using at least 1000 permutations for an alpha level of 0.05 and 
           at least 5000 permutations for an alpha level of 0.01.
  alph   - test alpha level {default=.05}
  tail   - [1 | 0 | -1] If tail=1, the alternative hypothesis is that the
           means of Group A are greater than those of Group B (upper tailed test).  
           If tail=0, the alternative hypothesis is that the means of the 
           two groups are different (two tailed test).  If tail=-1, the 
           alternative hypothesis is that the means of Group A are less 
           than those of Group B (lower tailed test). {default: 0}
  verblevel   - An integer specifiying the amount of information you want
                this function to provide about what it is doing during runtime.
                 Options are:
                    0 - quiet, only show errors, warnings, and EEGLAB reports
                    1 - stuff anyone should probably know
                    2 - stuff you should know the first time you start working
                        with a data set {default value}
                    3 - stuff that might help you debug (show all
                        reports)
  seed_state  - The initial state of the random number generating stream
                (see MATLAB documentation for "randstream"). If you pass
                a value from a previous run of this function, it should
                reproduce the exact same values.
  freq_domain - If 0, command line report will be given in temporal units
                (e.g. time points).  Otherwise, the report will be given 
                in frequency domain units (e.g., frequencies). {default:
                0}

 Outputs:
  pval       - p-value at each time point and electrode (corrected for
                multiple comparisons via permutation test)
  t_orig     - t-score at each time point and electrode
  tmx_ptile  - critical t-scores for given alpha level
  seed_state - The initial state of the random number generating stream
               (see MATLAB documentation for "randstream") used to 
               generate the permutations. You can use this to reproduce
               the output of this function.
  est_alpha  - The estimated family-wise alpha level of the test.  With 
               permutation tests, a finite number of p-values are possible.
               This function tries to use an alpha level that is as close 
               as possible to the desired alpha level.  However, if the 
               sample size is small, a very limited number of p-values are 
               possible and the desired family-wise alpha level may be 
               impossible to acheive.

 Notes:
 -The null hypothesis of this test is that the data in the two groups are
 "exchangeable" (i.e., the data from a participant in Group A was just as
 likely to have come from Group B).  If the distributions of the two
 populations you're comparing differ in *any* way (i.e., not just in
 central tendency), the null hypothesis will be violated.  For example, if
 the ERPs of the two groups differ in their variability, but not their
 average value (e.g., one set of ERPs is from a group of children and the
 other adults), you'll be more likely than alpha to reject the null
 hypothesis.

 -Unlike a parametric test (e.g., an ANOVA), a discrete set of p-values
 are possible (at most the number of possible permutations).  Since the
 number of possible permutations grows rapdily with the number of
 participants, this is only an issue for small sample sizes (e.g., 3
 participants in each group).  When you have such a small sample size, the
 limited number of p-values may make the test overly conservative (e.g., 
 you might be forced to use an alpha level of .0286 since it is the biggest
 possible alpha level less than .05).

 Author:
 David Groppe
 March, 2009
 Kutaslab, San Diego

 References:

 Hemmelmann, et al. (2004) Multivariate tests for the evaluation of
 high-dimensional EEG data. Journal of Neuroscience Methods.

 Maris, E. (2004) Randomization tests for ERP topographies and whole 
 spatiotemporal data matrices. Psychophysiology

 Manly, B.F.J. (1997) Randomization, bootstrap, and Monte Carlo methods in
 biology. 2nd ed. Chapmn and Hall, London.

CROSS-REFERENCE INFORMATION ^

This function calls: This function is called by:

SUBFUNCTIONS ^

SOURCE CODE ^

0001 %function [pval, t_orig, tmx_ptile, seed_state, est_alpha]=mxt_perm2(dataA,dataB,n_perm,alph,tail,verblevel,seed_state,freq_domain)
0002 %
0003 % mxt_perm2-Two sample permutation test based on a t-statistic and null
0004 % hypothesis of a mean difference of zero.  For one-tailed tests, the null distribution
0005 % is derived from the most extreme t-score of all permutations.  For two-tailed
0006 % tests, the null distribution is derived from the maximum absolute t-score of
0007 % all permutations.  Can handle multiple electrodes and time
0008 % points. According to Hemmelammnn et al. (2004), the t_max permutation
0009 % test has relatively good power for multivariate data whose dimensions are
0010 % highly correlated (like EEG/ERP data).  See Maris (2004) for an
0011 % explanation of how permutation tests like this control for multiple
0012 % comparisons.  Note, Maris uses Hotelling's multivariate T^2 statistic in
0013 % his paper (instead of the t_max used here) and he advocated performing
0014 % the tests over multiple time points (e.g., a 10 ms time window).
0015 % However, I found that the T^2 statistic did not have much power when the
0016 % sample size was near the number of electrodes and the results of the test
0017 % could vary greatly depending on the size of the time window.
0018 %
0019 % Required Inputs:
0020 %  dataA  - 3D matrix of ERPs from Group A (Channel x Time x Participant)
0021 %  dataB  - 3D matrix of ERPs from Group B (Channel x Time x Participant)
0022 %
0023 % Optional Inputs:
0024 %  n_perm - number of permutations {default=1000}.  Manly (1997) suggests
0025 %           using at least 1000 permutations for an alpha level of 0.05 and
0026 %           at least 5000 permutations for an alpha level of 0.01.
0027 %  alph   - test alpha level {default=.05}
0028 %  tail   - [1 | 0 | -1] If tail=1, the alternative hypothesis is that the
0029 %           means of Group A are greater than those of Group B (upper tailed test).
0030 %           If tail=0, the alternative hypothesis is that the means of the
0031 %           two groups are different (two tailed test).  If tail=-1, the
0032 %           alternative hypothesis is that the means of Group A are less
0033 %           than those of Group B (lower tailed test). {default: 0}
0034 %  verblevel   - An integer specifiying the amount of information you want
0035 %                this function to provide about what it is doing during runtime.
0036 %                 Options are:
0037 %                    0 - quiet, only show errors, warnings, and EEGLAB reports
0038 %                    1 - stuff anyone should probably know
0039 %                    2 - stuff you should know the first time you start working
0040 %                        with a data set {default value}
0041 %                    3 - stuff that might help you debug (show all
0042 %                        reports)
0043 %  seed_state  - The initial state of the random number generating stream
0044 %                (see MATLAB documentation for "randstream"). If you pass
0045 %                a value from a previous run of this function, it should
0046 %                reproduce the exact same values.
0047 %  freq_domain - If 0, command line report will be given in temporal units
0048 %                (e.g. time points).  Otherwise, the report will be given
0049 %                in frequency domain units (e.g., frequencies). {default:
0050 %                0}
0051 %
0052 % Outputs:
0053 %  pval       - p-value at each time point and electrode (corrected for
0054 %                multiple comparisons via permutation test)
0055 %  t_orig     - t-score at each time point and electrode
0056 %  tmx_ptile  - critical t-scores for given alpha level
0057 %  seed_state - The initial state of the random number generating stream
0058 %               (see MATLAB documentation for "randstream") used to
0059 %               generate the permutations. You can use this to reproduce
0060 %               the output of this function.
0061 %  est_alpha  - The estimated family-wise alpha level of the test.  With
0062 %               permutation tests, a finite number of p-values are possible.
0063 %               This function tries to use an alpha level that is as close
0064 %               as possible to the desired alpha level.  However, if the
0065 %               sample size is small, a very limited number of p-values are
0066 %               possible and the desired family-wise alpha level may be
0067 %               impossible to acheive.
0068 %
0069 % Notes:
0070 % -The null hypothesis of this test is that the data in the two groups are
0071 % "exchangeable" (i.e., the data from a participant in Group A was just as
0072 % likely to have come from Group B).  If the distributions of the two
0073 % populations you're comparing differ in *any* way (i.e., not just in
0074 % central tendency), the null hypothesis will be violated.  For example, if
0075 % the ERPs of the two groups differ in their variability, but not their
0076 % average value (e.g., one set of ERPs is from a group of children and the
0077 % other adults), you'll be more likely than alpha to reject the null
0078 % hypothesis.
0079 %
0080 % -Unlike a parametric test (e.g., an ANOVA), a discrete set of p-values
0081 % are possible (at most the number of possible permutations).  Since the
0082 % number of possible permutations grows rapdily with the number of
0083 % participants, this is only an issue for small sample sizes (e.g., 3
0084 % participants in each group).  When you have such a small sample size, the
0085 % limited number of p-values may make the test overly conservative (e.g.,
0086 % you might be forced to use an alpha level of .0286 since it is the biggest
0087 % possible alpha level less than .05).
0088 %
0089 % Author:
0090 % David Groppe
0091 % March, 2009
0092 % Kutaslab, San Diego
0093 %
0094 % References:
0095 %
0096 % Hemmelmann, et al. (2004) Multivariate tests for the evaluation of
0097 % high-dimensional EEG data. Journal of Neuroscience Methods.
0098 %
0099 % Maris, E. (2004) Randomization tests for ERP topographies and whole
0100 % spatiotemporal data matrices. Psychophysiology
0101 %
0102 % Manly, B.F.J. (1997) Randomization, bootstrap, and Monte Carlo methods in
0103 % biology. 2nd ed. Chapmn and Hall, London.
0104 
0105 %%%%%%%%%%%%%%%% REVISION LOG %%%%%%%%%%%%%%%%%
0106 %
0107 % 3/26/2010-fixed critical t-scores (tmx_ptiles) for two-tailed test.  I was
0108 % allowing them to be asymmetrical but this differed from how I was
0109 % computing p-values.  It turns out the way I was computing p-values (from
0110 % Hemmelmann) was effectively using the absolute value of all the t-scores.
0111 % Thus the critical t scores should be symmetrical.
0112 %
0113 % 4/1/2010-Estimated alpha level added
0114 %
0115 % 12/14/2010-freq_domain optional input added
0116 %
0117 % 3/17/2013-Randomization is now compatible with Matlab v13. Thanks to
0118 % Aaron Newman for the fix.
0119 
0120 %This code has an appropriate false positive rate when run on simulated
0121 %data at one time point:
0122 %
0123 % tested with script: sim_mxt_perm2.m
0124 % n_sim=4000, n_perm=3000, data=2x3, 16 participants per group, Gaussian, alpha=.05
0125 % Two-tailed test: Empirical alpha rate: 0.0495 (SE=.0034)
0126 % Upper tailed test: Empirical alpha rate: 0.0503 (SE=0.0035)
0127 % Lower tailed test: Empirical alpha rate: 0.0500 (SE=.0034)
0128 %
0129 % n_sim=4000, n_perm=3000, data=2x3, 16 participants Group A, 20 Group B, Gaussian, alpha=.05
0130 % Lower tailed test: Empirical alpha rate: 0.0502 (SE=.0035)
0131 % Upper tailed test: Empirical alpha rate: 0.0483 (SE=.0034)
0132 
0133 %%%%%%%%%%%%%%%% FUTURE IMPROVEMENTS %%%%%%%%%%%%%%%%%
0134 %
0135 % -Test for equality of trials or variance?
0136 % -Add alternate test statistics?
0137 
0138 function [pval, t_orig, tmx_ptile, seed_state, est_alpha]=mxt_perm2(dataA,dataB,n_perm,alph,tail,verblevel,seed_state,freq_domain)
0139 
0140 if nargin<2,
0141     error('You need to provide data for two groups of participants.');
0142 end
0143 
0144 if nargin<3,
0145     n_perm=2000;
0146 end
0147 
0148 if nargin<4,
0149     alph=.05;
0150 elseif (alph>=1) || (alph<=0)
0151     error('Argument ''alph'' needs to be between 0 and 1.');
0152 end
0153 
0154 if alph<=.01 && n_perm<5000,
0155     watchit(sprintf('You are probably using too few permutations for an alpha level of %f. Type ">>help mxt_perm2" for more info.',alph));
0156 elseif alph<=.05 && n_perm<1000,
0157     watchit(sprintf('You are probably using too few permutations for an alpha level of %f. Type ">>help mxt_perm2" for more info.',alph));
0158 end
0159 
0160 if nargin<5,
0161     tail=0;
0162 elseif (tail~=0) && (tail~=1) && (tail~=-1),
0163     error('Argument ''tail'' needs to be 0,1, or -1.');
0164 end
0165 
0166 if nargin<6,
0167     verblevel=2;
0168 end
0169 
0170 %Get random # generator state
0171 if verLessThan('matlab','8.1')
0172     defaultStream=RandStream.getDefaultStream; 
0173 else
0174     defaultStream=RandStream.getGlobalStream;
0175 end
0176 if (nargin<7) || isempty(seed_state),
0177     %Store state of random number generator
0178     seed_state=defaultStream.State;
0179 else
0180     defaultStream.State=seed_state; %reset random number generator to saved state
0181 end
0182 
0183 if (nargin<8),
0184    freq_domain=0; 
0185 end
0186 
0187 if length(size(dataA))~=3 || length(size(dataB))~=3 
0188     error('dataA and dataB need to be three dimensional (chan x time x participant)')
0189 end
0190 [n_chan, n_tpt, n_subsA]=size(dataA);
0191 [n_chan2, n_tpt2, n_subsB]=size(dataB);
0192 
0193 if verblevel,
0194     warning('off','all'); %for large # of subjects, nchoosek warns that its result is approximate
0195     n_psbl_prms=nchoosek(n_subsA+n_subsB,n_subsA);
0196     if n_psbl_prms<100,
0197         watchit(sprintf(['Due to the very limited number of participants in each group,' ...
0198             ' the total number of possible permutations is small.\nThus only a limited number of p-values (at most %d) are possible and the test might be overly conservative.'], ...
0199             n_psbl_prms));
0200     end
0201     warning('on','all');
0202 end
0203 
0204 if n_chan~=n_chan2,
0205     error('The number of channels in Group 1 (dataA) and Group 2 (dataB) need to be equal.');
0206 elseif n_tpt~=n_tpt2,
0207     error('The number of time points in Group 1 (dataA) and Group 2 (dataB) need to be equal.');
0208 end
0209 %combine data
0210 total_subs=n_subsA+n_subsB;
0211 data=dataA; 
0212 data(:,:,(n_subsA+1):total_subs)=dataB;
0213 
0214 if verblevel~=0,
0215     fprintf('mxt_perm2: Number of channels: %d\n',n_chan);
0216     if freq_domain,
0217         fprintf('mxt_perm2: Number of frequencies: %d\n',n_tpt);
0218     else
0219         fprintf('mxt_perm2: Number of time points: %d\n',n_tpt);
0220     end
0221     fprintf('mxt_perm2: Total # of comparisons: %d\n',n_chan*n_tpt);
0222     fprintf('mxt_perm2: Number of participants in Group A: %d\n',n_subsA);
0223     fprintf('mxt_perm2: Number of participants in Group B: %d\n',n_subsB);
0224     fprintf('t-score degrees of freedom: %d\n',total_subs-2); %edit for alt t-test??
0225 end
0226 VerbReport(sprintf('Executing permutation test with %d permutations...',n_perm),2,verblevel);
0227 if (verblevel>=2),
0228     fprintf('Permutations completed: ');
0229 end
0230 
0231 % Factors that are used to compute t-scores.  Saves time to compute them
0232 % now rather than to compute them anew for each permutation.
0233 df=n_subsA+n_subsB-2;
0234 mult_fact=(n_subsA+n_subsB)/(n_subsA*n_subsB);
0235 
0236 mx_t=zeros(1,n_perm);
0237 for perm=1:n_perm
0238     if ~rem(perm,100)
0239         if (verblevel>=2),
0240             if ~rem(perm-100,1000)
0241                 fprintf('%d',perm);
0242             else
0243                 fprintf(', %d',perm);
0244             end
0245             if ~rem(perm,1000)
0246                 fprintf('\n');
0247             end
0248         end
0249     end
0250     %randomly assign participants to conditions
0251     r=randperm(total_subs);
0252     grp1=r(1:n_subsA);
0253     grp2=r((n_subsA+1):total_subs);
0254     %compute most extreme t-score
0255     mx_t(perm)=tmax2(data,grp1,grp2,n_subsA,n_subsB,df,mult_fact);
0256 end
0257     
0258 %End of permutations, print carriage return if it hasn't already been done
0259 %(i.e., perm is NOT a multiple of 1000)
0260 if (verblevel>=2) && rem(perm,1000)
0261     fprintf('\n');
0262 end
0263 
0264 %Compute critical t's
0265 if tail==0,
0266     %two-tailed, test statistic is biggest absolute value of all t's
0267     mx_t=abs(mx_t);
0268     tmx_ptile(2)=prctile(mx_t,100-100*alph);
0269     tmx_ptile(1)=-tmx_ptile(2);
0270     est_alpha=mean(mx_t>=tmx_ptile(2));
0271 elseif tail==1,
0272     %upper tailed
0273     tmx_ptile=prctile(mx_t,100-100*alph);
0274     est_alpha=mean(mx_t>=tmx_ptile);
0275 else
0276     %tail=-1, lower tailed
0277     tmx_ptile=prctile(mx_t,alph*100);
0278     est_alpha=mean(mx_t<=tmx_ptile);
0279 end
0280 if verblevel~=0,
0281     fprintf('Desired family-wise alpha level: %f\n',alph);
0282     fprintf('Estimated actual family-wise alpha level: %f\n',est_alpha);
0283 end
0284 
0285 %Compute t-scores of actual observations
0286 [dummy t_orig]=tmax2(data,1:n_subsA,(n_subsA+1):total_subs,n_subsA,n_subsB,df,mult_fact);
0287 pval=zeros(n_chan,n_tpt);
0288 %compute p-values
0289 for t=1:n_tpt,
0290     for c=1:n_chan,
0291         if tail==0,
0292             pval(c,t)=mean(mx_t>=abs(t_orig(c,t))); %note mx_t are now all positive due to abs command above
0293         elseif tail==1,
0294             pval(c,t)=mean(mx_t>=t_orig(c,t));
0295         elseif tail==-1,
0296             pval(c,t)=mean(mx_t<=t_orig(c,t));
0297         end
0298     end
0299 end
0300 
0301 
0302 function [mx_t, all_t]=tmax2(dat,grp1,grp2,n_subsA,n_subsB,df,mult_fact)
0303 % might make this faster by moving it into the code
0304 
0305 x1=dat(:,:,grp1);
0306 x2=dat(:,:,grp2);
0307 
0308 sm1=sum(x1,3);
0309 mn1=sm1/n_subsA;
0310 ss1=sum(x1.^2,3)-(sm1.^2)/n_subsA;
0311 
0312 sm2=sum(x2,3);
0313 mn2=sm2/n_subsB;
0314 ss2=sum(x2.^2,3)-(sm2.^2)/n_subsB;
0315 
0316 pooled_var=(ss1+ss2)/df;
0317 stder=sqrt(pooled_var*mult_fact);
0318 
0319 all_t=(mn1-mn2)./stder;
0320 [mx1 id_row]=max(abs(all_t));
0321 [mx2 id_col]=max(mx1);
0322 mx_t=all_t(id_row(id_col),id_col);
0323 
0324

Generated on Tue 10-May-2016 16:37:45 by m2html © 2005