function [pval, t_orig, clust_info, seed_state, est_alpha]=clust_perm1(data,chan_hood,n_perm,fwer,tail,thresh_p,verblevel,seed_state,freq_domain) clust_perm1-One sample cluster-based permutation test using the "cluster mass" statistic and a null hypothesis of a mean of zero. This function can handle multiple electrodes and time points/frequencies. This test was originally proposed for MRI data by Bullmore et al. (1999) and for EEG/MEG analysis by Maris & Oostenveld (2007). Required Inputs: data - 3D matrix of ERP data (Channel x Time x Participant) chan_hood - 2D symmetric binary matrix that indicates which channels are considered neighbors of other channels. E.g., if chan_hood(2,10)=1, then Channel 2 and Channel 10 are nieghbors. You can produce a chan_hood matrix using the function spatial_neighbors.m. Optional Inputs: n_perm - Number of permutations {default=2000}. Manly (1997) suggests using at least 1000 permutations for an alpha level of 0.05 and at least 5000 permutations for an alpha level of 0.01. fwer - Desired family-wise error rate (i.e., alpha level) {default=.05} tail - [1 | 0 | -1] If tail=1, the alternative hypothesis is that the mean of the data is greater than 0 (upper tailed test). If tail=0, the alternative hypothesis is that the mean of the data is different than 0 (two tailed test). If tail=-1, the alternative hypothesis is that the mean of the data is less than 0 (lower tailed test). {default: 0} thresh_p - The test-wise p-value threshold for cluster inclusion. If a channel/time-point has a t-score that corresponds to an uncorrected p-value greater than thresh_p, it is assigned a p-value of 1 and not considered for clustering. Note that thresh_p automatically takes into account the tail of the test (e.g., you will get positive and negative t-score thresholds for a two-tailed test). verblevel - An integer specifying the amount of information you want this function to provide about what it is doing during runtime. Options are: 0 - quiet, only show errors, warnings, and EEGLAB reports 1 - stuff anyone should probably know 2 - stuff you should know the first time you start working with a data set {default value} 3 - stuff that might help you debug (show all reports) seed_state - The initial state of the random number generating stream (see MATLAB documentation for "randstream"). If you pass a value from a previous run of this function, it should reproduce the exact same values. freq_domain - If 0, command line report will be given in temporal units (e.g. time points). Otherwise, the report will be given in frequency domain units (e.g., frequencies). {default: 0} Outputs: pval - p-value at each time point and electrode (corrected for multiple comparisons via the permutation test) t_orig - t-score at each time point and electrode clust_info - A struct variable containing information about the clusters found. Depending on the tail of the test it will be composed of all or half of the following fields: pos_clust_pval: p-values of the positive clusters pos_clust_mass: t-score mass of the positive clusters pos_clust_ids: channel x time point matrix that indicated which positive cluster each channel/time point pair belongs to. E.g., if pos_clust_ids(1,2)=4, then the second time point of the first channel is a member of the fourth cluster. 0 indicates that the channel/time point is not a member of any positive cluster. neg_clust_pval: p-values of the negative clusters neg_clust_mass: t-score mass of the negative clusters neg_clust_ids: channel x time point matrix that indicated which negative cluster each channel/time point pair belongs to. E.g., if neg_clust_ids(1,2)=4, then the second time point of the first channel is a member of the fourth cluster. 0 indicates that the channel/time point is not a member of any negative cluster. seed_state - The initial state of the random number generating stream (see MATLAB documentation for "randstream") used to generate the permutations. You can use this to reproduce the output of this function. est_alpha - The estimated family-wise alpha level of the test. With permutation tests, a finite number of p-values are possible. This function tries to use an alpha level that is as close as possible to the desired alpha level. However, if the sample size is small, a very limited number of p-values are possible and the desired family-wise alpha level may be impossible to approximately achieve. Notes: -Unlike a parametric test (e.g., an ANOVA), a discrete set of p-values are possible (at most the number of possible permutations). Since the number of possible permutations grows rapidly with the number of participants, this is only an issue for small sample sizes (e.g., 6 participants). When you have such a small sample size, the limited number of p-values may make the test overly conservative (e.g., you might be forced to use an alpha level of .0286 since it is the biggest possible alpha level less than .05). -For two-tailed tests clust_perm1.m effectively performs an upper-tailed test and a lower-tailed test and multiplies the resulting p-values by 2. In other words, it does two tests and then applies Bonferroni correction. This is faster than performing a "proper" two-tailed permutation test since you only have to form clusters for one polarity (e.g., if you form negative clusters for each permutation, the negative of that distribution will be valid for computing p-values for postive clusters) but can result in p-values greater than 1. Author: David Groppe May, 2011 Kutaslab, San Diego References: Bullmore, E. T., Suckling, J., Overmeyer, S., Rabe-Hesketh, S., Taylor, E., & Brammer, M. J. (1999). Global, voxel, and cluster tests, by theory and permutation, for a difference between two groups of structural MR images of the brain. IEEE Transactions on Medical Imaging, 18(1), 32-42. doi:10.1109/42.750253 Maris, E., & Oostenveld, R. (2007). Nonparametric statistical testing of EEG- and MEG-data. Journal of Neuroscience Methods, 164(1), 177-190. doi:10.1016/j.jneumeth.2007.03.024 Manly, B.F.J. (1997) Randomization, bootstrap, and Monte Carlo methods in biology. 2nd ed. Chapmn and Hall, London.
0001 %function [pval, t_orig, clust_info, seed_state, est_alpha]=clust_perm1(data,chan_hood,n_perm,fwer,tail,thresh_p,verblevel,seed_state,freq_domain) 0002 % 0003 % clust_perm1-One sample cluster-based permutation test using the "cluster 0004 % mass" statistic and a null hypothesis of a mean of zero. This 0005 % function can handle multiple electrodes and time points/frequencies. 0006 % This test was originally proposed for MRI data by Bullmore et al. 0007 % (1999) and for EEG/MEG analysis by Maris & Oostenveld (2007). 0008 % 0009 % Required Inputs: 0010 % data - 3D matrix of ERP data (Channel x Time x Participant) 0011 % chan_hood - 2D symmetric binary matrix that indicates which channels are 0012 % considered neighbors of other channels. E.g., if 0013 % chan_hood(2,10)=1, then Channel 2 and Channel 10 are 0014 % nieghbors. You can produce a chan_hood matrix using the 0015 % function spatial_neighbors.m. 0016 % 0017 % Optional Inputs: 0018 % n_perm - Number of permutations {default=2000}. Manly (1997) suggests 0019 % using at least 1000 permutations for an alpha level of 0.05 and 0020 % at least 5000 permutations for an alpha level of 0.01. 0021 % fwer - Desired family-wise error rate (i.e., alpha level) {default=.05} 0022 % tail - [1 | 0 | -1] If tail=1, the alternative hypothesis is that the 0023 % mean of the data is greater than 0 (upper tailed test). If tail=0, 0024 % the alternative hypothesis is that the mean of the data is different 0025 % than 0 (two tailed test). If tail=-1, the alternative hypothesis 0026 % is that the mean of the data is less than 0 (lower tailed test). 0027 % {default: 0} 0028 % thresh_p - The test-wise p-value threshold for cluster inclusion. If 0029 % a channel/time-point has a t-score that corresponds to an 0030 % uncorrected p-value greater than thresh_p, it is assigned 0031 % a p-value of 1 and not considered for clustering. Note 0032 % that thresh_p automatically takes into account the tail of 0033 % the test (e.g., you will get positive and negative t-score 0034 % thresholds for a two-tailed test). 0035 % verblevel - An integer specifying the amount of information you want 0036 % this function to provide about what it is doing during runtime. 0037 % Options are: 0038 % 0 - quiet, only show errors, warnings, and EEGLAB reports 0039 % 1 - stuff anyone should probably know 0040 % 2 - stuff you should know the first time you start working 0041 % with a data set {default value} 0042 % 3 - stuff that might help you debug (show all 0043 % reports) 0044 % seed_state - The initial state of the random number generating stream 0045 % (see MATLAB documentation for "randstream"). If you pass 0046 % a value from a previous run of this function, it should 0047 % reproduce the exact same values. 0048 % freq_domain - If 0, command line report will be given in temporal units 0049 % (e.g. time points). Otherwise, the report will be given 0050 % in frequency domain units (e.g., frequencies). {default: 0051 % 0} 0052 % 0053 % 0054 % Outputs: 0055 % pval - p-value at each time point and electrode (corrected for 0056 % multiple comparisons via the permutation test) 0057 % t_orig - t-score at each time point and electrode 0058 % clust_info - A struct variable containing information about the 0059 % clusters found. Depending on the tail of the test it will 0060 % be composed of all or half of the following fields: 0061 % pos_clust_pval: p-values of the positive clusters 0062 % pos_clust_mass: t-score mass of the positive clusters 0063 % pos_clust_ids: channel x time point matrix that 0064 % indicated which positive cluster each channel/time point 0065 % pair belongs to. E.g., if pos_clust_ids(1,2)=4, then 0066 % the second time point of the first channel is a 0067 % member of the fourth cluster. 0 indicates that the 0068 % channel/time point is not a member of any positive 0069 % cluster. 0070 % neg_clust_pval: p-values of the negative clusters 0071 % neg_clust_mass: t-score mass of the negative clusters 0072 % neg_clust_ids: channel x time point matrix that 0073 % indicated which negative cluster each channel/time point 0074 % pair belongs to. E.g., if neg_clust_ids(1,2)=4, then 0075 % the second time point of the first channel is a 0076 % member of the fourth cluster. 0 indicates that the 0077 % channel/time point is not a member of any negative 0078 % cluster. 0079 % seed_state - The initial state of the random number generating stream 0080 % (see MATLAB documentation for "randstream") used to 0081 % generate the permutations. You can use this to reproduce 0082 % the output of this function. 0083 % est_alpha - The estimated family-wise alpha level of the test. With 0084 % permutation tests, a finite number of p-values are possible. 0085 % This function tries to use an alpha level that is as close 0086 % as possible to the desired alpha level. However, if the 0087 % sample size is small, a very limited number of p-values are 0088 % possible and the desired family-wise alpha level may be 0089 % impossible to approximately achieve. 0090 % 0091 % 0092 % Notes: 0093 % -Unlike a parametric test (e.g., an ANOVA), a discrete set of p-values 0094 % are possible (at most the number of possible permutations). Since the 0095 % number of possible permutations grows rapidly with the number of 0096 % participants, this is only an issue for small sample sizes (e.g., 6 0097 % participants). When you have such a small sample size, the 0098 % limited number of p-values may make the test overly conservative (e.g., 0099 % you might be forced to use an alpha level of .0286 since it is the biggest 0100 % possible alpha level less than .05). 0101 % 0102 % -For two-tailed tests clust_perm1.m effectively performs an upper-tailed 0103 % test and a lower-tailed test and multiplies the resulting p-values by 2. 0104 % In other words, it does two tests and then applies Bonferroni correction. 0105 % This is faster than performing a "proper" two-tailed permutation test 0106 % since you only have to form clusters for one polarity (e.g., if you form 0107 % negative clusters for each permutation, the negative of that distribution 0108 % will be valid for computing p-values for postive clusters) but can 0109 % result in p-values greater than 1. 0110 % 0111 % Author: 0112 % David Groppe 0113 % May, 2011 0114 % Kutaslab, San Diego 0115 % 0116 % References: 0117 % Bullmore, E. T., Suckling, J., Overmeyer, S., Rabe-Hesketh, S., Taylor, 0118 % E., & Brammer, M. J. (1999). Global, voxel, and cluster tests, by theory 0119 % and permutation, for a difference between two groups of structural MR 0120 % images of the brain. IEEE Transactions on Medical Imaging, 18(1), 32-42. 0121 % doi:10.1109/42.750253 0122 % 0123 % Maris, E., & Oostenveld, R. (2007). Nonparametric statistical testing of 0124 % EEG- and MEG-data. Journal of Neuroscience Methods, 164(1), 177-190. 0125 % doi:10.1016/j.jneumeth.2007.03.024 0126 % 0127 % Manly, B.F.J. (1997) Randomization, bootstrap, and Monte Carlo methods in 0128 % biology. 2nd ed. Chapmn and Hall, London. 0129 0130 %%%%%%%%%%%%%%%% REVISION LOG %%%%%%%%%%%%%%%%% 0131 % 0132 % 12/11/2011-Now uses Amy Guthormsen's recursiveless find_clusters.m. 0133 % 0134 % 3/17/2013-Randomization is now compatible with Matlab v13. Thanks to 0135 % Aaron Newman for the fix. 0136 % 0137 0138 %This code has an appropriate false positive rate when run on Gaussian 0139 %noise (26 channels, 25 time points, 16 subs, two-tailed test). Done with 0140 %sim_test_clust1.m 0141 % 0142 0143 0144 function [pval, t_orig, clust_info, seed_state, est_alpha]=clust_perm1(data,chan_hood,n_perm,fwer,tail,thresh_p,verblevel,seed_state,freq_domain) 0145 0146 if nargin<1, 0147 error('You need to provide data.'); 0148 end 0149 0150 if nargin<2, 0151 error('You need to provide a chan_hood matrix.'); 0152 end 0153 0154 if nargin<3, 0155 n_perm=2000; 0156 end 0157 0158 if nargin<4, 0159 fwer=.05; 0160 elseif (fwer>=1) || (fwer<=0) 0161 error('Argument ''fwer'' needs to be between 0 and 1.'); 0162 end 0163 0164 if fwer<=.01 && n_perm<5000, 0165 watchit(sprintf('You are probably using too few permutations for a FWER (i.e., alpha level) of %f. Type ">>help clust_perm1" for more info.',fwer)); 0166 elseif fwer<=.05 && n_perm<1000, 0167 watchit(sprintf('You are probably using too few permutations for a FWER (i.e., alpha level) of %f. Type ">>help clust_perm1" for more info.',fwer)); 0168 end 0169 0170 if nargin<5, 0171 tail=0; 0172 elseif (tail~=0) && (tail~=1) && (tail~=-1), 0173 error('Argument ''tail'' needs to be 0,1, or -1.'); 0174 end 0175 0176 if nargin<6, 0177 thresh_p=.05; 0178 elseif thresh_p<=0 || thresh_p>1, 0179 error('Argument thresh_p needs to take a value between 0 and 1'); 0180 end 0181 0182 if nargin<7, 0183 verblevel=2; 0184 end 0185 0186 %Get random # generator state 0187 if verLessThan('matlab','8.1') 0188 defaultStream=RandStream.getDefaultStream; 0189 else 0190 defaultStream=RandStream.getGlobalStream; 0191 end 0192 if (nargin<8) || isempty(seed_state), 0193 %Store state of random number generator 0194 seed_state=defaultStream.State; 0195 else 0196 defaultStream.State=seed_state; %reset random number generator to saved state 0197 end 0198 0199 if (nargin<9), 0200 freq_domain=0; 0201 end 0202 0203 0204 s=size(data); 0205 n_chan=s(1); 0206 n_pts=s(2); 0207 n_subs=s(3); 0208 if n_subs<2, 0209 error('You need data from at least two observations (e.g., participants) to perform a hypothesis test.') 0210 end 0211 0212 if n_subs<7, 0213 n_psbl_prms=2^n_subs; 0214 watchit(sprintf(['Due to the very limited number of participants,' ... 0215 ' the total number of possible permutations is small.\nThus only a limited number of p-values (at most %d) are possible and the test might be overly conservative.'], ... 0216 n_psbl_prms)); 0217 end 0218 0219 df=n_subs-1; %t-score degrees of freedom 0220 if verblevel~=0, 0221 fprintf('clust_perm1: Number of channels: %d\n',n_chan); 0222 if freq_domain, 0223 fprintf('clust_perm1: Number of frequencies: %d\n',n_pts); 0224 else 0225 fprintf('clust_perm1: Number of time points: %d\n',n_pts); 0226 end 0227 fprintf('clust_perm1: Total # of comparisons: %d\n',n_pts*n_chan); 0228 fprintf('clust_perm1: Number of participants: %d\n',n_subs); 0229 fprintf('t-score degrees of freedom: %d\n',df); 0230 end 0231 0232 if tail 0233 %one tailed test 0234 thresh_t=tinv(thresh_p,df); %note unless thresh_p is greater than .5, thresh_t will be negative 0235 else 0236 %two tailed test 0237 thresh_t=tinv(thresh_p/2,df); 0238 end 0239 0240 VerbReport(sprintf('Executing a cluster-based permutation test with %d permutations...',n_perm),2,verblevel); 0241 if (verblevel>=2), 0242 fprintf('Permutations completed: '); 0243 end 0244 0245 0246 %Constant factor for computing t, speeds up computing t to precalculate 0247 %now 0248 sqrt_nXnM1=sqrt(n_subs*(n_subs-1)); 0249 0250 mn_clust_mass=zeros(1,n_perm); 0251 sn=zeros(1,1,n_subs); 0252 for perm=1:n_perm 0253 if ~rem(perm,100) 0254 if (verblevel>=2), 0255 if ~rem(perm-100,1000) 0256 fprintf('%d',perm); 0257 else 0258 fprintf(', %d',perm); 0259 end 0260 if ~rem(perm,1000) 0261 fprintf('\n'); 0262 end 0263 end 0264 end 0265 %randomly set sign of each participant's data 0266 sn(1,1,1:n_subs)=(rand(1,n_subs)>.5)*2-1; 0267 sn_mtrx=repmat(sn,[n_chan n_pts 1]); 0268 0269 d_perm=data.*sn_mtrx; 0270 0271 %computes t-score of permuted data across all channels and time 0272 %points or frequencies 0273 sm=sum(d_perm,3); 0274 mn=sm/n_subs; 0275 sm_sqrs=sum(d_perm.^2,3)-(sm.^2)/n_subs; 0276 stder=sqrt(sm_sqrs)/sqrt_nXnM1; 0277 t=mn./stder; 0278 0279 [clust_ids, n_clust]=find_clusters(t,thresh_t,chan_hood,-1); 0280 0281 %get most extremely negative t-score (sign isn't important since we asumme 0282 %symmetric distribution of null hypothesis for one sample test) 0283 mn_clust_mass(perm)=find_mn_mass(clust_ids,t,n_clust); 0284 end 0285 0286 0287 %End permutations completed line 0288 if (verblevel>=2) && rem(perm,1000) 0289 fprintf('\n'); 0290 end 0291 0292 % Estimate true FWER of test 0293 if tail==0, 0294 %two-tailed 0295 tmx_ptile=prctile(mn_clust_mass,100*fwer/2); 0296 est_alpha=mean(mn_clust_mass<=tmx_ptile)*2; 0297 else 0298 %one tailed 0299 tmx_ptile=prctile(mn_clust_mass,100*fwer); 0300 est_alpha=mean(mn_clust_mass<=tmx_ptile); 0301 end 0302 if verblevel~=0, 0303 fprintf('Desired family-wise error rate: %f\n',fwer); 0304 fprintf('Estimated actual family-wise error rate: %f\n',est_alpha); 0305 end 0306 0307 %computes t-scores of observations at all channels and time 0308 %points/frequencies 0309 sm=sum(data,3); 0310 mn=sm/n_subs; 0311 sm_sqrs=sum(data.^2,3)-(sm.^2)/n_subs; 0312 stder=sqrt(sm_sqrs)/sqrt_nXnM1; 0313 t_orig=mn./stder; 0314 0315 0316 %compute p-values 0317 pval=ones(n_chan,n_pts); 0318 if tail==0, 0319 %positive clusters 0320 [clust_ids, n_clust]=find_clusters(t_orig,-thresh_t,chan_hood,1); %note thresh_t is negative by default 0321 0322 clust_info.pos_clust_pval=ones(1,n_clust); 0323 clust_info.pos_clust_mass=zeros(1,n_clust); 0324 clust_info.pos_clust_ids=clust_ids; 0325 for a=1:n_clust, 0326 use_ids=find(clust_ids==a); 0327 clust_mass=sum(t_orig(use_ids)); 0328 clust_p=mean(mn_clust_mass<=(-clust_mass))*2; %multiply by 2 since we're effectively doing Bonferroni correcting for doing two tests (an upper tail and lower tail) 0329 pval(use_ids)=clust_p; 0330 clust_info.pos_clust_pval(a)=clust_p; 0331 clust_info.pos_clust_mass(a)=clust_mass; 0332 end 0333 0334 %negative clusters 0335 [clust_ids, n_clust]=find_clusters(t_orig,thresh_t,chan_hood,-1); %note thresh_t is negative by default 0336 clust_info.neg_clust_pval=ones(1,n_clust); 0337 clust_info.neg_clust_mass=zeros(1,n_clust); 0338 clust_info.neg_clust_ids=clust_ids; 0339 for a=1:n_clust, 0340 use_ids=find(clust_ids==a); 0341 clust_mass=sum(t_orig(use_ids)); 0342 clust_p=mean(mn_clust_mass<=clust_mass)*2; %multiply by 2 since we're effectively doing Bonferroni correcting for doing two tests (an upper tail and lower tail) 0343 pval(use_ids)=clust_p; 0344 clust_info.neg_clust_pval(a)=clust_p; 0345 clust_info.neg_clust_mass(a)=clust_mass; 0346 end 0347 elseif tail==1, 0348 %upper tailed 0349 [clust_ids, n_clust]=find_clusters(t_orig,-thresh_t,chan_hood,1); %note thresh_t is negative by default 0350 clust_info.pos_clust_pval=ones(1,n_clust); 0351 clust_info.pos_clust_mass=zeros(1,n_clust); 0352 clust_info.pos_clust_ids=clust_ids; 0353 for a=1:n_clust, 0354 use_ids=find(clust_ids==a); 0355 clust_mass=sum(t_orig(use_ids)); 0356 clust_p=mean(mn_clust_mass<=(-clust_mass)); 0357 pval(use_ids)=clust_p; 0358 clust_info.pos_clust_pval(a)=clust_p; 0359 clust_info.pos_clust_mass(a)=clust_mass; 0360 end 0361 else 0362 %lower tailed 0363 [clust_ids, n_clust]=find_clusters(t_orig,thresh_t,chan_hood,-1); %note thresh_t is negative by default 0364 clust_info.neg_clust_pval=ones(1,n_clust); 0365 clust_info.neg_clust_mass=zeros(1,n_clust); 0366 clust_info.neg_clust_ids=clust_ids; 0367 for a=1:n_clust, 0368 use_ids=find(clust_ids==a); 0369 clust_mass=sum(t_orig(use_ids)); 0370 clust_p=mean(mn_clust_mass<=clust_mass); 0371 pval(use_ids)=clust_p; 0372 clust_info.neg_clust_pval(a)=clust_p; 0373 clust_info.neg_clust_mass(a)=clust_mass; 0374 end 0375 end 0376 0377 %%% End of Main Function %%% 0378 0379 0380 function mn_clust_mass=find_mn_mass(clust_ids,data_t,n_clust) 0381 0382 mn_clust_mass=0; 0383 0384 %looking for most negative cluster mass 0385 for z=1:n_clust, 0386 use_ids=(clust_ids==z); 0387 use_mass=sum(data_t(use_ids)); 0388 if use_mass<mn_clust_mass, 0389 mn_clust_mass=use_mass; 0390 end 0391 end