Description of clust

function [pval, t_orig, clust_info, seed_state, est_alpha]=clust_perm1(data,chan_hood,n_perm,fwer,tail,thresh_p,verblevel,seed_state,freq_domain)

clust_perm1-One sample cluster-based permutation test using the "cluster
mass" statistic and a null hypothesis of a mean of zero. This
function can handle multiple electrodes and time points/frequencies.
This test was originally proposed for MRI data by Bullmore et al.
(1999) and for EEG/MEG analysis by Maris & Oostenveld (2007).

Required Inputs:
data - 3D matrix of ERP data (Channel x Time x Participant)
chan_hood - 2D symmetric binary matrix that indicates which channels are
considered neighbors of other channels. E.g., if
chan_hood(2,10)=1, then Channel 2 and Channel 10 are
nieghbors. You can produce a chan_hood matrix using the
function spatial_neighbors.m.

Optional Inputs:
n_perm - Number of permutations {default=2000}. Manly (1997) suggests
using at least 1000 permutations for an alpha level of 0.05 and
at least 5000 permutations for an alpha level of 0.01.
fwer - Desired family-wise error rate (i.e., alpha level) {default=.05}
tail - [1 | 0 | -1] If tail=1, the alternative hypothesis is that the
mean of the data is greater than 0 (upper tailed test). If tail=0,
the alternative hypothesis is that the mean of the data is different
than 0 (two tailed test). If tail=-1, the alternative hypothesis
is that the mean of the data is less than 0 (lower tailed test).
{default: 0}
thresh_p - The test-wise p-value threshold for cluster inclusion. If
a channel/time-point has a t-score that corresponds to an
uncorrected p-value greater than thresh_p, it is assigned
a p-value of 1 and not considered for clustering. Note
that thresh_p automatically takes into account the tail of
the test (e.g., you will get positive and negative t-score
thresholds for a two-tailed test).
verblevel - An integer specifying the amount of information you want
this function to provide about what it is doing during runtime.
Options are:
0 - quiet, only show errors, warnings, and EEGLAB reports
1 - stuff anyone should probably know
2 - stuff you should know the first time you start working
with a data set {default value}
3 - stuff that might help you debug (show all
reports)
seed_state - The initial state of the random number generating stream
(see MATLAB documentation for "randstream"). If you pass
a value from a previous run of this function, it should
reproduce the exact same values.
freq_domain - If 0, command line report will be given in temporal units
(e.g. time points). Otherwise, the report will be given
in frequency domain units (e.g., frequencies). {default:
0}

Outputs:
pval - p-value at each time point and electrode (corrected for
multiple comparisons via the permutation test)
t_orig - t-score at each time point and electrode
clust_info - A struct variable containing information about the
clusters found. Depending on the tail of the test it will
be composed of all or half of the following fields:
pos_clust_pval: p-values of the positive clusters
pos_clust_mass: t-score mass of the positive clusters
pos_clust_ids: channel x time point matrix that
indicated which positive cluster each channel/time point
pair belongs to. E.g., if pos_clust_ids(1,2)=4, then
the second time point of the first channel is a
member of the fourth cluster. 0 indicates that the
channel/time point is not a member of any positive
cluster.
neg_clust_pval: p-values of the negative clusters
neg_clust_mass: t-score mass of the negative clusters
neg_clust_ids: channel x time point matrix that
indicated which negative cluster each channel/time point
pair belongs to. E.g., if neg_clust_ids(1,2)=4, then
the second time point of the first channel is a
member of the fourth cluster. 0 indicates that the
channel/time point is not a member of any negative
cluster.
seed_state - The initial state of the random number generating stream
(see MATLAB documentation for "randstream") used to
generate the permutations. You can use this to reproduce
the output of this function.
est_alpha - The estimated family-wise alpha level of the test. With
permutation tests, a finite number of p-values are possible.
This function tries to use an alpha level that is as close
as possible to the desired alpha level. However, if the
sample size is small, a very limited number of p-values are
possible and the desired family-wise alpha level may be
impossible to approximately achieve.

Notes:
-Unlike a parametric test (e.g., an ANOVA), a discrete set of p-values
are possible (at most the number of possible permutations). Since the
number of possible permutations grows rapidly with the number of
participants, this is only an issue for small sample sizes (e.g., 6
participants). When you have such a small sample size, the
limited number of p-values may make the test overly conservative (e.g.,
you might be forced to use an alpha level of .0286 since it is the biggest
possible alpha level less than .05).

-For two-tailed tests clust_perm1.m effectively performs an upper-tailed
test and a lower-tailed test and multiplies the resulting p-values by 2.
In other words, it does two tests and then applies Bonferroni correction.
This is faster than performing a "proper" two-tailed permutation test
since you only have to form clusters for one polarity (e.g., if you form
negative clusters for each permutation, the negative of that distribution
will be valid for computing p-values for postive clusters) but can
result in p-values greater than 1.

Author:
David Groppe
May, 2011
Kutaslab, San Diego

References:
Bullmore, E. T., Suckling, J., Overmeyer, S., Rabe-Hesketh, S., Taylor,
E., & Brammer, M. J. (1999). Global, voxel, and cluster tests, by theory
and permutation, for a difference between two groups of structural MR
images of the brain. IEEE Transactions on Medical Imaging, 18(1), 32-42.
doi:10.1109/42.750253

Maris, E., & Oostenveld, R. (2007). Nonparametric statistical testing of
EEG- and MEG-data. Journal of Neuroscience Methods, 164(1), 177-190.
doi:10.1016/j.jneumeth.2007.03.024

Manly, B.F.J. (1997) Randomization, bootstrap, and Monte Carlo methods in
biology. 2nd ed. Chapmn and Hall, London.

0001 %function [pval, t_orig, clust_info, seed_state, est_alpha]=clust_perm1(data,chan_hood,n_perm,fwer,tail,thresh_p,verblevel,seed_state,freq_domain)
0002 %
0003 % clust_perm1-One sample cluster-based permutation test using the "cluster
0004 %             mass" statistic and a null hypothesis of a mean of zero.  This
0005 %             function can handle multiple electrodes and time points/frequencies.
0006 %             This test was originally proposed for MRI data by Bullmore et al.
0007 %             (1999) and for EEG/MEG analysis by Maris & Oostenveld (2007).
0008 %
0009 % Required Inputs:
0010 %  data      - 3D matrix of ERP data (Channel x Time x Participant)
0011 %  chan_hood - 2D symmetric binary matrix that indicates which channels are
0012 %              considered neighbors of other channels. E.g., if
0013 %              chan_hood(2,10)=1, then Channel 2 and Channel 10 are
0014 %              nieghbors. You can produce a chan_hood matrix using the
0015 %              function spatial_neighbors.m.
0016 %
0017 % Optional Inputs:
0018 %  n_perm          - Number of permutations {default=2000}.  Manly (1997) suggests
0019 %                    using at least 1000 permutations for an alpha level of 0.05 and
0020 %                    at least 5000 permutations for an alpha level of 0.01.
0021 %  fwer            - Desired family-wise error rate (i.e., alpha level) {default=.05}
0022 %  tail            - [1 | 0 | -1] If tail=1, the alternative hypothesis is that the
0023 %                    mean of the data is greater than 0 (upper tailed test).  If tail=0,
0024 %                    the alternative hypothesis is that the mean of the data is different
0025 %                    than 0 (two tailed test).  If tail=-1, the alternative hypothesis
0026 %                    is that the mean of the data is less than 0 (lower tailed test).
0027 %                    {default: 0}
0028 %  thresh_p        - The test-wise p-value threshold for cluster inclusion. If
0029 %                    a channel/time-point has a t-score that corresponds to an
0030 %                    uncorrected p-value greater than thresh_p, it is assigned
0031 %                    a p-value of 1 and not considered for clustering. Note
0032 %                    that thresh_p automatically takes into account the tail of
0033 %                    the test (e.g., you will get positive and negative t-score
0034 %                    thresholds for a two-tailed test).
0035 %  verblevel       - An integer specifying the amount of information you want
0036 %                    this function to provide about what it is doing during runtime.
0037 %                     Options are:
0038 %                      0 - quiet, only show errors, warnings, and EEGLAB reports
0039 %                      1 - stuff anyone should probably know
0040 %                      2 - stuff you should know the first time you start working
0041 %                          with a data set {default value}
0042 %                      3 - stuff that might help you debug (show all
0043 %                        reports)
0044 %  seed_state      - The initial state of the random number generating stream
0045 %                    (see MATLAB documentation for "randstream"). If you pass
0046 %                    a value from a previous run of this function, it should
0047 %                    reproduce the exact same values.
0048 %  freq_domain     - If 0, command line report will be given in temporal units
0049 %                    (e.g. time points).  Otherwise, the report will be given
0050 %                    in frequency domain units (e.g., frequencies). {default:
0051 %                    0}
0052 %
0053 %
0054 % Outputs:
0055 %  pval       - p-value at each time point and electrode (corrected for
0056 %               multiple comparisons via the permutation test)
0057 %  t_orig     - t-score at each time point and electrode
0058 %  clust_info - A struct variable containing information about the
0059 %               clusters found.  Depending on the tail of the test it will
0060 %               be composed of all or half of the following fields:
0061 %                 pos_clust_pval: p-values of the positive clusters
0062 %                 pos_clust_mass: t-score mass of the positive clusters
0063 %                 pos_clust_ids:  channel x time point matrix that
0064 %                   indicated which positive cluster each channel/time point
0065 %                   pair belongs to. E.g., if pos_clust_ids(1,2)=4, then
0066 %                   the second time point of the first channel is a
0067 %                   member of the fourth cluster. 0 indicates that the
0068 %                   channel/time point is not a member of any positive
0069 %                   cluster.
0070 %                 neg_clust_pval: p-values of the negative clusters
0071 %                 neg_clust_mass: t-score mass of the negative clusters
0072 %                 neg_clust_ids:  channel x time point matrix that
0073 %                   indicated which negative cluster each channel/time point
0074 %                   pair belongs to. E.g., if neg_clust_ids(1,2)=4, then
0075 %                   the second time point of the first channel is a
0076 %                   member of the fourth cluster. 0 indicates that the
0077 %                   channel/time point is not a member of any negative
0078 %                   cluster.
0079 %  seed_state - The initial state of the random number generating stream
0080 %               (see MATLAB documentation for "randstream") used to
0081 %               generate the permutations. You can use this to reproduce
0082 %               the output of this function.
0083 %  est_alpha  - The estimated family-wise alpha level of the test.  With
0084 %               permutation tests, a finite number of p-values are possible.
0085 %               This function tries to use an alpha level that is as close
0086 %               as possible to the desired alpha level.  However, if the
0087 %               sample size is small, a very limited number of p-values are
0088 %               possible and the desired family-wise alpha level may be
0089 %               impossible to approximately achieve.
0090 %
0091 %
0092 % Notes:
0093 % -Unlike a parametric test (e.g., an ANOVA), a discrete set of p-values
0094 % are possible (at most the number of possible permutations).  Since the
0095 % number of possible permutations grows rapidly with the number of
0096 % participants, this is only an issue for small sample sizes (e.g., 6
0097 % participants).  When you have such a small sample size, the
0098 % limited number of p-values may make the test overly conservative (e.g.,
0099 % you might be forced to use an alpha level of .0286 since it is the biggest
0100 % possible alpha level less than .05).
0101 %
0102 % -For two-tailed tests clust_perm1.m effectively performs an upper-tailed
0103 % test and a lower-tailed test and multiplies the resulting p-values by 2.
0104 % In other words, it does two tests and then applies Bonferroni correction.
0105 % This is faster than performing a "proper" two-tailed permutation test
0106 % since you only have to form clusters for one polarity (e.g., if you form
0107 % negative clusters for each permutation, the negative of that distribution
0108 % will be valid for computing p-values for postive clusters) but can
0109 % result in p-values greater than 1.
0110 %
0111 % Author:
0112 % David Groppe
0113 % May, 2011
0114 % Kutaslab, San Diego
0115 %
0116 % References:
0117 % Bullmore, E. T., Suckling, J., Overmeyer, S., Rabe-Hesketh, S., Taylor,
0118 % E., & Brammer, M. J. (1999). Global, voxel, and cluster tests, by theory
0119 % and permutation, for a difference between two groups of structural MR
0120 % images of the brain. IEEE Transactions on Medical Imaging, 18(1), 32-42.
0121 % doi:10.1109/42.750253
0122 %
0123 % Maris, E., & Oostenveld, R. (2007). Nonparametric statistical testing of
0124 % EEG- and MEG-data. Journal of Neuroscience Methods, 164(1), 177-190.
0125 % doi:10.1016/j.jneumeth.2007.03.024
0126 %
0127 % Manly, B.F.J. (1997) Randomization, bootstrap, and Monte Carlo methods in
0128 % biology. 2nd ed. Chapmn and Hall, London.
0129 
0130 %%%%%%%%%%%%%%%% REVISION LOG %%%%%%%%%%%%%%%%%
0131 %
0132 % 12/11/2011-Now uses Amy Guthormsen's recursiveless find_clusters.m.
0133 %
0134 % 3/17/2013-Randomization is now compatible with Matlab v13. Thanks to
0135 % Aaron Newman for the fix.
0136 %
0137 
0138 %This code has an appropriate false positive rate when run on Gaussian
0139 %noise (26 channels, 25 time points, 16 subs, two-tailed test). Done with
0140 %sim_test_clust1.m
0141 %
0142 
0143 
0144 function [pval, t_orig, clust_info, seed_state, est_alpha]=clust_perm1(data,chan_hood,n_perm,fwer,tail,thresh_p,verblevel,seed_state,freq_domain)
0145 
0146 if nargin<1,
0147     error('You need to provide data.');
0148 end
0149 
0150 if nargin<2,
0151     error('You need to provide a chan_hood matrix.');
0152 end
0153 
0154 if nargin<3,
0155     n_perm=2000;
0156 end
0157 
0158 if nargin<4,
0159     fwer=.05;
0160 elseif (fwer>=1) || (fwer<=0)
0161     error('Argument ''fwer'' needs to be between 0 and 1.');
0162 end
0163 
0164 if fwer<=.01 && n_perm<5000,
0165     watchit(sprintf('You are probably using too few permutations for a FWER (i.e., alpha level) of %f. Type ">>help clust_perm1" for more info.',fwer));
0166 elseif fwer<=.05 && n_perm<1000,
0167     watchit(sprintf('You are probably using too few permutations for a FWER (i.e., alpha level) of %f. Type ">>help clust_perm1" for more info.',fwer));
0168 end
0169 
0170 if nargin<5,
0171     tail=0;
0172 elseif (tail~=0) && (tail~=1) && (tail~=-1),
0173     error('Argument ''tail'' needs to be 0,1, or -1.');
0174 end
0175 
0176 if nargin<6,
0177     thresh_p=.05;
0178 elseif thresh_p<=0 || thresh_p>1,
0179     error('Argument thresh_p needs to take a value between 0 and 1');
0180 end
0181 
0182 if nargin<7,
0183     verblevel=2;
0184 end
0185 
0186 %Get random # generator state
0187 if verLessThan('matlab','8.1')
0188     defaultStream=RandStream.getDefaultStream; 
0189 else
0190     defaultStream=RandStream.getGlobalStream;
0191 end
0192 if (nargin<8) || isempty(seed_state),
0193     %Store state of random number generator
0194     seed_state=defaultStream.State;
0195 else
0196     defaultStream.State=seed_state; %reset random number generator to saved state
0197 end
0198 
0199 if (nargin<9),
0200     freq_domain=0;
0201 end
0202 
0203 
0204 s=size(data);
0205 n_chan=s(1);
0206 n_pts=s(2);
0207 n_subs=s(3);
0208 if n_subs<2,
0209     error('You need data from at least two observations (e.g., participants) to perform a hypothesis test.')
0210 end
0211 
0212 if n_subs<7,
0213     n_psbl_prms=2^n_subs;
0214     watchit(sprintf(['Due to the very limited number of participants,' ...
0215         ' the total number of possible permutations is small.\nThus only a limited number of p-values (at most %d) are possible and the test might be overly conservative.'], ...
0216         n_psbl_prms));
0217 end
0218 
0219 df=n_subs-1; %t-score degrees of freedom
0220 if verblevel~=0,
0221     fprintf('clust_perm1: Number of channels: %d\n',n_chan);
0222     if freq_domain,
0223         fprintf('clust_perm1: Number of frequencies: %d\n',n_pts);
0224     else
0225         fprintf('clust_perm1: Number of time points: %d\n',n_pts);
0226     end
0227     fprintf('clust_perm1: Total # of comparisons: %d\n',n_pts*n_chan);
0228     fprintf('clust_perm1: Number of participants: %d\n',n_subs);
0229     fprintf('t-score degrees of freedom: %d\n',df);
0230 end
0231 
0232 if tail
0233     %one tailed test
0234     thresh_t=tinv(thresh_p,df); %note unless thresh_p is greater than .5, thresh_t will be negative
0235 else
0236     %two tailed test
0237     thresh_t=tinv(thresh_p/2,df);
0238 end
0239 
0240 VerbReport(sprintf('Executing a cluster-based permutation test with %d permutations...',n_perm),2,verblevel);
0241 if (verblevel>=2),
0242     fprintf('Permutations completed: ');
0243 end
0244 
0245 
0246 %Constant factor for computing t, speeds up computing t to precalculate
0247 %now
0248 sqrt_nXnM1=sqrt(n_subs*(n_subs-1));
0249 
0250 mn_clust_mass=zeros(1,n_perm);
0251 sn=zeros(1,1,n_subs);
0252 for perm=1:n_perm
0253     if ~rem(perm,100)
0254         if (verblevel>=2),
0255             if ~rem(perm-100,1000)
0256                 fprintf('%d',perm);
0257             else
0258                 fprintf(', %d',perm);
0259             end
0260             if ~rem(perm,1000)
0261                 fprintf('\n');
0262             end
0263         end
0264     end
0265     %randomly set sign of each participant's data
0266     sn(1,1,1:n_subs)=(rand(1,n_subs)>.5)*2-1;
0267     sn_mtrx=repmat(sn,[n_chan n_pts 1]);
0268     
0269     d_perm=data.*sn_mtrx;
0270     
0271     %computes t-score of permuted data across all channels and time
0272     %points or frequencies
0273     sm=sum(d_perm,3);
0274     mn=sm/n_subs;
0275     sm_sqrs=sum(d_perm.^2,3)-(sm.^2)/n_subs;
0276     stder=sqrt(sm_sqrs)/sqrt_nXnM1;
0277     t=mn./stder;
0278     
0279     [clust_ids, n_clust]=find_clusters(t,thresh_t,chan_hood,-1);
0280 
0281     %get most extremely negative t-score (sign isn't important since we asumme
0282     %symmetric distribution of null hypothesis for one sample test)
0283     mn_clust_mass(perm)=find_mn_mass(clust_ids,t,n_clust);
0284 end
0285 
0286 
0287 %End permutations completed line
0288 if (verblevel>=2) && rem(perm,1000)
0289     fprintf('\n');
0290 end
0291 
0292 % Estimate true FWER of test
0293 if tail==0,
0294     %two-tailed
0295     tmx_ptile=prctile(mn_clust_mass,100*fwer/2);
0296     est_alpha=mean(mn_clust_mass<=tmx_ptile)*2;
0297 else
0298     %one tailed
0299     tmx_ptile=prctile(mn_clust_mass,100*fwer);
0300     est_alpha=mean(mn_clust_mass<=tmx_ptile);
0301 end
0302 if verblevel~=0,
0303     fprintf('Desired family-wise error rate: %f\n',fwer);
0304     fprintf('Estimated actual family-wise error rate: %f\n',est_alpha);
0305 end
0306 
0307 %computes t-scores of observations at all channels and time
0308 %points/frequencies
0309 sm=sum(data,3);
0310 mn=sm/n_subs;
0311 sm_sqrs=sum(data.^2,3)-(sm.^2)/n_subs;
0312 stder=sqrt(sm_sqrs)/sqrt_nXnM1;
0313 t_orig=mn./stder;
0314 
0315 
0316 %compute p-values
0317 pval=ones(n_chan,n_pts);
0318 if tail==0,
0319     %positive clusters
0320     [clust_ids, n_clust]=find_clusters(t_orig,-thresh_t,chan_hood,1); %note thresh_t is negative by default
0321 
0322     clust_info.pos_clust_pval=ones(1,n_clust);
0323     clust_info.pos_clust_mass=zeros(1,n_clust);
0324     clust_info.pos_clust_ids=clust_ids;
0325     for a=1:n_clust,
0326         use_ids=find(clust_ids==a);
0327         clust_mass=sum(t_orig(use_ids)); 
0328         clust_p=mean(mn_clust_mass<=(-clust_mass))*2; %multiply by 2 since we're effectively doing Bonferroni correcting for doing two tests (an upper tail and lower tail)
0329         pval(use_ids)=clust_p;
0330         clust_info.pos_clust_pval(a)=clust_p;
0331         clust_info.pos_clust_mass(a)=clust_mass;
0332     end
0333     
0334     %negative clusters
0335     [clust_ids, n_clust]=find_clusters(t_orig,thresh_t,chan_hood,-1); %note thresh_t is negative by default
0336     clust_info.neg_clust_pval=ones(1,n_clust);
0337     clust_info.neg_clust_mass=zeros(1,n_clust);
0338     clust_info.neg_clust_ids=clust_ids;
0339     for a=1:n_clust,
0340         use_ids=find(clust_ids==a);
0341         clust_mass=sum(t_orig(use_ids));
0342         clust_p=mean(mn_clust_mass<=clust_mass)*2; %multiply by 2 since we're effectively doing Bonferroni correcting for doing two tests (an upper tail and lower tail)
0343         pval(use_ids)=clust_p;
0344         clust_info.neg_clust_pval(a)=clust_p;
0345         clust_info.neg_clust_mass(a)=clust_mass;
0346     end
0347 elseif tail==1,
0348     %upper tailed
0349     [clust_ids, n_clust]=find_clusters(t_orig,-thresh_t,chan_hood,1); %note thresh_t is negative by default
0350     clust_info.pos_clust_pval=ones(1,n_clust);
0351     clust_info.pos_clust_mass=zeros(1,n_clust);
0352     clust_info.pos_clust_ids=clust_ids;
0353     for a=1:n_clust,
0354         use_ids=find(clust_ids==a);
0355         clust_mass=sum(t_orig(use_ids));
0356         clust_p=mean(mn_clust_mass<=(-clust_mass));
0357         pval(use_ids)=clust_p;
0358         clust_info.pos_clust_pval(a)=clust_p;
0359         clust_info.pos_clust_mass(a)=clust_mass;
0360     end
0361 else
0362     %lower tailed
0363     [clust_ids, n_clust]=find_clusters(t_orig,thresh_t,chan_hood,-1); %note thresh_t is negative by default
0364     clust_info.neg_clust_pval=ones(1,n_clust);
0365     clust_info.neg_clust_mass=zeros(1,n_clust);
0366     clust_info.neg_clust_ids=clust_ids;
0367     for a=1:n_clust,
0368         use_ids=find(clust_ids==a);
0369         clust_mass=sum(t_orig(use_ids));
0370         clust_p=mean(mn_clust_mass<=clust_mass);
0371         pval(use_ids)=clust_p;
0372         clust_info.neg_clust_pval(a)=clust_p;
0373         clust_info.neg_clust_mass(a)=clust_mass;
0374     end
0375 end
0376 
0377 %%% End of Main Function %%%
0378 
0379 
0380 function mn_clust_mass=find_mn_mass(clust_ids,data_t,n_clust)
0381 
0382 mn_clust_mass=0;
0383 
0384 %looking for most negative cluster mass
0385 for z=1:n_clust,
0386     use_ids=(clust_ids==z);
0387     use_mass=sum(data_t(use_ids));
0388     if use_mass<mn_clust_mass,
0389         mn_clust_mass=use_mass;
0390     end
0391 end

clust_perm1

PURPOSE

SYNOPSIS

DESCRIPTION

CROSS-REFERENCE INFORMATION

SUBFUNCTIONS

SOURCE CODE