
# mregress

## PURPOSE

Performs multiple linear regression analysis of X (independent) on Y (dependent).

## SYNOPSIS

function [Coefficients, S_err, XTXI, R_sq, R_sq_ad, F_val, Coef_stats, Y_hat, residuals, covariance] = mregress(Y, X, INTCPT)

## DESCRIPTION

```
MREGRESS  Performs multiple linear regression analysis of X (independent) on Y (dependent).

Usage:

[Coefficients, S_err, XTXI, R_sq, R_sq_ad, F_val, Coef_stats, Y_hat, residuals, covariance] ...
= mregress(Y, X, INTCPT)

INTCPT = 1; include a y-intercept in the model
INTCPT = 0; DO NOT include a y-intercept in the model

Returns:

Coefficients         - Regression coefficients
S_err                - Standard error of estimate of Y (derived from
                       the residuals)
XTXI                 - inverse of X' * X
R_sq                 - R-squared
R_sq_ad              - Adjusted R-squared (the adjusted
                       coefficient of determination--Zar,
                       1999, p. 423).  Adjusts R-squared based
                       on the number of predictors (i.e., it
                       shrinks R_sq as the number of predictors
                       increases)
F_val                - F-value for the regression and significance level (p-value for F)
Coef_stats           - Coefficients with their standard deviations, T-values, and p-values
Y_hat                - Fitted values
residuals            - Residuals
covariance           - Covariance matrix ( XTXI * S_err^2 )

G. Anthony Reina
Motor Control Lab
The Neurosciences Institute
Created: 4 Aug 1998
Last Update: 10/8/1998 by GAR

Please note that for the case when the intercept of the model equals zero, the
definition of R-squared and the F-statistic change mathematically. For a linear
model containing the y-intercept, R-squared refers to the amount of variance around
the mean of the dependent variable (y) which is explained by the variance around the
mean of the independent variables (x). For a linear model NOT containing the
y-intercept, R-squared measures the amount of variance around ZERO of the dependent
variable which is explained by the variance around ZERO of the independent variable.
If the same equation for R-squared is used both with and without a y-intercept
(namely R-squared = [Sum of Squares of the Regression] / [Total sum of the squares]),
then R-squared may be a NEGATIVE value for some data. For this reason,
this subroutine calculates R-squared using the total un-corrected sum of the
squares. In effect, this approach avoids negative R-squares but may lack any
meaningful interpretation of the "goodness-of-fit" of the model. Some texts
have suggested that a more useful approach is to always include the
y-intercept in the model. However, as with all statistical analyses, it is
prudent to be aware of what your data "look" like and what your statistical
tools are actually measuring in order to generate a useful analysis.

For further reading on regression through the origin (i.e. without a y-intercept),
please refer to:

Neter J, Kutner MH, Nachtsheim CJ, and Wasserman W. "Applied Linear
Statistical Models" 4th ed. Irwin publishing (Boston, 1996), pp 159-163.

Myers R, "Classical and Modern Regression with Applications" Duxbury Press
(Boston, 1986), p. 30.

Zar, J. H. (1999). Biostatistical Analysis (Fourth Ed.). Upper Saddle River,
New Jersey: Prentice Hall.
```
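The caveat above about regression through the origin can be made concrete. A minimal sketch in pure Python (made-up data, not part of mregress) shows that the mean-corrected R-squared formula can go negative for a no-intercept fit, while the un-corrected form used by this routine stays between 0 and 1:

```python
# Sketch (pure Python, made-up data): R-squared for a through-origin fit.
x = [1.0, 2.0, 3.0]
y = [10.0, 10.5, 9.5]

# Through-origin least-squares slope: b = sum(x*y) / sum(x^2)
b = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
y_hat = [b * xi for xi in x]

sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))   # residual SS
mean_y = sum(y) / len(y)
tss_corrected = sum((yi - mean_y) ** 2 for yi in y)     # SS about the mean
tss_uncorrected = sum(yi ** 2 for yi in y)              # SS about zero

# Mean-corrected definition: can be (very) negative for this fit.
r2_corrected = 1 - sse / tss_corrected
# Un-corrected definition (what mregress uses when INTCPT = 0):
# equals RSS/TSS here because OLS residuals are orthogonal to the fit.
r2_uncorrected = 1 - sse / tss_uncorrected
print(r2_corrected, r2_uncorrected)
```

Here the through-origin line is a poor description of data that cluster around y = 10, so the corrected definition penalizes it below zero; the un-corrected definition remains a valid proportion, at the cost of a less intuitive interpretation.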

## CROSS-REFERENCE INFORMATION

This function calls: (none)

This function is called by:
• items_regress() - Performs multiple regression analysis on ERPs to each
• rm_regress() - Performs repeated measures multiple regression analysis
• rm_regress_mass() - Performs repeated measures multiple regression analysis

## SOURCE CODE

```
function [Coefficients, S_err, XTXI, R_sq, R_sq_ad, F_val, Coef_stats, Y_hat, residuals, covariance] ...
          = mregress(Y, X, INTCPT)

% MREGRESS  Performs multiple linear regression analysis of X (independent) on Y (dependent).
%
%           Usage:
%
%           [Coefficients, S_err, XTXI, R_sq, R_sq_ad, F_val, Coef_stats, Y_hat, residuals, covariance] ...
%                                                     = mregress(Y, X, INTCPT)
%
%           INTCPT = 1; include a y-intercept in the model
%           INTCPT = 0; DO NOT include a y-intercept in the model
%
%           Returns:
%
%           Coefficients         - Regression coefficients
%           S_err                - Standard error of estimate of Y (derived from
%                                  the residuals)
%           XTXI                 - inverse of X' * X
%           R_sq                 - R-squared
%           R_sq_ad              - Adjusted R-squared (the adjusted
%                                  coefficient of determination--Zar,
%                                  1999, p. 423).  Adjusts R-squared based
%                                  on the number of predictors (i.e., it
%                                  shrinks R_sq as the number of predictors
%                                  increases)
%           F_val                - F-value for the regression and significance level (p-value for F)
%           Coef_stats           - Coefficients with their standard deviations, T-values, and p-values
%           Y_hat                - Fitted values
%           residuals            - Residuals
%           covariance           - Covariance matrix ( XTXI * S_err^2 )
%
%
% G. Anthony Reina
% Motor Control Lab
% The Neurosciences Institute
% Created: 4 Aug 1998
% Last Update: 10/8/1998 by GAR
%
% Please note that for the case when the intercept of the model equals zero, the
% definition of R-squared and the F-statistic change mathematically. For a linear
% model containing the y-intercept, R-squared refers to the amount of variance around
% the mean of the dependent variable (y) which is explained by the variance around the
% mean of the independent variables (x). For a linear model NOT containing the
% y-intercept, R-squared measures the amount of variance around ZERO of the dependent
% variable which is explained by the variance around ZERO of the independent variable.
% If the same equation for R-squared is used both with and without a y-intercept
% (namely R-squared = [Sum of Squares of the Regression] / [Total sum of the squares]),
% then R-squared may be a NEGATIVE value for some data. For this reason,
% this subroutine calculates R-squared using the total un-corrected sum of the
% squares. In effect, this approach avoids negative R-squares but may lack any
% meaningful interpretation of the "goodness-of-fit" of the model. Some texts
% have suggested that a more useful approach is to always include the
% y-intercept in the model. However, as with all statistical analyses, it is
% prudent to be aware of what your data "look" like and what your statistical
% tools are actually measuring in order to generate a useful analysis.
%
% For further reading on regression through the origin (i.e. without a y-intercept),
% please refer to:
%
%    Neter J, Kutner MH, Nachtsheim CJ, and Wasserman W. "Applied Linear
%             Statistical Models" 4th ed. Irwin publishing (Boston, 1996), pp 159-163.
%
%    Myers R, "Classical and Modern Regression with Applications" Duxbury Press
%             (Boston, 1986), p. 30.
%
%    Zar, J. H. (1999). Biostatistical Analysis (Fourth Ed.). Upper Saddle River,
%             New Jersey: Prentice Hall.

% Modifications:
% - 5/20/2010, Added adjusted R-squared (line of code cut and pasted from the
%   MATHWORKS file exchange) and comments.

if (nargin < 2)
   error('mregress requires at least 2 input variables. Type ''help mregress''.');
end

if (nargin == 2)
   INTCPT = 0;
end

% Check that independent (X) and dependent (Y) data have compatible dimensions
% ----------------------------------------------------------------------------
[n_x, k] = size(X);
[n_y, columns] = size(Y);

if n_x ~= n_y
    error('The number of rows in Y must equal the number of rows in X.');
end

if columns ~= 1
    error('Y must be a vector, not a matrix');
end

n = n_x;

%  Solve for the regression coefficients using ordinary least-squares
%  ------------------------------------------------------------------

if (INTCPT == 1)
    X = [ ones(n,1) X ];
end

XTXI = inv(X' * X);
Coefficients = XTXI * X' * Y;

%  Calculate the fitted regression values
%  --------------------------------------

Y_hat = X * Coefficients;

%  Calculate R-squared
%  -------------------
% The calculations used for R-squared and the F-statistic here are based
% on the total, un-corrected sum of the squares as described by Neter and
% Myers. Note that the meaning of R-squared changes for the case of
% regression without a y-intercept. This approach will yield the same
% results as SYSTAT, SAS, SPSS and BMDP but will differ from those of
% Excel, Quattro Pro, and the MATLAB regress.m function (for the case of
% no y-intercept in the model -- all packages including this one will
% agree for the case of linear regression with a
% y-intercept). Essentially, it is wise to find a way to
% keep the y-intercept (even if it is near zero) in the model to analyze
% it in a meaningful way that everyone can understand.

if (INTCPT == 1)

   RSS = norm(Y_hat - mean(Y))^2;   % Regression sum of squares.
   TSS = norm(Y - mean(Y))^2;       % Total sum of squares (regression plus residual).
   R_sq = RSS / TSS;                % R-squared statistic.

else

   RSS = norm(Y_hat)^2;             % Regression sum of squares.
   TSS = norm(Y)^2;                 % Total, un-corrected sum of squares.
   R_sq = RSS / TSS;                % R-squared statistic.

end

% Adjusted R-squared; shrinks R_sq as the number of predictors increases.
R_sq_ad = 1 - ((n-1)/(n-(k+1))) * (1 - R_sq);

% $$$ % Alternative calculation of R-squared
% $$$ % ====================================
% $$$ % The following equation is from Judge G, et al. "An Introduction to the theory
% $$$ % and practice of econometrics", New York : Wiley, 1982. It is the
% $$$ % squared (Pearson) correlation coefficient between the predicted and
% $$$ % dependent variables. It is the same equation regardless of whether an
% $$$ % intercept is included in the model; however, it may yield a negative
% $$$ % R-squared for a particularly bad fit.
% $$$ covariance_Y_hat_and_Y = (Y_hat - mean(Y_hat))' * (Y - mean(Y));
% $$$ covariance_Y_hat_and_Y_hat = (Y_hat - mean(Y_hat))' * (Y_hat - mean(Y_hat));
% $$$ covariance_Y_and_Y = (Y - mean(Y))' * (Y - mean(Y));
% $$$ R_sq = (covariance_Y_hat_and_Y / covariance_Y_hat_and_Y_hat) * ...
% $$$        (covariance_Y_hat_and_Y / covariance_Y_and_Y);

%  Calculate residuals and standard error
%  --------------------------------------

residuals = Y - Y_hat;

if (INTCPT == 1)
    S_err = sqrt(residuals' * residuals / (n - k - 1));
else
    S_err = sqrt(residuals' * residuals / (n - k));
end

%  Calculate the standard deviation and t-values for the regression coefficients
%  -----------------------------------------------------------------------------

covariance = XTXI .* S_err^2;

C = sqrt(diag(covariance, 0));

% (n.b. Need to perform a 2-tailed t-test)
% ****************************************
if (INTCPT == 1)
    p_value = 2 * (1 - tcdf(abs(Coefficients./C), (n - (k + 1))));
else
    p_value = 2 * (1 - tcdf(abs(Coefficients./C), (n - k)));
end

Coef_stats = [ Coefficients, C, (Coefficients./C), p_value ];

% F-statistic for the overall regression.
% ---------------------------------------

if (INTCPT == 1)

    SSR_residuals = norm(Y - Y_hat)^2;
    TSS = norm(Y - mean(Y))^2;      % Total sum of squares (regression plus residual).

    F_val = ((TSS - SSR_residuals) / k) / (SSR_residuals / (n - (k + 1)));

    F_val = [F_val (1 - fcdf(F_val, k, (n - (k + 1))))];

else

    SSR_residuals = norm(Y - Y_hat)^2;
    TSS = norm(Y)^2;                % Total, un-corrected sum of squares.

    F_val = ((TSS - SSR_residuals) / k) / (SSR_residuals / (n - k));

    F_val = [F_val (1 - fcdf(F_val, k, (n - k)))];

end
```
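The quantities returned above can be cross-checked by hand for the simplest case. A sketch in pure Python (made-up data, not part of this package) works out Coefficients, S_err, R_sq, R_sq_ad, and F_val for one predictor (k = 1) with a y-intercept (INTCPT = 1), where the normal equations reduce to closed-form formulas:

```python
# Sketch (pure Python, made-up data): mregress outputs for simple linear
# regression with an intercept, via the k = 1 closed-form solution.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 4.0, 6.2, 7.9, 10.1]
n, k = len(x), 1

mean_x = sum(x) / n
mean_y = sum(y) / n

# Slope and intercept (normal equations for one predictor)
b1 = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) \
     / sum((xi - mean_x) ** 2 for xi in x)
b0 = mean_y - b1 * mean_x

y_hat = [b0 + b1 * xi for xi in x]                  # fitted values
residuals = [yi - yh for yi, yh in zip(y, y_hat)]

sse = sum(r ** 2 for r in residuals)                # residual SS
rss = sum((yh - mean_y) ** 2 for yh in y_hat)       # regression SS
tss = sum((yi - mean_y) ** 2 for yi in y)           # total SS (about the mean)

S_err = (sse / (n - k - 1)) ** 0.5                  # standard error of estimate
R_sq = rss / tss
R_sq_ad = 1 - ((n - 1) / (n - (k + 1))) * (1 - R_sq)
F_val = ((tss - sse) / k) / (sse / (n - (k + 1)))   # F for the regression

print(b0, b1, R_sq, R_sq_ad, F_val)
```

For k > 1 the same quantities come from the matrix form in the listing (Coefficients = inv(X'X) X'Y); only the coefficient formula changes, not the sums of squares.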

Generated on Tue 10-May-2016 16:37:45 by m2html © 2005