
mregress

PURPOSE

MREGRESS Performs multiple linear regression of Y (dependent) on X (independent).

SYNOPSIS

function [Coefficients, S_err, XTXI, R_sq, R_sq_ad, F_val, Coef_stats, Y_hat, residuals, covariance] = mregress(Y, X, INTCPT)

DESCRIPTION

 MREGRESS  Performs multiple linear regression of Y (dependent) on X (independent).

           Usage:

           [Coefficients, S_err, XTXI, R_sq, R_sq_ad, F_val, Coef_stats, Y_hat, residuals, covariance] ... 
                                                     = mregress(Y, X, INTCPT)

           INTCPT = 1; include a y-intercept in the model
           INTCPT = 0; DO NOT include a y-intercept in the model

           Returns:

           Coefficients         - Regression coefficients
           S_err                - Standard error of the estimate of Y (derived
                                  from the residuals)
           XTXI                 - Inverse of X' * X
           R_sq                 - R-squared
           R_sq_ad              - Adjusted R-squared (a.k.a. the adjusted
                                  coefficient of determination; Zar,
                                  1999, p. 423).  Adjusts R-squared based
                                  on the number of predictors (i.e., it
                                  shrinks R_sq as the number of predictors
                                  increases)
           F_val                - F-value for the regression and its significance level (p-value for F)
           Coef_stats           - Coefficients with their standard deviations, t-values, and p-values
           Y_hat                - Fitted values
           residuals            - Residuals
           covariance           - Covariance matrix ( XTXI * S_err^2 )
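
           Example (a minimal sketch; the data here are synthetic and the
           variable names are illustrative only -- note that the p-values
           require tcdf/fcdf from the Statistics Toolbox):

              n = 100;
              X = randn(n, 2);                             % two predictors
              Y = 3 + 2*X(:,1) - 0.5*X(:,2) + randn(n,1);  % known coefficients plus noise
              [b, S_err, XTXI, R_sq] = mregress(Y, X, 1);  % fit with a y-intercept
              % b(1) is the y-intercept; b(2) and b(3) are the slopes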


 G. Anthony Reina
 Motor Control Lab
 The Neurosciences Institute
 Created: 4 Aug 1998
 Last Update: 10/8/1998 by GAR

 Please note that when the model has no intercept, the definitions of R-squared
 and the F-statistic change mathematically. For a linear model containing the
 y-intercept, R-squared refers to the amount of variance around the mean of the
 dependent variable (y) which is explained by the variance around the mean of
 the independent variables (x). For a linear model NOT containing the
 y-intercept, R-squared measures the amount of variance around ZERO of the
 dependent variable which is explained by the variance around ZERO of the
 independent variables. If the mean-corrected equation for R-squared is applied
 to a model without a y-intercept (namely R-squared = 1 - [sum of squared
 residuals] / [total corrected sum of squares]), then R-squared may be NEGATIVE
 for some data. For this reason, this function calculates R-squared for the
 no-intercept case using the total uncorrected sum of squares. This approach
 avoids negative R-squared values but may lack a meaningful "goodness-of-fit"
 interpretation for the model. Some texts suggest that a more useful approach
 is to always include the y-intercept in the model. However, as with all
 statistical analyses, it is prudent to know what your data look like and what
 your statistical tools actually measure in order to generate a useful
 analysis.
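
 As a concrete illustration (a small made-up example, not from the original
 author): forcing a line through the origin on data that sit far from the
 origin drives the mean-corrected R-squared strongly negative, while the
 uncorrected form used by this function stays between 0 and 1:

    y = [10; 11; 12];
    x = [-1;  0;  1];
    b = (x' * x) \ (x' * y);     % no-intercept least squares; b = 1 here
    y_hat = x * b;
    1 - norm(y - y_hat)^2 / norm(y - mean(y))^2   % mean-corrected: -180.5
    norm(y_hat)^2 / norm(y)^2                     % uncorrected: ~0.0055, as mregress(y, x, 0) reports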

 For further reading on regression through the origin (i.e. without a y-intercept),
 please refer to:

    Neter J, Kutner MH, Nachtsheim CJ, and Wasserman W (1996). Applied Linear
             Statistical Models, 4th ed. Boston: Irwin, pp. 159-163.

    Myers R (1986). Classical and Modern Regression with Applications.
             Boston: Duxbury Press, p. 30.

    Zar JH (1999). Biostatistical Analysis, 4th ed. Upper Saddle River,
             NJ: Prentice Hall.

CROSS-REFERENCE INFORMATION

This function calls:
This function is called by:

SOURCE CODE

function [Coefficients, S_err, XTXI, R_sq, R_sq_ad, F_val, Coef_stats, Y_hat, residuals, covariance] ...
          = mregress(Y, X, INTCPT)

% MREGRESS  Performs multiple linear regression of Y (dependent) on X (independent).
%
%           Usage:
%
%           [Coefficients, S_err, XTXI, R_sq, R_sq_ad, F_val, Coef_stats, Y_hat, residuals, covariance] ...
%                                                     = mregress(Y, X, INTCPT)
%
%           INTCPT = 1; include a y-intercept in the model
%           INTCPT = 0; DO NOT include a y-intercept in the model
%
%           Returns:
%
%           Coefficients         - Regression coefficients
%           S_err                - Standard error of the estimate of Y (derived
%                                  from the residuals)
%           XTXI                 - Inverse of X' * X
%           R_sq                 - R-squared
%           R_sq_ad              - Adjusted R-squared (a.k.a. the adjusted
%                                  coefficient of determination; Zar,
%                                  1999, p. 423).  Adjusts R-squared based
%                                  on the number of predictors (i.e., it
%                                  shrinks R_sq as the number of predictors
%                                  increases)
%           F_val                - F-value for the regression and its significance level (p-value for F)
%           Coef_stats           - Coefficients with their standard deviations, t-values, and p-values
%           Y_hat                - Fitted values
%           residuals            - Residuals
%           covariance           - Covariance matrix ( XTXI * S_err^2 )
%
%
% G. Anthony Reina
% Motor Control Lab
% The Neurosciences Institute
% Created: 4 Aug 1998
% Last Update: 10/8/1998 by GAR
%
% Please note that when the model has no intercept, the definitions of R-squared
% and the F-statistic change mathematically. For a linear model containing the
% y-intercept, R-squared refers to the amount of variance around the mean of the
% dependent variable (y) which is explained by the variance around the mean of
% the independent variables (x). For a linear model NOT containing the
% y-intercept, R-squared measures the amount of variance around ZERO of the
% dependent variable which is explained by the variance around ZERO of the
% independent variables. If the mean-corrected equation for R-squared is applied
% to a model without a y-intercept (namely R-squared = 1 - [sum of squared
% residuals] / [total corrected sum of squares]), then R-squared may be NEGATIVE
% for some data. For this reason, this function calculates R-squared for the
% no-intercept case using the total uncorrected sum of squares. This approach
% avoids negative R-squared values but may lack a meaningful "goodness-of-fit"
% interpretation for the model. Some texts suggest that a more useful approach
% is to always include the y-intercept in the model. However, as with all
% statistical analyses, it is prudent to know what your data look like and what
% your statistical tools actually measure in order to generate a useful
% analysis.
%
% For further reading on regression through the origin (i.e. without a y-intercept),
% please refer to:
%
%    Neter J, Kutner MH, Nachtsheim CJ, and Wasserman W (1996). Applied Linear
%             Statistical Models, 4th ed. Boston: Irwin, pp. 159-163.
%
%    Myers R (1986). Classical and Modern Regression with Applications.
%             Boston: Duxbury Press, p. 30.
%
%    Zar JH (1999). Biostatistical Analysis, 4th ed. Upper Saddle River,
%             NJ: Prentice Hall.

% Modifications:
% - 5/20/2010: added adjusted R-squared (line of code cut and pasted from the
%   MathWorks File Exchange) and comments.

if (nargin < 2)
   error('mregress requires at least 2 input variables. Type ''help mregress''.');
end

if (nargin == 2)
   INTCPT = 0;
end

% Check that independent (X) and dependent (Y) data have compatible dimensions
% ----------------------------------------------------------------------------
[n_x, k] = size(X);
[n_y, columns] = size(Y);

if n_x ~= n_y
    error('The number of rows in Y must equal the number of rows in X.');
end

if columns ~= 1
    error('Y must be a vector, not a matrix.');
end

n = n_x;

%  Solve for the regression coefficients using ordinary least-squares
%  ------------------------------------------------------------------

   if (INTCPT == 1)
      X = [ ones(n,1) X ];
   end

   XTXI = inv(X' * X);
   Coefficients = XTXI * X' * Y;

%  Calculate the fitted regression values
%  --------------------------------------

   Y_hat = X * Coefficients;

%  Calculate R-squared
%  -------------------
%  The calculations used for R-squared and the F-statistic here are based
%  on the total, uncorrected sum of squares as described by Neter and
%  Myers. Note that the meaning of R-squared changes for the case of
%  regression without a y-intercept. This approach yields the same
%  results as SYSTAT, SAS, SPSS, and BMDP, but differs from Excel,
%  Quattro Pro, and the MATLAB regress.m function for the case of no
%  y-intercept in the model (all packages, including this one, agree for
%  linear regression with a y-intercept). Essentially, it is wise to find
%  a way to keep the y-intercept (even if it is near zero) in the model
%  so that it can be analyzed in a meaningful way that everyone can
%  understand.

if (INTCPT == 1)

   RSS = norm(Y_hat - mean(Y))^2;   % Regression sum of squares.
   TSS = norm(Y - mean(Y))^2;       % Total sum of squares (regression plus residual).
   R_sq = RSS / TSS;                % R-squared statistic.

else

   RSS = norm(Y_hat)^2;             % Regression sum of squares.
   TSS = norm(Y)^2;                 % Total, uncorrected sum of squares.
   R_sq = RSS / TSS;                % R-squared statistic.

end

% Adjusted R-squared: shrinks R_sq as the number of predictors increases.
% (Note that the n-(k+1) degrees of freedom assume an intercept term.)
R_sq_ad = 1 - ((n-1)/(n-(k+1))) * (1 - R_sq);

% $$$ % Alternative calculation of R-squared
% $$$ % ====================================
% $$$ % The following equation is from Judge G, et al. "An Introduction to the
% $$$ % Theory and Practice of Econometrics", New York: Wiley, 1982. It is the
% $$$ % squared (Pearson) correlation coefficient between the predicted and
% $$$ % dependent variables. It is the same equation regardless of whether an
% $$$ % intercept is included in the model; however, it may yield a very low
% $$$ % R-squared for a particularly bad fit.
% $$$ covariance_Y_hat_and_Y = (Y_hat - mean(Y_hat))' * (Y - mean(Y));
% $$$ covariance_Y_hat_and_Y_hat = (Y_hat - mean(Y_hat))' * (Y_hat - mean(Y_hat));
% $$$ covariance_Y_and_Y = (Y - mean(Y))' * (Y - mean(Y));
% $$$ R_sq = (covariance_Y_hat_and_Y / covariance_Y_hat_and_Y_hat) * ...
% $$$        (covariance_Y_hat_and_Y / covariance_Y_and_Y);


%  Calculate residuals and standard error
%  --------------------------------------

   residuals = Y - Y_hat;

   if (INTCPT == 1)
      S_err = sqrt(residuals' * residuals / (n - k - 1));
   else
      S_err = sqrt(residuals' * residuals / (n - k));
   end

%  Calculate the standard deviations and t-values for the regression coefficients
%  -------------------------------------------------------------------------------

   covariance = XTXI * S_err^2;

   C = sqrt(diag(covariance));   % standard deviations of the coefficient estimates

   % (n.b. a two-tailed t-test is performed on each coefficient)
   % ***********************************************************
   if (INTCPT == 1)
      p_value = 2 * (1 - tcdf(abs(Coefficients./C), n - (k + 1)));
   else
      p_value = 2 * (1 - tcdf(abs(Coefficients./C), n - k));
   end

   Coef_stats = [Coefficients, C, (Coefficients./C), p_value];


%  Calculate the F-value for the regression and its significance level
%  --------------------------------------------------------------------

if (INTCPT == 1)

   SSR_residuals = norm(Y - Y_hat)^2;   % Residual (error) sum of squares.
   TSS = norm(Y - mean(Y))^2;           % Total sum of squares (regression plus residual).

   F_val = ((TSS - SSR_residuals) / k) / (SSR_residuals / (n - (k + 1)));
   F_val = [F_val (1 - fcdf(F_val, k, n - (k + 1)))];

else

   SSR_residuals = norm(Y - Y_hat)^2;   % Residual (error) sum of squares.
   TSS = norm(Y)^2;                     % Total, uncorrected sum of squares.

   F_val = ((TSS - SSR_residuals) / k) / (SSR_residuals / (n - k));
   F_val = [F_val (1 - fcdf(F_val, k, n - k))];

end

Generated on Tue 10-May-2016 16:37:45 by m2html © 2005