MREGRESS Performs multiple linear regression analysis of X (independent) on Y (dependent).

Usage:

  [Coefficients, S_err, XTXI, R_sq, R_sq_ad, F_val, Coef_stats, Y_hat, residuals, covariance] ...
      = mregress(Y, X, INTCPT)

  INTCPT = 1; include a y-intercept in the model
  INTCPT = 0; DO NOT include a y-intercept in the model

Returns:

  Coefficients - Regression coefficients
  S_err        - Standard error of the estimate of Y (derived from the residuals)
  XTXI         - Inverse of X' * X
  R_sq         - R-squared
  R_sq_ad      - Adjusted R-squared (a.k.a. the adjusted coefficient of determination;
                 Zar, 1999, p. 423). Adjusts R-squared for the number of predictors
                 (i.e., it shrinks R_sq as the number of predictors increases)
  F_val        - F-value for the regression and its significance level (p-value for F)
  Coef_stats   - Coefficients with their standard deviations, t-values, and p-values
  Y_hat        - Fitted values
  residuals    - Residuals
  covariance   - Covariance matrix ( XTXI * S_err^2 )

G. Anthony Reina
Motor Control Lab
The Neurosciences Institute
Created: 4 Aug 1998
Last Update: 10/8/1998 by GAR

Please note that when the intercept of the model equals zero, the definitions of R-squared
and the F-statistic change mathematically. For a linear model containing the y-intercept,
R-squared refers to the amount of variance around the mean of the dependent variable (y)
that is explained by the variance around the mean of the independent variables (x). For a
linear model NOT containing the y-intercept, R-squared measures the amount of variance
around ZERO of the dependent variable that is explained by the variance around ZERO of the
independent variables. If the same equation for R-squared is used both with and without a
y-intercept (namely R-squared = [regression sum of squares] / [total sum of squares]),
then R-squared may be NEGATIVE for some data. For this reason, this subroutine calculates
R-squared using the total, uncorrected sum of squares. In effect, this approach avoids
negative R-squared values but may lack a meaningful interpretation of "goodness-of-fit"
for the model. Some texts suggest that a more useful approach is to always include the
y-intercept in the model. However, as with all statistical analyses, it is prudent to know
what your data look like and what your statistical tools actually measure in order to
generate a useful analysis.

For further reading on regression through the origin (i.e., without a y-intercept), please
refer to:

  Neter J, Kutner MH, Nachtsheim CJ, and Wasserman W. "Applied Linear Statistical Models,"
  4th ed. Irwin (Boston, 1996), pp. 159-163.

  Myers R, "Classical and Modern Regression with Applications," Duxbury Press
  (Boston, 1986), p. 30.

  Zar JH (1999). Biostatistical Analysis (4th ed.). Upper Saddle River, NJ: Prentice Hall.
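Example (a minimal usage sketch, not part of mregress.m; the synthetic data and variable
names below are illustrative only, and mregress itself relies on the Statistics Toolbox
functions tcdf and fcdf being on the path):

    % Fit y = 2 + 3*x1 - 1.5*x2 + noise, with a y-intercept in the model.
    rng(0);                                    % reproducible synthetic data
    n = 50;
    X = randn(n, 2);                           % two predictors
    Y = 2 + X * [3; -1.5] + 0.5 * randn(n, 1);

    [b, S_err, XTXI, R_sq, R_sq_ad, F_val, Coef_stats] = mregress(Y, X, 1);

    % b(1) is the intercept and b(2:3) are the slopes; Coef_stats lists each
    % coefficient with its standard deviation, t-value, and p-value.
    disp(Coef_stats);
    fprintf('R^2 = %.3f (adjusted %.3f), F = %.2f, p = %.3g\n', ...
            R_sq, R_sq_ad, F_val(1), F_val(2));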
function [Coefficients, S_err, XTXI, R_sq, R_sq_ad, F_val, Coef_stats, Y_hat, residuals, covariance] ...
    = mregress(Y, X, INTCPT)

% MREGRESS Performs multiple linear regression analysis of X (independent) on Y (dependent).
%
% Usage:
%
%   [Coefficients, S_err, XTXI, R_sq, R_sq_ad, F_val, Coef_stats, Y_hat, residuals, covariance] ...
%       = mregress(Y, X, INTCPT)
%
%   INTCPT = 1; include a y-intercept in the model
%   INTCPT = 0; DO NOT include a y-intercept in the model
%
% Returns:
%
%   Coefficients - Regression coefficients
%   S_err        - Standard error of the estimate of Y (derived from the residuals)
%   XTXI         - Inverse of X' * X
%   R_sq         - R-squared
%   R_sq_ad      - Adjusted R-squared (a.k.a. the adjusted coefficient of
%                  determination; Zar, 1999, p. 423). Adjusts R-squared for
%                  the number of predictors (i.e., it shrinks R_sq as the
%                  number of predictors increases)
%   F_val        - F-value for the regression and its significance level (p-value for F)
%   Coef_stats   - Coefficients with their standard deviations, t-values, and p-values
%   Y_hat        - Fitted values
%   residuals    - Residuals
%   covariance   - Covariance matrix ( XTXI * S_err^2 )
%
%
% G. Anthony Reina
% Motor Control Lab
% The Neurosciences Institute
% Created: 4 Aug 1998
% Last Update: 10/8/1998 by GAR
%
% Please note that when the intercept of the model equals zero, the definitions of
% R-squared and the F-statistic change mathematically. For a linear model containing
% the y-intercept, R-squared refers to the amount of variance around the mean of the
% dependent variable (y) that is explained by the variance around the mean of the
% independent variables (x). For a linear model NOT containing the y-intercept,
% R-squared measures the amount of variance around ZERO of the dependent variable
% that is explained by the variance around ZERO of the independent variables.
% If the same equation for R-squared is used both with and without a y-intercept
% (namely R-squared = [regression sum of squares] / [total sum of squares]),
% then R-squared may be NEGATIVE for some data. For this reason, this subroutine
% calculates R-squared using the total, uncorrected sum of squares. In effect, this
% approach avoids negative R-squared values but may lack a meaningful interpretation
% of "goodness-of-fit" for the model. Some texts suggest that a more useful approach
% is to always include the y-intercept in the model. However, as with all statistical
% analyses, it is prudent to know what your data look like and what your statistical
% tools actually measure in order to generate a useful analysis.
%
% For further reading on regression through the origin (i.e., without a y-intercept),
% please refer to:
%
%   Neter J, Kutner MH, Nachtsheim CJ, and Wasserman W. "Applied Linear Statistical
%   Models," 4th ed. Irwin (Boston, 1996), pp. 159-163.
%
%   Myers R, "Classical and Modern Regression with Applications," Duxbury Press
%   (Boston, 1986), p. 30.
%
%   Zar JH (1999). Biostatistical Analysis (4th ed.). Upper Saddle River, NJ: Prentice Hall.
% Modifications:
%  - 5/20/2010, Added adjusted R-squared (line of code cut and pasted from the
%    MATLAB Central File Exchange) and comments.

if (nargin < 2)
    error('mregress requires at least 2 input variables. Type ''help mregress''.');
end

if (nargin == 2)
    INTCPT = 0;
end

% Check that independent (X) and dependent (Y) data have compatible dimensions
% ----------------------------------------------------------------------------
[n_x, k] = size(X);
[n_y, columns] = size(Y);

if n_x ~= n_y,
    error('The number of rows in Y must equal the number of rows in X.');
end

if columns ~= 1,
    error('Y must be a vector, not a matrix');
end

n = n_x;

% Solve for the regression coefficients using ordinary least-squares
% ------------------------------------------------------------------

if (INTCPT == 1)
    X = [ ones(n,1) X ];
end

XTXI = inv(X' * X);
Coefficients = XTXI * X' * Y;

% Calculate the fitted regression values
% --------------------------------------

Y_hat = X * Coefficients;

% Calculate R-squared
% -------------------
% The calculations used for R-squared and the F-statistic here are based
% on the total, uncorrected sum of squares as described by Neter and
% Myers. Note that the meaning of R-squared changes for the case of
% regression without a y-intercept. This approach will yield the same
% results as SysStat, SAS, SPSS, and BMDP, but will differ from Excel,
% Quattro Pro, and the MATLAB regress.m function (for the case of no
% y-intercept in the model -- all packages, including this one, agree for
% the case of linear regression with a y-intercept). Essentially, it is
% wise to find a way to keep the y-intercept (even if it is near zero) in
% the model to analyze it in a meaningful way that everyone can understand.

if (INTCPT == 1)

    RSS = norm(Y_hat - mean(Y))^2;   % Regression sum of squares.
    TSS = norm(Y - mean(Y))^2;       % Total sum of squares (regression plus residual).
    R_sq = RSS / TSS;                % R-squared statistic.

else

    RSS = norm(Y_hat)^2;             % Regression sum of squares.
    TSS = norm(Y)^2;                 % Total, uncorrected sum of squares.
    R_sq = RSS / TSS;                % R-squared statistic.

end

R_sq_ad = 1 - ((n-1)/(n-(k+1))) * (1 - R_sq);   % Adjusted R-squared; shrinks R_sq as
                                                % the number of predictors increases.

% $$$ % Alternative calculation of R-squared
% $$$ % ====================================
% $$$ % The following equation is from Judge G, et al. "An Introduction to the Theory
% $$$ % and Practice of Econometrics," New York: Wiley, 1982. It is the
% $$$ % squared (Pearson) correlation coefficient between the predicted and
% $$$ % dependent variables. It is the same equation regardless of whether an
% $$$ % intercept is included in the model; however, it may yield a negative
% $$$ % R-squared for a particularly bad fit.
% $$$ covariance_Y_hat_and_Y = (Y_hat - mean(Y_hat))' * (Y - mean(Y));
% $$$ covariance_Y_hat_and_Y_hat = (Y_hat - mean(Y_hat))' * (Y_hat - mean(Y_hat));
% $$$ covariance_Y_and_Y = (Y - mean(Y))' * (Y - mean(Y));
% $$$ R_sq = (covariance_Y_hat_and_Y / covariance_Y_hat_and_Y_hat) * ...
% $$$          (covariance_Y_hat_and_Y / covariance_Y_and_Y);


% Calculate residuals and standard error
% --------------------------------------

residuals = Y - Y_hat;

if (INTCPT == 1)
    S_err = sqrt(residuals' * residuals / (n - k - 1));
else
    S_err = sqrt(residuals' * residuals / (n - k));
end

% Calculate the standard deviation and t-values for the regression coefficients
% ------------------------------------------------------------------------------

covariance = XTXI .* S_err^2;

C = sqrt(diag(covariance, 0));

% (n.b. Need to perform a 2-tailed t-test)
% ****************************************
if (INTCPT == 1)
    p_value = 2 * (1 - tcdf(abs(Coefficients./C), (n - (k + 1))));
else
    p_value = 2 * (1 - tcdf(abs(Coefficients./C), (n - k)));
end

Coef_stats = [ Coefficients, C, (Coefficients./C), p_value ];


% Calculate the F-statistic for the regression and its p-value
% -------------------------------------------------------------

if (INTCPT == 1)

    SSR_residuals = norm(Y - Y_hat)^2;
    TSS = norm(Y - mean(Y))^2;   % Total sum of squares (regression plus residual).

    F_val = (TSS - SSR_residuals) / k / (SSR_residuals / (n - (k + 1)));

    F_val = [F_val (1 - fcdf(F_val, k, (n - (k + 1))))];

else

    SSR_residuals = norm(Y - Y_hat)^2;
    TSS = norm(Y)^2;             % Total, uncorrected sum of squares.

    F_val = (TSS - SSR_residuals) / k / (SSR_residuals / (n - k));

    F_val = [F_val (1 - fcdf(F_val, k, (n - k)))];

end
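The R-squared caveat described in the header can be seen directly with a small experiment.
The sketch below (synthetic data, illustrative only, and not part of mregress.m) fits the
same data with and without a y-intercept; because the no-intercept fit uses the total,
uncorrected sum of squares, its R_sq measures variance explained around zero rather than
around mean(Y), so the two values are not directly comparable.

    % Contrast R-squared with and without an intercept when the true intercept
    % is far from zero.
    rng(1);
    x = (1:25)';                                      % single predictor
    y = 10 + 0.2 * x + randn(25, 1);                  % large true intercept
    [b1, s1, xtxi1, R_with]    = mregress(y, x, 1);   % R^2 around mean(y)
    [b0, s0, xtxi0, R_without] = mregress(y, x, 0);   % R^2 around zero
    fprintf('R^2 with intercept: %.3f, through the origin: %.3f\n', ...
            R_with, R_without);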