Friday, January 23, 2009

trad. statistics [b-]

[- forecasting methods]
b-3) PARTIAL LEAST SQUARES REGRESSION
wikipedia.org - "in statistics, the method of partial least squares regression (PLS-regression) bears some relation to principal component analysis; instead of finding the hyperplanes of minimum variance, it finds a linear model describing some predicted variables in terms of other observable variables"
this method is recommended when standard regressions become unstable [i.e., more predictors than observations, or multicollinearity among the predictors]
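To make the description above a bit more concrete, here is a minimal sketch of single-response PLS (PLS1) computed with NIPALS, one common way of fitting it, in plain Python with no libraries. The function names `pls1_fit`/`pls1_predict` and the toy data are hypothetical, chosen only for illustration; real work would rely on a tested library implementation.

```python
from math import sqrt
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]
# Model = (x_means, y_mean, weights, x_loadings, y_loadings)
Model = Tuple[Vector, float, List[Vector], List[Vector], List[float]]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def pls1_fit(X: Matrix, y: Vector, n_components: int) -> Model:
    """Single-response PLS regression via NIPALS (illustrative sketch)."""
    n, m = len(X), len(X[0])
    x_means = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    # Centred copies of X and y; both are deflated after every component.
    E = [[v - mu for v, mu in zip(row, x_means)] for row in X]
    f = [v - y_mean for v in y]
    weights, x_loadings, y_loadings = [], [], []
    for _ in range(n_components):
        # Weight vector: direction in predictor space most covariant with y.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(m)]
        norm = sqrt(dot(w, w))
        if norm < 1e-12:  # y is already fully explained
            break
        w = [v / norm for v in w]
        t = [dot(row, w) for row in E]  # scores = latent variable values
        tt = dot(t, t)
        p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(m)]
        q = dot(f, t) / tt
        # Remove what this latent variable explains from X and y.
        E = [[e - ti * pj for e, pj in zip(row, p)] for row, ti in zip(E, t)]
        f = [fi - q * ti for fi, ti in zip(f, t)]
        weights.append(w)
        x_loadings.append(p)
        y_loadings.append(q)
    return x_means, y_mean, weights, x_loadings, y_loadings


def pls1_predict(model: Model, x: Vector) -> float:
    x_means, y_mean, weights, x_loadings, y_loadings = model
    e = [v - mu for v, mu in zip(x, x_means)]
    y_hat = y_mean
    for w, p, q in zip(weights, x_loadings, y_loadings):
        t = dot(e, w)  # score of the new sample on this component
        e = [ei - t * pj for ei, pj in zip(e, p)]
        y_hat += q * t
    return y_hat


# Collinear predictors (second column = 2 * first): the OLS normal equations
# are singular here, but a one-component PLS model still fits.
X = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
y = [3.0, 6.0, 9.0, 12.0]  # y = x1 + x2
model = pls1_fit(X, y, n_components=1)
print(round(pls1_predict(model, [5.0, 10.0]), 6))  # 15.0
```

Each loop iteration extracts one latent variable (the scores `t`); predictions are the sum of the contributions of those components, which is why PLS copes with the collinear toy data where plain least squares cannot.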

drawbacks
* partial solutions - outputs expressed through mathematical constructs [latent variables] rather than directly applicable ones [equations]
* no predictive capabilities - qualitative results [the most influential predictor on the measurements, interdependence among predictors, etc.] rather than quantitative ones [equations]


trendingBot point of view

as soon as the number of independent variables exceeds a certain level [the dimensional limits of regression methods will be the subject of future posts; 3D (2 independent vs. 1 dependent variable) can be taken as a good estimate], standard regression methods are no longer reliable enough, and traditional statistics has preferred to settle for rough estimation methods rather than trying a different approach to the problem
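One standard linear algebra fact (not stated in the post) helps explain why standard regression breaks down: ordinary least squares solves the normal equations (X^T X) b = X^T y, and when there are more predictors than observations, or when predictors are exactly collinear, X^T X is singular, so there is no unique solution. The sketch below uses exact rational arithmetic from Python's `fractions` module to show the determinant of X^T X dropping to zero in both cases; the helper names and the toy matrices are hypothetical, and the intercept column is omitted for simplicity.

```python
from fractions import Fraction
from typing import List, Sequence

Matrix = List[List[Fraction]]


def gram(X: Sequence[Sequence[int]]) -> Matrix:
    """X^T X, the matrix ordinary least squares has to invert."""
    m = len(X[0])
    return [[sum((Fraction(row[i]) * row[j] for row in X), Fraction(0))
             for j in range(m)] for i in range(m)]


def determinant(A: Matrix) -> Fraction:
    """Exact determinant by Gaussian elimination over rationals."""
    M = [[Fraction(v) for v in row] for row in A]
    n = len(M)
    det = Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if M[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)  # no pivot in this column: singular
        if pivot != c:
            M[c], M[pivot] = M[pivot], M[c]
            det = -det  # a row swap flips the sign
        det *= M[c][c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            M[r] = [a - factor * b for a, b in zip(M[r], M[c])]
    return det


wide = [[1, 2, 0, 1, 3], [0, 1, 4, 2, 1], [2, 0, 1, 1, 1]]  # 3 observations, 5 predictors
collinear = [[1, 2], [2, 4], [3, 6], [4, 8]]                 # column 2 = 2 * column 1
healthy = [[1, 0], [0, 1], [1, 1], [2, 3]]                   # independent columns

print(determinant(gram(wide)))       # 0
print(determinant(gram(collinear)))  # 0
print(determinant(gram(healthy)))    # 17
```

Exact fractions are used so that "singular" shows up as a clean zero instead of a tiny floating-point residue.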

lame example
40 values for 5 independent variables [X_a, X_b, X_c, X_d, X_e], affecting a dependent one [Y_1] (and two further dependent variables [Y_2, Y_3] considered later)

1. PLS regression [PLS path modelling]
- X_c is the most influential variable on Y_1
- all the variables, except X_a & X_e, are positively correlated with Y_1
- from Y_1, Y_2 and Y_3, it can be stated that every fluctuation in X_b is compensated by the addition of X_a & X_c (is this evolution across the different phenomena [Y] of any interest?)
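The PLS path modelling output itself is not reproduced here, but the simpler claims in the list above (the sign of each predictor's association with Y_1, and which predictor dominates) can be cross-checked with plain Pearson correlations. Note that this is a swapped-in, simpler technique, not what PLS computes: a marginal correlation ignores the other predictors. Function names and data below are hypothetical.

```python
from math import sqrt
from typing import Dict, List, Tuple


def pearson(a: List[float], b: List[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def correlation_report(columns: Dict[str, List[float]],
                       y: List[float]) -> List[Tuple[str, float]]:
    """(name, r) pairs sorted by |r|, strongest association first."""
    scores = [(name, pearson(values, y)) for name, values in columns.items()]
    return sorted(scores, key=lambda item: abs(item[1]), reverse=True)


# Made-up toy data, NOT the 40 values of the example above.
columns = {
    "X_a": [1.0, 2.0, 3.0, 4.0],
    "X_b": [1.0, 3.0, 2.0, 4.0],
    "X_e": [4.0, 3.0, 2.0, 1.0],
}
Y_1 = [1.0, 2.0, 3.0, 4.0]
for name, r in correlation_report(columns, Y_1):
    print(f"{name}: r = {r:+.2f}")
```

With this toy data X_b ranks last with r = +0.80, while X_e shows a perfect negative correlation; on real data, a predictor's marginal sign can differ from its role in a multivariate model.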

2. trendingBot
NOTE: best trends = those showing the lowest mean error when applied to the original data
Y_1 = X_a^0.42 + 5.21*X_c - X_e   (exp. error = 5%)
Y_2 = X_c^-1.3*X_c - X_a   (exp. error = 3.6%)
Y_3 = X_c - X_e/2   (exp. error = 8.1%)
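The selection rule in the NOTE can be sketched as follows, assuming "mean error" means the mean absolute percentage error (the post does not define it). trendingBot's own search for candidate equations is not described here, so this only shows the scoring and selection step: each candidate formula is evaluated on the original rows and the lowest mean error wins. The names `mean_pct_error`, `best_trend`, `naive_linear` and the toy rows are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, float]
Formula = Callable[[Row], float]


def mean_pct_error(formula: Formula, rows: List[Row], target: str) -> float:
    """Mean absolute percentage error of a formula over the data.

    Assumes the target value is never zero in `rows`.
    """
    errors = [abs(formula(r) - r[target]) / abs(r[target]) for r in rows]
    return 100.0 * sum(errors) / len(errors)


def best_trend(candidates: Dict[str, Formula], rows: List[Row],
               target: str) -> Tuple[str, float]:
    """Pick the candidate with the lowest mean error on the original data."""
    scores = {name: mean_pct_error(f, rows, target) for name, f in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores[best]


def trend_y1(r: Row) -> float:
    # The Y_1 formula quoted in the post.
    return r["X_a"] ** 0.42 + 5.21 * r["X_c"] - r["X_e"]


# Hypothetical toy rows generated from the Y_1 formula itself (not real data).
rows = [{"X_a": a, "X_c": c, "X_e": e}
        for a, c, e in [(1.0, 2.0, 1.0), (2.0, 3.0, 0.5), (4.0, 1.5, 2.0)]]
for r in rows:
    r["Y_1"] = trend_y1(r)

candidates = {
    "trend_y1": trend_y1,
    "naive_linear": lambda r: r["X_a"] + r["X_c"] + r["X_e"],  # hypothetical rival
}
print(best_trend(candidates, rows, "Y_1"))  # ('trend_y1', 0.0)
```

Because the toy rows are generated from the Y_1 formula, its error is exactly zero; on real measurements every candidate would carry some error, and the ranking is what matters.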