Tuesday, 21 April 2009

pause in updates

visit our webpage to learn more about
* the complete review of existing methods
* trendingBot's theoretical basis

Wednesday, 28 January 2009

trad. statistics [b-]

[- forecasting methods]
b-5) PROBABILITY-RELATED METHODS (randomness)

relevant methods to be included in this group are
- (generalised) method of moments
- Bayesian methods
- predictive modelling
- method of instrumental variables (IV)
- 2SLS/3SLS
- seemingly unrelated regression
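As a hedged illustration of the Bayesian methods listed above (the numbers are invented for the example, not taken from any real study), a Beta-Binomial update shows how a prior belief about an event's probability is revised with observed data:

```python
# Beta-Binomial updating: an illustrative sketch, not trendingBot code.
alpha, beta = 1, 1             # Beta(1, 1) = uniform prior over the probability p
heads, tails = 7, 3            # hypothetical observed outcomes
alpha += heads                 # conjugate update: successes raise alpha
beta += tails                  # failures raise beta
posterior_mean = alpha / (alpha + beta)
print(round(posterior_mean, 3))  # 0.667 - the updated estimate of p
```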

wikipedia.com - "probability is the likelihood or chance that something is the case or that an event will occur"
thus, these methods do not predict future behaviours on the basis of past ones [effects on the dependent variable from variations in the independent one(s)], but rather the probability that an event [= invariant phenomenon = not describable as a result of the interaction between independent/dependent variables] will occur


trendingBot point of view

trend finding [trendingBot]
- predict definable & more-or-less-certain behaviours
- indicated whenever some kind of repetitive character is present

probability
- gives some certainty to randomness
- indicated for undefinable and/or random behaviours


both approaches are applicable to different fields - trendingBot cannot deal with random behaviours - probability cannot understand/describe phenomena, just estimate how likely they are to happen under specific conditions

thus, from the point of view of the current classification, probability should not be included among the forecasting methods, nor should any method defined as such deal with randomness
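The opposition above can be sketched in code: a probability method only estimates how likely an event is under given conditions, it never yields an equation for any individual outcome. A minimal Monte Carlo sketch (with invented numbers):

```python
import random

# Estimate P(heads) of a fair coin by simulation; the result says nothing
# about any single flip - that is the random behaviour trendingBot avoids.
random.seed(0)                 # fixed seed so the run is reproducible
flips = [random.random() < 0.5 for _ in range(10_000)]
p_heads = sum(flips) / len(flips)
print(round(p_heads, 2))       # close to 0.5
```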

Saturday, 24 January 2009

trad. statistics [b-]

[- forecasting methods]
b-4) PRINCIPAL COMPONENT ANALYSIS
wikipedia.com - "principal component analysis (PCA) involves a mathematical procedure that transforms a number of possibly correlated variables into a smaller number of uncorrelated variables called principal components"

its basic structure, equivalent to the one from PLS regressions, consists of two matrices, X [independent variables] & Y [dependent variable(s)] - the differences between both methods are a consequence of the mathematical models used to relate these matrices
linear model -> PLS regression
hyperplanes of minimum variance -> principal component analysis

the aforementioned distinction is not relevant for the present study and, thus, the PLS regression post is descriptive enough
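A minimal numerical sketch of the quoted definition (synthetic data, illustrative only): two correlated variables are rotated into uncorrelated principal components via the eigendecomposition of their covariance matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = 0.9 * x1 + 0.1 * rng.normal(size=200)          # strongly correlated with x1
X = np.column_stack([x1, x2])
Xc = X - X.mean(axis=0)                             # centre the data
cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)              # eigenvalues in ascending order
components = eigvecs[:, np.argsort(eigvals)[::-1]]  # principal directions
scores = Xc @ components                            # the uncorrelated new variables
corr = float(np.corrcoef(scores, rowvar=False)[0, 1])
print(round(abs(corr), 6))                          # 0.0 - components are uncorrelated
```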

Friday, 23 January 2009

trad. statistics [b-]

[- forecasting methods]
b-3) PARTIAL LEAST SQUARES REGRESSION
wikipedia.com - "in statistics, the method of partial least squares regression (PLS-regression) bears some relation to principal component analysis; instead of finding the hyperplanes of minimum variance, it finds a linear model describing some predicted variables in terms of other observable variables"
this method is recommended for cases where standard regressions show certain instability [e.g., more predictors than observations or multicollinearity among predictors]

drawbacks
* partial solutions - mathematical-expressions-based [latent variables] outputs, rather than directly applicable ones [equations]
* no predictive capabilities - qualitative results [most influential predictor over measurements, interdependence among predictors, etc.], instead of quantitative ones [equations]
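A hedged sketch of the first PLS component (NIPALS-style PLS1, on synthetic data with invented coefficients): the latent variable t is the direction of X most covariant with y, usable even when predictors are many or collinear.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 10))                  # 30 observations, 10 predictors
y = 3.0 * X[:, 2] + 0.1 * rng.normal(size=30)  # only the 3rd predictor matters here
Xc, yc = X - X.mean(axis=0), y - y.mean()
w = Xc.T @ yc                                  # weights ~ cov(X_j, y)
w /= np.linalg.norm(w)
t = Xc @ w                                     # first latent variable (score)
b = (t @ yc) / (t @ t)                         # regress y on the latent score
y_hat = y.mean() + b * t
print(round(float(np.corrcoef(y_hat, y)[0, 1]), 2))  # high in-sample correlation
```

Note the drawback mentioned above: the output is a latent variable, not a directly applicable equation in the original predictors.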


trendingBot point of view

as soon as the number of independent variables goes beyond some level [dimensional restrictions for regression methods will be the matter of future posts - 3D (2 independent vs. 1 dependent) can be taken as a good estimate], standard regression methods are not reliable enough, and traditional statistics has preferred to consider roughly estimating methods rather than trying a different approach to the problem

lame example
40 values for 5 independent variables [X_a, X_b, X_c, X_d, X_e], affecting a dependent one [Y_1] (and, eventually, two further dependent variables)

1. PLS regression [PLS path modelling]
- X_c is the most influential variable over Y_1
- all the variables, except X_a & X_e, are positively correlated with Y_1
- from Y_1, Y_2 and Y_3, it can be stated that every fluctuation in X_b is compensated by the addition of X_a & X_c (evolution (among different phenomena (Y)) of any interest?)

2. trendingBot
NOTE: best trends = those showing the lowest mean error after being applied to the original data
Y_1 = X_a^0.42+5.21*X_c-X_e - exp. error = 5%
Y_2 = X_c^-1.3*X_c-X_a - exp. error = 3.6%
Y_3 = X_c-X_e/2 - exp. error = 8.1%
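As a hedged sketch of the scoring criterion in the NOTE above (the data and the candidate trend are invented; this is not trendingBot's actual algorithm), the mean relative error of a candidate equation against the original data could be computed as:

```python
import numpy as np

# Synthetic data for 3 of the 5 predictors, 40 values each (illustrative ranges).
rng = np.random.default_rng(2)
X_a = rng.uniform(1, 10, 40)
X_c = rng.uniform(1, 10, 40)
X_e = rng.uniform(1, 5, 40)
Y = X_a**0.42 + 5.21 * X_c - X_e + rng.normal(0, 0.3, 40)  # noisy observations
Y_hat = X_a**0.42 + 5.21 * X_c - X_e                       # candidate trend
mean_rel_error = float(np.mean(np.abs((Y - Y_hat) / Y)))
print(f"{100 * mean_rel_error:.1f}%")          # the lower, the better the trend
```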

Saturday, 10 January 2009

trad. statistics [b-]

[- forecasting methods]
b-2) EXTRAPOLATION METHODS
wikipedia.com - "in mathematics, extrapolation is the process of constructing new data points outside a discrete set of known data points"
although there are no essential differences between extrapolation & interpolation methods, the expected accuracy of their results does differ quite appreciably; this fact and the main intention underlying the present classification [highlighting the opposition probable/predictable vs. random/unpredictable] are the only reasons for this specific subtype (outside the regression methods)

nobody doubts that an increase in uncertainty is the immediate consequence of any extrapolating process; however, the logical attitude resulting from this idea [not extrapolating] seems not so clear - or, at least, this is what anyone could conclude after noticing the wide variety of existing extrapolation methods
- linear extrapolation
- polynomial extrapolation
- conic extrapolation
- french curve extrapolation
and even methods specifically developed for numerical computing
- Richardson extrapolation
- Aitken extrapolation
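Among the listed methods, Richardson extrapolation is easy to sketch: two central-difference derivative estimates (steps h and h/2) are combined to cancel the leading O(h²) error term. A minimal, hedged example:

```python
import math

def central_diff(f, x, h):
    # O(h^2) central-difference estimate of f'(x)
    return (f(x + h) - f(x - h)) / (2 * h)

def richardson(f, x, h):
    # combine the h and h/2 estimates to cancel the O(h^2) term
    return (4 * central_diff(f, x, h / 2) - central_diff(f, x, h)) / 3

h, exact = 0.1, math.cos(1.0)                    # d/dx sin(x) = cos(x)
plain = central_diff(math.sin, 1.0, h)
better = richardson(math.sin, 1.0, h)
print(abs(better - exact) < abs(plain - exact))  # True: extrapolated estimate wins
```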


trendingBot point of view

extrapolating has to be considered the last resort and, in any case, clearly differentiated from interpolating

a lame example
raw data - X (independent) ∈ [5,10] and Y (dependent) ∈ [10,20]
* Y values, for any X within the aforementioned range, may be predicted - 7.5 -> 15
* on the other hand, Y values, for X outside it, may only be roughly estimated - 15 -> 30
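The example above can be sketched with the simplest possible trend, Y = 2·X (an assumption made only for illustration): the formula is trustworthy inside the observed X range and merely a rough estimate outside it.

```python
def trend(x):
    # fitted (by assumption) on X in [5, 10], where Y in [10, 20] was observed
    return 2 * x

print(trend(7.5))   # 15.0 - interpolation: inside the observed range
print(trend(15))    # 30   - extrapolation: uncontrollable uncertainty
```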

thus, predicting implies uncertainty but, usually, a more or less controllable one [a sensible set of minimum conditions has to be established in order to guarantee the predictive character] - (roughly) estimating implies uncontrollable uncertainty, hence it should be used just as a preliminary idea and its results should never be called "predictions"

Wednesday, 7 January 2009

trad. statistics [b-]

[- forecasting methods]
b-1) TIME SERIES ANALYSIS
wikipedia.com - "in statistics, signal processing, and many other fields, a time series is a sequence of data points, measured typically at successive times, spaced at (often uniform) time intervals. Time series analysis comprises methods that attempt to understand such time series"
there are many models specifically designed to make the most of time series, that is, to understand the implicit behaviour and, hence, to predict future events on the basis of this information

main classification
1. linear dependence [~ linear regressions]
three main types
- autoregressive (AR) models
- integrated (I) models
- moving average (MA) models
on top of these, there are two combinations [autoregressive moving average (ARMA) models & autoregressive integrated moving average (ARIMA) models] & one generalisation [autoregressive fractionally integrated moving average (ARFIMA) models] of them

2. non-linear dependence or autoregressive conditional heteroskedasticity models
[~ non-linear regressions]
- generalised autoregressive conditional heteroskedasticity [GARCH] models
- threshold autoregressive conditional heteroskedasticity [TARCH] models
- exponential generalised autoregressive conditional heteroskedasticity [EGARCH] models
...

all these models have two characteristics in common
a. account for just two variables [dependent vs. independent]
b. try to understand stochastic [= random] processes
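A minimal sketch of the simplest model listed above, an AR(1) process x_t = φ·x_{t-1} + ε_t (the value φ = 0.8 is invented for the example): a single variable is explained by its own immediate past plus random noise, matching both common characteristics.

```python
import random

random.seed(3)
phi = 0.8                        # illustrative autoregressive coefficient
x = [0.0]
for _ in range(500):
    x.append(phi * x[-1] + random.gauss(0, 1))   # x_t = phi*x_{t-1} + noise

# crude least-squares recovery of phi from the (x_{t-1}, x_t) pairs
num = sum(prev * curr for prev, curr in zip(x[:-1], x[1:]))
den = sum(prev * prev for prev in x[:-1])
phi_hat = num / den
print(round(phi_hat, 2))         # close to 0.8
```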


trendingBot point of view

a. why not apply conventional regression methods?
statistics' answer -> the random essence has to be accounted for (?)

b. stochastic/random ~ impossible to predict - ... then?
b.2.- a weighted (based on sensible assumptions) regression method shouldn't be defined as stochastic, if the weightages are applied on a regular and consistent basis
b.3.- probably, the randomness might be removed if a more adequate set of variables were chosen

CONCLUSION 1 time series analysis methods can be defined as extensions of conventional regression methods to stochastic behaviours
CONCLUSION 2 trendingBot's result for any (stochastic) time series = "trend not found"

Wednesday, 31 December 2008

trad. statistics [a-]

[- regression methods]

PROBLEM [data analysis] - what is the best way to make the most of any type of information? to understand any behaviour?
- the ideal result would be a doubtless answer, that is, a mathematical answer -

regression methods - determine the influence of the independent variables [or predictors] on the dependent one [or measurement]
advantage
- acceptable accuracy and "relatively controllable uncertainty" (no arbitrary user intervention)
disadvantage
- maximum number of independent variables = practically limited

SOLUTION?
1. virtually any information can be adapted to the aforementioned duality [independent/dependent variables]; however, this is a problem common to all forecasting methods
NOTE: any data suitable for prediction, that is, not random
2. ideas to bear in mind
* NEVER extrapolate
* predictive character only under certain conditions [i.e., minimum repetitions & goodness of the fit]
* weightages, user defined parameters, etc. only under extreme circumstances => "no trend" has to be as valid as any numerical result
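The last idea can be sketched directly (the 0.9 threshold and the helper name are illustrative choices, not trendingBot parameters): a fitted trend is reported only when its goodness of fit clears a minimum bar, otherwise "no trend" is returned as a perfectly valid result.

```python
import numpy as np

def fit_or_no_trend(x, y, r2_min=0.9):
    # hypothetical helper: accept a linear trend only if R^2 >= r2_min
    slope, intercept = np.polyfit(x, y, 1)
    y_hat = slope * x + intercept
    r2 = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
    return (slope, intercept) if r2 >= r2_min else "no trend"

rng = np.random.default_rng(4)
x = np.linspace(0, 10, 50)
good = fit_or_no_trend(x, 3 * x + 1 + 0.1 * rng.normal(size=50))  # clear trend
bad = fit_or_no_trend(x, rng.normal(size=50))   # pure noise, no relation to x
print(bad)                                      # no trend
```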

but... no behaviour [at least, none worth predicting] can be described by attending to a low number of variables
NOTE: these limitations regarding the number of independent variables will be addressed in future posts