...
My comment was directed solely to support the notion that with statistical/predictive models it's very easy to develop one that does exceedingly well on the training data (historical data) but poorly in the future. I was not referencing any specific model or anything that you've proposed.
Parsimony is a term often used by statisticians; it just means using a simple model that isn't too complex and doesn't have too many variables. One reason they prefer simple models is that it helps one avoid models that do well on the historical/training data but not in the future (also called overfitting).
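To make that concrete, here is a small toy sketch (entirely made-up data, nothing from the thread): a straight line and a 15-parameter polynomial are both fit to the same noisy "historical" points, and the over-parameterized fit looks better in-sample but worse on fresh data from the same process.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "historical" (train) and "future" (test) data from the same noisy linear process
x_train = np.linspace(0.0, 1.0, 30)
x_test = np.linspace(0.0, 1.0, 200)
y_train = 2.0 * x_train + rng.normal(0.0, 0.3, x_train.size)
y_test = 2.0 * x_test + rng.normal(0.0, 0.3, x_test.size)

for degree in (1, 15):  # parsimonious straight line vs. over-parameterized polynomial
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree:2d}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")
```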
Using a holdout set means that one takes the historical data and uses only a subset of it to explore and develop the model. So for example, I might develop my model (i.e., tune parameters such as how many days the moving-average window should be) on data from 1920 to 1970 and then test it on 1971 to 2014. If the model is good, it should do well on the data from 1971 to 2014, which it never saw.
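Here is a minimal sketch of those mechanics, with made-up "price" data and a toy above-the-moving-average rule standing in for a real strategy; the only point is that the window length is chosen on the early data and then applied once, untouched, to the later data.

```python
import numpy as np

rng = np.random.default_rng(1)

# Made-up daily "price" series standing in for the 1920-2014 history
prices = 100.0 + np.cumsum(rng.normal(0.02, 1.0, 20_000))
split = int(len(prices) * 0.55)              # roughly "1920-1970" vs. "1971-2014"
train, holdout = prices[:split], prices[split:]

def strategy_score(series, window):
    """Toy rule: sum the next-day price change only on days when the price
    closed above its moving average."""
    ma = np.convolve(series, np.ones(window) / window, mode="valid")
    px = series[window - 1:]
    return np.sum(np.diff(px)[px[:-1] > ma[:-1]])

# "Train": pick the window length that looks best on the early data only
windows = range(10, 301, 10)
best_window = max(windows, key=lambda w: strategy_score(train, w))

# "Test": apply that single choice, untouched, to the holdout period
print("window chosen on 'training' years:", best_window)
print("score on never-seen 'holdout' years:", strategy_score(holdout, best_window))
```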
Both these methods can help mitigate (but not eliminate) the problem of developing models that do well historically but not in the future.
...
I agree with everything you said, and like you, my comments are directed at models in general rather than any specific model.
Just a few additional observations. While the holdout method is reasonable in theory, it is actually impossible for a human to accomplish, for the simple reason that it works only if you apply it exactly once. The problem with us humans is that we like to tinker. If the first model, developed on one subset of the data, is not good enough, we will modify it and run it against the holdout data again, and we will keep doing this until we get good results on both the back-tested and the holdout data. At that point we are just fooling ourselves: we no longer have any holdout data, because we used it to make the model. We have only succeeded in making ourselves more confident in a model that, while great at back-testing, may have no ability to forecast.
Another approach, which can work depending on the type of data, is to generate random series with the same statistical parameters as the original data (mean, standard deviation, autocorrelation, derivatives, etc.), and then apply the same algorithm that was used to build the original model to each of them, producing a model and forecast for each random series. Run this many times with different random series and you can build a distribution of prediction results and see how far out on that curve your real model falls: are you better than 95% of the random models, 99%, something like that?
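A hedged sketch of that idea, reusing the toy moving-average recipe from earlier: the random series here match only the mean and standard deviation of the daily changes (matching autocorrelation, fat tails, etc. would take something fancier, such as an AR model or a block bootstrap), and everything, including the "real" series, is made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

def best_window_score(series, windows=range(10, 201, 10)):
    """The same model-building recipe applied to any series: pick the
    moving-average window that scores best on that series."""
    def score(window):
        ma = np.convolve(series, np.ones(window) / window, mode="valid")
        px = series[window - 1:]
        return np.sum(np.diff(px)[px[:-1] > ma[:-1]])
    return max(score(w) for w in windows)

# Stand-in for the real historical series and its daily changes
real = 100.0 + np.cumsum(rng.normal(0.02, 1.0, 2_000))
changes = np.diff(real)
real_score = best_window_score(real)

# Many random series with the same mean and std of daily changes,
# each run through the identical model-building recipe
sim_scores = []
for _ in range(200):
    fake_changes = rng.normal(changes.mean(), changes.std(), changes.size)
    fake = np.concatenate(([real[0]], real[0] + np.cumsum(fake_changes)))
    sim_scores.append(best_window_score(fake))

beat = np.mean(np.array(sim_scores) < real_score) * 100
print(f"the 'real' model beats {beat:.0f}% of the models built on random series")
```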
But here we have exactly the same problem as with the holdout data. If we don't get a model that beats chance, we will keep tinkering until we do, and if we try a hundred models, then on average we will get one out to the 99% level by luck alone.
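The rough arithmetic behind that, assuming the hundred tries were independent (tinkered variants of one model usually are not, so take this as a rough guide only):

```python
# Chance that at least one of 100 independent tries clears the 99% bar by luck,
# and the expected number of such lucky "99%-level" models
p_at_least_one = 1 - 0.99 ** 100   # about 0.63
expected_lucky = 100 * 0.01        # about 1
print(p_at_least_one, expected_lucky)
```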
So in the end, there is actually no practical way to test a model.