Which Data Set Should I Use?

nico08

Hi. I am using a retirement calculator to help me assess the amount of money I will need in my investment portfolio in order to FIRE with a certain degree of success.

So I have rates of return and standard deviations for different asset allocations (60% stock/40% bond, etc.). One set of data covers 1964-2013 (50 years) and the other set of data covers 1928-2013 (86 years).
The 50-year data set gives me a much better probability of success than the 86-year data set does.

Do you think one data set is more predictive of future rates of return and standard deviations than the other? Of course I would like to use the 50-year data set, but if it is not the most representative choice, then I will need to go with the 86-year data set.

Thank you for your advice.
 
Normally when doing a study (any study, not specifically one for SWR), you would ideally decide beforehand what data to use, based on your knowledge of how it was collected, its representativeness, structural changes in what you are studying, and the potential for errors. Picking a data set after looking at the results is very, very bad methodology.

Personally, I would lean toward using the longer period as it has more data and is a superset of the other.

Another thing to note is that if you change the inputs slightly but get very different results, then your system is unstable. This means that you can't rely too much on any of the results.
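
If it helps, here is a rough sketch of that kind of sensitivity check. It is not how any particular retirement calculator works, just a minimal Monte Carlo under simple assumptions: annual real returns drawn independently from a normal distribution, a fixed real withdrawal taken at the start of each year, and made-up inputs (the function name, the $1,000,000 / $40,000 / 30-year defaults, and both mean/stdev pairs are hypothetical, not the 1928-2013 or 1964-2013 figures).

```python
import random


def success_rate(mean, stdev, start=1_000_000, spend=40_000, years=30,
                 trials=5_000, seed=0):
    """Fraction of simulated retirements in which every withdrawal is funded.

    Assumes independent, normally distributed annual real returns.
    All defaults are illustrative placeholders.
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    successes = 0
    for _ in range(trials):
        balance = start
        for _ in range(years):
            balance -= spend  # withdraw at the start of the year
            if balance < 0:
                break  # this year's withdrawal could not be fully funded
            # One year's return, floored at -100% so the balance never
            # turns negative through growth alone.
            balance *= 1 + max(-1.0, rng.gauss(mean, stdev))
        else:
            successes += 1  # made it through every year without a shortfall
    return successes / trials


# Hypothetical inputs: nudge the mean return by half a point and compare.
base = success_rate(mean=0.050, stdev=0.12)
nudged = success_rate(mean=0.045, stdev=0.12)
print(f"base: {base:.1%}  nudged: {nudged:.1%}")
```

If a small nudge like that moves the success rate a lot, the answer is being driven mostly by the input assumptions, which is the instability described above.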
 
Hi Photoguy. Yes, I was tending to think that the data set that includes more years would be the better choice. I am thinking that the longer data set would capture more of the events that caused market downturns, so it would likely be the more conservative choice.
 
When I set my investment earnings rate assumption for my deterministic retirement analysis, I used the 1926-current data set and that also seems to me to be the better choice in your situation.
 