Monte Carlo or Historical in ***** and Firecalc?

What would be a better way to plan for a specific week of the year, a historical analysis of the limited data we have for temps and precipitation for that week, or a random mix of temperatures and precipitation for the year?

Of course, market cycles are not as predictable as the seasons, it's an analogy. But as an analogy, I think it has some value.

You are asking the equivalent of: why do we use models for anything? Why not just look up what happened in the past?

I think looking up what happened in the past is a very good strategy and the first thing one should do. But it is not always sufficient for answering the questions we may have.

For example, models can help us when the underlying system is changing. We couldn't use just the past data to predict temperatures in 20 years because of global warming. The analogy with FIRECALC would be not adjusting the results for the high current equity valuations.

Another thing models are generally better at is predicting extreme outcomes. In the weather analogy, an example would be: what is the probability of setting a record high or low temperature on a given day? This cannot be answered without a model.

FIRECALC cannot predict anything worse than US history. But if you think about it, there is some chance that there will be a worse outcome in our lifetimes. If we have 100 years of return history, what's the chance that next year will be worse? Perhaps a reasonable default guess would be roughly 1%. When we are talking about failure rates of 0-5%, this may be significant.
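A back-of-the-envelope way to see where that roughly-1% figure comes from (my addition; it assumes annual returns are exchangeable, i.e. any ordering of them is equally likely): next year's return is equally likely to land in any of the $n+1$ rank positions relative to the $n$ years already observed, so

$$P(\text{next year is worse than every year observed so far}) = \frac{1}{n+1} = \frac{1}{101} \approx 1\% \qquad (n = 100).$$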
 
I tried Flexible Planner. It gives me 100% using a fixed 5.5% return, no standard deviation. Then I used the Monte Carlo. I used the moderate risk assumptions and it gave me 85%. I tried the other risk categories and got about the same. Much higher than the 67% Monte Carlo result I was getting with *****.

You must have a lot of money :D. I always get better results with Firecalc than with FRP.

For the record, I don't think one is better than another (it seems like there are people who hang their hat on one planner). Since they are all simulations of one kind or another, the more the better for your planning. Historical may be great, but while in the markets the previous year does have some bearing on the following ones (something that MC doesn't do well), there is no perfect correlation either.

So the more simulations that give you passing grades the better you can feel about yourself and the future...but it's still not perfect. The results I have tell me that if I fail to have enough for retirement the world is probably in really bad shape so at least I'll have a lot of company :greetings10:
 
You are asking the equivalent of: why do we use models for anything? Why not just look up what happened in the past?

I can't agree with that - models are very useful for some things. For example, a while back I was having a discussion on the various pros/cons of chilling systems for beer brewing to cool the hot wort to the proper temperature for the yeast. My field is electronics, but I know that thermal properties have exact electrical parallels, so I modeled various systems in SPICE (using resistors for thermal conductivity, voltage for temperature deltas, and capacitors for thermal mass).

So I could model the effects of things w/o building them, or model conditions outside of what I might be able to set up physically.
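For the curious, here's what that electrical-thermal analogy looks like in code rather than SPICE -- a minimal sketch with made-up numbers (the 0.02 K/W thermal resistance and roughly 20 L of wort are placeholders, not anyone's actual chiller):

```python
# Toy RC model of wort chilling: a thermal resistance R (K/W) and thermal
# mass C (J/K) play the roles of an electrical resistor and capacitor,
# and temperature plays the role of voltage. All numbers are illustrative.

def simulate_cooling(t_wort=100.0, t_bath=20.0, r_thermal=0.02, c_thermal=8.4e4,
                     dt=10.0, t_end=3600.0):
    """Euler-step the first-order lag dT/dt = -(T - T_bath) / (R * C)."""
    temps = [t_wort]
    for _ in range(int(t_end / dt)):
        t_wort += -(t_wort - t_bath) / (r_thermal * c_thermal) * dt
        temps.append(t_wort)
    return temps

temps = simulate_cooling()
print(f"Wort temperature after 1 hour: {temps[-1]:.1f} C")  # decays exponentially toward the bath
```

Same first-order RC behavior an electrical simulator would give, just without firing up SPICE.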

That's way different from M-C and random sequences. Maybe a financial model could answer questions like: what happens if we have 10 years of high inflation next? But to me, that's way different than saying that running some random sequences gives us any indication of how much we might expect the future to be worse than the past.

I think looking up what happened in the past is a very good strategy and the first thing one should do. But it is not always sufficient for answering the questions we may have.

For example, models can help us when the underlying system is changing. We couldn't use just the past data to predict temperatures in 20 years because of global warming. The analogy with FIRECALC would be not adjusting the results for the high current equity valuations.

You're right - the historical tools are not looking at any long term biases that might drive the economy one way or the other, but I'm not convinced there is anything like that that we can estimate.

Another thing models are generally better at is predicting extreme outcomes. In the weather analogy, an example would be: what is the probability of setting a record high or low temperature on a given day? This cannot be answered without a model.

I disagree; historical data and probability will give you an idea of that.

Hmmm, OK, maybe I see what you are saying here. In the historical data we don't have anything that exceeds history (obviously!), so it can't estimate the odds of a new record without applying some probability (modelling).

But here is how I would approach that (and now that I think of it, this is my informal 'fudge factor'; we could apply some math to it to make it seem more 'real', but in reality our terms will just be fudge-factors anyhow). So let's take the entire range of results from a historical run, and then pick some StdDev that we like (a fudge-factor), and see how much we might vary from past results. I think that makes more sense than randomizing the data sequences. Thoughts?

-ERD50
 
I can't agree with that - models are very useful for some things. For example, a while back I was having a discussion on the various pros/cons of chilling systems for beer brewing to cool the hot wort to the proper temperature for the yeast. My field is electronics, but I know that thermal properties have exact electrical parallels, so I modeled various systems in SPICE (using resistors for thermal conductivity, voltage for temperature deltas, and capacitors for thermal mass).

...

But here is how I would approach that (and now that I think of it, this is my informal 'fudge factor'; we could apply some math to it to make it seem more 'real', but in reality our terms will just be fudge-factors anyhow). So let's take the entire range of results from a historical run, and then pick some StdDev that we like (a fudge-factor), and see how much we might vary from past results. I think that makes more sense than randomizing the data sequences. Thoughts?

-ERD50

I don't understand how your "fudge-factor" idea would work. There are only about n/m separate (non-overlapping) results (where n is the number of years in the historical time series and m is the retirement period), so for a 40-year retirement you'd only have about 3 separate outcomes. How would you apply your fudge-factor to those outcomes?

A bit of a diversion, but back in the day when I worked in the Valley on analog integrated circuits, we used SPICE *with MC* to model the effect of random process variations on circuit performance.
 
Proposed:
- We would like to use historical data, but the real-world historical base is too short to allow a large number of 40 year runs.
- We would like to keep all historical data from one year together (e.g. so we don't have 13% inflation together with ST interest rates of 1%).
- We would like to preserve the real-world sequences (that is year-to-year continuity) to a large degree, but are willing to compromise this somewhat to achieve a larger number of runs.

Approach: Divide the historical record into many, many 15-year chunks (for example) and run them in various sequences to achieve the desired 40-year run. There would be 2-3 "discontinuities" over each 40-year series, but that would seem preferable to randomness every year, and it opens up a much larger number of 40-year "runs". Yes, there would be cases when the returns were much worse than actually occurred (because 2-3 "bad" periods got concatenated), but the same thing could happen with MC.
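Here's a minimal sketch of that chunk-and-concatenate idea (my illustration only, not any calculator's actual code; the annual returns are placeholders). Keeping 15-year blocks intact preserves most of the year-to-year continuity, with the 2-3 seams per 40-year run mentioned above:

```python
import random

# Placeholder annual real returns standing in for the historical record;
# in a real run each entry would keep that year's return, inflation, etc. together.
historical_returns = [0.08, -0.12, 0.15, 0.03, -0.25, 0.30, 0.07, 0.01] * 18

def block_resample(returns, horizon=40, block_len=15):
    """Build one synthetic `horizon`-year sequence by concatenating
    randomly chosen contiguous `block_len`-year chunks of history."""
    seq = []
    while len(seq) < horizon:
        start = random.randrange(len(returns) - block_len + 1)
        seq.extend(returns[start:start + block_len])
    return seq[:horizon]

def survives(returns, start_balance=1_000_000, spend=40_000):
    """Fixed real withdrawal each year; True if the money lasts."""
    balance = start_balance
    for r in returns:
        balance = (balance - spend) * (1 + r)
        if balance <= 0:
            return False
    return True

runs = 10_000
successes = sum(survives(block_resample(historical_returns)) for _ in range(runs))
print(f"Success rate over {runs} block-resampled sequences: {successes / runs:.1%}")
```

For what it's worth, resampling contiguous blocks like this is the general idea behind what statisticians call a block bootstrap.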

I'm sure this is not a new idea--does it have a name in the stats "biz"?
 
That's way different from M-C and random sequences. Maybe a financial model could answer questions like - what happens if we have 10 years of high inflation next? But to me, that's way different than saying that running some random sequences give us any indication of how much we might expect the future to be worse than the past.

There's a wide variety of MC models. Some are like Fred123's bootstrap, which is essentially just rearranging the existing data, and there are others which have both a deterministic and a random component (e.g. some of the models by Pfau).

But here is how I would approach that (and now that I think of it, this is my informal 'fudge factor'; we could apply some math to it to make it seem more 'real', but in reality our terms will just be fudge-factors anyhow). So let's take the entire range of results from a historical run, and then pick some StdDev that we like (a fudge-factor), and see how much we might vary from past results. I think that makes more sense than randomizing the data sequences. Thoughts?

From a practical perspective, I think using the historical record and adding a safety factor is actually a pretty good way to go. It prevents one from being too clever and outsmarting oneself (lots of bright people put too much faith in their models and got burned).

Regarding random sequences, if you are talking about Fred123's specific method (bootstrap), I think there are some good and bad things about it. On the good side:
- bootstrap can yield confidence bounds and give a distribution of results instead of a point estimate as with FIRECALC
- bootstrap avoids the issue of certain years being overweighted (appearing in more runs than others)
- bootstrap avoids the issue of overlapping 30-year periods, which introduces statistical dependencies (very bad and hard to deal with)
- if you believe there is no serial correlation, then the resampling is more or less fine

On the bad side:
- bootstrap resamples the existing data, so there's no chance of any single year being worse than history. Obviously there is some (hopefully small) chance that we could have a worse year than any in the past.
- because bootstrap is not based on a model, we can't run what-if scenarios or tailor the analysis to account for things like high valuations
- if you believe there is serial correlation or reversion to the mean, the bootstrap assumption is not very appealing as it is WRONG

For this last point, even if you believe the assumption (no serial correlation) is wrong, I think you can still get useful information out of it. As the bootstrap assumes no reversion to the mean, we know that drawdowns are likely going to be longer and deeper than in FIRECALC. So this gives us a pessimistic estimate, which is also helpful in planning.
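To make the comparison concrete, here's a bare-bones version of that kind of i.i.d. bootstrap (my sketch, not Fred123's actual code; the returns are placeholders). Because each simulated run is independent of the others, you get a full distribution of outcomes and can put rough confidence bounds on the failure rate, which is the first point above:

```python
import random

# Placeholder annual real returns standing in for the historical record.
historical_returns = [0.08, -0.12, 0.15, 0.03, -0.25, 0.30, 0.07, 0.01] * 18

def bootstrap_run(returns, horizon=40, start_balance=1_000_000, spend=40_000):
    """Draw each year's return independently, with replacement (no serial correlation)."""
    balance = start_balance
    for _ in range(horizon):
        balance = (balance - spend) * (1 + random.choice(returns))
        if balance <= 0:
            return 0.0
    return balance

runs = 10_000
endings = [bootstrap_run(historical_returns) for _ in range(runs)]
failure_rate = sum(1 for b in endings if b == 0.0) / runs

# Rough normal-approximation 95% bound on the failure rate estimate itself.
se = (failure_rate * (1 - failure_rate) / runs) ** 0.5
print(f"Failure rate: {failure_rate:.1%} +/- {1.96 * se:.1%}")
```

Note that the random.choice line is exactly where the no-serial-correlation assumption lives; a block or AR-style sampler would change only that line.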

The more complex MC models, such as those used by Pfau, are nice in that they allow what-if scenarios, but I'm somewhat skeptical of the work. From the papers I've seen, they use quite a few equations/parameters and don't detail how they are computed or spend any time in the paper validating them. Also, given how few people work in this area, probably nobody has replicated their simulation. It would be very easy for them to have bugs or other errors.

I guess my philosophy is to start with the historical record (ala FIRECALC) and then use the MC simulations to get an idea of how FIRECALC might be biased (what direction and by how much -- this can also feed into how you determine the safety factor). MC methods are also helpful to understand the behavior of various drawdown algorithms (e.g. Kitces/Pfau tested their rising glidepath method with MC and it would have been very difficult for them to draw any conclusions from just Firecalc) and in this case, model imperfections are not as problematic due to the paired comparison.
 
edit - I posted before I saw the 2 recent posts from samclem and photoguy, so the following does not reflect any of their valuable input...

I don't understand how your "fudge-factor" idea would work. There are only about n/m separate (non-overlapping) results (where n is the number of years in the historical time series and m is the retirement period), so for a 40-year retirement you'd only have about 3 separate outcomes. How would you apply your fudge-factor to those outcomes?

I'm not following you. A 40 year run in a tool like FIRECalc provides 104 data points. Yes, they overlap, but we have seen very different outcomes in sequences just a few years apart, even though much of the data overlaps - (I used $30K and $1M for 40 years in this example):

FIRECalc looked at the 104 possible 40 year periods in the available data, starting with a portfolio of $1,000,000 and spending your specified amounts each year thereafter.

Here is how your portfolio would have fared in each of the 104 cycles. The lowest and highest portfolio balance throughout your retirement was $568,462 to $14,626,484, with an average of $3,812,159. (Note: values are in terms of the dollars as of the beginning of the retirement period for each cycle.)

So I'm saying, why not apply some fudge to that range of outputs? But it all comes down to trying to predict the future in one way or the other, which we cannot do, so I think it really comes down to a gut feel of how much buffer you feel comfortable with.
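If it helps, here's one way the 'fudge' could be mechanized (my sketch only; the per-cycle ending balances below are placeholders standing in for what a FIRECalc run reports, and the size of the shift is exactly the gut-feel fudge-factor being discussed):

```python
import statistics

# Hypothetical ending balances (real $), one per historical 40-year cycle,
# standing in for the list a FIRECalc-style run would produce.
ending_balances = [568_462, 812_000, 1_450_000, 2_300_000, 3_812_159,
                   5_100_000, 7_900_000, 9_400_000, 12_000_000, 14_626_484]

stdev = statistics.stdev(ending_balances)

# The fudge-factor: ask how the historical cycles would have looked if the
# future runs k standard deviations worse than the historical spread.
for k in (0.0, 0.5, 1.0):
    shifted = [b - k * stdev for b in ending_balances]
    failures = sum(1 for b in shifted if b <= 0)
    print(f"k = {k}: worst ending balance {min(shifted):,.0f}, "
          f"failures {failures}/{len(shifted)}")
```

Whether shifting ending balances (rather than, say, shaving the withdrawal rate) is the right place to apply the fudge is itself a judgment call, which I think is your point about gut feel.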

On the other side of the coin, there are a few people who have planned for a relatively high WR%, saying, 'hey it passed 75% of the time, good enough for me!'. And when you look at the range of historical outputs, they obviously may end up on a future path that is close to one of the majority of the successful paths, and they'll live high on the hog and leave a bundle for heirs/charity if they want. Ya' never know.



A bit of a diversion, but back in the day when I worked in the Valley on analog integrated circuits, we used SPICE *with MC* to model the effect of random process variations on circuit performance.

And our design guys/gals did as well. It makes good sense when you want to model components with (say) 10% tolerance, and see if the circuit will perform within limits as those components randomly vary within those 10% limits.

It doesn't try to predict the future and 'guess' about whether the components might go beyond 10% limits. So I think it is way different than M-C for a financial analysis.

As a further aside, I wasn't sure if it modeled ALL the R's of a specific value going 10% high/low at the same time, or if it randomized those. I think one school of thought was that these R values would vary randomly, but many automated processes tended to have very good short-term repeatability with a longer-term drift. So, for example, a batch of 10% 100 Ohm R's might all come off the reel very close to 91 Ohms, the next in sequence might all be very close to 92 Ohms, then 93 Ohms, etc., until they got to 109 Ohms, at which point the manufacturer stopped and reset the machine (having characterized the process, and knowing that in this case the R value drifted up over time), and then it was producing 91 Ohm values again. All in spec, minimal downtime on the machine, but a clear pattern to the value, not random at all.

Where were we? ;)

-ERD50
 
Jim Otar's MC simulator does something like this. That said, he also doesn't particularly like MC.
 
Regarding random sequences, if you are talking about Fred123's specific method (bootstrap), I think there are some good and bad things about it. On the good side:
- bootstrap can yield confidence bounds and give a distribution of results instead of a point estimate as with FIRECALC
- bootstrap avoids the issue of certain years being overweighted (appearing in more runs than others)
- bootstrap avoids the issue of overlapping 30-year periods, which introduces statistical dependencies (very bad and hard to deal with)
- if you believe there is no serial correlation, then the resampling is more or less fine

On the bad side:
- bootstrap resamples the existing data, so there's no chance of any single year being worse than history. Obviously there is some (hopefully small) chance that we could have a worse year than any in the past.
- because bootstrap is not based on a model, we can't run what-if scenarios or tailor the analysis to account for things like high valuations
- if you believe there is serial correlation or reversion to the mean, the bootstrap assumption is not very appealing as it is WRONG

For this last point, even if you believe the assumption (no serial correlation) is wrong, I think you can still get useful information out of it. As the bootstrap assumes no reversion to the mean, we know that drawdowns are likely going to be longer and deeper than in FIRECALC. So this gives us a pessimistic estimate, which is also helpful in planning.

Great summary, though I'd argue a bit with the criticism that you can't run a scenario because the bootstrap is non-parametric. Not because it isn't true, but because scenarios (such as high valuation) reside in the eye of the beholder and, as such, are too subjective. But that's just a nit.

The more complex MC models, such as those used by Pfau, are nice in that they allow what-if scenarios, but I'm somewhat skeptical of the work. From the papers I've seen, they use quite a few equations/parameters and don't detail how they are computed or spend any time in the paper validating them. Also, given how few people work in this area, probably nobody has replicated their simulation. It would be very easy for them to have bugs or other errors.
A couple of years ago I read a paper that Pfau (along with a co-author who I can't recall) published that used an AR(1) model on stock/bond returns to simulate retirement outcomes. I was curious enough about it that I wrote a little program to attempt to confirm their results, and came up with very different numbers. I exchanged a couple of e-mails with Pfau about this, and he didn't really have any answers, and it was pretty clear that there wasn't a lot of time spent validating the model. So, at least in this case, your suspicions are correct.
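For anyone curious what that kind of replication attempt looks like, here's the skeleton of an AR(1) return simulator (my own sketch with made-up parameters, not Pfau's actual model or the program described above):

```python
import random

def ar1_returns(horizon=40, mu=0.05, phi=0.1, sigma=0.18):
    """AR(1) real returns: r_t = mu + phi * (r_{t-1} - mu) + eps_t,
    with eps_t ~ Normal(0, sigma). All parameter values are placeholders."""
    r_prev, out = mu, []
    for _ in range(horizon):
        r = mu + phi * (r_prev - mu) + random.gauss(0.0, sigma)
        out.append(r)
        r_prev = r
    return out

def survives(returns, start_balance=1_000_000, withdrawal=40_000):
    """Fixed real withdrawal each year; True if the money lasts."""
    balance = start_balance
    for r in returns:
        balance = (balance - withdrawal) * (1 + r)
        if balance <= 0:
            return False
    return True

runs = 20_000
success = sum(survives(ar1_returns()) for _ in range(runs)) / runs
print(f"Success rate under this AR(1) model: {success:.1%}")
```

The point is that everything hangs on mu, phi, and sigma, which is why two people fitting "the same" model can end up with very different numbers.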
 
I'm not following you. A 40 year run in a tool like FIRECalc provides 104 data points. Yes, they overlap, but we have seen very different outcomes in sequences just a few years apart, even though much of the data overlaps - (I used $30K and $1M for 40 years in this example):

What is commonly thought of as "calculating the standard deviation" is really calculating an *estimate* of the standard deviation of a population, based on a set of random and independent samples from the population. If those samples aren't random and independent (as in the case of the overlapping time series in FC), then you have problems with your estimate. That's why it's so hard to generate accurate polling numbers -- it's very difficult to randomly sample voters.
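A quick way to see the problem (my sketch, using simulated i.i.d. returns so we know the ground truth): adjacent overlapping 30-year windows share 29 of their 30 years, so their outcomes are strongly correlated rather than independent samples.

```python
import random
import statistics

random.seed(1)

# Simulate a 120-year "history" of i.i.d. annual returns (made-up parameters).
history = [random.gauss(0.05, 0.18) for _ in range(120)]

def growth(returns):
    """Cumulative growth factor over a window of years."""
    g = 1.0
    for r in returns:
        g *= (1 + r)
    return g

# Outcome of every overlapping 30-year window.
outcomes = [growth(history[i:i + 30]) for i in range(len(history) - 30 + 1)]

# Correlation between each window and the one starting a year later
# (statistics.correlation requires Python 3.10+).
corr = statistics.correlation(outcomes[:-1], outcomes[1:])
print(f"{len(outcomes)} overlapping windows; adjacent-window correlation: {corr:.2f}")
```

With that much dependence, the 91 window outcomes carry far fewer than 91 samples' worth of information -- closer to the few non-overlapping periods mentioned earlier in the thread.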
 
A couple of years ago I read a paper that Pfau (along with a co-author who I can't recall) published that used an AR(1) model on stock/bond returns to simulate retirement outcomes. I was curious enough about it that I wrote a little program to attempt to confirm their results, and came up with very different numbers. I exchanged a couple of e-mails with Pfau about this, and he didn't really have any answers, and it was pretty clear that there wasn't a lot of time spent validating the model. So, at least in this case, your suspicions are correct.


I am strongly in favor of historical simulations because, among other reasons, they are a lot easier to check. I suggest that Pfau's errors aren't the exception but rather the rule. Pfau is just more transparent and more open to criticism than the average MC calculator on some financial website.

I also investigated one of Pfau's papers from a couple of years ago, which concluded both that the SWR should be much lower (in the 2.5% range) and that equity allocations should be much lower (around 20-30%). Now, I can understand lowering a SWR given today's low interest rates and high stock valuations, although a couple of years before the 30% gains of 2013, valuations weren't really high.

The reason that Pfau's calculators came up with these results is that he dropped the average return of the market from 9.5% to 7%, IIRC. His rationale was that the US market outperformed international markets, and therefore US market returns should be more in line with international returns. Now I guess you can make a case for lower US equity returns going forward, but not for that reason. Fifty and thirty years ago, many foreign stock markets were more like the kleptocracy of the Russian market today: insiders and government officials skimmed off billions of corporate profits instead of returning money to shareholders. Capital markets around the globe have evolved to look more like the US market (and the Canadian and Australian markets, which have also had fantastic returns over the past 100 years), and even the US market is more transparent. This means investors are getting a higher percentage of corporate profits.

I suspect that all MC calculators have similar assumptions, which can easily be argued with but aren't easily checked, because the calculators don't have peer-reviewed papers associated with them like Pfau's do.

Not that historical calculators are error-free. For instance, it was only discovered a few years ago that FIRECalc doesn't actually track the change in value of bonds, but rather just credits the interest. E.g., you have a 50/50 portfolio and LT interest rates are 5%, so in a year you collect $25,000 in interest; but if interest rates drop to 4% the next year, the value of your bond portfolio doesn't increase but remains constant. Now, there is no easy solution to the problem. But the fact that you could compare FIRECalc's calculations to, say, the total return of a long-term Treasury bond fund made it easier to spot the error.
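To put a number on what gets left out (my own illustration; the 20-year maturity and annual coupons are assumptions, not FIRECalc's internals): when the yield drops from 5% to 4%, the bond side of that 50/50 portfolio gains far more in price than the $25,000 coupon it gets credited.

```python
def bond_price(face, coupon_rate, yield_rate, years):
    """Present value of a plain annual-coupon bond."""
    coupon = face * coupon_rate
    pv_coupons = sum(coupon / (1 + yield_rate) ** t for t in range(1, years + 1))
    return pv_coupons + face / (1 + yield_rate) ** years

face = 500_000  # bond half of a 50/50 $1M portfolio

price_now = bond_price(face, 0.05, 0.05, 20)        # 20-year maturity (assumed), priced at a 5% yield: par
price_next_year = bond_price(face, 0.05, 0.04, 19)  # a year later, yields have dropped to 4%

print(f"Interest credited this year:                    {face * 0.05:,.0f}")
print(f"Price gain a credit-interest-only model misses: {price_next_year - price_now:,.0f}")
```

Roughly a $66,000 move on the bond side versus the $25,000 of interest under these assumptions -- big enough to show up when you compare against a real long-term Treasury fund's total return, which is how the discrepancy was spotted.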

Frankly, I think MC calculators are downright dangerous.
 
I also investigated one of Pfau's papers from a couple of years ago, which concluded both that the SWR should be much lower (in the 2.5% range) and that equity allocations should be much lower (around 20-30%). Now, I can understand lowering a SWR given today's low interest rates and high stock valuations, although a couple of years before the 30% gains of 2013, valuations weren't really high.

The reason that Pfau's calculators came up with these results is that he dropped the average return of the market from 9.5% to 7%, IIRC. His rationale was that the US market outperformed international markets, and therefore US market returns should be more in line with international returns.
This reminded me of the instance a couple of years ago where Dr Pfau's paper said the 4% WR was probably too high--but buried in there was an assumed additional 1% advisor's fee. Well, yeah, if you put it like that . . .
 