When making a forecast, the first step must be to understand the underlying assumptions you are making. The problems most often present in failed forecasts result from what Elon Musk calls “reasoning from analogy,” which means taking something similar and applying or extrapolating it to the current situation.
For example, the S&P 500 stock index was at 2,255 at the beginning of 2017. If you read a mutual fund prospectus from a bank, it may tell you that the market goes up by 7%, 8%, or 11% a year on average, depending on the time frame it chooses to refer to. So you might be tempted to take 2,255, multiply it by 1.08, say that the S&P will reach 2,435 by the end of the year, and call it a day.
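That one-line extrapolation, written out (using the figures above; the 1.08 is the 8% average a prospectus might quote):

```python
# Naive "reasoning from analogy": extrapolate the index at a flat 8% a year.
sp500_start = 2255            # S&P 500 at the beginning of 2017
avg_annual_return = 1.08      # the 8% average annual return from a prospectus
year_end_forecast = round(sp500_start * avg_annual_return)
print(year_end_forecast)      # 2435
```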
It’s not wrong because it’s too simple. Complexity is easy: take the second derivative of both GDP growth and the rolling 6-month U-6 unemployment rate in the US, the midpoint between 5- and 30-year treasury yields, the inverse of wage increase divided by productivity growth, and the convexity of a basket of AAA corporate bonds, perform a multivariate regression analysis, and project forward from the 90-day index moving average to arrive at 2,481.59212 by Dec 29, 2017 at 2:32pm Eastern Standard Time.
We just found a more complex, reassuring way of reasoning from analogy to arrive at precisely the wrong result. Beware geeks bearing formulas.
Building Your Model
If what you are trying to predict is predictable at all, then there must be a cause-effect relationship which you have to discover. This is called a forecast model and it’s the essential starting point of making any prediction.
A forecast model consists of a set of assumed causal relationships that use known or available information to ascertain unknown information. The known and unknown don’t have to be present versus future. It could be present versus present, or even past versus past.
You can build a model, and then wait to see if it works in the future. However, having to wait days, weeks, or even years to tweak and improve the initial model means it will take an unacceptable amount of time to get something reasonably accurate. So it’s much better to try to first predict the present.
I’ve found that the best way to check your assumptions of causal relationships is to attempt a guess, forcing yourself to build a mental model on a blank canvas.
You might now be asking how learning non-random guessing helps us with forecasting.
There is an old Danish saying, “It’s difficult to make predictions, especially about the future.” Following Fermi’s methodology for making initial guesses is a great way to start building a model from scratch. It forces us to explicitly state what our assumed cause-effect relationships are.
Then, instead of waiting for the future to arrive, we can test this initial model with real data from the past to make sure that, given the correct inputs, the output is close to correct.
Predicting the Present
Suppose I ask you “How many piano tuners are there in Chicago?” 10? 300? 1,500? 10,000? How would you even go about making a rough estimate?
Enrico Fermi was a Nobel Prize-winning physicist known as the architect of the nuclear age. He was in New Mexico in July 1945 to bear witness to the Trinity Test – the first nuclear weapon ever detonated. He managed to estimate the power of the blast by dropping strips of paper and then marking how far they flew during the blast. His guess of 10,000 tons of TNT equivalent was almost an exact match to the mechanical gauges at the blast site. Higher-fidelity instruments and long calculations later found the blast to be worth 18,200 tons of TNT.
How did he make such a good estimate with so little information? The technique is now known as Fermi’s Estimate and it can be described in 4 steps:
- Break the question into component parts
This is the most critical part of coming up with a good estimate as well as a forecast model. Essentially, breaking an estimate into smaller parts forces you to see whether you truly understand what you are estimating.
Do you really understand how the value you’re looking for is derived?
Building a mental model this way allows you to see if you have any logical flaws, and helps you uncover unknown factors.
- Make rough estimates for each component
Now that you have smaller components, you can more easily put good figures to them.
As you’re making these smaller guesses, you’ll see which component of the model you have the least data about, or which has the greatest uncertainty.
- Use simple arithmetic to arrive at an estimate
I will get into small mental math cheats to help you with harder calculations, but it’s nothing a scrap piece of paper or a phone calculator can’t handle.
During this process, you’ll see which component of your estimate produces the most variation in the final outcome.
- Compare the estimate to the actual result
After the comparison, if the result is radically different, you’re left with only two possibilities: either your understanding of the problem is lacking and you missed crucial components, or your model is actually good but you lacked data and made bad guesses. In both cases, you will have discovered the next steps for improving your model – either adding, removing, or rethinking the base assumptions, or acquiring better data.
Piano Tuners of Chicago
Going back to the piano tuners in Chicago, here’s how Fermi went about the estimate.
The most basic assumption here is that supply and demand are roughly in balance in the piano tuning market in Chicago. That is, there aren’t masses of unemployed piano tuners, nor are there an incredibly high number of piano owners who can’t find a piano tuner, regardless of the price they’re willing to pay.
This underlying assumption allows Fermi to break the problem into two major components: supply and demand.
On the demand side, his list of components and component assumptions are:
- There are about 9,000,000 people living in the Greater Chicago area.
- Each household averages about 2 people, so there are 4,500,000 households.
- Roughly 1 household in 20 has a regularly tuned piano (we can immediately see that this is the most uncertain assumption).
- Pianos are tuned about once a year on average.
The four points above mean that there are 225,000 regularly tuned pianos, each needing one tuning a year, for a total of 225,000 tunings.
On the supply side, we can start with each individual piano tuner:
- Each piano tuner works 5 days a week, 50 weeks a year, for 8 hours a day.
- Each piano tuning takes about 2 hours, including travel time.
These two points mean that each piano tuner can do about 1,000 tunings a year.
225,000 ÷ 1,000 gives an estimate of 225 piano tuners. The actual number of piano tuners back then was 290. What a guess!
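The whole estimate fits in a few lines of code. Writing each assumption as a named variable makes it obvious what to revisit later (a sketch using the figures above):

```python
# Demand side: how many piano tunings does Chicago need per year?
population = 9_000_000            # Greater Chicago area
people_per_household = 2
households = population // people_per_household          # 4,500,000
tuned_pianos = households // 20   # 1 household in 20 has a tuned piano
tunings_needed = tuned_pianos * 1                        # one tuning a year

# Supply side: how many tunings can one tuner perform per year?
hours_worked = 5 * 50 * 8         # days/week x weeks/year x hours/day
tunings_per_tuner = hours_worked // 2                    # 2 hours per tuning

print(tunings_needed // tunings_per_tuner)               # 225 tuners
```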
Even more magical: now that we have a mental model, we can add data to it to get real insights into the situation. Besides the scenario where all of our input assumptions are correct and the estimate is great, there are two ways we can be wrong.
Case 1 – The estimate is way off
Suppose the number was 2,900. What could we do?
We can figure out if we are missing something crucial, such as music schools with thousands of pianos in aggregate. In that case, there would be more than 225,000 pianos that need tuning.
Next, we can check the assumptions we assigned to each component. It could be that most pianos need tuning 5 or 6 times a year rather than once. Or it could be that pianos take a really long time to tune – maybe 8 hours and not 2. Or it could be that piano tuners are inherently lazy and work only 4 hours a day, 4 days a week.
The point is, since our estimate was low, we know which way each of our component estimates must go to compensate. As we gather definitive data, we can use the process of elimination to discover our faulty assumption much faster.
Case 2 – Estimate is good, but one assumption is wrong
Now suppose the estimate was close (which it was) but we find out that one of the assumptions is totally off. That would immediately tell you that at least one other assumption is wrong and offset the first error. For example, the actual population of Chicago was only about 4,000,000 in the 1950s, and there were only 1,200,000 households (more than 3 people each).
Demographic information is easy to find. And since Fermi overestimated the number of households nearly 4-fold but still got a close estimate, it must mean one or more of his other assumptions are underestimates. There are probably more pianos per household, or they get tuned more than once a year, or piano tuning takes much longer than two hours, so each tuner does fewer than 1,000 a year. Again, some of this data is readily discoverable, and we can use the process of elimination. By the time we get to the truly unknowable quantity, we might be able to state its value with some certainty using our mental model.
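We can also run the model backwards. Holding the supply-side assumptions fixed, the corrected household count implies a piano-ownership rate (the back-solved rate below is my illustration of the elimination process, not a figure from the historical data):

```python
# Invert the model: given the true tuner count and household count, what
# share of households must own a regularly tuned piano, if we hold the
# other assumptions (one tuning a year, 1,000 tunings per tuner) fixed?
actual_tuners = 290
tunings_per_tuner = 1_000          # 2,000 work hours / 2 hours per tuning
households = 1_200_000             # 1950s Chicago, per the demographic data

tunings_demanded = actual_tuners * tunings_per_tuner     # 290,000
implied_piano_share = tunings_demanded / households
print(round(implied_piano_share, 2))   # 0.24 - roughly 1 household in 4
```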
Now let’s try this for ourselves. I came up with this question on my way home today while thinking about this post: how many passenger vehicles are there in Vancouver (or in the city you live in)?
Step 1 – Break it down
My broad assumptions are that passenger vehicles belong to households, and that very young people and very old people don’t drive.
Step 2 – Put some estimates in there
I break down the component parts further and assign estimates:
- There are about 3,000,000 people in Vancouver.
- Most people live until about 80, so roughly 20% of people are below the age of 15 and 10% are above the age of 75, which leaves only 70% as potential drivers, or 2,100,000 people.
- Each household has about 2.1 people (I deliberately used an easy number to help mental math), leaving us with 1,000,000 households in Vancouver. About half of them live reasonably close to good transit (as opposed to in the suburbs) and the other half live far from transit.
- For those living near transit, there are 0.5 vehicles per household – meaning either half of them don’t drive and the other half have 1 car each, or that 75% of people don’t drive, but the remainder have 2 vehicles each. Both are pretty plausible.
- For those living away from transit, there are 1.8 vehicles per household.
Step 3 – Simple math
Based on our assumptions, we have the vehicles owned by the half close to transit plus vehicles owned by the half far from transit.
That’s 1,000,000 × 0.5 + 1,000,000 × 1.8 which is
500,000 + 1,800,000 = 2,300,000
Step 4 – Check out the source
I googled this and found that at the beginning of 2016 there were 1,632,402 vehicles in Vancouver¹, meaning I was off by only about 40%. Then I went to find the demographic data, which was simple: Vancouver in fact has only about 2,450,000 residents. If I plug that into my model, I would have gotten within 4.9%, which is not half bad for a guess.
Then again, it’s not just any guess – it’s a Fermi’s Estimate.
Not only can I pat myself on the back, I actually got a lot of information. My assumptions about the number of people living close to and far from transit must be at least approximately correct, as is the impact that has on car ownership. If I could get some reliable figures for those numbers, I could further refine my model.
The goal of this estimate isn’t to come up with the exact number of vehicles, but rather to clearly identify the assumptions buried within a guess. Now if I were to suppose the number of vehicles will be 10% higher in Vancouver in 3 years, I know exactly how my underlying assumptions must change for that to be true, and I can judge whether those changes are reasonable and rational.
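For instance, we can hold everything else fixed and solve for the one assumption that would have to move (this reuses the arithmetic from Step 3; singling out the far-from-transit rate is my choice of illustration, not a figure from the data):

```python
# Which assumption must change for a 10%-higher vehicle count in 3 years?
households = 1_000_000
near_rate, far_rate = 0.5, 1.8
vehicles_today = households * near_rate + households * far_rate  # 2,300,000

target = vehicles_today * 1.10           # suppose 10% more vehicles
# Hold households and the near-transit rate fixed; solve for the far rate.
needed_far_rate = (target - households * near_rate) / households
print(round(needed_far_rate, 2))         # 2.03 vehicles per household
```

If ownership rising from 1.8 to about 2.03 vehicles per far-from-transit household seems implausible, the forecast of 10% growth is probably implausible too.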
Bonus: Mental Math Cheats
Before concluding, let me quickly show you two simple mental math cheats.
First, if you are multiplying together two numbers like 5.4 and 7.7, simply round them both: 5 × 8 = 40. The actual answer is 41.58. We’re trying to get the final estimate somewhere within a 5- to 10-fold error range, so this 3.8% error is not a big deal.
Second, if you’re working with really big numbers, like Fermi with the explosion estimate, you can work in powers of ten. Say you’re asked to multiply 7,200 by 1,980,000.
7,200 is 7.2 × 10³ and 1,980,000 is 1.98 × 10⁶ – just count how many places you have to move the decimal point.
Use the first trick to round the leading numbers so that you have 7 × 10³ × 2 × 10⁶. If we remember back to high school math, multiplying powers of ten means adding the exponents:
7 × 2 = 14, which is 1.4 × 10¹
1.4 × 10¹ × 10³ × 10⁶ = 1.4 × 10¹⁺³⁺⁶ = 1.4 × 10¹⁰, or 14,000,000,000
The actual answer is 14,256,000,000 so we were within 1.8% – wow!
How about division: 1,980,000 ÷ 7,200? No problem.
7,200 becomes 7.2 × 10³. 1,980,000 is roughly 21 × 10⁵. I rounded 1.98 up a bit more, to 2.1, and then borrowed a 10 from the exponent to make it easily divisible by 7.
(21 × 10⁵) ÷ (7 × 10³) = 3 × 10⁵⁻³ = 3 × 10², or 300
The actual answer is 275 so we’re still just 9.1% off, even after deliberately rounding 1.98 higher than we should have.
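Both tricks are easy to check programmatically. The helper below is a hypothetical illustration of the multiplication cheat (round each mantissa to one digit, add the exponents), not something from the post:

```python
import math

def rough_product(a, b):
    """Multiply by rounding each mantissa to one digit and adding exponents."""
    def split(x):
        exp = math.floor(math.log10(x))    # how far the decimal point moves
        return x / 10**exp, exp            # mantissa in [1, 10), plus exponent
    (ma, ea), (mb, eb) = split(a), split(b)
    return round(ma) * round(mb) * 10**(ea + eb)

estimate = rough_product(7_200, 1_980_000)   # 7 x 2 = 14, exponents 3 + 6
exact = 7_200 * 1_980_000                    # 14,256,000,000
print(estimate)                              # 14000000000 - within ~1.8%
```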
Even the most complex problems can be broken down using Fermi’s Estimate. In fact, the greater the complexity, the greater the need to apply Fermi’s methodology.
Building a Fermi’s Estimate is the critical first step to building a good model for making predictions.
Don’t forget to subscribe, share, or leave a comment!
Keep reading part 2.
¹ Metro Vancouver – Registered Vehicles (pdf)