During my last job, I felt unable to look after my physical health consistently. Since leaving, I have built, and maintained, a healthy physical routine, alternating between running, the gym, hiking, and skateboarding. During a run, my wife asked me three things. How was her Garmin tracking her heart rate zones? What is a healthy heart rate range during exercise (and outside of it)? What happens if you exceed your maximum heart rate? I actually enjoyed thinking about that last one a lot.
After this conversation, I found the formula used to calculate the heart rate zones, and the paper it originated from.
Source: https://www.polar.com/blog/running-heart-rate-zones-basics/
The formula is the result of a linear regression. Linear regression analysis is used to predict the value of a variable based on the value of another variable. The variable you want to predict is called the dependent variable. The variable you are using to predict the other variable’s value is called the independent variable.
For maximum heart rate, the formula is commonly written as HRmax = 220 − age. Here 220 is the intercept, age is the independent variable, and age's slope coefficient is −1. So the predicted maximum heart rate falls by one beat per minute for each year of age. Numerous exogenous factors, which the formula does not capture, can affect a test subject's result at the time of testing. These include stress levels, how fit the person is, and how hot or humid the weather is.
Source: https://www.sciencedirect.com/science/article/pii/0091743572900795?via%3Dihub
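To make the formula concrete, here is a minimal Python sketch that turns an age into an estimated maximum heart rate and five training zones. It assumes the commonly quoted HRmax = 220 − age. It also assumes zones at 50-60%, 60-70%, 70-80%, 80-90% and 90-100% of HRmax, which is an assumed convention for illustration; a watch may use different boundaries. The function names are hypothetical.

```python
from __future__ import annotations

# Assumed zone boundaries, as fractions of maximum heart rate (five zones).
ZONE_BOUNDS = [0.5, 0.6, 0.7, 0.8, 0.9, 1.0]


def max_heart_rate(age: int) -> int:
    """Estimated maximum heart rate: intercept 220, slope -1 beat per year of age."""
    return 220 - age


def heart_rate_zones(age: int) -> list[tuple[int, float, float]]:
    """Return (zone number, lower bpm, upper bpm) for each assumed zone."""
    hr_max = max_heart_rate(age)
    return [
        (zone, lower * hr_max, upper * hr_max)
        for zone, (lower, upper) in enumerate(zip(ZONE_BOUNDS, ZONE_BOUNDS[1:]), start=1)
    ]


for zone, low, high in heart_rate_zones(30):
    print(f"Zone {zone}: {low:.0f}-{high:.0f} bpm")
```

For a 30-year-old, this gives an estimated maximum of 190 bpm and a top zone of 171-190 bpm.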
Thinking about this regression reminded me that the same method can be used to forecast revenue from advertising.
Revenue = β0 + β1*Ad Spending
The regression line shows the relationship between the two variables: here, revenue (dependent) and ad spending (independent). Both beta coefficients are sample estimates of the true population parameters. Many factors that affect a business's revenue are not measured in this model, so the revenue it predicts can only ever be an estimate. Using the following sample data, we can estimate this regression line:
| Online Store # | Online Advertising Dollars ('000s) | Monthly E-commerce Sales ('000s) |
| --- | --- | --- |
| 1 | 1.7 | 368 |
| 2 | 1.5 | 340 |
| 3 | 2.8 | 665 |
| 4 | 5 | 954 |
| 5 | 1.3 | 331 |
| 6 | 2.2 | 556 |
| 7 | 1.3 | 376 |
(Data points: https://www.intellspot.com/linear-regression-examples/)
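A minimal sketch of estimating this line in Python follows. It uses the standard closed-form OLS formulas for a single independent variable. The slope is the sum of cross-deviations of x and y divided by the sum of squared deviations of x. The intercept is the mean of y minus the slope times the mean of x. The function name `fit_ols` is hypothetical, and both columns stay in thousands of dollars as in the table.

```python
from __future__ import annotations

# Sample data from the table: ad spend and monthly sales, both in '000s of dollars.
ad_spend = [1.7, 1.5, 2.8, 5.0, 1.3, 2.2, 1.3]
sales = [368, 340, 665, 954, 331, 556, 376]


def fit_ols(x: list[float], y: list[float]) -> tuple[float, float]:
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


b0, b1 = fit_ols(ad_spend, sales)
print(f"Sales = {b0:.1f} + {b1:.1f} * AdSpend")
```

On this sample, the fitted line comes out at roughly Sales = 125.8 + 171.5 × AdSpend. In other words, each extra $1,000 of online advertising is associated with about $171,500 of extra monthly sales in this sample.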
The regression line, or line of best fit, runs through the middle of the data points. It is the line that minimises the total squared vertical distance between the data points and the line, so in that sense it is the optimal line for these points.
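To see what "minimises" means in practice, this sketch tries many candidate lines for the ad-spend sample. It keeps the one with the smallest sum of squared residuals (SSE). Brute-force grid search is swapped in here purely for illustration, and the search ranges and 0.1 step are arbitrary choices; OLS reaches the same line directly with closed-form formulas.

```python
# Sample data from the table: ad spend and monthly sales, both in '000s of dollars.
ad_spend = [1.7, 1.5, 2.8, 5.0, 1.3, 2.2, 1.3]
sales = [368, 340, 665, 954, 331, 556, 376]


def sse(intercept: float, slope: float) -> float:
    """Sum of squared vertical distances between the data points and a line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(ad_spend, sales))


# Arbitrary search grid: intercepts 110.0-140.0 and slopes 160.0-185.0, step 0.1.
candidates = ((i / 10, s / 10) for i in range(1100, 1401) for s in range(1600, 1851))
best_intercept, best_slope = min(candidates, key=lambda line: sse(*line))
best = sse(best_intercept, best_slope)
print(f"Best line on the grid: Sales = {best_intercept:.1f} + {best_slope:.1f} * AdSpend")

# Nudging the winning line in any direction only increases the SSE.
nudges = [(1, 0), (-1, 0), (0, 1), (0, -1)]
print(all(sse(best_intercept + di, best_slope + ds) > best for di, ds in nudges))
```

The grid winner sits within one grid step of the exact least-squares line. A finer grid gets closer, but no line beats the least-squares one.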
- The coefficient β0 represents expected revenue when ad spending is zero,
- The coefficient β1 represents the average change in revenue when ad spending increases by one unit (e.g. one dollar):
  - If β1 is negative, more ad spending is associated with less revenue,
  - If β1 is close to zero, ad spending is associated with little change in revenue, and
  - If β1 is positive, more ad spending is associated with more revenue.
- Depending on the value of β1, a company may decide to increase or decrease its ad spending.
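Here is a small sketch of reading the coefficients the way the list above describes. B0 and B1 are the least-squares estimates for the ad-spend sample, rounded to one decimal place, with both variables in thousands of dollars. The helper names and the "close to zero" tolerance are hypothetical choices for illustration. In practice you would judge "close to zero" with a standard error or confidence interval rather than a fixed threshold.

```python
# Least-squares estimates for the ad-spend sample, rounded ('000s of dollars).
B0 = 125.8  # expected sales at zero ad spend (an extrapolation below the data)
B1 = 171.5  # expected change in sales per extra unit ($1,000) of ad spend


def predict_sales(ad_spend: float) -> float:
    """Predicted monthly sales ('000s) for a given ad spend ('000s)."""
    return B0 + B1 * ad_spend


def describe_slope(slope: float, tolerance: float = 1.0) -> str:
    """Classify the slope as in the list; the tolerance is an arbitrary assumption."""
    if abs(slope) <= tolerance:
        return "close to zero: ad spend is associated with little change in sales"
    if slope < 0:
        return "negative: more ad spend is associated with lower sales"
    return "positive: more ad spend is associated with higher sales"


print(describe_slope(B1))
# Raising ad spend by one unit moves the prediction by B1.
print(predict_sales(3.0) - predict_sales(2.0))
```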
These regression lines are generated using the Ordinary Least Squares (OLS) method, a linear least squares method for choosing the unknown parameters of a linear regression model. OLS picks the parameters that minimise the sum of squared differences between two sets of values. The first set is the observed values of the dependent variable in the dataset. The second is the values predicted by the linear function of the independent variable. Here is an Excel example of what this means in practice.
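Written out for a single independent variable, the least-squares principle chooses the estimates β̂0 and β̂1 that minimise the sum of squared residuals. Setting both partial derivatives to zero gives the standard closed-form solution, where x̄ and ȳ are the sample means (notation introduced here):

$$
(\hat\beta_0, \hat\beta_1) = \arg\min_{\beta_0,\,\beta_1} \sum_{i=1}^{n} \left(y_i - \beta_0 - \beta_1 x_i\right)^2
$$

$$
\hat\beta_1 = \frac{\sum_{i=1}^{n} (x_i - \bar x)(y_i - \bar y)}{\sum_{i=1}^{n} (x_i - \bar x)^2},
\qquad
\hat\beta_0 = \bar y - \hat\beta_1 \bar x
$$

The condition from the derivative with respect to β0 is what makes the residuals of a least-squares line with an intercept sum to zero:

$$
\sum_{i=1}^{n} \left(y_i - \hat\beta_0 - \hat\beta_1 x_i\right) = 0
$$

In y = mx + c notation, m corresponds to β̂1 and c to β̂0.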
The first two columns, x and y, are the coordinates of the points in the scatter plot. The Regression column is the straight-line equation, y = mx + c, fitted using the x (independent) and y (dependent) values in columns A and B. The OLS column holds the residuals: the observed y value minus the Regression value for each xᵢ. For an OLS fit that includes an intercept, these residuals always sum to 0, so summing the OLS column is a quick sanity check. The check is necessary but not sufficient, because any straight line that passes through the point of means (x̄, ȳ) also has residuals that sum to 0.
| x | y | Regression | OLS | OLS Check |
| --- | --- | --- | --- | --- |
| 1 | 2 | 4.43 | -2.43 | 0.00 |
| 2 | 4 | 5.07 | -1.07 | |
| 3 | 5 | 5.71 | -0.71 | |
| 4 | 8 | 6.35 | 1.65 | |
| 5 | 9 | 6.98 | 2.02 | |
| 6 | 9 | 7.62 | 1.38 | |
| 7 | 6 | 8.26 | -2.26 | |
| 8 | 9 | 8.90 | 0.10 | |
| 9 | 11 | 9.54 | 1.46 | |
| 10 | 7 | 10.18 | -3.18 | |
| 11 | 16 | 10.82 | 5.18 | |
| 12 | 9 | 11.46 | -2.46 | |
| 13 | 19 | 12.10 | 6.90 | |
| 14 | 10 | 12.74 | -2.74 | |
| 15 | 14 | 13.38 | 0.62 | |
| 16 | 8 | 14.02 | -6.02 | |
| 17 | 16 | 14.65 | 1.35 | |
| 18 | 17 | 15.29 | 1.71 | |
| 19 | 16 | 15.93 | 0.07 | |
| 20 | 15 | 16.57 | -1.57 | |
Regression (column C, filled down) = INTERCEPT($B$2:$B$21,$A$2:$A$21)+SLOPE($B$2:$B$21,$A$2:$A$21)*A2
OLS (column D, filled down) = B2-C2
OLS Check = SUM(D2:D21)
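The same spreadsheet can be reproduced in plain Python as a cross-check. This sketch computes the slope and intercept with the closed-form least-squares formulas instead of calling Excel's SLOPE and INTERCEPT. It then rebuilds the Regression, OLS and OLS Check columns. It also shows why a zero OLS Check is only a sanity check: a deliberately wrong line through (x̄, ȳ), with an arbitrary slope of 0.2, has residuals that sum to zero as well, but a larger sum of squared residuals.

```python
from statistics import fmean

# The 20-point dataset: column A (x) and column B (y).
x = list(range(1, 21))
y = [2, 4, 5, 8, 9, 9, 6, 9, 11, 7, 16, 9, 19, 10, 14, 8, 16, 17, 16, 15]

# Closed-form least-squares estimates, standing in for SLOPE and INTERCEPT.
x_bar, y_bar = fmean(x), fmean(y)
slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
intercept = y_bar - slope * x_bar

regression = [intercept + slope * xi for xi in x]       # column C
residuals = [yi - fi for yi, fi in zip(y, regression)]  # column D: B - C
ols_check = sum(residuals)                              # SUM(D2:D21)
print(f"intercept={intercept:.4f}  slope={slope:.4f}  OLS check={ols_check:.2f}")

# A deliberately wrong line through (x_bar, y_bar); slope 0.2 is arbitrary.
wrong_slope = 0.2
wrong_residuals = [yi - (y_bar + wrong_slope * (xi - x_bar)) for xi, yi in zip(x, y)]
print(f"wrong line: residual sum={sum(wrong_residuals):.2f}")

# Both residual sums are (numerically) zero, but OLS has the smaller SSE.
ols_sse = sum(r ** 2 for r in residuals)
wrong_sse = sum(r ** 2 for r in wrong_residuals)
print(f"SSE: OLS={ols_sse:.2f}, wrong line={wrong_sse:.2f}")
```

The fitted slope (about 0.6391) and intercept (about 3.7895) reproduce the Regression column in the table.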
Just for fun, here are the regression statistics for this dataset, laid out the way Excel's LINEST function returns them when its stats argument is TRUE.
| Statistic | Value 1 | Value 2 |
| --- | --- | --- |
| Coefficients (slope; intercept) | 0.6391 | 3.7895 |
| Standard errors (slope; intercept) | 0.1175 | 1.4081 |
| r²; SE_y | 0.6215 | 3.0312 |
| F; degrees of freedom | 29.5622 | 18 |
| SS_reg; SS_resid | 271.6165 | 165.3835 |
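For anyone who wants to see where statistics like these come from, this sketch computes them from their textbook definitions for the 20-point dataset. It prints them in the order LINEST reports them. It is plain Python rather than a call to Excel, and the variable names are illustrative.

```python
import math
from statistics import fmean

# The 20-point dataset from the spreadsheet example.
x = list(range(1, 21))
y = [2, 4, 5, 8, 9, 9, 6, 9, 11, 7, 16, 9, 19, 10, 14, 8, 16, 17, 16, 15]
n = len(x)

x_bar, y_bar = fmean(x), fmean(y)
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_yy = sum((yi - y_bar) ** 2 for yi in y)  # total sum of squares

slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

ss_reg = slope * s_xy                 # sum of squares explained by the line
ss_resid = s_yy - ss_reg              # residual sum of squares
df = n - 2                            # residual degrees of freedom
se_y = math.sqrt(ss_resid / df)       # standard error of the y estimate
se_slope = se_y / math.sqrt(s_xx)
se_intercept = se_y * math.sqrt(1 / n + x_bar ** 2 / s_xx)
r_squared = ss_reg / s_yy
f_stat = (ss_reg / 1) / (ss_resid / df)  # 1 regression degree of freedom

rows = [
    ("Coefficients", slope, intercept),
    ("Standard errors", se_slope, se_intercept),
    ("r^2; SE_y", r_squared, se_y),
    ("F; degrees of freedom", f_stat, df),
    ("SS_reg; SS_resid", ss_reg, ss_resid),
]
for label, left, right in rows:
    print(f"{label:<22} {left:>10.4f} {right:>10.4f}")
```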
With statistical information and methods like these, businesses can forecast sales from various inputs. How good those forecasts are depends on several factors:
- How the data is generated and collected,
- The data that is being collected,
- The models selected to analyse the data,
- External factors that can bias the results, e.g. macroeconomic conditions or seasonality within markets,
- Using accurate and non-redundant data, and
- Correctly interpreting the insights gathered from this process and telling a clear story with them.

