99 – How My Website Statistics are Tracking (Part III)

If you’ve been following this series, you would have seen my initial post (Part I) and subsequent post (Part II). If you’re not up to speed, I’m attempting to forecast my website views over time. I’ve started with a fairly simple model and I’m adjusting it over time to see how accurate I can make it. Here is how it’s been tracking so far:

Timeframe           | Part I (F) | Actual (to calendar date) | Part II (F) | Actual (to calendar date) | Wk 3 (F) | Actual (to calendar date)
2nd Feb – 8th Feb   | 91         | 225                       | –           | –                         | –        | –
9th Feb – 15th Feb  | –          | –                         | 170         | 136                       | –        | –
16th Feb – 22nd Feb | –          | –                         | –           | –                         | 145      | 146
February            | 412        | 57                        | 813         | 257                       | 676      | 530
March               | 483        | –                         | 577         | –                         | 292      | –
February – April    | 1,376      | 57                        | 2,586       | 257                       | 1,909    | 530
January – June      | 2,896      | 649                       | 5,780       | 894                       | 4,788    | 1,298
July – December     | 3,353      | –                         | 8,961       | –                         | 7,101    | –
January – December  | 6,249      | 649                       | 14,741      | 894                       | 11,889   | 1,167

Of course, the one week where I didn’t post the next week’s forecast (16th Feb – 22nd Feb) is the week where the prediction was off by just one view. You’ll just have to trust me on this one.

Below are the next forecasted views:

  • Next 7 Days (01/03/2026 – 07/03/2026): 124
  • Current Calendar Month (01/03/2026 – 31/03/2026): 633
  • Next Calendar Month (01/04/2026 – 30/04/2026): 633
  • Next 3 Calendar Months (01/03/2026 – 31/05/2026): 2,020
  • First Half of Year (01/01/2026 – 30/06/2026): 4,014
  • Last Half of Year (01/07/2026 – 31/12/2026): 5,198
  • Full Year (01/01/2026 – 31/12/2026): 9,212

Thoughts

I think what’s important is to track the changes being made. Common practice is to continuously evolve without properly reflecting on where you started and the changes made along the way. These posts outline maintaining multiple models and tracking the efficacy of each one, highlighting both the good and the bad decisions.

Although this isn’t a continuous integration/continuous delivery (CI/CD) system, I’ve been thinking about how this DevOps methodology applies in this context. Yes, my “production” is just me, but it is code that I run weekly to gather insights. In my rant post last week, I made reference to DORA. I’ll be pulling that back in this week, specifically the four DORA metrics for measuring two critical aspects of DevOps (I’ve crossed out the metrics that don’t actually pertain to this use case):

  • Velocity metrics track how quickly your organisation delivers software:
    • Deployment frequency: How often code is deployed to production,
    • Lead time for changes: How long it takes code to reach production,
  • Stability metrics measure your software’s reliability:
    • Change failure rate: How often deployments cause production failures,
    • Time to restore service: How quickly service recovers after failures.

If this were a larger business, I would be discussing the efficacy of the models regarding their inputs/outputs and where we can make trade-offs for forecasting ability. The Part II post outlines these suggested trade-offs/next steps, indicating the various avenues to take the model. The result is an elongated decision tree, with tracked code for each branch that you’ve created (welcome to Git).

Latest Code Update

Whilst considering the forecasted numbers above, in combination with their confidence intervals (CIs), I realised that I was attempting to make the model fit the data rather than letting the data drive the model towards accurate and (where possible) consistent output. Ultimately, this is an incremental approach to updating the model(s), where I’ll be analysing their efficacy and accuracy over several months. The approach comes from years of experience creating too many projects in data, music, written assignments, etc., where the branches I considered were never correctly documented, which often resulted in inaccurate reflection. Documenting and resolving those branches can also signal the natural end of a model/analysis methodology.

I’ve spent time considering how to approach this forecasting problem, especially using the next steps from that previous post (Part II). As a reminder, here are the options that I considered:

  • Log transform views
  • Add spike dummy variables
  • Add structural growth dummy variable
  • Include day-of-week seasonality
  • Fit SARIMAX with exogenous regressors
  • Validate on rolling windows
  • Change model to underlying GARCH approach

I reordered the list based on the script that I was using at the time, keeping the Log transformation option at the top, mainly due to it being an easy lift:

  1. Log transform views
  2. Fit SARIMAX with exogenous regressors
  3. Add spike dummy variables
  4. Add structural growth dummy variable
  5. Include day-of-week seasonality
  6. Validate on rolling windows
  7. Change model to underlying GARCH approach

Spoiler alert: whilst integrating these different approaches, I noticed that the confidence intervals were growing and the forecasted values were moving around a fair bit.

The Log transform views option was selected first because it helps to stabilise the variance, account for seasonality, and convert the observed multiplicative growth into a linear trend. Below are the output values I was receiving:

===Forecast totals===

01/03/2026 – 07/03/2026: 106

01/03/2026 – 31/03/2026: 527

01/04/2026 – 30/04/2026: 527

01/03/2026 – 31/05/2026: 1,610

01/01/2026 – 30/06/2026: 3,406

01/07/2026 – 31/12/2026: 3,275

01/01/2026 – 31/12/2026: 6,681
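As a rough illustration of why the log transform helps (this is a minimal numpy sketch on synthetic data, not my actual weekly script): fitting a trend on the log scale turns multiplicative growth into a straight line, and back-transforming with exp recovers view counts.

```python
import numpy as np

# Hypothetical daily view counts with multiplicative growth plus noise.
rng = np.random.default_rng(0)
days = np.arange(120)
views = np.exp(0.02 * days) * rng.lognormal(3.0, 0.3, size=days.size)

# Fit a linear trend on log(views + 1): multiplicative growth becomes linear,
# and the noise variance is far more stable on this scale.
log_views = np.log1p(views)
slope, intercept = np.polyfit(days, log_views, 1)

# Forecast the next 7 days and back-transform with expm1.
future = np.arange(120, 127)
forecast = np.expm1(intercept + slope * future)
print(round(forecast.sum()))
```

The real model is a SARIMAX fit rather than a straight line, but the transform/back-transform step is the same.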

As you can imagine, I was satisfied with this output. My intuition, however, says the forecasted final outcome is lower than I would expect; we’re roughly 1/6 of the way through the year and already at 1,262 views. I think about it in two ways, ceteris paribus:

  1. Monthly percentage of the year:

1,262/(1/6) = 7,572

  2. Solve for x by cross multiplication, where x = total year views and views scale linearly with time:

1,262 views in 31 + 28 = 59 days,

1,262 views = 59 days

x views = 365 days

1,262 × 365 = 59x

x = 1,262 × 365/59

x = 7,807
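Both sanity checks are trivial to script:

```python
# Two back-of-the-envelope annual extrapolations from 1,262 views in 59 days.
views_to_date = 1_262

# 1) Treat Jan-Feb as roughly 1/6 of the year.
estimate_fraction = views_to_date / (1 / 6)

# 2) Cross multiply: x / 365 = 1,262 / 59.
estimate_cross = views_to_date * 365 / 59

print(round(estimate_fraction), round(estimate_cross))  # 7572 7807
```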

I continued on this journey towards narrower confidence intervals, whilst maintaining a realistic end output given how the data is trending. Next was Fit SARIMAX with exogenous regressors. Since SARIMAX already supports exogenous variables, I combined in my Google Search Console data and ran the analysis. I chose to do this in an attempt to observe any potential spurious correlations and to derive that third, fourth, or ninth causal factor. One caveat on timeliness: the Google data arrives with a 3-day lag, so it doesn’t provide the most accurate or timely reference points:

===Forecast totals===

01/03/2026 – 07/03/2026: 121

01/03/2026 – 31/03/2026: 605

01/04/2026 – 30/04/2026: 605

01/03/2026 – 31/05/2026: 1,324

01/01/2026 – 30/06/2026: 3,932

01/07/2026 – 31/12/2026: 5,433

01/01/2026 – 31/12/2026: 9,392

I also attempted a few different libraries on the integrated data, specifically prophet and xgboost. I’m going to run them in parallel over the next month to critique their validity. What I observed so far, for my use case and current dataset, is that prophet’s confidence intervals (CIs) were wider and its forecasted views tracked alongside the original model(s), whereas xgboost produced narrower CIs but lower (unlikely) view counts across the different time periods.

Prophet

===Forecast totals===

01/03/2026 – 07/03/2026: 86

01/03/2026 – 31/03/2026: 615

01/04/2026 – 30/04/2026: 615

01/03/2026 – 31/05/2026: 811

01/01/2026 – 30/06/2026: 2,912

01/07/2026 – 31/12/2026: 2,506

01/01/2026 – 31/12/2026: 5,431

XGBoost

===Forecast totals===

01/03/2026 – 07/03/2026: 78

01/03/2026 – 31/03/2026: 393

01/04/2026 – 30/04/2026: 393

01/03/2026 – 31/05/2026: 823

01/01/2026 – 30/06/2026: 2,903

01/07/2026 – 31/12/2026: 2,501

01/01/2026 – 31/12/2026: 5,417

Results

Here are the current models that I am caching for the near future:

  • Model 1: Auto-updating SARIMAX parameters
  • Model 2: Log Views
  • Model 3: Google Analytics
  • Model 4: Prophet
  • Model 5: XGBoost

From the analysis that I have completed, I have concluded that both of the following models should continue to be preferred over the others:

  • Model 1: Auto-updating SARIMAX parameters
  • Model 3: Google Analytics

This is due to their forecasted values (views) being the best suited, based on the confidence interval range across the various measures (week, current month, next month, etc.). Here is a summary of those findings:

Dataset

Measure        | Model | Value    | CI Lower | CI Upper | Range
Week           | 1     | 139.9    | 50.9     | 229.0    | 178
Current month  | 1     | 646.4    | 587.1    | 705.6    | 119
Next month     | 1     | 97.4     | 38.1     | 156.6    | 119
Next 3 months  | 1     | 1,634.8  | 617.3    | 2,652.2  | 2,035
First 6 months | 1     | 4,561.7  | 2,458.5  | 6,664.9  | 4,206
Last 6 months  | 1     | 6,611    | 3,199.9  | 16,774.0 | 13,574
Full year      | 1     | 11,173.0 | 4,385.9  | 17,960.0 | 13,574
Week           | 2     | 109.2    | 20.6     | 491.4    | 471
Current month  | 2     | 619.1    | 602.9    | 687.3    | 84
Next month     | 2     | 20.1     | 3.9      | 88.3     | 84
Next 3 months  | 2     | 1,134.5  | 195.2    | 5,491.2  | 5,296
First 6 months | 2     | 3,495.5  | 1,590.8  | 13,018.6 | 11,428
Last 6 months  | 2     | 3,399    | 688.5    | 36,481.0 | 35,793
Full year      | 2     | 6,894.2  | 1,924.5  | 37,717.0 | 35,793
Week           | 3     | 86.3     | 35.2     | 201.0    | 166
Current month  | 3     | 614.9    | 605.5    | 635.8    | 30
Next month     | 3     | 0.0      | 0.0      | 0.0      | –
Next 3 months  | 3     | 811.2    | 328.5    | 1,894.0  | 1,566
First 6 months | 3     | 2,912.0  | 1,915.1  | 5,147.9  | 3,233
Last 6 months  | 3     | 2,506.0  | 1,015.9  | 5,847.9  | 4,832
Full year      | 3     | 5,431.7  | 2,936.6  | 11,027.7 | 8,091
Week           | 4     | 86.3     | 35.2     | 201.0    | 166
Current month  | 4     | 614.9    | 605.5    | 635.8    | 30
Next month     | 4     | 0.0      | 0.0      | 0.0      | –
Next 3 months  | 4     | 811.2    | 328.5    | 1,894.0  | 1,566
First 6 months | 4     | 2,912.0  | 1,915.1  | 5,147.9  | 3,233
Last 6 months  | 4     | 2,506.0  | 1,015.9  | 5,847.9  | 4,832
Full year      | 4     | 5,431.7  | 2,936.6  | 11,027.7 | 8,091
Week           | 5     | 91.2     | 35.9     | 219.9    | 184
Current month  | 5     | 614.8    | 605.2    | 637.0    | 32
Next month     | 5     | 0.0      | 0.0      | 0.0      | –
Next 3 months  | 5     | 5,605.8  | 2,942.2  | 11,793.4 | 8,851
First 6 months | 5     | 2,912.0  | 1,915.1  | 5,147.9  | 3,233
Last 6 months  | 5     | 2,506.0  | 1,015.9  | 5,847.9  | 4,832
Full year      | 5     | 5,431.7  | 2,936.6  | 11,027.7 | 8,091

Preferred model based on the minimum average confidence interval range (across all measures):

Model: 3 – Google Analytics

Model Number | Average Range Value | Model Name
1            | 4,829               | Auto-updating SARIMAX parameters
2            | 12,663              | Log Views
3            | 2,986               | Google Analytics
4            | 2,990               | Prophet
5            | 5,385               | XGBoost

Preferred model based on the lowest CI range (across all measures):

Measure        | CI Range | Model Number | Model Name
Week           | 165.8    | 2            | Log Views
Current month  | 30.3     | 3            | Google Analytics
Next month     | 84.4     | 2            | Log Views
Next 3 Months  | 1,565.5  | 3            | Google Analytics
First 6 months | 3,232.8  | 3            | Google Analytics
Last 6 months  | 4,832    | 3            | Google Analytics
Full year      | 8,091.1  | 3            | Google Analytics

Preferred model based on the maximin principle, i.e. maximise the output for the minimised cost (the greatest forecasted value for each measure, based on the minimised confidence interval):

Measure        | Forecasted Value | Model Number | Model Name
Week           | 86.3             | 2            | Log Views
Current month  | 614.9            | 3            | Google Analytics
Next month     | 20.1             | 2            | Log Views
Next 3 Months  | 811.2            | 3            | Google Analytics
First 6 months | 2,912            | 3            | Google Analytics
Last 6 months  | 2,506            | 3            | Google Analytics
Full year      | 5,431.7          | 3            | Google Analytics

One thing the numbers above don’t take into account is the scale of those values relative to the confidence interval ranges for each measure:

Measure        | 1     | 2      | 3     | 4     | 5     | Minimum | Model
Week           | 1.68% | 12.45% | 5.55% | 5.55% | 5.82% | 1.68%   | 1
Current month  | 0.53% | 0.39%  | 0.14% | 0.14% | 0.15% | 0.14%   | 3
Next month     | 3.51% | 12.12% | –     | –     | –     | 3.51%   | 1
Next 3 Months  | 3.59% | 13.48% | 5.57% | 5.57% | 4.56% | 3.59%   | 1
First 6 months | 2.66% | 9.44%  | 3.20% | 3.20% | 3.20% | 2.66%   | 1
Last 6 months  | 5.93% | 30.40% | 5.57% | 5.57% | 5.57% | 5.57%   | 3
Full year      | 3.51% | 14.99% | 4.30% | 4.30% | 4.30% | 3.51%   | 1

Each model’s cell contains the coefficient of variation (CV) divided by the model-measure’s value:

=(Confidence Interval/2)/SQRT(DAYS360("01/01/2023","31/12/2026"))/Measure Value

The table above was derived by:

=INDEX($D$43:$I$77,MATCH(1,($F83=$D$43:$D$77)*(G$82=$E$43:$E$77),0),3)

=INDEX($D$43:$I$77,MATCH(1,($F92=$D$43:$D$77)*(G$82=$E$43:$E$77),0),6)
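In Python terms, a single cell of that spreadsheet formula works out as follows (shown for the Model 1 / Week row: CI range 178.1 around a forecast of 139.9; DAYS360 over that four-year span on a 360-day calendar gives 1,440 days):

```python
import math

# Model 1, Week: CI range 178.1, forecast value 139.9.
ci_range, value = 178.1, 139.9
days = 4 * 360  # DAYS360("01/01/2023", "31/12/2026") on a 360-day calendar

# (Confidence Interval / 2) / SQRT(days) / Measure Value
cell = (ci_range / 2) / math.sqrt(days) / value
print(f"{cell:.2%}")  # 1.68%
```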

Conclusion

Although I have statistical evidence for the models that I’ll be following, I have decided that I’ll be running the following in parallel:

  • Model 1: Auto-updating SARIMAX parameters
  • Model 3: Google Analytics
  • Model 4: Prophet
  • Model 5: XGBoost

Model 3 requires a bit more automation to be completed, specifically funnelling the daily Google Analytics data into the file that holds the WordPress data and processing them together.
