by Jonathan Widarsa

on the theory and practice of unveiling structure behind data.


  • What K-means Says about Stocks

    One natural question we should ask is whether stocks can be grouped by how they behave, rather than just by the sector labels someone assigned them decades ago. For example, here’s one of many things sector labels don’t capture: a tech company and a utility might sit in completely different industries, yet move in near-perfect lockstep […]

  • Penalized Regression for Stock Returns

    Predicting asset returns is a difficult game where the usual rules of regression are stress-tested by noisy signals, correlated features, and the ever-present risk of overfitting to market microstructure. While ordinary least squares (OLS) gives us a clean starting point, penalized regression methods like Ridge, LASSO, and Elastic Net offer a principled way to impose […]

  • Continuous Latent States with Kalman Filters

    In the previous article, we introduced Hidden Markov Models (HMMs) as a way to capture volatility regime-switching in SPY returns. By decomposing returns into distinct states, i.e., low, medium, and high volatility, we were able to uncover meaningful structure that a single continuous model like GARCH could not explicitly represent. However, HMMs assume that the market […]

  • HMMs for Volatility Regime-Switching

    Previously, we saw that modeling variance, rather than the mean, provides a much more effective way of capturing financial time series dynamics. GARCH models, in particular, are able to reproduce volatility clustering and persistence, making them a strong baseline for volatility modeling. However, GARCH comes with an important structural assumption, which is that volatility evolves […]

  • GARCH Sees What ARIMA Cannot

    In the previous article, we fit AR, MA, ARMA, and ARIMA models to SPY log returns and watched them systematically fail in a very specific way. The residuals showed volatility clustering, the QQ plots showed fat tails, and the ACF plot of the squared residuals confirmed that the variance itself was autocorrelated. ARIMA models the conditional mean. […]

  • Can ARIMA Predict SPY Data?

    This is (hopefully) a beginner-friendly tutorial on modeling SPY data with linear time series models. Specifically, we look at basic properties of the data, how well AR, MA, ARMA, and ARIMA models fit it, and, spoiler alert, why they suck at the job. *** A good refresher from my article on linear […]

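A recurring point in the teasers above is that returns can look uncorrelated while their squares are not, which is what a conditional-mean model like ARIMA misses. The sketch below illustrates that on simulated data only. It does not use SPY and is not taken from the articles; the GARCH(1,1) parameters (`omega`, `alpha`, `beta`), the seed, and the helper names `simulate_garch11` and `autocorr` are all assumptions chosen for illustration.

```python
import numpy as np


def simulate_garch11(n, omega=0.05, alpha=0.15, beta=0.80, seed=0):
    """Simulate n returns from a GARCH(1,1) process with Gaussian shocks.

    Parameter values are illustrative assumptions, not fitted to any data.
    """
    rng = np.random.default_rng(seed)
    shocks = rng.standard_normal(n)
    returns = np.empty(n)
    # Start the recursion at the unconditional variance omega / (1 - alpha - beta).
    sigma2 = omega / (1.0 - alpha - beta)
    for t in range(n):
        returns[t] = np.sqrt(sigma2) * shocks[t]
        # Next period's variance depends on today's squared return and variance.
        sigma2 = omega + alpha * returns[t] ** 2 + beta * sigma2
    return returns


def autocorr(x, lag):
    """Sample autocorrelation of x at a positive lag."""
    centered = x - x.mean()
    return float(np.dot(centered[:-lag], centered[lag:]) / np.dot(centered, centered))


returns = simulate_garch11(20_000)
acf_returns = autocorr(returns, 1)        # linear dependence in the mean
acf_squared = autocorr(returns ** 2, 1)   # dependence in the variance

print(f"lag-1 autocorrelation of returns:         {acf_returns:.3f}")
print(f"lag-1 autocorrelation of squared returns: {acf_squared:.3f}")
```

In a run like this, the first number should sit near zero and the second should be clearly positive: there is little for a mean model to predict, yet the variance is persistent. Real return series would need the usual care (log returns, stationarity checks) before drawing the same conclusion.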