O'Reilly Proves Oblique Mondrian Trees Beat Axis-Aligned
TL;DR
- Eliza O'Reilly's paper proves oblique Mondrian forests achieve minimax optimal convergence rates on ridge-function data where axis-aligned trees cannot.
- For general ridge functions, no weighting of axis-aligned splits can match the rate oblique splits obtain, regardless of covariate distribution.
- The analysis uses random tessellation theory from stochastic geometry, tying convergence to the relevant feature subspace rather than ambient dimension.
A result from this month's statistics literature is worth pulling out of the math.ST feed, because it formalizes something decision-tree practitioners have argued about for years. Eliza O'Reilly's paper, revised on arXiv in June, proves that oblique decision trees, which split the data along linear combinations of features rather than along a single axis, are not just sometimes better than their axis-aligned cousins. They are provably better in a specific, important regime.
The setting is what statisticians call multi-index models or ridge functions, situations where the output really depends on a low-dimensional subspace of the input even when the input itself has many dimensions. O'Reilly introduces 'oblique Mondrian trees and forests' and uses random tessellation theory from stochastic geometry to derive convergence rates. The headline claim is that with a good choice of the directional distribution governing the directions of the hyperplane splits, these estimators adapt to the dimension reduction class of multi-index models. The matching lower bound is the more decisive part: for any choice of weights over the covariates, axis-aligned splits cannot achieve those improved rates for general ridge functions.
The practitioner-facing reason this matters is straightforward. O'Reilly observes that 'the geometry of axis-aligned partitions limits the model's ability to capture dependencies between dimensions,' and the paper puts a formal floor under how much that geometric restriction costs when the underlying signal actually lives along an oblique direction. Axis-aligned forests remain the workhorse in most tabular pipelines, so a clean asymptotic gap is a useful thing to be able to cite.
The honest caveat is that this is a theoretical paper, 45 pages of bounds and tessellation arguments, not a benchmark sweep. There are no empirical comparisons on real datasets in the material I can see, no discussion of the compute cost of selecting oblique splits at scale, and no practical recipe for estimating the dimension of the relevant feature subspace. What the paper gives you is the asymptotic story; the engineering story is still open.
The forward-looking part is who picks this up. Library maintainers now have a citation for adding oblique-split variants alongside axis-aligned defaults, and anyone with a good heuristic for choosing the directional distribution has a clearly marked research lane to work in.
Shared on Bluesky by 2 AI experts
-
Don't know the author, but have become quite a fan of her work. Is always quite cool and original (sometimes conceptually, sometimes in terms of theoretical machinery &c.) arxiv.org/abs/2407.02458
View on Bluesky →
Originally reported by arxiv.org
Read the original article →Original headline: Statistical Advantages of Oblique Randomized Decision Trees and Forests