
The Mondrian Process: Digging Out The Roots Of Mondrian Trees

by Javier Moreno | August 14, 2018

Trees in Blossom, Piet Mondrian, 1912

A few weeks ago in our research seminar, Brian proposed a discussion about Mondrian trees as implemented by skgarden (a scikit-learn extension specializing in tree-based models). In that implementation, these are Bayesian decision tree objects with a nice scheme for propagating uncertainty, and they seemed fast to train. When collected in ensembles, they become a serious contender to random forests.

Due to the nature of our products, we are always looking for scalable approaches to solving our problems that allow us to control and expose uncertainty both internally and externally. We want to understand how reliable our projections are given our modelling assumptions and available data. We would also like to provide well-founded confidence indicators for our clients. Since they use our predictions to make expensive operational decisions, it is important for them to be aware of how strong these estimates are. Offering expected measurements of uncertainty provides our clients with a clearer view of the information their historical data provides and helps them focus their efforts on the sections of the process where their expertise will be most valuable. These measurements also serve as a source of openness and trust in our relationship.

When the topic of the seminar was announced, I was curious about the name and decided to dig a bit deeper to see whether it was related to the famous Dutch abstract painter. There was, indeed, a connection: Mondrian trees are decision trees based on Mondrian processes, a family of distributions over guillotine partitions of \(\mathbb{R}^M\) named after Piet Mondrian because of the resemblance of these partitions to his most famous paintings. Mondrian processes were originally introduced by Roy and Teh in 2009 as a potential prior distribution in Bayesian models of relational data, but subsequent work by the same authors together with Lakshminarayanan showed in 2014 how they could be used for growing a new type of random forest with two main desirable qualities:

  1. Since they are Bayesian, their predictions come equipped with proper uncertainty estimations.
  2. Because of the way they grow, they are also capable of efficient online training.

Both qualities together make Mondrian forests a very promising technique for the type of applications we care about.

For this post, I would like to spend some time explaining the basics of Mondrian processes and illustrating some of their core properties, as these are central to the strengths of Mondrian forests.

Growing Mondrians

Mondrian processes are methodical constructions of random k-d trees: a sort of projective family of distributions over trees with certain complexity restrictions. Naïvely, given an axis-aligned box \(\Theta =\Theta_0 \times \cdots\times \Theta_{M-1} \subseteq \mathbb{R}^M\) (with each \(\Theta_m\subseteq \mathbb{R}\) an interval) and a budget \(b\in \mathbb{R}_{\geq 0}\), you build cuts recursively as follows:

  1. Cost of the cut: sample \(c\sim \text{Exp}\left(\sum_m \mu\left(\Theta_m\right)\right)\), where \(\mu\) is some measure on the reals (you can even have a different measure for each axis if you like). If it is too expensive (\(c > b\)), you’ve reached a leaf. Due to the nature of the exponential distribution, the larger the size of the box, the cheaper a cut will be.
  2. Axis and position of the cut: otherwise, continue by first selecting an axis to cut (sampled at random in proportion to the measure of each interval) and then a point on the chosen interval (sampled uniformly). This gives priority to splits along the longest axes.
  3. Two internal boxes are born and could be cut: make the cut, obtain two complementary boxes, and try to get cuts for each of them separately assuming a new budget of \(b-c\).

The following plot shows how cuts are gradually added to a box, one at a time. Each box is accompanied by its remaining budget, with the original budget being 1:

For our current implementation, let's take \(\mu\) to be the Lebesgue measure. Since we will only be evaluating it on unions of finite intervals, it is just a simple sum of lengths.
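Before getting to the full implementation later in the post, here is a minimal sketch of just this recurrence. It is a toy of mine, separate from the grow_mondrian implementation below: boxes are lists of [low, high] pairs, the tree is a nested dict, and \(\mu\) is the Lebesgue measure.

import math
import random

def sample_mondrian(box, budget):
    # box: a list of [low, high] intervals, one per axis.
    lengths = [hi - lo for lo, hi in box]
    total = sum(lengths)  # Lebesgue measure: a simple sum of lengths

    # 1. Cost of the cut: exponential with rate equal to the size of the box.
    cost = random.expovariate(total) if total > 0 else math.inf
    if cost > budget:
        return {'box': box}  # too expensive: this box is a leaf

    # 2. Axis and position of the cut: axis in proportion to length,
    #    point uniform on the chosen interval.
    axis = random.choices(range(len(box)), weights=lengths)[0]
    low, high = box[axis]
    point = random.uniform(low, high)

    # 3. Cut and recurse on the two complementary boxes with budget b - c.
    left_box = [[low, point] if m == axis else iv for m, iv in enumerate(box)]
    right_box = [[point, high] if m == axis else iv for m, iv in enumerate(box)]
    return {'box': box, 'axis': axis, 'point': point,
            'left': sample_mondrian(left_box, budget - cost),
            'right': sample_mondrian(right_box, budget - cost)}

sample_mondrian([[0, 1], [0, 1]], budget=2)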

Expanding Mondrians

The first cool fact about Mondrians is that, at each cut, the two resulting Mondrians downstream are conditionally independent of each other given the box that spawned them. The past determines their futures in a very isolated way.

A second result related to this construction states that if \(\Xi=\Xi_0 \times\cdots\times\Xi_{M-1}\subset \Theta\subset\mathbb{R}^M\) and there is a Mondrian \(\mathfrak{p}\) of budget \(b\) on \(\Theta\), then \(\mathfrak{p}|_{\Xi}\), its restriction to the smaller box, is also a Mondrian with the same budget on the smaller domain. This is what the authors of the original paper call consistency. From this property you can derive a process for sampling a Mondrian on \(\Theta\) conditional on its restriction to a smaller domain \(\Xi\). In that case, assume you already have a Mondrian \(\mathfrak{q}\) built on \(\Xi\). Sampling \(\mathfrak{p}\) on \(\Theta\) conditional on \(\mathfrak{p}|_{\Xi} = \mathfrak{q}\) goes roughly like this:

  1. Sample \(c\sim \text{Exp}\left(\sum_m \mu\left(\Theta_m\setminus\Xi_m\right)\right)\); this is the cost of your first cut outside of \(\Xi\).
  2. If the first cut of \(\mathfrak{q}\) happens to be cheaper than \(c\), simply extend that cut to the larger domain and continue the process on each side, now conditioning on the two smaller Mondrians of \(\mathfrak{q}\) that the cut defines. The cut keeps its original cost from the smaller domain.
  3. Otherwise, sample a cut much as in the original algorithm, taking into consideration that it has to come from the region where the cut will not pass through \(\Xi\).

This consistency property, as you may imagine, is key to guaranteeing efficient online training for Mondrian trees.

Implementing Mondrians

The function grow_mondrian grows a Mondrian on a given box with a provided budget. An optional parameter allows you to set a Mondrian on a smaller domain as a condition. It handles the general (non-conditional) case by assuming you are conditioning on an empty Mondrian. The implementation relies on a very shallow Mondrian helper class and a few utility functions with what I hope are descriptive enough names. If you would like to check more details, here is a Jupyter notebook with all the code.

def grow_mondrian(box, budget, given_mondrian=None):
    # The non-conditional case is just conditioning on an empty Mondrian.
    if given_mondrian is None:
        given_mondrian = Mondrian(None, budget)

    if not given_mondrian.extended_by(box):
        raise ValueError('Incompatible boxes: given mondrian box must be contained in new box.')

    mondrian = Mondrian(box, budget)

    # Cost of the first cut outside the given domain: exponential with rate
    # equal to the linear dimension of the difference between the two boxes.
    cost = cost_next_cut(linear_dimension(box) - linear_dimension(given_mondrian.box))

    next_budget = budget - cost

    # Remaining budgets play the role of cut times: a larger remaining
    # budget corresponds to an earlier cut.
    given_mondrian_next_budget = given_mondrian.cut_budget if given_mondrian.has_cut() else 0

    if next_budget < given_mondrian_next_budget:
        # The given Mondrian cuts first: extend its cut, at its original cost,
        # to the larger box and condition each side on the matching subtree.
        # If it has no cut, the new box is simply a leaf.
        if given_mondrian.has_cut():
            mondrian.cut_axis = given_mondrian.cut_axis
            mondrian.cut_point = given_mondrian.cut_point
            mondrian.cut_budget = given_mondrian_next_budget

            left, right = cut_boxes(box, mondrian.cut_axis, mondrian.cut_point)

            mondrian.left = grow_mondrian(left, given_mondrian_next_budget, given_mondrian.left)
            mondrian.right = grow_mondrian(right, given_mondrian_next_budget, given_mondrian.right)
    else:
        # The new cut comes first. Choose the axis in proportion to the
        # measure each axis contributes outside the given box.
        dimensions_outer = dimensions(box)
        dimensions_inner = dimensions(given_mondrian.box, size=len(box))
        mondrian.cut_axis = random_axis(dimensions_outer - dimensions_inner)
        outer_interval = box[mondrian.cut_axis]

        # An empty given Mondrian behaves as a degenerate interval.
        if given_mondrian.is_empty():
            inner_interval = [outer_interval[0], outer_interval[0]]
        else:
            inner_interval = given_mondrian.box[mondrian.cut_axis]

        # Sample the cut point uniformly from the part of the interval
        # where the cut will not pass through the given box.
        mondrian.cut_point, cut_side = sample_interval_difference(outer_interval, inner_interval)
        mondrian.cut_budget = next_budget

        left, right = cut_boxes(box, mondrian.cut_axis, mondrian.cut_point)

        # The side containing the given Mondrian keeps it as a condition;
        # the other side grows unconditionally with the leftover budget.
        if cut_side:  # entire given_mondrian to the left
            mondrian.left = grow_mondrian(left, next_budget, given_mondrian)
            mondrian.right = grow_mondrian(right, next_budget, Mondrian(None, next_budget))
        else:  # all given_mondrian to the right
            mondrian.left = grow_mondrian(left, next_budget, Mondrian(None, next_budget))
            mondrian.right = grow_mondrian(right, next_budget, given_mondrian)

    return mondrian
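As a quick usage sketch (assuming, as the indexing in grow_mondrian suggests, that boxes are lists of [low, high] pairs), online extension looks like this:

# Grow a Mondrian on a small box with budget 8...
small = grow_mondrian([[0, 0.5], [0, 0.5]], 8)

# ...then grow one on the unit square conditional on it. By consistency,
# the result restricted to [0, 0.5] x [0, 0.5] coincides with `small`.
large = grow_mondrian([[0, 1], [0, 1]], 8, given_mondrian=small)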

Painting Mondrians

Composition in red, yellow, blue and black, Piet Mondrian, 1921

With the implementation in place, and considering the artistic inspiration, it made sense to build some utilities for plotting properly coloured two-dimensional Mondrians. Beyond the visuals, these plots are also helpful for understanding the particular random behaviour of Mondrians.
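The actual utilities live in the notebook; the sketch below is my own approximation of what they could look like, assuming the Mondrian objects grown by grow_mondrian above. The palette and colour weights are arbitrary choices in the spirit of the paintings.

import random
import matplotlib.pyplot as plt
import matplotlib.patches as patches

PALETTE = ('white', 'red', 'yellow', 'blue', 'black')
WEIGHTS = (8, 1, 1, 1, 1)  # mostly white leaves, with occasional primary colours

def plot_mondrian(mondrian, ax=None, patch=patches.Rectangle):
    # Draw the leaf boxes of a two-dimensional Mondrian built by grow_mondrian.
    if ax is None:
        _, ax = plt.subplots(figsize=(6, 6))
        (x0, x1), (y0, y1) = mondrian.box
        ax.set_xlim(x0, x1)
        ax.set_ylim(y0, y1)
        ax.set_aspect('equal')
        ax.axis('off')

    if mondrian.has_cut():
        plot_mondrian(mondrian.left, ax, patch)
        plot_mondrian(mondrian.right, ax, patch)
    else:
        (x0, x1), (y0, y1) = mondrian.box
        colour = random.choices(PALETTE, weights=WEIGHTS)[0]
        ax.add_patch(patch((x0, y0), x1 - x0, y1 - y0,
                           facecolor=colour, edgecolor='black', linewidth=2))
    return ax

plot_mondrian(grow_mondrian([[0, 1], [0, 1]], 8))
plt.show()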

My first question was about the effect of the budget. How does complexity grow as the budget increases? A result from the original paper provides some clues: if \(\Theta = [0, A] \times [0, B]\subseteq \mathbb{R}^2\), the expected number of slices along each dimension for a Mondrian on \(\Theta\) with budget \(b\) is \(bA\) and \(bB\) respectively, while the expected total number of partitions is \((1 + bA)(1 + bB)\). How does this look? Let us take a few samples for different budgets on \(\Theta = [0, 1] \times [0, 1]\).

Sample Mondrians on \(\Theta\) with budgets 1, 2, 64, and 128.
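We can also check the expected partition count numerically. The snippet below (count_leaves is a hypothetical helper, not part of the post's utilities) estimates the mean number of leaves for \(b = 2\) on the unit square, which should be close to \((1 + 2)(1 + 2) = 9\):

def count_leaves(mondrian):
    # Count the leaf boxes of a grown Mondrian tree.
    if mondrian.has_cut():
        return count_leaves(mondrian.left) + count_leaves(mondrian.right)
    return 1

samples = [grow_mondrian([[0, 1], [0, 1]], 2) for _ in range(1000)]
print(sum(count_leaves(m) for m in samples) / len(samples))  # close to 9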

What about conditional Mondrians? How do they look? For this, let us first sample one Mondrian \(\mathfrak{p}\) on \([0, 0.5] \times [0, 0.5]\) and then, given \(\mathfrak{p}\), sample Mondrians on \(\Theta\) conditional on it, all with the same budget (\(8\)). We have shaded in grey the area corresponding to \(\mathfrak{p}\) on the conditional samples.

As yet another illustration of the way conditional Mondrians behave, we can repeatedly zoom out, starting from a small Mondrian on \(\Theta\) and expanding it to larger and larger domains (doubling the length of the intervals at each step). The shaded areas represent the space occupied by the previous Mondrian.

Finally, and just because it is easy, let’s replace rectangles by ellipses!
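In terms of the plotting sketch above, that is just a different patch factory passed to plot_mondrian (again my own illustration, not the post's code):

import matplotlib.patches as patches

def ellipse(xy, width, height, **kwargs):
    # An ellipse inscribed in the leaf box: same size, centred on the box.
    return patches.Ellipse((xy[0] + width / 2, xy[1] + height / 2),
                           width, height, **kwargs)

plot_mondrian(grow_mondrian([[0, 1], [0, 1]], 8), patch=ellipse)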

Going Mondrian

Do you want to learn more about these arboreal entities? Here are some helpful resources: