Average triangle area in the Cantor dust

\(\DeclareMathOperator{\Spec}{Spec}\) \(\DeclareMathOperator{\Proj}{Proj}\) \(\DeclareMathOperator{\dom}{dom}\) \(\DeclareMathOperator{\ran}{ran}\) \(\DeclareMathOperator{\ar}{ar}\) \(\DeclareMathOperator{\var}{var}\) \(\DeclareMathOperator{\pred}{Pred(V,\mathcal{R})}\) \(\DeclareMathOperator{\encode}{encode}\) \(\DeclareMathOperator{\decode}{decode}\) \(\DeclareMathOperator{\supp}{supp}\) \(\DeclareMathOperator{\lcm}{lcm}\) \(\DeclareMathOperator{\seq}{seq}\) \(\DeclareMathOperator{\var}{var}\)

1. Introduction to the problem

Consider the Cantor set \(\mathcal{C}\) of the real line. The Cantor dust is the set \(\mathcal{C}^2\) of the real plane. Both sets are probability spaces; the Cantor set is equipped with a Borel probability measure \(\mu\) and the Cantor dust inherits the Borel probability measure \(\mu^2\).

An ordered triangle is a triple of points \((P,Q,R)\) from \(\mathcal{C}^2\). The following question is asked: given three random points \(P,Q,R\) of \(\mathcal{C}^2\), what is the average of the area of the ordered triangle with vertices \(P,Q,R\)?

In this article will we show the progress we have for this question.

2. The construction of the Cantor set

The Cantor set \(\mathcal{C}\) is constructed in steps.

We start with the unit interval \(\mathcal{C}_0 := [0,1]\). In the zeroth step, we remove one open interval denoted by \(I_{0,1} := (1/3, 2/3)\), leaving us with \(\mathcal{C}_1\) as a union of two intervals. In the \(k\)-th step, remove \(2^k\) open intervals, each of length \(3^{-k-1}\) and denoted by \(I_{k,1}, \dots, I_{k, 2^k}\), in order of appereance from left to right, to obtain \(\mathcal{C}_k\). We then set \(\mathcal{C} := \cap_{k=1}^\infty \mathcal{C}_k\).

3. The one-dimensional case

Indeed before talking about the average triangle area in two dimensions, it is prudent to ask of the average distance in one dimension. Thus we wish to calculate, for two random numbers \(X,Y\in\mathcal{C}\), the quantity \(\mathbb{E}(|X - Y|)\).

This problem has been solved [1,2] and it turns out that

\begin{align} \label{eq:average-distance} \mathbb{E}(|X - Y|) = 2/5. \end{align}

We'll show one derivation of this result.

3.1. Reduction to CDF powers

We have,

\begin{align} \int_0^1\int_0^1 |x-y|dF(y)dF(x) & = 2 \int_0^1\int_0^x (x - y)dF(y)dF(x), \end{align}

where \(F\) denotes distribution function of \(\mu\), \(F(x) := \mu((-\infty, x])\) for \(x\in\mathbb{R}\). \(F\) is also the Cantor-Lebesgue function of \(\mathcal{C}\) (see §1.5 of [3] for more) and in probabilistic terms, it is the cumulative distribution function of \(\mu\).

cantor-lebesgue-function.svg
Figure 1: The Cantor-Lebesgue function \(F(x)\). (source)

We may now perform integration by parts on the inner Lebesgue-Stieltjes integral (see Theorem 3.36 of [3]), to obtain:

\begin{align} \int_0^x (x-y)dF(y) = \left.(x-y)F(y)\right|_0^x + \int_0^x F(y)dy. \end{align}

The first term vanishes and so we're left with:

\begin{align} \mathbb{E}(|X - Y|) & = 2\int_0^1\int_0^x F(y)dydF(x) \end{align}

One more productive use of integration by parts, this time on the outer integral, yields:

\begin{align} \label{eq:moments} \mathbb{E}(|X - Y|) & = 2\int_0^1 F(y)dy - 2\int_0^1 F(x)^2dx, \end{align}

where we've made use of \(d(f(x)) = f'(x)dx\) to get the second term.

The terms in \eqref{eq:moments} have been computed in many ways [4,5].

One straightforward computation is to note that \(F(x)\) is constant each of the intervals \(I_{i,j}\) from §2, and they exhaust \([0,1]\) (in the Lebesgue measure), so for any exponent \(M>0\),

\begin{align} \int F^M(x)dx& = \sum_{k=0}^\infty \sum_{j=1}^{2^k} |I_{k,j}| F(I_{k,j})^M. \end{align}

We also have

\begin{align} F(I_{k,j}) = 2^{-k-1}(2j - 1) \end{align}

We can utilize the above and compute the two integrals from:

\begin{align} \int F(x)^Mdx & = \frac{1}{3\cdot 2^M}\sum_{k=0}^\infty (3\cdot 2^M)^{-k}\sum_{j=1}^{2^k}(2j-1)^M, \end{align}

to obtain \(\int F(x)dx = 1/2\) and \(\int F(x)^2dx = 3/10\) which shows \eqref{eq:average-distance}.

3.2. Making use of a recurring formula

Reasonably one expects that a recurring formula may help in the calculation of the \(2/5\) result in \eqref{eq:average-distance}. This would offer an alternative method to compute the result. In §2 of [1], the integral is reduced to calculating, for any \(k\), the sum \(\displaystyle\sum_{i < j} d^k_{ij}\) where \(d^k_{ij}\) denotes the distances of the midpoints of the \(i\)-th and \(j\)-th interval of \(\mathcal{C}_k\), and indeed they recurringly connected to the next set of distances \(d^{k+1}_{m,n}\).

4. Open questions

4.1. TODO Find all lines in the Cantor approximation

Can all the lines be computed? More explicitly, can all degenerate triangles be computed?

If a line \(L\) is given, how many points does \(|L\cap\mathcal{C}^2|\) contain? (it answers how many degenerate triangles on \(L\) exist)

4.2. TODO Find all triangle congruence classes pinned at a corner

It may be too difficult to calculate all triangle congruence classes. Can I calculate a subclass such as the pinned triangles? See [6] for an estimate for the integer lattice (my \(r=1/2\)) where it is estimated that the number of congruence classes is \(\sim n^2\) inside \([0,\sqrt{n}]^2\).

4.3. DONE() Write down exactly what series the program is calculating   easy

I need to figure out exactly what the sums \(\pm r \pm r^2 \pm \cdots \pm r^n\) represent.

4.3.1. Analysis of Cantor discretization midpoints

Start with the centered interval \(\mathcal{C}_0 = [-1/2, 1/2]\). Given a ratio \(0 < r \leq 1/2\), remove the middle interval of length \(1-2r\), which means that \(\mathcal{C}_1\) consists of two intervals, each of length \(r\), and of midpoints \(m_{1,1} = -(1-r)/2\) and \(m_{1,2} = (1-r)/2\). For the next iteration, everything will be repeated at the scale \(r\), leaving us e.g. with \(m_{2,1} = m_{1,1} + rm_{1,1} = (1 + r)m_{1,1}\), and \(\mathcal{C}_2\) being four intervals each of length \(r^2\).

In general, each point is given by

\begin{align} m\sum_{k=0}^n \pm r^k, \end{align}

where \(m := (1 - r)/2\). At a simple rescale of the original structure by \(r/m\), we obtain the series \(\pm r \pm r^2 \pm \cdots \pm r^n\). Note that for \(r = 1/3\) we have \(r/m = 1\), and so for the standard Cantor set, the series above goes unchanged.

4.4. TODO Singular congruence classes

Some congruence classes are singular in their area, i.e. there are no two different classes \(C_1, C_2\) of triangles with the same area.

Can these be classified? What makes them unique?

4.5. TODO Complexity analysis on operations

I need to figure out how many operations are taken for each computation. For example, how much memory is required? For instance, here is a count of the congruence classes:

depth ratio # of congruence classes
1 1/3 4
2 1/3 67
3 1/3 2533
4 1/3 119388

Note that they change for different classes. We note their stability:

depth stable ratio # of congruence classes
1 1/2 4
2 1/5 67
3 1/7 2572

Some remarks on the stability:

  • The number of congruence classes increases as the ratio gets smaller.
  • The number appears to immediately stabilize once two different ratios give the same number.

Why do these numbers stabilize? Do they really stabilize or not?

4.5.1. TODO Statistics on congruence classes

What is the average size of a congruence class? What is the standard deviation?

4.6. DONE() Talk to Poursalidis on

  • Mention the tables and observations appearing in 4.5.

5. References

[1]
C.C. Leary, D.A. Ruppe, G. Hartvigsen, Fractals, average distance and the cantor set, Fractals 18 (2010) 327–341.
[2]
D. Allen, H. Edwards, S. Harper, L. Olsen, Average distances on self-similar sets and higher order average distances of self-similar measures, Math. Z. 287 (2017) 287–324.
[3]
G.B. Folland, Real analysis, 2nd ed., John Wiley & Sons, Nashville, TN, 1999.
[4]
E.A. Gorin, B.N. Kukushkin, Integrals related to the cantor function, St. Petersburg Math. J. 15 (2006) 449–469.
[5]
R.A. Gordon, Some integrals involving the cantor function, The American Mathematical Monthly 116 (2009) 218–227. https://doi.org/10.1080/00029890.2009.11920931.
[6]
E. Palsson, E. Yu, On optimal point sets determining distinct triangles, Electron. J. Comb. 31 (2024).