MixedMath: Learning Möbius from Inconvenient Integer Representations

I've previously described machine learning experiments concerning learning the Möbius function. (See general report and technical report, as well as slides from a talk and ctypes). This work went into my preprint on the arxiv DLD. Studying number theory with deep learning: a case study with the Möbius and squarefree indicator functions. arXiv 2025.

The goal is to learn the function $μ (n)$ , which is $0$ if a nontrivial square divides $n$ and otherwise is $(- 1)^{ω (n)}$ , where $ω (n)$ is the number of prime divisors of $n$ . For example, $μ (1) = μ (6) = μ (10) = μ (15) = 1$ , $μ (2) = μ (3) = μ (7) = - 1$ , and $μ (4) = μ (8) = μ (9) = μ (12) = 0$ .

Given enough time, the "obvious strategies" become arbitrarily good at learning whether $μ (n) = 0$ or not. This is not a deep statement and instead describes statistical distribution of square divisibility. One of my strategies in training models to study $μ$ before was to choose representations of integers that obfuscate divisibility by squares — deliberately trying to force the models to learn something different.

Many people who have read or heard about my experiments have offered different inconvenient integer representations and asked how they compare. In this note, I describe two representative representations and how they fare.

Every model I've trained seems to pick up either

divisibility by squares: these become arbitrarily good at learning when $μ (n) = 0$ and don't learn anything about how to distinguish $μ (n) = \pm 1$ ).
divisibility by small primes: these use the correlative fact that if a small prime divides a number, then it's more likely to be squarefull. These also don't learn how to distinguish between $μ (n) = \pm 1$ .

(Or alternately the representations are too inconvenient and the models don't learn either of these).

In practice, what I'm really doing is using $μ (n)$ as a proxy to see if models can learn basic divisibility rules from different representations of integers. In principle it is possible that a model would pick up on some unknown pattern in the representation, but in practice the most likely way to do this would be from $n$ (or $n mod p$ or $χ_{p} (n)$ or some combination of numeric functions like this) itself instead of inconvenient representations.

This just happens to be my favorite problem to throw models at right now, seeing if anything sticks.

Zeckendorf Representation

Every positive integer has a unique representation as a sum of distinct Fibonacci numbers such that the sum does not include any two consecutive Fibonacci numbers. This is straightforward to prove and leads to two closely related ways of representing an integer $n$ : as a sequence of Fibonacci numbers that sum to $n$ , or as a binary sequence with $1$ s corresponding to the included Fibonacci numbers.

A straightforward greedy algorithm computes these very quickly.

fibs = [0, 1]

def extend_fib(n):
    """Extends `fibs` to have at least one number larger than n."""
    while n > fibs[-1]:
        nextfib = fibs[-1] + fibs[-2]
        fibs.append(nextfib)

def zeckendorf(n):
    """
    Computes the Zeckendorf representation of a non-negative integer n.

    Returns:
        A list of Fibonacci numbers that sum to n, with no two consecutive
        Fibonacci numbers. Returns an empty list if n is 0.
    """
    if n < 0: raise ValueError("Input must be a non-negative integer.")
    if n == 0: return []
    extend_fib(n)
    representation = []
    for fib in reversed(fibs):
        if n >= fib:
            representation.append(fib)
            n -= fib
    return representation

def zeckendorf_indices(n):
    """
    Computes the Zeckendorf representation of a non-negative integer n.

    Returns:
        The indices of the Fibonacci numbers that sum to n with no consecutive
        indices being 1. Returns an empty list if n is 0.
    """
    if n < 0: raise ValueError("Input must be a non-negative integer.")
    if n == 0: return []
    extend_fib(n)
    representation = []
    for fib in reversed(fibs):
        if n >= fib:
            representation.append(1)
            n -= fib
        else:
            representation.append(0)
    representation.reverse()
    return remove_trailing_zeros(representation)

def remove_trailing_zeros(inlist):
    """Remove trailing zeros from input array."""
    end = -1
    for i in range(len(inlist) - 1, -1, -1):
        if inlist[i] != 0: break
        end = i
    return inlist[:end]

Similar systems include

Square-Fibonacci base (writing $n = \sum a_{i} F_{i}^{2}$ with $a_{i} \in {0, 1, 2}$ ), or
Chung-Graham (writing $n = \sum a_{i} F_{i + 2}$ with $a_{i} \in {0, 1, 2}$ and $a_{i} = 0$ if $i$ is odd).

I didn't know these representations of integers before Joshua Zelinsky asked me about them. A couple others have given similar ideas.

With each of these, the models learn nothing. These representations are simply too far removed.¹ ¹According to Littlewood's Mathematicians Miscellany, Charles Darwin believed that every once in a while one should perform a fool's experiment. These almost always fail, but any success will be terrific. Darwin purportedly played the trombone to his tulips. The result of that experiment was negative. (The result of this experiment was also negative).

Mixed base, negaternary, and other systems

In standard base $b$ , we can write $\begin{matrix} (1) & n = \pm \sum_{i = 0}^{K} a_{i} b^{i} (0 \leq a_{i} < b) \end{matrix}$ for some $K$ depending on $n$ . Closely related are negative bases, where we can write $\begin{matrix} (2) & n = \sum_{i = 0}^{K} a_{i} (- b)^{i} (0 \leq a_{i} < b) . \end{matrix}$ One can also treat mixed base, writing $\begin{matrix} (3) & n = \sum_{i = 0}^{K} a_{i} (b_{i})^{i} (0 \leq a_{i} < b_{i}) . \end{matrix}$

In each of these, it's straightforward to determine divisibility by primes dividing the base (or primes dividing the "one's base"). The models picked up on this but nothing more.

Factoradic Representation

This came from my coauthor Tom Oliver and performed surprisingly well (given that it feels like an inconvenient representation).

We call a permutation $σ$ of $N$ to be tame if $σ (i) = i$ for almost every $i$ . We can write a tame permutation by writing the sequence $\begin{matrix} (4) & σ (0), σ (1), σ (2), \dots \end{matrix}$ As $σ (i) = i$ for almost every $i$ , we can truncate the purmutation at $J$ where all $j$ for which $σ (j) \neq j$ satisfy $j \leq J$ . The set of tame permutations can then be ordered lexicographically. We use the reversed right-to-left lexicographic order.

We define the factoradic form of $n \in N$ to be the $n$ th tame permutation in the reversed, right-to-left lexicographic order.

In practice, this is made much clearer by a couple of examples.

The factoradic form of $0$ is $(0, 1, 2, 3, \dots)$ .
The factoradic form of $1$ is $(1, 0, 2, 3, \dots)$ .
The factoradic form of $2$ is $(0, 2, 1, 3, 4, \dots)$ .

There are $k!$ permutations of the first $k$ integers, and thus any integer $n < k!$ has a factoradic form that can be specified by giving $σ (j)$ for $j \leq k$ .

An alternate way of giving the factoradic form of $n$ is to give the $n$ th (ordered) permutation of the integers up to $k$ (for some $k$ with $n < k!$ ). This is how I compute them in practice.

ALGORITHM: return factoradic form of n
  1. If n = 0 return (0).
  2. If n = 1 return (1, 0).  // 0! = 1! is confusing
  3. Set k to the the minimal integer with n < k!
  4. Compute L, the (k! - n)th lexicographically ordered
     permutation of (0, 1, ... , k-1).
  5. Return REVERSE(L).

Another simple algorithm gives the $n$ th ordered permutation of $(0, \dots, k - 1)$ .

ALGORITHM: return the nth permutation of (0, 1, ..., k-1).
  1. Set L = (0, 1, ..., k-1).
  2. Set n' = n - 1.  // 0-indexing lists is confusing.
  3. Initialize RET = [].
  4. For each m in [k-1 .. 0]:
  5.   Set q = FLOOR(n' / m!).
  6.   Set r = n MODULO n!
  7.   Append qth element of L to RET.
  8.   Remote qth element of L.
  9.   Set n' := n' - q * m!
 10. Return RET.

Python code for this practically matches the pseudocode.

import math

def nth_permutation(n, k):
    current_list = list(range(k))
    nprime = n - 1
    ret = []
    for m in range(k-1, -1, -1):
        mfact = math.factorial(m)
        q = nprime // mfact
        r = nprime % mfact
        nprime -= q * mfact
        ret.append(current_list[q])
        del current_list[q]
    return ret

def factoradic(n):
    if n == 0: return (0, )
    if n == 1: return (1, 0)
    k = 0
    while True:
        if n < math.factorial(k):
            break
        k += 1
    np = nth_permutation(math.factorial(k) - n, k)
    return list(reversed(np))

assert factoradic(9) == [3, 0, 1, 2]
assert factoradic(1234) == [2, 6, 0, 3, 4, 1, 5]
assert factoradic(123456789) == [9, 3, 6, 1, 4, 2, 5, 0, 7, 11, 10, 8]

I trained models using the factoradic representations. It turns out that these models successfully learned divisibility-by-small-primes rules.

Oliver and Vernitski's preprint Oliver and Vernitski. Divisibility rules for integers presented as permutations. arXiv 2025) shows that factoradic representations "obviously" encode divisibility rules — but I think it' not at all obvious that a ML model would pick up on this. But they do!

Info on how to comment

To make a comment, please send an email using the button below. Your email address won't be shared (unless you include it in the body of your comment). If you don't want your real name to be used next to your comment, please specify the name you would like to use. If you want your name to link to a particular url, include that as well.

bold, italics, and plain text are allowed in comments. A reasonable subset of markdown is supported, including lists, links, and fenced code blocks. In addition, math can be formatted using $(inline math)$ or $$(your display equation)$$.

Please use plaintext email when commenting. See Plaintext Email and Comments on this site for more. Note also that comments are expected to be open, considerate, and respectful.

Comment via email

mixedmath

Zeckendorf Representation

Mixed base, negaternary, and other systems

Factoradic Representation

Leave a comment