# mixedmath

Explorations in math and programming
David Lowry-Duda

In this note, I talk about the primary results of Improved Bounds on Number Fields of Small Degree, the recent preprint put up by Anderson, Gafni, Hughes, Lemke Oliver, Thorne, Wang, Zhang, and me.

I briefly described this paper before. In this note, I give a simplified approach with simpler (weaker) proofs and proof sketches.

# 1. Broad Strategy

Our goal is to count the number $N_n(X)$ of degree $n$ number fields over $\mathbb{Q}$ with discriminant up to $X$. $\DeclareMathOperator{\Disc}{Disc}$ $\DeclareMathOperator{\Ht}{Ht}$ $\DeclareMathOperator{\idx}{Index}$ We recall that the known bound due to Schmidt is \begin{equation}\label{eq:schmidt_bound} N_n(X) \ll X^{\frac{n+2}{4}}. \end{equation}

We use the same initial setup as Schmidt used to obtain the Schmidt bound \eqref{eq:schmidt_bound}: to count $N_n(X)$, we count monic polynomials \begin{equation*} f(x) = x^n + c_1 x^{n-1} + \cdots + c_n. \end{equation*} Recalling that $X$ is the size of discriminants we're counting up to, it will be convenient to introduce an auxiliary notation \begin{equation} H = X^{\frac{1}{2n - 2}}. \end{equation} Schmidt showed that it suffices to count polynomials $f$ with trace $0$ (i.e. with $c_1 = 0$) and where $\lvert c_i \rvert \ll_n H^i$. (There is an implicit constant here that depends on $n$. It's not worth paying attention to, and in this note we ignore it. Ultimately we allow implicit constants depending on $n$.) We refer to polynomials satisfying this coefficient bound as the set of polynomials of height $H$.

In $\S2$ of our paper, we show that including polynomials that don't have trace $0$ causes no problems. That is, this naively adds a factor of $H$ (as we count over $\lvert c_1 \rvert \leq H^1$), but this family also overcounts by a factor of at least $H$.

It turns out that it is also sufficient to consider only irreducible polynomials. This reduction was also known to Schmidt and is not new.

We summarize this in the following lemma.

Lemma 1. The cardinality of the set of monic irreducible polynomials of height $H$ bounds $H \cdot N_n(X)$. Thus to bound $N_n(X)$, it suffices to count these polynomials and divide by $H$.

To count these polynomials, we split them into two pieces:

1. We first count polynomials with "small" discriminant, and then
2. We count polynomials with "large" discriminant.

Most polynomials have large discriminant. If $f(x)$ cuts out the field $K$, then \begin{equation*} \Disc(f) = \Disc(K) [ \mathcal{O}_K \colon \mathbb{Z}[\alpha]]^2, \end{equation*} where $\alpha$ is a root of $f$ in $K$. We call the last factor $\idx(f)^2$. In order to bound the number of polynomials with large discriminant, we split these into two subpieces, depending on whether the radical of the index is small or large. (Recall that the radical of an integer is the product of its distinct prime divisors.)

## 1.1. Why is this approach reasonable?

It is clear that if we can bound each of the three subfamilies of polynomials

• small discriminant
• large discriminant, small index radical
• large discriminant, large index radical

then it should be possible to bound the total number of polynomials.

To bound the set of polynomials having small discriminant, we use mostly naive bounds.

To bound polynomials with large discriminant and having large index radical, we appeal to a result of Bhargava, Shankar, and Wang (or rather, we give a slight strengthening of their result by using a tighter sieve) that bounds the number of polynomials with discriminants having large square divisors $m^2$ where $m$ is squarefree. Necessarily, if $\idx(f)$ has large radical, then $\Disc(f)$ has large square divisors of the desired form.

For polynomials having large discriminant but small index radical, it is intuitively clear that the discriminant should be very powerful (that is, it should be divisible by large powers of primes). This carries over to the index itself: $\idx(f)$ will necessarily be powerful.

In our paper, we show that there is a large cubefull divisor $d$ of $\idx(f)$ (cubefull meaning that every prime divisor $p$ of $d$ divides $d$ to order at least $3$). The result is that we look at polynomials whose discriminants are divisible by large squares. To bound these, we sieve.

It's not obvious that such a sieving approach should work. But it's certainly hopeful.

The final result comes from performing a balancing act on the various subpieces. Even though we spend the vast majority of our paper working towards the local sieving problem, this is not the obstruction towards improving our final result. A major subtheme of our article is that we can count polynomials with large discriminant and powerful index substantially better than might be expected.

In this note, I'll cut many corners on the large and powerful index case. But we'll see that this barely affects the main theorem. This indicates where the remaining obstruction is.

I note that the simultaneously announced paper of Bhargava, Shankar, and Wang makes improvements towards the remaining obstruction, but does not improve upon our bounds for large and powerful index polynomials. It would be interesting to see if it would be possible to use a combination of these ideas to obtain further improved bounds.

# 2. "Easier" Estimates

Let's first consider the two "easier" estimates: polynomials with small discriminant and polynomials with large discriminant and large index radical.

To provide an upper bound for the number of polynomials with small discriminant, we appeal to Davenport's Lemma (H. Davenport, On a principle of Lipschitz, J. London Math. Soc., 1951) and a classical argument from the geometry of numbers.

Suppose $\Omega \subset \mathbb{R}^n$ is a region cut out by algebraic inequalities. Then the number of lattice points in $\mathbb{Z}^n \cap \Omega$ is \begin{equation*} \mathrm{Vol}(\Omega) + O\big(\max_\pi \mathrm{Vol}(\pi(\Omega))\big), \end{equation*} where the maximum runs over projections $\pi$ of $\mathbb{R}^n$ onto its various coordinate hyperplanes.

Davenport's Lemma morally says that the number of lattice points in a region is roughly the volume of the region, up to an error comparable in size to the "surface area" of the region.
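To make the volume-plus-boundary heuristic concrete, here is a small sketch for the simplest region I can think of: a disc of integer radius $r$ in $\mathbb{R}^2$. This is only an illustration and not the region $\Omega_{H, Y}$ used below; the function name `lattice_points_in_disc` is made up for this example. The count should be $\pi r^2$ up to an error of size $O(r)$, comparable to the length of the projection of the disc onto a coordinate axis.

```python
import math

def lattice_points_in_disc(r):
    """Count integer points (x, y) with x^2 + y^2 <= r^2, for an integer r >= 0."""
    count = 0
    for x in range(-r, r + 1):
        # For fixed x, the admissible y satisfy |y| <= floor(sqrt(r^2 - x^2)).
        y_max = math.isqrt(r * r - x * x)
        count += 2 * y_max + 1
    return count

for r in (5, 20, 80):
    n_points = lattice_points_in_disc(r)
    area = math.pi * r * r
    # The error grows roughly linearly in r, while the area grows like r^2.
    print(r, n_points, round(area, 1), round(abs(n_points - area), 1))
```

The classical covering argument with unit squares gives the explicit error bound $\pi(\sqrt{2} r + \tfrac{1}{2})$ for this region, which is the bound one can check numerically.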

We define our region to be \begin{equation} \Omega_{H, Y} := \{ (c_1, \ldots, c_n) \in \mathbb{R}^n : \lvert c_i \rvert \leq H^i, \Disc(f_c) \leq H^{n^2 - n} / Y \}. \end{equation} I note here and below that it will be at various points convenient to think of a monic degree $n$ polynomial as living in an $n$-dimensional space $F^n$ for an appropriate ring $F$. The translation will be as above: given a vector $(c_1, \ldots, c_n) \in F^n$, we define $f_c(x) \in F[x]$ by \begin{equation*} f_c(x) = x^n + c_1 x^{n-1} + \cdots + c_n. \end{equation*} The maximum volume of coordinate projections is trivially $O_n(H^{\frac{n^2 + n}{2} - 1})$, coming from projecting $(c_1, \cdots, c_n) \mapsto (c_2, \cdots, c_n)$ (i.e. forgetting $c_1$) and ignoring the discriminant condition.

It remains to consider the volume of $\Omega_{H, Y}$. We see that \begin{equation*} \mathrm{Vol}(\Omega_{H, Y}) = H^{\frac{n^2 + n}{2}} \mathrm{Vol}(\Omega_{1, Y}), \end{equation*} so we only consider $\Omega_{1, Y}$.

One way to do this is to appeal to van der Corput's lemma. The discriminant $\Disc(f_c)$ is a polynomial in $c_1, \ldots, c_n$ with integer coefficients. Explicit computation shows that, as a polynomial in $c_n$, the discriminant is \begin{equation*} \Disc(c_n) = (-1)^{\frac{n(n-1)}{2}} n^n c_n^{n-1} + O(c_n^{n-2}). \end{equation*} Van der Corput's lemma then implies that \begin{equation*} \lvert \{ c_n \in [-1, 1] : \lvert \Disc(c_n) \rvert \leq 1/Y \} \rvert \ll_n Y^{-\frac{1}{n-1}}, \end{equation*} where the implicit constant is independent of $c_1, \ldots, c_{n-1}$. Applying this bound pointwise for each $c_1, \ldots, c_{n-1}$ in $[-1, 1]^{n-1}$, we estimate \begin{equation*} \mathrm{Vol}(\Omega_{1, Y}) \ll_n Y^{- \frac{1}{n-1}}. \end{equation*}
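As a sanity check on the leading term, here is a sketch for $n = 3$, where the discriminant has a short closed form. For cubics the claimed leading coefficient is $(-1)^3 \cdot 3^3 = -27$, so the second finite difference in $c_3$ should be $-54$ for every $(c_1, c_2)$. The helper names and the sample values of $(c_1, c_2)$ and $Y$ are choices made for this illustration only, and the grid-based measure is a crude numerical stand-in for the set whose size van der Corput's lemma bounds.

```python
def disc_cubic(c1, c2, c3):
    """Discriminant of x^3 + c1 x^2 + c2 x + c3."""
    return (c1 * c1 * c2 * c2 - 4 * c2 ** 3 - 4 * c1 ** 3 * c3
            - 27 * c3 * c3 + 18 * c1 * c2 * c3)

def second_difference_in_c3(c1, c2, c3):
    """Second finite difference in c3; equals 2 * (coefficient of c3^2)."""
    return (disc_cubic(c1, c2, c3 + 1) - 2 * disc_cubic(c1, c2, c3)
            + disc_cubic(c1, c2, c3 - 1))

def measure_small_disc(c1, c2, Y, steps=20001):
    """Approximate the measure of {c3 in [-1, 1] : |Disc| <= 1/Y} on a grid."""
    hits = 0
    for j in range(steps):
        c3 = -1 + 2 * j / (steps - 1)
        if abs(disc_cubic(c1, c2, c3)) <= 1 / Y:
            hits += 1
    return 2 * hits / steps

print(second_difference_in_c3(2, -1, 5))
for Y in (1, 10, 100, 1000):
    # For n = 3 the bound predicts decay no slower than Y^(-1/2).
    print(Y, measure_small_disc(0.3, -0.5, Y))
```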

Thus we have proved the following lemma, handling the small discriminant case.

Lemma 4. Let $n \geq 3$, $Y \geq 1$, $H \gg_n 1$. Then the number of polynomials $f(x) \in \mathbb{Z}[x]$ of the form $f(x) = x^n + c_1 x^{n-1} + \cdots + c_n$ with $\lvert c_i \rvert \leq H^i$ and $\Disc(f) \leq H^{n^2 - n}/Y$ is \begin{equation*} O_n\Big( H^{\frac{n^2 + n}{2}} / Y^{\frac{1}{n-1}} + H^{\frac{n^2 + n}{2} - 1}\Big). \end{equation*}

In our paper, we work harder and prove a substantially better result. We prove this lemma, but with $Y^{\frac{1}{2} + \frac{1}{n}}$ instead of $Y^{\frac{1}{n-1}}$.

To handle polynomials with large discriminant and large index radical, it would be possible to use the following earlier result of Bhargava, Shankar, and Wang. (Note this is an older paper than the recently announced one. This is from Squarefree values of polynomial discriminants I, to appear in Inventiones Mathematicae.)

Lemma 6. For $n \geq 3, H \geq 1, M \geq 1$, we have that \begin{align} &\#\{ f_c : \substack{\lvert c_i \rvert \leq H^i \\ m^2 \mid \Disc(f_c) \text{ for some squarefree } m \geq M} \} \notag{} \\ &\qquad\ll_n \frac{H^{\frac{n^2 + n}{2}}}{M} + H^{\frac{n^2 + n}{2} - \frac{1}{5}}.\label{eq:bsw-prev} \end{align}

This applies as $\idx(f)^2 \mid \Disc(f)$, and if $\idx(f)$ is large and has large radical, then it has a large squarefree part.

In our paper, we note that applying a stronger sieve argument directly in BSW would improve the savings $-\frac{1}{5}$ from \eqref{eq:bsw-prev} to $-\frac{1}{2} + \epsilon$. This is not a "deep" observation. I suspect it would be possible to improve this directly using sieves, but I note that this result was improved with very different techniques in the recently announced BSW preprint.

# 3. Parametrized Argument

Let us now describe how these pieces fit together with an incomplete parametrized result for the remaining piece. Lemma 1 shows that it suffices to count irreducible polynomials of height $H$.

As previously noted, if a polynomial $f(x)$ is irreducible and cuts out the field $K$, then its discriminant satisfies $\Disc(f) = \Disc(K) \idx(f)^2$. Taking $Y = H^{n-1}$ in Lemma 4 shows that the number of polynomials of height up to $H$ and $\Disc(f) \leq H^{(n-1)^2}$ is $O_n(H^{\frac{n^2 + n}{2} - 1})$.

Thus with at most $O_n(H^{\frac{n^2 + n}{2} - 1})$ exceptions, \begin{equation}\label{eq:idx_bound} \idx(f)^2 \cdot \Disc(K) = \Disc(f) > H^{(n-1)^2}. \end{equation} For any desired number field $K$, we have $\Disc(K) \leq X \sim H^{2(n-1)}$, and thus each of the corresponding irreducible polynomials has index bounded below by \begin{equation}\label{eq:idx_bound2} \idx(f) \gg_n H^{\frac{(n-1)(n-3)}{2}}. \end{equation}

We write $P(\alpha)$ for the statement that the following proposition holds for a given $\alpha$. This is the parametrized proposition.

Proposition 8. Let $n \geq 3$, $H \geq 2$. The number of polynomials $f \in \mathbb{Z}[x]$ of degree $n$ and height $H$ for which $\mathrm{rad}(\idx(f)) < H^\alpha$ but $\idx(f) > H^{\frac{(n-1)(n-3)}{2}}$ is $O_{n, \epsilon}(H^{\frac{n^2 + n}{2} - \alpha + \epsilon})$.

Taking $M = H^\alpha$ in Lemma 6 shows that the number of polynomials of height $H$, index bounded below by $H^{\frac{(n-1)(n-3)}{2}}$, and $\mathrm{rad}({\idx}(f)) > H^\alpha$ is at most \begin{equation*} H^{\frac{n^2 + n}{2} - \alpha} + H^{\frac{n^2 + n}{2} - \frac{1}{5}}. \end{equation*} All remaining polynomials have $\mathrm{rad}(\idx(f)) < H^\alpha$, and Proposition 8 (when true for $\alpha$) implies that there are at most \begin{equation*} H^{\frac{n^2 + n}{2} - \alpha + \epsilon} \end{equation*} many such polynomials.

In total, these bounds and Lemma 1 show that for any $\alpha$ where $P(\alpha)$ is true, \begin{equation*} H \cdot N_n(X) \ll_{n, \epsilon} H^{\frac{n^2 + n}{2} - \alpha + \epsilon} + H^{\frac{n^2 + n}{2} - \frac{1}{5}}. \end{equation*} Recalling that $H \approx X^{\frac{1}{2n - 2}}$, this shows that for any $\alpha$ with $0 \leq \alpha < \frac{1}{5}$ for which $P(\alpha)$ is true, we have the bound \begin{equation}\label{eq:result_parametrized} N_n(X) \ll_{n, \epsilon} X^{\frac{n+2}{4} - \frac{\alpha}{2n - 2} + \epsilon}. \end{equation}
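The exponent bookkeeping in this section is easy to get wrong by hand, so here is a short sketch that checks it with exact rational arithmetic. It verifies three things: that $Y = H^{n-1}$ turns the threshold $H^{n^2 - n}/Y$ into $H^{(n-1)^2}$, that the resulting index exponent is $\frac{(n-1)(n-3)}{2}$, and that dividing the polynomial count by $H$ and substituting $H = X^{\frac{1}{2n-2}}$ gives $\frac{n+2}{4} - \frac{\alpha}{2n-2}$. The function names are invented for this illustration, and the $\epsilon$ terms are ignored.

```python
from fractions import Fraction

def discriminant_threshold_exponent(n):
    """Exponent of H in H^(n^2 - n) / Y when Y = H^(n - 1)."""
    return Fraction(n * n - n) - Fraction(n - 1)

def index_exponent(n):
    """Exponent of H in the lower bound for idx(f), using Disc(K) <= H^(2(n-1))."""
    return (discriminant_threshold_exponent(n) - 2 * (n - 1)) / 2

def field_count_exponent(n, alpha):
    """Exponent of X in the bound for N_n(X), ignoring epsilon."""
    # Polynomial count H^((n^2+n)/2 - alpha), divided by H (Lemma 1).
    h_exponent = Fraction(n * n + n, 2) - alpha - 1
    # Convert from H to X via H = X^(1/(2n-2)).
    return h_exponent / (2 * n - 2)

for n in (3, 4, 5, 6):
    alpha = Fraction(1, 10)
    print(n, index_exponent(n), field_count_exponent(n, alpha), Fraction(n + 2, 4))
```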

In particular, showing Proposition 8 is true for any particular $\alpha > 0$ leads to an improvement over Schmidt's bound. (It is apparent that it's not necessary for the radical index bound $H^\alpha$ to be the same as the savings $H^{-\beta}$. More generally, any result with positive $(\alpha, \beta)$ would improve over Schmidt.)

# 3. "Harder" Estimate: Large and Powerful Index

We now focus on Proposition 8. This is more involved.

Intuitively, if $\idx(f)$ is large, but $\mathrm{rad}(\idx(f))$ is small, then $\idx(f)$ should be highly divisible by "large" powers of primes. Concretely, for relevant $f$ we'll show that there is a cubefull divisor $d$ of $\idx(f)$ of size approximately $H^2 < d \leq H^3$.

We then bound the number of polynomials of height $H$ with $d^2 \mid \Disc(f)$, and take the union bound across the various possible cubefull $d$. (Showing that $d$ is cubefull means that there aren't too many $d$. It would also be possible to consider squarefull or fourth-power-full $d$. In this application, cubefull is the optimal balancing point between sparsity of points and the requisite size of the divisor.)

## 3.1 Powerful divisors

We codify the intuitive relationship between being large, but having small radical. This sort of result probably exists in the literature somewhere. It's the sort of thing that is probably frequently reproved, as a stepping stone to other results.

Recall, we say $m$ is $k$-powerful if every prime $p \mid m$ divides $m$ to order at least $k$.

Lemma 9. Suppose $m \geq 2$ and $k \geq 2$ are positive integers. Let $R = \mathrm{rad}(m)$ denote the product of the primes dividing $m$.

If $m \geq R^{2k - 2}$, then for every $x \in \mathbb{R}$ with $R^{k-1} \leq x \leq m/R^{k-1}$, the integer $m$ has a $k$-powerful divisor $d \in [x, Rx]$.

This isn't a particularly elegant proof, and it's the same as the one in our paper.

If $m$ is itself $k$-powerful, it is straightforward to show the stronger statement that there is a $k$-powerful divisor in each interval $[x, Rx]$ with $R^{k-1} \leq x \leq m/R$.

If $x \leq R^k$, then we can take $d = R^k$. If $x > R^k$, then consider divisors of the form $R^k a$ with $a$ dividing $m / R^k$. We claim there is one such divisor $a$ in the interval $[x/R^k, x/R^{k-1}]$.

If this interval includes $m / R^k$, then we take $a = m/R^k$. Otherwise, let $a$ be the minimal divisor of $m / R^k$ that is greater than $x / R^{k-1}$. As $a \mid m$ and $R = \mathrm{rad}(m)$, it's clear that every prime divisor $p$ of $a$ satisfies $p \leq R$, and thus $a/p \geq x/R^k$. By the minimality assumption on $a$, we have that $a/p \leq x / R^{k-1}$, and thus $a/p$ is our claimed divisor.

When $m$ isn't $k$-powerful, we let $m'$ denote the maximal $k$-powerful divisor of $m$ and let $R' = \mathrm{rad}(m')$. Clearly $R' \leq R$, hence $m/R^{k-1} \leq m'/R'^{k-1}$. Then $m'$ has a $k$-powerful divisor in $[x, R' x] \subseteq [x, R x]$, which implies that $m$ does too. $\diamondsuit$
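The statement about $k$-powerful divisors can be checked by brute force for small $m$. The sketch below uses the observation that a divisor $d$ lies in $[x, Rx]$ exactly when $x \in [d/R, d]$, so the claim for all real $x$ is equivalent to the intervals $[d/R, d]$, taken over $k$-powerful divisors $d$ of $m$, covering $[R^{k-1}, m/R^{k-1}]$. All function names here are invented for this illustration, and the trial-division factorization is only meant for small inputs.

```python
from fractions import Fraction

def factorize(m):
    """Prime factorization of m >= 1 as a dict {prime: exponent}."""
    factors = {}
    p = 2
    while p * p <= m:
        while m % p == 0:
            factors[p] = factors.get(p, 0) + 1
            m //= p
        p += 1
    if m > 1:
        factors[m] = factors.get(m, 0) + 1
    return factors

def radical(m):
    r = 1
    for p in factorize(m):
        r *= p
    return r

def is_k_powerful(d, k):
    return all(e >= k for e in factorize(d).values())

def lemma_holds(m, k):
    """Check the covering statement for m; vacuously True if m < R^(2k-2)."""
    R = radical(m)
    if m < R ** (2 * k - 2):
        return True
    lo, hi = Fraction(R ** (k - 1)), Fraction(m, R ** (k - 1))
    intervals = sorted(
        (Fraction(d, R), Fraction(d))
        for d in range(2, m + 1)
        if m % d == 0 and is_k_powerful(d, k)
    )
    covered = None  # right endpoint of the covered stretch starting at lo
    for a, b in intervals:
        if b < lo:
            continue
        if covered is None:
            if a > lo:
                return False  # the left endpoint lo is not covered
            covered = b
        elif a > covered:
            break  # a gap starts at `covered`
        else:
            covered = max(covered, b)
    return covered is not None and covered >= hi

for m in (72, 96, 162):
    print(m, radical(m), lemma_holds(m, 2))
```

Exact fractions avoid any floating point ambiguity at interval endpoints, which matters because the intervals are closed.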

## 3.2 Fourier analysis

We now consider polynomials $f$ with discriminant divisible by a square $d^2$. It will be more convenient to work locally, so we rephrase this problem as considering polynomials with discriminants divisible by $p^{2k}$.

We'd like to study the $p$-adic density of the condition that $p^{2k}$ divides the discriminant and use a bit of Fourier analysis to recover estimates. It turns out that density estimates aren't sufficient on their own, and additional work is necessary — but we'll still use the density estimate as our "trivial Fourier bound".

But actually determining the $p$-adic density of this condition seems quite hard. One might naively guess that discriminants nearly equidistribute mod $p^{2k}$, and thus the density might be $O(1/p^{2k})$. But this isn't true!

In our paper, we show that this density is at most \begin{equation*} O_n(p^{-k - \frac{2k}{n}}). \end{equation*} It's slightly simpler to show that the density is at most $O_n(p^{-k})$.

Lemma 10. Let $n \geq 2, k \geq 1$, and let $p$ be prime. The set of monic polynomials $f \in \mathbb{Z}_p[x]$ for which $p^{2k} \mid \Disc(f)$ has relative density $O_n(p^{-k})$.

Let $\mathbf{1}_{p^{2k}}(f)$ denote the characteristic function for the set of $p$-adic polynomials for which $p^{2k} \mid \Disc(f)$. Let $d \nu$ denote the Haar measure on the space of coefficients, and let $d \mu$ denote the Haar measure on $K_p$, the $p$-adic completion of the field $K$, the space of roots. The density we're looking for is \begin{equation*} \int \mathbf{1}_{p^{2k}} (f) d \nu(f). \end{equation*} The relationship between these Haar measures was studied by Shankar and Tsimerman (Shankar and Tsimerman, Heuristics for the asymptotics of the number of $S_n$ number fields, 2020, arxiv: 2006.09620). Using their Lemma 2.2, we find that \begin{align*} \int \mathbf{1}_{p^{2k}} (f) d \nu(f) &= \sum_{[K_p : \mathbb{Q}_p] = n} \frac{\lvert \Disc(K_p) \rvert_p^{1/2}} {\lvert \mathrm{Aut}(K_p) \rvert} \int_{O_{K_p}} \lvert \Disc(\alpha) \rvert_p^{1/2} \mathbf{1}_{p^{2k}}(\alpha) d\mu(\alpha) \\ &\leq \frac{1}{p^k} \sum_{[K_p : \mathbb{Q}_p] = n} \frac{\lvert \Disc(K_p) \rvert_p^{1/2}} {\lvert \mathrm{Aut}(K_p) \rvert} \int_{O_{K_p}} d\mu(\alpha) \\ &\leq \frac{1}{p^k} + O(p^{-k - \frac{1}{2}}). \end{align*} Other than appealing to the work of Shankar and Tsimerman, the only additional piece of information added here was the trivial bound that $\lvert \Disc(\alpha) \rvert_p \leq p^{-2k}$, which is clearly true for any $\alpha$ for which $\mathbf{1}_{p^{2k}}(\alpha) \neq 0$. This completes the proof of the lemma. $\diamondsuit$
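For very small parameters the density can simply be computed by enumerating residue classes mod $p^{2k}$. The sketch below does this for monic quadratics and cubics using their closed-form discriminants. It is a brute-force illustration only (the function names are made up here) and says nothing about the constants in the lemma. For quadratics and odd $p$ the discriminant is linear in $c_2$ with a unit coefficient, so the density is exactly $p^{-2k}$. For cubics with $p = 3$ and $k = 1$, the classes with a triple root mod $p$ already account for density $p^{-2}$, and $x^3 - x^2$ is a further class, so the density strictly exceeds the naive guess.

```python
from fractions import Fraction
from itertools import product

def disc_quadratic(c1, c2):
    """Discriminant of x^2 + c1 x + c2."""
    return c1 * c1 - 4 * c2

def disc_cubic(c1, c2, c3):
    """Discriminant of x^3 + c1 x^2 + c2 x + c3."""
    return (c1 * c1 * c2 * c2 - 4 * c2 ** 3 - 4 * c1 ** 3 * c3
            - 27 * c3 * c3 + 18 * c1 * c2 * c3)

def density(disc, n, p, k):
    """Proportion of coefficient vectors mod p^(2k) with p^(2k) | disc."""
    q = p ** (2 * k)
    hits = sum(1 for c in product(range(q), repeat=n) if disc(*c) % q == 0)
    return Fraction(hits, q ** n)

p, k = 3, 1
print("quadratic:", density(disc_quadratic, 2, p, k))
print("cubic:    ", density(disc_cubic, 3, p, k))
print("naive guess p^(-2k):", Fraction(1, p ** (2 * k)))
print("lemma scale p^(-k): ", Fraction(1, p ** k))
```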

Let \begin{equation*} \mathcal{R}_{p^{2k}} \subseteq (\mathbb{Z} / p^{2k} \mathbb{Z})^n \end{equation*} be the set consisting of the $O_n(p^{2nk - k})$ residue classes mod $p^{2k}$ containing the polynomials with $p^{2k}$ dividing their discriminants. We write \begin{equation*} \psi_{p^{2k}}(f) \end{equation*} for the characteristic function of this set. For any $\mathbf{u} \in \mathbb{Z}^n$, we define the Fourier transform (normalized in a particular way, similar to the normalization in our previous paper Quantitative Hilbert irreducibility and almost prime values of polynomial discriminants, to appear in IMRN, arxiv: 2107.02914) by

\begin{equation*} \widehat{\psi_{p^{2k}}}(\mathbf{u}) := \frac{1}{p^{2kn}} \sum_{f \in \mathcal{R}_{p^{2k}}} \exp \left( \frac{2 \pi i \langle f, \mathbf{u} \rangle}{p^{2k}} \right), \end{equation*} where the inner product between $f = x^n + c_1 x^{n-1} + \cdots + c_n$ and $\mathbf{u} = (u_1, \ldots, u_n)$ is $\langle f, \mathbf{u} \rangle = \sum c_i u_i$.

The density result in Lemma 10 gives the trivial bound \begin{equation*} \lvert \widehat{\psi_{p^{2k}}}(\mathbf{u}) \rvert \leq \lvert \widehat{\psi_{p^{2k}}}(\mathbf{0}) \rvert \ll_n p^{-k}. \end{equation*} In a simple world, we would show that $\widehat{\psi_{p^{2k}}}(\mathbf{0})$ strongly dominates the other coefficients. This is sort of true, but the fact is that there are so many coefficients that we require additional work.

To bound the Fourier transforms, we'll consider two further cases. If $\mathbf{u} = (u_1, 0, \ldots, 0)$, then we'll see that the Fourier transform typically vanishes. If $u_j \neq 0$ for some $j \geq 2$, then we'll use different algebraic ingredients to get better bounds.

Lemma 11. Write $\mathbf{u} = (u_1, \ldots, u_n)$. Let $m \leq n$ be the greatest coefficient index for which $u_m \neq 0$ mod $p^{2k}$.

• If $\mathbf{u} = 0$, then $\widehat{\psi_{p^{2k}}}(\mathbf{0}) \ll_n p^{-k}$.
• If $m = 1$, then $\widehat{\psi_{p^{2k}}}(\mathbf{u}) = 0$ unless $u_1$ is divisible by $p^{2k} / \gcd(n, p^{2k}).$
• If $m > 1$, then $\widehat{\psi_{p^{2k}}}(\mathbf{u}) \ll_n p^{-\frac{5k}{2} + v_p(u_m)}.$

In this lemma, we write $v_p(n)$ to mean the $p$-valuation of $n$. We've already seen the trivial bound.

For the other cases, we will use an argument that I rather like. (This argument is unfortunately cursed. We originally used a similar argument when writing Quantitative Hilbert irreducibility and almost prime values of polynomial discriminants, but we later found a superior argument. This happened again in this paper! Nonetheless, I find this conceptually simpler than what we included in our preprint.) See notes from my previous talk for my previous description of similar methods. There is a standard action of $\mathrm{GL}(2)$ on binary $n$-ic forms. Splitting the action on the leading coefficients and adapting to monic polynomials, we see that there is an action of $\mathrm{AGL}(1) \cong \mathbf{G}_a \rtimes \mathbf{G}_m$, given by \begin{equation}\label{eq:action} f(x) \mapsto \alpha^n f(\alpha^{-1} x + \beta). \end{equation} Ignoring the leading coefficient makes this somewhat inconvenient to work with, but nonetheless it is useful here.

We observe that the condition $p^{2k} \mid \Disc(f)$ is invariant under the operations in \eqref{eq:action}. Correspondingly $\lvert \widehat{\psi_{p^{2k}}}(\mathbf{u}) \rvert^2$ is invariant under this action. Let $\mathcal{O}$ denote any orbit under this action and write $\lvert \widehat{\psi_{p^{2k}}}(\mathcal{O}) \rvert^2$ to mean the common value of $\lvert \widehat{\psi_{p^{2k}}}(\mathbf{u}) \rvert^2$ along $\mathbf{u} \in \mathcal{O}$. A simple application of Plancherel (recalling that we use a somewhat atypical normalization on the Fourier transform) shows that \begin{equation*} \lvert \mathcal{O} \rvert \lvert \widehat{\psi_{p^{2k}}}(\mathcal{O}) \rvert^2 \leq \sum_{\mathcal{O}'} \lvert \mathcal{O'} \rvert \lvert \widehat{\psi_{p^{2k}}}(\mathcal{O'}) \rvert^2 = \lVert \widehat{\psi_{p^{2k}}}(\mathbf{u}) \rVert_2^2 = \lVert \psi_{p^{2k}}(\mathbf{u}) \rVert_2^2 \ll_n p^{-k}. \end{equation*} Thus \begin{equation}\label{eq:orbit_bound} \lvert \widehat{\psi_{p^{2k}}}(\mathcal{O}) \rvert \ll_n p^{-k/2} / \sqrt{\lvert \mathcal{O} \rvert}. \end{equation}
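Both ingredients of this step, Plancherel in this normalization and invariance under the scaling part of the action, can be checked numerically in a tiny case. The sketch below takes $n = 3$, $p = 2$, $k = 1$, so everything lives mod $4$. With the normalization above, $\sum_{\mathbf{u}} \lvert \widehat{\psi}(\mathbf{u}) \rvert^2$ should equal the density $\lvert \mathcal{R} \rvert / q^n$, and $\widehat{\psi}(\mathbf{u})$ should agree with $\widehat{\psi}(\alpha u_1, \alpha^2 u_2, \alpha^3 u_3)$ for units $\alpha$. The names `Q`, `RESIDUES`, `psi_hat` and `scale` are invented for this illustration.

```python
import cmath
from itertools import product

Q = 4   # p^(2k) with p = 2, k = 1
N = 3   # degree

def disc_cubic(c1, c2, c3):
    """Discriminant of x^3 + c1 x^2 + c2 x + c3."""
    return (c1 * c1 * c2 * c2 - 4 * c2 ** 3 - 4 * c1 ** 3 * c3
            - 27 * c3 * c3 + 18 * c1 * c2 * c3)

# Residue classes mod Q whose discriminant is divisible by Q.
RESIDUES = [c for c in product(range(Q), repeat=N) if disc_cubic(*c) % Q == 0]

def psi_hat(u):
    """Fourier transform of the indicator of RESIDUES, normalized by Q^(-N)."""
    total = sum(
        cmath.exp(2j * cmath.pi * sum(ci * ui for ci, ui in zip(c, u)) / Q)
        for c in RESIDUES
    )
    return total / Q ** N

def scale(u, alpha):
    """The scaling action u -> (alpha u_1, alpha^2 u_2, alpha^3 u_3) mod Q."""
    return tuple((alpha ** (i + 1) * ui) % Q for i, ui in enumerate(u))

l2_norm = sum(abs(psi_hat(u)) ** 2 for u in product(range(Q), repeat=N))
print("sum of |psi_hat|^2:", round(l2_norm, 12))
print("density |R| / Q^N: ", len(RESIDUES) / Q ** N)
```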

Nontrivial lower bounds for the sizes of nontrivial orbits lead to nontrivial savings. On the other hand, it turns out that proving nontrivial orbit bounds is nontrivial.

When $\mathbf{u} = (u_1, 0, \ldots, 0)$, the exponential sum defining the Fourier transform is \begin{equation*} \widehat{\psi_{p^{2k}}}(u_1, 0, \ldots, 0) = \frac{1}{p^{2nk}} \sum_{f \in \mathcal{R}_{p^{2k}}} \exp\left( \frac{2 \pi i c_1 u_1}{p^{2k}} \right). \end{equation*} As this is invariant under $\mathrm{AGL}(1)$, this is equal to its average over the translation action of $\mathrm{G}_a$ sending $f(x) \mapsto f(x + \beta)$. The $x^{n-1}$ coefficient transforms as $c_1 \mapsto c_1 + n \beta$. This shows that \begin{equation*} \widehat{\psi_{p^{2k}}}(u_1, 0, \ldots, 0) = \frac{1}{p^{2nk}} \sum_{f \in \mathcal{R}_{p^{2k}}} \frac{1}{p^{2k}} \sum_{\beta \in \mathbb{Z}/p^{2k} \mathbb{Z}} \exp\left( \frac{2 \pi i (c_1 + \beta n)u_1}{p^{2k}} \right). \end{equation*} The inner $\beta$ sum is either a sum of complete exponential sums (and thus $0$) or is a trivial sum. It's complete unless $p^{2k} \mid nu_1$, which gives the second point of Lemma 11.
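The vanishing statement for $\mathbf{u} = (u_1, 0, \ldots, 0)$ is also easy to test by brute force for cubics. The sketch below (with an invented helper name `psi_hat_first_coordinate`) evaluates the exponential sum directly and compares against the predicted condition that $u_1$ be divisible by $p^{2k} / \gcd(n, p^{2k})$. It covers $p = 2$, where $\gcd(3, 4) = 1$, and $p = 3$, where $\gcd(3, 9) = 3$.

```python
import cmath
from itertools import product
from math import gcd

def disc_cubic(c1, c2, c3):
    """Discriminant of x^3 + c1 x^2 + c2 x + c3."""
    return (c1 * c1 * c2 * c2 - 4 * c2 ** 3 - 4 * c1 ** 3 * c3
            - 27 * c3 * c3 + 18 * c1 * c2 * c3)

def psi_hat_first_coordinate(u1, p, k):
    """psi_hat at u = (u1, 0, 0) for monic cubics mod p^(2k)."""
    q = p ** (2 * k)
    total = 0
    for c in product(range(q), repeat=3):
        if disc_cubic(*c) % q == 0:
            total += cmath.exp(2j * cmath.pi * c[0] * u1 / q)
    return total / q ** 3

for p in (2, 3):
    q = p ** 2
    step = q // gcd(3, q)  # u1 must be a multiple of this to survive
    for u1 in range(q):
        value = abs(psi_hat_first_coordinate(u1, p, 1))
        print(p, u1, u1 % step == 0, round(value, 6))
```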

To complete the proof of Lemma 11, we need to study the sizes of orbits when $u_{m} \neq 0$ for some $m > 1$. Conceptually, the action of $\mathrm{GL}(2)$ on binary forms satisfies $[g \cdot v, v'] = [v, g^T \cdot v']$, and we can consider the adjoint action instead. Computing this action here, one can show that for $\alpha \rtimes \beta$, we have \begin{align*} \widehat{\psi_{p^{2k}}}(\mathbf{u}) &= \alpha \cdot \widehat{\psi_{p^{2k}}}(\mathbf{u}) = \widehat{\psi_{p^{2k}}}(\alpha^T \cdot \mathbf{u}) \\ \widehat{\psi_{p^{2k}}}(\mathbf{u}) &= \beta \cdot \widehat{\psi_{p^{2k}}}(\mathbf{u}) = \widehat{\psi_{p^{2k}}}(\beta^T \cdot \mathbf{u}) \exp \left( \frac{2\pi i \langle \mathbf{b}(\beta), \mathbf{u} \rangle}{p^{2k}} \right), \end{align*} where \begin{align*} \alpha^T \cdot \mathbf{u} &= (\alpha u_1, \ldots, \alpha^n u_n ) \\ \mathbf{b}(\beta) &= \left(\binom{n}{1} \beta, \ldots, \binom{n}{n}\beta^n \right) \end{align*} and \begin{align*} \beta^T \cdot \mathbf{u} = \bigg( &u_1 + \binom{n-1}{1} \beta u_2 + \cdots + \binom{n-1}{n-1} \beta^{n-1}u_n, \\ &u_2 + \binom{n-2}{1} \beta u_3 + \cdots + \binom{n-2}{n-2} \beta^{n-2}u_n, \\ &\ldots, \\ &u_n \bigg). \end{align*}

For $m$ as in the statement of the lemma, i.e. $m \geq 2$ is the largest index such that $u_m \neq 0$, we study the effect of $(\alpha, \beta)^T$ solely on $u_m$ and $u_{m-1}$. A simpler computation shows that \begin{equation*} (\alpha, \beta)^T \cdot \mathbf{u} = (\ldots, \alpha^{m-1} u_{m-1} + (n - m + 1) \beta \alpha^m u_m, \alpha^m u_m, \ldots). \end{equation*} The valuation of the $m$th coordinate is preserved, and there are $\gg p^{2k - v_p(u_m)}$ different values $u_m$ produced by these group translations.

For each fixed $u_m$, we count choices of $u_{m-1}$. We note that keeping $u_m$ fixed doesn't necessarily fix $\alpha$. Instead it forces $\alpha \equiv \zeta_m \bmod p^{2k - v_p(u_m)}$ for some $m$th root of unity $\zeta_m$, and thus if $v_p(u_m)$ is large then it's possible to find many distinct $\alpha$ that still fix $u_m$. It follows that the $\alpha$ action allows one to translate $u_{m-1}$ by an arbitrary multiple of $p^{2k + v_p(u_{m-1}) - v_p(u_m)}$ while keeping $u_m$ fixed. Acting by $\beta$ with a fixed $\alpha = 1$ clearly allows one to add an arbitrary multiple of $(n - m + 1) u_m$ to $u_{m-1}$.

It follows that there are $\gg_n \max\{ p^{v_p(u_m) - v_p(u_{m-1})}, p^{2k - v_p(u_m)} \}$ choices of $u_{m-1}$ for fixed $u_m$. As there are $\gg_n p^{2k - v_p(u_m)}$ choices of $u_m$, this orbit has size \begin{equation*} \gg_n \max\{ p^{2k - v_p(u_{m-1})}, p^{4k - 2v_p(u_m)} \}, \end{equation*} depending on whether the $\alpha$ action on $u_{m-1}$ or the $\beta$ action on $u_{m-1}$ yields a larger orbit. Inserting into \eqref{eq:orbit_bound} completes the proof of Lemma 11.$\diamondsuit$

It might be possible to be more clever about computing orbit sizes. I note that restricting attention to a pair of coefficients might not be particularly lossy, as the underlying $\mathrm{GL}(2)$ group action is fundamentally a two dimensional action. But this analysis of the orbits of these two particular coefficients probably isn't optimal. I don't know what the true answer is here.

This is the most complicated lemma in this writeup, and the corresponding lemma is the most complicated lemma in our paper. I note that the proof here and the proof in the paper are completely different!

## 3.3 Fourier support

Lemma 11 describes bounds for the Fourier transform $\widehat{\psi_{p^{2k}}}$. But examining the lemma closely, we see that if $v_p(u_m)$ is large, then we can't do better than the trivial density bound. It turns out that this bound isn't sufficient on its own. We need another ingredient.

Lemma 14. Write the polynomial $f$ as \begin{equation*} f(x) = x^n - \sigma_1 x^{n-1} + \cdots + (-1)^n \sigma_n. \end{equation*} For each $1 \leq i \leq n$, define $D_i := \frac{\partial \Disc(f)}{\partial \sigma_i}$. Then

1. For relevant $r, s, \ell$, we have that \begin{equation*} \Disc(f) \mid (D_r D_s - D_{r+\ell} D_{s - \ell}). \end{equation*}
2. And \begin{equation*} \sum_{1 \leq i \leq n} D_i (n + 1 - i) \cdot \sigma_{i-1} = 0, \end{equation*} where $\sigma_0 = 1$.

The discriminant is naturally expressed in terms of differences of roots of the polynomial. The coefficients $\sigma_i$ are really the $i$th elementary symmetric polynomials in the roots. The proof is largely an exercise in translating between coefficient space and root space, and showing that the maps sending one to the other behave sufficiently nicely to maintain properties.
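For cubics both relations can be verified directly from the closed-form discriminant. The sketch below writes $f(x) = x^3 - \sigma_1 x^2 + \sigma_2 x - \sigma_3$ and computes the partial derivatives $D_i$ by hand via the chain rule. For $r = s = 2$ and $\ell = 1$, expanding everything out gives the explicit identity $D_2^2 - D_1 D_3 = 12(\sigma_1^2 - 3\sigma_2) \Disc(f)$. That cofactor is something I computed for this illustration and is not taken from the paper. The second relation reads $3 D_1 + 2 \sigma_1 D_2 + \sigma_2 D_3 = 0$ when $n = 3$.

```python
from itertools import product

def disc_and_gradient(s1, s2, s3):
    """Disc of x^3 - s1 x^2 + s2 x - s3 and its partials (D1, D2, D3) in (s1, s2, s3)."""
    # Monic coefficients: c1 = -s1, c2 = s2, c3 = -s3.
    a, b, c = -s1, s2, -s3
    disc = a * a * b * b - 4 * b ** 3 - 4 * a ** 3 * c - 27 * c * c + 18 * a * b * c
    # Partials with respect to (a, b, c).
    da = 2 * a * b * b - 12 * a * a * c + 18 * b * c
    db = 2 * a * a * b - 12 * b * b + 18 * a * c
    dc = -4 * a ** 3 - 54 * c + 18 * a * b
    # Chain rule: d/ds1 = -d/da, d/ds2 = d/db, d/ds3 = -d/dc.
    return disc, (-da, db, -dc)

def check_relations(s1, s2, s3):
    disc, (d1, d2, d3) = disc_and_gradient(s1, s2, s3)
    first = d2 * d2 - d1 * d3 == 12 * (s1 * s1 - 3 * s2) * disc
    second = 3 * d1 + 2 * s1 * d2 + s2 * d3 == 0
    return first, second

print(check_relations(2, -1, 5))
print(all(all(check_relations(*s)) for s in product(range(-4, 5), repeat=3)))
```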

The discriminant and its derivatives satisfy an enormous number of algebraic relations, and it isn't surprising that it's possible to identify some of these relations as being relatively straightforward to work with. We translate this relation to a restriction on the gradient of the discriminant.

Suppose $f$ is a degree $n$ polynomial with $\Disc(f) \equiv 0 \bmod p^{2k}$. Let $v_i(f)$ denote the $p$-valuation of $\frac{\partial \Disc(f)}{\partial c_i}$ (where $c_i$ is the coefficient of $x^{n - i}$ in $f$). Then there is either $a \geq 0$ or $0 \leq b \leq \min(\mathrm{val}_p(n), k)$ such that \begin{align*} \min \{ v_i(f), k \} &= \min \{ v_n(f) + (n-i)a, k \} \\ \min \{ v_i(f), k \} &= \min \{ v_1(f) + (i-1)b, k \}. \end{align*}

Stated differently, this says that $\min \{ v_i(f), k \}$ is almost an arithmetic progression in $i$. To prove this, we combine the hypothesis $p^{2k} \mid \Disc(f)$ with the relation $\Disc(f) \mid D_r D_s - D_{r+\ell}D_{s - \ell}$ from the previous lemma. Specializing to $r = s$ and $\ell = 1$, and considering mod $p^{2k}$, this implies that \begin{equation*} \min\{ 2v_i(f), 2k \} = \min\{ v_{i-1}(f) + v_{i+1}(f), 2k \}. \end{equation*} Thus $(v_{i-1}(f), v_i(f), v_{i+1}(f))$ are almost in an arithmetic progression. This shows the form of the two possibilities; it only remains to consider whether the progressions are increasing or decreasing.

The second part of Lemma 14 implies that $nD_1$ is in the ideal generated by $(D_2, \ldots, D_n)$, which implies that $b \leq \min \{ \mathrm{val}_p(n), k \}$ as claimed. $\diamondsuit$

As a corollary to these two lemmas, we show that $\widehat{\psi_{p^{2k}}}$ is supported on "near arithmetic progressions."

Suppose $\mathbf{u} = (u_1, \ldots, u_n) \in (\mathbb{Z}/p^{2k}\mathbb{Z})^n$. Then $\widehat{\psi_{p^{2k}}}(\mathbf{u}) = 0$ unless $\mathbf{u}$ satisfies one of the two "near arithmetic progression" conditions that \begin{align*} \min\{ v_p(u_i), k \} &= \min\{ v_p(u_n) + (n-i) a, k \} \\ \min\{ v_p(u_i), k \} &= \min\{ v_p(u_1) + (i-1) b, k \}, \end{align*} as described in the previous lemma.

To prove this proposition, we linearize the problem. For each $\mathbf{c}$ corresponding to a polynomial $f_{\mathbf{c}}$ with $p^{2k} \mid \Disc(f_{\mathbf{c}})$, we associate the "hyperplane" \begin{equation*} P_{\mathbf c} := \{ \mathbf{v} \in (\mathbb{Z} / p^{2k} \mathbb{Z})^n : p^{2k} \mid \Disc(f_{\mathbf{v}}), \mathbf{v} - \mathbf{c} \in p^k \cdot (\mathbb{Z} / p^{2k} \mathbb{Z})^n \}. \end{equation*} The set of congruence classes corresponding to polynomials having discriminants divisible by $p^{2k}$ can be split into a disjoint union of $P_{\mathbf{c}}$ over a set $C$ of representatives, and then \begin{equation*} \widehat{\psi_{p^{2k}}} = \sum_{\mathbf{c} \in C} \widehat{1_{P_{\mathbf{c}}}}. \end{equation*}

Thus it suffices to study the "linearized" problem on $\widehat{1_{P_{\mathbf{c}}}}$. For $\mathbf{v} \in P_{\mathbf{c}}$, Taylor expansion shows that \begin{equation*} \Disc(f_\mathbf{v}) \equiv \Disc(f_{\mathbf{c}}) + \mathbf{D}_{\mathbf{c}} \cdot (\mathbf{v} - \mathbf{c}) \bmod {p^{2k}}, \end{equation*} where I use $\mathbf{D}_{\mathbf{c}}$ to mean the gradient vector of the discriminant function. It follows that \begin{equation}\label{eq:add_group} P_{\mathbf{c}} = \{ \mathbf{v} \in (\mathbb{Z} / p^{2k} \mathbb{Z})^n : \mathbf{D}_{\mathbf{c}} \cdot \frac{\mathbf{v} - \mathbf{c}}{p^k} \equiv 0 \bmod p^k \}. \end{equation} This is how the gradient becomes involved.

On the one hand, explicit expansion shows that \begin{equation*} \widehat{1_{P_{\mathbf{c}}}}(\pmb{\xi}) = \frac{\exp(2 \pi i \langle \pmb{\xi}, \mathbf{c} \rangle / p^{2k} )}{p^{2 k n}} \sum_{\mathbf{v} \in P_{\mathbf{c}}} \exp\left( 2 \pi i \frac{\langle \pmb{\xi}, \mathbf{v} - \mathbf{c}\rangle}{p^{2k}} \right). \end{equation*} On the other hand, it is clear from \eqref{eq:add_group} that $P_{\mathbf{c}}$ forms an additive group. (This is a benefit of the linearization we're performing). Thus \begin{equation*} \sum_{\mathbf{v} \in P_{\mathbf{c}}} \exp\left( 2 \pi i \frac{\langle \pmb{\xi}, \mathbf{v} - \mathbf{c}\rangle}{p^{2k}} \right) = \exp\left( 2 \pi i \frac{\langle \pmb{\xi}, \mathbf{u} - \mathbf{c} \rangle}{p^{2k}} \right) \sum_{\mathbf{v} \in P_{\mathbf{c}}} \exp\left( 2 \pi i \frac{\langle \pmb{\xi}, \mathbf{v} - \mathbf{c}\rangle}{p^{2k}} \right) \end{equation*} for any $\mathbf{u} \in P_{\mathbf{c}}$. By definition, if $\pmb{\xi}$ is not a multiple of $\mathbf{D}_{\mathbf{c}}$, then it's not in $P_{\mathbf{c}}$ and thus there exists some $\mathbf{u} \in P_{\mathbf{c}}$ such that $\langle \pmb{\xi}, \mathbf{u} - \mathbf{c} \rangle \not \equiv 0 \bmod p^{2k}$. Inserting such a choice above shows that $\widehat{1_{P_{\mathbf{c}}}}(\pmb{\xi}) = 0$, completing the support claim. $\diamondsuit$
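The mechanism behind the support claim can be seen in a toy model. After subtracting $\mathbf{c}$ and dividing by $p^k$, the set $P_{\mathbf{c}}$ becomes the kernel of the linear map $\mathbf{t} \mapsto \mathbf{D} \cdot \mathbf{t}$ on $(\mathbb{Z}/q\mathbb{Z})^n$ with $q = p^k$. The sketch below sums an additive character over such a kernel and checks that the sum is nonzero exactly when the frequency is a multiple of $\mathbf{D}$ mod $q$. The sample vectors `D`, the modulus, and the function names are arbitrary choices for illustration, not values from the paper.

```python
import cmath
from itertools import product

def char_sum_over_kernel(D, xi, q):
    """Sum of exp(2 pi i <xi, t> / q) over t in (Z/q)^n with <D, t> = 0 mod q."""
    total = 0
    for t in product(range(q), repeat=len(D)):
        if sum(d * ti for d, ti in zip(D, t)) % q == 0:
            total += cmath.exp(2j * cmath.pi * sum(x * ti for x, ti in zip(xi, t)) / q)
    return total

def is_multiple(D, xi, q):
    """True when xi = lam * D mod q for some lam."""
    return any(
        all((lam * d - x) % q == 0 for d, x in zip(D, xi))
        for lam in range(q)
    )

q = 9
for D in ((1, 2), (3, 6)):
    support = [
        xi for xi in product(range(q), repeat=2)
        if abs(char_sum_over_kernel(D, xi, q)) > 0.5
    ]
    print(D, support)
```

The sum is either the size of the kernel or zero, for the same permutation reason as the full sum over $\mathbb{Z}/N\mathbb{Z}$.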

The final step is very similar to standard character theory computations. For example, one way to show that $\sum_{n \in (\mathbb{Z}/N\mathbb{Z})} \exp(2 \pi i n / N)$ is $0$ is to note that multiplying by $\exp(2 \pi i /N)$ permutes the sum.
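As a concrete toy version of this vanishing, here is a small sketch that sums an additive character over the subgroup of $\mathbb{Z}/N\mathbb{Z}$ generated by a divisor $m$ of $N$. The function name and parameters are illustrative assumptions, not notation from the paper. The sum equals the size of the subgroup when the character is trivial on it, and vanishes (up to floating point error) otherwise.

```python
import cmath


def subgroup_character_sum(N, xi, m=1):
    """Sum exp(2*pi*i*xi*v/N) over v in the subgroup of Z/NZ generated by m.

    Assumes m divides N, so the subgroup is {0, m, 2m, ..., N - m}.
    """
    if N % m != 0:
        raise ValueError("m must divide N")
    return sum(cmath.exp(2j * cmath.pi * xi * v / N) for v in range(0, N, m))


def is_trivial_on_subgroup(N, xi, m=1):
    """The character v -> exp(2*pi*i*xi*v/N) is trivial on <m> iff N divides xi*m."""
    return (xi * m) % N == 0
```

For example, with $N = 12$ and $m = 4$ the subgroup is $\{0, 4, 8\}$, and the sum is $3$ exactly when $3 \mid \xi$ and $0$ otherwise. This is the same mechanism as above: translating by an element on which the character is nontrivial multiplies the sum by a number different from $1$.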

## 3.3 Completing the "hard" estimate

We are now ready to assemble the pieces together and to prove a form of Proposition 8. We will actually prove the following.

Let $n \geq 6$. For $H \gg 1$, the number of monic polynomials $f$ of degree $n$ and height $H$, for which $\mathrm{rad}(\idx(f)) < H^{1 - \epsilon}$ but $\idx(f) > H^{\frac{(n-1)(n-3)}{2}}$ is $O_{n, \epsilon}(H^{\frac{n^2 + n}{2} - \frac{1}{2} + \epsilon})$.

For one such polynomial $f$, Lemma 9 guarantees that there is a cubefull divisor $d$ of $\idx(f)$ with $H^{2 - 2\epsilon} < d \leq H^{3 - 3\epsilon}$. By Lemma 10 and the Chinese remainder theorem, given such a $d$ we have that the polynomial $f$ lies in one of $O(d^{2n - 1 + \epsilon})$ residue classes mod $d^2$.

We let $\psi_{d^2}$ denote the indicator function of the appropriate residue classes, thought of as a function on $\mathbb{Z}^n$. Let $\phi: \mathbb{R}^n \longrightarrow \mathbb{R}$ be Schwartz. Poisson summation shows that \begin{align*} \sum_{\mathbf{c}} &\phi \left( \frac{c_1}{H}, \ldots, \frac{c_n}{H^n} \right) \psi_{d^2} (c_1, \ldots, c_n) \\ &= H^{\frac{n^2 + n}{2}} \widehat{\psi_{d^2}}(\mathbf{0}) \widehat{\phi}(\mathbf{0}) + H^{\frac{n^2 + n}{2}} \sum_{\mathbf{u} \neq \mathbf{0}} \widehat{\phi}\left( \frac{u_1 H}{d^2}, \ldots, \frac{u_n H^n}{d^2} \right) \widehat{\psi_{d^2}}(\mathbf{u}). \end{align*} Here, the Fourier transform $\widehat{\psi_{d^2}}$ is defined in the expected way \begin{equation}\label{eq:poisson_base} \widehat{\psi_{d^2}}(\mathbf{u}) = \frac{1}{d^{2n}} \sum_{f \in (\mathbb{Z} / d^2 \mathbb{Z})^n} \psi_{d^2}(f) \exp(2 \pi i \langle f, \mathbf{u} \rangle / d^2 ). \end{equation} Reasoning along the lines of the Chinese Remainder Theorem (essentially the same as in Quantitative Hilbert irreducibility and almost prime values of polynomial discriminants) shows that we can apply the local bounds from Lemma 11 here. Equally important, we can apply the support results along "near arithmetic progressions".

The trivial density bound implies that the first term on the right-hand side of the Poisson summation formula is $O_{n, \epsilon}(H^{\frac{n^2 + n}{2}} d^{-1 + \epsilon})$. Summed over all cubefull integers $d > H^{2 - 2\epsilon}$, this term yields a total contribution of size at most $O_{n, \epsilon}(H^{\frac{n^2 + n}{2} - \frac{4}{3} + \epsilon})$.

Note that the Dirichlet series for cubefull numbers has Euler product \begin{equation*} \prod_p \left( 1 + \frac{1}{p^{3s}} + \frac{1}{p^{4s}} + \cdots \right), \end{equation*} from which one readily identifies that there are at most $X^{\frac{1}{3} + \epsilon}$ up to $X$. Additional care shows that one can get this result with $\epsilon = 0$ and even compute the leading coefficient (see for example Ivić and Shiu, The distribution of powerful integers, Illinois Journal of Mathematics 1982). But even the weaker bound is sufficient here (up to treatment of $\epsilon$ factors).
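To get a feel for how sparse cubefull numbers are, one can enumerate them directly. The following is a minimal sketch (the helper names are my own, chosen for illustration) that builds every cubefull integer up to $X$ as a product of prime powers $p^e$ with $e \geq 3$, together with a slow brute-force test that can be used to cross-check small cases.

```python
def primes_up_to(n):
    """Sieve of Eratosthenes: all primes p <= n."""
    if n < 2:
        return []
    sieve = bytearray([1]) * (n + 1)
    sieve[0] = sieve[1] = 0
    for p in range(2, int(n**0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return [p for p in range(2, n + 1) if sieve[p]]


def cubefull_up_to(X):
    """Sorted list of cubefull integers n <= X (with 1 counted as cubefull)."""
    # Only primes with p^3 <= X can occur; the +1 guards against rounding.
    primes = primes_up_to(int(round(X ** (1.0 / 3))) + 1)
    found = []

    def extend(value, start):
        # value is cubefull; multiply by powers p^e (e >= 3) of larger primes.
        found.append(value)
        for i in range(start, len(primes)):
            power = value * primes[i] ** 3
            if power > X:
                break  # primes are increasing, so later ones are too big as well
            while power <= X:
                extend(power, i + 1)
                power *= primes[i]

    extend(1, 0)
    return sorted(found)


def is_cubefull(n):
    """Brute-force check that every prime dividing n does so at least 3 times."""
    p = 2
    while p * p <= n:
        if n % p == 0:
            e = 0
            while n % p == 0:
                n //= p
                e += 1
            if e < 3:
                return False
        p += 1
    return n == 1  # a leftover factor would be a prime with exponent 1
```

Comparing `len(cubefull_up_to(X))` with `X ** (1 / 3)` for a few values of `X` gives a quick numerical impression of the $X^{1/3}$ growth rate described above.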

We now consider the nontrivial Fourier coefficients. We note that the weighted terms in the Schwartz function, and the assumption that $d < H^{3 - 3\epsilon}$, imply that the summation is essentially supported on $\mathbf{u}$ with $\lvert u_i \rvert \ll d^2 / H^i$, so that $u_i = 0$ for $i \geq 6$.

The arithmetic progression of the support of $\widehat{\psi_{d^2}}$ implies that $u_i$ for $i \leq 5$ must then be divisible by $d / \gcd(n^{6-i}, d)$. (This number comes from the worst case, when $b = \mathrm{val}_p(n)$ for every prime $p$ dividing $d$. The point is that even in the worst case, it's not possible to lose too many prime factors when we know that $u_i \equiv 0 \bmod {p^{2k}}$ for later $i$.) Writing $u_i = (d / \gcd(n^{6-i}, d)) u_i'$, we find that $\lvert u_i' \rvert \ll d/H^i$. This actually implies that $u_i = 0$ for $i \geq 3$, further shrinking the support.

If $u_2 = 0$, then the sum is only over $\lvert u_1 \rvert \ll d^2 / H$. As Lemma 11 implies that $\widehat{\psi_{d^2}}(\mathbf{u})$ will vanish unless $u_1$ is divisible by $d^2 / \gcd(d^2, n)$, and we take $H \gg_n 1$, this forces $u_1 = 0$ in this range, recovering the trivial Fourier coefficient again.

It only remains to consider $u_2 \neq 0$. Lemma 11 implies that

\begin{align*} \widehat{\psi_{d^2}}(\mathbf{u}) &\ll d^{-\frac{5}{2} + \epsilon} \gcd(d^2, u_2) \\ &\ll d^{-\frac{3}{2} + \epsilon} \gcd(d, u_2'). \end{align*} For each $d$, we compute that this contribution is \begin{align*} &\ll_{n, \epsilon} \frac{H^{\frac{n^2 + n}{2} + \epsilon}}{d^{3/2}} \sum_{\lvert u_1' \rvert \leq d / H} \sum_{0 \neq \lvert u_2' \rvert \leq d/H^2} \gcd(d, u_2') \\ &\ll_{n, \epsilon} d^{1/2} H^{\frac{n^2 + n}{2} - 3 + \epsilon}. \end{align*} In the last inequality, we use that the mass of the sum over $u_2'$ comes mostly from the contribution of the largest possible values, the $O(d^{\epsilon})$ or so terms with gcd of size approximately $d/H^2$. Summing over cubefull integers $d$ with $H^{2-2\epsilon} < d < H^{3 - 3\epsilon}$, we find a total contribution of size \begin{equation*} O_{n, \epsilon}(H^{\frac{n^2 + n}{2} - \frac{1}{2} + \epsilon}). \end{equation*}
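The size of the inner gcd sum can be checked against a standard elementary bound, namely $\sum_{1 \leq u \leq U} \gcd(d, u) \leq U \tau(d)$, where $\tau(d)$ is the number of divisors of $d$. This is not necessarily the argument used in the paper; it is only a quick way to confirm that the sum over $u_2'$ has size $d^{1 + \epsilon}/H^2$, since $\tau(d) \ll d^\epsilon$. The sketch below (with illustrative function names) computes both sides.

```python
from math import gcd


def gcd_sum(d, U):
    """Sum of gcd(d, u) over 1 <= u <= U."""
    return sum(gcd(d, u) for u in range(1, U + 1))


def divisor_count(d):
    """Number of positive divisors of d (naive, fine for small d)."""
    return sum(1 for g in range(1, d + 1) if d % g == 0)


def bound_holds(d, U):
    """Check the elementary bound gcd_sum(d, U) <= U * tau(d)."""
    return gcd_sum(d, U) <= U * divisor_count(d)
```

The bound follows from $\gcd(d, u) \leq \sum_{g \mid d, \, g \mid u} g$ and the fact that there are at most $U/g$ multiples of $g$ up to $U$. The sum in the text runs over both signs of $u_2'$, which only changes the estimate by a factor of $2$.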

This proves a form of Proposition 8 with $\alpha = \frac{1}{2}$. This is much weaker than what we prove in our paper itself, but it does immediately give (from \eqref{eq:result_parametrized}) that we improve Schmidt's bound to the following.

\begin{equation} N_n(X) \ll_{n, \epsilon} X^{\frac{n+2}{4} - \frac{1}{10n - 10} + \epsilon}. \end{equation}

Combined with the noted possible improvement to Lemma 6 (to $\frac{1}{2}$ instead of $\frac{1}{5}$), this is sufficient to obtain the primary theorem of our paper (even though almost every argument given here is weaker than what appears in our actual paper).

\begin{equation} N_n(X) \ll_{n, \epsilon} X^{\frac{n+2}{4} - \frac{1}{4n - 4} + \epsilon}. \end{equation}
