Improving the bounds towards this conjecture was one of the purposes of my recent paper with Anderson, Gafni, Lemke Oliver, Shakan, and Zhang (accepted to IMRN; arXiv preprint; previous discussion on this site). I’ll refer to this paper as AGLDLOSZ^{2}.

For $H \geq 2$, let $E _n(H)$ count the number of monic integer polynomials $f(x) = x^n + a _1 x^{n-1} + \cdots + a _n$ of degree $n$ with $\lvert a _i \rvert \leq H$, *and* whose Galois group is not the full symmetric group $S _n$. Classical reasoning due to Hilbert shows that $E _n(H) = o(H^n)$, sometimes phrased as indicating that one hundred percent of monic polynomials are irreducible and have Galois group $S _n$.

Van der Waerden’s conjecture concerns improving this count. Improvements using varied techniques and ideas have appeared over the years. Prior to the paper of Bhargava, the best record was held by my collaborators and me in AGLDLOSZ, when we showed that $$ E _n(H) = O(H^{n - \frac{2}{3} + \frac{2}{3n + 3} + \epsilon}). $$ But now Bhargava proves the conjecture outright, proving that $$ E _n(H) = O(H^{n - 1}). $$

This is a remarkable improvement and a very good result!

As in AGLDLOSZ, Bhargava studies the problem with a mixture of algebraic techniques and Fourier analysis. Let $V(\mathbb{F} _p)$ denote the space of monic degree $n$ (which I keep implicit in the notation) polynomials over $\mathbb{F} _p$. For any complex function $\psi _p$ on $V(\mathbb{F} _p)$, define its Fourier transform $\widehat{\psi} _p$ on the dual space $V^*(\mathbb{F} _p)$ by $$ \widehat{\psi} _p(g) = \frac{1}{p^n} \sum _{f \in V(\mathbb{F} _p)} \psi _p(f) \exp\left( \frac{2\pi i \langle f, g \rangle}{p} \right).$$

We should think of $\psi _p$ as standing for a characteristic function of some appropriate set $S \subset V(\mathbb{F} _p)$. If $\phi$ is a Schwartz function approximating the characteristic function of $[-1, 1]^n$, then Poisson summation gives $$ \sum _{f \in V(\mathbb{Z})} \phi(f/H) \psi _p(f) = H^n \sum _{g \in V^*(\mathbb{Z})} \widehat{\phi}(gH/p) \widehat{\psi} _p(g). \tag{1} $$ For reasonable $\phi$, the left hand side of $(1)$ gives a good upper bound for the number of elements projecting to $S$ from the polynomial box $[-H, H]^n$. As $\phi$ is Schwartz, we should expect the rapid decay of $\widehat{\phi}$ to bound the error term on the right hand side by $\max \lvert \widehat{\psi} _p(g) \rvert$ times the size of the box $H^n$, with a possible main term coming from the $g = 0$ term.

In AGLDLOSZ, we used precisely this Fourier setup in a modified form of Selberg’s sieve. We focused on counting polynomials $f$ whose Galois group was a subgroup of $A _n$, and we chose $\psi _p$ to be roughly an indicator function that $f \bmod p$ had splitting type mod $p$ that was compatible with Galois group $A _n$. (Actually, we were sieving the *incompatible* elements *out*, but this is unimportant). The limit of our result was in understanding the size of the error term in $(1)$, which amounts to providing good bounds for the Fourier transform $\widehat{\psi} _p(g)$. For us, we related this error term to general bounds for the Möbius $\mu$ function and applied these general bounds.

In this paper, Bhargava uses more refined indicator functions. Suppose that the polynomial $f$ factors over $\mathbb{F} _p$ as $\prod P _i^{e _i}$, where each $P _i$ is irreducible (and distinct) and the degree of $P _i$ is $f _i$. Then the degree of $f$ is $\sum f _i e _i$ and we can define the *index* of $f$ mod $p$ as $\sum (e _i - 1) f _i$.
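As a small illustration of the definition, here is a sketch in Python that computes the degree and the index from a factorization type. The function name `degree_and_index` and the encoding of a factorization as a list of `(e_i, f_i)` pairs are assumptions made for this illustration; they are not notation from Bhargava's paper.

```python
def degree_and_index(factors):
    """Return (degree, index) for a factorization type.

    `factors` is a list of (e_i, f_i) pairs (a hypothetical encoding):
    e_i is the multiplicity of the irreducible factor P_i and
    f_i is the degree of P_i.
    """
    degree = sum(e * f for e, f in factors)
    index = sum((e - 1) * f for e, f in factors)
    return degree, index


# x^2 (x + 1) mod p: two linear factors with multiplicities 2 and 1.
print(degree_and_index([(2, 1), (1, 1)]))

# A squarefree polynomial always has index 0.
print(degree_and_index([(1, 2), (1, 3)]))
```

In particular, the index vanishes exactly when every $e _i = 1$, that is, when $f$ is squarefree mod $p$.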

Bhargava roughly considers indicator functions for polynomials having specified index, and shows that for almost any index the corresponding Fourier transform has significant decay. This is roughly the content of Proposition 24 and Corollary 25 (for non-monic polynomials) or Proposition 28 and Corollary 29 (for monic polynomials).

The ideas and methods used in the proofs of Propositions 24 and 28 in particular are very powerful. I think they’re worth meditating over, and I’ll spend more time thinking about them.

To complete the argument, Bhargava then splits up the regions to estimate. Note that counting polynomials and counting the number fields generated by those polynomials are very similar; here, we count the number fields. Using a result of Lemke Oliver and Thorne, it is possible to bound the number of polynomials leading to number fields with “small” absolute discriminant and “small” product of ramified primes. If the product of ramified primes is “small” but the discriminant is “large”, then the index of these polynomials must be large and is thus bounded by his index counts above.

The third case, where the product of the ramified primes is large, takes more work. Bhargava supplies an additional argument using discriminants. In short, one can show that for each ramified prime $p _r$, the source polynomial $f$ must have a triple root or a pair of double roots mod $p _r$. It turns out that this controls the mod $p$ structure of an iterated discriminant, and counting the number of polynomials giving this structure gives a bound $O(H^{n-1 + \epsilon})$. (This is my summary of the bottom paragraphs of pg 22 on the arXiv version).

Further work is needed to remove the $\epsilon$, but the details involved are small compared to the earlier, bigger ideas.

]]>This talk includes some discussion of our paper to appear in IMRN (link to the arXiv version, which is mostly the same as what will be published). (See also my previous discussion on this paper). But I’ll note that in this talk I lean towards a few ideas that did not make it into the paper, but which we are using in current work.

In particular, in our paper we don’t need to use group actions or classify orbit sizes, but it turns out that this is a very strong idea! I’ll note that in a very particular case, Thorne and Taniguchi have applied this type of orbit counting method in their paper “Orbital exponential sums for prehomogeneous vector spaces” to gain extremely strong, specific understanding of the Fourier transform for their application.

]]>Here I briefly describe the project and the work of Nir, Raymond, and Henry.

The project was organized around understanding why the following picture has so much structure.

Fundamentally, this image depicts differences between sums related to primes. Let $p_n$ denote the $n$th prime. It follows from the Prime Number Theorem that $p_n \approx n \log n$, and thus that $n p_n \approx n^2 \log n$. One can also show that $$ \sum_{m \leq n} p_m \approx \frac{1}{2} n^2 \log n,$$ and thus we should have that $$ \frac{n p_n}{\sum_{m \leq n} p_m} \to 2.\tag{1}$$

The vertical axis in the image above examines differences between consecutive $n$ in $(1)$ (in log scale), while the horizontal axis gives $n$ (also in log scale).
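To experiment with the quantity in $(1)$, here is a minimal sketch in Python. It is not the code used to make the image; the function names `primes_up_to` and `ratios`, the cutoff `200_000`, and the choice to look at absolute differences are all assumptions for illustration.

```python
import math


def primes_up_to(limit):
    """Sieve of Eratosthenes: return the list of primes <= limit."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, math.isqrt(limit) + 1):
        if sieve[i]:
            # Cross out the multiples of i, starting from i * i.
            sieve[i * i :: i] = bytearray(len(range(i * i, limit + 1, i)))
    return [i for i in range(limit + 1) if sieve[i]]


def ratios(primes):
    """Return the list of n * p_n / (p_1 + ... + p_n) for n = 1, 2, ..."""
    out, total = [], 0
    for n, p in enumerate(primes, start=1):
        total += p
        out.append(n * p / total)
    return out


primes = primes_up_to(200_000)
r = ratios(primes)
# Absolute differences between consecutive values of the ratio in (1).
diffs = [abs(b - a) for a, b in zip(r, r[1:])]
print(r[-1])              # slowly approaches 2
print(max(diffs[-100:]))  # the differences become small
```

Plotting `diffs` against the index on log-log axes should give a picture of the same general type as the one described here.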

The fact that $(1)$ converges, so that differences between consecutive terms tend to $0$, corresponds to the overall downwards trend in the graph. But there is so much more structure! Why do the points fall into “troughs” or along “curtains”? Does each line mean something?

In this version, I’ve colored differences coming from when $p_n$ is a twin prime (in blue), a cousin prime (in green), a sexy prime (in red), or a prime $p$ such that the next prime is $p+8$ (in cyan). The first dot is black because it comes from $2$. The next two correspond to $3$ and $5$ (both twin primes), and the fourth dot corresponds to $7$ and is green because the next prime after $7$ is $11$, and so on.

This is a strong hint at distributional aspects alluded to within the plots.

Nir, Raymond, and Henry proved many things! They quantified the rate of convergence in $(1)$ and thus quantified the guaranteed downward trend in the images and found images that better convey the structure of what’s going on. I was already very impressed, but then they branched out and studied more!

We chose to investigate a nuanced question: what aspects of the initial plots depend strongly on the fact that the underlying data consists of *primes*, and what aspects depend only on the fact that the underlying data consists of integers with the same *density as the primes*?

To study this, one can create a new set of distinguished elements called Promys Primes (PPrimes) with the same density as true primes using probabilistic ideas of Cramér. Let’s call $2$ and $3$ PPrimes, and then for each odd $m \geq 5$, we call $m$ a PPrime with probability $2 / \log m$. Do this for a large sequence of $m$, and we get a collection of PPrimes that has (with very high probability) the same density as true primes, but none of the multiplicative structure.
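The construction just described can be sketched in a few lines of Python. This is only an illustration of the recipe in the paragraph above, not the code from the write-up; the function name `pprimes` and the `seed` parameter are hypothetical.

```python
import math
import random


def pprimes(limit, seed=None):
    """Return one random sample of PPrimes up to `limit`.

    2 and 3 are always included; each odd m >= 5 is included
    independently with probability 2 / log(m).
    """
    rng = random.Random(seed)
    chosen = [2, 3]
    for m in range(5, limit + 1, 2):
        if rng.random() < 2 / math.log(m):
            chosen.append(m)
    return chosen


sample = pprimes(10**5, seed=0)
print(len(sample))  # compare with the number of true primes up to 10^5
```

Note that $2 / \log m > 1$ for $m = 5$ and $m = 7$, so these two are always chosen. Only odd $m$ are considered, which is why the factor $2$ gives overall density roughly $1/\log m$.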

It turns out that for sets of PPrimes, there are analogous pictures and the asymptotics are even better! This is in section 3 of their write-up.

We also thought to study analogous situations in related sets of primes, such as the Gaussian integers. Recall that the Gaussian integers $\mathbb{Z}[i] = \{ a + bi : a, b \in \mathbb{Z} \}$ are a unique factorization domain and have a rich theory of primes. Sometimes this theory is very similar to the standard theory of primes over $\mathbb{Z}$. But there are challenges.

One significant challenge is that $\mathbb{C}$ is not ordered. A related challenge is that there are more *units*. Over $\mathbb{Z}$, both $2$ and $-2$ are primes, but we typically recognize $2$ as being more “simple”. For Gaussian primes, there isn’t such a choice; for example each of $1 + i, 1 - i, -1 + i, -1 - i$ are Gaussian primes, but none are more simple or fundamental than the others.

More concretely, one has to be careful even with how to define the “sum of the first $n$ primes”. One natural thought might be to sum all Gaussian primes $\pi$ that have norm up to $X$. But one can quickly see that this sum is $0$ for analogous reasons to why the sum of all the typical primes with absolute values up to $X$ must vanish ($p + (-p) = 0$). In the Gaussian case, it is also true that $$ \sum_{N(\pi) \leq X} \pi^2 = 0.$$

But they considered higher powers, where there aren’t trivial or obvious reasons for massive cancellation, and they showed that there is *always* nontrivial cancellation. This is interesting on its own!
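The trivial cancellation for small powers is easy to check numerically. Here is a sketch in Python that sums $\pi^k$ over all Gaussian primes of norm at most $X$ using exact integer arithmetic. The function names are hypothetical and the primality test is deliberately naive. Since the set of Gaussian primes is closed under multiplication by $i$, the sum vanishes unless $4 \mid k$, so $k = 4$ is the first power where nothing forces the sum to be zero.

```python
import math


def is_prime(n):
    """Naive trial-division primality test for rational integers."""
    if n < 2:
        return False
    for d in range(2, math.isqrt(n) + 1):
        if n % d == 0:
            return False
    return True


def is_gaussian_prime(a, b):
    """Decide whether a + bi is a Gaussian prime."""
    if a == 0 or b == 0:
        # On the axes: a unit times a rational prime q with q = 3 mod 4.
        q = abs(a) + abs(b)
        return q % 4 == 3 and is_prime(q)
    # Off the axes: the norm a^2 + b^2 must be a rational prime.
    return is_prime(a * a + b * b)


def gaussian_power(a, b, k):
    """Return (a + bi)^k as an exact pair (real part, imaginary part)."""
    re, im = 1, 0
    for _ in range(k):
        re, im = re * a - im * b, re * b + im * a
    return re, im


def gaussian_prime_power_sum(X, k):
    """Sum pi^k over all Gaussian primes pi with norm at most X."""
    bound = math.isqrt(X)
    total_re, total_im = 0, 0
    for a in range(-bound, bound + 1):
        for b in range(-bound, bound + 1):
            if a * a + b * b <= X and is_gaussian_prime(a, b):
                re, im = gaussian_power(a, b, k)
                total_re += re
                total_im += im
    return total_re, total_im


for k in range(1, 5):
    print(k, gaussian_prime_power_sum(100, k))
```

For $k = 1, 2, 3$ the output is `(0, 0)`, while for $k = 4$ one gets a genuinely nonzero integer whose size can be compared against the trivial bound.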

Then they also constructed a mixture, a Cramér-type model for Gaussian primes and showed that one should expect nontrivial cancellation there for purely distributional reasons.

I leave the details to their write-up. But they’ve done great work, and I look forward to seeing what they come up with in the future.

]]>I’ll also note a few open problems that I don’t know how to handle and that I briefly describe during the talk.

- Is it possible to show that every (symmetrized) Dirichlet series associated to a half-integral weight modular form must have zeros off the critical line? This is true in practice, but seems hard to show.
- Is it possible to determine whether a given Dirichlet series has zeros in the half-plane of absolute convergence? If there is one zero, there are infinitely many – but is there a way of determining if there are any?
- Why does there seem to be a gap around the critical line in zero distribution?
- Can one explain why the pair correlation seems well-behaved (even heuristically)?

If you have any ideas, let me know!

]]>We are now working in a few different directions, involving informational visualizations of different forms and different types of forms, as well as purely artistic visualizations.

The slides for this talk can be found here.

I’ve recently been very fond of including renderings based on a picture of my wife and I in Iceland (from the beforetimes). This is us as a wallpaper (preserving many of the symmetries) for a particular modular form.

I reused a few images from Painted Modular Terrains, which I made a few months ago.

If you’re interested, you might also like a few previous talks and papers of mine:

- Slides from a talk on Visualizing Modular Forms
- Slides from a talk on computing Maass forms
- Notes behind a talk: visualizing modular forms
- Trace form 3.32.a.a
- phase_mag_plot: a sage package for plotting complex functions
- A paper: Visualizing modular forms
- A paper: Computing classical modular forms
- Bridges paper: Towards flying through modular forms

George has also written about this paper on his site.

This project began at an AIM workshop on Fourier analysis, arithmetic statistics, and discrete restriction.

Our guiding question was very open. For some *nice* local polynomial conditions, can we make sense of the Fourier transforms of these local conditions well enough to have arithmetic application?

This is partly inspired from *Orbital exponential sums for prehomogeneous vector spaces* by Takashi Taniguchi and Frank Thorne (preprint available on the arXiv). In this paper, Frank and Takashi algebraically compute Fourier transforms of a couple arithmetically interesting functions on prehomogeneous vector spaces over finite fields. It turns out that one can, for example, explicitly and completely compute the Fourier transform of the characteristic function of singular binary cubic forms over $\mathbb{F}_{q}$.

In a companion paper, Takashi and Frank combine those computations with sieves to prove that there are $\gg X / \log X$ cubic fields whose discriminant is squarefree, bounded above by $X$, and has at most $3$ prime factors. They also show there are $\gg X / \log X$ quartic fields whose discriminant is squarefree, bounded above by $X$, and has at most $8$ prime factors.

We have two classes of result. Both rely on similar types of analysis, and are each centered on a study of a particular indicator-type function, its Fourier transform, and a sieve.

First, we prove a bound on the number of polynomials whose Galois group is a subgroup of $A_n$. For $H > 1$, define \begin{equation*} V_n(H) = \{ f \in \mathbb{Z}[x] : \mathrm{ht}(f) \leq H \} \end{equation*} and \begin{equation*} E_n(H, A_n) := \# \{ f \in V_n(H) : \mathrm{Gal}(f) \subseteq A_n \}. \end{equation*} We show that \begin{equation} E_n(H, A_n) \ll H^{n - \frac{2}{3} + O(1/n)}. \end{equation} This is an improvement on progress towards a conjecture of Van der Waerden and is a quantitative form of Hilbert’s Irreducibility Theorem, which shows (among other applications) that most monic irreducible polynomials have full Galois group.

However I should note that Bhargava has announced a proof of a (slightly weakened form of) Van der Waerden’s conjecture, and his result is strictly stronger than our result.

Secondly, we prove that for any $n \geq 3$ and $r \geq 2n - 3$, we have \begin{equation} \# \{ f \in \mathbb{Z}[x] : \mathrm{ht}(f) \leq H, f \, \text{monic }, \omega(\mathrm{Disc}(f)) \leq r \} \gg_{n, r} \frac{H^n}{\log H}, \end{equation} where $\omega(\cdot)$ denotes the number of distinct prime divisors. Qualitatively, this says that there are lots of polynomials with almost prime discriminants.

As a corollary of this second result, we prove that for $n \geq 3$ and $r \geq 2n - 3$, \begin{equation} \# \{ F / \mathbb{Q} : [F \colon \mathbb{Q}] = n, \mathrm{Disc}(F) \leq X, \omega(\mathrm{Disc}(F)) \leq r \} \gg_{n, r, \epsilon} X^{\frac{1}{2} + \delta_n - \epsilon} \end{equation} for explicit $\delta_n > 0$ and any $\epsilon > 0$. This shows that there are at least $X^{1/2}$ cubic fields whose discriminants are divisible by at most $3$ primes, or at least $X^{1/2}$ quartic fields whose discriminants are divisible by at most $5$ primes, for example. We guarantee fewer fields than Taniguchi and Thorne, but we guarantee fields with fewer prime factors and cover all degrees.

In the remainder of this post, I’ll describe a line of thinking that went towards proving our first result.

We initially studied the Fourier transform of the *odd-polynomial* indicator function. We call a polynomial $f(x) \in \mathbb{F}_p[x]$ *odd* if it has no repeated roots and the factorization type of $f$ corresponds to an odd permutation in the Galois group. That is, we can write $f$ as \begin{equation*} f(x) = f_1(x) f_2(x) \cdots f_r(x) \bmod p, \end{equation*} and there will be an element of the Galois group with cycle type $(\deg f_1) (\deg f_2) \cdots (\deg f_r)$. For *odd* $f$, this cycle must be an odd permutation.

A more convenient description of *oddness* is in terms of the Möbius function on $\mathbb{F}_p[x]$. A degree $n$ polynomial $f$ is odd precisely if $\mu_p(f) = (-1)^{n+1}$. Define $1^p_{sf}(f)$ to be the squarefree indicator function on $\mathbb{F}_p[x]$, and define $1^p_{odd, n}$ to be the odd indicator function on degree $n$ polynomials on $\mathbb{F}_p[x]$. Then \begin{equation*} 1^p_{odd, n}(f) = 1^p_n(f)\frac{(-1)^{n+1}\mu_p(f) + 1^p_{sf}(f)}{2}. \end{equation*} (Here, $1^p_n(f)$ keeps only the degree $n$ polynomials).
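The equivalence between oddness and the value of $\mu_p$ comes down to parity bookkeeping: a cycle of length $d$ is a product of $d - 1$ transpositions, so a cycle type with $r$ cycles in degree $n$ has sign $(-1)^{n - r}$, while a squarefree $f$ with $r$ irreducible factors has $\mu_p(f) = (-1)^r$. Here is a sketch in Python that checks this over all partitions of $n$ for small $n$. The function names are hypothetical, and the check works with abstract cycle types rather than actual polynomials (for small $p$, not every partition occurs as a factorization type, but the parity identity holds regardless).

```python
def partitions(n, max_part=None):
    """Yield the partitions of n as non-increasing tuples."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, max_part), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest


def is_odd_cycle_type(degrees):
    """A cycle of length d is a product of d - 1 transpositions."""
    return sum(d - 1 for d in degrees) % 2 == 1


def mobius_from_degrees(degrees):
    """mu_p(f) = (-1)^r for squarefree f with r irreducible factors."""
    return (-1) ** len(degrees)


for n in range(1, 9):
    for degrees in partitions(n):
        odd = is_odd_cycle_type(degrees)
        assert odd == (mobius_from_degrees(degrees) == (-1) ** (n + 1))
print("oddness matches mu_p(f) = (-1)^(n+1) for all cycle types with n <= 8")
```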

We then studied the Fourier transform of $1^p_{odd, n}$. Identifying the vector space of polynomials of degree at most $n$ over $\mathbb{F}_p$, which we denote by $V_n(\mathbb{Z}/p\mathbb{Z})$, with $(\mathbb{Z}/p\mathbb{Z})^{n+1}$, we can study the Fourier transform of a function $\psi:V_n(\mathbb{Z}/p\mathbb{Z}) \longrightarrow \mathbb{C}$, \begin{equation*} \widehat{\psi}(\mathbf{u}) = \frac{1}{p^{n+1}} \sum_{f \in V_n(\mathbb{Z}/p\mathbb{Z})} \psi(f) e_p(\langle f, \mathbf{u} \rangle). \end{equation*} Here, $e_p(x) = e^{2 \pi i x / p}$.
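For very small $p$ and $n$, this Fourier transform can be computed by brute force directly from the definition. The following Python sketch does this; the name `fourier_transform` and the representation of a polynomial by its tuple of coefficients are assumptions for illustration, and the $p^{2(n+1)}$ running time makes it useful only as a sanity check. As a test case it computes the transform of the constant function $1$, which is $1$ at $\mathbf{u} = 0$ and $0$ elsewhere.

```python
import cmath
import itertools


def fourier_transform(psi, p, dim):
    """Brute-force Fourier transform on (Z/pZ)^dim.

    Returns a dict mapping each u to
    (1 / p^dim) * sum_f psi(f) * exp(2 pi i <f, u> / p).
    """
    points = list(itertools.product(range(p), repeat=dim))
    result = {}
    for u in points:
        total = 0
        for f in points:
            inner = sum(fi * ui for fi, ui in zip(f, u)) % p
            total += psi(f) * cmath.exp(2j * cmath.pi * inner / p)
        result[u] = total / p**dim
    return result


p, n = 3, 2
hat_one = fourier_transform(lambda f: 1, p, n + 1)
print(abs(hat_one[(0, 0, 0)]))  # approximately 1
print(max(abs(v) for u, v in hat_one.items() if u != (0, 0, 0)))  # approximately 0
```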

It is possible to understand this Fourier transform using ideas similar to those of Taniguchi and Thorne. $\mathrm{GL}(2)$ acts on these polynomials in a similar way as it acts on quadratic forms, *and* $1^p_{odd, n}$ is invariant under this action. As in Taniguchi and Thorne, one can study the sizes of the Fourier transform on each orbit. This leads to several classical polynomial counting problems.

But unlike the prehomogeneous vector space context of Taniguchi and Thorne, we can’t *completely* determine the Fourier transform. For general degree, there are too many other terms.

Ultimately, we intend to use the knowledge of this Fourier transform as an ingredient in a sieve. An old theorem of Dedekind shows that if $\mathrm{Gal}(f) \subseteq A_n$, then $f$ is never *odd* mod any prime $p$.

We could use a Selberg sieve in the following form. Take a nonnegative weight function $\phi: V_n(\mathbb{R}) \longrightarrow \mathbb{R}$ (roughly supported on the box $[-1, 1]^{n+1}$), and consider \begin{equation}\label{eq:basic_sieve} \sum_{f \in V_n(\mathbb{Z})} \phi(f/H) \Big(\sum_{d: f \bmod p \text{ is odd } \forall p \mid d} \lambda_d \Big)^2 \geq 0 \end{equation} for some real weights $\lambda_d$ to be chosen later, but where $\lambda_1 = 1$.

For $f$ with $\mathrm{Gal}(f) \subseteq A_n$, $f$ is never odd. Thus the sum of weights $\lambda_d$ is exactly $\lambda_1 = 1$ for those $f$, and we get that \eqref{eq:basic_sieve} is bounded below by \begin{equation}\label{eq:basic_sieve_LHS} \sum_{\substack{f \in V_n(\mathbb{Z}) \\\\ \mathrm{Gal}(f) \subseteq A_n}} \phi(f/H). \end{equation} On the other hand, \eqref{eq:basic_sieve} is equal to \begin{equation}\label{eq:basic_sieve_RHS} \sum_{d_1, d_2} \lambda_{d_1} \lambda_{d_2} \sum_{f \in V_n(\mathbb{Z})} \phi(f / H) \prod_{p \mid [d_1, d_2]} 1^p_{odd, n}(f). \end{equation} Thus we have that \eqref{eq:basic_sieve_LHS} $\leq$ \eqref{eq:basic_sieve_RHS}. To bound \eqref{eq:basic_sieve_RHS}, we use Poisson summation to transform the sum of $\phi 1^p_{odd, n}$ into a dualized sum of $\widehat{\phi} \widehat{1}^p_{odd, n}$ and use our understanding of the Fourier transform $\widehat{1}^p_{odd, n}$ to (try to) get good bounds. Then one plays a game of optimizing over the weights $\lambda_d$.

There is a major problem with this approach. As we’re unable to completely determine the Fourier transform, it’s necessary to determine where it’s large and small and to handle the regions where it’s large well. Let’s look again at the expression \begin{equation*} 1^p_{odd, n}(f) = 1^p_n(f)\frac{(-1)^{n+1}\mu_p(f) + 1^p_{sf}(f)}{2}. \end{equation*} The Fourier transform of $\mu_p$ is expected to behave very well away from $0$. But the Fourier transform of $1^p_{sf}$ can be shown to have large Fourier coefficients away from $0$, strongly affecting the resulting bounds.

Instead of studying the indicator function $1^p_{odd, n}$, we chose to study a sort of *graded* indicator function \begin{equation*} \psi_p(f) = \frac{(-1)^{n+1}1^p_n(f)\mu_p(f) + 1}{2}. \end{equation*} This is $1$ if $f$ is odd and squarefree, $0$ if $f$ is squarefree and even, and $1/2$ if $f$ is not squarefree.

On the Fourier transform side, we completely understand the Fourier transform of $1$ and we can hope to have good understanding of the Möbius function. So we should expect much better bounds.

But on the other side, this is not as clean of an indicator function as $1^p_{odd, n}$. In comparison to the basic sieve inequality \eqref{eq:basic_sieve_LHS} $\leq$ \eqref{eq:basic_sieve_RHS}, the product of indicator functions on the right hand side now becomes much messier, and the basic setup no longer applies.

Instead, in \eqref{eq:basic_sieve}, we replace $\big( \sum \lambda_d \big)^2$ by a positive semidefinite quadratic form in $\lambda_{d_1}, \lambda_{d_2}$ to get a modified Selberg sieve inequality similar to \eqref{eq:basic_sieve_LHS} $\leq$ \eqref{eq:basic_sieve_RHS}. The tail of the argument remains largely the same. Instead of bounding \eqref{eq:basic_sieve_RHS}, we bound

\begin{equation*} \sum_{d_1, d_2} \lambda_{d_1} \lambda_{d_2} \sum_{f \in V_n(\mathbb{Z})} \phi(f / H) \prod_{p \mid [d_1, d_2]} \psi_p(f). \end{equation*}

After Poisson summation, the goal becomes controlling $\widehat{\psi_p}(f)$, which essentially boils down to understanding $\widehat{\mu_p}(f)$.

In explicit coordinates, this is the task of understanding \begin{equation*} \widehat{\mu_p}(u_0, \ldots, u_n) = \frac{1}{p^{n+1}} \sum_{t_i \in \mathbb{F}_p} \mu_p(t_n x^n + \cdots + t_0) e_p(u_n t_n + \cdots + u_0 t_0). \end{equation*} This is a $\mathbb{F}_p[x]$-analogue of the classical question of bounding \begin{equation*} \sum_{n \leq x} \mu(n) e(n\theta) \end{equation*} for some real $\theta$. Baker and Harman have proved that GRH implies that\begin{equation*} \Big \lvert \sum_{n \leq x} \mu(n) e(n\theta) \Big \rvert \ll x^{\frac{3}{4} + \epsilon}, \end{equation*} and Porritt has proved the analogous result holds over function fields (where RH is known).
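To get a feel for the classical sum over the integers, here is a numerical sketch in Python. It computes $\sum_{n \leq x} \mu(n) e(n\theta)$ with the convention $e(t) = e^{2\pi i t}$ (an assumption about the notation) and prints it next to $x^{3/4}$. The function names and the choices $x = 10^4$, $\theta = \sqrt{2}$ are arbitrary, and of course a single computation neither proves nor tests the conditional bound.

```python
import cmath
import math


def mobius_sieve(x):
    """Return a list mu with mu[n] equal to the Mobius function for 0 < n <= x."""
    mu = [1] * (x + 1)
    mu[0] = 0
    is_prime = [True] * (x + 1)
    for p in range(2, x + 1):
        if is_prime[p]:
            # Flip the sign once for each prime divisor.
            for m in range(p, x + 1, p):
                if m > p:
                    is_prime[m] = False
                mu[m] = -mu[m]
            # Anything divisible by p^2 is not squarefree.
            for m in range(p * p, x + 1, p * p):
                mu[m] = 0
    return mu


def mobius_exponential_sum(x, theta):
    """Return sum_{n <= x} mu(n) * exp(2 pi i n theta)."""
    mu = mobius_sieve(x)
    return sum(
        mu[n] * cmath.exp(2j * cmath.pi * n * theta) for n in range(1, x + 1)
    )


x = 10**4
theta = math.sqrt(2)
print(abs(mobius_exponential_sum(x, theta)), x**0.75)
```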

Applying this bound in our modified form of the Selberg sieve is what allows us to prove our first theorem.

]]>I’ve worked with modular forms for almost 10 years now, but I’ve only known what a modular form looks like for about 2 years. In this talk, I explored visual representations of modular forms, with lots of examples.

The slides are available here.

I’ll share one visualization here that I liked a lot: a visualization of a particular Maass form on $\mathrm{SL}(2, \mathbb{Z})$.

]]>`3.32.a.a`

.
The space of weight $32$ modular forms on $\Gamma_0(3)$ with trivial central character is an $11$-dimensional vector space. The subspace of newforms is a $5$-dimensional vector space.

These newforms break down into two groups: the two embeddings of an abstract newform whose coefficients lie in a quadratic field, and the three embeddings of an abstract newform whose coefficients lie in a cubic field. The label `3.32.a.a` is a label for the two newforms with coefficients in a quadratic field.

These images are for the trace form, made by summing the two conjugate newforms in `3.32.a.a`. This trace form is a newform of weight $32$ on $\Gamma_1(3)$.

Each modular form is naturally defined on the upper half-plane. In these images, the upper half-plane has been mapped to the unit disk. This mapping is uniquely specified by the following pieces of information: the real line $y = 0$ in the plane is mapped to the boundary of the disk, and the three points $(0, i, \infty)$ map to the (bottom, center, top) of the disk.

This is a relatively high weight modular form, meaning that magnitudes can change very quickly. In the contoured image, each contour indicates a multiplicative change in elevation: points on one contour are $32$ times larger or smaller than points on adjacent contours.

I have a bit more about this and related visualizations on my visualization site.

]]>This talk is a description of ongoing explicit computational experimentation with Mehmet Kiral, Tom Hulse, and Li-Mei Lim on various aspects of half-integral weight modular forms and their Dirichlet series.

These Dirichlet series behave like typical beautiful automorphic L-functions in many ways, but are very different in other ways.

The first third of the talk is largely about the “typical” story. The general definitions are abstractions designed around the objects that number theorists have been playing with, and we also briefly touch on some of these examples to have an image in mind.

The second third is mostly about how half-integral weight Dirichlet series aren’t quite as well-behaved as L-functions associated to GL(2) automorphic forms, but sufficiently well-behaved to be comprehensible. Unlike the case of a full-integral weight modular form, there isn’t a canonical choice of “nice” forms to study, but we identify a particular set of forms with symmetric functional equations to study. There are several small details that can be considered here, and I largely ignore them for this talk. This is something that I hope to return to in the future.

In the final third of the talk, we examine the behavior and zeros of a handful of half-integral weight Dirichlet series. There are plots of zeros, including a plot of approximately the first 150k zeros of one particular form. These are also interesting, and I intend to investigate and describe these more on this site later.

]]>In this note, I describe an aspect of this paper that I found surprising. In fact, I’ve found it continually surprising, as I’ve reproven it to myself three times now, I think. By writing this here and in my note system, I hope to perhaps remember this better.

In this paper, we revisit an application of “Landau’s Method” to estimate partial sums of coefficients of Dirichlet series. We model this paper off of an earlier application by Chandrasekharan and Narasimhan, except that we explicitly track dependence of the several implicit constants and we prove these results uniformly for all partial sums, as opposed to sufficiently large partial sums.

The only structure is that we have a Dirichlet series $\phi(s)$, some Gamma factors $\Delta(s)$, and a functional equation of the shape $$ \phi(s) \Delta(s) = \psi(s) \Delta(1-s). $$ This is relatively structureless, and correspondingly our attack is very general. We use some smoothed approximation to the sum of coefficients, shift lines of integration to pick up polar main terms, apply the functional equation and change variables to work with the dual, and then get some collection of error terms and error integrals.

It happens to be that it’s much easier to work with a $k$-Riesz smoothed approximation. That is, if $$ \phi(s) = \sum_{n \geq 1} \frac{a(n)}{\lambda_n^s} $$ is our Dirichlet series, and we are interested in the partial sums $$ A_0(X) = \sum_{\lambda_n \leq X} a(n), $$ then it happens to be easier to work with the smoothed approximations $$ A_k(X) = \frac{1}{\Gamma(k+1)}\sum_{\lambda_n \leq X} a(n) (X - \lambda_n)^k, $$ and to somehow combine several of these smoothed sums together.
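To make the Riesz smoothing concrete, here is a small Python sketch that evaluates $A_k(X)$ directly from the definition, with each term weighted by $a(n) (X - \lambda_n)^k / \Gamma(k+1)$. The function name `riesz_sum` and the way the sequences are passed in as callables are assumptions for illustration, and the sketch assumes that $\lambda_n$ is increasing and unbounded so that the loop terminates.

```python
import math


def riesz_sum(a, lam, X, k):
    """Evaluate A_k(X) = (1 / Gamma(k+1)) * sum_{lam(n) <= X} a(n) * (X - lam(n))^k.

    `a` and `lam` are callables on n = 1, 2, ...; `lam` is assumed to be
    increasing and unbounded.
    """
    total = 0.0
    n = 1
    while lam(n) <= X:
        total += a(n) * (X - lam(n)) ** k
        n += 1
    return total / math.gamma(k + 1)


# Example: a(n) = 1 and lambda_n = n (the coefficients of the zeta function).
for k in range(3):
    print(k, riesz_sum(lambda n: 1, lambda n: n, 3.5, k))
```

With $k = 0$ this recovers the sharp sum $A_0(X)$, and each increase in $k$ replaces the jumps at the $\lambda_n$ by a smoother function of $X$.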

This smoothed sum is recognizable as $$ A_k(X) = \frac{1}{2\pi i}\int_{c - i\infty}^{c + i\infty} \phi(s) \frac{\Gamma(s)}{\Gamma(s + k + 1)} X^{s + k}ds $$ for $c$ somewhere in the half-plane of convergence of the Dirichlet series. As $k$ gets large, these integrals become better behaved. In application, one takes $k$ sufficiently large to guarantee desired convergence properties.

The process of taking several of these smoothed approximations for large $k$ together, studying them through basic functional equation methods, and combinatorially combining these smoothed approximations via finite differencing to get good estimates for the sharp sum $A_0(X)$ is roughly what I think of as “Landau’s Method”.

In our paper, as we apply Landau’s method, it becomes necessary to understand certain bounds coming from the dual Dirichlet series $$ \psi(s) = \sum_{n \geq 1} \frac{b(n)}{\mu_n^s}. $$ Specifically, it works out that the (combinatorially finite differenced) difference between the $k$-smoothed sum $A_k(X)$ and its $k$-smoothed main term $S_k(X)$ can be written as $$ \Delta_y^k [A_k(X) - S_k(X)] = \sum_{n \geq 1} \frac{b(n)}{\mu_n^{\delta + k}} \Delta_y^k I_k(\mu_n X),\tag{1} $$ where $\Delta_y^k$ is a *finite differencing operator* that we should think of as a sum of several shifts of its input function.

More precisely, $\Delta_y F(X) := F(X + y) - F(X)$, and iterating gives $$ \Delta_y^k F(X) = \sum_{j = 0}^k (-1)^{k - j} {k \choose j} F(X + jy). $$ The $I_k(\cdot)$ term on the right of $(1)$ is an inverse Mellin transform $$ I_k(t) = \frac{1}{2 \pi i} \int_{c - i\infty}^{c + i\infty} \frac{\Gamma(\delta - s)}{\Gamma(k + 1 + \delta - s)} \frac{\Delta(s)}{\Delta(\delta - s)} t^{\delta + k - s} ds. $$ Good control for this inverse Mellin transform yields good control of the error for the overall approximation. Via the method of finite differencing, there are two basic choices: either bound $I_k(t)$ directly, or understand bounds for $(\mu_n y)^k I_k^{(k)}(t)$ for $t \approx \mu_n X$. Here, $I_k^{(k)}(t)$ means the $k$th derivative of $I_k(t)$.
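The finite differencing operator $\Delta_y^k$ defined above is easy to experiment with. Here is a sketch in Python that implements it both through the binomial formula and by iterating $\Delta_y$ directly; the function names are hypothetical. As a sanity check, applying $\Delta_y^k$ to the monomial $t^k$ gives the constant $k! \, y^k$, a simple instance of the general relationship between $\Delta_y^k F$ and $y^k F^{(k)}$.

```python
from math import comb, factorial


def finite_difference(F, X, y, k):
    """Compute Delta_y^k F(X) via the binomial formula."""
    return sum(
        (-1) ** (k - j) * comb(k, j) * F(X + j * y) for j in range(k + 1)
    )


def iterated_difference(F, X, y, k):
    """Compute Delta_y^k F(X) by applying Delta_y a total of k times."""
    if k == 0:
        return F(X)

    def G(t):
        # One application of Delta_y to F.
        return F(t + y) - F(t)

    return iterated_difference(G, X, y, k - 1)


def cube(t):
    return t**3


print(finite_difference(cube, 2, 5, 3))
print(iterated_difference(cube, 2, 5, 3))
print(factorial(3) * 5**3)
```

All three printed values agree, independently of the base point $X$.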

In the classical application (as in the paper of CN), one worries about this asymptotic mostly as $t \to \infty$. In this region, $I_k(t)$ can be well-approximated by a $J$-Bessel function, which is sufficiently well understood in large argument to give good bounds. Similarly, $I_k^{(k)}(t)$ can be contour-shifted in a way that still ends up being well-approximated by $J$-Bessel functions.

The shape of the resulting bounds ends up being that $\Delta_y^k I_k(\mu_n X)$ is bounded by either

- $(\mu_n X)^{\alpha + k(1 - \frac{1}{2A})}$, where $A$ is a fixed parameter that isn’t worth describing fully, and $\alpha$ is a bound coming from the direct bound of $I_k(t)$, or
- $(\mu_n y)^k (\mu_n X)^\beta$, where $\beta$ is a bound coming from bounding $I_k^{(k)}(t)$.

In both, there is a certain $k$-dependence that comes from the $k$-th Riesz smoothing factors, either directly (from $(\mu_n y)^k$), or via its corresponding inverse Mellin transform (in the bound from $I_k(t)$). But these are the only aspects that depend on $k$.

At this point in the classical argument, one determines when one bound is better than the other, and this happens to be something that can be done exactly, and (surprisingly) independently of $k$. Using this pair of bounds and examining what comes out the other side gives the original result.

In our application, we also worry about the asymptotics as $t \to 0$. While it may still be true that $I_k$ can be approximated by a $J$-Bessel function, the “well-known” asymptotics for the $J$-Bessel function behave substantially worse for small argument. Thus different methods are necessary.

It turns out that $I_k$ can be approximated in a relatively trivial way for $t \leq 1$, so the only remaining hurdle is $I_k^{(k)}(t)$ as $t \to 0$.

We’ve proved a variety of different bounds that hold in slightly different circumstances. And for each sort of bound, the next steps would be the same as before: determine when each bound is better, bound by absolute values, sum together, and then choose the various parameters to best shape the final result.

But unlike before, the boundary between the regions where $I_k$ is best bounded directly or bounded via $I_k^{(k)}$ depends on $k$. Aside from choosing $k$ sufficiently large for convergence properties (which relate to the locations of poles and growth properties of the Dirichlet series and gamma factors), any sufficiently large $k$ would suffice.

After I step away from this paper and argument for a while and come back, I wonder about the right way to choose the balancing error. That is, I rework when to use bounds coming from studying $I_k(t)$ directly vs bounds coming from studying $I_k^{(k)}(t)$.

But it turns out that there is always a reasonable heuristic choice. Further, this heuristic gives the same choice of balancing as in the case when $t \to \infty$ (although this is not the source of the heuristic).

Making these bounds will still give bounds for $\Delta_y^k I_k(\mu_n X)$ of shape

- $(\mu_n X)^{\alpha + k(1 - \frac{1}{2A})}$, where $A$ is a fixed parameter that isn’t worth describing fully, and $\alpha$ is a bound coming from the direct bound of $I_k(t)$, or
- $(\mu_n y)^k (\mu_n X)^\beta$, where $\beta$ is a bound coming from bounding $I_k^{(k)}(t)$.

The actual bounds for $\alpha$ and $\beta$ will differ between the case of small $\mu_n X$ and large $\mu_n X$ ($J$-Bessel asymptotics for large, different contour shifting analysis for small), but in both cases it turns out that $\alpha$ and $\beta$ are independent of $k$.

This is relatively easy to see when bounding $I_k^{(k)}(t)$, as repeatedly differentiating under the integral shows essentially that $$ I_k^{(k)}(t) = \frac{1}{2\pi i} \int \frac{\Delta(s)}{(\delta - s)\Delta(\delta - s)} t^{\delta - s} ds. $$ (I’ll note that the contour does vary with $k$ in a certain way that doesn’t affect the shape of the result for $t \to 0$).

When balancing the error terms $(\mu_n X)^{\alpha + k(1 - \frac{1}{2A})}$ and $(\mu_n y)^k (\mu_n X)^\beta$, the heuristic comes from taking arbitrarily large $k$. As $k \to \infty$, the point where the two error terms balance is independent of $\alpha$ and $\beta$.
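To spell out this limiting heuristic, here is a short computation under the simplifying assumption that the two bounds are set exactly equal, ignoring implicit constants. Taking $k$-th roots of both sides of the balance gives $$ (\mu_n X)^{\frac{\alpha}{k} + 1 - \frac{1}{2A}} = (\mu_n y) (\mu_n X)^{\frac{\beta}{k}}. $$ Letting $k \to \infty$ with $\alpha$ and $\beta$ fixed, the exponents $\alpha/k$ and $\beta/k$ drop out and this becomes $(\mu_n X)^{1 - \frac{1}{2A}} = \mu_n y$. Solving for $\mu_n$ gives $$ \mu_n = X^{2A - 1} y^{-2A}, $$ which depends only on $X$, $y$, and $A$. This is only a sketch of the heuristic; the precise cutoff used in the paper may be stated differently.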

This reasoning applies to the case when $\mu_n X \to \infty$ as well, and gives the same point. Coincidentally, the actual $\alpha$ and $\beta$ values we proved for $\mu_n X \to \infty$ perfectly cancel in practice, so this limiting argument is not necessary — but it does still apply!

I suppose it might be possible to add another parameter to tune in the final result — a parameter measuring deviation from the heuristic, that can be refined for any particular error bound in a region of particular interest.

But we haven’t done that.

In fact, we were slightly lossy in how we bounded $I_k^{(k)}(t)$ as $t \to 0$, and (for complicated reasons that I’ll probably also forget and reprove to myself later) the heuristic choice assuming $k \sim \infty$ and our slightly lossy bound introduce the same order of imprecision to the final result.

We’re updating our preprint and will have that up soon. But as I’ve been thinking about this a lot recently, I realize there are a few other things I should note down. I intend to write more on this in the near future.

]]>