## A balancing act in “Uniform bounds for lattice point counting”

I was recently examining a technical hurdle in my project on “Uniform bounds for lattice point counting and partial sums of zeta functions” with Takashi Taniguchi and Frank Thorne. There is a version on the arXiv, but it currently has a mistake in its handling of bounds for small $X$.

In this note, I describe an aspect of this paper that I found surprising. In fact, I’ve found it continually surprising, as I’ve reproven it to myself three times now, I think. By writing this here and in my note system, I hope to perhaps remember this better.

## Landau’s Method

In this paper, we revisit an application of “Landau’s Method” to estimate partial sums of coefficients of Dirichlet series. We model this paper off of an earlier application by Chandrasekharan and Narasimhan, except that we explicitly track dependence of the several implicit constants and we prove these results uniformly for all partial sums, as opposed to sufficiently large partial sums.

The only structure is that we have a Dirichlet series $\phi(s)$, some Gamma factors $\Delta(s)$, and a functional equation of the shape $$ \phi(s) \Delta(s) = \psi(s) \Delta(1-s). $$ This is relatively structureless, and correspondingly our attack is very general. We use some smoothed approximation to the sum of coefficients, shift lines of integration to pick up polar main terms, apply the functional equation and change variables to work with the dual, and then get some collection of error terms and error integrals.

It happens to be that it’s much easier to work with a $k$-Riesz smoothed approximation. That is, if $$

\phi(s) = \sum_{n \geq 1} \frac{a(n)}{\lambda_n^s}

$$ is our Dirichlet series, and we are interested in the partial sums $$

A_0(X) = \sum_{\lambda_n \leq X} a(n),

$$ then it happens to be easier to work with the smoothed approximations $$

A_k(X) = \frac{1}{\Gamma(k+1)}\sum_{\lambda_n \leq X} a(n) (X - \lambda_n)^k,

$$ and to somehow combine several of these smoothed sums together.
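To make the definition concrete, here is a minimal sketch that evaluates $A_k(X)$ exactly for integer $k \geq 0$. The function name `riesz_sum` and the toy data ($a(n) = 1$, $\lambda_n = n$) are my own choices for illustration and are not taken from the paper.

```python
from fractions import Fraction
from math import factorial


def riesz_sum(k, X, terms):
    """Return A_k(X) = (1/k!) * sum over lambda_n <= X of a(n) * (X - lambda_n)^k.

    `terms` is an iterable of (lambda_n, a_n) pairs and k is a nonnegative
    integer, so Gamma(k + 1) = k!. Exact rational arithmetic avoids rounding.
    """
    total = Fraction(0)
    for lam, a in terms:
        if lam <= X:
            # For k = 0 the weight is 1 (also when lam == X), giving the sharp sum.
            total += a * Fraction(X - lam) ** k
    return total / factorial(k)


# Toy data (an assumption for illustration): a(n) = 1 and lambda_n = n.
toy_terms = [(n, 1) for n in range(1, 101)]

print(riesz_sum(0, 10, toy_terms))  # 10
print(riesz_sum(1, 10, toy_terms))  # 45
print(riesz_sum(2, 10, toy_terms))  # 285/2
```

With $k = 0$ this is just the sharp count, and larger $k$ damps the terms with $\lambda_n$ near $X$, which is the smoothing being exploited.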

This smoothed sum is recognizable as $$

A_k(X) =

\frac{1}{2\pi i}\int_{c - i\infty}^{c + i\infty} \phi(s)

\frac{\Gamma(s)}{\Gamma(s + k + 1)} X^{s + k}ds

$$ for $c$ somewhere in the half-plane of convergence of the Dirichlet series. As $k$ gets large, these integrals become better behaved. In application, one takes $k$ sufficiently large to guarantee desired convergence properties.

The process of taking several of these smoothed approximations for large $k$ together, studying them through basic functional equation methods, and combinatorially combining these smoothed approximations via finite differencing to get good estimates for the sharp sum $A_0(X)$ is roughly what I think of as “Landau’s Method”.

## Application and shape of the error

In our paper, as we apply Landau’s method, it becomes necessary to understand certain bounds coming from the dual Dirichlet series $$

\psi(s) = \sum_{n \geq 1} \frac{b(n)}{\mu_n^s}.

$$ Specifically, it works out that the (combinatorially finite differenced) difference between the $k$-smoothed sum $A_k(X)$ and its $k$-smoothed main term $S_k(X)$ can be written as $$

\Delta_y^k [A_k(X) - S_k(X)] = \sum_{n \geq 1}

\frac{b(n)}{\mu_n^{\delta + k}} \Delta_y^k I_k(\mu_n X),\tag{1}

$$ where $\Delta_y^k$ is a *finite differencing operator* that we should think of as a sum of several shifts of its input function.

More precisely, $\Delta_y F(X) := F(X + y) - F(X)$, and iterating gives $$

\Delta_y^k F(X) = \sum_{j = 0}^k (-1)^{k - j} {k \choose j} F(X + jy).

$$ The $I_k(\cdot)$ term on the right of $(1)$ is an inverse Mellin transform $$

I_k(t) = \frac{1}{2 \pi i} \int_{c - i\infty}^{c + i\infty}

\frac{\Gamma(\delta - s)}{\Gamma(k + 1 + \delta - s)}

\frac{\Delta(s)}{\Delta(\delta - s)} t^{\delta + k - s} ds.

$$ Good control for this inverse Mellin transform yields good control of the error for the overall approximation. Via the method of finite differencing, there are two basic choices: either bound $I_k(t)$ directly, or understand bounds for $(\mu_n y)^k I_k^{(k)}(t)$ for $t \approx \mu_n X$. Here, $I_k^{(k)}(t)$ means the $k$th derivative of $I_k(t)$.
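The finite differencing operator $\Delta_y^k$ is easy to experiment with numerically. The sketch below (the function names are mine, not from the paper) implements the binomial closed form and checks it against literally iterating the one-step difference. As a sanity check, it uses the standard fact that $\Delta_y^k$ applied to $t^k$ gives $k!\, y^k$.

```python
from math import comb


def finite_difference(F, X, y, k):
    """Closed form: sum_{j=0}^{k} (-1)^(k-j) * C(k, j) * F(X + j*y)."""
    return sum((-1) ** (k - j) * comb(k, j) * F(X + j * y) for j in range(k + 1))


def iterated_difference(F, X, y, k):
    """Apply the one-step difference F(X + y) - F(X) a total of k times."""
    if k == 0:
        return F(X)
    return iterated_difference(lambda t: F(t + y) - F(t), X, y, k - 1)


def cube(t):
    return t ** 3


print(finite_difference(cube, 5, 2, 3))    # 48, which is 3! * 2^3
print(iterated_difference(cube, 5, 2, 3))  # 48
```

The same monomial computation is a small-scale version of why a factor like $(\mu_n y)^k$ accompanies the $k$th derivative when one bounds $\Delta_y^k I_k$ through $I_k^{(k)}$.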

## Large input errors

In the classical application (as in the paper of CN), one worries about this asymptotic mostly as $t \to \infty$. In this region, $I_k(t)$ can be well-approximated by a $J$-Bessel function, which is sufficiently well understood in large argument to give good bounds. Similarly, $I_k^{(k)}(t)$ can be contour-shifted in a way that still ends up being well-approximated by $J$-Bessel functions.

The shape of the resulting bounds ends up being that $\Delta_y^k I_k(\mu_n X)$ is bounded by either

- $(\mu_n X)^{\alpha + k(1 - \frac{1}{2A})}$, where $A$ is a fixed parameter that isn’t worth describing fully, and $\alpha$ is a bound coming from the direct bound of $I_k(t)$, or
- $(\mu_n y)^k (\mu_n X)^\beta$, where $\beta$ is a bound coming from bounding $I_k^{(k)}(t)$.

In both, there is a certain $k$-dependence that comes from the $k$-th Riesz smoothing factors, either directly (from $(\mu_n y)^k$), or via its corresponding inverse Mellin transform (in the bound from $I_k(t)$). But these are the only aspects that depend on $k$.

At this point in the classical argument, one determines when one bound is better than the other, and this happens to be something that can be done exactly, and (surprisingly) independently of $k$. Using this pair of bounds and examining what comes out the other side gives the original result.

## Small input errors

In our application, we also worry about asymptotics as $t \to 0$. While it may still be true that $I_k$ can be approximated by a $J$-Bessel function, the “well-known” asymptotics for the $J$-Bessel function behave substantially worse for small argument. Thus different methods are necessary.

It turns out that $I_k$ can be approximated in a relatively trivial way for $t \leq 1$, so the only remaining hurdle is $I_k^{(k)}(t)$ as $t \to 0$.

We’ve proved a variety of different bounds that hold in slightly different circumstances. And for each sort of bound, the next steps would be the same as before: determine when each bound is better, bound by absolute values, sum together, and then choose the various parameters to best shape the final result.

But unlike before, the boundary between the regions where $I_k$ is best bounded directly or bounded via $I_k^{(k)}$ depends on $k$. And there is no canonical choice of $k$: aside from taking $k$ large enough for convergence (which depends on the locations of poles and the growth of the Dirichlet series and gamma factors), any sufficiently large $k$ would suffice.

## Limiting behavior gives a heuristic region

After I step away from this paper and argument for a while and come back, I wonder about the right way to choose the balancing error. That is, I rework when to use bounds coming from studying $I_k(t)$ directly vs bounds coming from studying $I_k^{(k)}(t)$.

But it turns out that there is always a reasonable heuristic choice. Further, this heuristic gives the same choice of balancing as in the case when $t \to \infty$ (although this is not the source of the heuristic).

Making these bounds will still give bounds for $\Delta_y^k I_k(\mu_n X)$ of shape

- $(\mu_n X)^{\alpha + k(1 - \frac{1}{2A})}$, where $A$ is a fixed parameter that isn’t worth describing fully, and $\alpha$ is a bound coming from the direct bound of $I_k(t)$, or
- $(\mu_n y)^k (\mu_n X)^\beta$, where $\beta$ is a bound coming from bounding $I_k^{(k)}(t)$.

The actual bounds for $\alpha$ and $\beta$ will differ between the case of small $\mu_n X$ and large $\mu_n X$ ($J$-Bessel asymptotics for large, different contour shifting analysis for small), but in both cases it turns out that $\alpha$ and $\beta$ are independent of $k$.

This is relatively easy to see when bounding $I_k^{(k)}(t)$, as repeatedly differentiating under the integral shows essentially that $$

I_k^{(k)}(t) =

\frac{1}{2\pi i}

\int \frac{\Delta(s)}{(\delta - s)\Delta(\delta - s)}

t^{\delta - s} ds.

$$ (I’ll note that the contour does vary with $k$ in a certain way that doesn’t affect the shape of the result for $t \to 0$).
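To spell out where the $k$ goes (a formal computation, assuming that differentiating under the integral is justified and ignoring the choice of contour), the power of $t$ differentiates as $$

\frac{d^k}{dt^k} t^{\delta + k - s}
= (\delta + k - s)(\delta + k - 1 - s) \cdots (\delta + 1 - s) \, t^{\delta - s}
= \frac{\Gamma(k + 1 + \delta - s)}{\Gamma(\delta + 1 - s)} \, t^{\delta - s}.

$$ The factor $\Gamma(k + 1 + \delta - s)$ cancels against the denominator in the definition of $I_k(t)$, and what remains of the Gamma quotient is $$

\frac{\Gamma(\delta - s)}{\Gamma(\delta + 1 - s)} = \frac{1}{\delta - s},

$$ so the integrand of $I_k^{(k)}(t)$ has no visible dependence on $k$.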

When balancing the error terms $(\mu_n X)^{\alpha + k(1 - \frac{1}{2A})}$ and $(\mu_n y)^k (\mu_n X)^\beta$, the heuristic comes from taking arbitrarily large $k$. As $k \to \infty$, the point where the two error terms balance is independent of $\alpha$ and $\beta$.
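Here is the short computation behind that claim, treating the two bounds as exact quantities and assuming $A > 0$ and $\mu_n X > 0$. Setting the two bounds equal and taking $k$th roots gives $$

\mu_n y = (\mu_n X)^{1 - \frac{1}{2A} + \frac{\alpha - \beta}{k}}.

$$ As $k \to \infty$ the term $\frac{\alpha - \beta}{k}$ disappears, leaving $\mu_n y = (\mu_n X)^{1 - \frac{1}{2A}}$. Solving for $\mu_n$ gives $$

\mu_n^{\frac{1}{2A}} = \frac{X^{1 - \frac{1}{2A}}}{y},
\qquad \text{that is,} \qquad
\mu_n = X^{2A - 1} y^{-2A}.

$$ In particular, if $\alpha = \beta$ then the correction term vanishes for every $k$, and the balance point is exactly $X^{2A - 1} y^{-2A}$ without taking any limit.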

This reasoning applies to the case when $\mu_n X \to \infty$ as well, and gives the same point. Coincidentally, the actual $\alpha$ and $\beta$ values we proved for $\mu_n X \to \infty$ perfectly cancel in practice, so this limiting argument is not necessary — but it does still apply!

I suppose it might be possible to add another parameter to tune in the final result — a parameter measuring deviation from the heuristic, that can be refined for any particular error bound in a region of particular interest.

But we haven’t done that.

In fact, we were slightly lossy in how we bounded $I_k^{(k)}(t)$ as $t \to 0$, and (for complicated reasons that I’ll probably also forget and reprove to myself later) the heuristic choice assuming $k \sim \infty$ and our slightly lossy bound introduce the same order of imprecision to the final result.

## More coming soon

We’re updating our preprint and will have that up soon. But as I’ve been thinking about this a lot recently, I realize there are a few other things I should note down. I intend to write more on this in the near future.