The Erdös problem site was created last year, and announced earlier this year on this blog. Every so often, I have taken a look at a random problem from the site for fun. A few times, I was able to make progress on one of the problems, leading to a couple of papers; but the more common outcome is that I play around with the problem for a while, see why it is difficult, and then eventually give up and do something else. But, as is common in this field, I don't usually make public the observations made along the way, so the next person who looks at the same problem will likely have to go through the same process of trial and error to work out the main obstructions that are present.
So, as an experiment, I thought I would record here my preliminary observations on one such problem – Erdös problem #385 – to discuss why it looks difficult to solve with our current understanding of the primes. Here is the problem:
Problem 1 (Erdös Problem #385) Let
$$F(n) := \# \{ k \geq 1: n+k \hbox{ composite and } p(n+k) > k \},$$
where $p(m)$ is the least prime divisor of $m$. Is it true that $F(n) \geq 1$ for all sufficiently large $n$? Does $F(n) \to \infty$ as $n \to \infty$?
This problem is mentioned on page 73 of this 1979 paper of Erdös (where he attributes the problem to an unpublished work of Eggleton, Erdös, and Selfridge that, to my knowledge, has never actually appeared), as well as briefly on page 92 of this 1980 paper of Erdös and Graham.
At first glance, this looks like a somewhat arbitrary problem (as many of Erdös’s problems initially do), as the function is not obviously related to any other well-known function or problem. However, it turns out that this problem is closely related to the parity barrier in sieve theory (as discussed in this previous post), with the possibility of Siegel zeroes presenting a particular obstruction. I suspect that Erdös was well aware of this connection; certainly he mentions the relation with questions on gaps between primes (or almost primes), which is in turn connected to the parity problem and Siegel zeroes (as is discussed recently in my paper with Banks and Ford, and in more depth in these papers of Ford and of Granville).
Let us now explore the problem further. Let us call a natural number $n$ bad if $F(n) = 0$, so the first part of the problem is asking whether there exist arbitrarily large bad numbers. We unpack the definitions: $n$ is bad if and only if $p(n+k) \leq k$ for any composite $n+k$, so placing $k$ in dyadic intervals of the form $[y, 2y)$ we are asking to show that
$$n + k \hbox{ composite} \implies k \equiv -n \pmod{p} \hbox{ for some prime } p \leq k$$
for all $k \in [y, 2y)$ and all scales $y$.
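To get a feel for the problem, one can compute $F$ numerically. The following sketch (in Python, and assuming the reconstruction of $F$ stated above; the search range is arbitrary) exploits the fact that any composite $m$ has $p(m) \leq \sqrt{m}$, so only $k = O(\sqrt{n})$ can contribute:

```python
from sympy import isprime, primefactors

def F(n: int) -> int:
    """Count k >= 1 with n+k composite and p(n+k) > k, where p(m) is the
    least prime factor of m.  Any composite m has p(m) <= sqrt(m), so only
    k with k*k <= n+k can contribute, making the loop finite."""
    count, k = 0, 1
    while k * k <= n + k:
        m = n + k
        if not isprime(m) and min(primefactors(m)) > k:
            count += 1
        k += 1
    return count

# Search for "bad" n (those with F(n) = 0) in a modest range.
print([n for n in range(2, 2000) if F(n) == 0])
```

Note that the $k=1$ term contributes whenever $n+1$ is composite, so any bad $n$ must in particular have $n+1$ prime, which already hints at the connection with primes in short intervals.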
It is now natural to try to understand this problem for a specific choice of interval as a function of $y$. If $y$ is large in the sense that $y \geq \sqrt{2n}$, then the claimed covering property is automatic, since every composite number less than or equal to $2n$ has a prime factor less than or equal to $\sqrt{2n} \leq y \leq k$. On the other hand, for $y$ very small, in particular $y = O(\log n)$, it is also possible to find $n$ with this property. Indeed, if one takes $n$ to lie in a suitable residue class modulo the primorial $\prod_{p \leq 2y} p$, then we see that the residue classes $k \equiv -n \pmod{p}$ cover all of $[y, 2y)$ except for a single element $k_0$, and from Linnik's theorem we can ensure that $n + k_0$ is prime. Thus, to rule out bad numbers, we need to understand the covering problem at intermediate scales $\log n \ll y \ll n^{1/2}$.
A key case is when $y = n^{1/2-\varepsilon}$ for some small $\varepsilon > 0$. Here, the residue classes $k \equiv -n \pmod{p}$ for $p \leq y$ sieve out everything in $[y, 2y)$ except for primes and semiprimes, and specifically the semiprimes that are the product of two primes between $n^{1/2-\varepsilon}$ and $n^{1/2+\varepsilon}$. If one can show for some $\varepsilon > 0$ that the largest gap between semiprimes in say $[x, 2x]$ with prime factors in $[x^{1/2-\varepsilon}, x^{1/2+\varepsilon}]$ is $o(x^{1/2-\varepsilon})$, then this would affirmatively answer the first part of this problem (and also the second). This is certainly very plausible – it would follow from a semiprime version of the Cramér conjecture (and this would also make a more precise prediction for the asymptotic size of $F(n)$) – but remains well out of reach for now. Even assuming the Riemann hypothesis, the best upper bound on prime gaps in $[x, 2x]$ is $O(x^{1/2} \log x)$, and the best upper bound on semiprime gaps is not significantly better than this – in particular, one cannot reach $o(x^{1/2-\varepsilon})$ for any $\varepsilon > 0$. (There is a remote possibility that an extremely delicate analysis near $y = n^{1/2}$, together with additional strong conjectures on the zeta function, such as a sufficiently quantitative version of the GUE hypothesis, may barely be able to resolve this problem, but I am skeptical of this, absent some further major breakthrough in analytic number theory.)
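As a small illustration of the objects involved, here is a hypothetical numerical probe (the range, and the choice of the balance parameter delta, are placeholders, not tuned to the problem) that lists the "balanced" semiprimes in an interval and measures their largest gap:

```python
from sympy import factorint

def balanced_semiprimes(x: int, delta: float):
    """Semiprimes m in [x, 2x] (two prime factors, counted with
    multiplicity) whose prime factors lie in [x^(1/2-delta), x^(1/2+delta)]."""
    lo, hi = x ** (0.5 - delta), x ** (0.5 + delta)
    out = []
    for m in range(x, 2 * x + 1):
        f = factorint(m)
        if sum(f.values()) == 2 and all(lo <= p <= hi for p in f):
            out.append(m)
    return out

x, delta = 10**5, 0.05
s = balanced_semiprimes(x, delta)
gaps = [b - a for a, b in zip(s, s[1:])]
print(len(s), max(gaps) if gaps else None)
```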
Given that multiplicative number theory does not seem powerful enough (even on RH) to resolve these problems, the other main approach would be to use sieve theory. In this theory, we do not really know how to exploit the specific location of the interval or the specific congruence classes used, so one can study the more general problem of trying to cover an interval $I$ of length $y$ by one residue class $a_p \pmod{p}$ for each prime $p \leq y$, and only leaving a small number of survivors which could potentially be classified as "primes". The discussion of the small $y$ case already reveals a problem with this level of generality: one can sieve out an interval of length $y$ by the residue classes $a_p \pmod{p}$ for $p \leq y$, and leave only one survivor. Indeed, thanks to known bounds on Jacobsthal's function, one can be more efficient than this; for instance, using equation (1.2) from this paper of Ford, Green, Konyagin, Maynard, and myself, it is possible to completely sieve out any interval of sufficiently large length $y$ using only those primes $p$ up to $o(y)$. On the other hand, from the work of Iwaniec, we know that sieving up to $y^{1/2-\varepsilon}$ is insufficient to completely sieve out such an interval; related to this, if one only sieves up to $y^{1/u}$ for some $u \geq 2$, the linear sieve (see e.g., Theorem 2 of this previous blog post) shows that one must have at least
$$(f(u) + o(1))\ y \prod_{p \leq y^{1/u}} \left(1 - \frac{1}{p}\right)$$
survivors, where $f(u)$ can be given explicitly in the regime $2 \leq u \leq 4$ by the formula
$$f(u) = \frac{2 e^{\gamma} \log(u-1)}{u}.$$
These lower bounds are not believed to be best possible. For instance, the Maier–Pomerance conjecture on Jacobsthal's function would indicate that one needs to sieve out primes up to $y^{1-o(1)}$ in order to completely sieve out an interval of length $y$, and it is also believed that sieving up to $y^{1/u}$ for fixed $u$ should leave $\gg y/\log y$ survivors, although even these strong conjectures are not enough to positively resolve this problem, since we are permitted to sieve all the way up to $y$ (and we are allowed to leave every prime number as a survivor, which in view of the Brun–Titchmarsh theorem could permit as many as $O(y/\log y)$ survivors).
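The following quick sketch (the value of $n$ is random and the scales are arbitrary) runs the covering experiment for the specific classes $k \equiv -n \pmod{p}$ arising in this problem, and counts survivors at various scales:

```python
import random
from sympy import primerange

def survivors(n: int, y: int):
    """Sieve k in [y, 2y) by the residue classes k = -n (mod p) for primes
    p <= y, and return the k that survive (a model for the covering
    problem discussed above)."""
    ks = set(range(y, 2 * y))
    for p in primerange(2, y + 1):
        r = (-n) % p
        first = y + ((r - y) % p)          # least k >= y with k = r (mod p)
        ks -= set(range(first, 2 * y, p))
    return sorted(ks)

n = random.randrange(10**8, 2 * 10**8)
for y in (100, 1000, 10000):
    print(y, len(survivors(n, y)))
```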
Unfortunately, as discussed in this previous blog post, the parity problem blocks such improvements from taking place from most standard analytic number theory methods, in particular sieve theory. A particularly dangerous enemy arises from Siegel zeroes. This is discussed in detail in the papers of Ford and of Granville mentioned previously, but an informal discussion is as follows. If there is a Siegel zero associated to the quadratic character $\chi$ of some conductor $q$, this roughly speaking means that almost all primes $p$ (in certain ranges) will be quadratic non-residues mod $q$: $\chi(p) = -1$. In particular, if one restricts attention to numbers $m$ in a residue class $a \pmod{q}$ that is a quadratic residue, we then expect most numbers in this class to have an even number of prime factors, rather than an odd number.
This alters the effect of sieving in such residue classes. Consider for instance the classical sieve of Eratosthenes. If one sieves out the residue class $0 \pmod{p}$ for each prime $p \leq \sqrt{x}$, the sieve of Eratosthenes tells us that the surviving elements of $[\sqrt{x}, x]$ are simply the primes between $\sqrt{x}$ and $x$, of which there are about $x/\log x$ many. However, if one restricts attention to a quadratic residue class $a \pmod{q}$ (and taking $x$ to be somewhat large compared to $q$), then by the preceding discussion, this eliminates most primes, and so now sieving out $0 \pmod{p}$ for each $p \leq \sqrt{x}$ should leave almost no survivors. Shifting this example by $a$ and then dividing by $q$, one can end up with an example of an interval $I$ of length $y$ that can be sieved by residue classes $a_p \pmod{p}$ for each small prime $p$ in such a manner as to leave almost no survivors (in particular, $o(y/\log y)$ many). In the presence of a Siegel zero, it seems quite difficult to prevent this scenario from "infecting" the above problem, creating a bad scenario in which for all scales $y$ between $\log n$ and $n^{1/2}$, the residue classes $k \equiv -n \pmod{p}$ for $p \leq y$ already eliminate almost all elements of $[y, 2y)$, leaving it mathematically possible for the remaining survivors to either be prime, or eliminated by the remaining residue classes $k \equiv -n \pmod{p}$ for $y < p \leq \sqrt{2n}$.
Because of this, I suspect that it will not be possible to resolve this Erdös problem without a major breakthrough on the parity problem that (at a bare minimum) is enough to exclude the possibility of Siegel zeroes existing. (But it is not clear at all that Siegel zeroes are the only "enemy" here, so absent a major advance in "inverse sieve theory", one cannot simply assume GRH to run away from this problem.)
— 0.1. Addendum: heuristics for Siegel zero scenarios —
This post also provides a good opportunity to refine some heuristics I had previously proposed regarding Siegel zeroes and their impact on various problems in analytic number theory. In this previous blog post, I wrote
"The parity problem can also sometimes be overcome when there is an exceptional Siegel zero … [this] suggests that to break the parity barrier, we may assume without loss of generality that there are no Siegel zeroes."
On the other hand, it was pointed out in a more recent article of Granville that (as with the current situation), Siegel zeroes can sometimes serve to enforce the parity barrier, rather than overcome it, and he responds to my previous statement with the comment "this claim needs to be treated with caution, since its truth depends on the context".
I actually agree with Granville here, and I propose here a synthesis of the two situations. In the absence of a Siegel zero, standard heuristic models in analytic number theory (such as the ones discussed in this post) typically suggest that a given quantity $X$ of interest in number theory (e.g., the number of primes in a certain set) obeys an asymptotic law of the form
$$X = \hbox{(main term)} + \hbox{(lower order error)}.$$
However, the presence of a Siegel zero tends to "magnetize" the error term by pulling most of the fluctuations in a particular direction. In many situations, what this means is that one can obtain a refined asymptotic of the form
$$X = \hbox{(main term)} + \hbox{(Siegel correction term)} + \hbox{(lower order error)}.$$
The implications of this refined asymptotic then depend rather crucially on how the Siegel correction term is aligned with the main term, and also on whether it is of comparable order or lower order. In many situations (particularly those concerning "average case" problems, in which one wants to understand the behavior for typical choices of parameters), the Siegel correction term ends up being lower order, and so one ends up with the situation described in my initial blog post, where we are able to get the predicted asymptotic in the Siegel zero case. However, as pointed out by Granville, there are other situations (particularly those involving "worst case" problems, in which some key parameter can be chosen adversarially) in which the Siegel correction term can align to completely cancel (or to highly reinforce) the main term. In such cases, the Siegel zero becomes a very concrete manifestation of the parity barrier, rather than a means to avoid it. (There is a tiny chance that there may be some sort of "repulsion" phenomenon in which having no semiprimes in $n + [y, 2y)$ for one value of $y$ somehow generates semiprimes in $n + [y', 2y')$ for another value of $y'$, which would allow one to solve the problem without having to directly address the Siegel issue, but I don't see how two such intervals could "communicate" in order to achieve such a repulsion effect.)
Joni Teräväinen and I have just uploaded to the arXiv our preprint “The Hardy–Littlewood–Chowla conjecture in the presence of a Siegel zero“. This paper is a development of the theme that certain conjectures in analytic number theory become easier if one makes the hypothesis that Siegel zeroes exist; this places one in a presumably “illusory” universe, since the widely believed Generalised Riemann Hypothesis (GRH) precludes the existence of such zeroes, yet this illusory universe seems remarkably self-consistent and notoriously impossible to eliminate from one’s analysis.
For the purposes of this paper, a Siegel zero is a zero $\beta$ of a Dirichlet $L$-function $L(\cdot, \chi)$ corresponding to a primitive quadratic character $\chi$ of some conductor $q_\chi$, which is close to $s = 1$ in the sense that
$$\beta = 1 - \frac{1}{\eta \log q_\chi} \qquad (1)$$
for some large $\eta$.
One of the early influential results in this area was the following result of Heath-Brown, which I previously blogged about here:
Theorem 1 (Hardy-Littlewood assuming Siegel zero) Let $k$ be a fixed natural number. Suppose one has a Siegel zero (1) associated to some conductor $q_\chi$. Then we have
$$\sum_{n \leq x} \Lambda(n) \Lambda(n + 2k) = \left( \mathfrak{S} + O\left( \frac{1}{\log^c \eta} \right) \right) x$$
for all $q_\chi^{250} \leq x \leq q_\chi^{300}$ and some absolute constant $c > 0$, where $\Lambda$ is the von Mangoldt function and $\mathfrak{S}$ is the singular series
$$\mathfrak{S} := 2 \Pi_2 \prod_{p | k; p > 2} \frac{p-1}{p-2}, \qquad \Pi_2 := \prod_{p > 2} \left( 1 - \frac{1}{(p-1)^2} \right).$$
In particular, Heath-Brown showed that if there are infinitely many Siegel zeroes, then there are also infinitely many twin primes, with the correct asymptotic predicted by the Hardy-Littlewood prime tuple conjecture at infinitely many scales.
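As an aside, the Hardy-Littlewood prediction appearing in Theorem 1 is easy to test numerically at small scales (with no Siegel zero input, of course). Here is a minimal sketch for the twin prime case, comparing $\sum_{n \leq x} \Lambda(n) \Lambda(n+2)$ against $\mathfrak{S} x = 2 \Pi_2 x$; the cutoffs are arbitrary, and prime powers are omitted as their contribution is negligible:

```python
import math
from sympy import primerange, isprime

# Compare sum_{n <= x} Lambda(n) Lambda(n+2) with S*x, where
# S = 2 prod_{p>2} (1 - 1/(p-1)^2) is the twin prime singular series.
x = 10**6
S = 2.0
for p in primerange(3, 10**5):
    S *= 1 - 1 / (p - 1) ** 2

total = sum(math.log(p) * math.log(p + 2)
            for p in primerange(2, x) if isprime(p + 2))
print(total / x, S)
```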
Very recently, Chinis established an analogous result for the Chowla conjecture (building upon earlier work of Germán and Katai):
Theorem 2 (Chowla assuming Siegel zero) Let $h_1, \dots, h_\ell$ be distinct fixed natural numbers. Suppose one has a Siegel zero (1) associated to some conductor $q_\chi$. Then one has
$$\sum_{n \leq x} \lambda(n + h_1) \cdots \lambda(n + h_\ell) = o(x)$$
in the range $q_\chi^{250} \leq x \leq q_\chi^{300}$, where $\lambda$ is the Liouville function.
In our paper we unify these results and also improve the quantitative estimates and range of $x$:
Theorem 3 (Hardy-Littlewood-Chowla assuming Siegel zero) Let $h_1, \dots, h_k, h'_1, \dots, h'_\ell$ be distinct fixed natural numbers with $k \leq 2$. Suppose one has a Siegel zero (1) associated to some conductor $q_\chi$. Then one has
$$\sum_{n \leq x} \Lambda(n + h_1) \cdots \Lambda(n + h_k) \lambda(n + h'_1) \cdots \lambda(n + h'_\ell) = \left( \mathfrak{S} + O\left( \frac{1}{\log^c \eta} \right) \right) x$$
for $q_\chi^{250 + \varepsilon} \leq x \leq q_\chi^{O(1)}$ for any fixed $\varepsilon > 0$, where $\mathfrak{S}$ is a certain singular series depending on $h_1, \dots, h_k, h'_1, \dots, h'_\ell$ and $c > 0$ is absolute.
Our argument proceeds by a series of steps in which we replace $\Lambda$ and $\lambda$ by more complicated looking, but also more tractable, approximations, until the correlation is one that can be computed in a tedious but straightforward fashion by known techniques. More precisely, the steps are as follows (a toy numerical version of step (i) is sketched after the list):
- (i) Replace the Liouville function $\lambda$ with an approximant $\tilde \lambda$, which is a completely multiplicative function that agrees with $\lambda$ at small primes and agrees with $\chi$ at large primes.
- (ii) Replace the von Mangoldt function $\Lambda$ with an approximant $\tilde \Lambda$, which is the Dirichlet convolution $\chi * 1$ multiplied by a Selberg sieve weight $\nu$ to essentially restrict that convolution to almost primes.
- (iii) Replace $\tilde \lambda$ with a more complicated truncation $\tilde \lambda^\sharp$ which has the structure of a "Type I sum", and which agrees with $\tilde \lambda$ on numbers that have a "typical" factorization.
- (iv) Replace the approximant $\tilde \Lambda$ with a more complicated approximant $\tilde \Lambda^\sharp$ which has the structure of a "Type I sum".
- (v) Now that all terms in the correlation have been replaced with tractable Type I sums, use standard Euler product calculations and Fourier analysis, similar in spirit to the proof of the pseudorandomness of the Selberg sieve majorant for the primes in this paper of Ben Green and myself, to evaluate the correlation to high accuracy.
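Here is the promised toy version of the approximant in step (i). This is a sketch only: the name lam_tilde, the cutoff z and the modulus q below are illustrative placeholders rather than the actual choices in the paper, and the character is taken to be a Jacobi symbol for concreteness:

```python
from sympy import factorint, jacobi_symbol

def lam_tilde(n: int, z: int, q: int) -> int:
    """Completely multiplicative function agreeing with the Liouville
    function (-1 at every prime) at primes p <= z, and with the real
    character chi(p) = (p|q) (a Jacobi symbol) at primes p > z."""
    v = 1
    for p, e in factorint(n).items():
        v *= (-1 if p <= z else jacobi_symbol(p, q)) ** e
    return v

def lam(n: int) -> int:
    """The Liouville function lambda(n)."""
    return (-1) ** sum(factorint(n).values())

# Compare a two-point correlation of the approximant with that of lambda.
z, q, x = 100, 10007, 20000
print(sum(lam(n) * lam(n + 1) for n in range(2, x)) / x,
      sum(lam_tilde(n, z, q) * lam_tilde(n + 1, z, q) for n in range(2, x)) / x)
```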
Steps (i), (ii) proceed mainly through estimates such as (1) and standard sieve theory bounds. Step (iii) is based primarily on estimates on the number of smooth numbers of a certain size.
The restriction $k \leq 2$ in our main theorem is needed only to execute step (iv) of this strategy. Roughly speaking, the Siegel approximant $\tilde \Lambda$ to $\Lambda$ is a twisted, sieved version of the divisor function $\tau$, and the types of correlation one is faced with at the start of step (iv) are a more complicated version of the divisor correlation sum
$$\sum_{n \leq x} \tau(n) \tau(n + h).$$
Step (v) is a tedious but straightforward sieve theoretic computation, similar in many ways to the correlation estimates of Goldston and Yildirim used in their work on small gaps between primes (as discussed for instance here), and then also used by Ben Green and myself to locate arithmetic progressions in primes.
In a recent post I discussed how the Riemann zeta function can be locally approximated by a polynomial, in the sense that for randomly chosen $t \in [T, 2T]$ one has an approximation
$$\zeta\left( \frac{1}{2} + it - \frac{2\pi i z}{\log T} \right) \approx P_t\left( e^{2\pi i z / N} \right) \qquad (1)$$
where $N$ grows slowly with $T$, and $P_t$ is a polynomial of degree $N$. Assuming the Riemann hypothesis (as we will throughout this post), the zeroes of $P_t$ should all lie on the unit circle, and one should then be able to write $P_t$ as a scalar multiple of the characteristic polynomial of (the inverse of) a unitary matrix $U = U_t \in U(N)$, which we normalise as
$$P_t(Z) = c_t \det(1 - ZU). \qquad (2)$$
Here $c_t$ is some quantity depending on $t$. We view $U$ as a random element of $U(N)$; in the limit $T \to \infty$, the GUE hypothesis is equivalent to $U$ becoming equidistributed with respect to Haar measure on $U(N)$ (also known as the Circular Unitary Ensemble, CUE; it is to the unit circle what the Gaussian Unitary Ensemble (GUE) is on the real line). One can also view $U$ as analogous to the "geometric Frobenius" operator in the function field setting, though unfortunately it is difficult at present to make this analogy any more precise (due, among other things, to the lack of a sufficiently satisfactory theory of the "field of one element").
Taking logarithmic derivatives of (2), we have
$$-\frac{P'_t(Z)}{P_t(Z)} = \sum_{j=0}^{\infty} \hbox{tr}(U^{j+1}) Z^j \qquad (3)$$
and hence on taking logarithmic derivatives of (1) in the variable $z$ we (heuristically) have
$$-\frac{2\pi i}{\log T} \frac{\zeta'}{\zeta}\left( \frac{1}{2} + it - \frac{2\pi i z}{\log T} \right) \approx -\frac{2\pi i}{N} \sum_{j=0}^{\infty} \hbox{tr}(U^{j+1}) e^{2\pi i (j+1) z / N}.$$
Morally speaking, we have
$$-\frac{\zeta'}{\zeta}\left( \frac{1}{2} + it - \frac{2\pi i z}{\log T} \right) = \sum_n \frac{\Lambda(n)}{n^{1/2 + it}} e^{2\pi i z \log n / \log T}$$
so on comparing coefficients we expect to interpret the traces as finite Dirichlet series:
$$\hbox{tr}(U^j) \approx -\frac{N}{\log T} \sum_{T^{(j - 1/2)/N} < n \leq T^{(j + 1/2)/N}} \frac{\Lambda(n)}{n^{1/2 + it}}. \qquad (4)$$
To understand the distribution of $U$ in the unitary group $U(N)$, it suffices to understand the distribution of the moments
$${\bf E}_t \prod_{j=1}^{k} \hbox{tr}(U^j)^{a_j} \overline{\hbox{tr}(U^j)}^{b_j} \qquad (5)$$
where ${\bf E}_t$ denotes averaging over $t \in [T, 2T]$, and $k \geq 1$ and $a_1, \dots, a_k, b_1, \dots, b_k \geq 0$ are natural numbers. The GUE hypothesis asserts that in the limit $T \to \infty$, these moments converge to their CUE counterparts
$${\bf E}_{\hbox{CUE}} \prod_{j=1}^{k} \hbox{tr}(U^j)^{a_j} \overline{\hbox{tr}(U^j)}^{b_j} \qquad (6)$$
where $U$ is now drawn uniformly in $U(N)$ with respect to the CUE ensemble, and ${\bf E}_{\hbox{CUE}}$ denotes expectation with respect to that measure.
The moment (6) vanishes unless one has the homogeneity condition
$$\sum_{j=1}^{k} j a_j = \sum_{j=1}^{k} j b_j. \qquad (7)$$
This follows from the fact that for any phase $\theta \in {\bf R}$, $e(\theta) U$ has the same distribution as $U$, where we use the number theory notation $e(\theta) := e^{2\pi i \theta}$.
In the case when the degree $\sum_{j=1}^k j a_j$ is low, we can use representation theory to establish the following simple formula for the moment (6), as evaluated by Diaconis and Shahshahani:
Proposition 1 (Low moments in CUE model) If
$$\sum_{j=1}^{k} j a_j \leq N, \qquad (8)$$
then the moment (6) vanishes unless $a_j = b_j$ for all $j$, in which case it is equal to
$$\prod_{j=1}^{k} j^{a_j} a_j!.$$
Another way of viewing this proposition is that for $U$ distributed according to CUE, the random variables $\hbox{tr}(U^j)$ are distributed like independent complex gaussian random variables of mean zero and variance $j$, as long as one only considers moments obeying (8). This identity definitely breaks down for larger values of $\sum_{j=1}^k j a_j$, so one only obtains central limit theorems in certain limiting regimes, notably when one only considers a fixed number of $\hbox{tr}(U^j)$'s and lets $N$ go to infinity. (The paper of Diaconis and Shahshahani uses a slightly different range of moments here, but I believe this to be a typo.)
Proof: Let $D$ be the left-hand side of (8). We may assume that (7) holds since we are done otherwise, hence
$$D = \sum_{j=1}^{k} j a_j = \sum_{j=1}^{k} j b_j \leq N.$$
Our starting point is Schur-Weyl duality. Namely, we consider the $N^D$-dimensional complex vector space
$$({\bf C}^N)^{\otimes D} = {\bf C}^N \otimes \dots \otimes {\bf C}^N.$$
This space has an action of the product group $S_D \times GL_N({\bf C})$: the symmetric group $S_D$ acts by permutation on the $D$ tensor factors, while the general linear group $GL_N({\bf C})$ acts diagonally on the ${\bf C}^N$ factors, and the two actions commute with each other. Schur-Weyl duality gives a decomposition
$$({\bf C}^N)^{\otimes D} \equiv \bigoplus_\lambda V_\lambda \otimes W_\lambda \qquad (10)$$
where $\lambda$ ranges over Young tableaux of size $D$ with at most $N$ rows, $V_\lambda$ is the $S_D$-irreducible unitary representation corresponding to $\lambda$ (which can be constructed for instance using Specht modules), and $W_\lambda$ is the $GL_N({\bf C})$-irreducible polynomial representation corresponding with highest weight $\lambda$.
Let $\pi \in S_D$ be a permutation consisting of $a_j$ cycles of length $j$ for each $1 \leq j \leq k$ (this is uniquely determined up to conjugation), and let $g \in GL_N({\bf C})$. The pair $(\pi, g)$ then acts on $({\bf C}^N)^{\otimes D}$, with the action on basis elements $e_{i_1} \otimes \dots \otimes e_{i_D}$ given by
$$g e_{i_{\pi^{-1}(1)}} \otimes \dots \otimes g e_{i_{\pi^{-1}(D)}}.$$
The trace of this action can then be computed as
$$\sum_{i_1, \dots, i_D \in \{1, \dots, N\}} \prod_{d=1}^{D} g_{i_d i_{\pi^{-1}(d)}}$$
where $g_{ij}$ is the $ij$ matrix coefficient of $g$. Breaking up into cycles and summing, this is just
$$\prod_{j=1}^{k} \hbox{tr}(g^j)^{a_j}.$$
But we can also compute this trace using the Schur-Weyl decomposition (10), yielding the identity
$$\prod_{j=1}^{k} \hbox{tr}(g^j)^{a_j} = \sum_\lambda \chi_\lambda(\pi) s_\lambda(g) \qquad (11)$$
where $\chi_\lambda$ is the character on $S_D$ associated to $V_\lambda$, and $s_\lambda$ is the character on $GL_N({\bf C})$ associated to $W_\lambda$. As is well known, $s_\lambda(g)$ is just the Schur polynomial of weight $\lambda$ applied to the (algebraic, generalised) eigenvalues of $g$. We can specialise to unitary matrices to conclude that
$$\prod_{j=1}^{k} \hbox{tr}(U^j)^{a_j} = \sum_\lambda \chi_\lambda(\pi) s_\lambda(U)$$
and similarly
$$\prod_{j=1}^{k} \overline{\hbox{tr}(U^j)}^{b_j} = \sum_\lambda \chi_\lambda(\pi') \overline{s_\lambda(U)}$$
where $\pi' \in S_D$ consists of $b_j$ cycles of length $j$ for each $1 \leq j \leq k$. On the other hand, the characters $s_\lambda(U)$ are an orthonormal system on $U(N)$ with the CUE measure. Thus we can write the expectation (6) as
$$\sum_\lambda \chi_\lambda(\pi) \overline{\chi_\lambda(\pi')}. \qquad (12)$$
Now recall that $\lambda$ ranges over all the Young tableaux of size $D$ with at most $N$ rows. But by (8) we have $D \leq N$, and so the condition of having at most $N$ rows is redundant. Hence $\lambda$ now ranges over all Young tableaux of size $D$, which as is well known enumerates all the irreducible representations of $S_D$. One can then use the standard orthogonality properties of characters to show that the sum (12) vanishes if $\pi$, $\pi'$ are not conjugate, and is equal to $D!$ divided by the size of the conjugacy class of $\pi$ (or equivalently, is equal to the size of the centraliser of $\pi$) otherwise. But the latter expression is easily computed to be $\prod_{j=1}^{k} j^{a_j} a_j!$, giving the claim. $\Box$
Example 2 We illustrate the identity (11) when $D = 3$, $N \geq 3$. The Schur polynomials are given as
$$s_{(3)} = \sum_{i \leq j \leq k} z_i z_j z_k$$
$$s_{(2,1)} = \sum_{i \neq j} z_i^2 z_j + 2 \sum_{i < j < k} z_i z_j z_k$$
$$s_{(1,1,1)} = \sum_{i < j < k} z_i z_j z_k$$
where $z_1, \dots, z_N$ are the (generalised) eigenvalues of $U$, and the formula (11) in this case becomes
$$\hbox{tr}(U)^3 = s_{(3)}(U) + 2 s_{(2,1)}(U) + s_{(1,1,1)}(U)$$
$$\hbox{tr}(U^2) \hbox{tr}(U) = s_{(3)}(U) - s_{(1,1,1)}(U)$$
$$\hbox{tr}(U^3) = s_{(3)}(U) - s_{(2,1)}(U) + s_{(1,1,1)}(U).$$
The functions $s_{(3)}, s_{(2,1)}, s_{(1,1,1)}$ are orthonormal on $U(N)$, so the three functions $\hbox{tr}(U)^3, \hbox{tr}(U^2)\hbox{tr}(U), \hbox{tr}(U^3)$ are also orthogonal, and their $L^2$ norms are $\sqrt{6}$, $\sqrt{2}$, and $\sqrt{3}$ respectively, reflecting the size in $S_3$ of the centralisers of the permutations $(1)(2)(3)$, $(12)(3)$, and $(123)$ respectively. If $N$ is instead set to $2$, then the $s_{(1,1,1)}$ terms now disappear (the Young tableau here has too many rows), and the three quantities here now have some non-trivial covariance.
Example 3 Consider the moment ${\bf E}_{\hbox{CUE}} |\hbox{tr}(U^j)|^2$. For $j \leq N$, the above proposition shows us that this moment is equal to $j$. What happens for $j > N$? The formula (12) computes this moment as
$$\sum_\lambda |\chi_\lambda(\pi)|^2$$
where $\pi$ is a cycle of length $j$ in $S_j$, and $\lambda$ ranges over all Young tableaux with size $j$ and at most $N$ rows. The Murnaghan-Nakayama rule tells us that $\chi_\lambda(\pi)$ vanishes unless $\lambda$ is a hook (all but one of the non-zero rows consisting of just a single box; this also can be interpreted as an exterior power representation on the space of vectors in ${\bf C}^j$ whose coordinates sum to zero), in which case it is equal to $+1$ or $-1$ (depending on the parity of the number of non-zero rows). As such we see that this moment is equal to $\min(j, N)$. Thus in general we have
$${\bf E}_{\hbox{CUE}} |\hbox{tr}(U^j)|^2 = \min(j, N). \qquad (13)$$
Now we discuss what is known for the analogous moments (5). Here we shall be rather non-rigorous, in particular ignoring an annoying "Archimedean" issue that the product of the ranges $[T^{(j-1/2)/N}, T^{(j+1/2)/N}]$ and $[T^{(j'-1/2)/N}, T^{(j'+1/2)/N}]$ is not quite the range $[T^{(j+j'-1/2)/N}, T^{(j+j'+1/2)/N}]$ but instead leaks into the adjacent ranges. This issue can be addressed by working in a "weak" sense in which parameters such as $j$ are averaged over fairly long scales, or by passing to a function field analogue of these questions, but we shall simply ignore the issue completely and work at a heuristic level only. For similar reasons we will ignore some technical issues arising from the sharp cutoff of $t$ to the range $[T, 2T]$ (it would be slightly better technically to use a smooth cutoff).
One can morally expand out (5) using (4) as
$$\left( \frac{-N}{\log T} \right)^{A+B} \sum_{n_1, \dots, n_A} \sum_{m_1, \dots, m_B} \frac{\Lambda(n_1) \cdots \Lambda(n_A) \Lambda(m_1) \cdots \Lambda(m_B)}{(n_1 \cdots n_A m_1 \cdots m_B)^{1/2}} {\bf E}_t \left( \frac{m_1 \cdots m_B}{n_1 \cdots n_A} \right)^{it} \qquad (14)$$
where $A := a_1 + \dots + a_k$, $B := b_1 + \dots + b_k$, and for each $1 \leq j \leq k$, $a_j$ of the integers $n_i$ and $b_j$ of the integers $m_{i'}$ are in the range
$$T^{(j-1/2)/N} < n_i, m_{i'} \leq T^{(j+1/2)/N}.$$
Morally, the expectation here is negligible unless
$$m_1 \cdots m_B = \left( 1 + O(1/T) \right) n_1 \cdots n_A \qquad (15)$$
in which case the expectation is of magnitude comparable to one. In particular, if (7) fails (with some room to spare) then the moment (5) should be negligible, which is consistent with the analogous behaviour for the moments (6). Now suppose that (8) holds (with some room to spare). Then $n_1 \cdots n_A$ is significantly less than $T$, so the $1 + O(1/T)$ multiplicative error in (15) becomes an additive error of $o(1)$. On the other hand, because of the fundamental integrality gap – that the integers are always separated from each other by a distance of at least $1$ – this forces the integers $m_1 \cdots m_B$ and $n_1 \cdots n_A$ to in fact be equal:
$$m_1 \cdots m_B = n_1 \cdots n_A. \qquad (16)$$
The von Mangoldt factors $\Lambda(n_1) \cdots \Lambda(n_A) \Lambda(m_1) \cdots \Lambda(m_B)$ effectively restrict the $n_i, m_{i'}$ to be prime (the effect of prime powers is negligible). By the fundamental theorem of arithmetic, the constraint (16) then forces $A = B$, and $m_1, \dots, m_B$ to be a permutation of $n_1, \dots, n_A$, which then forces $a_j = b_j$ for all $j$. For a given $n_1, \dots, n_A$, the number of possible $m_1, \dots, m_B$ is then $\prod_{j=1}^{k} a_j!$, and the expectation in (14) is equal to $1$. Thus this expectation is morally
$$\left( \frac{N}{\log T} \right)^{2A} \prod_{j=1}^{k} a_j! \sum_{n_1, \dots, n_A} \frac{\Lambda(n_1)^2 \cdots \Lambda(n_A)^2}{n_1 \cdots n_A}$$
and using Mertens' theorem this soon simplifies asymptotically to the same quantity in Proposition 1. Thus we see that (morally at least) the moments (5) associated to the zeta function asymptotically match the moments (6) coming from the CUE model in the low degree case (8), thus lending support to the GUE hypothesis. (These observations are basically due to Rudnick and Sarnak, with the degree $2$ case of pair correlations due to Montgomery, and the degree $3$ case due to Hejhal.)
With some rare exceptions (such as those estimates coming from "Kloostermania"), the moment estimates of Rudnick and Sarnak basically represent the state of the art for what is known for the moments (5). For instance, Montgomery's pair correlation conjecture, in our language, is basically the analogue of (13) for the zeta moments (5), thus
$${\bf E}_t |\hbox{tr}(U^j)|^2 \approx \min(j, N) \qquad (17)$$
for all $j$. Montgomery showed this for (essentially) the range $1 \leq j \leq N$ (as remarked above, this is a special case of the Rudnick-Sarnak result), but no further cases of this conjecture are known.
These estimates can be used to give some non-trivial information on the largest and smallest spacings between zeroes of the zeta function, which in our notation corresponds to spacing between eigenvalues of $U$. One such method used today for this is due to Montgomery and Odlyzko and was greatly simplified by Conrey, Ghosh, and Gonek. The basic idea, translated to our random matrix notation, is as follows. Suppose $R(Z)$ is some random polynomial depending on $U$ of degree at most $N$. Let $\lambda_1, \dots, \lambda_N$ denote the eigenvalues of $U$, and let $c > 0$ be a parameter. Observe from the pigeonhole principle that if the quantity
$$\sum_{1 \leq j \leq N} \int_{I_j} |R(e(\theta))|^2\ d\theta \qquad (18)$$
(where $I_j \subset {\bf R}/{\bf Z}$ is the arc of length $c/N$ centred at the phase of $\lambda_j$) exceeds the quantity
$$\int_{{\bf R}/{\bf Z}} |R(e(\theta))|^2\ d\theta \qquad (19)$$
then the arcs $I_j$ cannot all be disjoint, and hence there exists a pair of eigenvalues making an angle of less than $2\pi c/N$ ($c$ times the mean angle separation). Similarly, if the quantity (18) falls below that of (19), then these arcs cannot cover the unit circle, and hence there exists a pair of eigenvalues making an angle of greater than $c$ times the mean angle separation. By judiciously choosing the coefficients of $R$ as functions of the moments $\hbox{tr}(U^j)$, one can ensure that both quantities (18), (19) can be computed by the Rudnick-Sarnak estimates (or estimates of equivalent strength); indeed, from the residue theorem one can write quantities such as (18) as contour integrals of the shape
$$\frac{1}{2\pi i} \int_{|z| = 1 + \varepsilon} R(z) \overline{R}(1/z) \frac{P'_t(z)}{P_t(z)}\ dz$$
for sufficiently small $\varepsilon > 0$, and this can be computed (in principle, at least) using (3) if the coefficients of $R$ are in an appropriate form. Using this sort of technology (translated back to the Riemann zeta function setting), one can show that gaps between consecutive zeroes of zeta are less than $\mu$ times the mean spacing and greater than $\lambda$ times the mean spacing infinitely often for certain explicit $\mu < 1 < \lambda$; the current records are $\mu = 0.50412$ (due to Goldston and Turnage-Butterbaugh) and $\lambda = 3.18$ (due to Bui and Milinovich, who input some additional estimates beyond the Rudnick-Sarnak set, namely the twisted fourth moment estimates of Bettin, Bui, Li, and Radziwill, and using a technique based on Hall's method rather than the Montgomery-Odlyzko method).
It would be of great interest if one could push the upper bound $\mu$ for the smallest gap below $\frac{1}{2}$. The reason for this is that this would then exclude the Alternative Hypothesis that the spacing between zeroes are asymptotically always (or almost always) a non-zero half-integer multiple of the mean spacing, or in our language that the gaps between the phases $\theta_1, \dots, \theta_N$ of the eigenvalues of $U$ are asymptotically always non-zero integer multiples of $\frac{1}{2N}$. The significance of this hypothesis is that it is implied by the existence of a Siegel zero (of conductor a small power of $T$); see this paper of Conrey and Iwaniec. (In our language, what is going on is that if there is a Siegel zero, in which $L(\cdot, \chi)$ has a zero very close to $s = 1$, then $1 * \chi$ behaves like the Kronecker delta, and hence (by the Riemann-Siegel formula) the combined $L$-function $\zeta(s) L(s, \chi)$ will have a polynomial approximation which in our language looks like a scalar multiple of
$$1 + e(\theta) Z^{N'},$$
where $N'$ is comparable to $2N$ and $\theta$ is a phase. The zeroes of this approximation lie on a coset of the $N'^{th}$ roots of unity; the polynomial $P_t$ is a factor of this approximation and hence its zeroes will also lie in this coset, implying in particular that all eigenvalue spacings are multiples of $\frac{1}{N'}$. Taking $N'$ to be essentially $2N$ then gives the claim.)
Unfortunately, the known methods do not seem to break this barrier without some significant new input; already the original paper of Montgomery and Odlyzko observed this limitation for their particular technique (and in fact fall very slightly short, as observed in unpublished work of Goldston and of Milinovich). In this post I would like to record another way to see this, by providing an "alternative" probability distribution to the CUE distribution (which one might dub the Alternative Circular Unitary Ensemble (ACUE)) which is indistinguishable in low moments, in the sense that the expectation (6) for this model also obeys Proposition 1, but for which the phase spacings are always a non-zero multiple of $\frac{1}{2N}$. This shows that if one is to rule out the Alternative Hypothesis (and thus in particular rule out Siegel zeroes), one needs to input some additional moment information beyond Proposition 1. It would be interesting to see if any of the other known moment estimates that go beyond this proposition are consistent with this alternative distribution. (UPDATE: it looks like they are, see Remark 7 below.)
To describe this alternative distribution, let us first recall the Weyl description of the CUE measure on the unitary group $U(N)$ in terms of the distribution of the phases $\theta_1, \dots, \theta_N \in {\bf R}/{\bf Z}$ of the eigenvalues, randomly permuted in any order. This distribution is given by the probability measure
$$\frac{1}{N!} |V(\theta)|^2\ d\theta_1 \cdots d\theta_N \qquad (20)$$
where
$$V(\theta) := \prod_{1 \leq i < j \leq N} (e(\theta_i) - e(\theta_j))$$
is the Vandermonde determinant; see for instance this previous blog post for the derivation of a very similar formula for the GUE distribution, which can be adapted to CUE without much difficulty. To see that this is a probability measure, first observe the Vandermonde determinant identity
$$V(\theta) = \sum_{\pi \in S_N} \hbox{sgn}(\pi) e(\theta \cdot \pi(\rho))$$
where $\theta := (\theta_1, \dots, \theta_N)$, $\cdot$ denotes the dot product, and $\rho := (0, 1, \dots, N-1)$ is the "long word", which implies that (20) is a trigonometric series with constant term $1$; it is also clearly non-negative, so it is a probability measure. One can thus generate a random CUE matrix $U$ by first drawing $(\theta_1, \dots, \theta_N)$ using the probability measure (20), and then generating $U$ to be a random unitary matrix with eigenvalues $e(\theta_1), \dots, e(\theta_N)$.
For the alternative distribution, we first draw $(\theta_1, \dots, \theta_N)$ on the discrete torus $(\frac{1}{2N}{\bf Z}/{\bf Z})^N$ (thus each $e(\theta_j)$ is a $2N^{th}$ root of unity) with probability density function
$$\frac{1}{(2N)^N} \frac{1}{N!} |V(\theta)|^2 \qquad (21)$$
shift by a phase $\alpha \in {\bf R}/{\bf Z}$ drawn uniformly at random, and then select $U$ to be a random unitary matrix with eigenvalues $e(\theta_1 + \alpha), \dots, e(\theta_N + \alpha)$. Let us first verify that (21) is a probability density function. Clearly it is non-negative. It is the linear combination of exponentials of the form $e(\theta \cdot (\pi(\rho) - \pi'(\rho)))$ for $\pi, \pi' \in S_N$. The diagonal contribution $\pi = \pi'$ gives the constant function $\frac{1}{(2N)^N}$, which has total mass one. All of the other exponentials have a frequency $\pi(\rho) - \pi'(\rho)$ that is not a multiple of $2N$ in at least one coordinate, and hence will have mean zero on the discrete torus $(\frac{1}{2N}{\bf Z}/{\bf Z})^N$. The claim follows.
From construction it is clear that the matrix $U$ drawn from this alternative distribution will have all eigenvalue phase spacings be a non-zero multiple of $\frac{1}{2N}$. Now we verify that the alternative distribution also obeys Proposition 1. The alternative distribution remains invariant under rotation by phases, so the claim is again clear when (7) fails. Inspecting the proof of that proposition, we see that it suffices to show that the Schur polynomials $s_\lambda$ with $\lambda$ of size at most $N$ and of equal size remain orthonormal with respect to the alternative measure. That is to say,
$${\bf E}_{\hbox{ACUE}} s_\lambda(U) \overline{s_{\lambda'}(U)} = \delta_{\lambda \lambda'}$$
when $\lambda, \lambda'$ have size equal to each other and at most $N$. In this case the phase $\alpha$ in the definition of $U$ is irrelevant. In terms of eigenvalue measures, we are then reduced to showing that
$$\frac{1}{(2N)^N N!} \sum_{\theta \in (\frac{1}{2N}{\bf Z}/{\bf Z})^N} s_\lambda(\theta) \overline{s_{\lambda'}(\theta)} |V(\theta)|^2 = \delta_{\lambda \lambda'}.$$
By Fourier decomposition, it then suffices to show that the trigonometric polynomial $s_\lambda(\theta) \overline{s_{\lambda'}(\theta)} |V(\theta)|^2$ does not contain any components of the form $e(\theta \cdot \xi)$ for some non-zero lattice vector $\xi \in 2N {\bf Z}^N$. But we have already observed that $|V(\theta)|^2$ is a linear combination of plane waves of the form $e(\theta \cdot (\pi(\rho) - \pi'(\rho)))$ for $\pi, \pi' \in S_N$. Also, as is well known, $s_\lambda(\theta)$ is a linear combination of plane waves $e(\theta \cdot \mu)$ where $\mu$ is majorised by $\lambda$, and similarly $\overline{s_{\lambda'}(\theta)}$ is a linear combination of plane waves $e(-\theta \cdot \mu')$ where $\mu'$ is majorised by $\lambda'$. So the product $s_\lambda(\theta) \overline{s_{\lambda'}(\theta)} |V(\theta)|^2$ is a linear combination of plane waves of the form $e(\theta \cdot (\mu - \mu' + \pi(\rho) - \pi'(\rho)))$. But every coefficient of the vector $\mu - \mu' + \pi(\rho) - \pi'(\rho)$ lies between $-(2N-1)$ and $2N-1$, and so cannot be of the form $2Nk$ for any non-zero lattice vector $k \in {\bf Z}^N$, giving the claim.
Example 4 If $N = 2$, then the distribution (21) assigns a probability of $\frac{1}{16}$ to any pair $(\theta_1, \theta_2) \in (\frac{1}{4}{\bf Z}/{\bf Z})^2$ that is a permuted rotation of $(0, \frac{1}{4})$, and a probability of $\frac{1}{8}$ to any pair that is a permuted rotation of $(0, \frac{1}{2})$. Thus, a matrix $U$ drawn from the alternative distribution will be conjugate to a phase rotation of $\hbox{diag}(1, i)$ with probability $\frac{1}{2}$, and to a phase rotation of $\hbox{diag}(1, -1)$ with probability $\frac{1}{2}$.
A similar computation when $N = 3$ gives $U$ conjugate to a phase rotation of $\hbox{diag}(1, e(\frac{1}{6}), e(\frac{2}{6}))$ with probability $\frac{1}{12}$, to a phase rotation of $\hbox{diag}(1, e(\frac{1}{6}), e(\frac{3}{6}))$ or its adjoint with probability of $\frac{1}{3}$ each, and a phase rotation of $\hbox{diag}(1, e(\frac{2}{6}), e(\frac{4}{6}))$ with probability $\frac{1}{4}$.
Remark 5 For large $N$ it does not seem that this specific alternative distribution is the only distribution consistent with Proposition 1 and which has all phase spacings a non-zero multiple of $\frac{1}{2N}$; in particular, it may not be the only distribution consistent with a Siegel zero. Still, it is a very explicit distribution that might serve as a test case for the limitations of various arguments for controlling quantities such as the largest or smallest spacing between zeroes of zeta. The ACUE is in some sense the distribution that maximally resembles CUE (in the sense that it has the greatest number of Fourier coefficients agreeing) while still also being consistent with the Alternative Hypothesis, and so should be the most difficult enemy to eliminate if one wishes to disprove that hypothesis.
In some cases, even just a tiny improvement in known results would be able to exclude the alternative hypothesis. For instance, if the alternative hypothesis held, then $|\hbox{tr}(U^j)|$ is periodic in $j$ with period $2N$, so from Proposition 1 for the alternative distribution one has
$${\bf E}_{\hbox{ACUE}} |\hbox{tr}(U^j)|^2 = \min_{m \in {\bf Z}} |j - 2mN|$$
whenever $j$ is not a multiple of $2N$, which differs from (13) for any $j$ with $N < j < 2N$. (This fact was implicitly observed recently by Baluyot, in the original context of the zeta function.) Thus a verification of the pair correlation conjecture (17) for even a single $j$ with $N < j < 2N$ would rule out the alternative hypothesis. Unfortunately, such a verification appears to be of comparable difficulty with (an averaged version of) the Hardy-Littlewood conjecture, with power saving error term. (This is consistent with the fact that Siegel zeroes can cause distortions in the Hardy-Littlewood conjecture, as (implicitly) discussed in this previous blog post.)
Remark 6 One can view the CUE as normalised Lebesgue measure on $U(N)$ (viewed as a smooth submanifold of ${\bf C}^{N \times N}$). One can similarly view ACUE as normalised Lebesgue measure on the (disconnected) smooth submanifold of $U(N)$ consisting of those unitary matrices whose phase spacings are non-zero integer multiples of $\frac{1}{2N}$; informally, ACUE is CUE restricted to this lower dimensional submanifold. As is well known, the phases of CUE eigenvalues form a determinantal point process with kernel
$$K(\theta, \theta') := \frac{\sin(\pi N (\theta - \theta'))}{\sin(\pi (\theta - \theta'))}$$
(or one can equivalently take $K(\theta, \theta') := \sum_{j=0}^{N-1} e(j(\theta - \theta'))$); in a similar spirit, the phases of ACUE eigenvalues, once they are rotated to be $2N^{th}$ roots of unity, become a discrete determinantal point process on those roots of unity with exactly the same kernel (except for a normalising factor of $\frac{1}{2N}$). In particular, the $k$-point correlation functions of ACUE (after this rotation) are precisely the restriction of the $k$-point correlation functions of CUE after normalisation, that is to say they are proportional to $\det(K(\theta_i, \theta_j))_{1 \leq i, j \leq k}$.
Remark 7 One family of estimates that go beyond the Rudnick-Sarnak family of estimates are twisted moment estimates for the zeta function, such as ones that give asymptotics for
$$\int_T^{2T} |\zeta(\frac{1}{2} + it)|^{2k} |B(\frac{1}{2} + it)|^2\ dt$$
for some small even exponent $2k$ (almost always $2$ or $4$) and some short Dirichlet polynomial $B$; see for instance this paper of Bettin, Bui, Li, and Radziwill for some examples of such estimates. The analogous unitary matrix average would be something like
$${\bf E}_t |P_t(Z)|^{2k} |Q_t(Z)|^2$$
where $Q_t$ is now some random medium degree polynomial that depends on the unitary matrix $U$ associated to $t$ (and in applications will typically also contain some negative power of $\det U$ to cancel the corresponding powers of $\det U$ in $|P_t(Z)|^{2k}$). Unfortunately such averages generally are unable to distinguish the CUE from the ACUE. For instance, if all the coefficients of $Q_t$ involve products of traces $\hbox{tr}(U^j)$ of total order less than $N/2$, then in terms of the eigenvalue phases $\theta_1, \dots, \theta_N$, $Q_t(Z)$ is a linear combination of plane waves $e(\theta \cdot \mu)$ where the frequencies $\mu$ have coefficients of magnitude less than $N/2$. On the other hand, as each coefficient of $P_t(Z)$ is an elementary symmetric function of the eigenvalues, $P_t(Z)$ is a linear combination of plane waves $e(\theta \cdot \mu)$ where the frequencies $\mu$ have coefficients of magnitude at most $1$. Thus $|P_t(Z)|^{2k} |Q_t(Z)|^2$ is a linear combination of plane waves where the frequencies $\mu$ have coefficients of magnitude less than $N + O(1)$, and thus is orthogonal to the difference between the CUE and ACUE measures on the phase torus $({\bf R}/{\bf Z})^N$ by the previous arguments. In other words, $|P_t(Z)|^{2k} |Q_t(Z)|^2$ has the same expectation with respect to ACUE as it does with respect to CUE. Thus one can only start distinguishing CUE from ACUE if the mollifier $Q_t$ has degree close to or exceeding $N$, which corresponds to Dirichlet polynomials $B$ of length close to or exceeding $T$, which is far beyond current technology for such moment estimates.
Remark 8 The GUE hypothesis for the zeta function asserts that the averages
$$\frac{1}{T} \int_T^{2T} \sum_{\gamma_1, \dots, \gamma_k\ \hbox{distinct}} \eta\left( \frac{\log T}{2\pi} (\gamma_1 - t), \dots, \frac{\log T}{2\pi} (\gamma_k - t) \right)\ dt \qquad (22)$$
converge in the limit $T \to \infty$ to
$$\int_{{\bf R}^k} \eta(x_1, \dots, x_k) \det(K(x_i - x_j))_{1 \leq i, j \leq k}\ dx_1 \cdots dx_k \qquad (23)$$
for any $k \geq 1$ and any test function $\eta$, where $K(x) := \frac{\sin \pi x}{\pi x}$ is the Dyson sine kernel and $\gamma_1, \gamma_2, \dots$ are the ordinates of zeroes of the zeta function. This corresponds to the CUE distribution for $U$. The ACUE distribution then corresponds to an "alternative gaussian unitary ensemble (AGUE)" hypothesis, in which the average (22) is instead predicted to equal a Riemann sum version of the integral (23):
$$\int_0^1 \sum_{x_1, \dots, x_k \in \frac{1}{2}{\bf Z} + \beta} \eta(x_1, \dots, x_k) \det(K(x_i - x_j))_{1 \leq i, j \leq k}\ 2^{-k}\ d\beta.$$
This is a stronger version of the alternative hypothesis that the spacing between adjacent zeroes is almost always approximately a half-integer multiple of the mean spacing. I do not know of any known moment estimates for Dirichlet series that are able to eliminate this AGUE hypothesis (even assuming GRH). (UPDATE: These facts have also been independently observed in forthcoming work of Lagarias and Rodgers.)
The twin prime conjecture is one of the oldest unsolved problems in analytic number theory. There are several reasons why this conjecture remains out of reach of current techniques, but the most important obstacle is the parity problem which prevents purely sieve-theoretic methods (or many other popular methods in analytic number theory, such as the circle method) from detecting pairs of prime twins in a way that can distinguish them from other twins of almost primes. The parity problem is discussed in these previous blog posts; this obstruction is ultimately powered by the Möbius pseudorandomness principle that asserts that the Möbius function is asymptotically orthogonal to all “structured” functions (and in particular, to the weight functions constructed from sieve theory methods).
However, there is an intriguing “alternate universe” in which the Möbius function is strongly correlated with some structured functions, and specifically with some Dirichlet characters, leading to the existence of the infamous “Siegel zero“. In this scenario, the parity problem obstruction disappears, and it becomes possible, in principle, to attack problems such as the twin prime conjecture. In particular, we have the following result of Heath-Brown:
Theorem 1 At least one of the following two statements is true:
- (Twin prime conjecture) There are infinitely many primes $p$ such that $p + 2$ is also prime.
- (No Siegel zeroes) There exists a constant $c > 0$ such that for every real Dirichlet character $\chi$ of conductor $q > 1$, the associated Dirichlet $L$-function $s \mapsto L(s, \chi)$ has no zeroes in the interval $[1 - \frac{c}{\log q}, 1]$.
Informally, this result asserts that if one had an infinite sequence of Siegel zeroes, one could use this to generate infinitely many twin primes. See this survey of Friedlander and Iwaniec for more on this "illusory" or "ghostly" parallel universe in analytic number theory that should not actually exist, but is surprisingly self-consistent and has to date proven impossible to banish from the realm of possibility.
The strategy of Heath-Brown’s proof is fairly straightforward to describe. The usual starting point is to try to lower bound
for some large value of
If there is a Siegel zero with
close to
and
a Dirichlet character of conductor
, then multiplicative number theory methods can be used to show that the Möbius function
“pretends” to be like the character
in the sense that
for “most” primes
near
(e.g. in the range
for some small
and large
). Traditionally, one uses complex-analytic methods to demonstrate this, but one can also use elementary multiplicative number theory methods to establish these results (qualitatively at least), as will be shown below the fold.
The fact that $\mu$ pretends to be like $\chi$ can be used to construct a tractable approximation (after inserting the sieve weight $\nu$) in the range $x \leq n \leq 2x$ (where $x := q^C$ for some large $C$) for the second von Mangoldt function
$$\Lambda_2(n) := \sum_{d | n} \mu(d) \log^2 \frac{n}{d},$$
namely the function
$$\tilde \Lambda_2(n) := \sum_{d | n} \chi(d) \log^2 \frac{n}{d}.$$
One expects $\tilde \Lambda_2(n)$ to be a good approximant to $\Lambda_2(n)$ if $n$ is of size $x$ and has no prime factors less than $x^{1/C'}$ for some large constant $C'$. The Selberg sieve $\nu$ will be mostly supported on numbers with no prime factor less than $x^{1/C'}$. As such, one can hope to approximate (1) by the expression
$$\sum_{x \leq n \leq 2x} \nu(n) \tilde \Lambda_2(n) \nu(n+2) \tilde \Lambda_2(n+2). \qquad (2)$$
Actually one does not need the full strength of the Weil bound here; any power savings over the trivial bound of $p$ will do. In particular, it will suffice to use the weaker, but easier to prove, bounds of Kloosterman:
Lemma 2 (Kloosterman bound) One has
$$\sum_{y \in ({\bf Z}/p{\bf Z})^\times} e_p(ay + b\overline{y}) \ll p^{3/4 + o(1)}$$
whenever $a$ and $b$ are coprime to $p$, where $\overline{y}$ denotes the inverse of $y$ mod $p$, $e_p(n) := e^{2\pi i n / p}$, and the $o(1)$ is with respect to the limit $p \to \infty$ (and is uniform in $a, b$).
Proof: Observe from the change of variables $y \mapsto \lambda y$ that the Kloosterman sum $\sum_y e_p(ay + b\overline{y})$ is unchanged if one replaces $(a, b)$ with $(\lambda a, \overline{\lambda} b)$ for $\lambda \in ({\bf Z}/p{\bf Z})^\times$. For fixed $ab$, the number of such pairs $(\lambda a, \overline{\lambda} b)$ is at least $p^{1 - o(1)}$, thanks to the divisor bound. Thus it will suffice to establish the fourth moment bound
$$\sum_{a, b \in {\bf Z}/p{\bf Z}} \left| \sum_{y \in ({\bf Z}/p{\bf Z})^\times} e_p(ay + b\overline{y}) \right|^4 \ll p^4.$$
But the left-hand side is $p^2$ times the number of solutions to the system $y_1 + y_2 = y_3 + y_4$, $\overline{y_1} + \overline{y_2} = \overline{y_3} + \overline{y_4}$, which is $O(p^2)$ by elementary algebra; the claim follows. $\Box$
We will also need another easy case of the Weil bound to handle some other portions of (2):
Lemma 3 (Easy Weil bound) Let $\chi$ be a primitive real Dirichlet character of conductor $q$, and let $a, b \in {\bf Z}/q{\bf Z}$. Then
$$\sum_{y \in ({\bf Z}/q{\bf Z})^\times} \chi(y) e_q(ay + b\overline{y}) \ll q^{1/2 + o(1)}.$$
Proof: As $q$ is the conductor of a primitive real Dirichlet character, $q$ is equal to $2^j$ times a squarefree odd number for some $j \leq 3$. By the Chinese remainder theorem, it thus suffices to establish the claim when $q$ is an odd prime $p$. We may assume that the pair $(a, b)$ is not divisible by this prime $p$, as the claim is trivial otherwise. If $a$ vanishes then $b$ does not vanish, and the claim follows from the mean zero nature of $\chi$ (the sum collapses to a Gauss sum); similarly if $b$ vanishes. Hence we may assume that $a, b$ do not vanish, and then we can normalise them to equal $1$. By completing the square it now suffices to show that
$$\left| \sum_{y \in {\bf Z}/p{\bf Z}} e_p(y^2) \right| = p^{1/2},$$
which is the classical Gauss sum evaluation. $\Box$
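As with the Kloosterman case, twisted sums of this Salié type can be checked numerically; the sketch below (brute force, with the Legendre symbol as the real character and arbitrary ranges) exhibits square-root cancellation (in fact such sums famously satisfy the sharp bound $2\sqrt{p}$):

```python
import cmath
from sympy import primerange, legendre_symbol

def salie(a: int, b: int, p: int, leg) -> complex:
    """Twisted (Salie) sum: sum over y in (Z/p)^* of chi(y) e_p(a*y + b/y),
    with chi the Legendre symbol mod p (values precomputed in leg)."""
    return sum(leg[y] * cmath.exp(2j * cmath.pi *
               ((a * y + b * pow(y, -1, p)) % p) / p) for y in range(1, p))

for p in primerange(50, 150):
    leg = {y: legendre_symbol(y, p) for y in range(1, p)}
    worst = max(abs(salie(a, 1, p, leg)) for a in range(1, p))
    print(p, round(worst, 2), round(2 * p ** 0.5, 2))
```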
While the basic strategy of Heath-Brown’s argument is relatively straightforward, implementing it requires a large amount of computation to control both main terms and error terms. I experimented for a while with rearranging the argument to try to reduce the amount of computation; I did not fully succeed in arriving at a satisfactorily minimal amount of superfluous calculation, but I was able to at least reduce this amount a bit, mostly by replacing a combinatorial sieve with a Selberg-type sieve (which was not needed to be positive, so I dispensed with the squaring aspect of the Selberg sieve to simplify the calculations a little further; also for minor reasons it was convenient to retain a tiny portion of the combinatorial sieve to eliminate extremely small primes). Also some modest reductions in complexity can be obtained by using the second von Mangoldt function in place of
. These exercises were primarily for my own benefit, but I am placing them here in case they are of interest to some other readers.
In Notes 1, we approached multiplicative number theory (the study of multiplicative functions $f: {\bf N} \to {\bf C}$ and their relatives) via elementary methods, in which attention was primarily focused on obtaining asymptotic control on summatory functions $\sum_{n \leq x} f(n)$ and logarithmic sums $\sum_{n \leq x} \frac{f(n)}{n}$. Now we turn to the complex approach to multiplicative number theory, in which the focus is instead on obtaining various types of control on the Dirichlet series ${\mathcal D} f$, defined (at least for $s$ of sufficiently large real part) by the formula
$${\mathcal D} f(s) := \sum_{n=1}^{\infty} \frac{f(n)}{n^s}.$$
These series also made an appearance in the elementary approach to the subject, but only for real $s$ that were larger than $1$. But now we will exploit the freedom to extend the variable $s$ to the complex domain; this gives enough freedom (in principle, at least) to recover control of elementary sums such as $\sum_{n \leq x} f(n)$ or $\sum_{n \leq x} \frac{f(n)}{n}$ from control on the Dirichlet series. Crucially, for many key functions $f$ of number-theoretic interest, the Dirichlet series ${\mathcal D} f$ can be analytically (or at least meromorphically) continued to the left of the line $\{ 1 + it: t \in {\bf R} \}$. The zeroes and poles of the resulting meromorphic continuations of ${\mathcal D} f$ (and of related functions) then turn out to control the asymptotic behaviour of the elementary sums of $f$; the more one knows about the former, the more one knows about the latter. In particular, knowledge of where the zeroes of the Riemann zeta function $\zeta$ are located can give very precise information about the distribution of the primes, by means of a fundamental relationship known as the explicit formula. There are many ways of phrasing this explicit formula (both in exact and in approximate forms), but they are all trying to formalise an approximation to the von Mangoldt function $\Lambda$ (and hence to the primes) of the form
$$\Lambda(n) \approx 1 - \sum_\rho n^{\rho - 1} \qquad (1)$$
where the sum is over zeroes $\rho$ (counting multiplicity) of the Riemann zeta function (with the sum often restricted so that $\rho$ has large real part and bounded imaginary part), and the approximation is in a suitable weak sense, so that
$$\sum_n \Lambda(n) g(n) \approx \int_0^\infty \left( 1 - \sum_\rho x^{\rho - 1} \right) g(x)\ dx \qquad (2)$$
for suitable "test functions" $g$ (which in practice are restricted to be fairly smooth and slowly varying, with the precise amount of restriction dependent on the amount of truncation in the sum over zeroes one wishes to take). Among other things, such approximations can be used to rigorously establish the prime number theorem
$$\sum_{n \leq x} \Lambda(n) = x + o(x) \qquad (3)$$
as $x \to \infty$, with the size of the error term $o(x)$ closely tied to the location of the zeroes $\rho$ of the Riemann zeta function.
The explicit formula (1) (or any of its more rigorous forms) is closely tied to the counterpart approximation
$$-\frac{\zeta'(s)}{\zeta(s)} \approx \frac{1}{s-1} - \sum_\rho \frac{1}{s - \rho} \qquad (4)$$
for the Dirichlet series ${\mathcal D} \Lambda = -\zeta'/\zeta$ of the von Mangoldt function; note that (4) is formally the special case of (2) when $g(x) = x^{-s}$. Such approximations come from the general theory of local factorisations of meromorphic functions, as discussed in Supplement 2; the passage from (4) to (2) is accomplished by such tools as the residue theorem and the Fourier inversion formula, which were also covered in Supplement 2. The relative ease of uncovering the Fourier-like duality between primes and zeroes (sometimes referred to poetically as the "music of the primes") is one of the major advantages of the complex-analytic approach to multiplicative number theory; this important duality tends to be rather obscured in the other approaches to the subject, although it can still in principle be discernible with sufficient effort.
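The approximation (1) is strikingly easy to see numerically. Here is a quick sketch comparing the Chebyshev function $\psi(x) = \sum_{n \leq x} \Lambda(n)$ with the truncation of the explicit formula to the first ten pairs of zeroes $\rho = \frac{1}{2} \pm i\gamma$ (lower order terms are dropped, and the zero ordinates are hard-coded to a few decimal places):

```python
import math
from sympy import primerange

# First ten ordinates of nontrivial zeta zeroes (well-known numerical values).
GAMMAS = [14.134725, 21.022040, 25.010858, 30.424876, 32.935062,
          37.586178, 40.918719, 43.327073, 48.005151, 49.773832]

def psi(x: float) -> float:
    """Chebyshev function psi(x) = sum of Lambda(n) over n <= x."""
    total = 0.0
    for p in primerange(2, int(x) + 1):
        pk = p
        while pk <= x:       # prime powers p, p^2, ... up to x
            total += math.log(p)
            pk *= p
    return total

def psi_explicit(x: float) -> float:
    """Truncation of the explicit formula psi(x) ~ x - sum_rho x^rho/rho,
    keeping only the first ten pairs of zeroes and dropping the bounded
    lower order terms."""
    s = x
    for g in GAMMAS:
        rho = complex(0.5, g)
        s -= 2 * (x ** rho / rho).real  # rho and its conjugate together
    return s

for x in (100.0, 1000.0, 5000.0):
    print(x, round(psi(x), 1), round(psi_explicit(x), 1))
```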
More generally, one has an explicit formula
$$\Lambda(n) \chi(n) \approx -\sum_\rho n^{\rho - 1} \qquad (5)$$
for any (non-principal) Dirichlet character $\chi$, where $\rho$ now ranges over the zeroes of the associated Dirichlet $L$-function $L(s, \chi) := {\mathcal D} \chi(s)$; we view this formula as a "twist" of (1) by the Dirichlet character $\chi$. The explicit formula (5), proven similarly (in any of its rigorous forms) to (1), is important in establishing the prime number theorem in arithmetic progressions, which asserts that
$$\sum_{n \leq x: n \equiv a\ (q)} \Lambda(n) = \frac{x}{\phi(q)} + o(x) \qquad (6)$$
as $x \to \infty$, whenever $a\ (q)$ is a fixed primitive residue class. Again, the size of the error term $o(x)$ here is closely tied to the location of the zeroes of the Dirichlet $L$-function, with particular importance given to whether there is a zero very close to $s = 1$ (such a zero is known as an exceptional zero or Siegel zero).
While any information on the behaviour of zeta functions or $L$-functions is in principle welcome for the purposes of analytic number theory, some regions of the complex plane are more important than others in this regard, due to the differing weights assigned to each zero in the explicit formula. Roughly speaking, in descending order of importance, the most crucial regions on which knowledge of these functions is useful are
- The region on or near the point $s = 1$.
- The region on or near the right edge $\{ 1 + it: t \in {\bf R} \}$ of the critical strip $\{ s: 0 \leq \hbox{Re}(s) \leq 1 \}$.
- The right half $\{ s: \frac{1}{2} < \hbox{Re}(s) < 1 \}$ of the critical strip.
- The region on or near the critical line $\{ \frac{1}{2} + it: t \in {\bf R} \}$ that bisects the critical strip.
- Everywhere else.
For instance:
- We will shortly show that the Riemann zeta function $\zeta$ has a simple pole at $s = 1$ with residue $1$, which is already sufficient to recover much of the classical theorems of Mertens discussed in the previous set of notes, as well as results on mean values of multiplicative functions such as the divisor function $\tau$. For Dirichlet $L$-functions, the behaviour is instead controlled by the quantity $L(1, \chi)$ discussed in Notes 1, which is in turn closely tied to the existence and location of a Siegel zero.
- The zeta function is also known to have no zeroes on the right edge $\{ 1 + it: t \in {\bf R} \}$ of the critical strip, which is sufficient to prove (and is in fact equivalent to) the prime number theorem. Any enlargement of the zero-free region for $\zeta$ into the critical strip leads to improved error terms in that theorem, with larger zero-free regions leading to stronger error estimates. Similarly for $L$-functions and the prime number theorem in arithmetic progressions.
- The (as yet unproven) Riemann hypothesis prohibits $\zeta$ from having any zeroes within the right half $\{ s: \frac{1}{2} < \hbox{Re}(s) < 1 \}$ of the critical strip, and gives very good control on the number of primes in intervals, even when the intervals are relatively short compared to the size of their entries. Even without assuming the Riemann hypothesis, zero density estimates in this region are available that give some partial control of this form. Similarly for $L$-functions, primes in short arithmetic progressions, and the generalised Riemann hypothesis.
- Assuming the Riemann hypothesis, further distributional information about the zeroes on the critical line (such as Montgomery's pair correlation conjecture, or the more general GUE hypothesis) can give finer information about the error terms in the prime number theorem in short intervals, as well as other arithmetic information. Again, one has analogues for $L$-functions and primes in short arithmetic progressions.
- The functional equation of the zeta function describes the behaviour of $\zeta$ to the left of the critical line, in terms of the behaviour to the right of the critical line. This is useful for building a "global" picture of the structure of the zeta function, and for improving a number of estimates about that function, but (in the absence of unproven conjectures such as the Riemann hypothesis or the pair correlation conjecture) it turns out that many of the basic analytic number theory results using the zeta function can be established without relying on this equation. Similarly for $L$-functions.
Remark 1 If one takes an "adelic" viewpoint, one can unite the Riemann zeta function $\zeta$ and all of the $L$-functions $L(\cdot, \chi)$ for various Dirichlet characters $\chi$ into a single object, viewing $n \mapsto \chi(n) n^{-s}$ as a general multiplicative character on the adeles; thus the imaginary coordinate $t$ and the Dirichlet character $\chi$ are really the Archimedean and non-Archimedean components respectively of a single adelic frequency parameter. This viewpoint was famously developed in Tate's thesis, which among other things helps to clarify the nature of the functional equation, as discussed in this previous post. We will not pursue the adelic viewpoint further in these notes, but it does supply a "high-level" explanation for why so much of the theory of the Riemann zeta function extends to the Dirichlet $L$-functions. (The non-Archimedean character $\chi(n)$ and the Archimedean character $n^{it}$ behave similarly from an algebraic point of view, but not so much from an analytic point of view; as such, the adelic viewpoint is well suited for algebraic tasks (such as establishing the functional equation), but not for analytic tasks (such as establishing a zero-free region).)
Roughly speaking, the elementary multiplicative number theory from Notes 1 corresponds to the information one can extract from the complex-analytic method in region 1 of the above hierarchy, while the more advanced elementary number theory used to prove the prime number theorem (and which we will not cover in full detail in these notes) corresponds to what one can extract from regions 1 and 2.
As a consequence of this hierarchy of importance, information about the zeta function away from the critical strip, such as Euler's identity
$$\zeta(2) = \frac{\pi^2}{6}$$
or equivalently
$$1 + \frac{1}{2^2} + \frac{1}{3^2} + \dots = \frac{\pi^2}{6}$$
or the infamous identity
$$\zeta(-1) = -\frac{1}{12},$$
which is often presented (slightly misleadingly, if one's conventions for divergent summation are not made explicit) as
$$1 + 2 + 3 + \dots = -\frac{1}{12},$$
are of relatively little direct importance in analytic prime number theory, although they are still of interest for some other, non-number-theoretic, applications. (The quantity $\zeta(2)$ does play a minor role as a normalising factor in some asymptotics, see e.g. Exercise 28 from Notes 1, but its precise value is usually not of major importance.) In contrast, the value $L(1, \chi)$ of an $L$-function at $s = 1$ turns out to be extremely important in analytic number theory, with many results in this subject relying ultimately on a non-trivial lower-bound on this quantity coming from Siegel's theorem, discussed below the fold.
For a more in-depth treatment of the topics in this set of notes, see Davenport’s “Multiplicative number theory“.
