Dwork's proof of rationality

Chapter 11 Dwork's proof of rationality

Original lecture date: November 4, 2019.

In this lecture, we discuss Dwork's proof of rationality of the zeta function for varieties over finite fields. Dwork's proof does not itself use a finite-dimensional cohomology theory on \(X\text{,}\) but it did inspire the construction of \(p\)-adic Weil cohomology, which we will see later.

Readings 11.0.1.

The original paper of Dwork is [39]. An updated presentation of the proof was given by Koblitz [80].

Section 11.1

Theorem 11.1.1. Dwork, 1958.

For any scheme \(X\) of finite type over \(\FF_q\text{,}\) \(Z(X,T)\) represents a rational function of \(T\text{.}\)

The proof involves three key components.

Some initial reduction steps to put the problem in a more convenient form.
An extension of a theorem of Borel on power series.
Use of \(p\)-adic analysis to check the hypothesis of Borel's theorem.

We start with the reduction steps. Recall that if \(X\) splits as a disjoint union of an open subscheme \(U\) and a closed subscheme \(S\text{,}\) then

\begin{equation*} Z(X,T) = Z(U,T) Z(S,T). \end{equation*}

Similarly, if \(X\) is a union of two closed subschemes \(X_1\) and \(X_2\text{,}\) then

\begin{equation*} Z(X,T) = \frac{Z(X_1, T) Z(X_2, T)}{Z(X_1 \cap X_2, T)}. \end{equation*}

Using this logic (and induction on dimension), we may reduce to the case where \(X\) is affine and irreducible; in fact, we can assume that \(X\) is contained not just in an affine space \(\AAA^n\text{,}\) but in a torus \(\GG_m^n\text{.}\)

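We can even take this a bit further. Write \(X\) as the subscheme of \(\GG_m^n\) cut out by some Laurent polynomials \(P_1,\dots,P_m\text{.}\) Suppose that we know the rationality for the hypersurface cut out by any single (but not necessarily irreducible) Laurent polynomial; we may then deduce the same conclusion for an intersection of \(m\) such hypersurfaces by induction on \(m\text{.}\) Namely, \(V(P_1,\dots,P_{m-1}) \cup V(P_m) = V(P_1P_m,\dots,P_{m-1}P_m)\) is again cut out by \(m-1\) Laurent polynomials, so the formula for the zeta function of a union of two closed subschemes expresses \(Z(V(P_1,\dots,P_m),T)\) in terms of zeta functions covered by the induction hypothesis. That is, we may assume from now on that \(X\) is the zero locus of a (not necessarily irreducible) Laurent polynomial \(P \in k[x_1^{\pm},\dots,x_n^{\pm}]\) in a torus \(\GG_m^n\text{.}\)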
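The inclusion-exclusion behind this induction can be checked on point counts. The following Python sketch is only an illustration over prime fields \(\FF_p\text{,}\) with hypothetical example polynomials \(P_1, P_2, P_3\) and helper names chosen here; it counts points in the torus by brute force and confirms that \(\#V(P_1,P_2) = \#V(P_1) + \#V(P_2) - \#V(P_1P_2)\text{,}\) and likewise the step from \(m = 2\) to \(m = 3\text{.}\)

```python
from itertools import product

def torus_count(polys, p, n):
    """Number of points of V(polys) in the torus G_m^n over the prime field F_p.

    Each polynomial is a Python function of an n-tuple of integers.
    """
    return sum(
        1
        for x in product(range(1, p), repeat=n)  # nonzero coordinates only
        if all(f(x) % p == 0 for f in polys)
    )

def times(f, g):
    """The product f * g, whose zero locus is V(f) union V(g)."""
    return lambda x: f(x) * g(x)

# Hypothetical example polynomials in two variables.
P1 = lambda x: x[0] + x[1] - 1
P2 = lambda x: x[0] * x[1] - 2
P3 = lambda x: x[0] - x[1] ** 2

for p in (5, 7, 11):
    N = lambda *polys: torus_count(polys, p, 2)
    # m = 2: #V(P1, P2) = #V(P1) + #V(P2) - #V(P1 P2)
    assert N(P1, P2) == N(P1) + N(P2) - N(times(P1, P2))
    # m = 3: V(P1, P2) union V(P3) = V(P1 P3, P2 P3)
    assert N(P1, P2, P3) == N(P1, P2) + N(P3) - N(times(P1, P3), times(P2, P3))
```

Since \(Z\) is the exponential of a series in the point counts, such additive identities over every \(\FF_{q^n}\) become the multiplicative identities for zeta functions used above; the sketch only samples prime fields.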

We next turn to the Borel–Dwork theorem. It is motivated by a simple observation.

Lemma 11.1.2.

Let \(f(T)=\sum_{n=0}^\infty a_nT^n\in \ZZ\llbracket T \rrbracket\) be a power series which over \(\CC\) has radius of convergence strictly greater than 1. Then \(f(T) \in \ZZ[T]\text{.}\)

Proof.

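The root test implies \(\limsup_{n \to \infty} |a_n|^{1/n}\lt 1\text{,}\) so \(|a_n| \lt 1\) for all sufficiently large \(n\text{.}\) The only integer with absolute value less than \(1\) is zero, so \(a_n = 0\) for \(n\) sufficiently large, giving the claim.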

In the setting of zeta functions we do not expect polynomials, and we do not have much control over the archimedean absolute value, although we can at least prove that \(Z(X,T)\) has some positive radius of convergence.

Lemma 11.1.3.

As a power series over \(\CC\text{,}\) \(Z(X,T)\) has radius of convergence at least \(q^{-\dim(X)}\text{.}\)

Proof.

Since we are assuming \(X\) is a toric hypersurface, it admits a finite morphism \(f\colon X \to \GG_m^d\) where \(d = \dim(X)\text{.}\) Then

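\begin{equation*} \#X(\FF_{q^n}) \leq \deg(f) (q^n-1)^d, \end{equation*}

so the claim follows by an elementary calculation using the expression

\begin{equation*} Z(X,T) = \exp\left(\sum_{n=1}^\infty \#X(\FF_{q^n})\frac{T^n}{n}\right)\text{:} \end{equation*}

the series inside the exponential converges for \(|T| \lt q^{-d}\text{,}\) hence so does its exponential.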
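To make this concrete, here is a minimal Python sketch (the function name is ours) that recovers the coefficients of \(Z(X,T)\) from the point counts \(N_n = \#X(\FF_{q^n})\) using exact rational arithmetic. As an assumed test case it takes \(X = \GG_m\) with \(q = 5\text{,}\) for which \(N_n = q^n - 1\) and \(Z(X,T) = (1-T)/(1-qT)\text{.}\)

```python
from fractions import Fraction

def zeta_coefficients(counts, K):
    """Coefficients z_0, ..., z_K of Z(X,T) = exp(sum_{n>=1} N_n T^n / n).

    counts(n) must return N_n = #X(F_{q^n}). Differentiating log Z gives
    T Z'(T) = Z(T) * sum_{n>=1} N_n T^n, i.e. k z_k = sum_{n=1}^k N_n z_{k-n}.
    """
    z = [Fraction(1)]
    for k in range(1, K + 1):
        z.append(sum(counts(n) * z[k - n] for n in range(1, k + 1)) / k)
    return z

q = 5
# G_m: N_n = q^n - 1, and (1 - T)/(1 - qT) = 1 + sum_{k>=1} (q^k - q^(k-1)) T^k.
gm = zeta_coefficients(lambda n: q**n - 1, 6)
print([int(c) for c in gm])  # -> [1, 4, 20, 100, 500, 2500, 12500]
```

The coefficients grow like \(q^k\text{,}\) consistent with a radius of convergence of exactly \(q^{-1} = q^{-\dim(X)}\) in this example.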

To finesse this issue, we bring in the other places of \(\QQ\text{.}\) The statement

\begin{equation*} \{x\in \ZZ:|x|\lt 1\}=\{0\} \end{equation*}

has an analogue over \(\QQ\) in the form of the product formula for the valuations of \(\QQ\text{:}\)

\begin{equation*} \left\{x\in \QQ\colon \left(\prod_{v\text{ a valuation of }\QQ}|x|_v\right)\neq 1\right\} = \{0\}. \end{equation*}

This forms the basis of Dwork's extension of Borel's theorem.
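As a quick numerical illustration of the product formula, the following Python sketch (helper names are ours) multiplies the archimedean absolute value of a nonzero rational by all of its \(p\)-adic absolute values; only primes dividing the numerator or denominator contribute a factor other than \(1\text{.}\)

```python
from fractions import Fraction

def prime_factors(n):
    """Set of primes dividing the nonzero integer n (trial division)."""
    n, primes, d = abs(n), set(), 2
    while d * d <= n:
        while n % d == 0:
            primes.add(d)
            n //= d
        d += 1
    if n > 1:
        primes.add(n)
    return primes

def abs_p(x, p):
    """The p-adic absolute value |x|_p = p^(-v_p(x)) of a nonzero rational x."""
    v, num, den = 0, x.numerator, x.denominator
    while num % p == 0:
        num //= p
        v += 1
    while den % p == 0:
        den //= p
        v -= 1
    return Fraction(1, p**v) if v >= 0 else Fraction(p**(-v))

def product_of_absolute_values(x):
    """|x|_infinity times the product of |x|_p over all primes p."""
    x = Fraction(x)
    if x == 0:
        return Fraction(0)
    result = abs(x)
    # |x|_p = 1 unless p divides the numerator or the denominator of x
    for p in prime_factors(x.numerator) | prime_factors(x.denominator):
        result *= abs_p(x, p)
    return result

print(product_of_absolute_values(Fraction(-12, 35)))  # -> 1
```

For \(x = -12/35\) the factors are \(12/35\text{,}\) \(|x|_2 = 1/4\text{,}\) \(|x|_3 = 1/3\text{,}\) \(|x|_5 = 5\) and \(|x|_7 = 7\text{,}\) whose product is \(1\text{.}\)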

Theorem 11.1.4. Borel, 1894, extended by Dwork in 1958.

Suppose \(f(T)\in \ZZ \llbracket T \rrbracket\) has radius of convergence over \(\CC\) at least \(R\) and is meromorphic over \(\QQ_p\) on the disc \(|T|\lt r\) (that is, it is the ratio of two power series over \(\QQ_p\) with radius of convergence at least \(r\text{,}\) as measured by the root test). If \(R>r^{-1}\text{,}\) then \(f(T)\) represents a rational function of \(T\text{.}\) Additionally, if \(f(T)\) itself has radius of convergence over \(\QQ_p\) at least \(r\text{,}\) then \(f(T)\) is a polynomial.

Proof.

We give only an outline of the proof here; the details are left to Exercise 19.4.2. Suppose first that \(f(T) = \sum_n a_n T^n\) has radius of convergence over \(\QQ_p\) at least \(r\text{.}\) By the root test, for any \(\epsilon \in (0,R)\) and \(\delta \in (0,r)\) there are constants \(C_\epsilon, C_\delta\) such that

\begin{equation*} |a_n|_{\infty}\lt C_\epsilon(R-\epsilon)^{-n}, \qquad |a_n|_p\lt C_\delta(r-\delta)^{-n} \end{equation*}

for all \(n\text{,}\) and, since \(f(T)\in \ZZ \llbracket T \rrbracket\text{,}\) \(|a_n|_\ell\leq 1\) for every prime \(\ell \neq p\text{.}\) Since \(Rr \gt 1\text{,}\) we may choose \(\epsilon\) and \(\delta\) so that \((R-\epsilon)(r-\delta) \gt 1\text{;}\) then for \(n \gg 0\text{,}\)

\begin{equation*} \prod_{v\text{ a valuation of }\QQ}|a_n|_v\lt \frac{C_\epsilon C_\delta}{(R-\epsilon)^n(r-\delta)^n}\lt 1. \end{equation*}

By the product formula, \(a_n=0\) for \(n\) sufficiently large, as desired.

In the general setting, write

\begin{equation*} f(T) = \frac{g(T)}{h(T)} \end{equation*}

with \(g(T),h(T)\in \QQ_p\llbracket T \rrbracket\) having radius of convergence at least \(r\text{.}\) If \(h(T)\) is a polynomial, we can clear denominators and argue that the result is a polynomial. In general, for each \(\delta>0\text{,}\) \(h(T)\) has only finitely many zeroes in \(\overline{\QQ_p}\) of absolute value less than \(r-\delta\text{.}\) Thus on \(|T|\lt r-\delta\) we can write \(h(T) = p_\delta(T)u_\delta(T)\text{,}\) with \(p_\delta(T)\in \QQ_p[T]\) and \(u_\delta(T)\) a power series that is invertible among power series converging on \(|T|\lt r-\delta\text{.}\) The idea then is to strip off \(p_\delta(T)\) and apply the previous observation to \(g(T)/u_\delta(T)\text{,}\) but in a way that is uniform as \(\delta\) varies.

Remark 11.1.5.

As an aside, the equality \(h(T)=p_\delta(T)u_\delta(T)\) in the proof of Theorem 11.1.4 is governed by the Newton polygon of \(h\text{.}\) Suppose \(h(T) = \sum_i a_iT^i\) is a polynomial with \(a_0=1\text{;}\) then the Newton polygon of \(h\) is the lower convex hull of the set \(\{(i,v_p(a_i)) : a_i \neq 0\}\text{.}\) See the following diagram for an example.

The main theorem is that if the Newton polygon has a segment of width \(w\) and slope \(s\text{,}\) then the original polynomial has exactly \(w\) roots in \(\overline{\QQ_p}\) (counted with multiplicity) of \(p\)-adic valuation \(-s\text{.}\)
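As a computational complement, here is a small Python sketch (names are ours) that computes the segments of the Newton polygon of a polynomial with rational coefficients as (width, slope) pairs, via a lower convex hull. It is an illustration rather than a general \(p\)-adic library; the example \(h(T) = 1 + 6T + 8T^2 = (1+2T)(1+4T)\) with \(p = 2\) is chosen so that the answer can be checked by hand.

```python
from fractions import Fraction

def vp(x, p):
    """p-adic valuation of a nonzero rational x."""
    x = Fraction(x)
    if x == 0:
        raise ValueError("v_p(0) is +infinity")
    v, num, den = 0, x.numerator, x.denominator
    while num % p == 0:
        num //= p
        v += 1
    while den % p == 0:
        den //= p
        v -= 1
    return v

def newton_polygon(coeffs, p):
    """(width, slope) segments of the lower convex hull of {(i, v_p(a_i)) : a_i != 0}."""
    pts = [(i, vp(a, p)) for i, a in enumerate(coeffs) if a != 0]
    hull = []
    for pt in pts:  # points arrive sorted by i
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            # drop hull[-1] if it lies on or above the segment from hull[-2] to pt
            if (y2 - y1) * (pt[0] - x1) >= (pt[1] - y1) * (x2 - x1):
                hull.pop()
            else:
                break
        hull.append(pt)
    return [(x2 - x1, Fraction(y2 - y1, x2 - x1))
            for (x1, y1), (x2, y2) in zip(hull, hull[1:])]

print(newton_polygon([1, 6, 8], 2))  # slopes 1 and 2, each of width 1
```

The two segments match the roots \(-1/2\) and \(-1/4\text{,}\) of \(2\)-adic valuations \(-1\) and \(-2\text{.}\) Collinear points are merged, so a segment of width \(w\) is reported once.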

At this point, Dwork proceeds by emulating Weil's analysis of Fermat hypersurfaces. Recall that we are assuming that \(X = \Spec k[x_1^{\pm},\dots,x_n^{\pm}]/(P)\) is a toric hypersurface. Write \(q = p^a\) and let \(\Theta\) be a nontrivial additive character of \(\FF_p\) (we do not yet specify where \(\Theta\) is valued), so that for any positive integer \(s\text{,}\)

\begin{equation*} \Theta_{as}(x) = \Theta(x + x^p + \cdots + x^{p^{as-1}}) = \Theta(\mathrm{Tr}_{\FF_{q^s}/\FF_p}(x)) \end{equation*}

is a nontrivial additive character of \(\FF_{q^s}\text{.}\) Then

\begin{equation*} \sum_{x_0 \in \FF_{q^s}} \Theta_{as}(x_0 y) = \begin{cases} q^s & y=0 \\ 0 & y \neq 0; \end{cases} \end{equation*}

consequently,

\begin{equation*} q^s \#X(\FF_{q^s}) = (q^s-1)^n + \sum_{x_0,\dots,x_n \in \FF_{q^s}^\times} \Theta_{as}(x_0 P(x_1,\dots,x_n)). \end{equation*}
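An additive character of \(\FF_{q^s}\) is obtained by composing \(\Theta\) with the trace \(x \mapsto x + x^p + \cdots + x^{p^{as-1}}\text{,}\) which is additive with values in \(\FF_p\text{;}\) the norm \(x \mapsto x^{1+p+\cdots+p^{as-1}}\) is multiplicative but not additive. The following Python sketch checks this, together with the orthogonality relation for \(\sum_{x_0} \Theta_{as}(x_0 y)\text{,}\) in the smallest interesting case \(p = 3\text{,}\) \(as = 2\text{.}\) It models \(\FF_9\) as \(\FF_3[i]/(i^2+1)\) and takes \(\Theta\) complex-valued; these choices, and all names, are assumptions made here.

```python
import cmath
import itertools

p = 3  # model F_9 as F_3[i]/(i^2 + 1); valid because -1 is not a square mod 3

def add(x, y):
    return ((x[0] + y[0]) % p, (x[1] + y[1]) % p)

def mul(x, y):
    (a, b), (c, d) = x, y
    return ((a * c - b * d) % p, (a * d + b * c) % p)

def power(x, e):
    result = (1, 0)
    for _ in range(e):
        result = mul(result, x)
    return result

F9 = list(itertools.product(range(p), repeat=2))  # pairs (a, b) meaning a + b*i

def trace(x):
    """x + x^p: additive, with values in the prime field {(t, 0)}."""
    return add(x, power(x, p))

def norm(x):
    """x^(1+p): multiplicative, but not additive."""
    return power(x, 1 + p)

def theta(t):
    """A nontrivial additive character of F_3, taken complex-valued here."""
    assert t[1] == 0, "argument must lie in the prime field"
    return cmath.exp(2j * cmath.pi * t[0] / p)

def theta_2(x):
    """theta composed with the trace: a nontrivial additive character of F_9."""
    return theta(trace(x))

# Orthogonality: the sum over x0 in F_9 of theta_2(x0 * y) is 9 if y = 0 and 0 otherwise.
for y in [(0, 0), (1, 0), (2, 1)]:
    total = sum(theta_2(mul(x0, y)) for x0 in F9)
    print(y, round(abs(total), 9))
```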

If we expand \(x_0 P\) as a sum \(\sum_j \alpha_j \mu_j(x)\) with \(\alpha_j \in \FF_q^\times\) and \(\mu_j(x)\) a monomial in \(x_0,\dots,x_n\text{,}\) we also have

\begin{equation*} q^s \#X(\FF_{q^s}) = (q^s-1)^n + \sum_{x_0,\dots,x_n \in \FF_{q^s}^\times} \prod_j \Theta_{as}(\alpha_j \mu_j(x)). \end{equation*}
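This identity can be tested numerically in the simplest case \(q = p\text{,}\) \(s = 1\text{.}\) The Python sketch below assumes the complex-valued character \(\Theta(t) = e^{2\pi i t/p}\) and stores a Laurent polynomial \(P\) as a dictionary from exponent tuples to coefficients; the example \(P = x_1 + x_2 + 1\) and all names are hypothetical. It evaluates the right-hand side in the product form \(\prod_j \Theta(\alpha_j \mu_j(x))\text{.}\)

```python
import cmath
import itertools

def theta(t, p):
    """Assumed nontrivial additive character of F_p: t -> exp(2 pi i t / p)."""
    return cmath.exp(2j * cmath.pi * (t % p) / p)

def eval_monomial(coeff, exps, xs, p):
    """coeff * prod_i x_i^(e_i) in F_p; negative exponents use modular inverses (Python 3.8+)."""
    val = coeff % p
    for x, e in zip(xs, exps):
        val = val * pow(x, e, p) % p
    return val

def count_points(P, p, n):
    """#X(F_p) for X = {P = 0} in G_m^n, where P maps exponent tuples to coefficients."""
    return sum(
        1
        for xs in itertools.product(range(1, p), repeat=n)
        if sum(eval_monomial(c, e, xs, p) for e, c in P.items()) % p == 0
    )

def character_sum_side(P, p, n):
    """(p-1)^n + sum over x_0, ..., x_n in F_p^* of prod_j theta(alpha_j mu_j(x)).

    The monomials of x_0 * P are x_0 times the monomials of P, with the same coefficients.
    """
    total = 0
    for x0, *xs in itertools.product(range(1, p), repeat=n + 1):
        term = 1
        for e, c in P.items():
            term *= theta(x0 * eval_monomial(c, e, xs, p), p)
        total += term
    return (p - 1) ** n + total

# Hypothetical example: P = x_1 + x_2 + 1 in G_m^2 over F_5.
P_example = {(1, 0): 1, (0, 1): 1, (0, 0): 1}
print(5 * count_points(P_example, 5, 2), character_sum_side(P_example, 5, 2))
```

Here \(X(\FF_5)\) has \(3\) points, so both printed values should agree with \(15\) up to floating-point rounding.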

So far we have done nothing \(p\)-adic. We now make the key advance, expressing the character \(\Theta\) in \(p\)-adic terms.

Definition 11.1.6.

Let \(\ZZ_q\) be the finite étale extension of \(\ZZ_p\) with residue field \(\FF_q\text{.}\) For \(x \in \FF_q^\times\text{,}\) there is a unique \((q-1)\)-st root of unity in \(\ZZ_q\) congruent to \(x\) modulo \(p\text{.}\) We denote it by \([x]\) and call it the multiplicative lift of \(x\text{.}\) We also write \([0] = 0\text{.}\)
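In the special case \(q = p\text{,}\) the multiplicative lift can be computed modulo \(p^N\) as \(x^{p^{N-1}}\text{:}\) since \(x \equiv [x] \pmod p\text{,}\) we get \(x^{p^{N-1}} \equiv [x]^{p^{N-1}} = [x] \pmod{p^N}\text{.}\) A minimal Python sketch, restricted to \(q = p\) (the function name is ours):

```python
def multiplicative_lift(x, p, N):
    """[x] modulo p^N, for x in F_p represented by an integer (case q = p only)."""
    if x % p == 0:
        return 0  # the convention [0] = 0
    return pow(x, p ** (N - 1), p ** N)

print(multiplicative_lift(2, 5, 3))  # -> 57, a square root of -1 modulo 125
```

The lift is multiplicative, and each nonzero lift is a \((p-1)\)-st root of unity modulo \(p^N\text{.}\)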

Form the product

\begin{equation*} (1+Y)^X (1+Y^p)^{(X^p-X)/p} (1+Y^{p^2})^{(X^{p^2}-X^p)/p^2} \cdots \in \QQ \llbracket X, Y \rrbracket \end{equation*}

using binomial expansions, and label its coefficients as

\begin{equation*} \sum_{n=0}^\infty \sum_{m=n}^\infty a_{m,n} X^n Y^m. \end{equation*}

Then pick a primitive \(p\)-th root of unity \(\zeta_p\) in \(\overline{\QQ}_p\text{,}\) put \(\lambda = \zeta_p - 1\text{,}\) and define

\begin{equation*} \Theta(T) = \sum_{n=0}^\infty a_n T^n, \qquad a_n = \sum_{m=n}^\infty a_{m,n} \lambda^m. \end{equation*}

As in [80], Chapter 4, one verifies that \(x \mapsto \Theta([x])\) is a nontrivial additive character of \(\FF_p\text{,}\) so we may use it in place of \(\Theta(x)\) in the previous calculations.
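The coefficients \(a_{m,n}\) can be generated mechanically from the binomial expansions. The following Python sketch (all names are ours) works in \(\QQ[X]\llbracket Y\rrbracket\) truncated after \(Y^M\text{,}\) with exact fractions; it does not attempt to evaluate \(\Theta(T)\) itself, which would require arithmetic in \(\QQ_p(\zeta_p)\text{.}\) Factors with \(p^i \gt M\) only affect powers of \(Y\) above \(Y^M\text{,}\) so the truncation is exact in the range computed.

```python
from fractions import Fraction
from math import factorial

# Polynomials in X are coefficient lists [c_0, c_1, ...] of Fractions.

def padd(a, b):
    n = max(len(a), len(b))
    return [(a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0) for i in range(n)]

def pmul(a, b):
    if not a or not b:
        return []
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def binom_poly(E, j):
    """binom(E(X), j) = E (E - 1) ... (E - j + 1) / j! as a polynomial in X."""
    out = [Fraction(1)]
    for t in range(j):
        out = pmul(out, padd(E, [Fraction(-t)]))
    return [c / factorial(j) for c in out]

def binomial_series(E, k, M):
    """(1 + Y^k)^E(X) = sum_j binom(E, j) Y^(kj), indexed by the power of Y, cut after Y^M."""
    out = [[] for _ in range(M + 1)]
    for j in range(M // k + 1):
        out[k * j] = binom_poly(E, j)
    return out

def series_mul(A, B, M):
    out = [[] for _ in range(M + 1)]
    for i, a in enumerate(A):
        for j, b in enumerate(B):
            if i + j <= M:
                out[i + j] = padd(out[i + j], pmul(a, b))
    return out

def dwork_F(p, M):
    """Rows F[m] = [a_{m,0}, a_{m,1}, ...] for m <= M of
    (1+Y)^X (1+Y^p)^((X^p - X)/p) (1+Y^(p^2))^((X^(p^2) - X^p)/p^2) ..."""
    monomial = lambda e: [Fraction(0)] * e + [Fraction(1)]  # X^e
    F = binomial_series(monomial(1), 1, M)
    i = 1
    while p ** i <= M:
        E = padd(monomial(p ** i), [-c for c in monomial(p ** (i - 1))])
        F = series_mul(F, binomial_series([c / p ** i for c in E], p ** i, M), M)
        i += 1
    return F

for m, row in enumerate(dwork_F(2, 6)):
    print(m, [str(c) for c in row])
```

Two sanity checks are easy to read off: \(a_{m,n} = 0\) for \(n \gt m\text{,}\) and setting \(X = 1\) collapses the product to \(1 + Y\text{.}\) In small cases one can also observe that no denominator is divisible by \(p\text{.}\)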

Remark 11.1.7.

The term multiplicative lift is a proposed replacement for the more common term Teichmüller lift. The latter is historically accurate but inadvisable due to Teichmüller's actions during the Nazi regime in 1930s Germany. See the MacTutor article on Teichmüller ¹.

Remark 11.1.8.

The power series \(\Theta(T)\) is not uniquely determined by the fact that it gives rise to an additive character of \(\FF_p\) in this fashion; indeed, Dwork used a slightly different construction (see Exercise 19.4.4), but this discrepancy has no significant effect on the resulting argument.

Now define the power series

\begin{equation*} G(X_0,\dots,X_n) \colonequals \prod_j \Theta([\alpha_j] \mu_j(X)) \Theta([\alpha_j]^p \mu_j(X)^p) \cdots \Theta([\alpha_j]^{p^{a-1}} \mu_j(X)^{p^{a-1}}), \end{equation*}

so that

\begin{equation*} q^s \#X(\FF_{q^s}) = (q^s-1)^n + \sum_{x_0,\dots,x_n \in \FF_{q^s}^\times} G([x]) G([x^q]) \cdots G([x^{q^{s-1}}]) \end{equation*}

where \(G([x^{q^i}])\) is shorthand for \(G([x_0^{q^i}], \dots, [x_n^{q^i}])\text{.}\) Up to the factor \((q^s-1)^{n+1}\text{,}\) the sum is the trace (for a suitable topology) of the \(s\)-th iterate of the operator \(f \mapsto T(Gf)\) on \(\QQ_p \llbracket X_0,\dots,X_n \rrbracket\text{,}\) where \(T\) is the “decimation” map

\begin{equation*} \sum a_{i_0,\dots,i_n} X_0^{i_0} \cdots X_n^{i_n} \mapsto \sum a_{qi_0,\dots,qi_n} X_0^{i_0} \cdots X_n^{i_n} \end{equation*}

(see [80], V.3, Lemma 3). Since this operator does not act on a finite-dimensional space, it does not immediately give rationality of the zeta function; however, it does give \(p\)-adic meromorphicity, which is what we need to plug into Borel's theorem and complete the proof.
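The shape of this trace formula is already visible in a toy, purely algebraic setting, which the following Python sketch uses as a stand-in under explicit assumptions: one variable, an ordinary polynomial \(G\) with integer coefficients in place of Dwork's series, an integer \(q \geq 2\text{,}\) and complex roots of unity in place of multiplicative lifts. With \(T_q\) keeping only the monomials \(X^{qi}\) and sending them to \(X^i\text{,}\) it forms the matrix of \(\Psi = T_q \circ G\) on polynomials of degree at most \(D\) (a \(\Psi\)-stable space when \(\deg G \leq (q-1)D\)) and compares \((q^s-1)\operatorname{Tr}(\Psi^s)\) with \(\sum_x G(x)G(x^q)\cdots G(x^{q^{s-1}})\) over the \((q^s-1)\)-st roots of unity \(x\text{.}\) All names are ours.

```python
import cmath

def psi_matrix(G, q, D):
    """Matrix of Psi = T_q o (multiplication by G) on polynomials of degree <= D.

    T_q sends X^(q*i) to X^i and kills the other monomials.
    """
    assert len(G) - 1 <= (q - 1) * D, "choose D larger so the space is Psi-stable"
    M = [[0] * (D + 1) for _ in range(D + 1)]
    for u in range(D + 1):          # column u: image of the basis vector X^u
        for k, g in enumerate(G):   # G * X^u contains g * X^(k + u)
            if (k + u) % q == 0:
                M[(k + u) // q][u] += g
    return M

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def trace_of_power(M, s):
    P = M
    for _ in range(s - 1):
        P = mat_mul(P, M)
    return sum(P[i][i] for i in range(len(P)))

def root_of_unity_side(G, q, s):
    """Sum of G(x) G(x^q) ... G(x^(q^(s-1))) over the complex (q^s - 1)-st roots of unity."""
    N = q**s - 1
    total = 0
    for k in range(N):
        x = cmath.exp(2j * cmath.pi * k / N)
        term = 1
        for i in range(s):
            y = x ** (q**i)
            term *= sum(g * y**e for e, g in enumerate(G))
        total += term
    return total

G = [1, 2, 0, 3, 1]   # hypothetical G(X) = 1 + 2X + 3X^3 + X^4
q, D = 3, 4
M = psi_matrix(G, q, D)
for s in (1, 2, 3):
    print(s, (q**s - 1) * trace_of_power(M, s), root_of_unity_side(G, q, s))
```

In this one-variable toy the normalizing factor is \(q^s - 1\text{;}\) with the \(n+1\) variables \(X_0, \dots, X_n\) it becomes \((q^s-1)^{n+1}\text{.}\)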

Remark 11.1.9.

Notice that the key move here was indeed to use a trace formula, but on an infinite-dimensional vector space. This construction can be promoted to give what is sometimes called Dwork cohomology, which we do not treat as a Weil cohomology per se but which is nonetheless extremely useful in the study of zeta functions.

Remark 11.1.10.

If one were to specialize this argument back to Fermat hypersurfaces, it would yield an explicit \(p\)-adic analytic formula for Gauss sums. This was somehow missed by Dwork and his contemporaries, only to appear later as the Gross–Koblitz formula [57].

¹ mathshistory.st-andrews.ac.uk/Biographies/Teichmuller