# Tag Archives: mathematics

## Computing $\pi$

This note was originally written in the context of my fall Math 100 class at Brown University. It is also available as a pdf note.

While investigating Taylor series, we proved that
$$\label{eq:base}
\frac{\pi}{4} = 1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \frac{1}{9} - \cdots$$

Let’s remind ourselves how. Begin with the geometric series

$$\frac{1}{1 + x^2} = 1 - x^2 + x^4 - x^6 + x^8 - \cdots = \sum_{n = 0}^\infty (-1)^n x^{2n}. \notag$$

(We showed that this has interval of convergence $\lvert x \rvert < 1$). Integrating this geometric series yields

$$\int_0^x \frac{1}{1 + t^2} dt = x - \frac{x^3}{3} + \frac{x^5}{5} - \frac{x^7}{7} + \cdots = \sum_{n = 0}^\infty (-1)^n \frac{x^{2n+1}}{2n+1}. \notag$$

Note that this has interval of convergence $-1 < x \leq 1$.

We also recognize this integral as

$$\int_0^x \frac{1}{1 + t^2} dt = \text{arctan}(x), \notag$$

one of the common integrals arising from trigonometric substitution. Putting these together, we find that

$$\text{arctan}(x) = x - \frac{x^3}{3} + \frac{x^5}{5} - \frac{x^7}{7} + \cdots = \sum_{n = 0}^\infty (-1)^n \frac{x^{2n+1}}{2n+1}. \notag$$

As $x = 1$ is within the interval of convergence, we can substitute $x = 1$ into the series to find the representation

$$\text{arctan}(1) = 1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \cdots = \sum_{n = 0}^\infty (-1)^n \frac{1}{2n+1}. \notag$$

Since $\text{arctan}(1) = \frac{\pi}{4}$, this gives the representation for $\pi/4$ given in \eqref{eq:base}.

However, since $x=1$ was at the very edge of the interval of convergence, this series converges very, very slowly. For instance, using the first $50$ terms gives the approximation

$$\pi \approx 3.121594652591011. \notag$$

The expansion of $\pi$ is actually

$$\pi = 3.141592653589793238462\ldots \notag$$

So the first $50$ terms of \eqref{eq:base} give only two digits of accuracy. That’s not very good.
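To see this slow convergence concretely, here is a minimal numerical sketch (the function name is my own):

```python
# Partial sums of the series pi = 4*(1 - 1/3 + 1/5 - 1/7 + ...).
def leibniz_pi(terms):
    """Approximate pi using the first `terms` terms of the Leibniz series."""
    return 4 * sum((-1)**n / (2*n + 1) for n in range(terms))

print(leibniz_pi(50))  # about 3.12159465..., only two correct digits
```

Even after 50 terms the error is still roughly $1/50$, as the alternating series test predicts.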

I think it is very natural to ask: can we do better? This series converges slowly — can we find one that converges more quickly?

### Aside

As an aside: one might also ask if we can somehow speed up the convergence of the series we already have. It turns out that in many cases, you can! For example, we know that the sum of an alternating series lies between any two consecutive partial sums. So what if you took the average of two consecutive partial sums? [Equivalently, what if you added only one half of the last term in a partial sum? Do you see why these are the same?]

The average of the partial sum of the first 49 terms and the partial sum of the first 50 terms is actually

$$3.141796672793031, \notag$$

which is correct to within $0.001$. That’s an improvement!
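This averaging trick is easy to check numerically; a sketch (names my own):

```python
# Compare a single partial sum of 4*(1 - 1/3 + 1/5 - ...) against the
# average of two consecutive partial sums.
def partial(terms):
    return 4 * sum((-1)**n / (2*n + 1) for n in range(terms))

averaged = (partial(49) + partial(50)) / 2
print(partial(50))  # about 3.12159, off by about 0.02
print(averaged)     # about 3.14180, off by about 0.0002
```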

What if you do still more? More on this can be found in the last section.

## Estimating $\pi$ through a different series

We return to the question: can we find a series that gives us $\pi$, but which converges faster? Yes we can! And we don’t have to look too far — we can continue to rely on our expansion for $\text{arctan}(x)$.

We had been using that $\text{arctan}(1) = \frac{\pi}{4}$. But we also know that $\text{arctan}(1/\sqrt{3}) = \frac{\pi}{6}$. Since $1/\sqrt{3}$ is closer to the center of the power series than $1$, we should expect that the convergence is much better.

Recall that

$$\text{arctan}(x) = x - \frac{x^3}{3} + \frac{x^5}{5} - \frac{x^7}{7} + \cdots = \sum_{n = 0}^\infty (-1)^n \frac{x^{2n + 1}}{2n + 1}. \notag$$

Then we have that
\begin{align}
\text{arctan}\left(\frac{1}{\sqrt 3}\right) &= \frac{1}{\sqrt 3} - \frac{1}{3(\sqrt 3)^3} + \frac{1}{5(\sqrt 3)^5} - \cdots \notag \\
&= \frac{1}{\sqrt 3} \left(1 - \frac{1}{3 \cdot 3} + \frac{1}{5 \cdot 3^2} - \frac{1}{7 \cdot 3^3} + \cdots \right) \notag \\
&= \frac{1}{\sqrt 3} \sum_{n = 0}^\infty (-1)^n \frac{1}{(2n + 1) 3^n}. \notag
\end{align}
Therefore, we have the equality

$$\frac{\pi}{6} = \frac{1}{\sqrt 3} \sum_{n = 0}^\infty (-1)^n \frac{1}{(2n + 1) 3^n} \notag$$

or rather that

$$\pi = 2 \sqrt{3} \sum_{n = 0}^\infty (-1)^n \frac{1}{(2n + 1) 3^n}. \notag$$

From a computational perspective, this is far superior. For instance, based on our understanding of error from the alternating series test, using the terms of this series through $n = 10$ (the first eleven terms) will approximate $\pi$ to within the size of the next term,

$$2 \sqrt 3 \cdot \frac{1}{23 \cdot 3^{11}} \approx 8.5 \cdot 10^{-7}. \notag$$

Let’s check this.

$$2 \sqrt 3 \left(1 - \frac{1}{3\cdot 3} + \frac{1}{5 \cdot 3^2} - \frac{1}{7 \cdot 3^3} + \cdots + \frac{1}{21 \cdot 3^{10}}\right) = 3.1415933045030813. \notag$$

Look at how close that approximation is, and we only used the first eleven terms!
Roughly speaking, every couple of additional terms yields another digit of $\pi$. Using the first $100$ terms would give the first 48 digits of $\pi$.
Using the first million terms would give the first 477000 (or so) digits of $\pi$, and this is definitely doable, even on a personal laptop. (On my laptop, it takes approximately 4 milliseconds to compute the first 20 digits of $\pi$ using this technique.)
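The faster series is just as easy to experiment with; a sketch (function name my own):

```python
from math import sqrt

# pi = 2*sqrt(3) * sum_{n >= 0} (-1)^n / ((2n+1) * 3^n)
def pi_sqrt3(terms):
    return 2 * sqrt(3) * sum((-1)**n / ((2*n + 1) * 3**n) for n in range(terms))

print(pi_sqrt3(11))  # about 3.1415933, six correct digits from eleven terms
```

(To go past the roughly 16 digits of floating point, one would switch to exact or arbitrary-precision arithmetic, but the convergence rate is already visible here.)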

### Even Better Series

I think it is very natural to ask again: can we find an even faster converging series? Perhaps we can choose better values to evaluate arctan at? This turns out to be a very useful line of thought, and it leads to some of the best-known methods for evaluating $\pi$. Through clever choices of values and identities involving arctangents, one can construct extremely quickly converging series for $\pi$. For more information on this line of thought, look up Machin-like formulas.

## Patterns in the Approximation of $\pi/4$

Looking back at the approximation of $\pi$ coming from the first $50$ terms of the series
$$\label{eq:series_pi4_base}
1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \cdots$$

we found an approximation of $\pi$, which I’ll represent as $\widehat{\pi}$,

$$\pi \approx \widehat{\pi} = 3.121594652591011. \notag$$

Let’s look very carefully at how this compares to $\pi$, up to the first $10$ decimals. We color the incorrect digits in ${\color{orange}{orange}}$.
\begin{align}
\pi &= 3.1415926535\ldots \notag \\
\widehat{\pi} &= 3.1{\color{orange}2}159{\color{orange}4}65{\color{orange}2}5 \notag
\end{align}
Notice that most of the digits are correct — in fact, only three (of the first ten) are incorrect! Isn’t that weird?

It happens to be that when one uses the first $10^N / 2$ terms (for any $N$) of the series \eqref{eq:series_pi4_base}, there will be a pattern of mostly correct digits with disjoint strings of incorrect digits in the middle. This is an unusual and surprising phenomenon.

The positions of the incorrect digits can be predicted. Although I won’t go into any detail here, the positions of the errors are closely related to something called Euler Numbers or, more deeply, to Boole Summation.

Playing with infinite series leads to all sorts of interesting patterns. There is a great history of mathematicians and physicists messing around with series and stumbling across really deep ideas.

## Speeding up computation

Take an alternating series

$$\sum_{n = 0}^\infty (-1)^{n} a_n = a_0 - a_1 + a_2 - a_3 + \cdots \notag$$

If ${a_n}$ is a sequence of positive, decreasing terms with limit $0$, then the alternating series converges to some value $S$. And further, consecutive partial sums bound the value of $S$, in that

$$\sum_{n = 0}^{2K-1} (-1)^{n} a_n \leq S \leq \sum_{n = 0}^{2K} (-1)^{n} a_n. \notag$$

For example,

$$1 - \frac{1}{3} < \sum_{n = 0}^\infty \frac{(-1)^{n}}{2n+1} < 1 - \frac{1}{3} + \frac{1}{5}. \notag$$

Instead of approximating the value of the whole sum $S$ by the $K$th partial sum $\sum_{n \leq K} (-1)^n a_n$, it might seem reasonable to approximate $S$ by the average of the $(K-1)$st partial sum and the $K$th partial sum. Since we know $S$ is between the two, taking their average might be closer to the real result.

As mentioned above, the average of the partial sum of the first $49$ terms of \eqref{eq:base} and that of the first $50$ terms gives a much better estimate of $\pi$ than either the first $49$ or the first $50$ terms give on their own. (And indeed, it works much better than even the first $500$ terms on their own.)

Before we go on, let’s introduce a little notation. Let $S_K$ denote the partial sum of the terms up to $K$, i.e.

$$S_K = \sum_{n = 0}^K (-1)^{n} a_n. \notag$$

Then the idea is that instead of using $S_{K}$ to approximate the whole sum $S$, we’ll use the average

$$\frac{S_{K-1} + S_{K}}{2} \approx S. \notag$$

Averaging once seems like a great idea. What if we average again? That is, what if instead of using the average of $S_{K-1}$ and $S_K$, we actually use the average of (the average of $S_{K-2}$ and $S_{K-1}$) and (the average of $S_{K-1}$ and $S_K$),
$$\label{eq:avgavg}
\frac{\frac{S_{K-2} + S_{K-1}}{2} + \frac{S_{K-1} + S_{K}}{2}}{2}.$$

As this is really annoying to write, let’s come up with some new notation. Write the average between a quantity $X$ and $Y$ as

$$[X, Y] = \frac{X + Y}{2}. \notag$$

Further, define the average of $[X, Y]$ and $[Y, Z]$ to be $[X, Y, Z]$,

$$[X, Y, Z] = \frac{[X, Y] + [Y, Z]}{2} = \frac{\frac{X + Y}{2} + \frac{Y + Z}{2}}{2}. \notag$$

So the long expression in \eqref{eq:avgavg} can be written as $[S_{K-2}, S_{K-1}, S_{K}]$.

With this notation in mind, let’s compute some numerics. Below, we give the actual value of $\pi$, the values of $S_{48}, S_{49}$, and $S_{50}$, pairwise averages, and the average-of-the-average, in the case of $1 - \frac{1}{3} + \frac{1}{5} - \cdots$.
$$\begin{array}{c|l|l}
& \text{Value} & \text{Difference from } \pi \\ \hline
\pi & 3.141592653589793238462\ldots & \phantom{-}0 \\ \hline
4 \cdot S_{48} & 3.1207615795929895 & \phantom{-}0.020831073996803617 \\ \hline
4 \cdot S_{49} & 3.161998692995051 & -0.020406039405258092 \\ \hline
4 \cdot S_{50} & 3.121594652591011 & \phantom{-}0.01999800099878213 \\ \hline
4 \cdot [S_{48}, S_{49}] & 3.1413801362940204 & \phantom{-}0.0002125172957727628 \\ \hline
4 \cdot [S_{49}, S_{50}] & 3.1417966727930313 & -0.00020401920323820377 \\ \hline
4 \cdot [S_{48}, S_{49}, S_{50}] & 3.141588404543526 & \phantom{-}0.00000424904626727951 \\ \hline
\end{array} \notag$$

So using the average of averages from the three sums $S_{48}, S_{49}$, and $S_{50}$ gives $\pi$ to within $4.2 \cdot 10^{-6}$, an incredible improvement compared to $S_{50}$ on its own.

There is something really odd going on here. We are not computing additional summands in the overall sum \eqref{eq:base}. We are merely combining some of our partial results together in a really simple way, repeatedly. Somehow, the sequence of partial sums contains more information about the limit $S$ than individual terms, and we are able to extract some of this information.

I think there is a very natural question. What if we didn’t stop now? What if we took averages-of-averages-of-averages, and averages-of-averages-of-averages-of-averages, and so on? Indeed, we might define the average

$$[X, Y, Z, W] = \frac{[X, Y, Z] + [Y, Z, W]}{2}, \notag$$

and so on for larger numbers of terms. In this case, it happens to be that

$$[S_{15}, S_{16}, \ldots, S_{50}] = 3.141592653589794, \notag$$

which has the first 15 digits of $\pi$ correct!

By repeatedly averaging alternating sums of just the first $50$ reciprocals of odd integers, we can find $\pi$ up to 15 digits. I think that’s incredible — it seems both harder than it might have been (as this involves lots of averaging) and much easier than it might have been (as the only arithmetic inputs are the fractions $1/(2n+1)$ for $n$ up to $50$).

Although we leave the thread of ideas here, there are plenty of questions that I think are now asking themselves. I encourage you to ask them, and we may return to this (or related) topics in the future. I’ll see you in class.

## Math 420: Second Week Homework

Firstly, we have three administrative notes.

1. I’ve posted the second homework set. You can find it here.
2. I’ve also written solutions to the first homework set. You can find those here.
3. After feedback from the first week, I’m setting stable office hours. My office hours will be from 1-3pm on Monday and 2:30-3:30pm on Tuesday (immediately following our class). [Or we can set up an appointment].

I’ll see you on Tuesday, when we will continue to talk about the Euclidean Algorithm and greatest common divisors.


## Estimating the number of squarefree integers up to $X$

I recently wrote an answer to a question on MSE about estimating the number of squarefree integers up to $X$. Although the result is known and not too hard, I very much like the proof and my approach. So I write it down here.

First, let’s see if we can understand why this “should” be true from an analytic perspective.

We know that
$$\sum_{n \geq 1} \frac{\mu(n)^2}{n^s} = \frac{\zeta(s)}{\zeta(2s)},$$
and a general way of extracting information from Dirichlet series is to perform a cutoff integral transform (or a type of Mellin transform). In this case, we get that
$$\sum_{n \leq X} \mu(n)^2 = \frac{1}{2\pi i} \int_{(2)} \frac{\zeta(s)}{\zeta(2s)} X^s \frac{ds}{s},$$
where the contour is the vertical line $\text{Re }s = 2$. By Cauchy’s theorem, we shift the line of integration to the left, and the poles we pass contribute the main-order terms. The pole of $\zeta(s)$ at $s = 1$ has residue
$$\frac{X}{\zeta(2)},$$
so we expect this to be the leading order. Naively, since we know that there are no zeroes of $\zeta(2s)$ on the line $\text{Re } s = \frac{1}{2}$, we might expect to push our line exactly there, leading to an error of $O(\sqrt X)$. But in fact, we know more. We know the zero-free region, which allows us to extend the line of integration ever so slightly inwards, leading to an $o(\sqrt X)$ result (or more specifically, something along the lines of $O(\sqrt X e^{-c (\log X)^\alpha})$, where $\alpha$ and $c$ come from the strength of our known zero-free region).

In this heuristic analysis, I have omitted bounding the top, bottom, and left boundaries of the rectangles of integration. But these can be handled in a similar way as in the proof of the analytic prime number theorem. So we expect the answer to look like
$$\frac{X}{\zeta(2)} + O(\sqrt X e^{-c (\log X)^\alpha})$$
using no more than the zero-free region that goes into the prime number theorem.

We will now prove this result, but in an entirely elementary way (except that I will refer to a result from the prime number theorem). This is below the fold.
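The main term $X/\zeta(2) = 6X/\pi^2$ is easy to test numerically using the standard inclusion-exclusion identity $\sum_{n \leq X} \mu(n)^2 = \sum_{d \leq \sqrt X} \mu(d) \lfloor X/d^2 \rfloor$. A sketch (function names my own):

```python
from math import isqrt, pi

def mobius_sieve(limit):
    """Compute the Mobius function mu(1..limit) with a simple sieve."""
    mu = [1] * (limit + 1)
    is_prime = [True] * (limit + 1)
    for p in range(2, limit + 1):
        if is_prime[p]:
            for m in range(2 * p, limit + 1, p):
                is_prime[m] = False
            for m in range(p, limit + 1, p):
                mu[m] = -mu[m]          # flip sign for each prime factor
            for m in range(p * p, limit + 1, p * p):
                mu[m] = 0               # kill non-squarefree numbers
    return mu

def squarefree_count(X):
    # Inclusion-exclusion over square divisors: sum_{d <= sqrt X} mu(d) * floor(X/d^2)
    mu = mobius_sieve(isqrt(X))
    return sum(mu[d] * (X // (d * d)) for d in range(1, isqrt(X) + 1))

X = 10**6
print(squarefree_count(X), 6 * X / pi**2)  # exact count vs main term X/zeta(2)
```

For $X = 10^6$ the count is $607926$ while $6X/\pi^2 \approx 607927.1$, already matching to within about one.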


## Notes from a talk on the Mean Value Theorem

1. Introduction

When I first learned the Mean Value Theorem and the Intermediate Value Theorem, I thought they were both intuitively obvious and utterly useless. In one of my courses in analysis, I was struck when, after proving the Mean Value Theorem, my instructor said that all of calculus was downhill from there. But it was a case of not being able to see the forest for the trees, and I missed the big picture.

I have since come to realize that almost every major (and often, minor) result of calculus is a direct and immediate consequence of the Mean Value Theorem and the Intermediate Value Theorem. In this talk, we will focus on the forest, the big picture, and see the Mean Value Theorem for what it really is: the true Fundamental Theorem of Calculus.


## Three Conundrums on Infinity

In this short post, we introduce three conundrums dealing with infinity. This is inspired by my calculus class, as we explore various confusing and confounding aspects of infinity, some of them mindbending.

### Order Matters

Consider the alternating unit series $$\sum_{n \geq 0} (-1)^n.$$
We want to try to understand its convergence. If we write out the first several terms, it looks like $$1 – 1 + 1 – 1 + 1 – 1 + \cdots$$
What if we grouped the terms while we were summing them? Perhaps we should group them like so, $$(1 – 1) + (1 – 1) + (1 – 1) + \cdots = 0 + 0 + 0 + \cdots$$
so that the sum is very clearly ${0}$. Adding infinitely many zeroes certainly gives zero, right?

On the other hand, what if we group the terms like so, $$1 + (-1 + 1) + (-1 + 1) + \cdots = 1 + 0 + 0 + \cdots$$
which is very clearly ${1}$. After all, adding ${1}$ to infinitely many zeroes certainly gives one, right?

A related, perhaps deeper paradox is one we mentioned in class. For conditionally convergent series like the alternating harmonic series $$\sum_{n = 1}^\infty \frac{(-1)^n}{n},$$
if we are allowed to rearrange the terms then we can have the series sum to any number that we want. This is called the Riemann Series Theorem.
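The constructive idea behind the Riemann series theorem is easy to simulate: greedily add unused positive terms of the series while the running total is below the target, and unused negative terms while it is above. A sketch (names my own), using the terms $\pm 1/n$ of the alternating harmonic series:

```python
# Greedy rearrangement of the terms {1/2, 1/4, ...} and {-1, -1/3, ...}
# so the partial sums approach an arbitrary target.
def rearranged_partial_sums(target, steps):
    total, history = 0.0, []
    next_even, next_odd = 2, 1  # next unused positive / negative denominators
    for _ in range(steps):
        if total <= target:
            total += 1 / next_even
            next_even += 2
        else:
            total -= 1 / next_odd
            next_odd += 2
        history.append(total)
    return history

sums = rearranged_partial_sums(2.0, 200000)
print(sums[-1])  # hovers near the target 2.0
```

Since both the positive terms and the negative terms separately diverge, this greedy process can chase any target at all, even though the original series sums to $-\ln 2$.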

### The Thief and the King

A very wealthy king keeps gold coins in his vault, but a sneaky thief knows how to get in. Suppose that each day, the king puts two more gold coins into the vault. And each day, the thief takes one gold coin out (so that the king won’t notice that the vault is empty). After infinitely many days, how much gold is left in the vault?

Suppose that the king numbers each coin. So on day 1, the king puts in coins labelled 1 and 2, and on day 2 he puts in coins labelled 3 and 4, and so on. What if the thief steals the odd numbered coin each day? Then at the end of time, the king has all the even coins.

But what if instead, the thief steals from the bottom. So he first steals coin number 1, then number 2, and so on. At the end of time, no coin is left in the vault, since for any number ${n}$, the ${n}$th coin has been taken by the thief.

### Prevalence of Rarity

When I drove to Providence this morning, the car in front of me had the license plate 637RB2. Think about it – out of the approximately ${10\cdot10\cdot10\cdot26\cdot 26 \cdot 10 = 6760000}$ possibilities, I happened across this one. Isn’t that amazing! How could something so rare happen to me?

Amazingly, something just as rare happened last time I drove to Providence too!


## Continuity of the Mean Value

1. Introduction

When I first learned the mean value theorem as a high schooler, I was thoroughly unimpressed. Part of this was because it’s just like Rolle’s Theorem, which feels obvious. But I think the greater part is because I thought it was useless. And I continued to think it was useless until I began my first proof-oriented treatment of calculus as a second year at Georgia Tech. Somehow, in the intervening years, I learned to value intuition and simple statements.

I have since completely changed my view on the mean value theorem. I now consider essentially all of one variable calculus to be the Mean Value Theorem, perhaps in various forms or disguises. In my earlier note An Intuitive Introduction to Calculus, we state and prove the Mean Value Theorem, and then show that we can prove the Fundamental Theorem of Calculus with the Mean Value Theorem and the Intermediate Value Theorem (which also felt silly to me as a high schooler, but which is not silly).

In this brief note, I want to consider one small aspect of the Mean Value Theorem: can the “mean value” be chosen continuously as a function of the endpoints? To state this more clearly, first recall the theorem:

Suppose ${f}$ is a differentiable real-valued function on an interval ${[a,b]}$. Then there exists a point ${c}$ between ${a}$ and ${b}$ such that $$\frac{f(b) - f(a)}{b - a} = f'(c), \tag{1}$$
which is to say that there is a point where the slope of ${f}$ is the same as the average slope from ${a}$ to ${b}$.

What if we allow the interval to vary? Suppose we are interested in a differentiable function ${f}$ on intervals of the form ${[0,b]}$, and we let ${b}$ vary. Then for each choice of ${b}$, the mean value theorem tells us that there exists ${c_b}$ such that $$\frac{f(b) - f(0)}{b} = f'(c_b).$$
Then the question we consider today is: as a function of ${b}$, can ${c_b}$ be chosen continuously? We will see that, in general, it cannot, and we’ll see explicit counterexamples. This, after the fold.
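Whether or not $c_b$ can be chosen continuously, individual mean-value points are easy to locate numerically. A sketch (names my own) that scans for a sign change of $f'(c) - \frac{f(b) - f(0)}{b}$ and then bisects; it assumes such a sign change shows up on the grid:

```python
import math

def mean_value_point(f, fprime, b, grid=10000):
    """Find c in (0, b) with fprime(c) equal to the average slope of f on [0, b]."""
    slope = (f(b) - f(0.0)) / b
    g = lambda c: fprime(c) - slope
    xs = [b * k / grid for k in range(grid + 1)]
    for x0, x1 in zip(xs, xs[1:]):
        if g(x0) * g(x1) <= 0:       # sign change: a mean-value point lies inside
            for _ in range(60):      # bisect down to machine precision
                mid = (x0 + x1) / 2
                if g(x0) * g(mid) <= 0:
                    x1 = mid
                else:
                    x0 = mid
            return (x0 + x1) / 2
    return None

c = mean_value_point(math.sin, math.cos, 2.0)
print(c)  # a point in (0, 2) where cos(c) equals sin(2)/2
```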

## Review of How Not to Be Wrong by Jordan Ellenberg

Almost 100 years ago as I write this, on 21 October 1914, Martin Gardner was born. He wrote a popular “Mathematical Games” column for Scientific American from 1957 to 1981, introducing a wide audience to fun and recreational mathematics. His influence and writing were so profound that many of his subjects are still popular today. Notable examples include:

### Conway’s Game of Life

After its first public appearance in Gardner’s Scientific American column in October 1970, Conway’s Game of Life grew to enormous popularity and interest.

### Flexagons

The column “Mathematical Games” started with Gardner’s article on flexagons. The editor of Scientific American found the flexagons article engaging, and suggested that Gardner write a regular column. Fortunately, Gardner acceded.

### Public Key Cryptography

The first major public key cryptosystem, the RSA system, first appeared in Gardner’s August 1977 column. (Their formal paper appeared in 1978 in Communications of the Association for Computing Machinery). Now, public key cryptography is used everywhere, all the time, mostly without the conscious thought of the user.

Martin Gardner was in constant contact with many mathematicians, and always looked for interesting recreational mathematics to share with his readers. He inspired an entire generation of mathematicians and math enthusiasts. He also inspired others to pursue popular mathematics writing (and blogging, and even youtubing, such as the excellent series produced by Vi Hart).

The current issue (October/November 2014) of the MAA Focus, a mathematical newsmagazine from the Mathematical Association of America, features Martin Gardner. In addition to describing some of Gardner’s contributions and legacy, the article includes a quote from Gardner: “I’ve always thought that the best way to get students interested in mathematics is to give them something that has a recreational flavor — a puzzle or a magic trick or a paradox, or something like that. I think that hooks their interest faster than anything else.” Later, he is also quoted as saying “It’s good to not know too much about mathematics. I have to work hard to understand anything that I am writing about, so that makes it easier for me to explain it, perhaps, in a way that the general public can understand.”

(As an aside, the Doctor has noted the lack of recreational mathematics in school too.)

It is in this noble succession that I consider Jordan Ellenberg’s recent book How Not to Be Wrong: The Power of Mathematical Thinking, for Ellenberg has made a significant effort to make an approachable, inspiring work (even though it’s not recreational math). After reading the book, it seems clear to me that Ellenberg’s beliefs about how to interest people in mathematics mirror Gardner’s. This book is full of paradoxes and magic tricks. Or rather this book is full of captivating stories each centered around a problem or misconception, and whose resolution comes through careful and explicit reasoning.

Ellenberg presents mathematics as “an extension of common sense by other means,” but I get the feeling that he means to blur what it means to be “common sense” and what it means to be “other means” as the book advances. Much as a textbook or college course eases students into the subject, starting simple and getting progressively deeper, Ellenberg starts with problems that are undeniably simple logic and ends with ideas that are truly profound.

The reader is engaged within the first five pages. After a quick justification about learning mathematics — mathematics is reason, and allows for deeper understanding of the world around us — Ellenberg demonstrates that this is not an abstract book about abstract mathematics, but is instead full of actual examples. And he begins with a tale about Abraham Wald, a mathematician and statistician considering where to reinforce the armor of planes during the Second World War. The writing is conversational, as though this were an oral history transcribed and kept safe in written word. To support the claim that mathematics is an extension of common sense, the book alternates between explaining and setting up problems and careful, but common sensical, analysis. And most of the time, he proceeds without overwhelming the reader with arithmetic details or a flood of equations.

Mathematics is not arithmetic. Yes, arithmetic is one tiny part of mathematics, but mathematics is much more. The typical student is overexposed to arithmetic and underexposed to mathematics. Stories like Abraham Wald’s seek to rectify this imbalance by demonstrating more mathematics. And later stories, like the chapter about challenges facing Netflix analytics — how does Netflix know what movies to recommend? — use equations and arithmetic to support the underlying mathematics.

It might seem like a delicate arrangement to go through so much mathematical reasoning with so little arithmetic, but Ellenberg succeeds. Part of this is certainly that this is a book full of what he calls “simple and profound” mathematics. The simple is what allows the conversational tone. The profound is what makes it interesting. But the larger part is that Ellenberg’s thesis, math is common sense and allows for deeper understanding of the world around us, is fundamentally true. And quite beautiful.

Ellenberg does truly get to some profound mathematics. Some of the material is common fare for popular mathematics: survivorship bias, how statistics lie, and the high likelihood of coincidence, for instance. But in many ways, he goes deeper, and more profound, than I would expect. He examines in great detail the divide in statistics between Fisher’s “significance testing” and Pearson’s “hypothesis testing,” and evidences deep dissatisfaction with the accepted standard in experiments and hypothesis testing of “Reductio ad unlikely.” He not only mentions and cautions against misunderstandings of conditional probabilities, but also undertakes Bayesian inference as a decision-making model, perhaps even a good model for how we make our own decisions.

He makes a strong point that sometimes, mathematics does not have all the answers. Or more pertinently, sometimes the answer from mathematics is inaction. For this is action, this not being sure! Although Ellenberg never says it, he hints at the fact that saying anything meaningful about anything at all can be really hard, and sometimes even impossible.

Of course, the book is not without its flaws. Most chapters have their central players, central ideas, and a sort of take home message. But I found the last two chapters to suffer from a bit of indigestion. This might be because they concern the very idea of “existence.” Does public opinion exist as a measurable, or even well-defined quantity? Lurking beneath these two chapters is the problem of designing good, accurate voting systems. Though Ellenberg emphasizes Arrow’s paradox on the impossibility of having a rank order voting system that accurately reflects community opinion, this message is muddied.

And though Ellenberg confronts some of the common misunderstandings of mathematics, like thinking that all mathematics is simple arithmetic and boring, there is one more misunderstanding that I wish he tackled more explicitly, which is that there is room for more mathematics all the time. It is easy to read this book, look at how common sense and mathematics can feel so alike, and sleep comfortably under the sheets at night knowing that these mathematicians have solved all these hard problems for us. But really, more mathematics is needed in both academic and ordinary walks of life.

I think it should also be mentioned that Ellenberg’s rejection of the cult of genius, including the idea that it takes a genius to succeed in mathematics and the far worse idea that we depend on geniuses to progress the sciences, is both good and from an interesting position. Ellenberg was one of the child “geniuses” in the Study of Mathematically Precocious Youth, which identified high-performing children and followed them throughout their lives. In an article in the Wall Street Journal, Ellenberg wrote of the dangers of the cult of genius. He also wrote that we need more math majors who don’t become mathematicians. Math and the sciences are not only progressed by the top 0.01 percent, but instead are more often advanced by the hard work and determination of someone who pursued their interests and ignored the cult. For more, read his article. It’s not very long, and it rounds out the end of “How Not to Be Wrong” very nicely.

Ultimately, “How Not to Be Wrong” is a great read that I highly recommend, both to a mathematical and non-mathematical crowd. It’s an engaging and educational read that’s not afraid to do some real math. After finishing the book, part of me wondered if more mathematics should be taught against the history of the mathematicians themselves. Why is it that I learned the development and logic of chemistry along the lives of the chemists of the past while in middle and high school, but I heard almost no mathematician’s name until I began to major in college? This book is a tour-de-mathematical-force through recent history, in the spirit of Martin Gardner. I look forward to reading more of his work.


## Another proof of Wilson’s Theorem

While teaching a largely student-discovery style elementary number theory course to high schoolers at the Summer@Brown program, we were looking for instructive but interesting problems to challenge our students. By we, I mean Alex Walker, my academic little brother, and me. After a bit of experimentation with generators and orders, we stumbled across a proof of Wilson’s Theorem, different from the standard proof.

Wilson’s theorem is a classic result of elementary number theory, and is used in some elementary texts to prove Fermat’s Little Theorem, or to introduce primality testing algorithms that give no hint of the factorization.

Theorem 1 (Wilson’s Theorem) For a prime number ${p}$, we have $$(p-1)! \equiv -1 \pmod p. \tag{1}$$

The theorem is clear for ${p = 2}$, so we only consider odd primes ${p}$.

The standard proof of Wilson’s Theorem, included in almost every elementary number theory text, starts with the factorial ${(p-1)!}$, the product of all the units mod ${p}$. Then, as the only elements which are their own inverses are ${\pm 1}$ (since ${x^2 \equiv 1 \pmod p \iff p \mid (x^2 - 1) \iff p \mid x+1}$ or ${p \mid x-1}$), every element in the factorial multiplies with its inverse to give ${1}$, except for ${-1}$. Thus ${(p-1)! \equiv -1 \pmod p}$. $\diamondsuit$

Now we present a different proof.

Take a primitive root ${g}$ of the unit group ${(\mathbb{Z}/p\mathbb{Z})^\times}$, so that each number ${1, \ldots, p-1}$ appears exactly once in ${g, g^2, \ldots, g^{p-1}}$. Recalling that ${1 + 2 + \ldots + n = \frac{n(n+1)}{2}}$ (a great example of classical pattern recognition in an elementary number theory class), we see that multiplying these together gives ${(p-1)!}$ on the one hand, and ${g^{(p-1)p/2}}$ on the other.

Now ${g^{(p-1)/2}}$ is a solution to ${x^2 \equiv 1 \pmod p}$, and it is not ${1}$, since ${g}$ is a generator and thus has order ${p-1}$. So ${g^{(p-1)/2} \equiv -1 \pmod p}$, and raising ${-1}$ to the odd power ${p}$ yields ${-1}$, completing the proof. $\diamondsuit$
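Both proofs are easy to sanity-check computationally; a sketch (names my own):

```python
# Check (p-1)! ≡ -1 (mod p), and the key step g^((p-1)/2) ≡ -1 (mod p)
# for a primitive root g, for a few small odd primes.
def factorial_mod(p):
    acc = 1
    for k in range(2, p):
        acc = acc * k % p
    return acc  # (p-1)! mod p

def is_primitive_root(g, p):
    # g is a primitive root mod p iff its powers hit all p-1 units
    return len({pow(g, k, p) for k in range(1, p)}) == p - 1

for p in [3, 5, 7, 11, 13, 101]:
    assert factorial_mod(p) == p - 1                 # Wilson: -1 mod p
    g = next(x for x in range(2, p) if is_primitive_root(x, p))
    assert pow(g, (p - 1) // 2, p) == p - 1          # g^((p-1)/2) ≡ -1 (mod p)
print("all checks pass")
```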

After posting this, we have since seen that this proof is suggested in a problem in Ireland and Rosen’s extremely good number theory book. But it was pleasant to see it come up naturally, and it’s nice to suggest to our students that you can stumble across proofs.

It may be interesting to question why ${x^2 \equiv 1 \pmod p \iff x \equiv \pm 1 \pmod p}$ appears in a fundamental way in both proofs.

This post appears on the author’s personal website davidlowryduda.com and on the Math.Stackexchange Community Blog math.blogoverflow.com. It is also available in pdf note form. It was typeset in \TeX, hosted on WordPress sites, converted using the utility github.com/davidlowryduda/mse2wp, and displayed with MathJax.


## A bit more about partial fraction decomposition

This is a short note written for my students in Math 170, talking about partial fraction decomposition and some potentially confusing topics that have come up. We’ll remind ourselves what partial fraction decomposition is, and unlike the text, we’ll prove it. Finally, we’ll look at some pitfalls in particular. All this after the fold.

1. The Result Itself

We are interested in rational functions and their integrals. Recall that a polynomial ${f(x)}$ is a function of the form ${f(x) = a_nx^n + a_{n-1}x^{n-1} + \cdots + a_1x + a_0}$, where the ${a_i}$ are constants and ${x}$ is our “indeterminate” — which we commonly imagine standing for a number (but this is not necessary).

Then a rational function ${R(x)}$ is a ratio of two polynomials ${p(x)}$ and ${q(x)}$, $$R(x) = \frac{p(x)}{q(x)}.$$

Then the big result concerning partial fractions is the following:

If ${R(x) = \dfrac{p(x)}{q(x)}}$ is a rational function, the degree of ${p(x)}$ is less than the degree of ${q(x)}$, and ${q(x)}$ factors into $$q(x) = (x-r_1)^{k_1}(x-r_2)^{k_2} \cdots (x-r_l)^{k_l} (x^2 + a_{1,1}x + a_{1,2})^{v_1} \cdots (x^2 + a_{m,1}x + a_{m,2})^{v_m},$$
where each quadratic factor is irreducible (i.e. has no real roots), then ${R(x)}$ can be written as a sum of fractions of the form ${\dfrac{A}{(x-r)^k}}$ or ${\dfrac{Ax + B}{(x^2 + a_1x + a_2)^v}}$, where in particular

• If ${(x-r)}$ appears in the denominator of ${R(x)}$, then there is a term ${\dfrac{A}{x – r}}$
• If ${(x-r)^k}$ appears in the denominator of ${R(x)}$, then there is a collection of terms $$\frac{A_1}{x-r} + \frac{A_2}{(x-r)^2} + \dots + \frac{A_k}{(x-r)^k}$$
• If ${x^2 + ax + b}$ appears in the denominator of ${R(x)}$, then there is a term ${\dfrac{Ax + B}{x^2 + ax + b}}$
• If ${(x^2 + ax + b)^v}$ appears in the denominator of ${R(x)}$, then there is a collection of terms $$\frac{A_1x + B_1}{x^2 + ax + b} + \frac{A_2 x + B_2}{(x^2 + ax + b)^2} + \dots + \frac{A_v x + B_v}{(x^2 + ax + b)^v}$$

where in each of these, the capital ${A}$ and ${B}$ represent some constants that can be solved for through basic algebra.
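For the special case of distinct linear factors, the constants can even be computed mechanically by the “cover-up” method: the coefficient of ${\frac{1}{x - r_i}}$ is ${p(r_i)}$ divided by the product of ${(r_i - r_j)}$ over the other roots. Here is a minimal sketch of that idea in Python; the helper function `partial_fractions` is a name of our own invention, not from any library:

```python
from fractions import Fraction

def partial_fractions(p, roots):
    """Cover-up method for p(x)/q(x) where q(x) has the distinct
    roots given; returns the coefficient A_i of each 1/(x - r_i)."""
    coeffs = []
    for i, r in enumerate(roots):
        denom = Fraction(1)
        for j, s in enumerate(roots):
            if j != i:
                denom *= (r - s)  # product of (r_i - r_j) over j != i
        coeffs.append(Fraction(p(r)) / denom)
    return coeffs

# Example: 1/(x^2 - 1) = A/(x - 1) + B/(x + 1)
A, B = partial_fractions(lambda x: 1, [1, -1])
print(A, B)  # 1/2 -1/2
```

That is, ${\frac{1}{x^2 - 1} = \frac{1/2}{x - 1} - \frac{1/2}{x + 1}}$, which one can verify by recombining over a common denominator.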

I state this result this way because it is the one that leads to integrals that we can evaluate. But in principle, this theorem can be restated in a couple different ways.

Let’s parse this theorem through an example – the classic example, after the fold.

## A response to FtYoU’s question on Reddit

FtYou writes

Hello everyone! There is a concept I have a hard time getting my head wrapped around. If you have a vector space ${V}$ and a subspace ${W}$, I understand that you can find the least squares approximation of any vector in ${V}$ by a vector in ${W}$, and this corresponds to the projection of ${V}$ onto the subspace ${W}$. Now, for data fitting: suppose you have a bunch of points ${(x_i, y_i)}$ and you want to fit a set of regressors so you can approximate ${y_i}$ by a linear combination of the regressors, say ${(1, x, x^2, \ldots)}$. What vector space are we talking about? If we consider the vector space of functions ${\mathbb{R} \to \mathbb{R}}$, onto what subspace are we trying to map these vectors?

I have a hard time merging these two concepts of projecting onto a subspace and fitting the data. In the latter case, what vectors are we using? The functions? If so, I understand the choice of regressors (which constitute a basis for the vector space), but what’s the role of the ${(x_i, y_i)}$?

I want to point out that I understand completely how to build the matrices to get ${Y = AX}$ and solve using the least squares approximation. What I’m missing is the big picture: the linear algebra picture. Thanks for any help!

We’ll go over this by closely examining and understanding an example. Suppose we have the data points ${(x_i, y_i)}$

$\displaystyle \begin{cases} (x_1, y_1) = (-1,8) \\ (x_2, y_2) = (0,8) \\ (x_3, y_3) = (1,4) \\ (x_4, y_4) = (2,16) \end{cases},$

and we have decided to try to find the best-fitting quadratic function. What do we mean by best-fitting? We mean the one that approximates these data points best. What exactly does that mean? We’ll see that before the end of this note — but in linear algebra terms, we are projecting onto some sort of vector space, and we claim that this projection gives the “best fit” possible.
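Before unpacking the linear algebra, here is a minimal computational sketch of what “best fit” produces for these four points. It builds the design matrix with rows ${(1, x_i, x_i^2)}$ and solves the normal equations ${A^{\mathsf{T}}A\,c = A^{\mathsf{T}}y}$ exactly with fractions (no numerical library assumed; the elimination code is a hand-rolled illustration):

```python
from fractions import Fraction

# The four data points from above.
pts = [(-1, 8), (0, 8), (1, 4), (2, 16)]

# Design matrix: one row (1, x, x^2) per data point.
A = [[Fraction(1), Fraction(x), Fraction(x * x)] for x, _ in pts]
y = [Fraction(v) for _, v in pts]

# Normal equations M c = b, where M = A^T A and b = A^T y.
n = 3
M = [[sum(A[k][i] * A[k][j] for k in range(len(pts))) for j in range(n)]
     for i in range(n)]
b = [sum(A[k][i] * y[k] for k in range(len(pts))) for i in range(n)]

# Gauss-Jordan elimination (exact, since everything is a Fraction).
for i in range(n):
    piv = M[i][i]
    M[i] = [v / piv for v in M[i]]
    b[i] /= piv
    for k in range(n):
        if k != i:
            f = M[k][i]
            M[k] = [vk - f * vi for vk, vi in zip(M[k], M[i])]
            b[k] -= f * b[i]

print(b)  # coefficients [a0, a1, a2] of the best-fit a0 + a1*x + a2*x^2
```

Running this gives coefficients ${(5, -1, 3)}$, i.e. the best-fitting quadratic is ${y = 3x^2 - x + 5}$; we’ll see below why solving the normal equations is the same thing as projecting onto a subspace.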