I started idly thinking about $2017$, the number, while heading back from the office, and we kept at it after dinner with my academic little brother Alex Walker and my future academic little sister-in-law Sara Schulz.

## General Patterns

• 2017 is a prime number. 2017 is the 306th prime. The 2017th prime is 17539.
• As 2011 is also prime, we call 2017 a sexy prime.
• 2017 can be written as a sum of two squares,
$$2017 = 9^2 +44^2,$$
and this is the only way to write it as a sum of two squares.
• Similarly, 2017 appears as the hypotenuse of a primitive Pythagorean triangle,
$$2017^2 = 792^2 + 1855^2,$$
and this is the only such right triangle.
• 2017 is uniquely identified as the first odd prime that leaves a remainder of $2$ when divided by $5$, $13$, and $31$. That is,
$$2017 \equiv 2 \pmod {5, 13, 31}.$$
• In different bases,
\begin{align} (2017)_{10} &= (2681)_9 = (3741)_8 = (5611)_7 = (13201)_6 \notag \\ &= (31032)_5 = (133201)_4 = (2202201)_3 = (11111100001)_2 \notag \end{align}
The base $2$ and base $3$ expressions are sort of nice, including repetition.
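Several of the claims in this list are easy to check by machine. Here is a minimal Python sketch; the helper names `is_prime`, `two_square_reps`, and `to_base` are my own inventions for illustration, and the primality test is plain trial division, which is only sensible for numbers this small.

```python
from math import isqrt

def is_prime(n):
    """Trial division; fine for numbers the size of 2017."""
    if n < 2:
        return False
    return all(n % d for d in range(2, isqrt(n) + 1))

def two_square_reps(n):
    """All pairs (a, b) with 0 <= a <= b and a*a + b*b == n."""
    reps = []
    for a in range(isqrt(n // 2) + 1):
        b = isqrt(n - a * a)
        if a * a + b * b == n:
            reps.append((a, b))
    return reps

def to_base(n, base):
    """Digit string of a positive integer n in the given base (2 <= base <= 10)."""
    digits = []
    while n:
        n, r = divmod(n, base)
        digits.append(str(r))
    return "".join(reversed(digits))

print(is_prime(2017), is_prime(2011))               # both primes of the sexy pair
print(two_square_reps(2017))                        # representations as a^2 + b^2
print([2017 % p for p in (5, 13, 31)])              # remainders mod 5, 13, 31
print({b: to_base(2017, b) for b in range(2, 10)})  # expansions in bases 2..9
```

Restricting to $a \le b$ in `two_square_reps` means each representation is listed once, so a list of length one is exactly the uniqueness claim.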

## Counting to 20

$$\begin{array}{ll} 1 = 2\cdot 0 + 1^7 & 11 = 2 + 0! + 1 + 7 \\ 2 = 2 + 0 \cdot 1 \cdot 7 & 12 = 20 - 1 - 7 = -2 + (0! + 1)\cdot 7 \\ 3 = (20 + 1)/7 = 20 - 17 & 13 = 20 - 1 \cdot 7 \\ 4 = -2 + 0 - 1 + 7 & 14 = 20 - (-1 + 7) \\ 5 = -2 + 0\cdot 1 + 7 & 15 = -2 + 0 + 17 \\ 6 = -2 + 0 + 1 + 7 & 16 = -(2^0) + 17 \\ 7 = 2^0 - 1 + 7 & 17 = 2\cdot 0 + 17 \\ 8 = 2 + 0 - 1 + 7 & 18 = 2^0 + 17 \\ 9 = 2 + 0\cdot 1 + 7 & 19 = 2\cdot 0! + 17 \\ 10 = 2 + 0 + 1 + 7 & 20 = 2 + 0! + 17. \end{array}$$

In each expression, the digits $2, 0, 1, 7$ appear, in order, with basic mathematical symbols. I wonder what the first number is that can’t be nicely expressed (subjectively, of course)?

## Iterative Maps on 2017

Now let’s look at less-common manipulations with numbers.

• The digit sum of $2017$ is $10$, which has digit sum $1$.
• Take $2017$ and its reverse, $7102$. The difference between these two numbers is $5085$. Repeating gives $720$. Continuing, we get
$$2017 \mapsto 5085 \mapsto 720 \mapsto 693 \mapsto 297 \mapsto 495 \mapsto 99 \mapsto 0.$$
So it takes seven iterations to hit $0$, where the iteration stabilizes.
• Take $2017$ and its reverse, $7102$. Add them. We get $9119$, a palindromic number. Continuing, we get
\begin{align} 2017 &\mapsto 9119 \mapsto 18238 \mapsto 101519 \notag \\ &\mapsto 1016620 \mapsto 1282721 \mapsto 2555542 \mapsto 5011094 \mapsto 9912199. \notag \end{align}
It takes one map to get to the first palindrome, and then seven more maps to get to the next palindrome. Another five maps would yield the next palindrome.
• Rearrange the digits of $2017$ into decreasing order, $7210$, and subtract the digits in increasing order, $0127$. This gives $7083$. Repeating once gives $8352$. Repeating again gives $6174$, at which point the iteration stabilizes. This is called Kaprekar’s Constant.
• Consider Collatz: If $n$ is even, replace $n$ by $n/2$. Otherwise, replace $n$ by $3\cdot n + 1$. On $2017$, this gives
\begin{align} 2017 &\mapsto 6052 \mapsto 3026 \mapsto 1513 \mapsto 4540 \mapsto \notag \\ &\mapsto 2270 \mapsto 1135 \mapsto 3406 \mapsto 1703 \mapsto 5110 \mapsto \notag \\ &\mapsto 2555 \mapsto 7666 \mapsto 3833 \mapsto 11500 \mapsto 5750 \mapsto \notag \\ &\mapsto 2875 \mapsto 8626 \mapsto 4313 \mapsto 12940 \mapsto 6470 \mapsto \notag \\ &\mapsto 3235 \mapsto 9706 \mapsto 4853 \mapsto 14560 \mapsto 7280 \mapsto \notag \\ &\mapsto 3640 \mapsto 1820 \mapsto 910 \mapsto 455 \mapsto 1366 \mapsto \notag \\ &\mapsto 683 \mapsto 2050 \mapsto 1025 \mapsto 3076 \mapsto 1538 \mapsto \notag \\ &\mapsto 769 \mapsto 2308 \mapsto 1154 \mapsto 577 \mapsto 1732 \mapsto \notag \\ &\mapsto 866 \mapsto 433 \mapsto 1300 \mapsto 650 \mapsto 325 \mapsto \notag \\ &\mapsto 976 \mapsto 488 \mapsto 244 \mapsto 122 \mapsto 61 \mapsto \notag \\ &\mapsto 184 \mapsto 92 \mapsto 46 \mapsto 23 \mapsto 70 \mapsto \notag \\ &\mapsto 35 \mapsto 106 \mapsto 53 \mapsto 160 \mapsto 80 \mapsto \notag \\ &\mapsto 40 \mapsto 20 \mapsto 10 \mapsto 5 \mapsto 16 \mapsto \notag \\ &\mapsto 8 \mapsto 4 \mapsto 2 \mapsto 1 \notag \end{align}
It takes $68$ steps (the chain above has $69$ terms, counting $2017$ itself) to reach the seemingly inevitable $1$. This is much shorter than the $112$ steps necessary for $2016$ or the $112$ (yes, same number) steps necessary for $2018$.
• Consider the digits $2,1,7$ (in that order). To generate the next number, take the units digit of the product of the previous $3$. This yields
$$2,1,7,4,8,4,8,6,2,6,2,4,8,4,\ldots$$
This immediately jumps into a periodic pattern of length $8$, but $217$ is not part of the period. So this is preperiodic.
• Consider the digits $2,0,1,7$. To generate the next number, take the units digit of the sum of the previous $4$. This yields
$$2,0,1,7,0,8,6,1,5,0,2,8,\ldots, 2,0,1,7.$$
After $1560$ steps, this produces $2,0,1,7$ again, yielding a cycle. Interestingly, the loops starting with $2018$ and $2019$ also repeat after $1560$ steps.
• Take the digits $2,0,1,7$, square them, and add the result. This gives $2^2 + 0^2 + 1^2 + 7^2 = 54$. Repeating, this gives
\begin{align} 2017 &\mapsto 54 \mapsto 41 \mapsto 17 \mapsto 50 \mapsto 25 \mapsto 29 \notag \\ &\mapsto 85 \mapsto 89 \mapsto 145 \mapsto 42 \mapsto 20 \mapsto 4 \notag \\ &\mapsto 16 \mapsto 37 \mapsto 58 \mapsto 89\notag\end{align}
and then it reaches a cycle.
• Take the digits $2,0,1,7$, cube them, and add the result. This gives $352$. Repeating, we get $160$, and then $217$, and then $352$. This is a very tight loop.
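Most of the maps in this list can be iterated with a few lines of Python. The following is a rough sketch rather than anything canonical: the function names are my own, `kaprekar_step` assumes four-digit inputs (padding with leading zeros), and `orbit` simply iterates until a value repeats. I leave out the reverse-and-add map because its values keep growing and never repeat, so `orbit` would not terminate on it.

```python
def reverse_difference(n):
    """|n - reverse(n)|, where reversing drops leading zeros (720 -> 27)."""
    return abs(n - int(str(n)[::-1]))

def kaprekar_step(n):
    """Digits in decreasing order minus digits in increasing order (4 digits)."""
    digits = sorted(f"{n:04d}")
    return int("".join(reversed(digits))) - int("".join(digits))

def collatz_step(n):
    """n/2 if n is even, 3n + 1 otherwise."""
    return n // 2 if n % 2 == 0 else 3 * n + 1

def digit_power_sum(n, power):
    """Sum of the digits of n, each raised to the given power."""
    return sum(int(d) ** power for d in str(n))

def orbit(step, start):
    """Terms start, step(start), ... up to (not including) the first repeat."""
    terms = [start]
    while (nxt := step(terms[-1])) not in terms:
        terms.append(nxt)
    return terms

print(orbit(reverse_difference, 2017))
print(orbit(kaprekar_step, 2017))
print(len(orbit(collatz_step, 2017)))
print(orbit(lambda n: digit_power_sum(n, 2), 2017))
print(orbit(lambda n: digit_power_sum(n, 3), 2017))
```

For Collatz, the orbit stops at $1$ because the next value, $4$, has already appeared. The printed length counts terms including the starting value $2017$, so the number of applications of the map is one less.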

## A Few Matrices

• One can make $2017$ from determinants of basic matrices in a few ways. For instance,
\begin{align} \left \lvert \begin{pmatrix} 1&2&3 \\ 4&6&7 \\ 5&8&9 \end{pmatrix}\right \rvert &= 2, \qquad \left \lvert \begin{pmatrix} 1&2&3 \\ 4&5&6 \\ 7&8&9 \end{pmatrix}\right \rvert &= 0\notag \\ \left \lvert \begin{pmatrix} 1&2&3 \\ 4&7&6 \\ 5&9&8 \end{pmatrix}\right \rvert &= 1 , \qquad \left \lvert \begin{pmatrix} 1&2&3 \\ 4&5&7 \\ 6&8&9 \end{pmatrix}\right \rvert &= 7\notag \end{align}
The matrix with determinant $0$ has the numbers $1$ through $9$ in the most obvious configuration. The other matrices are very close in configuration.
• Alternately,
\begin{align} \left \lvert \begin{pmatrix} 1&2&3 \\ 5&6&9 \\ 4&8&7 \end{pmatrix}\right \rvert &= 20 \notag \\ \left \lvert \begin{pmatrix} 1&2&3 \\ 6&8&9 \\ 5&7&4 \end{pmatrix}\right \rvert &= 17 \notag \end{align}
So one can form $20$ and $17$ separately from determinants.
• One cannot make $2017$ from a determinant using the digits $1$ through $9$ (without repetition).
• If one instead uses the first $9$ primes as entries, it is interesting that one can choose configurations with determinants equal to $2016$ or $2018$, but there is no such configuration with determinant equal to $2017$.
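These determinant claims can be brute-forced, since there are only $9! = 362880$ ways to arrange nine distinct entries in a $3 \times 3$ grid. A minimal sketch follows; `det3` and `achievable_dets` are names I made up, and matrices are flattened row by row into tuples.

```python
from itertools import permutations

def det3(m):
    """Determinant of a 3x3 matrix given as a flat tuple of 9 entries, row by row."""
    a, b, c, d, e, f, g, h, i = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def achievable_dets(entries):
    """Set of determinants over all arrangements of 9 distinct entries."""
    return {det3(p) for p in permutations(entries)}

# The six matrices displayed above, flattened row by row, with their determinants.
displayed = {
    (1, 2, 3, 4, 6, 7, 5, 8, 9): 2,
    (1, 2, 3, 4, 5, 6, 7, 8, 9): 0,
    (1, 2, 3, 4, 7, 6, 5, 9, 8): 1,
    (1, 2, 3, 4, 5, 7, 6, 8, 9): 7,
    (1, 2, 3, 5, 6, 9, 4, 8, 7): 20,
    (1, 2, 3, 6, 8, 9, 5, 7, 4): 17,
}
for matrix, expected in displayed.items():
    assert det3(matrix) == expected

digit_dets = achievable_dets(range(1, 10))
prime_dets = achievable_dets([2, 3, 5, 7, 11, 13, 17, 19, 23])
print(2017 in digit_dets)
print([n in prime_dets for n in (2016, 2017, 2018)])
```

For the digits $1$ through $9$, brute force is overkill. The three positive terms of the determinant are products of disjoint triples of digits, so the determinant is at most $3 \cdot 9 \cdot 8 \cdot 7 = 1512 < 2017$.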

## Revealing zero in fully homomorphic encryption is a Bad Thing

When I was first learning number theory, cryptography seemed really fun and really practical. I thought elementary number theory was elegant, and that cryptography was an elegant application. As I continued to learn more about mathematics, and in particular modern mathematics, I began to realize that decades of instruction and improvement (and perhaps of more useful points of view) have simplified the presentation of elementary number theory, and that modern mathematics is less elegant in presentation.

Similarly, as I learned more about cryptography, I learned that though the basic ideas are very simple, their application is often very inelegant. For example, the basis of RSA follows immediately from Euler’s Theorem as learned while studying elementary number theory, or alternately from Lagrange’s Theorem as learned while studying group theory or abstract algebra. And further, these are very early topics in these two areas of study!

But a naive implementation of RSA is doomed (and for that matter, many professional implementations have their flaws too). Every now and then, a very clever expert comes up with a new attack on popular cryptosystems, generating new guidelines and recommendations. Some guidelines make intuitive sense [e.g. don’t use too small an exponent for either the public or secret keys in RSA], but many are more complicated or designed to prevent more sophisticated attacks [especially side-channel attacks].

In the summer of 2013, I participated in the ICERM IdeaLab working towards more efficient homomorphic encryption. We were playing with existing homomorphic encryption schemes and trying to come up with new methods. One guideline that we followed is that an attacker should not be able to recognize an encryption of zero. This seems like a reasonable guideline, but I didn’t really understand why, until I was chatting with others at the 2017 Joint Mathematics Meetings in Atlanta.

It turns out that revealing zero isn’t just against generally sound advice. Revealing zero is a capital B capital T Bad Thing.

## Basic Setup

For the rest of this note, I’ll try to identify some of this reasoning.

In a typical cryptosystem, the basic setup is as follows. Andrew has a message that he wants to send to Beatrice. So Andrew converts the message into a list of numbers $M$, and uses some sort of encryption function $E(\cdot)$ to encrypt $M$, forming a ciphertext $C$. We can represent this as $C = E(M)$. Andrew transmits $C$ to Beatrice. If an eavesdropper Eve happens to intercept $C$, it should be very hard for Eve to recover any information about the original message from $C$. But when Beatrice receives $C$, she uses a corresponding decryption function $D(\cdot)$ to decrypt $C$, recovering $M = D(C)$.

Often, the encryption and decryption techniques are based on number theoretic or combinatorial primitives. Some of these have extra structure (or at least they do with basic implementation). For instance, the RSA cryptosystem involves a public exponent $e$, a public mod $N$, and a private exponent $d$. Andrew encrypts the message $M$ by computing $C = E(M) \equiv M^e \bmod N$. Beatrice decrypts the message by computing $M = C^d \equiv M^{ed} \bmod N$.

Notice that in the RSA system, given two messages $M_1, M_2$ and corresponding ciphertexts $C_1, C_2$, we have that

$$E(M_1 M_2) \equiv (M_1 M_2)^e \equiv M_1^e M_2^e \equiv E(M_1) E(M_2) \pmod N.$$

The encryption function $E(\cdot)$ is a group homomorphism. This is an example of extra structure.
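To see the multiplicative structure concretely, here is a toy sketch of textbook RSA in Python. The primes $61$ and $53$ and the exponent $17$ are illustrative values I picked, far too small for real use, and there is no padding, which is exactly why the homomorphism is visible.

```python
# Toy textbook RSA (illustrative parameters only; never use sizes like this).
p, q = 61, 53
N = p * q                            # public modulus
e = 17                               # public exponent
d = pow(e, -1, (p - 1) * (q - 1))    # private exponent: inverse of e mod (p-1)(q-1)

def E(M):
    """Encrypt: M^e mod N."""
    return pow(M, e, N)

def D(C):
    """Decrypt: C^d mod N."""
    return pow(C, d, N)

M1, M2 = 42, 55
C1, C2 = E(M1), E(M2)

# Decryption inverts encryption.
assert D(C1) == M1 and D(C2) == M2

# Multiplying ciphertexts gives an encryption of the product of the messages.
assert (C1 * C2) % N == E((M1 * M2) % N)
print(D((C1 * C2) % N), (M1 * M2) % N)
```

The three-argument `pow` with exponent `-1` computes a modular inverse and needs Python 3.8 or later.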

A fully homomorphic cryptosystem has an encryption function $E(\cdot)$ satisfying both $E(M_1 + M_2) = E(M_1) + E(M_2)$ and $E(M_1M_2) = E(M_1)E(M_2)$ (or more generally an analogous pair of operations). That is, $E(\cdot)$ is a ring homomorphism.

This extra structure allows for (a lot of) extra utility. A fully homomorphic $E(\cdot)$ would allow one to perform meaningful operations on encrypted data, even though you can’t read the data itself. For example, a clinic could store (encrypted) medical information on an external server. A doctor or nurse could pull out a cellphone or tablet with relatively little computing power or memory and securely query the medical data. Fully homomorphic encryption would allow one to securely outsource data infrastructure.

A different usage model suggests that we use a different mental model. So suppose Alice has sensitive data that she wants to store for use on EveCorp’s servers. Alice knows an encryption method $E(\cdot)$ and a decryption method $D(\cdot)$, while EveCorp only ever has mountains of ciphertexts, and cannot read the data [even though they have it].

## Why revealing zero is a Bad Thing

Let us now consider some basic cryptographic attacks. We should assume that EveCorp has access to a long list of plaintext messages $M_i$ and their corresponding ciphertexts $C_i$. Not everything, but perhaps from small leaks or other avenues. Among the messages $M_i$ it is very likely that there are two messages $M_1, M_2$ which are relatively prime. Then an application of the Euclidean Algorithm gives a linear combination of $M_1$ and $M_2$ such that

$$M_1 x + M_2 y = 1$$

for some integers $x,y$. Even though EveCorp doesn’t know the encryption method $E(\cdot)$, since we are assuming that they have access to the corresponding ciphertexts $C_1$ and $C_2$, EveCorp has access to an encryption of $1$ using the ring homomorphism properties:
\begin{equation} \label{eq:encryption_of_one}
E(1) = E(M_1 x + M_2 y) = x E(M_1) + y E(M_2) = x C_1 + y C_2.
\end{equation}

By multiplying $E(1)$ by $m$, EveCorp has access to a plaintext and encryption of $m$ for any message $m$.

Now suppose that EveCorp can always recognize an encryption of $0$. Then EveCorp can mount a variety of attacks exposing information about the data it holds.

For example, EveCorp can test whether a particular message $m$ is contained in the encrypted dataset. First, EveCorp generates a ciphertext $C_m$ for $m$ by multiplying $E(1)$ by $m$, as in \eqref{eq:encryption_of_one}. Then for each ciphertext $C$ in the dataset, EveCorp computes $C - C_m$. If $m$ is contained in the dataset, then $C - C_m$ will be an encryption of $0$ for the $C$ corresponding to $m$. EveCorp recognizes this, and now knows that $m$ is in the data. To be more specific, perhaps a list of encrypted names of medical patients appears in the data, and EveCorp wants to see if JohnDoe is in that list. If they can recognize encryptions of $0$, then EveCorp can access this information.
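The membership test can be simulated end to end with a deliberately insecure toy scheme. Everything in this sketch is an assumption made for illustration and is not any real scheme: a ciphertext is $m + Pr$ for a secret modulus $P$ and a random integer $r$, so ciphertexts add and multiply like plaintexts modulo $P$. The function `is_zero` stands in for EveCorp's hypothetical ability to recognize encryptions of $0$, and here it cheats by using the secret.

```python
import random

P = 1_000_003  # Alice's secret modulus (hypothetical toy value)

def encrypt(m):
    """Toy randomized encryption: m plus a random multiple of the secret P."""
    return m + P * random.randrange(1, 10**6)

def decrypt(c):
    return c % P

def is_zero(c):
    """Stand-in for EveCorp's assumed zero-recognition ability."""
    return c % P == 0

def extended_gcd(a, b):
    """Return (g, x, y) with a*x + b*y == g == gcd(a, b)."""
    if b == 0:
        return a, 1, 0
    g, x, y = extended_gcd(b, a % b)
    return g, y, x - (a // b) * y

def contains(dataset, m, known_pairs):
    """EveCorp's test: is some ciphertext in `dataset` an encryption of m?"""
    (m1, c1), (m2, c2) = known_pairs
    g, x, y = extended_gcd(m1, m2)
    assert g == 1, "the two leaked plaintexts must be relatively prime"
    enc_one = x * c1 + y * c2          # an encryption of 1
    c_m = m * enc_one                  # an encryption of m
    return any(is_zero(c - c_m) for c in dataset)

# Alice's stored data, and two leaked plaintext/ciphertext pairs.
dataset = [encrypt(m) for m in (1234, 5678, 4242)]
leaks = [(35, encrypt(35)), (48, encrypt(48))]

print(contains(dataset, 5678, leaks))  # EveCorp learns 5678 is present
print(contains(dataset, 9999, leaks))  # and that 9999 is not
```

Note that `contains` never calls `decrypt` and never touches `P` directly. It needs only the two leaked pairs, the ring operations on ciphertexts, and the zero test.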

And thus it is unacceptable for external entities to be able to consistently recognize encryptions of $0$.

Up to now, I’ve been a bit loose by saying “an encryption of zero” or “an encryption of $m$”. The reason for this is that to protect against recognition of encryptions of $0$, some entropy is added to the encryption function $E(\cdot)$, making it multivalued. So if we have a message $M$ and we encrypt it once to get $E(M)$, and we encrypt $M$ later and get $E'(M)$, it is often not true that $E(M) = E'(M)$, even though they are both encryptions of the same message. But these systems are designed so that it is true that $D(E(M)) = D(E'(M)) = M$, so that the entropy doesn’t matter.

This is a separate matter, and something that I will probably return to later.

## Math 100 Fall 2016: Concluding Remarks

It is that time of year. Classes are over. Campus is emptying. Soon it will be mostly emptiness, snow, and grad students (who of course never leave).

I like to take some time to reflect on the course. How did it go? What went well and what didn’t work out? And now that all the numbers are in, we can examine course trends and data.

Since numbers are direct and graphs are pretty, let’s look at the numbers first.

## Math 100 grades at a glance

Let’s get an understanding of the distribution of grades in the course, all at once.

These are classic box plots. The center line of each box denotes the median. The left and right ends of the box indicate the 1st and 3rd quartiles. As a quick reminder, the 1st quartile is the point where 25% of students received that grade or lower. The 3rd quartile is the point where 75% of students received that grade or lower. So within each box lies 50% of the course.

Each box has two arms (or “whiskers”) extending out, indicating the other grades of students. Points that are plotted separately are statistical outliers, which means that they lie more than $1.5 \cdot (Q_3 - Q_1)$ above $Q_3$ or below $Q_1$ (where $Q_1$ denotes the first quartile and $Q_3$ denotes the third quartile).
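As a small illustration of the outlier rule, here is a Python sketch using the standard library. The scores are made up and are not data from the course. Different tools compute quartiles with slightly different conventions, so the exact fences may differ a little from what a plotting library draws.

```python
from statistics import quantiles

def find_outliers(scores):
    """Scores more than 1.5 * IQR below Q1 or above Q3."""
    q1, _median, q3 = quantiles(scores, n=4)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [s for s in scores if s < low_fence or s > high_fence]

# Hypothetical grades: one student far below the rest.
scores = [70, 72, 75, 78, 80, 82, 85, 88, 90, 10]
print(find_outliers(scores))
```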

Within each blob, you’ll notice an embedded box-and-whisker graph. The white dots indicate the medians, and the thicker black parts indicate the central 50% of the grade. The width of the colored blobs roughly indicates how many students scored within that region. [As an aside, each blob actually has the same area, so the area is a meaningful data point].

So what can we determine from these graphs? Firstly, students did extremely well in their recitation sections and on the homework. I am perhaps most stunned by the tightness of the homework distribution. Remarkably, 75% of students had at least a 93 homework average. Recitation scores were very similar.

I also notice some patterns between the two midterms and final. The median on the first midterm was very high and about 50% of students earned a score within about 12 points of the median. The median on the second midterm was a bit lower, but the spread of the middle 50% of students was about the same. However the lower end was significantly lower on the second midterm in comparison to the first midterm. The median on the final was significantly lower, and the 50% spread was much, much larger.

Looking at the Overall grade, it looks very similar to the distribution of the first midterm, except shifted a bit.

It is interesting to note that Recitation (10%), Homework (20%), and the First Midterm (20%) accounted for 50% of the course grade; the Second Midterm (20%) and the Final (30%) accounted for the other 50% of the course grade. The Recitation, Homework, and First Midterm grades pulled the Overall grade distribution up, while the Second Midterm and Final pulled the Overall grade distribution down.

## Correlation between assignments and Overall Grade

I pose the question: was any individual assignment type a good predictor of the final grade? For example, to what extent can we predict your final grade based on your First Midterm grade?

No, doing well on homework is a terrible predictor of final grade. The huge vertical cluster of dots indicates that the overall grades vary significantly over a very small amount of homework. However, I note that doing poorly on homework is a great predictor of doing poorly overall. No one whose homework average was below an 80 got an A in the course. Having a homework grade below a 70 is a very strong indicator of failing the course. In terms of Pearson’s R correlation, one might say that about 40% of overall performance is predicted from performance on homework (which is very little).

Although drastic, this is in line with my expectations for calculus courses. This is perhaps a bit more extreme than normal: the level of clustering in the homework averages is truly stunning. Explaining this is a bit hard. It is possible to get homework help from the instructor or TA, or to work with other students, or to get help from the Math Resource Center or other tutoring. It is also possible to cheat, either with a solutions manual (which I know some students have), or a paid answer service (which I also witnessed), or to check answers on a computer algebra system like WolframAlpha. Each of these weakens homework as an indicator of mastery.

In the calculus curriculum at Brown, I think it’s safe to say that homework plays a formative role instead of a normative role. It serves to provide opportunities for students to work through and learn material, and we don’t expect the grades to correspond strongly to understanding. To that end, half of the homework isn’t even collected.

The two midterms each correlate pretty strongly with Overall grade. In particular, the second midterm visually indicates really strong correlation. Statistically speaking (i.e. from Pearson’s R), it turns out that 67% of the Overall grade can be predicted from the First Midterm (higher than might be expected) and 80% can be predicted from the Second Midterm (which is really, really high).
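For readers who want to reproduce this kind of number on their own data, here is a minimal sketch of the computation. I am assuming that "percent predicted" means the squared Pearson coefficient $r^2$ (the usual "fraction of variance explained"). The scores below are invented for illustration and are not from the course.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical scores for six students.
midterm = [55, 68, 74, 81, 90, 97]
overall = [61, 70, 72, 85, 88, 95]

r = pearson_r(midterm, overall)
print(f"r = {r:.3f}, r^2 = {r * r:.3f}")
```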

If we are willing to combine some pieces of information, the Homework and First Midterm (together) predict 77% of the Overall grade. As each student’s initial homework effort is very indicative of later homework, this means that we can often predict a student’s final grade (to a pretty good accuracy) as early as the first midterm.

(The Homework and the Second Midterm together predict 85% of the Overall grade. The two midterms together predict 88% of the Overall grade.)

This has always surprised me a bit, since for many students the first midterm is at least partially a review of material taught before. However, this course is very cumulative, so it does make sense that doing poorly on earlier tests indicates a hurdle that must be overcome in order to succeed on later tests. This is one of the unforgiving aspects of math, the sciences, and programming — early disadvantages compound. I’ve noted roughly this pattern in the past as well.

However the correlation between the Final and the Overall grade is astounding. I mean, look at how much the relationship looks like a line. Even the distributions (shown around the edges) look similar. Approximately 90% of the Overall grade is predicted by the grade on the Final Exam.

This is a bit atypical. One should expect a somewhat high correlation, as the final exam is cumulative and covers everything from the course (or at least tries to). But 90% is extremely high.

I think one reason why this occurred this semester is that the final exam was quite hard. It was distinctly harder than the midterms (though still easier than many of the homework problems). A hard final gives more opportunities for students who really understand the material to demonstrate their mastery. Conversely, a hard final punishes students with only a cursory understanding. Although stressful, I’ve always been a fan of exams that are difficult enough to distinguish between students, and to provide a chance for students to catch up. See Chances for a Comeback below for more on this.

Related statistics of interest might concern to what extent performance on the First Midterm predicts performance on the Second Midterm (44%) or the Final Exam (48%), or to what extent the Second Midterm predicts performance on the Final Exam (63%).

## Homework and Recitations

As mentioned above, homework performance is a terrible predictor of course grade. I thought it was worth diving into a bit more. Does homework performance predict anything well? The short answer is not really.

Plotting Homework grade vs the First Midterm shows such a lack of meaning that it doesn’t even make sense to try to draw a line of best fit.

To be fair, homework is a better predictor of performance on the Second Midterm and Final Exam, but it’s still very bad.

Here’s a related question: what about Recitation sections? Are these good predictors of any other aspect of the course?

Plotting Recitation vs Homework is sort of interesting. Evidently, most people did very well on both homework and recitation. It is perhaps no surprise that most students who did very well in Recitation also did very well on their Homework, and vice versa. However it turns out that there are more people with high recitation grades and low homework grades than the other way around. But thinking about it, this makes sense.

These distributions are so tight that it still doesn’t make sense to try to draw a line of best fit or to talk about Pearson coefficients – most variation is simply too small to be meaningful.

Together, Homework and Recitation predict a measly 50% of the Overall grade of the course (in the Pearson’s R sense). One would expect more, as Homework and Recitation are directly responsible for 30% of the Overall grade, and one would expect homework and recitation to correlate at least somewhat meaningfully with the rest of graded content of the course, right?

I guess not.

So what does this mean about recitation and homework? Should we toss them aside? Does something need to be changed?

I would say “Not necessarily,” as it is important to recognize that not all grades are equal. Both homework and recitation are the places for students to experiment and learn. Recitations are supposed to be times where students are still learning material. They are to be inoffensive and safe, where students can mess up, fall over, and get back up again with the help of their peers and TA. I defend the lack of stress on grade or challenging and rigorous examination during recitation.

Homework is sort of the same, and sort of completely different. What gives me pause concerning homework is that homework is supposed to be the barometer by which students can measure their own understanding. When students ask us about how they should prepare for exams, our usual response is “If you can do all the homework (including self-check) without referencing the book, then you will be well-prepared for the exam.” If homework grade is such a poor predictor of exam grades, then is it possible that homework gives a poor ruler for students to measure themselves by?

I’m not sure. Perhaps it would be a good idea to indicate all the relevant questions in the textbook so that students have more problems to work on. In theory, students could do this themselves (and for that matter, I’m confident that such students would do very well in the course). But the problem is that we only cover a subset of the material in most sections of the textbook, and many questions (even those right next to ones we assign) require ideas or concepts that we don’t teach.

On the other hand, learning how to actually learn is a necessary skill, and probably one that most people struggle with when they first actually have to learn it. It’s necessary to learn it sooner or later.

## Chances for a Comeback

The last numerical aspect I’ll consider is about whether or not it is possible to come back after doing badly on an earlier assessment. There are two things to consider: whether it is actually feasible or not, and whether any students did make it after a poor initial/early performance.

As to whether it is possible, the answer is yes (but it may be hard). And the reason why is that the Second Midterm and Final grades were each relatively low. It may be counterintuitive, but in order to return from a failing grade, it is necessary that there be enough room to actually come back.

Suppose Aiko is a pretty good student, but it just so happens that she makes a 49 on the first midterm due to some particular misunderstanding. If the class average on every assessment is a 90, then Aiko cannot claw her way back. That is, even if Aiko makes a 100 on everything else, Aiko’s final grade would be below a 90, and thus below average. Aiko would probably make a B.

In this situation, the class is too easy, and thus there are no chances for students to overcome a setback on any single exam.

On the other hand, suppose that Bilal makes a 49 on the first midterm, but that the class average is a 75 overall. If Bilal makes a 100 on everything else, Bilal will end with just below a 90, significantly above the class average. Bilal would probably make an A.
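Both scenarios rest on the same arithmetic. Using the course weights (Recitation 10%, Homework 20%, each Midterm 20%, Final 30%) and assuming a perfect score on everything except the first midterm, the overall grade is

$$0.10 \cdot 100 + 0.20 \cdot 100 + 0.20 \cdot 49 + 0.20 \cdot 100 + 0.30 \cdot 100 = 89.8.$$

The number is identical for Aiko and Bilal. What differs is where $89.8$ sits relative to the rest of the class.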

In this course, the mean overall was a 78, and the standard deviation was about 15. In this case, an 89 would be an A. So there was enough space and distance to overcome even a disastrous exam.

But, did anyone actually do this? The way I like to look at this is to look at changes in relative performance in terms of standard deviations away from the mean. Performing at one standard deviation below the mean on Midterm 1 and one standard deviation above the mean on Midterm 2 indicates a more meaningful amount of grade fluidity than merely looking at points above or below the mean.

Looking at the First Midterm vs the Second Midterm, we see that there is a rough linear relationship (Pearson R suggests 44% predictive value). That’s to be expected. What’s relevant now are the points above or below the line $y = x$. To be above the line $y = x$ means that you did better on the Second Midterm than you did on the First Midterm, all in comparison to the rest of the class. To be below means the opposite.
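Here is a small sketch of the standardization behind such a plot. The scores are made up and are not from the course. Each score is converted to the number of standard deviations from that exam's mean, and a positive change corresponds to a point above the line $y = x$.

```python
from statistics import mean, pstdev

def z_scores(scores):
    """Standard deviations above (+) or below (-) the mean, for each score."""
    mu, sigma = mean(scores), pstdev(scores)
    return [(s - mu) / sigma for s in scores]

# Hypothetical scores for five students on the two midterms.
midterm1 = [49, 70, 85, 90, 96]
midterm2 = [80, 65, 75, 88, 92]

for z1, z2 in zip(z_scores(midterm1), z_scores(midterm2)):
    position = "above" if z2 > z1 else "below or on"
    print(f"x = {z1:+.2f}, y = {z2:+.2f}, change = {z2 - z1:+.2f} ({position} y = x)")
```

I use the population standard deviation (`pstdev`) here. Using the sample version (`stdev`) rescales every value by the same factor and does not change which side of the line a student falls on.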

Even more relevant is to be in the upper-left quadrant (the Second Quadrant in the usual numbering), which indicates that you did worse than average on the first midterm and better than average on the second. Looking here, there is a very healthy number of people in that quadrant. There are many people who changed by 2 standard deviations in comparison to the mean, which is a very meaningful change. [Many people lost a few standard deviations too; grade mobility is a two-way street].

The First Midterm to the Overall grade shows healthy mobility as well.

The Second Midterm to Overall shows some mobility, but it is interesting that more people lost ground (by performing well on the Second Midterm, and then performing badly Overall) than gained ground (by performing badly on the Second Midterm, but performing well Overall).

Although I don’t show the plots, this trend carries through pretty well. Many people were able to salvage or boost a letter grade based solely on the final (and correspondingly many people managed to lose just enough on the final to drop a letter grade). Interestingly, very many people were able to turn a likely B into an A through the final.

So overall, I would say that it was definitely possible to salvage grades this semester.

If you’ve never thought about this before, then keep this in mind the next time you hear complaints about a course with challenging exams — it gives enough space for students to demonstrate sufficient understanding to make up for a bad past assessment.

## Non-Numerical Reflection

The numbers tell some characteristics of the class, but not the whole story.

### Standard Class Materials

We used Thomas’ Calculus. I think this is an easy book to teach from, and relatively easy to read. It feels like many other cookie-cutter calculus books (such as Larson and Edwards or Stewart). But it’s quite expensive for students. However, as we do not use an electronic homework component (which seems to be becoming more popular elsewhere), at least students can buy used copies (or use other methods of procural).

However, solutions manuals are available online (I noticed some students had copies). Some of the pay-for sites provide complete (and mostly but not entirely correct) solutions manuals as well. This makes some parts of Thomas challenging to use, especially as we do not write our own homework to give. I suppose that this is a big reason why one might want to use an electronic system.

The book has much more material in it than we teach. For instance, the book includes all of a first semester of calculus, and also more details in many sections. We avoid numerical integration, Fourier series, some applications, some details concerning polar and parametric plots, etc. Ideally, there would exist a book catering to exactly our needs. But there isn’t, so I suppose Thomas is about as good as any.

I’ve now taught elementary calculus for a few years, and I’m surprised at how often I am able to reuse two notes I wrote years ago, namely the refresher on first semester calculus (An Intuitive Introduction to Calculus) and my additional note on Taylor series (An Intuitive Overview of Taylor Series). Perhaps more surprisingly, I’m astounded at how often people from other places link to and visit these two notes (and in particular, the Taylor Series note).

These were each written for a Math 100 course in 2013. So my note to myself is that there is good value in writing something well enough that I can reuse it, and others might even find it valuable.

Unfortunately, while I wrote a few notes this semester, I don’t think that they will have the same lasting appeal. The one I wrote on the series convergence tests is something that (perhaps after one more round of editing) I will use each time I teach this subject in the future. I’m tremendously happy with my note on computing $\pi$ with Math 100 tools, but as it sits outside the curriculum, many students won’t actually read it. [However many did read it, and it generated many interesting conversations about actual mathematics]. Perhaps sometime I will teach a calculus class ending with some sort of project, as computing $\pi$ leads to very many interesting and interrelated project ideas.

### Course Content

I must admit that I do not know why this course is the way it is, and this bothers me a bit. In many ways, this course is a grab bag of calculus nuggets. Presumably each piece was added in because it is necessary in sufficiently many other places, or is so directly related to the “core material” of this course, that it makes sense to include it. But from what I can tell, these reasons have been lost to the sands of time.

The core topics in this course are: Integration by Parts, Taylor’s Theorem, Parametric and Polar coordinates, and First Order Linear Differential Equations. We also spend a large amount of time on other techniques of integration (partial fraction decomposition, trig substitution) and on understanding generic series (including the various series convergence/divergence tests). Along the way, there are some seemingly arbitrary decisions on what to include or exclude. For instance, we learn how to integrate

$$\int \sin^n x \cos^m x \; dx$$

because we have decided that being able to perform trigonometric substitution in integrals is a good idea. But we omit integrals like

$$\int \sin(nx) \sin(mx) \; dx$$

which would come up naturally in talking about Fourier series. Fourier series fit naturally into this class, and in some variants of this class they are taught. But so does trigonometric substitution! So what is the rationale here? If the answer is to become better at problem solving or to develop mathematical maturity, then I think it would be good to recognize that so that we know what we should feel comfortable wiggling to build and develop the curriculum in the future. [Also, students should know that calculus is not a pinnacle. See for instance this podcast with Steven Strogatz on Innovation Hub.]

This is not restricted to Brown. I’m familiar with the equivalent of this course at other institutions, and there are similar seemingly arbitrary differences in what to include or exclude. For years at Georgia Tech, they tossed in a several week unit on linear algebra into this course [although I’ve learned that they stopped that in the past two years]. The AP Calc BC curriculum includes trig substitution but not Fourier series. Perhaps they had a reason?

What this means to me is that the intent of this course has become muddled, and separated from the content of the course. This is an overwhelmingly hard task to try to fix, as a second semester of calculus fits right in the middle of so many other pieces. Yet I would be very grateful to the instructor who sits down and identifies reasons for or against inclusion of the various topics in this course, or perhaps cuts the calculus curriculum into pieces and rearranges them to fit modern necessities.

## A Parachute is only necessary to go skydiving twice

This is the last class I teach at Brown as a graduate student (and most likely, ever). Amusingly, I taught it in the same room as the first course I taught as a graduate student. I’ve learned quite a bit about teaching in between, but in many ways it feels the same. Just like for students, the only scary class is the first one, although exams can be a real pain (to take, or to grade).

It’s been a pleasure. As usual, if you have any questions, please let me know.


## Computing $\pi$

This note was originally written in the context of my fall Math 100 class at Brown University. It is also available as a pdf note.

While investigating Taylor series, we proved that
\begin{equation}\label{eq:base}
\frac{\pi}{4} = 1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \frac{1}{9} - \cdots
\end{equation}

Let’s remind ourselves how. Begin with the geometric series

$$\frac{1}{1 + x^2} = 1 - x^2 + x^4 - x^6 + x^8 - \cdots = \sum_{n = 0}^\infty (-1)^n x^{2n}.$$

(We showed that this has interval of convergence $\lvert x \rvert < 1$.) Integrating this geometric series yields

$$\int_0^x \frac{1}{1 + t^2} dt = x - \frac{x^3}{3} + \frac{x^5}{5} - \frac{x^7}{7} + \cdots = \sum_{n = 0}^\infty (-1)^n \frac{x^{2n+1}}{2n+1}.$$

Note that this has interval of convergence $-1 \leq x \leq 1$, since at both endpoints the series is alternating with terms decreasing to $0$.

We also recognize this integral as

$$\int_0^x \frac{1}{1 + t^2} dt = \text{arctan}(x),$$

one of the common integrals arising from trigonometric substitution. Putting these together, we find that

$$\text{arctan}(x) = x - \frac{x^3}{3} + \frac{x^5}{5} - \frac{x^7}{7} + \cdots = \sum_{n = 0}^\infty (-1)^n \frac{x^{2n+1}}{2n+1}.$$

As $x = 1$ is within the interval of convergence, we can substitute $x = 1$ into the series to find the representation

$$\text{arctan}(1) = 1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \cdots = \sum_{n = 0}^\infty (-1)^n \frac{1}{2n+1}.$$

Since $\text{arctan}(1) = \frac{\pi}{4}$, this gives the representation for $\pi/4$ given in \eqref{eq:base}.

However, since $x=1$ is at the very edge of the interval of convergence, this series converges very, very slowly. For instance, using the first $50$ terms gives the approximation

$$\pi \approx 3.121594652591011.$$

The expansion of $\pi$ is actually

$$\pi = 3.141592653589793238462\ldots$$

So the first $50$ terms of \eqref{eq:base} give only two digits of accuracy. That’s not very good.
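To make the slow convergence concrete, here is a minimal Python sketch that sums the first terms of the series directly. The function name `leibniz_pi` is my own choice for illustration, and ordinary floating point arithmetic is assumed to be accurate enough at this scale.

```python
def leibniz_pi(num_terms):
    """Approximate pi from the first num_terms terms of 1 - 1/3 + 1/5 - ..."""
    total = 0.0
    for n in range(num_terms):
        total += (-1) ** n / (2 * n + 1)
    # The series sums to pi/4, so multiply by 4 at the end.
    return 4 * total


print(leibniz_pi(50))    # close to the value 3.1215946... quoted above
print(leibniz_pi(5000))  # even 5000 terms leave an error of about 1/5000
```

Since the error after $N$ terms is about $1/N$, each extra digit costs roughly ten times as many terms.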

I think it is very natural to ask: can we do better? This series converges slowly — can we find one that converges more quickly?

### Aside

As an aside: one might also ask if we can somehow speed up the convergence of the series we already have. It turns out that in many cases, you can! For example, we know in alternating series that the sum of the whole series is between any two consecutive partial sums. So what if you took the average of two consecutive partial sums? [Equivalently, what if you added only one half of the last term in a partial sum? Do you see why these are the same?]

The average of the partial sum of the first 49 terms and the partial sum of the first 50 terms (each multiplied by $4$) is actually

$$3.141796672793031,$$

which is correct to within $0.001$. That’s an improvement!

What if you do still more? More on this can be found in the last section.

## Estimating $\pi$ through a different series

We return to the question: can we find a series that gives us $\pi$, but which converges faster? Yes we can! And we don’t have to look too far — we can continue to rely on our expansion for $\text{arctan}(x)$.

We had been using that $\text{arctan}(1) = \frac{\pi}{4}$. But we also know that $\text{arctan}(1/\sqrt{3}) = \frac{\pi}{6}$. Since $1/\sqrt{3}$ is closer to the center of the power series than $1$, we should expect that the convergence is much better.

Recall that

$$\text{arctan}(x) = x - \frac{x^3}{3} + \frac{x^5}{5} - \frac{x^7}{7} + \cdots = \sum_{n = 0}^\infty (-1)^n \frac{x^{2n + 1}}{2n + 1}.$$

Then we have that
\begin{align}
\text{arctan}\left(\frac{1}{\sqrt 3}\right) &= \frac{1}{\sqrt 3} - \frac{1}{3(\sqrt 3)^3} + \frac{1}{5(\sqrt 3)^5} - \cdots \notag \\
&= \frac{1}{\sqrt 3} \left(1 - \frac{1}{3 \cdot 3} + \frac{1}{5 \cdot 3^2} - \frac{1}{7 \cdot 3^3} + \cdots \right) \notag \\
&= \frac{1}{\sqrt 3} \sum_{n = 0}^\infty (-1)^n \frac{1}{(2n + 1) 3^n}. \notag
\end{align}
Therefore, we have the equality

$$\frac{\pi}{6} = \frac{1}{\sqrt 3} \sum_{n = 0}^\infty (-1)^n \frac{1}{(2n + 1) 3^n}$$

or rather that

$$\pi = 2 \sqrt{3} \sum_{n = 0}^\infty (-1)^n \frac{1}{(2n + 1) 3^n}.$$

From a computational perspective, this is far superior. For instance, based on our understanding of error from the alternating series test, using the terms through $n = 10$ (the first $11$ terms) of this series will approximate $\pi$ to within the size of the next term,

$$2 \sqrt 3 \cdot \frac{1}{23 \cdot 3^{11}} \approx 8.5 \cdot 10^{-7}.$$

Let’s check this.

$$2 \sqrt 3 \left(1 - \frac{1}{3\cdot 3} + \frac{1}{5 \cdot 3^2} - \cdots + \frac{1}{21 \cdot 3^{10}}\right) = 3.1415933045030813.$$

Look at how close that approximation is (the error is about $6.5 \cdot 10^{-7}$), and we only used the first $11$ terms!
Roughly speaking, each additional two terms or so yields another digit of $\pi$, since each term is about a third the size of the previous one. Using the first $100$ terms would give the first 48 digits of $\pi$.
Using the first million terms would give roughly the first 477000 digits of $\pi$, and this is definitely doable, even on a personal laptop. (On my laptop, it takes approximately 4 milliseconds to compute the first 20 digits of $\pi$ using this technique.)
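The series for $\pi$ coming from $\text{arctan}(1/\sqrt{3})$ is easy to try out. The following is a minimal Python sketch under the assumption that double precision floats suffice; the name `pi_from_arctan_sqrt3` is hypothetical, chosen only for this illustration.

```python
import math


def pi_from_arctan_sqrt3(num_terms):
    """Approximate pi by 2*sqrt(3) * sum_{n < num_terms} (-1)^n / ((2n+1) 3^n)."""
    total = 0.0
    for n in range(num_terms):
        total += (-1) ** n / ((2 * n + 1) * 3 ** n)
    return 2 * math.sqrt(3) * total


# Summing the terms n = 0, ..., 10 reproduces the value displayed above.
print(pi_from_arctan_sqrt3(11))
# With about 35 terms, the result agrees with math.pi to floating point accuracy.
print(pi_from_arctan_sqrt3(35), math.pi)
```

To go past 15 or so digits one would need exact or arbitrary precision arithmetic instead of floats.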

### Even Better Series

I think it is very natural to ask again: can we find an even faster converging series? Perhaps we can choose better values to evaluate arctan at? This turns out to be a very useful line of thought, and it leads to some of the best-known methods for evaluating $\pi$. Through clever choices of values and identities involving arctangents, one can construct extremely quickly converging series for $\pi$. For more information on this line of thought, look up Machin-like formula.
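As one concrete instance, the classical formula of Machin states that $\frac{\pi}{4} = 4\,\text{arctan}\left(\frac{1}{5}\right) - \text{arctan}\left(\frac{1}{239}\right)$. Below is a small Python sketch that feeds both arguments into the same arctangent series as before. The helper names `arctan_series` and `machin_pi` are my own, and floats are assumed to be good enough for a demonstration.

```python
import math


def arctan_series(x, num_terms):
    """Partial sum of x - x^3/3 + x^5/5 - ... with num_terms terms."""
    return sum((-1) ** n * x ** (2 * n + 1) / (2 * n + 1) for n in range(num_terms))


def machin_pi(num_terms):
    """Approximate pi via Machin's formula, using num_terms terms per arctangent."""
    return 4 * (4 * arctan_series(1 / 5, num_terms) - arctan_series(1 / 239, num_terms))


for terms in (1, 2, 5, 10):
    print(terms, machin_pi(terms), abs(machin_pi(terms) - math.pi))
```

Because $1/5$ and $1/239$ are both much closer to $0$ than $1/\sqrt{3}$, each term of these series buys noticeably more digits.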

## Patterns in the Approximation of $\pi/4$

Looking back at the approximation of $\pi$ coming from the first $50$ terms of the series
\begin{equation}\label{eq:series_pi4_base}
1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \cdots
\end{equation}

we found an approximation of $\pi$, which I’ll represent as $\widehat{\pi}$,

$$\pi \approx \widehat{\pi} = 3.121594652591011.$$

Let’s look very carefully at how this compares to $\pi$, up to the first $10$ decimals. We color the incorrect digits in ${\color{orange}{orange}}$.
\begin{align}
\pi &= 3.1415926535\ldots \notag \\
\widehat{\pi} &= 3.1{\color{orange}2}159{\color{orange}4}65{\color{orange}2}5 \notag
\end{align}
Notice that most of the digits are correct — in fact, only three (of the first ten) are incorrect! Isn’t that weird?
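The digit comparison is easy to reproduce. Here is a short Python sketch that lists which of the first ten decimal places disagree with $\pi$; the function names are hypothetical, and the comparison is done on formatted floats, which is only reliable for a modest number of places.

```python
import math


def leibniz_pi(num_terms):
    """Four times the sum of the first num_terms terms of 1 - 1/3 + 1/5 - ..."""
    return 4 * sum((-1) ** n / (2 * n + 1) for n in range(num_terms))


def wrong_decimal_positions(approx, exact=math.pi, places=10):
    """Return the (1-indexed) decimal places at which approx and exact differ."""
    # Format with two spare places so that rounding does not disturb the digits we keep.
    approx_digits = f"{approx:.{places + 2}f}".split(".")[1][:places]
    exact_digits = f"{exact:.{places + 2}f}".split(".")[1][:places]
    return [i + 1 for i in range(places) if approx_digits[i] != exact_digits[i]]


print(wrong_decimal_positions(leibniz_pi(50)))  # prints [2, 6, 9]
```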

It happens to be that when one uses the first $10^N / 2$ terms (for any $N$) of the series \eqref{eq:series_pi4_base}, there will be a pattern of mostly correct digits with disjoint strings of incorrect digits in the middle. This is an unusual and surprising phenomenon.

The positions of the incorrect digits can be predicted. Although I won’t go into any detail here, the positions of the errors are closely related to something called Euler Numbers or, more deeply, to Boole Summation.

Playing with infinite series leads to all sorts of interesting patterns. There is a great history of mathematicians and physicists messing around with series and stumbling across really deep ideas.

## Speeding up computation

Take an alternating series

$$\sum_{n = 0}^\infty (-1)^{n} a_n = a_0 - a_1 + a_2 - a_3 + \cdots$$

If $\{a_n\}$ is a sequence of positive, decreasing terms with limit $0$, then the alternating series converges to some value $S$. And further, consecutive partial sums bound the value of $S$, in that

$$\sum_{n = 0}^{2K-1} (-1)^{n} a_n \leq S \leq \sum_{n = 0}^{2K} (-1)^{n} a_n.$$

For example,

$$1 - \frac{1}{3} < \sum_{n = 0}^\infty \frac{(-1)^{n}}{2n+1} < 1 - \frac{1}{3} + \frac{1}{5}.$$

Instead of approximating the value of the whole sum $S$ by a single partial sum, it might seem reasonable to approximate $S$ by the average of two consecutive partial sums. Since we know $S$ is between the two, their average might be closer to the real result.

As mentioned above, the average of the partial sum consisting of the first $49$ terms of \eqref{eq:base} and the first $50$ terms of \eqref{eq:base} gives a much better estimate of $\pi$ than using either the first $49$ or first $50$ terms on their own. (And indeed, it works much better than even the first $500$ terms on their own.)

Before we go on, let’s introduce a little notation. Let $S_K$ denote the partial sum of the first $K$ terms, i.e.

$$S_K = \sum_{n = 0}^{K-1} (-1)^{n} a_n.$$

Then the idea is that instead of using $S_{K}$ to approximate the whole sum $S$, we’ll use the average

$$\frac{S_{K-1} + S_{K}}{2} \approx S.$$

Averaging once seems like a great idea. What if we average again? That is, what if instead of using the average of $S_{K-1}$ and $S_K$, we actually use the average of (the average of $S_{K-2}$ and $S_{K-1}$) and (the average of $S_{K-1}$ and $S_K$),
\begin{equation}\label{eq:avgavg}
\frac{\frac{S_{K-2} + S_{K-1}}{2} + \frac{S_{K-1} + S_{K}}{2}}{2}.
\end{equation}

As this is really annoying to write, let’s come up with some new notation. Write the average between a quantity $X$ and $Y$ as

$$[X, Y] = \frac{X + Y}{2}.$$

Further, define the average of $[X, Y]$ and $[Y, Z]$ to be $[X, Y, Z]$,

$$[X, Y, Z] = \frac{[X, Y] + [Y, Z]}{2} = \frac{\frac{X + Y}{2} + \frac{Y + Z}{2}}{2}.$$

So the long expression in \eqref{eq:avgavg} can be written as $[S_{K-2}, S_{K-1}, S_{K}]$.
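It may help to simplify the nested fraction once. Expanding the definition gives

$$[X, Y, Z] = \frac{\frac{X + Y}{2} + \frac{Y + Z}{2}}{2} = \frac{X + 2Y + Z}{4},$$

so that in particular

$$[S_{K-2}, S_{K-1}, S_{K}] = \frac{S_{K-2} + 2 S_{K-1} + S_{K}}{4}.$$

The middle partial sum is counted twice, which is why the weights $1, 2, 1$ appear.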

With this notation in mind, let’s compute some numerics. Below, we give the actual value of $\pi$, the values of $S_{48}, S_{49}$, and $S_{50}$, pairwise averages, and the average-of-the-average, in the case of $1 - \frac{1}{3} + \frac{1}{5} - \cdots$.

$$\begin{array}{c|l|l}
& \text{Value} & \text{Difference from } \pi \\ \hline
\pi & 3.141592653589793238462\ldots & \phantom{-}0 \\ \hline
4 \cdot S_{48} & 3.1207615795929895 & \phantom{-}0.020831073996803617 \\ \hline
4 \cdot S_{49} & 3.161998692995051 & -0.020406039405258092 \\ \hline
4 \cdot S_{50} & 3.121594652591011 & \phantom{-}0.01999800099878213 \\ \hline
4 \cdot [S_{48}, S_{49}] & 3.1413801362940204 & \phantom{-}0.0002125172957727628 \\ \hline
4 \cdot [S_{49}, S_{50}] & 3.1417966727930313 & -0.00020401920323820377 \\ \hline
4 \cdot [S_{48}, S_{49}, S_{50}] & 3.141588404543526 & \phantom{-}0.00000424904626727951 \\ \hline
\end{array}$$

So using the average of averages from the three sums $S_{48}, S_{49}$, and $S_{50}$ gives $\pi$ to within about $4.2 \cdot 10^{-6}$, an incredible improvement compared to $S_{50}$ on its own.

There is something really odd going on here. We are not computing additional summands in the overall sum \eqref{eq:base}. We are merely combining some of our partial results together in a really simple way, repeatedly. Somehow, the sequence of partial sums contains more information about the limit $S$ than individual terms, and we are able to extract some of this information.

I think there is a very natural question. What if we didn’t stop now? What if we took averages-of-averages-of-averages, and averages-of-averages-of-averages-of-averages, and so on? Indeed, we might define the average

$$[X, Y, Z, W] = \frac{[X, Y, Z] + [Y, Z, W]}{2},$$

and so on for larger numbers of terms. In this case, it happens to be that

$$4 \cdot [S_{15}, S_{16}, \ldots, S_{50}] = 3.141592653589794,$$

which has the first 15 digits of $\pi$ correct!

By repeatedly averaging alternating sums of just the first $50$ reciprocals of odd integers, we can find $\pi$ up to 15 digits. I think that’s incredible. It seems both harder than it might have been (as this involves lots of averaging) and much easier than it might have been (as the only arithmetic inputs are the fractions $1/(2n+1)$ for $n$ up to $50$).
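The repeated averaging is short to write down in code. This is a minimal Python sketch, not the code used to produce the numbers above; the names `partial_sums` and `iterated_average` are my own. It assumes that $S_K$ means the sum of the first $K$ terms, which is the convention that reproduces the values in the table.

```python
import math


def partial_sums(num_sums):
    """Return [S_1, ..., S_num_sums], where S_K is the sum of the first K terms."""
    sums, total = [], 0.0
    for n in range(num_sums):
        total += (-1) ** n / (2 * n + 1)
        sums.append(total)
    return sums


def iterated_average(values):
    """Compute [v_1, ..., v_m] by averaging neighbors until one value remains."""
    values = list(values)
    while len(values) > 1:
        values = [(x + y) / 2 for x, y in zip(values, values[1:])]
    return values[0]


sums = partial_sums(50)                         # sums[K - 1] holds S_K
three_term = 4 * iterated_average(sums[47:50])  # 4 * [S_48, S_49, S_50]
many_term = 4 * iterated_average(sums[14:50])   # 4 * [S_15, ..., S_50]
print(three_term, many_term, math.pi)
```

Each round of averaging shortens the list by one, so averaging $36$ partial sums takes $35$ rounds and only a few hundred additions in total.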

Although we leave the thread of ideas here, there are plenty of questions that I think are now asking themselves. I encourage you to ask them, and we may return to this (or related) topics in the future. I’ll see you in class.

## Series Convergence Tests with Prototypical Examples

This is a note written for my Fall 2016 Math 100 class at Brown University. We are currently learning about various tests for determining whether series converge or diverge. In this note, we collect these tests together in a single document. We give a brief description of each test, some indicators of when each test would be good to use, and a prototypical example for each. Note that we do not justify any of these tests here, as we’ve discussed that extensively in class. [But if something is unclear, send me an email or head to my office hours.] This is here to remind us of the variety of tests of convergence.

A copy of just the statements of the tests, put together, can be found here. A pdf copy of this whole post can be found here.

In order, we discuss the following tests:

1. The $n$th term test, also called the basic divergence test
2. Recognizing an alternating series
3. Recognizing a geometric series
4. Recognizing a telescoping series
5. The Integral Test
6. P-series
7. Direct (or basic) comparison
8. Limit comparison
9. The ratio test
10. The root test

## The $n$th term test

### Statement

Suppose we are looking at $\sum_{n = 1}^\infty a_n$ and

$$\lim_{n \to \infty} a_n \neq 0.$$

Then $\sum_{n = 1}^\infty a_n$ does not converge.

### When to use it

When applicable, the $n$th term test for divergence is usually the easiest and quickest way to confirm that a series diverges. When first considering a series, it’s a good idea to think about whether the terms go to zero or not. But remember that if the limit of the individual terms is zero, then it is necessary to think harder about whether the series converges or diverges.

### Example

Each of the series

$$\sum_{n = 1}^\infty \frac{n+1}{2n + 4}, \quad \sum_{n = 1}^\infty \cos n, \quad \sum_{n = 1}^\infty \sqrt{n}$$

diverges since its terms do not tend to $0$.
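To spell out the three cases:

$$\lim_{n \to \infty} \frac{n+1}{2n+4} = \lim_{n \to \infty} \frac{1 + \frac{1}{n}}{2 + \frac{4}{n}} = \frac{1}{2} \neq 0, \qquad \lim_{n \to \infty} \sqrt{n} = \infty,$$

while $\cos n$ does not approach any limit at all as $n \to \infty$. In each case the terms fail to tend to $0$, which is all the $n$th term test requires.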

## Recognizing alternating series

### Statement

Suppose $\sum_{n = 1}^\infty (-1)^n a_n$ is a series where

1. $a_n \geq 0$,
2. $a_n$ is decreasing, and
3. $\lim_{n \to \infty} a_n = 0$.

Then $\sum_{n = 1}^\infty (-1)^n a_n$ converges.

Stated differently, if the terms are alternating sign, decreasing in absolute size, and converging to zero, then the series converges.

### When to use it

The key is in the name: if the series is alternating, then this is the go-to idea of analysis. Note that if the terms of a series are alternating and decreasing, but the terms do not go to zero, then the series diverges by the $n$th term test.

### Example

Suppose we are looking at the series

$$\sum_{n = 1}^\infty \frac{(-1)^n}{\log(n+1)} = \frac{-1}{\log 2} + \frac{1}{\log 3} + \frac{-1}{\log 4} + \cdots$$

The terms are alternating.
The sizes of the terms are $\frac{1}{\log (n+1)}$, and these are decreasing.
Finally,

$$\lim_{n \to \infty} \frac{1}{\log(n+1)} = 0.$$

Thus the alternating series test applies and shows that this series converges.
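Numerically, the convergence is visible (if slow) in the way consecutive partial sums squeeze together. The following Python sketch, with the hypothetical helper name `alternating_log_partial_sums`, prints a few consecutive pairs; it is an illustration and of course not a proof.

```python
import math


def alternating_log_partial_sums(num_terms):
    """Return the partial sums of sum_{n >= 1} (-1)^n / log(n + 1)."""
    sums, total = [], 0.0
    for n in range(1, num_terms + 1):
        total += (-1) ** n / math.log(n + 1)
        sums.append(total)
    return sums


sums = alternating_log_partial_sums(2000)  # sums[i] includes terms n = 1, ..., i + 1
for last_n in (10, 100, 1000, 2000):
    lower, upper = sums[last_n - 2], sums[last_n - 1]
    print(last_n, lower, upper, upper - lower)
```

Partial sums ending on an odd $n$ sit below the limit and those ending on an even $n$ sit above it, and the gap between them is exactly the last term $1/\log(n+1)$, which shrinks very slowly.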

## A Notebook Preparing for a Talk at Quebec-Maine

This is a notebook containing a representative sample of the code I used to  generate the results and pictures presented at the Quebec-Maine Number Theory Conference on 9 October 2016. It was written in a Jupyter Notebook using Sage 7.3, and later converted for presentation on this site.
There is a version of the notebook available on github. Alternately, a static html version without WordPress formatting is available here. Finally, this notebook is also available in pdf form.
The slides for my talk are available here.

# Testing for a Generalized Conjecture on Iterated Sums of Coefficients of Cusp Forms

Let $f$ be a weight $k$ cusp form with Fourier expansion

$$f(z) = \sum_{n \geq 1} a(n) e(nz).$$

Deligne has shown that $a(n) \ll n^{\frac{k-1}{2} + \epsilon}$. It is conjectured that

$$S_f^1(X) := \sum_{m \leq X} a(m) \ll X^{\frac{k-1}{2} + \frac{1}{4} + \epsilon}.$$

It is known that this holds on average, and we recently showed that this holds on average in short intervals.
(See HKLDW1, HKLDW2, and HKLDW3 for details and an overview of work in this area).
This is particularly notable, as the resulting exponent is only 1/4 higher than that of a single coefficient.
This indicates extreme cancellation, far more than what is implied merely by the signs of $a(n)$ being random.

It seems that we also have

$$\sum_{m \leq X} S_f^1(m) \ll X^{\frac{k-1}{2} + \frac{2}{4} + \epsilon}.$$

That is, the sum of sums seems to add in only an additional 1/4 exponent.
This is unexpected and a bit mysterious.

The purpose of this notebook is to explore this and higher conjectures.
Define the $j$th iterated sum as

$$S_f^j(X) := \sum_{m \leq X} S_f^{j-1} (m).$$

Then we numerically estimate bounds on the exponent $\delta(j)$ such that

$$S_f^j(X) \ll X^{\frac{k-1}{2} + \delta(j) + \epsilon}.$$

In [1]:
# This was written in SageMath 7.3 through a Jupyter Notebook.

# sage plays strangely with ipython. This re-allows inline plotting
from IPython.display import display, Image


We first need a list of coefficients of one (or more) cusp forms.
For initial investigation, we begin with a list of 50,000 coefficients of the weight $12$ cusp form on $\text{SL}(2, \mathbb{Z})$, $\Delta(z)$, i.e. Ramanujan’s delta function.
We will use the data associated to the 50,000 coefficients for pictorial investigation as well.

We will be performing some numerical investigation as well.
For this, we will use the first 2.5 million coefficients of $\Delta(z)$.

In [2]:
# Gather 10 coefficients for simple checking
check_10 = delta_qexp(11).coefficients()
print check_10

fiftyk_coeffs = delta_qexp(50000).coefficients()
print fiftyk_coeffs[:10] # these match expected

twomil_coeffs = delta_qexp(2500000).coefficients()
print twomil_coeffs[:10] # these also match expected

[1, -24, 252, -1472, 4830, -6048, -16744, 84480, -113643, -115920]
[1, -24, 252, -1472, 4830, -6048, -16744, 84480, -113643, -115920]
[1, -24, 252, -1472, 4830, -6048, -16744, 84480, -113643, -115920]
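For readers without Sage, the first few coefficients can be checked in plain Python 3 from the product formula $\Delta(z) = q \prod_{n \geq 1} (1 - q^n)^{24}$. The sketch below is my own and is not how `delta_qexp` works internally; the name `delta_coefficients` is hypothetical, and the method is only sensible for small inputs since it takes on the order of $24 N^2$ steps.

```python
def delta_coefficients(num_coeffs):
    """Return [a(1), ..., a(num_coeffs)] for Delta = q * prod (1 - q^n)^24.

    Assumes num_coeffs >= 1.
    """
    # poly[i] holds the coefficient of q^i in prod_{n >= 1} (1 - q^n)^24,
    # truncated after q^(num_coeffs - 1).
    poly = [0] * num_coeffs
    poly[0] = 1
    for n in range(1, num_coeffs):
        for _ in range(24):
            # Multiply in place by (1 - q^n), working from high degree down
            # so that poly[i - n] is still the old value when it is used.
            for i in range(num_coeffs - 1, n - 1, -1):
                poly[i] -= poly[i - n]
    # The leading factor of q shifts indices: a(m) is the coefficient of q^(m - 1).
    return poly


print(delta_coefficients(10))
```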

In [3]:
# Function which iterates partial sums from a list of coefficients

def partial_sum(baselist):
    ret_list = [baselist[0]]
    for b in baselist[1:]:
        ret_list.append(ret_list[-1] + b)
    return ret_list

print check_10
print partial_sum(check_10) # Should be the partial sums

[1, -24, 252, -1472, 4830, -6048, -16744, 84480, -113643, -115920]
[1, -23, 229, -1243, 3587, -2461, -19205, 65275, -48368, -164288]

In [4]:
# Calculate the first 10 iterated partial sums
# We store them in a single list of lists, sums_list
# the zeroth element of the list is the array of initial coefficients
# the first element is the array of first partial sums, S_f(n)
# the second element is the array of second iterated partial sums, S_f^2(n)

fiftyk_sums_list = []
fiftyk_sums_list.append(fiftyk_coeffs) # zeroth index contains coefficients
for j in range(10):                    # jth index contains jth iterate
    fiftyk_sums_list.append(partial_sum(fiftyk_sums_list[-1]))

print partial_sum(check_10)
print fiftyk_sums_list[1][:10]         # should match above

twomil_sums_list = []
twomil_sums_list.append(twomil_coeffs) # zeroth index contains coefficients
for j in range(10):                    # jth index contains jth iterate
    twomil_sums_list.append(partial_sum(twomil_sums_list[-1]))

print twomil_sums_list[1][:10]         # should match above

[1, -23, 229, -1243, 3587, -2461, -19205, 65275, -48368, -164288]
[1, -23, 229, -1243, 3587, -2461, -19205, 65275, -48368, -164288]
[1, -23, 229, -1243, 3587, -2461, -19205, 65275, -48368, -164288]
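The notebook cells above are Python 2 style Sage code. As a point of comparison, here is a small Python 3 sketch of the same iterated partial sums using `itertools.accumulate` from the standard library. The function name `iterated_partial_sums` is hypothetical, and the input is just the ten coefficients printed above.

```python
from itertools import accumulate


def iterated_partial_sums(coefficients, depth):
    """Return [coefficients, 1st partial sums, ..., depth-th iterated partial sums]."""
    sums_list = [list(coefficients)]
    for _ in range(depth):
        # accumulate yields running totals, i.e. the partial sums of the previous list.
        sums_list.append(list(accumulate(sums_list[-1])))
    return sums_list


check_10 = [1, -24, 252, -1472, 4830, -6048, -16744, 84480, -113643, -115920]
sums_list = iterated_partial_sums(check_10, 2)
print(sums_list[1])  # first partial sums, matching the output above
print(sums_list[2])  # second iterated partial sums
```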


As is easily visible, the sums alternate in sign very rapidly.
For instance, we believe that the first partial sums should change sign about once every $X^{1/4}$ terms in the interval $[X, 2X]$.
In this exploration, we are interested in the sizes of the coefficients.
But in HKLDW3, we investigated some of the sign changes of the partial sums.

Now seems like a nice time to briefly look at the data we currently have.
What do the first 50 thousand coefficients look like?
So we normalize them, getting $A(n) = a(n)/n^{5.5}$ and plot these coefficients.

In [5]:
norm_list = []
for n,e in enumerate(fiftyk_coeffs, 1):
    normalized_element = 1.0 * e / (1.0 * n**(5.5))
    norm_list.append(normalized_element)
print norm_list[:10]

[1.00000000000000, -0.530330085889911, 0.598733612492945, -0.718750000000000, 0.691213333204735, -0.317526448138560, -0.376547696558964, 0.911504835123284, -0.641518061271148, -0.366571226366719]

In [6]:
# Make a quick display
normed_coeffs_plot = scatter_plot(zip(range(1,60000), norm_list), markersize=.02)
normed_coeffs_plot.save("normed_coeffs_plot.png")
display(Image("normed_coeffs_plot.png"))


Since some figures will be featuring prominently in the talk I’m giving at Quebec-Maine, let us make high-quality figures now.

## Math 100: Completing the partial fractions example from class

### An Unfinished Example

At the end of class today, someone asked if we could do another example of a partial fractions integral involving an irreducible quadratic. We decided to look at the integral

$$\int \frac{1}{(x^2 + 4)(x+1)}dx.$$
Notice that ${x^2 + 4}$ is an irreducible quadratic polynomial. So when setting up the partial fraction decomposition, we treat the ${x^2 + 4}$ term as a whole.

So we seek to find a decomposition of the form

$$\frac{1}{(x^2 + 4)(x+1)} = \frac{A}{x+1} + \frac{Bx + C}{x^2 + 4}.$$
Now that we have the decomposition set up, we need to solve for ${A,B,}$ and ${C}$ using whatever methods we feel most comfortable with. Multiplying through by ${(x^2 + 4)(x+1)}$ leads to

$$1 = A(x^2 + 4) + (Bx + C)(x+1) = (A + B)x^2 + (B + C)x + (4A + C).$$
Matching up coefficients leads to the system of equations

\begin{align} 0 &= A + B \\ 0 &= B + C \\ 1 &= 4A + C. \end{align}
So we learn that ${A = -B = C}$, and ${A = 1/5}$. So ${B = -1/5}$ and ${C = 1/5}$.

Together, this means that

$$\frac{1}{(x^2 + 4)(x+1)} = \frac{1}{5}\frac{1}{x+1} + \frac{1}{5} \frac{-x + 1}{x^2 + 4}.$$
Recall that if you wanted to, you could check this decomposition by finding a common denominator and checking through.
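For completeness, here is that check carried out. Putting the right-hand side over the common denominator gives

\begin{align}
\frac{1}{5}\frac{1}{x+1} + \frac{1}{5} \frac{-x + 1}{x^2 + 4} &= \frac{(x^2 + 4) + (-x + 1)(x + 1)}{5 (x^2 + 4)(x+1)} \notag \\
&= \frac{x^2 + 4 + 1 - x^2}{5 (x^2 + 4)(x+1)} = \frac{1}{(x^2 + 4)(x+1)}, \notag
\end{align}

as expected.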

Now that we have performed the decomposition, we can return to the integral. We now have that

$$\int \frac{1}{(x^2 + 4)(x+1)}dx = \underbrace{\int \frac{1}{5}\frac{1}{x+1}dx}_ {\text{first integral}} + \underbrace{\int \frac{1}{5} \frac{-x + 1}{x^2 + 4} dx.}_ {\text{second integral}}$$
We can handle both of the integrals on the right hand side.

The first integral is

$$\frac{1}{5} \int \frac{1}{x+1} dx = \frac{1}{5} \ln (x+1) + C.$$

The second integral is a bit more complicated. It’s good to see if there is a simple ${u}$-substitution, since there is an ${x}$ in the numerator and an ${x^2}$ in the denominator. But unfortunately, this integral needs to be further broken into two pieces that we know how to handle separately.

$$\frac{1}{5} \int \frac{-x + 1}{x^2 + 4} dx = \underbrace{\frac{-1}{5} \int \frac{x}{x^2 + 4}dx}_ {\text{first piece}} + \underbrace{\frac{1}{5} \int \frac{1}{x^2 + 4}dx.}_ {\text{second piece}}$$

The first piece is now a ${u}$-substitution problem with ${u = x^2 + 4}$. Then ${du = 2x dx}$, and so

$$\frac{-1}{5} \int \frac{x}{x^2 + 4}dx = \frac{-1}{10} \int \frac{du}{u} = \frac{-1}{10} \ln u + C = \frac{-1}{10} \ln (x^2 + 4) + C.$$

The second piece is one of the classic trig substitutions. So we draw a triangle.

In this triangle, thinking of the bottom-left angle as ${\theta}$ (sorry, I forgot to label it), then we have that ${2\tan \theta = x}$ so that ${2 \sec^2 \theta d \theta = dx}$. We can express the so-called hard part of the triangle by ${2\sec \theta = \sqrt{x^2 + 4}}$.

Going back to our integral, we can think of ${x^2 + 4}$ as ${(\sqrt{x^2 + 4})^2}$ so that ${x^2 + 4 = (2 \sec \theta)^2 = 4 \sec^2 \theta}$. We can now write our integral as

$$\frac{1}{5} \int \frac{1}{x^2 + 4}dx = \frac{1}{5} \int \frac{1}{4 \sec^2 \theta} 2 \sec^2 \theta d \theta = \frac{1}{5} \int \frac{1}{2} d\theta = \frac{1}{10} \theta.$$
As ${2 \tan \theta = x}$, we have that ${\theta = \text{arctan}(x/2)}$. Inserting this into our expression, we have

$$\frac{1}{5} \int \frac{1}{x^2 + 4} dx = \frac{1}{10} \text{arctan}(x/2) + C.$$

Combining the first integral and the first and second parts of the second integral together (and combining all the constants ${C}$ into a single constant, which we also denote by ${C}$), we reach the final expression

$$\int \frac{1}{(x^2 + 4)(x + 1)} dx = \frac{1}{5} \ln (x+1) - \frac{1}{10} \ln(x^2 + 4) + \frac{1}{10} \text{arctan}(x/2) + C.$$
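One way to sanity check an antiderivative is to differentiate it numerically and compare with the integrand. The Python sketch below does this with a central difference; the function names are my own, the step size is an arbitrary reasonable choice, and the check is only meaningful for $x > -1$, where $\ln(x+1)$ is defined.

```python
import math


def integrand(x):
    return 1 / ((x ** 2 + 4) * (x + 1))


def antiderivative(x):
    """The final expression above, with the constant C taken to be 0."""
    return (math.log(x + 1) / 5
            - math.log(x ** 2 + 4) / 10
            + math.atan(x / 2) / 10)


def central_difference(func, x, step=1e-5):
    """Approximate func'(x) by a symmetric difference quotient."""
    return (func(x + step) - func(x - step)) / (2 * step)


for x in (0.0, 0.5, 1.0, 3.0):
    print(x, central_difference(antiderivative, x), integrand(x))
```

The two printed columns should agree to many decimal places, which is good evidence (though not a proof) that no sign or factor was lost along the way.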

### Other Notes

If you have any questions or concerns, please let me know. As a reminder, I have office hours on Tuesday from 9:30–11:30 (or perhaps noon) in my office, and I highly recommend attending the Math Resource Center in the Kassar House from 8pm-10pm, offered Monday-Thursday. [Especially on Tuesday and Thursdays, when there tend to be fewer people there].

On my course page, I have linked to two additional resources. One is to Paul’s Online Math notes for partial fraction decomposition (which I think is quite a good resource). The other is to the Khan Academy for some additional worked through examples on polynomial long division, in case you wanted to see more worked examples. This note can also be found on my website, or in pdf form.

Good luck, and I’ll see you in class.

## “On Functions Whose Mean Value Abscissas are Midpoints, with Connections to Harmonic Functions” (with Paul Carter)

This is joint work with Paul Carter. Humorously, we completed this while on a cross-country drive as we moved the newly minted Dr. Carter from Brown to Arizona.

I’ve had a longtime fascination with the standard mean value theorem of calculus.

Mean Value Theorem
Suppose $f$ is continuous on $[a,b]$ and differentiable on $(a,b)$. Then there is some $c \in (a,b)$ such that

$$\frac{f(b) - f(a)}{b-a} = f'(c).$$

The idea for this project started with a simple question: what happens when we interpret the mean value theorem as a differential equation and try to solve it? As stated, this is too broad. To narrow it down, we might specify some restriction on the $c$, which we refer to as the mean value abscissa, guaranteed by the Mean Value Theorem.

So I thought to try to find functions satisfying

$$\frac{f(b) - f(a)}{b-a} = f' \left( \frac{a + b}{2} \right)$$

for all $a$ and $b$ as a differential equation. In other words, let’s try to find all functions whose mean value abscissas are midpoints.

This looks like a differential equation, which I only know some things about. But my friend and colleague Paul Carter knows a lot about them, so I thought it would be fun to ask him about it.

He very quickly told me that it’s essentially impossible to solve this from the perspective of differential equations. But like a proper mathematician with applied math leanings, he thought we should explore some potential solutions in terms of their Taylor expansions. Proceeding naively in this way very quickly leads to the answer that those (assumed smooth) solutions are precisely quadratic polynomials.

It turns out that was too simple. It was later pointed out to us that verifying that quadratic polynomials satisfy the midpoint mean value property is a common exercise in calculus textbooks, including the one we use to teach from at Brown. Digging around a bit reveals that this was even known (in geometric terms) to Archimedes.
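For reference, that textbook exercise is a one-line computation. If $f(x) = \alpha x^2 + \beta x + \gamma$, then

$$\frac{f(b) - f(a)}{b - a} = \frac{\alpha (b^2 - a^2) + \beta (b - a)}{b - a} = \alpha (a + b) + \beta = f' \left( \frac{a + b}{2} \right),$$

since $f'(x) = 2 \alpha x + \beta$.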

So I thought we might try to go one step higher, and see what’s up with
\begin{equation}\label{eq:original_midpoint}
\frac{f(b) - f(a)}{b-a} = f' (\lambda a + (1-\lambda) b), \tag{1}
\end{equation}

where $\lambda \in (0,1)$ is a weight. So let’s find all functions whose mean value abscissas are weighted averages. A quick analysis with Taylor expansions shows that (assumed smooth) solutions are precisely linear polynomials, except when $\lambda = \frac{1}{2}$ (in which case we’re looking back at the original question).
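A quick test case shows why $\lambda = \frac{1}{2}$ is special. Take $f(x) = x^2$. The left-hand side of the weighted property is

$$\frac{b^2 - a^2}{b - a} = a + b,$$

while the right-hand side is

$$f'(\lambda a + (1 - \lambda) b) = 2 \lambda a + 2 (1 - \lambda) b.$$

These agree for all $a$ and $b$ only when $2\lambda = 1$, that is, when $\lambda = \frac{1}{2}$. So already the simplest quadratic fails the weighted property for every other weight.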

That’s a bit odd. It turns out that the midpoint itself is distinguished in this way. Why might that be the case?

It is beneficial to look at the mean value property as an integral property instead of a differential property,

$$\frac{1}{b-a} \int_a^b f'(t) dt = f'\big(c(a,b)\big).$$

We are currently examining cases when $c = c_\lambda(a,b) = \lambda a + (1-\lambda) b$. We can see the right-hand side is differentiable by differentiating the left-hand side directly. Since any point can be a weighted midpoint, one sees that $f$ is at least twice-differentiable. One can actually iterate this argument to show that any $f$ satisfying one of the weighted mean value properties is actually smooth, justifying the Taylor expansion analysis indicated above.

An attentive eye might notice that the midpoint mean value theorem, written as the integral property

$$\frac{1}{b-a} \int_a^b f'(t) dt = f' \left( \frac{a + b}{2} \right)$$

is exactly the one-dimensional case of the harmonic mean value property, usually written

$$\frac{1}{\lvert B_h \rvert} \int_{B_h(x)} g(t) dV = g(x).$$

Here, $B_h(x)$ is the ball of radius $h$ and center $x$. Any harmonic function satisfies this mean value property, and any function satisfying this mean value property is harmonic.

From this viewpoint, functions satisfying our original midpoint mean value property~\eqref{eq:original_midpoint} have harmonic derivatives. But the only one-dimensional harmonic functions are affine functions $g(x) = cx + d$. This gives immediately that the set of solutions to~\eqref{eq:original_midpoint} are quadratic polynomials.

The weighted mean value property can also be written as an integral property. Trying to connect it similarly to harmonic functions led us to consider functions satisfying

$$\frac{1}{\lvert B_h \rvert} \int_{B_h(x)} g(t) \, dV = g(c_\lambda(x,h)),$$

where $c_\lambda(x,h)$ should be thought of as some distinguished point in the ball $B_h(x)$ with a weight parameter $\lambda$. More specifically, we asked:

Are there weighted harmonic functions corresponding to a weighted harmonic mean value property?

In one dimension, the answer is no, as seen above. But there are many more multivariable harmonic functions [in fact, I’d never thought about harmonic functions on $\mathbb{R}^1$ until this project, as they’re too trivial]. So maybe there are weighted harmonic functions in higher dimensions?

This ends up being the focus of the latter half of our paper. Unexpectedly (to us), an analogous methodology to our approach in the one-dimensional case works, with only a few differences.

It turns out that no, there are no weighted harmonic functions on $\mathbb{R}^n$ other than trivial extensions of harmonic functions from $\mathbb{R}^{n-1}$.

Harmonic functions are very special, and even more special than we had thought. The paper is a fun read, and can be found on the arXiv now. It has been accepted and will appear in the American Mathematical Monthly.

## Paper: Sign Changes of Coefficients and Sums of Coefficients of Cusp Forms

This is joint work with Thomas Hulse, Chan Ieong Kuan, and Alex Walker, and is another sequel to our previous work. This is the third in a trio of papers, and completes an answer to a question posed by our advisor Jeff Hoffstein two years ago.

We have just uploaded a preprint to the arXiv giving conditions that guarantee that a sequence of numbers contains infinitely many sign changes. More generally, if the sequence consists of complex numbers, then we give conditions that guarantee sign changes in a generalized sense.

Let $\mathcal{W}(\theta_1, \theta_2) := \{ re^{i\theta} : r \geq 0, \theta \in [\theta_1, \theta_2] \}$ denote a wedge of the complex plane.

Suppose $\{a(n)\}$ is a sequence of complex numbers satisfying the following conditions:

1. $a(n) \ll n^\alpha$,
2. $\sum_{n \leq X} a(n) \ll X^\beta$,
3. $\sum_{n \leq X} \lvert a(n) \rvert^2 = c_1 X^{\gamma_1} + O(X^{\eta_1})$,

where $\alpha, \beta, c_1, \gamma_1$, and $\eta_1$ are all real numbers $\geq 0$. Then for any $r$ satisfying $\max(\alpha+\beta, \eta_1) - (\gamma_1 - 1) < r < 1$, the sequence $\{a(n)\}$ has at least one term outside any wedge $\mathcal{W}(\theta_1, \theta_2)$ with $0 < \theta_2 - \theta_1 < \pi$ for some $n \in [X, X+X^r)$ for all sufficiently large $X$.

These wedges can be thought of as just slightly smaller than a half-plane. For a complex number to escape a half-plane is analogous to a real number changing sign. So we should think of this result as guaranteeing a sort of sign change in intervals of width $X^r$ for all sufficiently large $X$.
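
To make the wedge picture concrete, here is a minimal Python sketch of the membership test for $\mathcal{W}(\theta_1, \theta_2)$. The helper names `in_wedge` and `escapes_wedge` and the tolerance parameter are hypothetical, chosen only for illustration; angles are compared modulo $2\pi$.

```python
import cmath
import math


def in_wedge(z, theta1, theta2, tol=1e-12):
    """Return True if z = r*e^{i*theta} with r >= 0 and theta in [theta1, theta2] (mod 2*pi)."""
    z = complex(z)
    if z == 0:
        return True  # r = 0 lies in every wedge
    # Angle of z measured counterclockwise from the lower edge theta1.
    offset = (cmath.phase(z) - theta1) % (2 * math.pi)
    # tol absorbs rounding error at the upper edge theta2.
    return offset <= (theta2 - theta1) + tol


def escapes_wedge(values, theta1, theta2):
    """Return True if at least one value lies outside the wedge W(theta1, theta2)."""
    return any(not in_wedge(v, theta1, theta2) for v in values)


if __name__ == "__main__":
    # Positive reals stay inside a wedge around the positive real axis;
    # a negative real escapes it, which is the analogue of a sign change.
    print(escapes_wedge([1, 2, 3], -1.0, 1.0))
    print(escapes_wedge([1, 2, -3], -1.0, 1.0))
```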

The intuition behind this result is very straightforward. If the sum of coefficients is small while the sum of the squares of the coefficients is large, then the sum of coefficients must experience a lot of cancellation. The fact that we can get quantitative results on the number of sign changes is merely a task of bookkeeping.
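
Here is a rough version of that bookkeeping in the simpler case of a real sequence, assuming in addition that $c_1 > 0$ and $\gamma_1 > 0$ (this is a sketch under those assumptions, not the argument as written in the paper). Suppose $a(n) \geq 0$ for every $n \in [X, X + X^r)$. Writing the sum over the interval as a difference of two partial sums, conditions 1 and 2 give

$$\sum_{X \leq n < X + X^r} a(n)^2 \leq \Big( \max_{n < 2X} \lvert a(n) \rvert \Big) \sum_{X \leq n < X + X^r} a(n) \ll X^{\alpha + \beta}.$$

On the other hand, condition 3 (with $r < 1$) gives

$$\sum_{X \leq n < X + X^r} a(n)^2 = c_1 \big( (X + X^r)^{\gamma_1} - X^{\gamma_1} \big) + O(X^{\eta_1}) = c_1 \gamma_1 X^{\gamma_1 - 1 + r} \big(1 + o(1)\big) + O(X^{\eta_1}).$$

If $r > \max(\alpha + \beta, \eta_1) - (\gamma_1 - 1)$, then $X^{\gamma_1 - 1 + r}$ eventually dominates both $X^{\eta_1}$ and $X^{\alpha + \beta}$, which is a contradiction for large $X$. The same argument applies if $a(n) \leq 0$ throughout the interval, so the sequence must change sign there.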

Both the statement and proof are based on very similar criteria for sign changes when $\{a(n)\}$ is a sequence of real numbers first noticed by Ram Murty and Jaban Meher. However, if in addition it is known that

$$\sum_{n \leq X} (a(n))^2 = c_2 X^{\gamma_2} + O(X^{\eta_2}),$$

and that $\max(\alpha+\beta, \eta_1, \eta_2) - (\max(\gamma_1, \gamma_2) - 1) < r < 1$, then generically both sequences $\{\text{Re} (a(n)) \}$ and $\{ \text{Im} (a(n)) \}$ contain at least one sign change for some $n$ in $[X , X + X^r)$ for all sufficiently large $X$. In other words, we can detect sign changes for both the real and imaginary parts in intervals, which is a bit more special.

It is natural to ask for even more specific detection of sign changes. For instance, knowing specific information about the distribution of the arguments of $a(n)$ would be interesting, and very closely related to the Sato-Tate Conjectures. But we do not yet know how to investigate this distribution.

In practice, we often understand the various criteria for the application of these two sign change results by investigating the Dirichlet series
\begin{align}
&\sum_{n \geq 1} \frac{a(n)}{n^s} \\
&\sum_{n \geq 1} \frac{S_f(n)}{n^s} \\
&\sum_{n \geq 1} \frac{\lvert S_f(n) \rvert^2}{n^s} \\
&\sum_{n \geq 1} \frac{S_f(n)^2}{n^s},
\end{align}
where

$$S_f(n) = \sum_{m \leq n} a(m).$$

In the case of holomorphic cusp forms, the two previous joint projects with this group investigated exactly the Dirichlet series above. In the paper, we formulate some slightly more general criteria guaranteeing sign changes based directly on the analytic properties of the Dirichlet series involved.

In this paper, we apply our sign change results to our previous work to show that $S_f(n)$ changes sign in each interval $[X, X + X^{\frac{2}{3} + \epsilon})$ for sufficiently large $X$. Further, if there are coefficients with $\text{Im} \, a(n) \neq 0$, then the real and imaginary parts each change sign in those intervals.

We apply our sign change results to single coefficients of $\text{GL}(2)$ cusp forms (and specifically full integral weight holomorphic cusp forms, half-integral weight holomorphic cusp forms, and Maass forms). In large part these are minor improvements over folklore and what is known, except for the extension to complex coefficients.

We also apply our sign change results to single isolated coefficients $A(1,m)$ of $\text{GL}(3)$ Maass forms. This seems to be a novel result, and adds to the very sparse literature on sign changes of sequences associated to $\text{GL}(3)$ objects. Murty and Meher recently proved a general sign change result for $\text{GL}(n)$ objects which is similar in feel.

As a final application, we also consider sign changes of partial sums of $\nu$-normalized coefficients. Let

$$S_f^\nu(X) := \sum_{n \leq X} \frac{a(n)}{n^{\nu}}.$$

As $\nu$ gets larger, the individual coefficients $a(n)n^{-\nu}$ become smaller. So one should expect the sign changes in $\{S_f^\nu(n)\}$ to change based on $\nu$. And in particular, as $\nu$ gets very large, the number of sign changes of $S_f^\nu$ should decrease.

Interestingly, in the case of holomorphic cusp forms of weight $k$, we are able to show that there are sign changes of $S_f^\nu(n)$ in intervals even for normalizations $\nu$ a bit above $\nu = \frac{k-1}{2}$. This is particularly interesting as $a(n) \ll n^{\frac{k-1}{2} + \epsilon}$, so for $\nu > \frac{k-1}{2}$ the coefficients are *decreasing* with $n$. We are able to show that when $\nu = \frac{k-1}{2} + \frac{1}{6} - \epsilon$, the sequence $\{S_f^\nu(n)\}$ has at least one sign change for $n$ in $[X, 2X)$ for all sufficiently large $X$.

It may help to consider a simpler example to understand why this is surprising. Consider the classic example of a sequence $b(n)$, where $b(n) = 1$ or $b(n) = -1$, randomly, with equal probability. Then the expected size of the partial sum $\sum_{m \leq n} b(m)$ is about $\sqrt n$. This is an example of *square-root cancellation*, and such behaviour is a common point of comparison. Similarly, the number of sign changes of the partial sums of $b(n)$ is also expected to be about $\sqrt n$.

Suppose now that $b(n) = \frac{\pm 1}{\sqrt n}$. If the first term is $1$, then it takes more than the second term being negative to make the overall sum negative. And if the first two terms are positive, then it would take more than the following three terms being negative to make the overall sum negative. So sign changes of the partial sums are much rarer. In fact, they’re exceedingly rare, and one might barely detect more than a dozen through computational experiment (although one should still expect infinitely many).
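
A small computational experiment along these lines might look like the following Python sketch. The function names, the seed, and the sample size are arbitrary illustrative choices, and zero partial sums are simply skipped when counting sign changes.

```python
import math
import random


def count_sign_changes(terms):
    """Count strict sign changes among the partial sums of `terms`, skipping zero sums."""
    changes = 0
    total = 0.0
    last_sign = 0
    for t in terms:
        total += t
        sign = (total > 0) - (total < 0)
        if sign != 0:
            if last_sign != 0 and sign != last_sign:
                changes += 1
            last_sign = sign
    return changes


def random_signs(n, seed):
    """Return n independent random signs, each +1 or -1 with equal probability."""
    rng = random.Random(seed)
    return [rng.choice((-1, 1)) for _ in range(n)]


def compare(n=100_000, seed=0):
    """Sign-change counts for b(n) = +-1 and for b(n) = +-1/sqrt(n), using the same signs."""
    signs = random_signs(n, seed)
    plain = count_sign_changes(signs)
    damped = count_sign_changes(s / math.sqrt(k) for k, s in enumerate(signs, start=1))
    return plain, damped


if __name__ == "__main__":
    for seed in range(5):
        print(seed, compare(seed=seed))
```

Running this over several seeds gives a feel for how the two counts compare; the exact numbers depend on the seed and on `n`.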

This regularity, in spite of the decreasing size of the individual coefficients $a(n)n^{-\nu}$, suggests an interesting regularity in the sign changes of the individual $a(n)$. We do not know how to understand or measure this effect or its regularity, and for now it remains an entirely qualitative observation.

For more details and specific references, see the paper on the arXiv.

## Math 42 Spring 2016 Student Showcase

This spring, I taught Math 42: An Introduction to Elementary Number Theory at Brown University. An important aspect of the course was the final project. In these projects, students either followed up on topics that interested them from the semester, or chose and investigated topics related to number theory. Projects could be done individually or in small groups.

I thought it would be nice to showcase some excellent student projects from my class. Most of the projects were quite good, and some showed extraordinary effort. Some students really dove in and used this as an opportunity to explore and digest a topic far more thoroughly than could possibly be expected from an introductory class such as this one. With the students’ permission, I’ve chosen five student projects (in no particular order) for a blog showcase (impressed by similar sorts of showcases from Scott Aaronson).

• Factorization Techniques, by Elvis Nunez and Chris Shaw. In this project, Elvis and Chris look at Fermat Factorization, which looks to factor $n$ by expressing $n = a^2 - b^2$. Further, they investigate improvements to Fermat’s Algorithm by Dixon and Kraitchik. Following this line of investigation leads to the development of the modern quadratic sieve and factor base methods of factorization.

• Pseudoprimes and Carmichael Numbers, by Emily Riemer. Fermat’s Little Theorem is one of the first “big idea” theorems we encounter in the course, and we came back to it again and again throughout. Emily explored Fermat’s Little Theorem as a primality test, leading to pseudoprimes, strong pseudoprimes, and Carmichael numbers. [As an aside, one of her references concerning Carmichael numbers was a set of notes from an algebraic number theory class taught by Matt Baker, who first got me interested in number theory].

• Continued Fractions and Pell’s Equation, by Max Lahn and Jonathan Spiegel. As it happened, I did not have time to teach continued fractions in the course. So Max and Jonathan decided to look at them on their own. They explore some ideas related to the convergence of continued fractions and see how one uses continued fractions to solve Pell’s Equation.

• Quantum Computing, by Edward Hu and Chris Long. Edward and Chris explore quantum computing with particular emphasis on gaining some idea of how Shor’s factorization algorithm works. For some of the more complicated ideas, like the quantum Fourier transform, they make use of heuristics and analogies to convey the main ideas.

• Fermat’s Last Theorem, by Dylan Groos, Natalie Schudrowitz, and Kenneth Berglund. Dylan, Natalie, and Kenneth provide a historical look at attacks on Fermat’s Last Theorem. They examine proofs for $n=4$ and Sophie Germain’s remarkable advances. They also touch on elliptic curves and modular forms, hinting at some of the deep ideas lying beneath the surface.
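
As a small illustration of the Fermat factorization idea from the first project above, here is a generic Python sketch (it is not code from Elvis and Chris’s project, and the function name `fermat_factor` is just an illustrative choice). It searches for $a \geq \sqrt{n}$ such that $a^2 - n$ is a perfect square $b^2$, which gives $n = (a-b)(a+b)$.

```python
import math


def fermat_factor(n):
    """Write an odd integer n > 1 as (a - b) * (a + b), where n = a^2 - b^2."""
    if n <= 1 or n % 2 == 0:
        raise ValueError("n must be an odd integer greater than 1")
    a = math.isqrt(n)
    if a * a < n:
        a += 1  # start at the ceiling of sqrt(n)
    while True:
        b_squared = a * a - n
        b = math.isqrt(b_squared)
        if b * b == b_squared:
            return a - b, a + b
        a += 1


if __name__ == "__main__":
    print(fermat_factor(5959))
    print(fermat_factor(2017))  # a prime only yields the trivial factorization
```

The search is fast when $n$ has two factors close to $\sqrt{n}$ and slow otherwise; for a prime it only stops at the trivial factorization $1 \cdot n$.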
