Tag Archives: math

Math 100 Fall 2016: Concluding Remarks

It is that time of year. Classes are over. Campus is emptying. Soon it will be mostly emptiness, snow, and grad students (who of course never leave).

I like to take some time to reflect on the course. How did it go? What went well and what didn’t work out? And now that all the numbers are in, we can examine course trends and data.

Since numbers are direct and graphs are pretty, let’s look at the numbers first.

Math 100 grades at a glance

Let’s get an understanding of the distribution of grades in the course, all at once.

These are classic box plots. The center line of each box denotes the median. The left and right ends of the box indicate the 1st and 3rd quartiles. As a quick reminder, the 1st quartile is the point where 25% of students received that grade or lower. The 3rd quartile is the point where 75% of students received that grade or lower. So within each box lies 50% of the course.

Each box has two arms (or “whiskers”) extending out, indicating the other grades of students. Points that are plotted separately are statistical outliers, which means that they are more than $1.5 \cdot (Q_3 - Q_1)$ above $Q_3$ or below $Q_1$ (where $Q_1$ denotes the first quartile and $Q_3$ denotes the third quartile).
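For readers who want to see the outlier rule in action, here is a minimal sketch in Python. The scores are hypothetical (not data from the course), and the function names are my own. Plotting libraries compute quartiles in slightly different ways, so the cutoffs in an actual box plot may differ a little from these.

```python
import statistics

def outlier_fences(scores):
    """Return (q1, q3, lower_fence, upper_fence) using the 1.5 * IQR rule."""
    # "inclusive" treats the data as the whole population; other quartile
    # conventions exist and give slightly different cut points.
    q1, _median, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    iqr = q3 - q1
    return q1, q3, q1 - 1.5 * iqr, q3 + 1.5 * iqr

def find_outliers(scores):
    """Return the scores that would be plotted as separate points."""
    _q1, _q3, low, high = outlier_fences(scores)
    return [s for s in scores if s < low or s > high]

# Hypothetical scores, invented for illustration.
example_scores = [35, 78, 82, 85, 88, 90, 91, 93, 95, 97, 100]
print(outlier_fences(example_scores))
print(find_outliers(example_scores))
```

With these made-up scores, only the 35 falls outside the fences, so it is the only point that would be drawn separately.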

A bit more information about the distribution itself can be seen in the following graph.

Within each blob, you’ll notice an embedded box-and-whisker graph. The white dots indicate the medians, and the thicker black parts indicate the central 50% of the grades. The width of each colored blob roughly indicates how many students scored within that region. [As an aside, each blob actually has the same area, so the area is a meaningful data point].

So what can we determine from these graphs? Firstly, students did extremely well in their recitation sections and on the homework. I am perhaps most stunned by the tightness of the homework distribution. Remarkably, 75% of students had at least a 93 homework average. Recitation scores were very similar.

I also notice some patterns between the two midterms and final. The median on the first midterm was very high and about 50% of students earned a score within about 12 points of the median. The median on the second midterm was a bit lower, but the spread of the middle 50% of students was about the same. However the lower end was significantly lower on the second midterm in comparison to the first midterm. The median on the final was significantly lower, and the 50% spread was much, much larger.

Looking at the Overall grade, it looks very similar to the distribution of the first midterm, except shifted a bit.

It is interesting to note that Recitation (10%), Homework (20%), and the First Midterm (20%) accounted for 50% of the course grade; the Second Midterm (20%) and the Final (30%) accounted for the other 50% of the course grade. The Recitation, Homework, and First Midterm grades pulled the Overall grade distribution up, while the Second Midterm and Final pulled the Overall grade distribution down.

Correlation between assignments and Overall Grade

I pose the question: was any individual assignment type a good predictor of the final grade? For example, to what extent can we predict your final grade based on your First Midterm grade?

No, doing well on homework is a terrible predictor of final grade. The huge vertical cluster of dots indicates that the overall grades vary significantly over a very small amount of homework. However, I note that doing poorly on homework is a great predictor of doing poorly overall. No one whose homework average was below an 80 got an A in the course. Having a homework grade below a 70 is a very strong indicator of failing the course. In terms of Pearson’s R correlation, one might say that about 40% of overall performance is predicted from performance on homework (which is very little).
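One way to read percentages like these is as the square of the Pearson correlation coefficient, i.e. the proportion of variance in one grade explained by a linear fit on the other. That reading is an assumption on my part. The sketch below computes both quantities for hypothetical grades that are invented for illustration and are not the course data.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length, at least 2 long")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    # Note: this divides by zero if either list is constant.
    return covariance / math.sqrt(spread_x * spread_y)

# Hypothetical grades, invented for illustration (not the course data).
homework = [98, 95, 97, 93, 99, 96, 72, 94]
overall = [91, 70, 85, 62, 88, 79, 55, 90]

r = pearson_r(homework, overall)
print(f"r = {r:.2f}, r^2 = {r * r:.2f}")
```

A tight vertical cluster, like the one in the homework plot, tends to give a small value of $r^2$ even when a handful of low homework scores line up with low overall grades.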

Although drastic, this is in line with my expectations for calculus courses. This is perhaps a bit more extreme than normal: the level of clustering in the homework averages is truly stunning. Explaining this is a bit hard. It is possible to get homework help from the instructor or TA, or to work with other students, or to get help from the Math Resource Center or other tutoring. It is also possible to cheat, either with a solutions manual (which I know some students have), or a paid answer service (which I also witnessed), or to check answers on a computer algebra system like WolframAlpha. Each of these weakens the value of homework as an indicator of mastery.

In the calculus curriculum at Brown, I think it’s safe to say that homework plays a formative role instead of a normative role. It serves to provide opportunities for students to work through and learn material, and we don’t expect the grades to correspond strongly to understanding. To that end, half of the homework isn’t even collected.

The two midterms each correlate pretty strongly with Overall grade. In particular, the second midterm visually indicates really strong correlation. Statistically speaking (i.e. from Pearson’s R), it turns out that 67% of the Overall grade can be predicted from the First Midterm (higher than might be expected) and 80% can be predicted from the Second Midterm (which is really, really high).

If we are willing to combine some pieces of information, the Homework and First Midterm (together) predict 77% of the Overall grade. As each student’s initial homework effort is very indicative of later homework, this means that we can often predict a student’s final grade (to a pretty good accuracy) as early as the first midterm.

(The Homework and the Second Midterm together predict 85% of the Overall grade. The two midterms together predict 88% of the Overall grade.)

This has always surprised me a bit, since for many students the first midterm is at least partially a review of material taught before. However, this course is very cumulative, so it does make sense that doing poorly on earlier tests indicates a hurdle that must be overcome in order to succeed on later tests. This is one of the unforgiving aspects of math, the sciences, and programming — early disadvantages compound. I’ve noted roughly this pattern in the past as well.

However the correlation between the Final and the Overall grade is astounding. I mean, look at how much the relationship looks like a line. Even the distributions (shown around the edges) look similar. Approximately 90% of the Overall grade is predicted by the grade on the Final Exam.

This is a bit atypical. One should expect a somewhat high correlation, as the final exam is cumulative and covers everything from the course (or at least tries to). But 90% is extremely high.

I think one reason why this occurred this semester is that the final exam was quite hard. It was distinctly harder than the midterms (though still easier than many of the homework problems). A hard final gives more opportunities for students who really understand the material to demonstrate their mastery. Conversely, a hard final punishes students with only a cursory understanding. Although stressful, I’ve always been a fan of exams that are difficult enough to distinguish between students, and to provide a chance for students to catch up. See Chances for a Comeback below for more on this.

Related statistics of interest might concern to what extent performance on the First Midterm predicts performance on the Second Midterm (44%) or the Final Exam (48%), or to what extent the Second Midterm predicts performance on the Final Exam (63%).

Homework and Recitations

As mentioned above, homework performance is a terrible predictor of course grade. I thought it was worth diving into a bit more. Does homework performance predict anything well? The short answer is not really.

Plotting Homework grade vs the First Midterm shows such a lack of meaning that it doesn’t even make sense to try to draw a line of best fit.

To be fair, homework is a better predictor of performance on the Second Midterm and Final Exam, but it’s still very bad.

Here’s a related question: what about Recitation sections? Are these good predictors of any other aspect of the course?

Plotting Recitation vs Homework is sort of interesting. Evidently, most people did very well on both homework and recitation. It is perhaps no surprise that most students who did very well in Recitation also did very well on their Homework, and vice versa. However it turns out that there are more people with high recitation grades and low homework grades than the other way around. But thinking about it, this makes sense.

These distributions are so tight that it still doesn’t make sense to try to draw a line of best fit or to talk about Pearson coefficients – most variation is simply too small to be meaningful.

Together, Homework and Recitation predict a measly 50% of the Overall grade of the course (in the Pearson’s R sense). One would expect more, as Homework and Recitation are directly responsible for 30% of the Overall grade, and one would expect homework and recitation to correlate at least somewhat meaningfully with the rest of graded content of the course, right?

I guess not.

So what does this mean about recitation and homework? Should we toss them aside? Does something need to be changed?

I would say “Not necessarily,” as it is important to recognize that not all grades are equal. Both homework and recitation are the places for students to experiment and learn. Recitations are supposed to be times where students are still learning material. They are to be inoffensive and safe, where students can mess up, fall over, and get back up again with the help of their peers and TA. I defend the lack of stress on grade or challenging and rigorous examination during recitation.

Homework is sort of the same, and sort of completely different. What gives me pause concerning homework is that homework is supposed to be the barometer by which students can measure their own understanding. When students ask us about how they should prepare for exams, our usual response is “If you can do all the homework (including self-check) without referencing the book, then you will be well-prepared for the exam.” If homework grade is such a poor predictor of exam grades, then is it possible that homework gives a poor ruler for students to measure themselves by?

I’m not sure. Perhaps it would be a good idea to indicate all the relevant questions in the textbook so that students have more problems to work on. In theory, students could do this themselves (and for that matter, I’m confident that such students would do very well in the course). But the problem is that we only cover a subset of the material in most sections of the textbook, and many questions (even those right next to ones we assign) require ideas or concepts that we don’t teach.

On the other hand, learning how to actually learn is a necessary skill, and probably one that most people struggle with when they first actually have to learn it. It’s necessary to learn it sooner or later.

Chances for a Comeback

The last numerical aspect I’ll consider is about whether or not it is possible to come back after doing badly on an earlier assessment. There are two things to consider: whether it is actually feasible or not, and whether any students did make it after a poor initial/early performance.

As to whether it is possible, the answer is yes (but it may be hard). And the reason why is that the Second Midterm and Final grades were each relatively low. It may be counterintuitive, but in order to return from a failing grade, it is necessary that there be enough room to actually come back.

Suppose Aiko is a pretty good student, but it just so happens that she makes a 49 on the first midterm due to some particular misunderstanding. If the class average on every assessment is a 90, then Aiko cannot claw her way back. That is, even if Aiko makes a 100 on everything else, Aiko’s final grade would be below a 90, and thus below average. Aiko would probably make a B.

In this situation, the class is too easy, and thus there are no chances for students to overcome a setback on any single exam.

On the other hand, suppose that Bilal makes a 49 on the first midterm, but that the class average is a 75 overall. If Bilal makes a 100 on everything else, Bilal will end with just below a 90, significantly above the class average. Bilal would probably make an A.
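The arithmetic behind “just below a 90” can be checked directly using the course weights stated earlier (Recitation 10%, Homework 20%, each Midterm 20%, Final 30%). The dictionary keys and the function name below are my own labels, chosen for this sketch.

```python
# Course weights as percentages, taken from the grade breakdown above.
WEIGHTS = {
    "recitation": 10,
    "homework": 20,
    "midterm1": 20,
    "midterm2": 20,
    "final": 30,
}

def overall_grade(scores):
    """Weighted average of the five components, on a 0-100 scale."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100

# A 49 on the first midterm and a perfect score on everything else.
comeback = {
    "recitation": 100,
    "homework": 100,
    "midterm1": 49,
    "midterm2": 100,
    "final": 100,
}
print(overall_grade(comeback))  # 89.8
```

So the best possible recovery from a 49 on the first midterm is an 89.8. Whether that is a B or an A depends entirely on where the rest of the class lands.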

In this course, the mean overall was a 78, and the standard deviation was about 15. In this case, an 89 would be an A. So there was enough space and distance to overcome even a disastrous exam.

But, did anyone actually do this? The way I like to look at this is to look at changes in relative performance in terms of standard deviations away from the mean. Performing at one standard deviation below the mean on Midterm 1 and one standard deviation above the mean on Midterm 2 indicates a more meaningful amount of grade fluidity than merely looking at points above or below the mean.
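As a rough sketch of this standardization, the following Python converts raw scores into standard deviations from the mean and then measures how far each student moved between two assessments. The scores are hypothetical and the function names are my own. I use the population standard deviation here, which is one of several reasonable choices.

```python
import statistics

def z_scores(scores):
    """Convert raw scores to standard deviations above or below the mean."""
    mean = statistics.fmean(scores)
    sd = statistics.pstdev(scores)  # population standard deviation
    if sd == 0:
        raise ValueError("all scores are equal, so z-scores are undefined")
    return [(s - mean) / sd for s in scores]

def mobility(first, second):
    """Change in relative standing for each student, in standard deviations."""
    return [b - a for a, b in zip(z_scores(first), z_scores(second))]

# Hypothetical scores for five students on two exams (not course data).
midterm1 = [55, 70, 80, 90, 95]
midterm2 = [85, 60, 75, 70, 90]
for student, change in enumerate(mobility(midterm1, midterm2), start=1):
    print(f"student {student}: {change:+.2f} standard deviations")
```

A positive change means the student improved relative to the class, regardless of whether the second exam was harder or easier in raw points.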

Looking at the First Midterm vs the Second Midterm, we see that there is a rough linear relationship (Pearson R suggests 44% predictive value). That’s to be expected. What’s relevant now are the points above or below the line $y = x$. To be above the line $y = x$ means that you did better on the Second Midterm than you did on the First Midterm, all in comparison to the rest of the class. To be below means the opposite.

Even more relevant is to be in the Fourth Quadrant, which indicates that you did worse than average on the first midterm and better than average on the second. Looking here, there is a very healthy number of people who are in the Fourth Quadrant. There are many people who changed by 2 standard deviations in comparison to the mean, a very meaningful change. [Many people lost a few standard deviations too – grade mobility is a two way street].

The First Midterm to the Overall grade shows healthy mobility as well.

The Second Midterm to Overall shows some mobility, but it is interesting that more people lost ground (by performing well on the Second Midterm, and then performing badly Overall) than gained ground (by performing badly on the Second Midterm, but performing well Overall).

Although I don’t show the plots, this trend carries through pretty well. Many people were able to salvage or boost a letter grade based solely on the final (and correspondingly many people managed to lose just enough on the final to drop a letter grade). Interestingly, very many people were able to turn a likely B into an A through the final.

So overall, I would say that it was definitely possible to salvage grades this semester.

If you’ve never thought about this before, then keep this in mind the next time you hear complaints about a course with challenging exams — it gives enough space for students to demonstrate sufficient understanding to make up for a bad past assessment.

Non-Numerical Reflection

The numbers tell some characteristics of the class, but not the whole story.

Standard Class Materials

We used Thomas’ Calculus. I think this is an easy book to teach from, and relatively easy to read. It feels like many other cookie-cutter calculus books (such as Larson and Edwards or Stewart). But it’s quite expensive for students. However, as we do not use an electronic homework component (which seems to be becoming more popular elsewhere), at least students can buy used copies (or use other methods of procural).

However, solutions manuals are available online (I noticed some students had copies). Some of the pay-for sites provide complete (and mostly but not entirely correct) solutions manuals as well. This makes some parts of Thomas challenging to use, especially as we do not write our own homework to give. I suppose that this is a big reason why one might want to use an electronic system.

The book has much more material in it than we teach. For instance, the book includes all of a first semester of calculus, and also more details in many sections. We avoid numerical integration, Fourier series, some applications, some details concerning polar and parametric plots, etc. Ideally, there would exist a book catering to exactly our needs. But there isn’t, so I suppose Thomas is about as good as any.

I’ve now taught elementary calculus for a few years, and I’m surprised at how often I am able to reuse two notes I wrote years ago, namely the refresher on first semester calculus (An Intuitive Introduction to Calculus) and my additional note on Taylor series (An Intuitive Overview of Taylor Series). Perhaps more surprisingly, I’m astounded at how often people from other places link to and visit these two notes (and in particular, the Taylor Series note).

These were each written for a Math 100 course in 2013. So my note to myself is that there is good value in writing something well enough that I can reuse it, and others might even find it valuable.

Unfortunately, while I wrote a few notes this semester, I don’t think that they will have the same lasting appeal. The one I wrote on the series convergence tests is something that (perhaps after one more round of editing) I will use each time I teach this subject in the future. I’m tremendously happy with my note on computing $\pi$ with Math 100 tools, but as it sits outside the curriculum, many students won’t actually read it. [However many did read it, and it generated many interesting conversations about actual mathematics]. Perhaps sometime I will teach a calculus class ending with some sort of project, as computing $\pi$ leads to very many interesting and interrelated project ideas.

Course Content

I must admit that I do not know why this course is the way it is, and this bothers me a bit. In many ways, this course is a grab bag of calculus nuggets. Presumably each piece was added in because it is necessary in sufficiently many other places, or is so directly related to the “core material” of this course, that it makes sense to include it. But from what I can tell, these reasons have been lost to the sands of time.

The core topics in this course are: Integration by Parts, Taylor’s Theorem, Parametric and Polar coordinates, and First Order Linear Differential Equations. We also spend a large amount of time on other techniques of integration (partial fraction decomposition, trig substitution) and understanding generic series (including the various series convergence/divergence tests). Along the way, there are some seemingly arbitrary decisions on what to include or exclude. For instance, we learn how to integrate

$$\int \sin^n x \cos^m x \; dx$$

because we have decided that being able to perform trigonometric substitution in integrals is a good idea. But we omit integrals like

$$\int \sin(nx) \sin(mx) \; dx$$

which would come up naturally in talking about Fourier series. Fourier series fit naturally into this class, and in some variants of this class they are taught. But so does trigonometric substitution! So what is the rationale here? If the answer is to become better at problem solving or to develop mathematical maturity, then I think it would be good to recognize that so that we know what we should feel comfortable wiggling to build and develop the curriculum in the future. [Also, students should know that calculus is not a pinnacle. See for instance this podcast with Steven Strogatz on Innovation Hub.]

This is not restricted to Brown. I’m familiar with the equivalent of this course at other institutions, and there are similar seemingly arbitrary differences in what to include or exclude. For years at Georgia Tech, they tossed a several-week unit on linear algebra into this course [although I’ve learned that they stopped that in the past two years]. The AP Calc BC curriculum includes trig substitution but not Fourier series. Perhaps they had a reason?

What this means to me is that the intent of this course has become muddled, and separated from the content of the course. This is an overwhelmingly hard task to try to fix, as a second semester of calculus fits right in the middle of so many other pieces. Yet I would be very grateful to the instructor who sits down and identifies reasons for or against inclusion of the various topics in this course, or perhaps cuts the calculus curriculum into pieces and rearranges them to fit modern necessities.

A Parachute is only necessary to go skydiving twice

This is the last class I teach at Brown as a graduate student (and most likely, ever). Amusingly, I taught it in the same room as the first course I taught as a graduate student. I’ve learned quite a bit about teaching in between, but in many ways it feels the same. Just like for students, the only scary class is the first one, although exams can be a real pain (to take, or to grade).

It’s been a pleasure. As usual, if you have any questions, please let me know.


Paper: Sign Changes of Coefficients and Sums of Coefficients of Cusp Forms

This is joint work with Thomas Hulse, Chan Ieong Kuan, and Alex Walker, and is another sequel to our previous work. This is the third in a trio of papers, and completes an answer to a question posed by our advisor Jeff Hoffstein two years ago.

We have just uploaded a preprint to the arXiv giving conditions that guarantee that a sequence of numbers contains infinitely many sign changes. More generally, if the sequence consists of complex numbers, then we give conditions that guarantee sign changes in a generalized sense.

Let $\mathcal{W}(\theta_1, \theta_2) := \{ re^{i\theta} : r \geq 0, \theta \in [\theta_1, \theta_2]\}$ denote a wedge of the complex plane.

Suppose $\{a(n)\}$ is a sequence of complex numbers satisfying the following conditions:

1. $a(n) \ll n^\alpha$,
2. $\sum_{n \leq X} a(n) \ll X^\beta$,
3. $\sum_{n \leq X} \lvert a(n) \rvert^2 = c_1 X^{\gamma_1} + O(X^{\eta_1})$,

where $\alpha, \beta, c_1, \gamma_1$, and $\eta_1$ are all real numbers $\geq 0$. Then for any $r$ satisfying $\max(\alpha+\beta, \eta_1) - (\gamma_1 - 1) < r < 1$, the sequence $\{a(n)\}$ has at least one term outside any wedge $\mathcal{W}(\theta_1, \theta_2)$ with $0 < \theta_2 - \theta_1 < \pi$ for some $n \in [X, X+X^r)$ for all sufficiently large $X$.

These wedges can be thought of as just slightly smaller than a half-plane. For a complex number to escape a half plane is analogous to a real number changing sign. So we should think of this result as guaranteeing a sort of sign change in intervals of width $X^r$ for all sufficiently large $X$.

The intuition behind this result is very straightforward. If the sum of coefficients is small while the sum of the squares of the coefficients is large, then the sum of coefficients must experience a lot of cancellation. The fact that we can get quantitative results on the number of sign changes is merely a task of bookkeeping.
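To make this intuition concrete, here is a sketch of the simplest case, assuming the $a(n)$ are real, $c_1 > 0$, and $\gamma_1 > 0$. This is a heuristic reconstruction from the three conditions, and it is not necessarily the argument as written in the paper. Suppose $a(n) \geq 0$ for every $n \in [X, X + X^r)$. Writing the sum over this interval as a difference of two partial sums, conditions 1 and 2 give
$$\sum_{X \leq n < X + X^r} a(n)^2 \leq \Big( \max_{X \leq n < X + X^r} a(n) \Big) \sum_{X \leq n < X + X^r} a(n) \ll X^{\alpha} \cdot X^{\beta}.$$
On the other hand, condition 3 gives
$$\sum_{X \leq n < X + X^r} a(n)^2 = c_1 \big( (X + X^r)^{\gamma_1} - X^{\gamma_1} \big) + O(X^{\eta_1}),$$
and since $r < 1$, the main term has size about $X^{\gamma_1 - 1 + r}$. These two estimates are incompatible for large $X$ as soon as $\gamma_1 - 1 + r > \max(\alpha + \beta, \eta_1)$, which is exactly the condition $r > \max(\alpha+\beta, \eta_1) - (\gamma_1 - 1)$. The same reasoning applies if $a(n) \leq 0$ throughout the interval, so the sign must change.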

Both the statement and proof are based on very similar criteria for sign changes when $\{a(n)\}$ is a sequence of real numbers first noticed by Ram Murty and Jaban Meher. However, if in addition it is known that

$$\sum_{n \leq X} (a(n))^2 = c_2 X^{\gamma_2} + O(X^{\eta_2}),$$

and that $\max(\alpha+\beta, \eta_1, \eta_2) - (\max(\gamma_1, \gamma_2) - 1) < r < 1$, then generically both sequences $\{\text{Re} (a(n)) \}$ and $\{ \text{Im} (a(n)) \}$ contain at least one sign change for some $n$ in $[X , X + X^r)$ for all sufficiently large $X$. In other words, we can detect sign changes for both the real and imaginary parts in intervals, which is a bit more special.

It is natural to ask for even more specific detection of sign changes. For instance, knowing specific information about the distribution of the arguments of $a(n)$ would be interesting, and very closely related to the Sato-Tate Conjectures. But we do not yet know how to investigate this distribution.

In practice, we often understand the various criteria for the application of these two sign changes results by investigating the Dirichlet series
\begin{align}
&\sum_{n \geq 1} \frac{a(n)}{n^s} \\
&\sum_{n \geq 1} \frac{S_f(n)}{n^s} \\
&\sum_{n \geq 1} \frac{\lvert S_f(n) \rvert^2}{n^s} \\
&\sum_{n \geq 1} \frac{S_f(n)^2}{n^s},
\end{align}
where

$$S_f(n) = \sum_{m \leq n} a(m).$$

In the case of holomorphic cusp forms, the two previous joint projects with this group investigated exactly the Dirichlet series above. In the paper, we formulate some slightly more general criteria guaranteeing sign changes based directly on the analytic properties of the Dirichlet series involved.

In this paper, we apply our sign change results to our previous work to show that $S_f(n)$ changes sign in each interval $[X, X + X^{\frac{2}{3} + \epsilon})$ for sufficiently large $X$. Further, if there are coefficients with $\text{Im} a(n) \neq 0$, then the real and imaginary parts each change signs in those intervals.

We apply our sign change results to single coefficients of $\text{GL}(2)$ cusp forms (and specifically full integral weight holomorphic cusp forms, half-integral weight holomorphic cusp forms, and Maass forms). In large part these are minor improvements over folklore and what is known, except for the extension to complex coefficients.

We also apply our sign change results to single isolated coefficients $A(1,m)$ of $\text{GL}(3)$ Maass forms. This seems to be a novel result, and adds to the very sparse literature on sign changes of sequences associated to $\text{GL}(3)$ objects. Murty and Meher recently proved a general sign change result for $\text{GL}(n)$ objects which is similar in feel.

As a final application, we also consider sign changes of partial sums of $\nu$-normalized coefficients. Let

$$S_f^\nu(X) := \sum_{n \leq X} \frac{a(n)}{n^{\nu}}.$$

As $\nu$ gets larger, the individual coefficients $a(n)n^{-\nu}$ become smaller. So one should expect the sign changes in $\{S_f^\nu(n)\}$ to depend on $\nu$. And in particular, as $\nu$ gets very large, the number of sign changes of $S_f^\nu$ should decrease.

Interestingly, in the case of holomorphic cusp forms of weight $k$, we are able to show that there are sign changes of $S_f^\nu(n)$ in intervals even for normalizations $\nu$ a bit above $\nu = \frac{k-1}{2}$. This is particularly interesting as $a(n) \ll n^{\frac{k-1}{2} + \epsilon}$, so for $\nu > \frac{k-1}{2}$ the normalized coefficients are decreasing with $n$. We are able to show that when $\nu = \frac{k-1}{2} + \frac{1}{6} - \epsilon$, the sequence $\{S_f^\nu(n)\}$ has at least one sign change for $n$ in $[X, 2X)$ for all sufficiently large $X$.

It may help to consider a simpler example to understand why this is surprising. Consider the classic example of a sequence $b(n)$, where $b(n) = 1$ or $b(n) = -1$, randomly, with equal probability. Then the expected size of the sums of $b(n)$ is about $\sqrt n$. This is an example of square-root cancellation, and such behaviour is a common point of comparison. Similarly, the number of sign changes of the partial sums of $b(n)$ is also expected to be about $\sqrt n$.

Suppose now that $b(n) = \frac{\pm 1}{\sqrt n}$. If the first term is $1$, then it takes more than the second term being negative to make the overall sum negative. And if the first two terms are positive, then it would take more than the following three terms being negative to make the overall sum negative. So sign changes of the partial sums are much rarer. In fact, they’re exceedingly rare, and one might barely detect more than a dozen through computational experiment (although one should still expect infinitely many).
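A quick computational experiment along these lines might look like the following sketch. It draws one sequence of random signs and counts sign changes of the partial sums twice: once with terms $\pm 1$, and once with the same signs damped to $\pm 1/\sqrt{n}$. The function names, the seed, and the length are arbitrary choices of mine, and the counts vary from run to run with the seed.

```python
import math
import random

def count_sign_changes(terms):
    """Count how often the running partial sum flips between positive and negative."""
    changes = 0
    total = 0.0
    last_sign = 0  # sign of the most recent nonzero partial sum
    for term in terms:
        total += term
        sign = (total > 0) - (total < 0)
        if sign != 0:
            if last_sign != 0 and sign != last_sign:
                changes += 1
            last_sign = sign
    return changes

def compare(n, seed=0):
    """Return (sign changes with +-1 terms, sign changes with +-1/sqrt(k) terms)."""
    rng = random.Random(seed)
    signs = [rng.choice((-1, 1)) for _ in range(n)]
    plain = count_sign_changes(signs)
    damped = count_sign_changes(
        s / math.sqrt(k) for k, s in enumerate(signs, start=1)
    )
    return plain, damped

plain, damped = compare(100_000)
print(f"+-1 terms: {plain} sign changes")
print(f"+-1/sqrt(n) terms: {damped} sign changes")
```

Partial sums that land exactly on zero are skipped, so a change is only counted when the sum actually moves from one side of zero to the other.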

This regularity, in spite of the decreasing size of the individual coefficients $a(n)n^{-\nu}$, suggests an interesting regularity in the sign changes of the individual $a(n)$. We do not know how to understand or measure this effect or its regularity, and for now it remains an entirely qualitative observation.

For more details and specific references, see the paper on the arXiv.

Estimating the number of squarefree integers up to $X$

I recently wrote an answer to a question on MSE about estimating the number of squarefree integers up to $X$. Although the result is known and not too hard, I very much like the proof and my approach. So I write it down here.

First, let’s see if we can understand why this “should” be true from an analytic perspective.

We know that
$$\sum_{n \geq 1} \frac{\mu(n)^2}{n^s} = \frac{\zeta(s)}{\zeta(2s)},$$
and a general way of extracting information from Dirichlet series is to perform a cutoff integral transform (or a type of Mellin transform). In this case, we get that
$$\sum_{n \leq X} \mu(n)^2 = \frac{1}{2\pi i} \int_{(2)} \frac{\zeta(s)}{\zeta(2s)} X^s \frac{ds}{s},$$
where the contour is the vertical line $\text{Re }s = 2$. By Cauchy’s theorem, we shift the line of integration left and poles contribute terms of large order. The pole of $\zeta(s)$ at $s = 1$ has residue
$$\frac{X}{\zeta(2)},$$
so we expect this to be the leading order. Naively, since we know that there are no zeroes of $\zeta(2s)$ on the line $\text{Re } s = \frac{1}{2}$, we might expect to push our line to exactly there, leading to an error of $O(\sqrt X)$. But in fact, we know more. We know the zero-free region, which allows us to extend the line of integration ever so slightly inwards, leading to a $o(\sqrt X)$ result (or more specifically, something along the lines of $O(\sqrt X e^{-c (\log X)^\alpha})$, where $\alpha$ and $c$ come from the strength of our known zero-free region).

In this heuristic analysis, I have omitted bounding the top, bottom, and left boundaries of the rectangles of integration. But proceeding in a similar way as in the proof of the analytic prime number theorem, you could proceed here. So we expect the answer to look like
$$\frac{X}{\zeta(2)} + O(\sqrt X e^{-c (\log X)^\alpha})$$
using no more than the zero-free region that goes into the prime number theorem.
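Before proving anything, it is reassuring to check the main term numerically. The sketch below counts squarefree integers up to $X$ with a simple sieve and compares the count against $X/\zeta(2)$, using the standard value $\zeta(2) = \pi^2/6$. The function name and the choice of test values for $X$ are mine.

```python
import math

def count_squarefree(limit):
    """Count the squarefree integers n with 1 <= n <= limit using a sieve."""
    is_squarefree = [True] * (limit + 1)
    for d in range(2, math.isqrt(limit) + 1):
        # Cross out every multiple of d^2.
        for multiple in range(d * d, limit + 1, d * d):
            is_squarefree[multiple] = False
    return sum(is_squarefree[1:])

ZETA_2 = math.pi ** 2 / 6

for X in (10, 100, 1000, 10**4, 10**5, 10**6):
    exact = count_squarefree(X)
    main_term = X / ZETA_2
    # The last column is the error measured against sqrt(X).
    scaled_error = (exact - main_term) / math.sqrt(X)
    print(X, exact, round(main_term, 1), round(scaled_error, 4))
```

If the heuristic is right, the last column should stay small as $X$ grows, though a finite computation like this can only suggest the $o(\sqrt X)$ behaviour and cannot prove it.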

We will now prove this result, but in an entirely elementary way (except that I will refer to a result from the prime number theorem). This is below the fold.


Notes from a talk on the Mean Value Theorem

1. Introduction

When I first learned the Mean Value Theorem and the Intermediate Value Theorem, I thought they were both intuitively obvious and utterly useless. In one of my courses in analysis, I was struck when, after proving the Mean Value Theorem, my instructor said that all of calculus was downhill from there. But it was a case of not being able to see the forest for the trees, and I missed the big picture.

I have since come to realize that almost every major (and often, minor) result of calculus is a direct and immediate consequence of the Mean Value Theorem and the Intermediate Value Theorem. In this talk, we will focus on the forest, the big picture, and see the Mean Value Theorem for what it really is: the true Fundamental Theorem of Calculus.


Continuity of the Mean Value

1. Introduction

When I first learned the mean value theorem as a high schooler, I was thoroughly unimpressed. Part of this was because it’s just like Rolle’s Theorem, which feels obvious. But I think the greater part is because I thought it was useless. And I continued to think it was useless until I began my first proof-oriented treatment of calculus as a second year at Georgia Tech. Somehow, in the intervening years, I learned to value intuition and simple statements.

I have since completely changed my view on the mean value theorem. I now consider essentially all of one variable calculus to be the Mean Value Theorem, perhaps in various forms or disguises. In my earlier note An Intuitive Introduction to Calculus, we state and prove the Mean Value Theorem, and then show that we can prove the Fundamental Theorem of Calculus with the Mean Value Theorem and the Intermediate Value Theorem (which also felt silly to me as a high schooler, but which is not silly).

In this brief note, I want to consider one small aspect of the Mean Value Theorem: can the “mean value” be chosen continuously as a function of the endpoints? To state this more clearly, first recall the theorem:

Suppose ${f}$ is a differentiable real-valued function on an interval ${[a,b]}$. Then there exists a point ${c}$ between ${a}$ and ${b}$ such that $$\frac{f(b) - f(a)}{b - a} = f'(c), \tag{1}$$
which is to say that there is a point where the slope of ${f}$ is the same as the average slope from ${a}$ to ${b}$.
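To make the statement concrete, here is a minimal numerical sketch that hunts for such a point ${c}$. The function name `mean_value_point`, the example ${f(x) = x^3}$ on ${[1,3]}$, and the use of bisection are my own choices for illustration. Bisection only works when ${f'(c)}$ minus the average slope changes sign between the endpoints, which happens to be true here but is not guaranteed in general.

```python
import math

def mean_value_point(f, df, a, b, tol=1e-12):
    """Find c in (a, b) with df(c) == (f(b) - f(a)) / (b - a) by bisection.

    Assumption: df(c) - slope changes sign between a and b. This holds
    for the example below, but not for every differentiable f.
    """
    slope = (f(b) - f(a)) / (b - a)
    g = lambda c: df(c) - slope
    lo, hi = a, b
    if g(lo) * g(hi) > 0:
        raise ValueError("no sign change at the endpoints; bisection does not apply")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        # Keep the half-interval on which g still changes sign
        if g(lo) * g(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

# Example: f(x) = x^3 on [1, 3]. The average slope is (27 - 1) / 2 = 13,
# so we expect 3c^2 = 13, that is, c = sqrt(13/3).
f = lambda x: x**3
df = lambda x: 3 * x**2
c = mean_value_point(f, df, 1.0, 3.0)
print(c, math.sqrt(13 / 3))
```

In this example the point is unique. In general there may be several valid choices of ${c}$, and this sketch returns only one of them.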

What if we allow the interval to vary? Suppose we are interested in a differentiable function ${f}$ on intervals of the form ${[0,b]}$, and we let ${b}$ vary. Then for each choice of ${b}$, the mean value theorem tells us that there exists ${c_b}$ such that $$\frac{f(b) - f(0)}{b} = f'(c_b).$$
Then the question we consider today is, as a function of ${b}$, can ${c_b}$ be chosen continuously? We will see that we cannot, and we’ll see explicit counterexamples. This, after the fold.

Review of How Not to Be Wrong by Jordan Ellenberg

Almost 100 years ago as I write this, on 21 October 1914, Martin Gardner was born. He wrote a popular “Mathematical Games” column for Scientific American from 1957 to 1981, introducing a wide audience to fun and recreational mathematics. His influence and writing were so profound that many of his subjects are still popular today. Notable examples include:

Conway’s Game of Life

After its first public appearance in Gardner’s Scientific American column in October 1970, Conway’s Game of Life grew to enormous popularity and interest.

Flexagons

The column “Mathematical Games” started with Gardner’s article on flexagons. The editor of Scientific American thought Gardner’s flexagon article was engaging, and suggested that Gardner write a regular column. Fortunately, Gardner acceded.

Public Key Cryptography

The first major public key cryptosystem, the RSA system, first appeared in Gardner’s August 1977 column. (The formal paper by Rivest, Shamir, and Adleman appeared in 1978 in Communications of the Association for Computing Machinery.) Now, public key cryptography is used everywhere, all the time, mostly without the conscious thought of the user.

Martin Gardner was in constant contact with many mathematicians, and always looked for interesting recreational mathematics to share with his readers. He inspired an entire generation of mathematicians and math enthusiasts. He also inspired others to pursue popular mathematics writing (and blogging, and even youtubing, such as the excellent series produced by Vi Hart).

The current issue (October/November 2014) of the MAA Focus, a mathematical newsmagazine from the Mathematical Association of America, features Martin Gardner. In addition to describing some of Gardner’s contributions and legacy, the article includes a quote from Gardner: “I’ve always thought that the best way to get students interested in mathematics is to give them something that has a recreational flavor, a puzzle or a magic trick or a paradox, or something like that. I think that hooks their interest faster than anything else.” Later, he is also quoted as saying “It’s good not to know much about mathematics. I have to work hard to understand anything that I am writing about, so that makes it easier for me to explain it, perhaps, in a way that the general public can understand.”

(As an aside, the Doctor has noted the lack of recreational mathematics in school too)

It is in this noble succession that I consider Jordan Ellenberg’s recent book How Not to Be Wrong: The Power of Mathematical Thinking, for Ellenberg has made a significant effort to make an approachable, inspiring work (even though it’s not recreational math). After reading the book, it seems clear to me that Ellenberg’s beliefs about how to interest people in mathematics mirror Gardner’s. This book is full of paradoxes and magic tricks. Or rather this book is full of captivating stories each centered around a problem or misconception, and whose resolution comes through careful and explicit reasoning.

Ellenberg presents mathematics as “an extension of common sense by other means,” but I get the feeling that he means to blur what it means to be “common sense” and what it means to be “other means” as the book advances. Much as a textbook or college course eases students into the subject, starting simple and getting progressively deeper, Ellenberg starts with problems that are undeniably simple logic and ends with ideas that are truly profound.

The reader is engaged within the first five pages. After a quick justification for learning mathematics (mathematics is reason, and allows for deeper understanding of the world around us), Ellenberg demonstrates that this is not an abstract book about abstract mathematics, but is instead full of actual examples. And he begins with a tale about Abraham Wald, a mathematician and statistician considering where to reinforce the armor of planes during the Second World War. The writing is conversational, as though this were an oral history transcribed and kept safe in written word. To support the claim that mathematics is an extension of common sense, the book alternates between explaining and setting up problems and careful, but commonsensical, analysis. And most of the time, he proceeds without overwhelming the reader with arithmetic details or a flood of equations.

Mathematics is not arithmetic. Yes, arithmetic is one tiny part of mathematics, but mathematics is much more. The typical student is overexposed to arithmetic and underexposed to mathematics. Stories like that of Abraham Wald seek to rectify this imbalance by demonstrating more mathematics. And later stories, like the chapter about challenges facing Netflix analytics (how does Netflix know what movies to recommend?), use equations and arithmetic to support the underlying mathematics.

It might seem like a delicate arrangement to go through so much mathematical reasoning with so little arithmetic, but Ellenberg succeeds. Part of this is certainly that this is a book full of what he calls “simple and profound” mathematics. The simple is what allows the conversational tone. The profound is what makes it interesting. But the larger part is that Ellenberg’s thesis, math is common sense and allows for deeper understanding of the world around us, is fundamentally true. And quite beautiful.

Ellenberg does truly get to some profound mathematics. Some of the chapters cover common material for popular mathematics: survivorship bias, the ways statistics can lie, and the high likelihood of coincidence, for instance. But in many ways, he goes deeper, and gets more profound, than I would expect. He examines in great detail the divide in statistics between Fisher’s “significance testing” and Pearson’s “hypothesis testing,” and evidences deep dissatisfaction with the accepted standard in experiments and hypothesis testing of “Reductio ad unlikely.” He not only mentions and cautions against misunderstandings about conditional probabilities, but also undertakes Bayesian inference as a decision-making model, perhaps even a good model for how we make our own decisions.

He makes a strong point that sometimes, mathematics does not have all the answers. Or more pertinently, sometimes the answer from mathematics is inaction. For this is action, this not being sure! Although Ellenberg never says it, he hints at the fact that saying anything meaningful about anything at all can be really hard, and sometimes even impossible.

Of course, the book is not without its flaws. Most chapters have their central players, central ideas, and a sort of take home message. But I found the last two chapters to suffer from a bit of indigestion. This might be because they concern the very idea of “existence.” Does public opinion exist as a measurable, or even well-defined quantity? Lurking beneath these two chapters is the problem of designing good, accurate voting systems. Though Ellenberg emphasizes Arrow’s paradox on the impossibility of having a rank order voting system that accurately reflects community opinion, this message is muddied.

And though Ellenberg confronts some of the common misunderstandings of mathematics, like thinking that all mathematics is simple arithmetic and boring, there is one more misunderstanding that I wish he tackled more explicitly, which is that there is room for more mathematics all the time. It is easy to read this book, look at how common sense and mathematics can feel so alike, and sleep comfortably under the sheets at night knowing that these mathematicians have solved all these hard problems for us. But really, more mathematics is needed in both academic and ordinary walks of life.

I think it should also be mentioned that Ellenberg’s rejection of the cult of genius, including the idea that it takes a genius to succeed in mathematics and the far worse idea that we depend on geniuses to progress the sciences, is both good and from an interesting position. Ellenberg was one of the child “geniuses” in the Study of Mathematically Precocious Youth, which found high-performing children and followed them throughout their lives. In an article in the Wall Street Journal, Ellenberg wrote of the dangers of the cult of genius. He also wrote that we need more math majors who don’t become mathematicians. Math and the sciences are not only progressed by the top 0.01 percent, but instead are more often advanced by the hard work and determination of someone who pursued their interests and ignored the cult. For more, read his article. It’s not very long, and it rounds out the end of “How Not to Be Wrong” very nicely.

Ultimately, “How Not to Be Wrong” is a great read that I highly recommend, both to a mathematical and non-mathematical crowd. It’s an engaging and educational read that’s not afraid to do some real math. After finishing the book, part of me wondered if more mathematics should be taught against the history of the mathematicians themselves. Why is it that I learned the development and logic of chemistry along the lives of the chemists of the past while in middle and high school, but I heard almost no mathematician’s name until I began my major in college? This book is literally a tour-de-mathematical-force throughout recent history, and in the spirit of Martin Gardner. I look forward to reading more of Ellenberg’s work.

Posted in Book Review, Mathematics

Another proof of Wilson’s Theorem

While teaching a largely student-discovery style elementary number theory course to high schoolers at the Summer@Brown program, we were looking for instructive but interesting problems to challenge our students. By we, I mean Alex Walker, my academic little brother, and me. After a bit of experimentation with generators and orders, we stumbled across a proof of Wilson’s Theorem, different from the standard proof.

Wilson’s theorem is a classic result of elementary number theory, and is used in some elementary texts to prove Fermat’s Little Theorem, or to introduce primality testing algorithms that give no hint of the factorization.

Theorem 1 (Wilson’s Theorem) For a prime number ${p}$, we have $$(p-1)! \equiv -1 \pmod p. \tag{1}$$

The theorem is clear for ${p = 2}$, so we only consider proofs for “odd primes ${p}$.”

The standard proof of Wilson’s Theorem included in almost every elementary number theory text starts with the factorial ${(p-1)!}$, the product of all the units mod ${p}$. Then as the only elements which are their own inverses are ${\pm 1}$ (as ${x^2 \equiv 1 \pmod p \iff p \mid (x^2 - 1) \iff p\mid x+1}$ or ${p \mid x-1}$), every element in the factorial multiplies with its inverse to give ${1}$, except for ${\pm 1}$, whose product is ${-1}$. Thus ${(p-1)! \equiv -1 \pmod p}$. $\diamondsuit$

Now we present a different proof.

Take a primitive root ${g}$ of the unit group ${(\mathbb{Z}/p\mathbb{Z})^\times}$, so that each number ${1, \ldots, p-1}$ appears exactly once in ${g, g^2, \ldots, g^{p-1}}$. Recalling that ${1 + 2 + \ldots + n = \frac{n(n+1)}{2}}$ (a great example of classical pattern recognition in an elementary number theory class), we see that multiplying these together gives ${(p-1)!}$ on the one hand, and ${g^{(p-1)p/2}}$ on the other.

Now ${g^{(p-1)/2}}$ is a solution to ${x^2 \equiv 1 \pmod p}$, and it is not ${1}$ since ${g}$ is a generator and thus has order ${p-1}$. So ${g^{(p-1)/2} \equiv -1 \pmod p}$. As ${g^{(p-1)p/2} = (g^{(p-1)/2})^p}$ and ${p}$ is odd, we are raising ${-1}$ to an odd power, which yields ${-1}$, completing the proof. $\diamondsuit$
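For readers who like to see the numbers, here is a small sketch that checks each step of this argument for a few odd primes. The function names and the brute-force search for a primitive root are my own choices for illustration, and the search is only sensible for small ${p}$.

```python
import math

def primitive_root(p):
    """Return the smallest generator of the units mod an odd prime p (brute force)."""
    for g in range(2, p):
        # g is a generator exactly when its powers hit all p - 1 units
        if len({pow(g, k, p) for k in range(1, p)}) == p - 1:
            return g
    raise ValueError("no primitive root found; is p an odd prime?")

def wilson_via_primitive_root(p):
    """Return (product of g^k, g^((p-1)p/2), g^((p-1)/2)), all reduced mod p."""
    g = primitive_root(p)
    # g, g^2, ..., g^(p-1) are the numbers 1, ..., p-1 in some order
    product = 1
    for k in range(1, p):
        product = product * pow(g, k, p) % p
    # The exponents add up to 1 + 2 + ... + (p-1) = (p-1)p/2
    collapsed = pow(g, (p - 1) * p // 2, p)
    # g^((p-1)/2) should be -1 mod p, which Python reports as p - 1
    half = pow(g, (p - 1) // 2, p)
    return product, collapsed, half

for p in [3, 5, 7, 11, 13, 101]:
    product, collapsed, half = wilson_via_primitive_root(p)
    # All three agree with (p-1)! mod p, and all equal -1 mod p
    assert product == collapsed == half == math.factorial(p - 1) % p == p - 1
    print(p, primitive_root(p), product)
```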

After posting this, we have since seen that this proof is suggested in a problem in Ireland and Rosen’s extremely good number theory book. But it was pleasant to see it come up naturally, and it’s nice to suggest to our students that you can stumble across proofs.

It may be interesting to question why ${x^2 \equiv 1 \pmod p \iff x \equiv \pm 1 \pmod p}$ appears in a fundamental way in both proofs.

This post appears on the author’s personal website davidlowryduda.com and on the Math.Stackexchange Community Blog math.blogoverflow.com. It is also available in pdf note form. It was typeset in \TeX, hosted on WordPress sites, converted using the utility github.com/davidlowryduda/mse2wp, and displayed with MathJax.

Posted in Expository, Math.NT, Mathematics

Trigonometric and related substitutions in integrals

$\DeclareMathOperator{\csch}{csch}$
$\DeclareMathOperator{\sech}{sech}$
$\DeclareMathOperator{\arsinh}{arsinh}$

1. Introduction

In many ways, a first semester of calculus is a big ideas course. Students learn the basics of differentiation and integration, and some of the big-hitting theorems like the Fundamental Theorems of Calculus. Even in a big ideas course, students learn how to differentiate any reasonable combination of polynomials, trig, exponentials, and logarithms (elementary functions).

But integration skills are not pushed nearly as far. Do you ever wonder why? Even at the end of the first semester of calculus, there are many elementary functions that students cannot integrate. But the reason isn’t that there wasn’t enough time, but instead that integration is hard. And when I say hard, I mean often impossible. And when I say impossible, I don’t mean unsolved, but instead provably impossible (and when I say impossible, I mean that we can’t always integrate and get a nice function out, unlike our ability to differentiate any nice function and get a nice function back). An easy example is the sine integral $$\int \frac{\sin x}{x} \mathrm d x,$$
which cannot be expressed in terms of elementary functions. In short, even though the derivative of an elementary function is always an elementary function, the antiderivatives of elementary functions need not be elementary.
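Not having an elementary antiderivative does not stop us from computing definite integrals numerically. As a rough sketch (the function names, the choice of Simpson’s rule, and the number of subintervals are my own assumptions for illustration), we can approximate ${\int_0^x \frac{\sin t}{t} \mathrm d t}$ in two independent ways and check that they agree: by numerical quadrature, and by integrating the Taylor series of ${\frac{\sin t}{t}}$ term by term.

```python
import math

def integrand(t):
    """sin(t)/t, extended by its limit 1 at t = 0."""
    return 1.0 if t == 0 else math.sin(t) / t

def si_simpson(x, n=1000):
    """Approximate the integral of sin(t)/t from 0 to x by Simpson's rule (n even)."""
    h = x / n
    total = integrand(0.0) + integrand(x)
    for k in range(1, n):
        # Simpson weights: 4 at odd nodes, 2 at even interior nodes
        total += (4 if k % 2 else 2) * integrand(k * h)
    return total * h / 3

def si_series(x, terms=30):
    """Sum the Taylor series of sin(t)/t after integrating it term by term."""
    return sum(
        (-1) ** k * x ** (2 * k + 1) / ((2 * k + 1) * math.factorial(2 * k + 1))
        for k in range(terms)
    )

print(si_simpson(1.0), si_series(1.0))
```

Neither computation produces a formula for the antiderivative. Each only produces numbers, which is the best we can hope for here.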

Worse, even when antidifferentiation is possible, it might still be really hard. This is the first problem that a second semester in calculus might try to address, meaning that students learn a veritable bag of tricks of integration techniques. These might include ${u}$-substitution and integration by parts (which are like inverses of the chain rule and product rule, respectively), and then the relatively more complicated techniques like partial fraction decomposition and trig substitution.
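As a quick reminder of why these two techniques behave like inverses of the chain rule and the product rule, here is the standard derivation. The symbols are just notation for this sketch: ${F}$ denotes an antiderivative of ${f}$, and ${g}$, ${u}$, ${v}$ are differentiable functions. Reading the chain rule backwards gives ${u}$-substitution,
$$\frac{\mathrm d}{\mathrm d x} F(g(x)) = f(g(x)) g'(x) \quad \Longrightarrow \quad \int f(g(x)) g'(x) \, \mathrm d x = F(g(x)) + C,$$
and rearranging the product rule gives integration by parts,
$$\frac{\mathrm d}{\mathrm d x} \big( u(x) v(x) \big) = u'(x) v(x) + u(x) v'(x) \quad \Longrightarrow \quad \int u(x) v'(x) \, \mathrm d x = u(x) v(x) - \int u'(x) v(x) \, \mathrm d x.$$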

In this note, we are going to take a closer look at problems related to trig substitution, and some related ideas. We will assume familiarity with ${u}$-substitution and integration by parts, and we might even use them here from time to time. This, after the fold.

A bit more about partial fraction decomposition

This is a short note written for my students in Math 170, talking about partial fraction decomposition and some potentially confusing topics that have come up. We’ll remind ourselves what partial fraction decomposition is, and unlike the text, we’ll prove it. Finally, we’ll look at some pitfalls in particular. All this after the fold.

1. The Result Itself

We are interested in rational functions and their integrals. Recall that a polynomial ${f(x)}$ is a function of the form ${f(x) = a_nx^n + a_{n-1}x^{n-1} + \cdots + a_1x + a_0}$, where the ${a_i}$ are constants and ${x}$ is our “indeterminate,” which we commonly imagine standing for a number (but this is not necessary).

Then a rational function ${R(x)}$ is a ratio of two polynomials ${p(x)}$ and ${q(x)}$, $$R(x) = \frac{p(x)}{q(x)}.$$

Then the big result concerning partial fractions is the following:

If ${R(x) = \dfrac{p(x)}{q(x)}}$ is a rational function and the degree of ${p(x)}$ is less than the degree of ${q(x)}$, and if ${q(x)}$ factors into $$q(x) = (x-r_1)^{k_1}(x-r_2)^{k_2} \dots (x-r_l)^{k_l} (x^2 + a_{1,1}x + a_{1,2})^{v_1} \ldots (x^2 + a_{m,1}x + a_{m,2})^{v_m},$$
then ${R(x)}$ can be written as a sum of fractions of the form ${\dfrac{A}{(x-r)^k}}$ or ${\dfrac{Ax + B}{(x^2 + a_1x + a_2)^v}}$, where in particular

• If ${(x-r)}$ appears in the denominator of ${R(x)}$, then there is a term ${\dfrac{A}{x - r}}$
• If ${(x-r)^k}$ appears in the denominator of ${R(x)}$, then there is a collection of terms $$\frac{A_1}{x-r} + \frac{A_2}{(x-r)^2} + \dots + \frac{A_k}{(x-r)^k}$$
• If ${x^2 + ax + b}$ appears in the denominator of ${R(x)}$, then there is a term ${\dfrac{Ax + B}{x^2 + ax + b}}$
• If ${(x^2 + ax + b)^v}$ appears in the denominator of ${R(x)}$, then there is a collection of terms $$\frac{A_1x + B_1}{x^2 + ax + b} + \frac{A_2 x + B_2}{(x^2 + ax + b)^2} + \dots + \frac{A_v x + B_v}{(x^2 + ax + b)^v}$$

where in each of these, the capital ${A}$ and ${B}$ represent some constants that can be solved for through basic algebra.

I state this result this way because it is the one that leads to integrals that we can evaluate. But in principle, this theorem can be restated in a couple different ways.
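To see the simplest case of the theorem in action, here is a small sketch that computes the constants when the denominator is a product of distinct linear factors, so that only terms of the form ${\dfrac{A}{x - r}}$ appear. It uses the “cover-up” shortcut ${A_i = p(r_i) / \prod_{j \neq i} (r_i - r_j)}$, which is one way of doing the basic algebra. The function name and the example are my own, and the sketch does not handle repeated roots or quadratic factors.

```python
from fractions import Fraction

def simple_pole_coefficients(numerator, roots):
    """Coefficients A_i in p(x) / prod(x - r_i) = sum of A_i / (x - r_i).

    `numerator` lists the coefficients of p(x) from the constant term up.
    Assumes the roots are distinct and that deg p < len(roots).
    """
    def p(x):
        return sum(Fraction(c) * x**k for k, c in enumerate(numerator))

    coeffs = []
    for i, r in enumerate(roots):
        r = Fraction(r)
        # "Cover up" the factor (x - r_i) and evaluate what remains at x = r_i
        denom = Fraction(1)
        for j, s in enumerate(roots):
            if j != i:
                denom *= r - Fraction(s)
        coeffs.append(p(r) / denom)
    return coeffs

# 1 / ((x - 1)(x + 2)) = A / (x - 1) + B / (x + 2)
A, B = simple_pole_coefficients([1], [1, -2])
print(A, B)

# Spot check the decomposition at x = 5
x = Fraction(5)
assert A / (x - 1) + B / (x + 2) == 1 / ((x - 1) * (x + 2))
```

Exact rational arithmetic via `Fraction` avoids any floating point fuzz, so the spot check can use equality.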

Let’s parse this theorem through an example – the classic example, after the fold.

In this note, I remind myself of the functional equations for the ${L}$-functions ${\displaystyle \sum_{n\geq 1} \frac{a(n)}{n^s}}$ and ${\displaystyle \sum_{n\geq 1} \frac{a(n)}{n^s}e\left(\frac{n\overline{r}}{c}\right)}$, where ${\overline{r}}$ is the multiplicative inverse of ${r \bmod c}$.