Category Archives: Expository

Continuity of the Mean Value

1. Introduction

When I first learned the mean value theorem as a high schooler, I was thoroughly unimpressed. Part of this was because it’s just like Rolle’s Theorem, which feels obvious. But I think the greater part is because I thought it was useless. And I continued to think it was useless until I began my first proof-oriented treatment of calculus as a second year at Georgia Tech. Somehow, in the interceding years, I learned to value intuition and simple statements.

I have since completely changed my view on the mean value theorem. I now consider essentially all of one variable calculus to be the Mean Value Theorem, perhaps in various forms or disguises. In my earlier note An Intuitive Introduction to Calculus, we state and prove the Mean Value Theorem, and then show that we can prove the Fundamental Theorem of Calculus with the Mean Value Theorem and the Intermediate Value Theorem (which also felt silly to me as a high schooler, but which is not silly).

In this brief note, I want to consider one small aspect of the Mean Value Theorem: can the “mean value” be chosen continuously as a function of the endpoints? To state this more clearly, first recall the theorem:

Suppose $latex {f}$ is a differentiable real-valued function on an interval $latex {[a,b]}$. Then there exists a point $latex {c}$ between $latex {a}$ and $latex {b}$ such that $$ \frac{f(b) – f(a)}{b – a} = f'(c), \tag{1}$$
which is to say that there is a point where the slope of $latex {f}$ is the same as the average slope from $latex {a}$ to $latex {b}$.

What if we allow the interval to vary? Suppose we are interested in a differentiable function $latex {f}$ on intervals of the form $latex {[0,b]}$, and we let $latex {b}$ vary. Then for each choice of $latex {b}$, the mean value theorem tells us that there exists $latex {c_b}$ such that $$ \frac{f(b) – f(0)}{b} = f'(c_b). $$
Then the question we consider today is, as a function of $latex {b}$, can $latex {c_b}$ be chosen continuously? We will see that we cannot, and we’ll see explicit counterexamples. This, after the fold.


Posted in Expository, Mathematics | Tagged , , , , | Leave a comment

Another proof of Wilson’s Theorem

While teaching a largely student-discovery style elementary number theory course to high schoolers at the Summer@Brown program, we were looking for instructive but interesting problems to challenge our students. By we, I mean Alex Walker, my academic little brother, and me. After a bit of experimentation with generators and orders, we stumbled across a proof of Wilson’s Theorem, different than the standard proof.

Wilson’s theorem is a classic result of elementary number theory, and is used in some elementary texts to prove Fermat’s Little Theorem, or to introduce primality testing algorithms that give no hint of the factorization.

Theorem 1 (Wilson’s Theorem) For a prime number $latex {p}$, we have $$ (p-1)! \equiv -1 \pmod p. \tag{1}$$

The theorem is clear for $latex {p = 2}$, so we only consider proofs for “odd primes $latex {p}$.”

The standard proof of Wilson’s Theorem included in almost every elementary number theory text starts with the factorial $latex {(p-1)!}$, the product of all the units mod $latex {p}$. Then as the only elements which are their own inverses are $latex {\pm 1}$ (as $latex {x^2 \equiv 1 \pmod p \iff p \mid (x^2 – 1) \iff p\mid x+1}$ or $latex {p \mid x-1}$), every element in the factorial multiples with its inverse to give $latex {1}$, except for $latex {-1}$. Thus $latex {(p-1)! \equiv -1 \pmod p.} \diamondsuit$

Now we present a different proof.

Take a primitive root $latex {g}$ of the unit group $latex {(\mathbb{Z}/p\mathbb{Z})^\times}$, so that each number $latex {1, \ldots, p-1}$ appears exactly once in $latex {g, g^2, \ldots, g^{p-1}}$. Recalling that $latex {1 + 2 + \ldots + n = \frac{n(n+1)}{2}}$ (a great example of classical pattern recognition in an elementary number theory class), we see that multiplying these together gives $latex {(p-1)!}$ on the one hand, and $latex {g^{(p-1)p/2}}$ on the other.

As $latex {g^{(p-1)/2}}$ is a solution to $latex {x^2 \equiv 1 \pmod p}$, and it is not $latex {1}$ since $latex {g}$ is a generator and thus has order $latex {p-1}$. So $latex {g^{(p-1)/2} \equiv -1 \pmod p}$, and raising $latex {-1}$ to an odd power yields $latex {-1}$, completing the proof $\diamondsuit$.

After posting this, we have since seen that this proof is suggested in a problem in Ireland and Rosen’s extremely good number theory book. But it was pleasant to see it come up naturally, and it’s nice to suggest to our students that you can stumble across proofs.

It may be interesting to question why $latex {x^2 \equiv 1 \pmod p \iff x \equiv \pm 1 \pmod p}$ appears in a fundamental way in both proofs.

This post appears on the author’s personal website and on the Math.Stackexchange Community Blog It is also available in pdf note form. It was typeset in \TeX, hosted on WordPress sites, converted using the utility, and displayed with MathJax.

Posted in Expository, Math.NT, Mathematics | Tagged , , , , , , | 1 Comment

Trigonometric and related substitutions in integrals


1. Introduction

In many ways, a first semester of calculus is a big ideas course. Students learn the basics of differentiation and integration, and some of the big-hitting theorems like the Fundamental Theorems of Calculus. Even in a big ideas course, students learn how to differentiate any reasonable combination of polynomials, trig, exponentials, and logarithms (elementary functions).

But integration skills are not pushed nearly as far. Do you ever wonder why? Even at the end of the first semester of calculus, there are many elementary functions that students cannot integrate. But the reason isn’t that there wasn’t enough time, but instead that integration is hard. And when I say hard, I mean often impossible. And when I say impossible, I don’t mean unsolved, but instead provably impossible (and when I say impossible, I mean that we can’t always integrate and get a nice function out, unlike our ability to differentiate any nice function and get a nice function back). An easy example is the sine integral $$ \int \frac{\sin x}{x} \mathrm d x, $$
which cannot be expressed in terms of elementary functions. In short, even though the derivative of an elementary function is always an elementary function, the antiderivative of elementary functions don’t need to be elementary.

Worse, even when antidifferentiation is possible, it might still be really hard. This is the first problem that a second semester in calculus might try to address, meaning that students learn a veritable bag of tricks of integration techniques. These might include $latex {u}$-substitution and integration by parts (which are like inverses of the chain rule and product rule, respectively), and then the relatively more complicated techniques like partial fraction decomposition and trig substitution.

In this note, we are going to take a closer look at problems related to trig substitution, and some related ideas. We will assume familiarity with $latex {u}$-substitution and integration by parts, and we might even use them here from time to time. This, after the fold.


Posted in Brown University, Expository, Math 170, Mathematics | Tagged , , , , , , | Leave a comment

A bit more about partial fraction decomposition

This is a short note written for my students in Math 170, talking about partial fraction decomposition and some potentially confusing topics that have come up. We’ll remind ourselves what partial fraction decomposition is, and unlike the text, we’ll prove it. Finally, we’ll look at some pitfalls in particular. All this after the fold.

1. The Result Itself

We are interested in rational functions and their integrals. Recall that a polynomial $latex {f(x)}$ is a function of the form $latex {f(x) = a_nx^n + a_{n-1}x^{n-1} + \cdots + a_1x + a_0}$, where the $latex {a_i}$ are constants and $latex {x}$ is our “intederminate” — and which we commonly imagine standing for a number (but this is not necessary).

Then a rational function $latex {R(x)}$ is a ratio of two polynomials $latex {p(x)}$ and $latex {q(x)}$, $$ R(x) = \frac{p(x)}{q(x)}. $$

Then the big result concerning partial fractions is the following:

If $latex {R(x) = \dfrac{p(x)}{q(x)}}$ is a rational function and the degree of $latex {p(x)}$ is less than the degree of $latex {q(x)}$, and if $latex {q(x)}$ factors into $$q(x) = (x-r_1)^{k_1}(x-r_2)^{k_2} \dots (x-r_l)^{k_l} (x^2 + a_{1,1}x + a_{1,2})^{v_1} \ldots (x^2 + a_{m,1}x + a_{m,2})^{v_m}, $$
then $latex {R(x)}$ can be written as a sum of fractions of the form $latex {\dfrac{A}{(x-r)^k}}$ or $latex {\dfrac{Ax + B}{(x^2 + a_1x + a_2)^v}}$, where in particular

  • If $latex {(x-r)}$ appears in the denominator of $latex {R(x)}$, then there is a term $latex {\dfrac{A}{x – r}}$
  • If $latex {(x-r)^k}$ appears in the denominator of $latex {R(x)}$, then there is a collection of terms $$ \frac{A_1}{x-r} + \frac{A_2}{(x-r)^2} + \dots + \frac{A_k}{(x-r)^k} $$
  • If $latex {x^2 + ax + b}$ appears in the denominator of $latex {R(x)}$, then there is a term $latex {\dfrac{Ax + B}{x^2 + ax + b}}$
  • If $latex {(x^2 + ax + b)^v}$ appears in the denominator of $latex {R(x)}$, then there is a collection of terms $$ \frac{A_1x + B_1}{x^2 + ax + b} + \frac{A_2 x + B_2}{(x^2 + ax + b)^2} + \dots \frac{A_v x + B_v}{(x^2 + ax + b)^v} $$

where in each of these, the capital $latex {A}$ and $latex {B}$ represent some constants that can be solved for through basic algebra.

I state this result this way because it is the one that leads to integrals that we can evaluate. But in principle, this theorem can be restated in a couple different ways.

Let’s parse this theorem through an example – the classic example, after the fold.


Posted in Brown University, Expository, Math 170, Mathematics, Teaching | Tagged , , , , | Leave a comment

A response to FtYoU’s question on Reddit

FtYou writes

Hello everyone ! There is a concept I have a hard time getting my head wrap around. If you have a Vector Space V and a subspace W, I understand that you can find the least square vector approximation from any vector in V to a vector in W. And this correspond to the projection of V to the subspace W. Now , for data fitting … Let’s suppose you have a bunch of points (xi, yi) where you want to fit a set a regressors so you can approximate yi by a linear combination of the regressors lets say ( 1, x, x2 … ). What Vector space are we talking about ? If we consider the Vector space of function R -> R, in what subspace are we trying to map these vectors ?

I have a hard time merging these two concepts of projecting to a vector space and fitting the data. In the latter case what vector are we using ? The functions ? If so I understand the choice of regressors ( which constitute a basis for the vector space ) But what’s the role of the (xi,yi) ?

I want to point out that I understand completely how to build the matrices to get Y = AX and solving using least square approx. What I miss is the big picture. The linear algebra picture. Thanks for any help !

We’ll go over this by closely examining and understanding an example. Suppose we have the data points $latex {(x_i, y_i)}$

$latex \displaystyle \begin{cases} (x_1, y_1) = (-1,8) \\ (x_2, y_2) = (0,8) \\ (x_3, y_3) = (1,4) \\ (x_4, y_4) = (2,16) \end{cases}, $

and we have decided to try to find the best fitting quadratic function. What do we mean by best-fitting? We mean that we want the one that approximates these data points the best. What exactly does that mean? We’ll see that before the end of this note – but in linear algebra terms, we are projecting on to some sort of vector space – we claim that projection is the ”best-fit” possible.


Posted in Expository, Mathematics | Tagged , , , , , , | Leave a comment

Friendly Introduction to Sieves with a Look Towards Progress on the Twin Primes Conjecture

This is an extension and background to a talk I gave on 9 October 2013 to the Brown Graduate Student Seminar, called `A friendly intro to sieves with a look towards recent progress on the twin primes conjecture.’ During the talk, I mention several sieves, some with a lot of detail and some with very little detail. I also discuss several results and built upon many sources. I’ll provide missing details and/or sources for additional reading here.

Furthermore, I like this talk, so I think it’s worth preserving.

1. Introduction

We talk about sieves and primes. Long, long ago, Euclid famously proved the infinitude of primes ($latex {\approx 300}$ B.C.). Although he didn’t show it, the stronger statement that the sum of the reciprocals of the primes diverges is true:

$latex \displaystyle \sum_{p} \frac{1}{p} \rightarrow \infty, $

where the sum is over primes.

Proof: Suppose that the sum converged. Then there is some $latex {k}$ such that

$latex \displaystyle \sum_{i = k+1}^\infty \frac{1}{p_i} < \frac{1}{2}. $

Suppose that $latex {Q := \prod_{i = 1}^k p_i}$ is the product of the primes up to $latex {p_k}$. Then the integers $latex {1 + Qn}$ are relatively prime to the primes in $latex {Q}$, and so are only made up of the primes $latex {p_{k+1}, \ldots}$. This means that

$latex \displaystyle \sum_{n = 1}^\infty \frac{1}{1+Qn} \leq \sum_{t \geq 0} \left( \sum_{i > k} \frac{1}{p_i} \right) ^t < 2, $

where the first inequality is true since all the terms on the left appear in the middle (think prime factorizations and the distributive law), and the second inequality is true because it’s bounded by the geometric series with ratio $latex {1/2}$. But by either the ratio test or by limit comparison, the sum on the left diverges (aha! Something for my math 100 students), and so we arrive at a contradiction.

Thus the sum of the reciprocals of the primes diverges. $latex \diamondsuit$


Posted in Expository, Math.NT, Mathematics | Tagged , , , , , , , , , , , , | Leave a comment

Response to bnelo12’s question on Reddit (or the Internet storm on $1 + 2 + \ldots = -1/12$)

bnelo12 writes (slightly paraphrased)

Can you explain exactly how $latex {1 + 2 + 3 + 4 + \ldots = – \frac{1}{12}}$ in the context of the Riemann $latex {\zeta}$ function?

We are going to approach this problem through a related problem that is easier to understand at first. Many are familiar with summing geometric series

$latex \displaystyle g(r) = 1 + r + r^2 + r^3 + \ldots = \frac{1}{1-r}, $

which makes sense as long as $latex {|r| < 1}$. But if you’re not, let’s see how we do that. Let $latex {S(n)}$ denote the sum of the terms up to $latex {r^n}$, so that

$latex \displaystyle S(n) = 1 + r + r^2 + \ldots + r^n. $

Then for a finite $latex {n}$, $latex {S(n)}$ makes complete sense. It’s just a sum of a few numbers. What if we multiply $latex {S(n)}$ by $latex {r}$? Then we get

$latex \displaystyle rS(n) = r + r^2 + \ldots + r^n + r^{n+1}. $

Notice how similar this is to $latex {S(n)}$. It’s very similar, but missing the first term and containing an extra last term. If we subtract them, we get

$latex \displaystyle S(n) – rS(n) = 1 – r^{n+1}, $

which is a very simple expression. But we can factor out the $latex {S(n)}$ on the left and solve for it. In total, we get

$latex \displaystyle S(n) = \frac{1 – r^{n+1}}{1 – r}. \ \ \ \ \ (1)$

This works for any natural number $latex {n}$. What if we let $latex {n}$ get arbitrarily large? Then if $latex {|r|<1}$, then $latex {|r|^{n+1} \rightarrow 0}$, and so we get that the sum of the geometric series is

$latex \displaystyle g(r) = 1 + r + r^2 + r^3 + \ldots = \frac{1}{1-r}. $

But this looks like it makes sense for almost any $latex {r}$, in that we can plug in any value for $latex {r}$ that we want on the right and get a number, unless $latex {r = 1}$. In this sense, we might say that $latex {\frac{1}{1-r}}$ extends the geometric series $latex {g(r)}$, in that whenever $latex {|r|<1}$, the geometric series $latex {g(r)}$ agrees with this function. But this function makes sense in a larger domain then $latex {g(r)}$.

People find it convenient to abuse notation slightly and call the new function $latex {\frac{1}{1-r} = g(r)}$, (i.e. use the same notation for the extension) because any time you might want to plug in $latex {r}$ when $latex {|r|<1}$, you still get the same value. But really, it’s not true that $latex {\frac{1}{1-r} = g(r)}$, since the domain on the left is bigger than the domain on the right. This can be confusing. It’s things like this that cause people to say that

$latex \displaystyle 1 + 2 + 4 + 8 + 16 + \ldots = \frac{1}{1-2} = -1, $

simply because $latex {g(2) = -1}$. This is conflating two different ideas together. What this means is that the function that extends the geometric series takes the value $latex {-1}$ when $latex {r = 2}$. But this has nothing to do with actually summing up the $latex {2}$ powers at all.

So it is with the $latex {\zeta}$ function. Even though the $latex {\zeta}$ function only makes sense at first when $latex {\text{Re}(s) > 1}$, people have extended it for almost all $latex {s}$ in the complex plane. It just so happens that the great functional equation for the Riemann $latex {\zeta}$ function that relates the right and left half planes (across the line $latex {\text{Re}(s) = \frac{1}{2}}$) is

$latex \displaystyle \pi^{\frac{-s}{2}}\Gamma\left( \frac{s}{2} \right) \zeta(s) = \pi^{\frac{s-1}{2}}\Gamma\left( \frac{1-s}{2} \right) \zeta(1-s), \ \ \ \ \ (2)$

where $latex {\Gamma}$ is the gamma function, a sort of generalization of the factorial function. If we solve for $latex {\zeta(1-s)}$, then we get

$latex \displaystyle \zeta(1-s) = \frac{\pi^{\frac{-s}{2}}\Gamma\left( \frac{s}{2} \right) \zeta(s)}{\pi^{\frac{s-1}{2}}\Gamma\left( \frac{1-s}{2} \right)}. $

If we stick in $latex {s = 2}$, we get

$latex \displaystyle \zeta(-1) = \frac{\pi^{-1}\Gamma(1) \zeta(2)}{\pi^{\frac{-1}{2}}\Gamma\left( \frac{-1}{2} \right)}. $

We happen to know that $latex {\zeta(2) = \frac{\pi^2}{6}}$ (this is called Basel’s problem) and that $latex {\Gamma(\frac{1}{2}) = \sqrt \pi}$. We also happen to know that in general, $latex {\Gamma(t+1) = t\Gamma(t)}$ (it is partially in this sense that the $latex {\Gamma}$ function generalizes the factorial function), so that $latex {\Gamma(\frac{1}{2}) = \frac{-1}{2} \Gamma(\frac{-1}{2})}$, or rather that $latex {\Gamma(\frac{-1}{2}) = -2 \sqrt \pi.}$ Finally, $latex {\Gamma(1) = 1}$ (on integers, it agrees with the one-lower factorial).

Putting these together, we get that

$latex \displaystyle \zeta(-1) = \frac{\pi^2/6}{-2\pi^2} = \frac{-1}{12}, $

which is what we wanted to show. $latex {\diamondsuit}$

The information I quoted about the Gamma function and the zeta function’s functional equation can be found on Wikipedia or any introductory book on analytic number theory. Evaluating $latex {\zeta(2)}$ is a classic problem that has been in many ways, but is most often taught in a first course on complex analysis or as a clever iterated integral problem (you can prove it with Fubini’s theorem). Evaluating $latex {\Gamma(\frac{1}{2})}$ is rarely done and is sort of a trick, usually done with Fourier analysis.

As usual, I have also created a paper version. You can find that here.

Posted in Expository, Math.NT, Mathematics | Tagged , , , , | 4 Comments

An Intuitive Overview of Taylor Series

This is a note written for my fall 2013 Math 100 class, but it was not written “for the exam,” nor does anything on here subtly hint at anything on any exam. But I hope that this will be helpful for anyone who wants to get a basic understanding of Taylor series. What I want to do is try to get some sort of intuitive grasp on Taylor series as approximations of functions. By intuitive, I mean intuitive to those with a good grasp of functions, the basics of a first semester of calculus (derivatives, integrals, the mean value theorem, and the fundamental theorem of calculus) – so it’s a mathematical intuition. In this way, this post is a sort of follow-up of my earlier note, An Intuitive Introduction to Calculus.

PLEASE NOTE that my math compiler and my markdown compiler sometimes compete, and sometimes repeated derivatives are too high or too low by one pixel.

We care about Taylor series because they allow us to approximate other functions in predictable ways. Sometimes, these approximations can be made to be very, very, very accurate without requiring too much computing power. You might have heard that computers/calculators routinely use Taylor series to calculate things like $latex {e^x}$ (which is more or less often true). But up to this point in most students’ mathematical development, most mathematics has been clean and perfect; everything has been exact algorithms yielding exact answers for years and years. This is simply not the way of the world.

Here’s a fundamental fact to both mathematics and life: almost anything worth doing is probably pretty hard and pretty messy.

For a very recognizable example, let’s think about finding zeroes of polynomials. Finding roots of linear polynomials is very easy. If we see $latex {5 + x = 0}$, we see that $latex {-5}$ is the zero. Similarly, finding roots of quadratic polynomials is very easy, and many of us have memorized the quadratic formula to this end. Thus $latex {ax^2 + bx + c = 0}$ has solutions $latex {x = \frac{-b \pm \sqrt{b^2 – 4ac}}{2a}}$. These are both nice, algorithmic, and exact. But I will guess that the vast majority of those who read this have never seen a “cubic polynomial formula” for finding roots of cubic polynomials (although it does exist, it is horrendously messy – look up Cardano’s formula). There is even an algorithmic way of finding the roots of quartic polynomials. But here’s something amazing: there is no general method for finding the exact roots of 5th degree polynomials (or higher degree).

I don’t mean We haven’t found it yet, but there may be one, or even You’ll have to use one of these myriad ways – I mean it has been shown that there is no general method of finding exact roots of degree 5 or higher polynomials. But we certainly can approximate them arbitrarily well. So even something as simple as finding roots of polynomials, which we’ve been doing since we were in middle school, gets incredibly and unbelievably complicated.

So before we hop into Taylor series directly, I want to get into the mindset of approximating functions with other functions.

1. Approximating functions with other functions

We like working with polynomials because they’re so easy to calculate and manipulate. So sometimes we try to approximate complicated functions with polynomials, a problem sometimes called
polynomial interpolation”.

Suppose we wanted to approximate $latex {\sin(x)}$. The most naive approximation that we might do is see that $latex {\sin(0) = 0}$, so we might approximate $latex {\sin(x)}$ by $latex {p_0(x) = 0}$. We know that it’s right at least once, and since $latex {\sin(x)}$ is periodic, it’s going to be right many times. I write $latex {p_0}$ to indicate that this is a degree $latex {0}$ polynomial, that is, a constant polynomial. Clearly though, this is a terrible approximation, and we can do better.


Posted in Expository, Mathematics, Teaching | Tagged , , , , , , , , | 23 Comments

An intuitive introduction to calculus

This is a post written for my fall 2013 Math 100 class but largely intended for anyone with knowledge of what a function is and a desire to know what calculus is all about. Calculus is made out to be the pinnacle of the high school math curriculum, and correspondingly is thought to be very hard. But the difficulty is bloated, blown out of proportion. In fact, the ideas behind calculus are approachable and even intuitive if thought about in the right way.

Many people managed to stumble across the page before I’d finished all the graphics. I’m sorry, but they’re all done now! I was having trouble interpreting how WordPress was going to handle my gif files – it turns out that they automagically resize them if you don’t make them of the correct size, which makes them not display. It took me a bit to realize this. I’d like to mention that this actually started as a 90 minute talk I had with my wife over coffee, so perhaps an alternate title would be “Learning calculus in 2 hours over a cup of coffee.”

So read on if you would like to understand what calculus is, or if you’re looking for a refresher of the concepts from a first semester in calculus (like for Math 100 students at Brown), or if you’re looking for a bird’s eye view of AP Calc AB subject material.

1. An intuitive and semicomplete introduction to calculus

We will think of a function $ {f(\cdot)}$ as something that takes an input $ {x}$ and gives out another number, which we’ll denote by $ {f(x)}$. We know functions like $ {f(x) = x^2 + 1}$, which means that if I give in a number $ {x}$ then the function returns the number $ {f(x) = x^2 + 1}$. So I put in $ {1}$, I get $ {1^2 + 1 = 2}$, i.e. $ {f(1) = 2}$. Primary and secondary school overly conditions students to think of functions in terms of a formula or equation. The important thing to remember is that a function is really just something that gives an output when given an input, and if the same input is given later then the function spits the same output out. As an aside, I should mention that the most common problem I’ve seen in my teaching and tutoring is a fundamental misunderstanding of functions and their graphs

For a function that takes in and spits out numbers, we can associate a graph. A graph is a two-dimensional representation of our function, where by convention the input is put on the horizontal axis and the output is put on the vertical axis. Each axis is numbered, and in this way we can identify any point in the graph by its coordinates, i.e. its horizontal and vertical position. A graph of a function $ {f(x)}$ includes a point $ {(x,y)}$ if $ {y = f(x)}$.

The graph of the function $ x^2 + 1$ is in blue. The emphasized point appears on the graph because it is of the form $ (x, f(x))$. In particular, this point is $ (1, 2)$.

Thus each point on the graph is really of the form $ {(x, f(x))}$. A large portion of algebra I and II is devoted to being able to draw graphs for a variety of functions. And if you think about it, graphs contain a huge amount of information. Graphing $ {f(x)= x^2 + 1}$ involves drawing an upwards-facing parabola, which really represents an infinite number of points. That’s pretty intense, but it’s not what I want to focus on here.

1.1. Generalizing slope – introducing the derivative

You might recall the idea of the ‘slope’ of a line. A line has a constant ratio of how much the $ {y}$ value changes for a specific change in $ {x}$, which we call the slope (people always seem to remember rise over run). In particular, if a line passes through the points $ {(x_1, y_1)}$ and $ {(x_2, y_2)}$, then its slope will be the vertical change $ {y_2 – y_1}$ divided by the horizontal change $ {x_2 – x_1}$, or $ {\dfrac{y_2 – y_1}{x_2 – x_1}}$.

The graph of a line appears in blue. The two points $ (0,1)$ and $ (1,3)$ are shown on the line. The horizontal red line shows the horizontal change. The vertical red line shows the vertical change. The ‘slope’ of the blue line is the length of the vertical red line divided by the length of the horizontal red line.

So if the line is given by an equation $ {f(x) = \text{something}}$, then the slope from two inputs $ {x_1}$ and $ {x_2}$ is $ {\dfrac{f(x_2) – f(x_1)}{x_2 – x_1}}$. As an aside, for those that remember things like the ‘standard equation’ $ {y = mx + b}$ or ‘point-slope’ $ {(y – y_0) = m(x – x_0)}$ but who have never thought or been taught where these come from: the claim that lines are the curves of constant slope is saying that for any choice of $ {(x_1, y_1)}$ on the line, we expect $ {\dfrac{y_2 – y_1}{x_2 – x_1} = m}$ a constant, which I denote by $ {m}$ for no particularly good reason other than the fact that some textbook author long ago did such a thing. Since we’re allowing ourselves to choose any $ {(x_1, y_1)}$, we might drop the subscripts – since they usually mean a constant – and rearrange our equation to give $ {y_2 – y = m(x_2 – x)}$, which is what has been so unkindly drilled into students’ heads as the ‘point-slope form.’ This is why lines have a point-slope form, and a reason that it comes up so much is that it comes so naturally from the defining characteristic of a line, i.e. constant slope.

But one cannot speak of the ‘slope’ of a parabola.

The parabola $ f(x) = x^2 + 1$ is shows in blue. Slope is a measure of how much the function $ f(x)$ changes when $ x$ is changed. Some tangent lines to the parabola are shown in red. The slope of each line seems like it should be the ‘slope’ of the parabola when the line touches the parabola, but these slopes are different.

Intuitively, we look at our parabola $ {x^2 + 1}$ and see that the ‘slope,’ or an estimate of how much the function $ {f(x)}$ changes with a change in $ {x}$, seems to be changing depending on what $ {x}$ values we choose. (This should make sense – if it didn’t change, and had constant slope, then it would be a line). The first major goal of calculus is to come up with an idea of a ‘slope’ for non-linear functions. I should add that we already know a sort of ‘instantaneous rate of change’ of a nonlinear function. When we’re in a car and we’re driving somewhere, we’re usually speeding up or slowing down, and our pace isn’t usually linear. Yet our speedometer still manages to say how fast we’re going, which is an immediate rate of change. So if we had a function $ {p(t)}$ that gave us our position at a time $ {t}$, then the slope would give us our velocity (change in position per change in time) at a moment. So without knowing it, we’re familiar with a generalized slope already. Now in our parabola, we don’t expect a constant slope, so we want to associate a ‘slope’ to each input $ {x}$. In other words, we want to be able to understand how rapidly the function $ {f(x)}$ is changing at each $ {x}$, analogous to how the slope $ {m}$ of a line $ {g(x) = mx + b}$ tells us that if we change our input by an amount $ {h}$ then our output value will change by $ {mh}$.

How does calculus do that? The idea is to get closer and closer approximations. Suppose we want to find the ‘slope’ of our parabola at the point $ {x = 1}$. Let’s get an approximate answer. The slope of the line coming from inputs $ {x = 1}$ and $ {x = 2}$ is a (poor) approximation. In particular, since we’re working with $ {f(x) = x^2 + 1}$, we have that $ {f(2) = 5}$ and $ {f(1) = 2}$, so that the ‘approximate slope’ from $ {x = 1}$ and $ {x = 2}$ is $ {\frac{5 – 2}{2 – 1} = 3}$. But looking at the graph,

The parabola $ x^2 + 1$ is shown in blue, and the line going through the points $ (1,2)$ and $ (2,5)$ is shown. The line immediately goes above and crosses the parabola, so it seems like this line is rising faster (changing faster) than the parabola. It’s too steep, and the slope is too high to reflect the ‘slope’ of the parabola at the indicated point.

we see that it feels like this slope is too large. So let’s get closer. Suppose we use inputs $ {x = 1}$ and $ {x = 1.5}$. We get that the approximate slope is $ {\frac{3.25 – 2}{1.5 – 1} = 2.5}$. If we were to graph it, this would also feel too large. So we can keep choosing smaller and smaller changes, like using $ {x = 1}$ and $ {x = 1.1}$, or $ {x = 1}$ and $ {x = 1.01}$, and so on. This next graphic contains these approximations, with chosen points getting closer and closer to $ {1}$.

The parabola $ x^2 + 1$ is shown in blue. Two points are chosen on the parabola and the line between them is drawn in red. As the points get closer to each other, the red line indicates the rate of growth of the parabola at the point $ (1,2)$ better and better. So the slope of the red lines seems to be getting closer to the ‘slope’ of the parabola at $ (1,2)$.

Let’s look a little closer at the values we’re getting for our slopes when we use $ {1}$ and $ {2, 1.5, 1.1, 1.01, 1.001}$ as our inputs. We get

$ \displaystyle \begin{array}{c|c} \text{second input} & \text{approx. slope} \\ \hline 2 & 3 \\ 1.5 & 2.5 \\ 1.1 & 2.1 \\ 1.01 & 2.01 \\ 1.001 & 2.001 \end{array} $

It looks like the approximate slopes are approaching $ {2}$. What if we plot the graph with a line of slope $ {2}$ going through the point $ {(1,2)}$?

The parabola $ x^2 + 1$ is shown in blue. The line in red has slope $ 2$ and goes through the point $ (1,2)$. We got this line by continuing the successive approximations done above. It looks like it accurately indicates the ‘slope’ of the parabola at $ (1,2)$.

It looks great! Let’s zoom in a whole lot.

When we zoom in, the blue parabola looks almost like a line, and the red line looks almost like the parabola! This is why we are measuring the ‘slope’ of the parabola in this fashion – when we zoom in, it looks more and more like a line, and we are getting the slope of that line.

That looks really close! In fact, what I’ve been allowing as the natural feeling slope, or local rate of change, is really the line tangent to the graph of our function at the point $ {(1, f(1))}$. In a calculus class, you’ll spend a bit of time making sense of what it means for the approximate slopes to ‘approach’ $ {2}$. This is called a ‘limit,’ and the details are not important to us right now. The important thing is that this let us get an idea of a ‘slope’ at a point on a parabola. It’s not really a slope, because a parabola isn’t a line. So we’ve given it a different name – we call this ‘the derivative.’ So the derivative of $ {f(x) = x^2 + 1}$ at $ {x = 1}$ is $ {2}$, i.e. right around $ {x = 1}$ we expect a rate of change of $ {2}$, so that we expect $ {f(1 + h) – f(1) \approx 2h}$. If you think about it, we’re saying that we can approximate $ {f(x) = x^2 + 1}$ near the point $ {(1, 2)}$ by the line shown in the graph above: this line passes through $ {(1,2)}$ and it’s slope is $ {2}$, what we’re calling the slope of $ {f(x) = x^2 + 1}$ at $ {x = 1}$.

Let’s generalize. We were able to speak of the derivative at one point, but how about other points? The rest of this post is below the ‘more’ tag below.


Posted in Brown University, Expository, Math 100, Mathematics | Tagged , , , , , , , , , , , , , | 7 Comments

Twenty Mathematicians, Two Hard Problems, One Week, IdeaLab2013

July has been an exciting and busy month for me. I taught number theory 3 hours a day, 5 days a week, for 3 weeks to (mostly) devoted and motivated high school students in the Summer@Brown program. In the middle, I moved to Massachusetts. Immediately after the Summer@Brown program ended, I was given the opportunity to return to ICERM to participate in an experimental program called an IdeaLab.

IdeaLab invited 20 early career mathematicians to come together for a week and to generate ideas on two very different problems: Tipping Points in Climate Systems and Efficient Fully Homomorphic Encryption. Although I plan on writing a bit more about each of these problems and the IdeaLab process in action (at least from my point of view), I should say something about what these are.

Models of Earth’s climate are used all the time, to give daily weather reports, to predict and warn about hurricanes, to attempt to understand the effects of anthropogenic sources of carbon on long-term climate. As we know from uncertainty about weather reports, these models aren’t perfect. In particular, they don’t currently predict sudden, abrupt changes called ‘Tippling points.’ But are tipping points possible? There have been warm periods following ice-ages in the past, so it seems that there might be tipping points that aren’t modelled in the system. Understanding these form the basis for the idea behind the Tipping Points in Climate Systems project. This project also forms another link in Mathematics of Planet Earth.

On the other hand, homomorphic encryption is a topic in modern cryptography. To encrypt a message is to make it hard or impossible for others to read it unless they have a ‘key.’ You might think that you wouldn’t want someone holding onto an encrypted data to be able to do anything with the data, and in most modern encryption algorithms this is the case. But what if we were able to give Google an encrypted dataset and ask them to perform a search on it? Is it possible to have a secure encryption that would allow Google to do some sort of search algorithm and give us the results, but without Google ever understanding the data itself? It may seem far-fetched, but this is exactly the idea behind the Efficient Fully Homomorphic Encryption group. Surprisingly enough, it is possible. But known methods are obnoxiously slow and infeasible. This is why the group was after ‘efficient’ encryption.

So 20 early career mathematicians from all sorts of areas of mathematics gathered to think about these two questions. For the rest of this post, I’d like to talk about the structure and my thoughts on the IdeaLab process. In later posts, I’ll talk about each of the two major topics and what sorts of ideas came out of the process.


Posted in Brown University, Expository, Mathematics, Story | Tagged , , , , , , , , , , , , , , | Leave a comment