# Notes from a talk on the Mean Value Theorem

1. Introduction

When I first learned the Mean Value Theorem and the Intermediate Value Theorem, I thought they were both intuitively obvious and utterly useless. In one of my courses in analysis, I was struck when, after proving the Mean Value Theorem, my instructor said that all of calculus was downhill from there. But it was a case of not being able to see the forest for the trees, and I missed the big picture.

I have since come to realize that almost every major (and often, minor) result of calculus is a direct and immediate consequence of the Mean Value Theorem and the Intermediate Value Theorem. In this talk, we will focus on the forest, the big picture, and see the Mean Value Theorem for what it really is: the true Fundamental Theorem of Calculus.

2. Warming up with standard results

Let’s remind ourselves of the major players.

Intermediate Value Theorem: Suppose ${f}$ is a continuous function on an interval ${[a,b]}$, and suppose ${L}$ is any value between ${f(a)}$ and ${f(b)}$. Then there exists a ${c \in [a,b]}$ such that ${f(c) = L}$.

The Intermediate Value Theorem is very much like its name: it guarantees places where the function takes on each intermediate value.

Mean Value Theorem: Suppose ${f}$ is a differentiable function on ${[a,b]}$. Then there exists a ${c \in (a,b)}$ such that ${\displaystyle \frac{f(b) - f(a)}{b-a} = f'(c)}$.

Both are best understood by drawing pictures. The IVT is visualized by accepting that we can’t get from one point to another without crossing a line. The MVT is visualized by accepting that we can’t get from one point to another without going in a particular direction… as long as our movement makes sense. An aside for readers: notice that we don’t actually require the function to be continuously differentiable, but instead only differentiable. It is intuition defying that the MVT works even with noncontinuously differentiable functions, so long as they can be differentiated.

In standard calculus courses, what sort of problems might we be asked to solve that have to do with these theorems? It is peculiar that the IVT seems relegated to showing that there are roots, and the MVT is relegated to demonstrating speed traps and, coincidentally, guaranteeing that there are no roots. For example, prototypical question number one:

Example 1 Show that the function ${f(x) = e^x + x^3 + x}$ has exactly one root, and that it occurs in ${[-1,0]}$.

Proof: Notice that ${f(-1) = e^{-1} - 2 < 0}$ and ${f(0) = 1 > 0}$. So by the IVT, there is a root in ${[-1,0]}$. Then notice that ${f'(x) = e^x + 3x^2 + 1 > 0}$ always, which means that the function is always increasing (this is a secret use of the Mean Value Theorem), so there can be no second root. Notice that in this way, we could repeatedly bisect the interval and get a really good idea of where the only zero of ${f(x)}$ is, but this is considered beyond the level of typical classes. $\Box$
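The repeated bisection mentioned in the proof can be sketched in a few lines. This is only an illustration: the helper name `bisect` and the tolerance are my own choices, not part of the talk.

```python
import math

def f(x):
    return math.exp(x) + x**3 + x

def bisect(func, lo, hi, tol=1e-12):
    """Shrink [lo, hi] around a sign change of func until it is narrower than tol."""
    f_lo = func(lo)
    if f_lo * func(hi) > 0:
        raise ValueError("func must change sign on [lo, hi]")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        f_mid = func(mid)
        if f_lo * f_mid <= 0:
            hi = mid                 # the sign change is in the left half
        else:
            lo, f_lo = mid, f_mid    # the sign change is in the right half
    return (lo + hi) / 2

root = bisect(f, -1.0, 0.0)
print(root, f(root))
```

Each pass through the loop is one application of the IVT on a smaller interval, and the positivity of ${f'}$ is what guarantees that the point we close in on is the only root.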

This is perhaps somewhat useful, but totally uninspired. It feels more like a trick than a monumentally important pair of theorems from calculus. You’ll notice that we actually used the following, based on the MVT.

If ${f'(x) > 0}$ in an interval, then ${f}$ is strictly increasing in that interval. Proof: Think about it via contradiction. If ${f}$ weren’t strictly increasing, then there would be some ${x > y}$ such that ${f(x) \leq f(y)}$. But then there is a ${c}$ between ${y}$ and ${x}$ such that ${f'(c) = \frac{f(x) - f(y)}{x - y} \leq 0}$, which contradicts ${f'(c) > 0}$. $\Box$

Similarly, we see that if ${f'(x) < 0}$, then ${f}$ is strictly decreasing. So if we have a max or a min, then the function is neither increasing nor decreasing at that point, and so heuristically: if ${c}$ is an extremum of ${f}$, then ${f'(c) = 0}$. Aside: this is, of course, true. But it does not follow from our argument, for the same potential problem as before. What if ${f}$ is not continuously differentiable? And on a more fundamental level, we actually use this result to prove Rolle’s Theorem, which we use to prove the Mean Value Theorem. But trees, big picture, right?

These are technically useful, but utterly uninspired. It seems obvious that having positive derivative means that the function is increasing. So what have we gained here? Answer: not a whole lot.

There is another technically useful, classical application. If ${f}$ and ${g}$ are two functions such that ${f'(x) = g'(x)}$ for all ${x}$ in an interval, then ${f(x) = g(x) + c}$ for some constant ${c}$. This actually follows from the following, related idea: If ${f'(x) \equiv 0}$ in an interval, then ${f(x) \equiv c}$ in that interval. In particular, if ${f'(x) = g'(x)}$, then ${f'(x) - g'(x) \equiv 0}$, and this puts us in the second case. Let’s prove the second one. Proof: Suppose not. Then there are some ${x,y}$ such that ${f(x) \neq f(y)}$. But then by the mean value theorem, there is a ${c}$ such that ${f'(c) = \frac{f(x) - f(y)}{x - y} \neq 0}$, which is a contradiction. $\Box$

In other words, this says that any two functions with the same derivative are the same, up to a constant. Thought of differently, it says that antidifferentiation is unique up to addition by constants. This is actually very important, but it still doesn’t feel particularly interesting.

There is something that all of these examples do have that hints at a sort of power behind the MVT. We have a sort of “local-global duality,” in a manner of speaking. Just as in a piece of music, there are certain motifs that recur throughout mathematics. One of these is an interplay between global properties and local properties.

For example, being an extremum is a global property. But having zero derivative is a local property. Within the Mean Value Theorem, we have a guarantee of a local point satisfying a non-local condition, a local point of a certain derivative that happens to be the slope of a secant line from other points.

The last pair of lemmas demonstrates a striking interplay between local and global. Having the same derivative at each point is global (or lots and lots of local properties). Having the same value, up to a constant, is also global. But the proof works by showing that there would be a single local point that breaks the rule. Let’s make this more striking by mentioning a near-miss, the Devil’s Staircase.

Take the unit interval ${[0,1]}$. We will iteratively create a function there. On the first step, take out the middle third and assign the function the value of ${1/2}$ there. On the second step, take out the middle third of each untouched segment, and assign them the values ${1/4}$ and ${3/4}$. Repeat, taking out middle thirds and assigning the values halfway between the two neighboring parts. The limit of this process is a single function. Amazingly, it is continuous, and is almost-everywhere differentiable. But its derivative is ${0}$ wherever it is defined, even though the function climbs from ${0}$ at ${x = 0}$ to ${1}$ at ${x = 1}$. There is no local-global niceness here, even though its derivative is almost-everywhere ${0}$. Whoa.
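To get a feel for this function, here is a small sketch that evaluates it from the base-3 digits of ${x}$, which is one standard way to describe the same middle-thirds construction. The function name `cantor` and the digit cutoff `depth` are illustrative assumptions, not something from the talk.

```python
def cantor(x, depth=40):
    """Approximate the Devil's Staircase at x in [0, 1] from the ternary digits of x."""
    if x >= 1.0:
        return 1.0
    value, scale = 0.0, 0.5
    for _ in range(depth):
        x *= 3
        digit = int(x)          # next base-3 digit of the original x
        x -= digit
        if digit == 1:
            # x lies in a removed middle third, where the function is flat
            return value + scale
        value += scale * (digit // 2)   # a ternary 2 becomes a binary 1
        scale /= 2
    return value

print(cantor(0.5), cantor(0.2), cantor(0.8))   # values on three removed thirds
print(cantor(0.51) - cantor(0.49))             # flat near 1/2, so the difference is 0
```

The flat stretches are exactly the removed middle thirds, so every difference quotient taken inside one of them is ${0}$, and yet the values still run from ${0}$ to ${1}$.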

Aside: This function is a great counterexample and can be constructed in better ways that explicitly demonstrate some of its great properties. For instance, it’s possible to show it is the uniform limit of a sequence of continuous functions, and is therefore continuous. This was a deep type of problem that affected a lot of the classical mathematicians we associate with the formation of calculus, but this type of problem is completely excluded from introductory calculus courses. Perhaps for good reason, as it’s intuition breaking? Or perhaps that’s the point? It’s certainly quite complicated and demands much higher mathematical maturity, for better or worse.

Consider this the end of the first third of the talk.

3. Fundamental Theorem of Calculus

The Fundamental Theorem(s) of calculus are extreme examples of local-global duality. You might recall that they are actually two statements. Right now, let us define the symbol ${\displaystyle \int_a^b f(x) dx}$ to mean the area under ${f(x)}$ from ${a}$ to ${b}$.

For ${f}$ continuous, we have both

1. If ${F(x) := \displaystyle \int_0^x f(t) dt}$, then ${F'(x) = f(x)}$.
2. If ${F}$ is any function such that ${F'(x) = f(x)}$, then ${\displaystyle \int_a^b f(x) dx = F(b) - F(a)}$.

Let’s parse these claims. ${F(x)}$ is a global-style definition, about global behavior concerning ${f(x)}$. It is fundamentally not local, as earlier behavior of ${f}$ determines later values of ${F}$. What the FTC I says is that the derivative (very local) is exactly the value of the inner function. Aside: It says this for all ${x}$, so we go global–local–global. What FTC II says is that a global condition guarantees that the area under ${f}$ between two points (global behavior), is given by the values of ${F}$ at two points (local). We are passing between local and global all over the place!

I also claim that the first is the IVT, and the second is the MVT. Let’s prove it, right here, right now, no holds barred.

Our method of proof will be the “just do it” method. For the first, let’s differentiate ${F(x)}$.
\begin{align} F'(x) = \lim_{h \rightarrow 0} \frac{F(x+h) - F(x)}{h} = \lim_{h \rightarrow 0} \frac{\displaystyle \int_x^{x + h} f(t) dt}{h}, \end{align}
where by picture we note that the difference gives just the area from ${x}$ to ${x + h}$, as ${h \rightarrow 0}$. How are we to find this? Notice that if ${m}$ is the minimum of ${f(x)}$ and ${M}$ is the maximum of ${f(x)}$ on the interval ${[x, x+h]}$, then $$mh = m \int_x^{x+h} dt \leq \int_x^{x+h} f(t) dt \leq M\int_x^{x+h} dt = Mh.$$
Correspondingly, as ${c}$ ranges over ${[x, x+h]}$, the quantity ${f(c) \int_x^{x + h} dt = f(c) h}$ takes on every value between ${mh}$ and ${Mh}$, including the value of the integral. So there is some ${c}$ in ${[x, x+h]}$ such that ${f(c)h = \displaystyle \int_x^{x + h} f(t) dt}$. This is the IVT! (Although this subresult is usually called the First Integral Mean Value Theorem; they’re all confused).

Further, as ${h \rightarrow 0}$, we see that ${c \rightarrow x}$ (as there’s nowhere else for it to go!). So $$\lim_{h \rightarrow 0} \dfrac{\int_x^{x+h} f(t) dt}{h} = \lim_{h \rightarrow 0} \dfrac{f(c)h}{h} = \lim_{h \rightarrow 0} f(c) = f(x).$$
And so we have proved the first fundamental theorem of calculus for continuous functions. ${\diamondsuit}$
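A quick numerical sanity check of this argument: the average height of ${f}$ over ${[x, x+h]}$ should approach ${f(x)}$ as ${h}$ shrinks. The sketch below assumes ${f = \cos}$ and ${x = 1}$ purely for illustration, and approximates the area with a midpoint rule; none of these choices come from the talk.

```python
import math

def integral(f, a, b, n=10_000):
    """Midpoint-rule approximation of the area under f from a to b."""
    width = (b - a) / n
    return width * sum(f(a + (i + 0.5) * width) for i in range(n))

def average_value(f, x, h):
    """(F(x+h) - F(x)) / h, i.e. the mean height of f over [x, x+h]."""
    return integral(f, x, x + h) / h

f = math.cos
x = 1.0
# Distance between the difference quotient of F and f(x) for shrinking h
errors = [abs(average_value(f, x, h) - f(x)) for h in (0.1, 0.01, 0.001)]
print(errors)
```

The errors shrink roughly in proportion to ${h}$, which is the squeeze ${c \rightarrow x}$ seen in numbers.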

For the second, we have to be a little bit more careful. We call a function integrable if its Riemann sums converge, in which case we say that the limit of the Riemann sums is the value of the integral. By this, I mean $$\int_a^b f(x) dx = \lim_{j \rightarrow \infty} \sum_j [x_{j + 1} - x_j]f(\xi_j)$$
where ${\xi_j}$ is any number between ${x_j}$ and ${x_{j+1}}$, where the ${x_j}$ form a partition of the interval ${[a,b]}$, and the limit in ${j}$ is actually as the width of the largest partition segment ${x_{j + 1} - x_j}$ goes to ${0}$. In other words, we add up a lot of little widths times heights.

Here’s the bit of magic. For ease, suppose ${b = 1}$ and ${a = 0}$. Then write down ${F(b) - F(a) = F(1) - F(0)}$. Let’s partition the interval into smaller pieces. For illustration, ${F(1) - F(0) = F(1) - F(0.9) + F(0.9) - F(0.8) + \dots + F(0.1) - F(0)}$. Each pair, like ${F(1) - F(0.9)}$, can be dissected using the MVT (recall that ${F' = f}$) as $$F(1) - F(0.9) = f(c_{0.9})(1 - 0.9).$$
So we have that ${F(1) - F(0) = f(c_{0.9})\cdot 0.1 + f(c_{0.8}) \cdot 0.1 + \dots + f(c_0)\cdot 0.1}$. And this last sum has an interpretation! It is exactly a partition of ${[0,1]}$, with points chosen in each partition segment. It is a sum of widths times heights! It is a Riemann sum! Most importantly, the value is completely independent of how small we make the partition, as it’s always ${F(1) - F(0)}$. So as we choose smaller and smaller partitions, we see that the limit of the Riemann sums converges to ${F(1) - F(0)}$, and so $$\int_0^1 f(x) dx = F(1) - F(0).$$
And so we’ve proved the second Fundamental Theorem of Calculus, with the MVT and a little elbow grease.
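The telescoping trick can be watched in action. The sketch below assumes ${F = \sin}$ and ${f = \cos}$ on ${[0,1]}$ (my choice, made because ${\cos}$ is invertible there, so each MVT point can be solved for explicitly with `acos`). It builds the Riemann sum whose tags are the MVT points and compares it with ${F(1) - F(0)}$.

```python
import math

F, f = math.sin, math.cos   # F' = f; an illustrative pair, not from the talk

def mvt_riemann_sum(n):
    """Riemann sum of f on [0, 1] over n equal segments, tagged at the MVT points of F."""
    total = 0.0
    tags = []
    for j in range(n):
        left, right = j / n, (j + 1) / n
        slope = (F(right) - F(left)) / (right - left)
        # cos is strictly decreasing on [0, 1], so acos solves f(c) = slope;
        # min() only guards against rounding pushing slope above 1
        c = math.acos(min(1.0, slope))
        tags.append(c)
        total += f(c) * (right - left)
    return total, tags

for n in (10, 100, 1000):
    total, tags = mvt_riemann_sum(n)
    print(n, total, F(1.0) - F(0.0))
```

Up to floating-point rounding, the sum does not depend on ${n}$ at all, which is exactly the point of the argument.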

(For a longer explanation of this, including pictures, see my other note)

Aside: to those unconvinced of the actual convergence here, you’re right. We didn’t show it. But this is what the additional written note is for. In the next subsection, not included in the talk, we actually handle the convergence.

3.1. Convergence of the Riemann sum

When we have ${\displaystyle \sum_j w_j f(c_j)}$ (with ${w_j}$ the width of the ${j}$th partition segment), we did not show that the choices of ${c_j}$ did not matter, as one needs to do to show convergence. We need to use the continuity of ${f}$ somewhere, and this is the place. We are on an interval ${[a,b]}$. This interval is compact. A continuous function on a compact interval is uniformly continuous. So in particular, for any ${\epsilon > 0}$, we can guarantee a ${\delta > 0}$ such that ${|x_1 - x_2| < \delta \implies |f(x_1) - f(x_2)| < \epsilon}$. In particular, we can choose our partition segments to be less than ${\delta}$ in width.

Then on each partition segment, we know that ${f(c_j) - \epsilon \leq f(\xi_j) \leq f(c_j) + \epsilon}$ where ${\xi_j}$ is chosen anywhere in that partition segment. We want the upper bound and the lower bound to converge to the same number. Equivalently, we want $$\lim_{j \rightarrow \infty} \sum w_j \left( f(c_j) + \epsilon \right) - \sum w_j \left( f(c_j) - \epsilon \right) \longrightarrow 0.$$
This reduces to \begin{align} \lim_{j \rightarrow \infty} \sum w_j 2\epsilon = 2\epsilon \sum w_j = 2\epsilon \left( b - a \right) \longrightarrow 0. \end{align} As we can choose any ${\epsilon > 0}$, we can make this limit go to ${0}$ as needed for actual, real convergence. So we really have proven the Fundamental Theorem of Calculus. ${\diamondsuit}$
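The ${2\epsilon(b-a)}$ bound is easy to see numerically. In the sketch below I assume ${f = \cos}$ on ${[0,1]}$, where ${|\cos s - \cos t| \leq |s - t|}$, so taking ${\epsilon}$ equal to the segment width is a valid (if crude) choice. The helper `riemann_sum` and its `pick` argument are hypothetical names for this illustration.

```python
import math

def riemann_sum(f, a, b, n, pick):
    """Riemann sum on n equal segments; pick(left, right) chooses the tag in each."""
    width = (b - a) / n
    return sum(width * f(pick(a + j * width, a + (j + 1) * width)) for j in range(n))

f, a, b = math.cos, 0.0, 1.0
for n in (10, 100, 1000):
    eps = (b - a) / n   # valid because cos is 1-Lipschitz: segment width < delta = eps
    left = riemann_sum(f, a, b, n, lambda l, r: l)    # tags at left endpoints
    right = riemann_sum(f, a, b, n, lambda l, r: r)   # tags at right endpoints
    print(n, abs(left - right), 2 * eps * (b - a))
```

Two very different tag choices never disagree by more than ${2\epsilon(b-a)}$, and that gap closes as the partition is refined.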

4. Taylor’s Theorem

I hope to have convinced you that the MVT is deceptively useful, and that calculus is deceptively straightforward. It’s just that so much of the signal gets lost in the noise. Before I go on, I’d like to mention another note I wrote, also with beautiful gifs that I’m exceptionally proud of. See An Intuitive Overview of Taylor Series.

Before we move on to the other heavy hitting theorems, let’s refresh ourselves with a question. We know that on ${[a,b]}$, there is a ${c}$ such that ${f'(c)}$ has the slope of the secant line from ${a}$ to ${b}$. What about the converse? Given a ${c}$ in ${[a,b]}$, is there some ${\alpha, \beta \in [a,b]}$ such that the secant line from ${\alpha}$ to ${\beta}$ has slope ${f'(c)}$?

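The answer is no. Considering ${x^2}$ on ${[0,1]}$ might give you a good indication why: at ${c = 0}$ the derivative is ${0}$, but the secant line from ${\alpha}$ to ${\beta}$ has slope ${\alpha + \beta}$, which is positive for any two distinct points of ${[0,1]}$.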

To end, there are many possibilities. Nowhere near all of them fit into the talk, and I’ll choose to talk about whichever one seems of greatest interest at the time. In no particular order, the MVT also gives

• l’Hopital’s Rule
• Cauchy’s Mean Value Theorem
• Taylor’s Theorem
• A higher order MVT

In fact, these four things are incredibly interconnected.

Suppose ${f(x), g(x) \rightarrow 0}$ as ${x \rightarrow a}$. Then if the limit on the right exists, we have $$\lim_{x \rightarrow a} \frac{f(x)}{g(x)} = \lim_{x \rightarrow a}\frac{f'(x)}{g'(x)}$$
Proof: Consider the function ${h(x) = f(x)g(b) - g(x)f(b)}$ for some ${b \neq a}$. Then ${h(a) = 0}$ (extending ${f}$ and ${g}$ continuously by ${f(a) = g(a) = 0}$), and ${h(b) = 0}$. So the MVT guarantees that there is a ${c}$ strictly between ${a}$ and ${b}$ such that ${f'(c)g(b) = g'(c)f(b)}$. Rearranging, and taking ${b \rightarrow a}$ (which forces ${c \rightarrow a}$) gives the result.

Aside: It is an interesting question to ask why we demand the limit on the right to exist. This has to do with the partial converse above, which doesn’t hold. The problem is that even as ${b \rightarrow a}$, there might be values of ${c}$ in ${[a,b]}$ whose derivatives fluctuate wildly from the partial secant lines. But if the right limit converges, this means the ${c}$ values don’t fluctuate too wildly, and it’s set. This is another case where using global properties to guarantee local values is dangerous, but effective. $\Box$

With l’Hopital’s rule, verifying one form of Taylor’s Theorem is very easy.

In particular, define the ${n}$th degree Taylor polynomial associated to the ${n}$ times differentiable function ${f}$, centered at ${a}$, to be $$P_n(x) = f(a) + f'(a)(x-a) + f''(a)\frac{(x-a)^2}{2!} + \cdots + f^{(n)}(a)\frac{(x-a)^n}{n!}.$$
Then we can check that ${P_n(x)}$ is “the best degree n approximator” of ${f(x)}$ by showing that $$\lim_{x \rightarrow a} \frac{f(x) - P_n(x)}{(x-a)^n} = 0,$$
or equivalently that ${f(x) = P_n(x) + E(x)(x-a)^n}$, where ${E(x) \rightarrow 0}$ as ${x \rightarrow a}$. This is one form of Taylor’s Theorem, with the weakness that it doesn’t give an explicit error term. (We’ll get back to that later). How do we show that? We might apply l’Hopital’s Rule ${n}$ times.
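As a numerical illustration of this form of Taylor’s Theorem, the sketch below assumes ${f = e^x}$, ${a = 0}$ and ${n = 3}$ (choices of mine, not from the talk) and watches the ratio ${(f(x) - P_n(x))/(x-a)^n}$ as ${x \rightarrow a}$.

```python
import math

def taylor_poly_exp(x, n):
    """Degree-n Taylor polynomial of exp centered at a = 0."""
    return sum(x**k / math.factorial(k) for k in range(n + 1))

n = 3
# E(x) = (f(x) - P_n(x)) / (x - a)^n for x approaching a = 0
ratios = [(math.exp(x) - taylor_poly_exp(x, n)) / x**n for x in (0.1, 0.01, 0.001)]
print(ratios)
```

The ratios shrink toward ${0}$, as claimed. Much smaller values of ${x}$ would eventually be swamped by floating-point cancellation, so this check is only meaningful at moderate scales.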

Some would find this sort of direction strange, as a different, common approach is to prove l’Hopital’s Rule from Taylor’s Theorem. And one way of proving Taylor’s Theorem is from a stronger Mean Value Theorem, called Cauchy’s Mean Value Theorem.

I must take an aside and mention that I have always disliked Cauchy’s Mean Value Theorem. It is not intuitive to me. It is very powerful, and can do everything of the Mean Value Theorem and more. But while the regular Mean Value Theorem is so incredibly clear and geometric, the Cauchy Mean Value Theorem is not. In some sense, it is the Mean Value Theorem as in Riemann-Stieltjes integration. I will mention something about this at the end, in the references.

Returning to the topic at hand.

If ${f}$ and ${g}$ are both differentiable on ${[a,b]}$ and ${g'(x) \neq 0}$ on ${(a,b)}$, then there exists a ${c \in (a,b)}$ such that $$\frac{f(b) - f(a)}{g(b) - g(a)} = \frac{f'(c)}{g'(c)}.$$

As is so often the case, the proof is to choose a clever function and apply the normal Mean Value Theorem. For us right now, the clever function is ${F(x) = f(x) - f(a) - \frac{f(b) - f(a)}{g(b) - g(a)}(g(x) - g(a))}$. Then as ${F(a) = F(b) = 0}$, the normal mean value theorem applies, and immediately gives the Cauchy Mean Value Theorem. $\Box$

Let us provide an alternative proof of Taylor’s Theorem. In order to apply Cauchy’s Mean Value Theorem, you must cleverly choose the functions ${f}$ and ${g}$. Sometimes, you must be very clever. This is beautiful, but this has been a talk about big ideas and intuition, and so in a sense this does not belong. But so be it.

Fix an ${x}$, and call ${F(t) = f(t) + f'(t)(x-t) + \cdots + f^{(n)}(t)\frac{(x-t)^n}{n!}}$, which is the degree ${n}$ Taylor polynomial centered at ${t}$ and evaluated at ${x}$, which we’re thinking of as a function of ${t}$ (a bit odd). Then note that ${F(x) = f(x)}$, and ${F(a) = P_n(x)}$, the degree ${n}$ Taylor polynomial centered at ${a}$ and evaluated at ${x}$. So ${F(x) - F(a)}$ is exactly the remainder term ${R_n(x)}$ from Taylor’s Theorem that we want to understand. Choose ${G(t) = (x-t)^{n+1}}$. Then ${G(x) = 0}$ and ${G(a) = (x-a)^{n+1}}$. Thus $$\frac{F(x) - F(a)}{G(x) - G(a)} = \frac{R_n(x)}{-(x-a)^{n+1}} = \frac{F'(c)}{G'(c)},$$
and writing it out and simplifying the numerator (which actually telescopes), yields a more precise form of the remainder of Taylor’s Theorem. This yields the so-called “Lagrange Form” of the remainder, $$R_n(x) = f^{(n+1)}(\xi)\frac{(x-a)^{n+1}}{(n+1)!}$$ for some ${\xi}$ between ${a}$ and ${x}$.
(I omitted the details – I hope you don’t mind).
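For readers who do mind, here is a sketch of the omitted computation, under the extra assumption (not stated above) that ${f}$ is ${n+1}$ times differentiable between ${a}$ and ${x}$. Differentiating ${F}$ term by term in ${t}$ with the product rule gives $$F'(t) = f'(t) + \sum_{k=1}^{n} \left[ f^{(k+1)}(t)\frac{(x-t)^k}{k!} - f^{(k)}(t)\frac{(x-t)^{k-1}}{(k-1)!} \right] = f^{(n+1)}(t)\frac{(x-t)^n}{n!},$$
since each subtracted term cancels the positive term just before it. Also ${G'(t) = -(n+1)(x-t)^n}$, so $$\frac{F'(c)}{G'(c)} = -\frac{f^{(n+1)}(c)}{(n+1)!}, \qquad R_n(x) = -(x-a)^{n+1}\,\frac{F'(c)}{G'(c)} = f^{(n+1)}(c)\frac{(x-a)^{n+1}}{(n+1)!}.$$
The point ${c}$ here plays the role of ${\xi}$ in the Lagrange form of the remainder.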

For that matter, we could immediately use Cauchy’s Mean Value Theorem to provide another proof of l’Hopital’s Rule. We’ll prove it in the ${0/0}$ case. Suppose that ${a}$ is the point such that ${f(a) = g(a) = 0}$. Then for all ${b}$ near ${a}$, Cauchy’s Mean Value Theorem says that $$\frac{f(b)}{g(b)} = \frac{f'(c)}{g'(c)},$$
where ${c}$ is a point between ${a}$ and ${b}$. As ${b \rightarrow a}$, the left hand side looks like ${\lim_{b \rightarrow a} \dfrac{f(b)}{g(b)}}$, and so if the limit exists on the right hand side, it is equal to the left. This is so direct, partially because we already chose the clever function in order to prove Cauchy’s Mean Value Theorem.
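A small numerical illustration, assuming the pair ${f(x) = \sin x}$ and ${g(x) = e^x - 1}$ at ${a = 0}$ (an example of my choosing, not from the talk): both ${f(b)/g(b)}$ and ${f'(b)/g'(b)}$ should approach the same limit, here ${1}$, as ${b \rightarrow 0}$.

```python
import math

f, df = math.sin, math.cos      # f(0) = 0
g, dg = math.expm1, math.exp    # g(x) = e^x - 1, so g(0) = 0 and g'(x) = e^x

points = (0.1, 0.01, 0.001)
for b in points:
    print(b, f(b) / g(b), df(b) / dg(b))

# Distance of f(b)/g(b) from the limit of f'/g', which is cos(0)/exp(0) = 1
gaps = [abs(f(b) / g(b) - 1.0) for b in points]
```

The two columns approach ${1}$ at different rates, which is consistent with the proof: ${f(b)/g(b)}$ equals ${f'(c)/g'(c)}$ at some intermediate ${c}$, not at ${b}$ itself.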

There is a little known generalization of the Mean Value Theorem to higher derivatives (Gowers’s Blog is the only reference the author has seen). A reference for the rest of this talk is a currently unpublished note of mine (which is in the process of being submitted to an expository journal), from which I largely steal this.

Similar to the ordinary Mean Value Theorem, we begin with a generalized Rolle’s Theorem:

Let ${f}$ be ${k}$ times differentiable on ${[a,b]}$, and suppose that ${f(a) = f'(a) = \cdots = f^{(k-1)}(a) = 0}$, and suppose also that ${f(b) = 0}$. Then there is some ${\xi \in (a,b)}$ such that ${f^{(k)}(\xi) = 0}$.

The proof is almost immediate from Rolle’s Theorem. The idea is to use Rolle’s Theorem iteratively, each time moving up in the order of the derivative. Since ${f(a) = f(b)}$, Rolle’s Theorem guarantees ${\xi_1 \in (a,b)}$ such that ${f'(\xi_1) = 0}$. We may apply Rolle’s Theorem again, as now ${f'(a) = f'(\xi_1) = 0}$, so there exists some ${\xi_2 \in (a,\xi_1) \subset (a,b)}$ such that ${f''(\xi_2) = 0}$. Performing this a total of ${k}$ times yields ${\xi = \xi_k \in (a,b)}$ such that ${f^{(k)}(\xi) = 0}$. This proves the theorem.

With Rolle’s Theorem, one proves the Mean Value Theorem for a general differentiable ${f}$ on ${[a,b]}$ by subtracting the linear polynomial ${g}$ satisfying ${g(a) = f(a)}$ and ${g(b) = f(b)}$. Then the function ${h(x) = f(x) - g(x)}$ is ${0}$ at both ${a}$ and ${b}$, so that Rolle’s Theorem guarantees ${\xi}$ such that ${h'(\xi) = f'(\xi) - g'(\xi) = 0}$, which is exactly the Mean Value Theorem.

We will follow the same idea. Take the degree ${k}$ polynomial ${g}$ satisfying ${g^{(n)}(a) = f^{(n)}(a)}$ for ${n = 0, \ldots, k-1}$, and also ${g(b) = f(b)}$. What is that polynomial?

The degree ${k-1}$ Taylor polynomial ${P_{k-1}}$ associated to ${f}$ and centered at ${a}$ satisfies ${P_{k-1}^{(n)}(a) = f^{(n)}(a)}$ for ${n = 0, \ldots, k-1}$, but not necessarily the last condition. So we want to add something to ${P_{k-1}}$ that doesn’t affect the value or the first ${k-1}$ derivatives at ${a}$, but which does affect the value at ${b}$. Notice that ${C(x) = c(x-a)^k}$ for a suitable constant ${c}$ has that its value and first ${k-1}$ derivatives at ${a}$ are ${0}$, and ${C(b) = c(b-a)^k}$. Rearranging ${P_{k-1}(b) + C(b) = f(b)}$, we find ${c}$: $$c = \frac{f(b) - P_{k-1}(b)}{(b-a)^k}.$$ Choose ${g(x) = P_{k-1}(x) + C(x)}$. Now ${h(x) = f(x) - g(x)}$ satisfies the conditions of the generalized Rolle’s Theorem above, so there is some ${\xi \in (a,b)}$ such that ${h^{(k)}(\xi) = 0}$, or equivalently ${f^{(k)}(\xi) = g^{(k)}(\xi)}$. This proves a general order Mean Value Theorem.

But we can say more. As ${g}$ is a degree ${k}$ polynomial, its ${k}$th derivative is constant, independent of ${\xi}$. It is given by $$g^{(k)}(x) = \frac{k!(f(b) - P_{k-1}(b))}{(b-a)^k}.$$
Rearranging ${f^{(k)}(\xi) = g^{(k)}(\xi)}$ yields $$f(b) = P_{k-1}(b) + f^{(k)}(\xi)\frac{(b-a)^k}{k!}, \tag{1}$$
which matches the Lagrange form of the remainder in Taylor’s Theorem.

So just as the Mean Value Theorem is the first case of Taylor’s Theorem, the general form of Taylor’s Theorem is a general order Mean Value Theorem.
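To see equation (1) with concrete numbers, the sketch below assumes ${f = e^x}$ with ${a = 0}$ and ${b = 1}$ (my choice, convenient because every derivative of ${e^x}$ is ${e^x}$, so ${\xi}$ can be solved for with a logarithm). The function name `higher_order_mvt_point` is hypothetical.

```python
import math

def higher_order_mvt_point(k, b=1.0):
    """For f = exp and a = 0, solve f^(k)(xi) = k! (f(b) - P_{k-1}(b)) / b^k for xi."""
    p = sum(b**n / math.factorial(n) for n in range(k))      # P_{k-1}(b)
    target = math.factorial(k) * (math.exp(b) - p) / b**k    # the constant g^(k)
    return math.log(target)                                  # exp^(k) = exp, so invert with log

for k in (1, 2, 3, 4):
    print(k, higher_order_mvt_point(k))
```

For ${k = 1}$ this is the ordinary Mean Value Theorem point, and for each larger ${k}$ the computed ${\xi}$ still lands inside ${(0,1)}$, as the theorem promises.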


### 2 Responses to Notes from a talk on the Mean Value Theorem

1. Tim McGrath says:

As much as I appreciate the connections and the insights, it would take me weeks or months to fully grasp the concepts. A lot of calculus requires a great deal of effort. Granted, it was worth the investment to learn the expansion of sin (x), but other curious items in math, such as Pascal’s Triangle, Euler’s Identity, and the Basel Problem are just as mysterious and intriguing but far more accessible to the layman,

2. Deepak Suwalka says:

Thanks, It’s a good blog. I like the way you have described the mean value theorem. it’s really appreciable. but if you proved some examples and problems than it will be a great post.