An intuitive introduction to calculus
This is a post written for my fall 2013 Math 100 class but largely intended for anyone with knowledge of what a function is and a desire to know what calculus is all about. Calculus is made out to be the pinnacle of the high school math curriculum, and correspondingly is thought to be very hard. But the difficulty is bloated, blown out of proportion. In fact, the ideas behind calculus are approachable and even intuitive if thought about in the right way.
Many people managed to stumble across the page before I’d finished all the graphics. I’m sorry, but they’re all done now! I was having trouble interpreting how WordPress was going to handle my gif files – it turns out that they automagically resize them if you don’t make them of the correct size, which makes them not display. It took me a bit to realize this. I’d like to mention that this actually started as a 90 minute talk I had with my wife over coffee, so perhaps an alternate title would be “Learning calculus in 2 hours over a cup of coffee.”
So read on if you would like to understand what calculus is, or if you’re looking for a refresher of the concepts from a first semester in calculus (like for Math 100 students at Brown), or if you’re looking for a bird’s eye view of AP Calc AB subject material.
1. An intuitive and semicomplete introduction to calculus
We will think of a function $ {f(\cdot)}$ as something that takes an input $ {x}$ and gives out another number, which we’ll denote by $ {f(x)}$. We know functions like $ {f(x) = x^2 + 1}$, which means that if I give in a number $ {x}$ then the function returns the number $ {f(x) = x^2 + 1}$. So if I put in $ {1}$, I get $ {1^2 + 1 = 2}$, i.e. $ {f(1) = 2}$. Primary and secondary school condition students too heavily to think of functions in terms of a formula or equation. The important thing to remember is that a function is really just something that gives an output when given an input, and if the same input is given later then the function spits out the same output. As an aside, I should mention that the most common problem I’ve seen in my teaching and tutoring is a fundamental misunderstanding of functions and their graphs.
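For readers who like to experiment, the "input in, output out" idea can be sketched in a few lines of Python. The function `f` is the example from the text; the lookup table `TABLE` and the function `t` are made up here purely to illustrate that a function does not need a formula.

```python
def f(x):
    """The post's example: f(x) = x^2 + 1."""
    return x**2 + 1

# A made-up table of input/output pairs; no formula is involved.
TABLE = {0: 4, 1: -1, 2: 4}

def t(x):
    """A function defined by lookup rather than by a formula."""
    return TABLE[x]

print(f(1))          # 2
print(f(1) == f(1))  # True: the same input always gives the same output
print(t(1))          # -1
```

Both `f` and `t` are functions in the sense used here: each input has exactly one output, whether or not a formula produces it.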
For a function that takes in and spits out numbers, we can associate a graph. A graph is a two-dimensional representation of our function, where by convention the input is put on the horizontal axis and the output is put on the vertical axis. Each axis is numbered, and in this way we can identify any point in the graph by its coordinates, i.e. its horizontal and vertical position. A graph of a function $ {f(x)}$ includes a point $ {(x,y)}$ if $ {y = f(x)}$.

Thus each point on the graph is really of the form $ {(x, f(x))}$. A large portion of algebra I and II is devoted to being able to draw graphs for a variety of functions. And if you think about it, graphs contain a huge amount of information. Graphing $ {f(x)= x^2 + 1}$ involves drawing an upwards-facing parabola, which really represents an infinite number of points. That’s pretty intense, but it’s not what I want to focus on here.
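As a small illustration, here is a Python sketch that lists a handful of the infinitely many points $ {(x, f(x))}$ on the graph of $ {f(x) = x^2 + 1}$. The particular inputs in `xs` are an arbitrary choice made for this example.

```python
def f(x):
    return x**2 + 1

# A few sample inputs; the real graph has a point for every x.
xs = [-2, -1, 0, 1, 2]

# Each point on the graph has the form (x, f(x)).
points = [(x, f(x)) for x in xs]
print(points)  # [(-2, 5), (-1, 2), (0, 1), (1, 2), (2, 5)]
```

Plotting these five points and connecting them smoothly already suggests the upwards-facing parabola.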
1.1. Generalizing slope – introducing the derivative
You might recall the idea of the ‘slope’ of a line. A line has a constant ratio of how much the $ {y}$ value changes for a specific change in $ {x}$, which we call the slope (people always seem to remember rise over run). In particular, if a line passes through the points $ {(x_1, y_1)}$ and $ {(x_2, y_2)}$, then its slope will be the vertical change $ {y_2 - y_1}$ divided by the horizontal change $ {x_2 - x_1}$, or $ {\dfrac{y_2 - y_1}{x_2 - x_1}}$.

So if the line is given by an equation $ {f(x) = \text{something}}$, then the slope from two inputs $ {x_1}$ and $ {x_2}$ is $ {\dfrac{f(x_2) - f(x_1)}{x_2 - x_1}}$. As an aside, for those that remember things like the ‘standard equation’ $ {y = mx + b}$ or ‘point-slope’ $ {(y - y_0) = m(x - x_0)}$ but who have never thought or been taught where these come from: the claim that lines are the curves of constant slope is saying that for any choice of $ {(x_1, y_1)}$ on the line, we expect $ {\dfrac{y_2 - y_1}{x_2 - x_1} = m}$, a constant, which I denote by $ {m}$ for no particularly good reason other than the fact that some textbook author long ago did such a thing. Since we’re allowing ourselves to choose any $ {(x_1, y_1)}$, we might drop the subscripts (since they usually mean a constant) and rearrange our equation to give $ {y_2 - y = m(x_2 - x)}$, which is what has been so unkindly drilled into students’ heads as the ‘point-slope form.’ This is why lines have a point-slope form, and a reason that it comes up so much is that it comes so naturally from the defining characteristic of a line, i.e. constant slope.
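The "constant slope" property is easy to check numerically. The sketch below uses a hypothetical line $ {g(x) = 3x - 2}$ (chosen only for this example) and a small helper, here called `slope`, that computes rise over run between two inputs.

```python
def slope(func, x1, x2):
    """Rise over run between the inputs x1 and x2 (requires x1 != x2)."""
    return (func(x2) - func(x1)) / (x2 - x1)

def g(x):
    """A hypothetical line with m = 3 and b = -2."""
    return 3 * x - 2

# No matter which two inputs we pick, a line gives the same slope.
print(slope(g, 0, 1))    # 3.0
print(slope(g, -4, 10))  # 3.0
print(slope(g, 2, 2.5))  # 3.0
```

Whichever pair of inputs is chosen, the answer is the constant $ {m = 3}$, which is exactly the defining characteristic of a line described above.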
But one cannot speak of the ‘slope’ of a parabola.

Intuitively, we look at our parabola $ {x^2 + 1}$ and see that the ‘slope,’ or an estimate of how much the function $ {f(x)}$ changes with a change in $ {x}$, seems to be changing depending on what $ {x}$ values we choose. (This should make sense – if it didn’t change, and had constant slope, then it would be a line). The first major goal of calculus is to come up with an idea of a ‘slope’ for non-linear functions. I should add that we already know a sort of ‘instantaneous rate of change’ of a nonlinear function. When we’re in a car and we’re driving somewhere, we’re usually speeding up or slowing down, and our pace isn’t usually linear. Yet our speedometer still manages to say how fast we’re going, which is an immediate rate of change. So if we had a function $ {p(t)}$ that gave us our position at a time $ {t}$, then the slope would give us our velocity (change in position per change in time) at a moment. So without knowing it, we’re familiar with a generalized slope already. Now in our parabola, we don’t expect a constant slope, so we want to associate a ‘slope’ to each input $ {x}$. In other words, we want to be able to understand how rapidly the function $ {f(x)}$ is changing at each $ {x}$, analogous to how the slope $ {m}$ of a line $ {g(x) = mx + b}$ tells us that if we change our input by an amount $ {h}$ then our output value will change by $ {mh}$.
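To see the changing ‘slope’ of the parabola concretely, here is a rough Python sketch that computes rise over run for $ {f(x) = x^2 + 1}$ over three neighbouring intervals. The helper name `slope` and the choice of intervals are assumptions made for this illustration.

```python
def f(x):
    return x**2 + 1

def slope(func, x1, x2):
    """Rise over run between the inputs x1 and x2 (requires x1 != x2)."""
    return (func(x2) - func(x1)) / (x2 - x1)

# Unlike a line, the parabola gives a different answer on each interval.
for x1, x2 in [(0, 1), (1, 2), (2, 3)]:
    print(x1, x2, slope(f, x1, x2))
```

The three values come out as 1.0, 3.0 and 5.0, so there is no single number that deserves to be called "the slope" of the parabola.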
How does calculus do that? The idea is to get closer and closer approximations. Suppose we want to find the ‘slope’ of our parabola at the point $ {x = 1}$. Let’s get an approximate answer. The slope of the line coming from inputs $ {x = 1}$ and $ {x = 2}$ is a (poor) approximation. In particular, since we’re working with $ {f(x) = x^2 + 1}$, we have that $ {f(2) = 5}$ and $ {f(1) = 2}$, so that the ‘approximate slope’ from $ {x = 1}$ and $ {x = 2}$ is $ {\frac{5 - 2}{2 - 1} = 3}$. But looking at the graph,

we see that it feels like this slope is too large. So let’s get closer. Suppose we use inputs $ {x = 1}$ and $ {x = 1.5}$. We get that the approximate slope is $ {\frac{3.25 - 2}{1.5 - 1} = 2.5}$. If we were to graph it, this would also feel too large. So we can keep choosing smaller and smaller changes, like using $ {x = 1}$ and $ {x = 1.1}$, or $ {x = 1}$ and $ {x = 1.01}$, and so on. This next graphic contains these approximations, with chosen points getting closer and closer to $ {1}$.

Let’s look a little closer at the values we’re getting for our slopes when we use $ {1}$ and $ {2, 1.5, 1.1, 1.01, 1.001}$ as our inputs. We get
$ \displaystyle \begin{array}{c|c} \text{second input} & \text{approx. slope} \\ \hline 2 & 3 \\ 1.5 & 2.5 \\ 1.1 & 2.1 \\ 1.01 & 2.01 \\ 1.001 & 2.001 \end{array} $
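The table above can be reproduced with a short Python sketch. It uses exact fractions from the standard library so that no floating-point rounding muddies the numbers; the helper name `secant_slope` is an assumption made for this example.

```python
from fractions import Fraction

def f(x):
    return x**2 + 1

def secant_slope(func, x1, x2):
    """Approximate slope: rise over run between the inputs x1 and x2."""
    return (func(x2) - func(x1)) / (x2 - x1)

# The second inputs from the table, written as strings so Fraction is exact.
second_inputs = ["2", "1.5", "1.1", "1.01", "1.001"]
slopes = [secant_slope(f, Fraction(1), Fraction(s)) for s in second_inputs]

for s, m in zip(second_inputs, slopes):
    print(s, float(m))
```

Adding further inputs such as `"1.0001"` to the list is an easy way to watch the approximate slopes continue to settle down.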
It looks like the approximate slopes are approaching $ {2}$. What if we plot the graph with a line of slope $ {2}$ going through the point $ {(1,2)}$?

It looks great! Let’s zoom in a whole lot.

That looks really close! In fact, what I’ve been allowing as the natural feeling slope, or local rate of change, is really the slope of the line tangent to the graph of our function at the point $ {(1, f(1))}$. In a calculus class, you’ll spend a bit of time making sense of what it means for the approximate slopes to ‘approach’ $ {2}$. This is called a ‘limit,’ and the details are not important to us right now. The important thing is that this lets us get an idea of a ‘slope’ at a point on a parabola. It’s not really a slope, because a parabola isn’t a line. So we’ve given it a different name: we call this ‘the derivative.’ So the derivative of $ {f(x) = x^2 + 1}$ at $ {x = 1}$ is $ {2}$, i.e. right around $ {x = 1}$ we expect a rate of change of $ {2}$, so that we expect $ {f(1 + h) - f(1) \approx 2h}$. If you think about it, we’re saying that we can approximate $ {f(x) = x^2 + 1}$ near the point $ {(1, 2)}$ by the line shown in the graph above: this line passes through $ {(1,2)}$ and its slope is $ {2}$, what we’re calling the slope of $ {f(x) = x^2 + 1}$ at $ {x = 1}$.
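For readers who want to see why the approximate slopes behave this way, the algebra can be done directly. Here $ {h}$ stands for the gap between the second input and $ {1}$, and we assume $ {h \neq 0}$ so that the division makes sense:

$ \displaystyle \frac{f(1 + h) - f(1)}{h} = \frac{\left((1 + h)^2 + 1\right) - 2}{h} = \frac{2h + h^2}{h} = 2 + h. $

So a gap of $ {h = 0.1}$ gives an approximate slope of $ {2.1}$, a gap of $ {h = 0.01}$ gives $ {2.01}$, and so on, matching the table. Multiplying back through by $ {h}$ gives $ {f(1 + h) - f(1) = 2h + h^2}$, so the approximation $ {2h}$ is off by only $ {h^2}$, which is tiny when $ {h}$ is small.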
Let’s generalize. We were able to speak of the derivative at one point, but how about other points? The rest of this post is below the ‘more’ tag.