10.1 Discretisation: turning calculus into arithmetic

Every numerical algorithm in this chapter rests on the same trick: replace continuous calculus — limits, integrals, derivatives — with arithmetic on a discrete grid of values. The trick has consequences. Approximating $f'(x)$ from a finite number of nearby samples introduces an error. The size of that error depends on the spacing $h$ and the order of the approximation; the choice of approximation pattern controls how rapidly the error shrinks as $h$ does. This first lesson sets up the vocabulary — truncation error, order of accuracy, roundoff — that recurs in every subsequent lesson.

Finite-difference approximations

The derivative of a smooth function $f$ at $x$ is the limit

f'(x) \;=\; \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}.

To compute it numerically, you have to stop taking the limit at some finite $h$. The expression

D_+ f(x; h) \;\equiv\; \frac{f(x + h) - f(x)}{h}

is the forward difference approximation. Three obvious questions: how good is it? How does its error scale with $h$? And what other choices are there?

A Taylor expansion gives the answer at once. Write

f(x + h) \;=\; f(x) + h\, f'(x) + \tfrac12 h^2 f''(x) + \tfrac16 h^3 f'''(x) + \cdots,

then subtract $f(x)$ from both sides and divide by $h$:

\frac{f(x + h) - f(x)}{h} \;=\; f'(x) + \tfrac12 h\, f''(x) + \tfrac16 h^2 f'''(x) + \cdots.

The forward difference equals the true derivative plus an error whose leading term, $\tfrac12 h\, f''(x)$, is of order $h$. For small $h$, doubling $h$ doubles the error. We call this a first-order approximation, and we write the error as $\mathcal{O}(h)$.
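The first-order claim is easy to check numerically. The sketch below is illustrative only: the helper names `forward_difference` and `forward_errors` are made up for this example, and the test function $\sin$ at $x = 1$ is an arbitrary choice. It halves $h$ repeatedly and prints the ratio of successive errors, which should settle near 2 for a first-order scheme.

```python
import math

def forward_difference(f, x, h):
    """Forward difference D_+ f(x; h) = (f(x + h) - f(x)) / h."""
    return (f(x + h) - f(x)) / h

def forward_errors(f, df, x, steps):
    """Absolute error of the forward difference at x for each step size."""
    return [abs(forward_difference(f, x, h) - df(x)) for h in steps]

# Assumed test case: f = sin, exact derivative cos, evaluated at x = 1.
steps = [0.1, 0.05, 0.025, 0.0125]
errors = forward_errors(math.sin, math.cos, 1.0, steps)

# Ratio of the error at h to the error at h/2; first order means about 2.
ratios = [errors[i] / errors[i + 1] for i in range(len(errors) - 1)]

for h, err in zip(steps, errors):
    print(f"h = {h:<7} error = {err:.3e}")
print("error ratios on halving h:", [round(r, 3) for r in ratios])
```

The ratios are not exactly 2 because the $\tfrac16 h^2 f'''(x)$ term still contributes at these step sizes; they approach 2 as $h$ shrinks.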

A symmetric variant, the centred difference, improves on this in a striking way:

D_c f(x; h) \;\equiv\; \frac{f(x + h) - f(x - h)}{2 h}.

Taylor-expanding both terms and subtracting cancels every even-order term, including the $f''(x)$ term that dominated the forward-difference error. The leading error is now $\tfrac{1}{6} h^2 f'''(x)$, i.e. $\mathcal{O}(h^2)$. Halving $h$ now divides the error by four. This is a second-order approximation, and for smooth $f$ it is typically far more accurate than the first-order formula at the same step size.
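Spelled out, the cancellation goes as follows. This is a sketch of the standard derivation, keeping terms up to $h^4$ and using the same expansion as for the forward difference:

```latex
f(x \pm h) \;=\; f(x) \pm h\, f'(x) + \tfrac12 h^2 f''(x) \pm \tfrac16 h^3 f'''(x) + \tfrac1{24} h^4 f''''(x) \pm \cdots

f(x + h) - f(x - h) \;=\; 2 h\, f'(x) + \tfrac13 h^3 f'''(x) + \mathcal{O}(h^5)

D_c f(x; h) \;=\; \frac{f(x + h) - f(x - h)}{2 h} \;=\; f'(x) + \tfrac16 h^2 f'''(x) + \mathcal{O}(h^4)
```

The terms in $f(x)$, $f''(x)$ and $f''''(x)$ carry the same sign in both expansions, so they vanish in the difference.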

There are higher-order schemes (the 5-point stencil $\frac{1}{12 h}[-f(x + 2h) + 8 f(x + h) - 8 f(x - h) + f(x - 2h)]$ approximates $f'(x)$ to $\mathcal{O}(h^4)$), one-sided variants for boundaries, and approximations to second derivatives like

f''(x) \;\approx\; \frac{f(x + h) - 2 f(x) + f(x - h)}{h^2},

which is $\mathcal{O}(h^2)$. The wave-equation simulations in 10.3 use exactly this second-order centred difference in space, paired with a similar one in time.
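The quoted orders can be estimated from two evaluations each. The following is a minimal sketch, not a library routine: the function names are invented for illustration, and $\sin$ at $x = 1$ with $h = 0.1$ is an assumed test case. If the error behaves like $C h^p$, then comparing the errors at $h$ and $h/2$ gives $p \approx \log_2(e_h / e_{h/2})$.

```python
import math

def centred_difference(f, x, h):
    """Second-order approximation to f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

def five_point_difference(f, x, h):
    """Fourth-order 5-point stencil for f'(x)."""
    return (-f(x + 2 * h) + 8 * f(x + h) - 8 * f(x - h) + f(x - 2 * h)) / (12 * h)

def second_difference(f, x, h):
    """Second-order centred approximation to f''(x)."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / h**2

def observed_order(approx, exact, h):
    """Estimate p in error ~ C h^p from the errors at h and h / 2."""
    e_coarse = abs(approx(h) - exact)
    e_fine = abs(approx(h / 2) - exact)
    return math.log2(e_coarse / e_fine)

# Assumed test case: f = sin, so f' = cos and f'' = -sin.
x, h = 1.0, 0.1
orders = {
    "centred": observed_order(
        lambda s: centred_difference(math.sin, x, s), math.cos(x), h),
    "five-point": observed_order(
        lambda s: five_point_difference(math.sin, x, s), math.cos(x), h),
    "second derivative": observed_order(
        lambda s: second_difference(math.sin, x, s), -math.sin(x), h),
}

for name, p in orders.items():
    print(f"{name:18s} observed order = {p:.2f}")
```

The step sizes here are large enough that roundoff is negligible, so the estimates should sit close to 2, 4 and 2 respectively.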

Truncation error, in pictures

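[Interactive: absolute error versus step size $h$ on log–log axes for the forward, backward and centred differences, with controls for the function, the evaluation point $x$ and the step size $h$.]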

The log–log plot reveals the asymptotic convergence rates. The forward and backward differences are $\mathcal{O}(h)$: straight lines of slope 1. The centred difference is $\mathcal{O}(h^2)$: slope 2, a much faster decay. Together they show why higher-order methods are nearly always worth the extra work. But notice what happens at very small $h$: the errors stop decreasing and start rising again. That is floating-point cancellation: the numerator shrinks towards the rounding noise in $f$, and limited machine precision (about 16 digits in double precision) leaves little signal. The minimum error sits near $h \approx \sqrt{\varepsilon} \approx 1.5 \times 10^{-8}$ for the first-order methods, and near $h \approx \varepsilon^{1/3} \approx 6 \times 10^{-6}$ for the centred difference.

The interactive shows absolute error versus step size $h$ on a log–log plot for three schemes. The slopes encode the orders directly: forward and backward differences have slope 1 (every decade of $h$ shrinks the error by a factor of 10), and the centred difference has slope 2 (every decade of $h$ shrinks the error by a factor of 100). The behaviour at very small $h$ is the subject of the next section.
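The slopes can be recovered without the plot. The sketch below is an assumption-laden stand-in for the interactive, not its actual code: it uses $f = \sin$ at $x = 1$, restricts $h$ to three decades where truncation error dominates, and fits a least-squares line to $\log_{10}(\text{error})$ against $\log_{10} h$. All function names are illustrative.

```python
import math

def forward(f, x, h):
    return (f(x + h) - f(x)) / h

def backward(f, x, h):
    return (f(x) - f(x - h)) / h

def centred(f, x, h):
    return (f(x + h) - f(x - h)) / (2 * h)

def loglog_slope(hs, errs):
    """Least-squares slope of log10(err) against log10(h)."""
    xs = [math.log10(h) for h in hs]
    ys = [math.log10(e) for e in errs]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    num = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    den = sum((a - mean_x) ** 2 for a in xs)
    return num / den

# Step sizes chosen large enough that roundoff is negligible.
hs = [10.0 ** (-k) for k in (1, 2, 3)]
x = 1.0

slopes = {}
for name, scheme in (("forward", forward), ("backward", backward), ("centred", centred)):
    errs = [abs(scheme(math.sin, x, h) - math.cos(x)) for h in hs]
    slopes[name] = loglog_slope(hs, errs)
    print(f"{name:8s} slope = {slopes[name]:.3f}")
```

Extending `hs` to much smaller values would pull the fitted slopes away from 1 and 2, because the fit would then include the roundoff-dominated part of each curve.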

Worked example: centred difference vs exact derivative for sin(x)

Approximate $f'(1)$ for $f(x) = \sin(x)$ using the centred difference with $h = 0.1$. Compare to the exact value $f'(1) = \cos(1) \approx 0.540302$.

D_c f(1;\, 0.1) = \frac{\sin(1.1) - \sin(0.9)}{2(0.1)} = \frac{0.891207 - 0.783327}{0.2} = \frac{0.107880}{0.2} = 0.53940.

Error: $|0.53940 - 0.54030| = 9.0 \times 10^{-4}$.

For comparison, the forward difference gives $D_+ = (\sin(1.1) - \sin(1))/0.1 = (0.891207 - 0.841471)/0.1 = 0.49736$, with error $4.3 \times 10^{-2}$, nearly 50 times worse. The centred difference’s $\mathcal{O}(h^2)$ scaling versus the forward difference’s $\mathcal{O}(h)$ is already dramatic at $h = 0.1$.
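The arithmetic in this example can be reproduced in a few lines. This is only a check of the numbers above using the standard `math` module; the variable names are chosen for this sketch.

```python
import math

x, h = 1.0, 0.1
exact = math.cos(x)

# Centred and forward differences for f = sin at x = 1 with h = 0.1.
centred = (math.sin(x + h) - math.sin(x - h)) / (2 * h)
forward = (math.sin(x + h) - math.sin(x)) / h

centred_error = abs(centred - exact)
forward_error = abs(forward - exact)

print(f"exact   = {exact:.6f}")
print(f"centred = {centred:.5f}, error = {centred_error:.1e}")
print(f"forward = {forward:.5f}, error = {forward_error:.1e}")
print(f"forward error / centred error = {forward_error / centred_error:.1f}")
```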

Roundoff: why small h is not always better

Smaller $h$ does not always mean smaller error. Look at the small-$h$ end of the interactive’s log–log plot: each curve eventually turns around and starts rising. This is roundoff error, and it is the part of numerical analysis you have to remember even when working at extremely high precision.

The issue: double-precision floats carry about 16 significant digits. If $f(x + h)$ and $f(x)$ agree to 14 digits because $h$ is small, the subtraction $f(x + h) - f(x)$ keeps only 2 significant digits. This is catastrophic cancellation: the absolute error in the numerator is roughly machine epsilon $\varepsilon \approx 10^{-16}$ times $|f|$, and dividing by a tiny $h$ amplifies it.

The total error is

\text{error} \;\sim\; \underbrace{C\, h}_{\text{truncation}} \;+\; \underbrace{\frac{\varepsilon\, |f|}{h}}_{\text{roundoff}}.

Minimising over $h$ (set the derivative $C - \varepsilon\, |f| / h^2$ to zero):

h_{\text{opt}} \;\sim\; \sqrt{\varepsilon\, |f| / C} \;\sim\; 10^{-8}.

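For the centred difference the truncation error is $\mathcal{O}(h^2)$, and the corresponding optimal step is $h_{\text{opt}} \sim \varepsilon^{1/3} \sim 10^{-5}$: a larger step, but one that gives a smaller minimum error (roughly $\varepsilon^{2/3} \sim 10^{-11}$ against $\sqrt{\varepsilon} \sim 10^{-8}$ for the forward difference).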

Shrinking $h$ below these optima only makes the error worse. The interactive’s “error stops decreasing” behaviour at small $h$ is exactly this.
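A direct scan over $h$ shows the turnaround. The sketch below assumes $f = \sin$ at $x = 1$ and samples $h$ in quarter decades from $10^{-1}$ down to $10^{-14}$; the helper `best_step` is a made-up name for this example. Because roundoff is noisy, the best step found in the scan lands only in the neighbourhood of the estimates $\sqrt{\varepsilon}$ and $\varepsilon^{1/3}$, not exactly on them.

```python
import math
import sys

def forward(f, x, h):
    return (f(x + h) - f(x)) / h

def centred(f, x, h):
    return (f(x + h) - f(x - h)) / (2 * h)

def best_step(scheme, f, df, x, exponents):
    """Return (h, error) for the h = 10**e in the scan with the smallest error."""
    best_h, best_err = None, math.inf
    for e in exponents:
        h = 10.0 ** e
        err = abs(scheme(f, x, h) - df(x))
        if err < best_err:
            best_h, best_err = h, err
    return best_h, best_err

eps = sys.float_info.epsilon                 # about 2.2e-16 in double precision
exponents = [-k / 4 for k in range(4, 57)]   # log10(h) from -1 down to -14

h_fwd, err_fwd = best_step(forward, math.sin, math.cos, 1.0, exponents)
h_cen, err_cen = best_step(centred, math.sin, math.cos, 1.0, exponents)

print(f"forward: best h = {h_fwd:.1e}, error = {err_fwd:.1e}, sqrt(eps) = {math.sqrt(eps):.1e}")
print(f"centred: best h = {h_cen:.1e}, error = {err_cen:.1e}, eps^(1/3) = {eps ** (1 / 3):.1e}")
```

Printing the error for every `h` in the scan, rather than only the minimum, reproduces the V-shaped curves of the log–log plot.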

Discretisation in space vs in time

Most simulations have both a spatial and a temporal step. The wave-equation finite-difference scheme uses $h$ for the grid spacing in $x$ and $\Delta t$ for the time step; the heat-equation scheme has the same two step sizes. Choosing them independently is dangerous because the two affect stability (whether the scheme blows up) as well as accuracy. The von Neumann stability analysis of 10.3 connects the two.
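A small experiment makes the danger concrete. The sketch below is not the scheme developed in 10.3; it assumes the simplest explicit discretisation of the heat equation $u_t = u_{xx}$ on $[0, 1]$ with zero boundary values (forward difference in time, centred second difference in space), for which the standard stability condition is $\Delta t / h^2 \le \tfrac12$. The function names and the grid of 20 intervals are illustrative choices.

```python
import math

def heat_step(u, r):
    """One explicit step for u_t = u_xx with fixed end values; r = dt / h**2."""
    new = u[:]
    for i in range(1, len(u) - 1):
        new[i] = u[i] + r * (u[i + 1] - 2 * u[i] + u[i - 1])
    return new

def final_amplitude(n_intervals, dt, n_steps=300):
    """Largest |u| after n_steps, starting from u(x, 0) = sin(pi x)."""
    h = 1.0 / n_intervals
    r = dt / h**2
    u = [math.sin(math.pi * i * h) for i in range(n_intervals + 1)]
    u[0] = u[-1] = 0.0
    for _ in range(n_steps):
        u = heat_step(u, r)
    return max(abs(v) for v in u)

h = 1.0 / 20
stable = final_amplitude(20, 0.4 * h**2)     # dt / h^2 = 0.4, inside the limit
unstable = final_amplitude(20, 0.6 * h**2)   # dt / h^2 = 0.6, outside the limit

print(f"dt = 0.4 h^2: max |u| = {stable:.3e}")
print(f"dt = 0.6 h^2: max |u| = {unstable:.3e}")
```

With the smaller time step the solution decays, as the true solution does. With the larger one, rounding noise in the shortest-wavelength grid mode is amplified at every step and eventually swamps the solution, even though both runs use the same spatial grid.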

What we use this for

Discretisation is the bottom layer of every numerical algorithm in the chapter:

ODE solvers in 10.2 replace $\dot x = f(t, x)$ with a recurrence on $x$ at sample times $t_n$ . The choice of recurrence corresponds to choosing how to approximate the derivative.
PDE simulators in 10.3 replace spatial and temporal derivatives with finite differences on a 2-D grid in $(x, t)$ .
Quadrature (numerical integration) is the same trick from the integral side: approximate $\int f(x)\, dx$ as a finite sum of $f$-values. The trapezoidal and Simpson rules are the integration analogues of the forward and centred differences.
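To make the list concrete, here is a minimal sketch of two of these uses. The forward Euler recurrence is what you get by replacing $\dot x$ with a forward difference; it is shown as one possible choice, not necessarily the solver developed in 10.2. The composite trapezoidal and Simpson rules are the standard textbook forms. The test problems ($\dot x = -x$ and $\int_0^\pi \sin x \, dx = 2$) and all function names are assumptions made for illustration.

```python
import math

def forward_euler(f, t0, x0, dt, n_steps):
    """Recurrence x_{n+1} = x_n + dt * f(t_n, x_n)."""
    t, x = t0, x0
    for _ in range(n_steps):
        x = x + dt * f(t, x)
        t = t + dt
    return x

def trapezoid(f, a, b, n):
    """Composite trapezoidal rule on n equal subintervals."""
    h = (b - a) / n
    interior = sum(f(a + i * h) for i in range(1, n))
    return h * (0.5 * (f(a) + f(b)) + interior)

def simpson(f, a, b, n):
    """Composite Simpson rule on n equal subintervals; n must be even."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    odd = sum(f(a + i * h) for i in range(1, n, 2))
    even = sum(f(a + i * h) for i in range(2, n, 2))
    return h / 3 * (f(a) + f(b) + 4 * odd + 2 * even)

# dx/dt = -x with x(0) = 1 has exact solution exp(-t); integrate to t = 1.
euler_error = abs(forward_euler(lambda t, x: -x, 0.0, 1.0, 0.01, 100) - math.exp(-1.0))

# The integral of sin over [0, pi] is exactly 2.
trap_error = abs(trapezoid(math.sin, 0.0, math.pi, 16) - 2.0)
simp_error = abs(simpson(math.sin, 0.0, math.pi, 16) - 2.0)

print(f"forward Euler error at t = 1: {euler_error:.2e}")
print(f"trapezoid error (n = 16):     {trap_error:.2e}")
print(f"Simpson error (n = 16):       {simp_error:.2e}")
```

As with the difference formulas, the higher-order rule gives a much smaller error from the same number of function values.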

The next lesson develops ODE solvers built from these primitives.