Boltzmann 1872: Thermal Equilibrium of Gas Molecules

T.M. Jones

doi:10.5281/zenodo.xxxxxxx

Ludwig Boltzmann · 1872

Thermal Equilibrium of Gas Molecules

p. 299

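From these equations it can again be proved that

\htmlClass{sym sym-E}{E} = \htmlClass{sym sym-u}{u_1} \htmlClass{sym sym-log}{\log} \htmlClass{sym sym-u}{u_1} + \htmlClass{sym sym-sqrt}{\sqrt{2}}\,\htmlClass{sym sym-u}{u_2} \htmlClass{sym sym-log}{\log} \htmlClass{sym sym-u}{u_2} + \dots + \htmlClass{sym sym-sqrt}{\sqrt{p}}\,\htmlClass{sym sym-u}{u_p} \htmlClass{sym sym-log}{\log} \htmlClass{sym sym-u}{u_p}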

must always decrease [immer abnehmen muss], unless $\htmlClass{sym sym-u}{u_2^2} = \htmlClass{sym sym-u}{u_1} \htmlClass{sym sym-u}{u_3},\; \htmlClass{sym sym-u}{u_3^2} = \htmlClass{sym sym-u}{u_2} \htmlClass{sym sym-u}{u_4},\; \dots$ (that is, unless all the expressions multiplied by the coefficients $\htmlClass{sym sym-B}{B}$ in equations (35) vanish). Equations (35) have an inconvenient feature: while they can be written compactly with summation formulae, they resist being written out fully and explicitly. For clarity, we therefore begin with the simplest case, then move stepwise toward the general one.

First let $p = 3$; the molecules can have only three kinetic energies, $\htmlClass{sym sym-eps}{\epsilon}$, $2\htmlClass{sym sym-eps}{\epsilon}$, and $3\htmlClass{sym sym-eps}{\epsilon}$. Then the system of equations (35) reduces to the following three equations:

\begin{aligned} \htmlClass{sym sym-sqrt}{\sqrt{1}}\,\htmlClass{sym sym-deriv}{\frac{du_1}{dt}} &= \htmlClass{sym sym-B}{B_{11}^{22}}\,(\htmlClass{sym sym-u}{u_2^2} - \htmlClass{sym sym-u}{u_1} \htmlClass{sym sym-u}{u_3}) \\[4pt] \htmlClass{sym sym-sqrt}{\sqrt{2}}\,\htmlClass{sym sym-deriv}{\frac{du_2}{dt}} &= 2\htmlClass{sym sym-B}{B_{11}^{22}}\,(\htmlClass{sym sym-u}{u_1} \htmlClass{sym sym-u}{u_3} - \htmlClass{sym sym-u}{u_2^2}) \\[4pt] \htmlClass{sym sym-sqrt}{\sqrt{3}}\,\htmlClass{sym sym-deriv}{\frac{du_3}{dt}} &= \htmlClass{sym sym-B}{B_{11}^{22}}\,(\htmlClass{sym sym-u}{u_2^2} - \htmlClass{sym sym-u}{u_1} \htmlClass{sym sym-u}{u_3}) \end{aligned} \tag{36}

and the expression for $\htmlClass{sym sym-E}{E}$ becomes

\htmlClass{sym sym-E}{E} = \htmlClass{sym sym-u}{u_1} \htmlClass{sym sym-log}{\log} \htmlClass{sym sym-u}{u_1} + \htmlClass{sym sym-sqrt}{\sqrt{2}}\,\htmlClass{sym sym-u}{u_2} \htmlClass{sym sym-log}{\log} \htmlClass{sym sym-u}{u_2} + \htmlClass{sym sym-sqrt}{\sqrt{3}}\,\htmlClass{sym sym-u}{u_3} \htmlClass{sym sym-log}{\log} \htmlClass{sym sym-u}{u_3}

Differentiating gives

\htmlClass{sym sym-deriv}{\frac{dE}{dt}} = (\htmlClass{sym sym-log}{\log} \htmlClass{sym sym-u}{u_1} + 1)\htmlClass{sym sym-deriv}{\frac{du_1}{dt}} + \htmlClass{sym sym-sqrt}{\sqrt{2}}(\htmlClass{sym sym-log}{\log} \htmlClass{sym sym-u}{u_2} + 1)\htmlClass{sym sym-deriv}{\frac{du_2}{dt}} + \htmlClass{sym sym-sqrt}{\sqrt{3}}(\htmlClass{sym sym-log}{\log} \htmlClass{sym sym-u}{u_3} + 1)\htmlClass{sym sym-deriv}{\frac{du_3}{dt}}

p. 300

or, after rearranging,

\htmlClass{sym sym-deriv}{\frac{dE}{dt}} = \htmlClass{sym sym-log}{\log} \htmlClass{sym sym-u}{u_1} \htmlClass{sym sym-deriv}{\frac{du_1}{dt}} + \htmlClass{sym sym-sqrt}{\sqrt{2}}\htmlClass{sym sym-log}{\log} \htmlClass{sym sym-u}{u_2} \htmlClass{sym sym-deriv}{\frac{du_2}{dt}} + \htmlClass{sym sym-sqrt}{\sqrt{3}}\htmlClass{sym sym-log}{\log} \htmlClass{sym sym-u}{u_3} \htmlClass{sym sym-deriv}{\frac{du_3}{dt}} + \left( \htmlClass{sym sym-deriv}{\frac{du_1}{dt}} + \htmlClass{sym sym-sqrt}{\sqrt{2}}\htmlClass{sym sym-deriv}{\frac{du_2}{dt}} + \htmlClass{sym sym-sqrt}{\sqrt{3}}\htmlClass{sym sym-deriv}{\frac{du_3}{dt}} \right)

The sum in parentheses vanishes by equations (36). Thus $\htmlClass{sym sym-deriv}{dE/dt}$ is obtained by multiplying the first equation by $\htmlClass{sym sym-log}{\log} \htmlClass{sym sym-u}{u_1}$, the second by $\htmlClass{sym sym-log}{\log} \htmlClass{sym sym-u}{u_2}$, the third by $\htmlClass{sym sym-log}{\log} \htmlClass{sym sym-u}{u_3}$, then adding. This yields

\htmlClass{sym sym-deriv}{\frac{dE}{dt}} = \htmlClass{sym sym-B}{B_{11}^{22}}\,(\htmlClass{sym sym-u}{u_2^2} - \htmlClass{sym sym-u}{u_1}\htmlClass{sym sym-u}{u_3})\bigl(\htmlClass{sym sym-log}{\log} \htmlClass{sym sym-u}{u_1} + \htmlClass{sym sym-log}{\log} \htmlClass{sym sym-u}{u_3} - 2\htmlClass{sym sym-log}{\log} \htmlClass{sym sym-u}{u_2}\bigr)

\htmlClass{sym sym-deriv}{\frac{dE}{dt}} = \htmlClass{sym sym-B}{B_{11}^{22}}\,(\htmlClass{sym sym-u}{u_2^2} - \htmlClass{sym sym-u}{u_1} \htmlClass{sym sym-u}{u_3})\,\htmlClass{sym sym-log}{\log}\!\left(\frac{\htmlClass{sym sym-u}{u_1} \htmlClass{sym sym-u}{u_3}}{\htmlClass{sym sym-u}{u_2^2}}\right)

Consider the two factors multiplying $\htmlClass{sym sym-B}{B_{11}^{22}}$. When $\htmlClass{sym sym-u}{u_2^2} > \htmlClass{sym sym-u}{u_1}\htmlClass{sym sym-u}{u_3}$, the first factor is positive and the logarithm is negative. When $\htmlClass{sym sym-u}{u_2^2} < \htmlClass{sym sym-u}{u_1}\htmlClass{sym sym-u}{u_3}$, the first factor is negative and the logarithm is positive. In either case their product is always negative. Since $\htmlClass{sym sym-B}{B_{11}^{22}}$ is positive, $\htmlClass{sym sym-deriv}{dE/dt}$ is always negative or zero. Equality holds only when $\htmlClass{sym sym-u}{u_2^2} = \htmlClass{sym sym-u}{u_1}\htmlClass{sym sym-u}{u_3}$.

It is also easy to see that $\htmlClass{sym sym-E}{E}$ cannot run to minus infinity. None of $\htmlClass{sym sym-u}{u_1},\htmlClass{sym sym-u}{u_2},\htmlClass{sym sym-u}{u_3}$ may be negative or imaginary. For positive $u$, the quantity $u\htmlClass{sym sym-log}{\log} u$ cannot be smaller than $-1/e$. Hence $\htmlClass{sym sym-E}{E}$ cannot be smaller than

-\frac{1 + \htmlClass{sym sym-sqrt}{\sqrt{2}} + \htmlClass{sym sym-sqrt}{\sqrt{3}}}{e}

where $e$ is the base of natural logarithms. Therefore, since its derivative cannot be positive, $\htmlClass{sym sym-E}{E}$ must move steadily toward a minimum where $\htmlClass{sym sym-deriv}{dE/dt} = 0$, namely where $\htmlClass{sym sym-u}{u_2^2} = \htmlClass{sym sym-u}{u_1}\htmlClass{sym sym-u}{u_3}$.
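The $p = 3$ argument can be checked numerically. What follows is a minimal sketch, not part of Boltzmann's text: it assumes $B_{11}^{22} = 1$, a forward-Euler time step, and an arbitrary starting distribution. The rate terms are written so that $u_1 + \sqrt{2}u_2 + \sqrt{3}u_3$ is conserved, which is what the vanishing of the sum in parentheses requires.

```python
import math

SQRT = [math.sqrt(k) for k in (1, 2, 3)]  # weights sqrt(1), sqrt(2), sqrt(3)

def rhs(u, B=1.0):
    """du_k/dt for p = 3; B = 1 is an illustrative assumption."""
    u1, u2, u3 = u
    g = B * (u2 * u2 - u1 * u3)  # net rate of (2,2) -> (1,3) collisions
    # sqrt(1) du1/dt = g, sqrt(2) du2/dt = -2g, sqrt(3) du3/dt = g
    return [g / SQRT[0], -2.0 * g / SQRT[1], g / SQRT[2]]

def E(u):
    """E = sum over k of sqrt(k) * u_k * log(u_k)."""
    return sum(s * x * math.log(x) for s, x in zip(SQRT, u))

def simulate(u, dt=1e-3, steps=20000):
    """Forward-Euler integration; returns the final state and the history of E."""
    history = [E(u)]
    for _ in range(steps):
        du = rhs(u)
        u = [x + dt * d for x, d in zip(u, du)]
        history.append(E(u))
    return u, history

u_start = [0.2, 0.9, 0.1]  # arbitrary positive starting values
u_final, E_hist = simulate(u_start)
print("u2^2 - u1*u3 at the end:", u_final[1] ** 2 - u_final[0] * u_final[2])
print("E went from", E_hist[0], "to", E_hist[-1])
```

With these settings the recorded values of $E$ never increase (up to rounding), and the run ends where $u_2^2 = u_1 u_3$ to numerical precision.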

The same proof does not carry over unchanged when $p > 3$. Here I treat only the case $p = 4$.

p. 301

\begin{aligned} \htmlClass{sym sym-sqrt}{\sqrt{1}}\htmlClass{sym sym-deriv}{\frac{du_1}{dt}} &= \htmlClass{sym sym-B}{B_{11}^{22}}(\htmlClass{sym sym-u}{u_2^2} - \htmlClass{sym sym-u}{u_1}\htmlClass{sym sym-u}{u_3}) + \htmlClass{sym sym-B}{B_{11}^{33}}(\htmlClass{sym sym-u}{u_3^2} - \htmlClass{sym sym-u}{u_1}\htmlClass{sym sym-u}{u_4}) \\[4pt] \htmlClass{sym sym-sqrt}{\sqrt{2}}\htmlClass{sym sym-deriv}{\frac{du_2}{dt}} &= \htmlClass{sym sym-B}{B_{11}^{22}}(\htmlClass{sym sym-u}{u_1}\htmlClass{sym sym-u}{u_3} - \htmlClass{sym sym-u}{u_2^2}) + (\htmlClass{sym sym-B}{B_{12}^{23}}+\htmlClass{sym sym-B}{B_{11}^{24}})(\htmlClass{sym sym-u}{u_1}\htmlClass{sym sym-u}{u_4} - \htmlClass{sym sym-u}{u_2}\htmlClass{sym sym-u}{u_3}) \\[4pt] \htmlClass{sym sym-sqrt}{\sqrt{3}}\htmlClass{sym sym-deriv}{\frac{du_3}{dt}} &= \htmlClass{sym sym-B}{B_{11}^{33}}(\htmlClass{sym sym-u}{u_1}\htmlClass{sym sym-u}{u_4} - \htmlClass{sym sym-u}{u_3^2}) + (\htmlClass{sym sym-B}{B_{12}^{23}}+\htmlClass{sym sym-B}{B_{11}^{24}})(\htmlClass{sym sym-u}{u_2}\htmlClass{sym sym-u}{u_3} - \htmlClass{sym sym-u}{u_1}\htmlClass{sym sym-u}{u_4}) + 2\htmlClass{sym sym-B}{B_{22}^{44}}(\htmlClass{sym sym-u}{u_4^2} - \htmlClass{sym sym-u}{u_2^{2\,?}}) \\[4pt] \htmlClass{sym sym-sqrt}{\sqrt{4}}\htmlClass{sym sym-deriv}{\frac{du_4}{dt}} &= (\htmlClass{sym sym-B}{B_{12}^{23}}+\htmlClass{sym sym-B}{B_{11}^{24}})(\htmlClass{sym sym-u}{u_2}\htmlClass{sym sym-u}{u_3} - \htmlClass{sym sym-u}{u_1}\htmlClass{sym sym-u}{u_4}) + \htmlClass{sym sym-B}{B_{11}^{33}}(\htmlClass{sym sym-u}{u_1}\htmlClass{sym sym-u}{u_4} - \htmlClass{sym sym-u}{u_2^{2\,?}}) \end{aligned} \tag{37}

The "?" marks are not typos — they appear in Boltzmann's original 1872 paper. He is working out notation on the fly, and the terms are ambiguous even in the source. We preserve them as historical artifacts.

For $\htmlClass{sym sym-E}{E}$ one finds

\htmlClass{sym sym-E}{E} = \htmlClass{sym sym-u}{u_1}\htmlClass{sym sym-log}{\log} \htmlClass{sym sym-u}{u_1} + \htmlClass{sym sym-sqrt}{\sqrt{2}}\htmlClass{sym sym-u}{u_2}\htmlClass{sym sym-log}{\log} \htmlClass{sym sym-u}{u_2} + \htmlClass{sym sym-sqrt}{\sqrt{3}}\htmlClass{sym sym-u}{u_3}\htmlClass{sym sym-log}{\log} \htmlClass{sym sym-u}{u_3} + \htmlClass{sym sym-sqrt}{\sqrt{4}}\htmlClass{sym sym-u}{u_4}\htmlClass{sym sym-log}{\log} \htmlClass{sym sym-u}{u_4}

\htmlClass{sym sym-deriv}{\frac{dE}{dt}} = \htmlClass{sym sym-log}{\log} \htmlClass{sym sym-u}{u_1}\htmlClass{sym sym-deriv}{\frac{du_1}{dt}} + \htmlClass{sym sym-sqrt}{\sqrt{2}}\htmlClass{sym sym-log}{\log} \htmlClass{sym sym-u}{u_2}\htmlClass{sym sym-deriv}{\frac{du_2}{dt}} + \htmlClass{sym sym-sqrt}{\sqrt{3}}\htmlClass{sym sym-log}{\log} \htmlClass{sym sym-u}{u_3}\htmlClass{sym sym-deriv}{\frac{du_3}{dt}} + \htmlClass{sym sym-sqrt}{\sqrt{4}}\htmlClass{sym sym-log}{\log} \htmlClass{sym sym-u}{u_4}\htmlClass{sym sym-deriv}{\frac{du_4}{dt}}

If one substitutes here for the derivatives their values from equations (37), one obtains, with a suitable rearrangement of terms,

\begin{aligned} \htmlClass{sym sym-deriv}{\frac{dE}{dt}} = \htmlClass{sym sym-B}{B_{11}^{22}}(\htmlClass{sym sym-u}{u_2^2}-\htmlClass{sym sym-u}{u_1}\htmlClass{sym sym-u}{u_3})\htmlClass{sym sym-log}{\log}\frac{\htmlClass{sym sym-u}{u_1}\htmlClass{sym sym-u}{u_3}}{\htmlClass{sym sym-u}{u_2^2}} &+ \htmlClass{sym sym-B}{B_{11}^{33}}(\htmlClass{sym sym-u}{u_3^2}-\htmlClass{sym sym-u}{u_2}\htmlClass{sym sym-u}{u_4})\htmlClass{sym sym-log}{\log}\frac{\htmlClass{sym sym-u}{u_2}\htmlClass{sym sym-u}{u_4}}{\htmlClass{sym sym-u}{u_3^2}} \\[4pt] &+ (\htmlClass{sym sym-B}{B_{12}^{23}}+\htmlClass{sym sym-B}{B_{11}^{24}})(\htmlClass{sym sym-u}{u_1}\htmlClass{sym sym-u}{u_4} - \htmlClass{sym sym-u}{u_2}\htmlClass{sym sym-u}{u_3})\htmlClass{sym sym-log}{\log}\frac{\htmlClass{sym sym-u}{u_2}\htmlClass{sym sym-u}{u_3}}{\htmlClass{sym sym-u}{u_1}\htmlClass{sym sym-u}{u_4}} \end{aligned}

p. 302

I remark that the change in the order of the summands, which is necessary here, is analogous to our previous transformation of definite integrals. From the above expression one sees at once that $\htmlClass{sym sym-deriv}{dE/dt}$ is again necessarily negative, unless simultaneously we have

\htmlClass{sym sym-u}{u_2^2} = \htmlClass{sym sym-u}{u_1}\htmlClass{sym sym-u}{u_3},\qquad \htmlClass{sym sym-u}{u_3^2} = \htmlClass{sym sym-u}{u_2}\htmlClass{sym sym-u}{u_4},\qquad \htmlClass{sym sym-u}{u_2}\htmlClass{sym sym-u}{u_3} = \htmlClass{sym sym-u}{u_1}\htmlClass{sym sym-u}{u_4}

which can also be written

\htmlClass{sym sym-u}{u_3} = \frac{\htmlClass{sym sym-u}{u_2^2}}{\htmlClass{sym sym-u}{u_1}},\qquad \htmlClass{sym sym-u}{u_4} = \frac{\htmlClass{sym sym-u}{u_2^3}}{\htmlClass{sym sym-u}{u_1^2}}.

Likewise one finds in the general case that $\htmlClass{sym sym-deriv}{dE/dt}$ is necessarily negative so that $\htmlClass{sym sym-E}{E}$ must decrease unless

\htmlClass{sym sym-u}{u_3} = \frac{\htmlClass{sym sym-u}{u_2^2}}{\htmlClass{sym sym-u}{u_1}},\quad \htmlClass{sym sym-u}{u_4} = \frac{\htmlClass{sym sym-u}{u_2^3}}{\htmlClass{sym sym-u}{u_1^2}},\quad\ldots\tag{38}

Since $\htmlClass{sym sym-E}{E}$ cannot have a larger negative value than

-\frac{1+\htmlClass{sym sym-sqrt}{\sqrt{2}}+\htmlClass{sym sym-sqrt}{\sqrt{3}}+\dots+\htmlClass{sym sym-sqrt}{\sqrt{p}}}{e}\tag{39}

it must necessarily approach a minimum value for which equations (38) hold. Thus it continually approaches the distribution of states determined by equations (38).

We now have to prove that equations (38) uniquely determine the distribution of states. If we add together all the equations (35), we obtain

\frac{d}{dt}\bigl(\htmlClass{sym sym-u}{u_1} + \htmlClass{sym sym-sqrt}{\sqrt{2}}\htmlClass{sym sym-u}{u_2} + \htmlClass{sym sym-sqrt}{\sqrt{3}}\htmlClass{sym sym-u}{u_3} + \dots + \htmlClass{sym sym-sqrt}{\sqrt{p}}\htmlClass{sym sym-u}{u_p}\bigr) = 0

hence

\htmlClass{sym sym-u}{u_1} + \htmlClass{sym sym-sqrt}{\sqrt{2}}\htmlClass{sym sym-u}{u_2} + \htmlClass{sym sym-sqrt}{\sqrt{3}}\htmlClass{sym sym-u}{u_3} + \dots + \htmlClass{sym sym-sqrt}{\sqrt{p}}\htmlClass{sym sym-u}{u_p} = \htmlClass{sym sym-a}{a} \tag{40}

p. 303

In a similar way we find that

\htmlClass{sym sym-u}{u_1} + 2\htmlClass{sym sym-sqrt}{\sqrt{2}}\htmlClass{sym sym-u}{u_2} + 3\htmlClass{sym sym-sqrt}{\sqrt{3}}\htmlClass{sym sym-u}{u_3} + \dots + p\htmlClass{sym sym-sqrt}{\sqrt{p}}\htmlClass{sym sym-u}{u_p} = \frac{\htmlClass{sym sym-b}{b}}{\htmlClass{sym sym-eps}{\epsilon}} \tag{41}

where $\htmlClass{sym sym-a}{a}$ and $\htmlClass{sym sym-b}{b}$ are constants. The meaning of these equations is obvious. In particular, $\htmlClass{sym sym-u}{u_1} + \htmlClass{sym sym-sqrt}{\sqrt{2}}\htmlClass{sym sym-u}{u_2} + \htmlClass{sym sym-sqrt}{\sqrt{3}}\htmlClass{sym sym-u}{u_3} + \dots = \htmlClass{sym sym-a}{a}$ is the total number of molecules in unit volume, while $\htmlClass{sym sym-b}{b}$ is their total kinetic energy. Equations (40) and (41) therefore tell us that these two quantities are constant.

Suppose that the two quantities $\htmlClass{sym sym-a}{a}$ and $\htmlClass{sym sym-b}{b}$ are given. Then we set the quotient $\htmlClass{sym sym-u}{u_2}/\htmlClass{sym sym-u}{u_1}$ equal to $\htmlClass{sym sym-y}{y}$. Equations (38) then reduce to

\htmlClass{sym sym-u}{u_3} = \htmlClass{sym sym-y}{y}^2 \htmlClass{sym sym-u}{u_1},\quad \htmlClass{sym sym-u}{u_4} = \htmlClass{sym sym-y}{y}^3 \htmlClass{sym sym-u}{u_1},\quad\ldots,\quad \htmlClass{sym sym-u}{u_p} = \htmlClass{sym sym-y}{y}^{p-1}\htmlClass{sym sym-u}{u_1}

If one substitutes these values, together with $\htmlClass{sym sym-u}{u_2} = \htmlClass{sym sym-y}{y}\htmlClass{sym sym-u}{u_1}$, into equations (40) and (41) and eliminates $\htmlClass{sym sym-u}{u_1}$, one easily finds:

\begin{aligned} 0 &=\; \bigl(p\htmlClass{sym sym-a}{a} - \frac{\htmlClass{sym sym-b}{b}}{\htmlClass{sym sym-eps}{\epsilon}}\bigr)\htmlClass{sym sym-sqrt}{\sqrt{p}}\,\htmlClass{sym sym-y}{y}^{p-1} + \bigl((p-1)\htmlClass{sym sym-a}{a} - \frac{\htmlClass{sym sym-b}{b}}{\htmlClass{sym sym-eps}{\epsilon}}\bigr)\htmlClass{sym sym-sqrt}{\sqrt{p-1}}\,\htmlClass{sym sym-y}{y}^{p-2} + \dots \\ &\quad + \bigl(3\htmlClass{sym sym-a}{a}-\frac{\htmlClass{sym sym-b}{b}}{\htmlClass{sym sym-eps}{\epsilon}}\bigr)\htmlClass{sym sym-sqrt}{\sqrt{3}}\,\htmlClass{sym sym-y}{y}^2 + \bigl(2\htmlClass{sym sym-a}{a}-\frac{\htmlClass{sym sym-b}{b}}{\htmlClass{sym sym-eps}{\epsilon}}\bigr)\htmlClass{sym sym-sqrt}{\sqrt{2}}\,\htmlClass{sym sym-y}{y} + \bigl(\htmlClass{sym sym-a}{a}-\frac{\htmlClass{sym sym-b}{b}}{\htmlClass{sym sym-eps}{\epsilon}}\bigr) \end{aligned} \tag{42}

Since all the $\htmlClass{sym sym-u}{u}$'s are necessarily positive, we see immediately that $\frac{\htmlClass{sym sym-b}{b}}{\htmlClass{sym sym-eps}{\epsilon}} - \htmlClass{sym sym-a}{a}$ must be positive while $\frac{\htmlClass{sym sym-b}{b}}{\htmlClass{sym sym-eps}{\epsilon}} - p\htmlClass{sym sym-a}{a}$ must be negative. Hence $\htmlClass{sym sym-b}{b}$ must lie between $\htmlClass{sym sym-eps}{\epsilon} \htmlClass{sym sym-a}{a}$ and $p\htmlClass{sym sym-eps}{\epsilon} \htmlClass{sym sym-a}{a}$. Hence, in equation (42) the coefficient of $\htmlClass{sym sym-y}{y}^{p-1}$ is positive, while the term independent of $\htmlClass{sym sym-y}{y}$ must be negative. The polynomial is therefore positive for $\htmlClass{sym sym-y}{y} = \infty$, and negative for $\htmlClass{sym sym-y}{y} = 0$; therefore there is one and only one positive root for $\htmlClass{sym sym-y}{y}$, since the series of coefficients changes sign only once. Negative or imaginary values for $\htmlClass{sym sym-y}{y}$ are of course meaningless. But from $\htmlClass{sym sym-y}{y}$ we can determine uniquely all the $\htmlClass{sym sym-u}{u}$'s and also all the $\htmlClass{sym sym-w}{w}$'s. Hence, whatever may be the initial distribution of states, there is one and only one distribution which it approaches with increasing time. This distribution depends only on the constants $\htmlClass{sym sym-a}{a}$ and $\htmlClass{sym sym-b}{b}$, the total number and total kinetic energy of the molecules (density and temperature of the gas).
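The uniqueness of the positive root can be illustrated numerically. The sketch below is an illustration under stated assumptions, not Boltzmann's procedure: substituting $u_k = y^{k-1}u_1$ into (40) and (41) and eliminating $u_1$ gives $\sum_{k=1}^{p} (k a - b/\epsilon)\sqrt{k}\,y^{k-1} = 0$, and the code finds the positive root of that sum by bisection. The function names and the bisection tolerance are hypothetical choices.

```python
import math

def equilibrium_polynomial(y, a, b_over_eps, p):
    """sum over k of (k*a - b/eps) * sqrt(k) * y**(k-1), for k = 1..p."""
    return sum((k * a - b_over_eps) * math.sqrt(k) * y ** (k - 1)
               for k in range(1, p + 1))

def positive_root(a, b_over_eps, p, tol=1e-12):
    """Bisection for the single positive root; needs a < b/eps < p*a."""
    if not (a < b_over_eps < p * a):
        raise ValueError("b/eps must lie strictly between a and p*a")
    lo, hi = 0.0, 1.0
    # The polynomial is negative at y = 0; widen the bracket until it turns positive.
    while equilibrium_polynomial(hi, a, b_over_eps, p) <= 0.0:
        hi *= 2.0
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if equilibrium_polynomial(mid, a, b_over_eps, p) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

# Round trip: build a and b/eps from a known ratio y0, then recover y0.
p, y0 = 5, 0.5
a = sum(math.sqrt(k) * y0 ** (k - 1) for k in range(1, p + 1))
b_over_eps = sum(k * math.sqrt(k) * y0 ** (k - 1) for k in range(1, p + 1))
print(positive_root(a, b_over_eps, p))
```

Bisection is enough here because the text has already shown that the polynomial is negative at $y = 0$, positive for large $y$, and changes sign only once.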

p. 304

This theorem was proved first only for the case that the distribution of states is initially uniform. It must also hold, however, when this is not true, provided only that the molecules are distributed in such a way that they tend to become mixed as time progresses, so that the distribution becomes uniform after a very long time. This will always happen with the exception of certain special cases, for example, when the molecules move initially in a straight line and are reflected back in this straight line at the walls. Since we have established this for arbitrary $p$ and $\htmlClass{sym sym-eps}{\epsilon}$, we can immediately go to the case where $\frac{1}{p}$ and $\htmlClass{sym sym-eps}{\epsilon}$ become infinitesimal.

† For very large $p$, the expression (39) will be very large, of order $p$. In this case it is necessary to look for a smaller negative value that $\htmlClass{sym sym-E}{E}$ can never exceed. The quantity denoted here by $\htmlClass{sym sym-E}{E}$ differs by a constant from the one earlier so denoted. If we wish to obtain the quantity denoted by $\htmlClass{sym sym-E1}{E_1}$ in equation (17a), page 113, which again differs only by a constant from the other quantities denoted by this letter, then we must add to our present $\htmlClass{sym sym-E}{E}$, $-\frac{3\htmlClass{sym sym-log}{\log}\htmlClass{sym sym-eps}{\epsilon}}{2}(\htmlClass{sym sym-u}{u_1}+\htmlClass{sym sym-sqrt}{\sqrt{2}}\htmlClass{sym sym-u}{u_2}+\dots)$. Therefore

\htmlClass{sym sym-E1}{E_1} = \htmlClass{sym sym-E}{E} - \frac{3\htmlClass{sym sym-log}{\log}\htmlClass{sym sym-eps}{\epsilon}}{2}(\htmlClass{sym sym-u}{u_1}+\htmlClass{sym sym-sqrt}{\sqrt{2}}\htmlClass{sym sym-u}{u_2}+\dots) = \htmlClass{sym sym-u}{u_1}\htmlClass{sym sym-log}{\log}\frac{\htmlClass{sym sym-u}{u_1}}{\htmlClass{sym sym-eps}{\epsilon}^{3/2}} + \htmlClass{sym sym-sqrt}{\sqrt{2}}\htmlClass{sym sym-u}{u_2}\htmlClass{sym sym-log}{\log}\frac{\htmlClass{sym sym-u}{u_2}}{\htmlClass{sym sym-eps}{\epsilon}^{3/2}} + \dots

It is clear now that $\htmlClass{sym sym-E1}{E_1}$ is a real and continuous function of the $\htmlClass{sym sym-u}{u}$'s for all real positive values of them. Furthermore, if we say that a negative quantity is smaller, the greater its numerical value is, then $\htmlClass{sym sym-E}{E}$ is not smaller than the expression (39), hence $\htmlClass{sym sym-E1}{E_1}$ is not smaller than

-\frac{1}{e}(1+\htmlClass{sym sym-sqrt}{\sqrt{2}}+\dots+\htmlClass{sym sym-sqrt}{\sqrt{p}}) - \frac{3}{2}\htmlClass{sym sym-a}{a}\htmlClass{sym sym-log}{\log}\htmlClass{sym sym-eps}{\epsilon}

Hence, $\htmlClass{sym sym-E1}{E_1}$ must have a minimum if the $\htmlClass{sym sym-u}{u}$'s run through all real positive values compatible with equations (40) and (41). One can then easily show that for this minimum none of the $\htmlClass{sym sym-u}{u}$'s can be equal to zero, so that the minimum cannot lie on the boundary of the space formed from the $\htmlClass{sym sym-u}{u}$'s, and consequently it can be found by applying the usual rules of differential calculus. If we add to the total differential of $\htmlClass{sym sym-E1}{E_1}$ that of the two equations (40) and (41), multiplying the former with the undetermined multiplier $\htmlClass{sym sym-lam}{\lambda}$, and the latter by $\htmlClass{sym sym-lam}{\mu}$, then we obtain

(\htmlClass{sym sym-log}{\log} \htmlClass{sym sym-u}{u_1} + \htmlClass{sym sym-lam}{\lambda} + \htmlClass{sym sym-lam}{\mu})du_1 + (\htmlClass{sym sym-log}{\log} \htmlClass{sym sym-u}{u_2} + \htmlClass{sym sym-lam}{\lambda} + 2\htmlClass{sym sym-lam}{\mu})\htmlClass{sym sym-sqrt}{\sqrt{2}}du_2 + \dots = 0

At the minimum, the factor of each differential must vanish, whence on elimination of $\htmlClass{sym sym-lam}{\lambda}$ and $\htmlClass{sym sym-lam}{\mu}$ one obtains

\htmlClass{sym sym-log}{\log} \htmlClass{sym sym-u}{u_2} - \htmlClass{sym sym-log}{\log} \htmlClass{sym sym-u}{u_1} = \htmlClass{sym sym-log}{\log} \htmlClass{sym sym-u}{u_3} - \htmlClass{sym sym-log}{\log} \htmlClass{sym sym-u}{u_2} = \dots

or $\htmlClass{sym sym-u}{u_2}/\htmlClass{sym sym-u}{u_1} = \htmlClass{sym sym-u}{u_3}/\htmlClass{sym sym-u}{u_2} = \htmlClass{sym sym-u}{u_4}/\htmlClass{sym sym-u}{u_3} = \dots$, which we recognize to be the same as equations (38). These equations therefore determine the smallest value that $\htmlClass{sym sym-E1}{E_1}$ can have when the $\htmlClass{sym sym-u}{u}$'s take all possible values consistent with equations (40) and (41). However, since the $\htmlClass{sym sym-u}{u}$'s are actually subject to equations (40) and (41) during the entire process, this is the smallest value of $\htmlClass{sym sym-E1}{E_1}$ during the entire process. In order to calculate it, we set again $\htmlClass{sym sym-u}{u_2} = \htmlClass{sym sym-u}{u_1}\htmlClass{sym sym-y}{y},\; \htmlClass{sym sym-u}{u_3} = \htmlClass{sym sym-u}{u_1}\htmlClass{sym sym-y}{y}^2,\dots$ We know that we then find from equations (38), (40) and (41) a unique positive value for $\htmlClass{sym sym-y}{y}$, which must correspond to the actual minimum of $\htmlClass{sym sym-E1}{E_1}$. This minimum value of $\htmlClass{sym sym-E1}{E_1}$ is therefore

\htmlClass{sym sym-E1}{E_1} = \Bigl(\frac{\htmlClass{sym sym-b}{b}}{\htmlClass{sym sym-eps}{\epsilon}} - \htmlClass{sym sym-a}{a}\Bigr)\htmlClass{sym sym-log}{\log} \htmlClass{sym sym-y}{y} + \htmlClass{sym sym-a}{a}\htmlClass{sym sym-log}{\log}\left(\frac{\htmlClass{sym sym-u}{u_1}}{\htmlClass{sym sym-eps}{\epsilon}^{3/2}}\right)

$\htmlClass{sym sym-E1}{E_1}$ cannot have a smaller value than this. This value remains finite for infinitesimal $\htmlClass{sym sym-eps}{\epsilon}$ and infinite $p$. Taking account of equations (43), we see that in the limit it reduces to $\htmlClass{sym sym-a}{a}\htmlClass{sym sym-log}{\log} \htmlClass{sym sym-C}{C} - \htmlClass{sym sym-b}{b}\htmlClass{sym sym-h}{h}$ (the term $\htmlClass{sym sym-a}{a}\htmlClass{sym sym-log}{\log} \htmlClass{sym sym-y}{y} = -\htmlClass{sym sym-a}{a}\htmlClass{sym sym-h}{h}\htmlClass{sym sym-eps}{\epsilon}$ vanishes) or, since for the Maxwell distribution $\htmlClass{sym sym-a}{a} = \frac{\sqrt{\pi}}{2}\,\htmlClass{sym sym-C}{C}\,\htmlClass{sym sym-h}{h}^{-3/2}$ and $\htmlClass{sym sym-b}{b} = \frac{3\htmlClass{sym sym-a}{a}}{2\htmlClass{sym sym-h}{h}}$, one can write for it $\htmlClass{sym sym-a}{a}\bigl(\htmlClass{sym sym-log}{\log} \htmlClass{sym sym-C}{C} - \frac{3}{2}\bigr)$, which is a finite quantity. Hence $\htmlClass{sym sym-E1}{E_1}$ cannot be minus infinity. On the other hand, it may be plus infinity. We still have to show that in that case there cannot be thermal equilibrium. This proof, as well as an explicit discussion of the exceptional case where $\lim_{\tau\to0} \frac{\htmlClass{sym sym-eps}{\epsilon}}{\tau}[\dots]$ comes out to be different according as $\htmlClass{sym sym-eps}{\epsilon}/\tau$ or $\tau/\htmlClass{sym sym-eps}{\epsilon}$ vanishes, will not be given here.

p. 305

We have first:

\htmlClass{sym sym-w}{w_k} = \htmlClass{sym sym-sqrt}{\sqrt{k}}\htmlClass{sym sym-u}{u_k} = \htmlClass{sym sym-u}{u_1}\htmlClass{sym sym-sqrt}{\sqrt{k}}\htmlClass{sym sym-y}{y}^{k-1}

For infinitesimal $\htmlClass{sym sym-eps}{\epsilon}$ we can again set:

\htmlClass{sym sym-eps}{\epsilon} = dx,\quad k\htmlClass{sym sym-eps}{\epsilon} = x,\quad \htmlClass{sym sym-y}{y} = e^{-\htmlClass{sym sym-h}{h}\htmlClass{sym sym-eps}{\epsilon}},\quad \frac{\htmlClass{sym sym-u}{u_1}}{\htmlClass{sym sym-eps}{\epsilon}^{3/2}} = \htmlClass{sym sym-C}{C} \tag{43}

\htmlClass{sym sym-y}{y}^k = e^{-\htmlClass{sym sym-h}{h}k\htmlClass{sym sym-eps}{\epsilon}} = e^{-\htmlClass{sym sym-h}{h}x}

and obtain

\htmlClass{sym sym-w}{w_k} = \htmlClass{sym sym-C}{C}\,\htmlClass{sym sym-sqrt}{\sqrt{x}}\,e^{-\htmlClass{sym sym-h}{h}x}\,dx

which is again the Maxwell distribution. Likewise one can convince oneself that the sum which we have here denoted by $\htmlClass{sym sym-E}{E}$ reduces, aside from a constant additive term, to the integral in equation (17a); we therefore obtain by this method all the results that we earlier found by transformations of definite integrals, but it has the advantage of being much simpler and clearer. One only has to accept, as a transition stage, the abstraction that a molecule may have only a finite number of kinetic energies.
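The passage to the limit can be sketched numerically. This is an illustration under assumptions that are not in the text: the values $h = 1$ and $C = 1$, the cutoff `x_max`, and the identification $u_1 = C\epsilon^{3/2}$, which is what matching $w_k = u_1\sqrt{k}\,y^{k-1}$ against $C\sqrt{x}\,e^{-hx}\,dx$ with $dx = \epsilon$ requires up to a factor that tends to 1.

```python
import math

def discrete_w(k, eps, h, C):
    """w_k = u_1 * sqrt(k) * y**(k-1) with y = exp(-h*eps), u_1 = C * eps**1.5."""
    y = math.exp(-h * eps)
    return C * eps ** 1.5 * math.sqrt(k) * y ** (k - 1)

def maxwell_w(k, eps, h, C):
    """C * sqrt(x) * exp(-h*x) * dx at x = k*eps, dx = eps."""
    x = k * eps
    return C * math.sqrt(x) * math.exp(-h * x) * eps

def total_molecules(eps, h, C, x_max=40.0):
    """Sum of w_k over all levels up to the (assumed) cutoff energy x_max."""
    p = int(x_max / eps)
    return sum(discrete_w(k, eps, h, C) for k in range(1, p + 1))

eps, h, C = 1e-3, 1.0, 1.0
print(discrete_w(500, eps, h, C) / maxwell_w(500, eps, h, C))  # ratio is exp(h*eps)
print(total_molecules(eps, h, C), math.sqrt(math.pi) / 2)      # sum vs. integral
```

The ratio of the discrete and continuous expressions is exactly $e^{h\epsilon}$, independent of $k$, so it tends to 1 as $\epsilon \to 0$. The sum over levels approaches $\int_0^\infty C\sqrt{x}\,e^{-hx}\,dx = \frac{\sqrt{\pi}}{2}\,C\,h^{-3/2}$.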

If one sets the time derivatives in equations (35) equal to zero, one obtains the condition that the distribution of states does not change with time, that is, that it is stationary. For continuous energies, the corresponding condition is obtained by setting $\frac{\partial \htmlClass{sym sym-f}{f}(x,t)}{\partial t} = 0$ in equation (16). This gives:

\begin{aligned} 0 = \int_0^\infty \int_0^{x+x'} &\Big[ \frac{\htmlClass{sym sym-f}{f}(\xi)\htmlClass{sym sym-f}{f}(x+x'-\xi)}{\htmlClass{sym sym-sqrt}{\sqrt{\xi(x+x'-\xi)}}} - \frac{\htmlClass{sym sym-f}{f}(x)\htmlClass{sym sym-f}{f}(x')}{\htmlClass{sym sym-sqrt}{\sqrt{xx'}}} \Big] \\ &\times \htmlClass{sym sym-sqrt}{\sqrt{xx'}} \; \htmlClass{sym sym-psi}{\psi}(x,x',\xi) \; d\xi \, dx' \end{aligned}

p. 306

A solution of this equation is

\htmlClass{sym sym-f}{f}(x) = \htmlClass{sym sym-C}{C}\htmlClass{sym sym-sqrt}{\sqrt{x}}\,e^{-\htmlClass{sym sym-h}{h}x}

which is the Maxwell distribution. From what has been said previously it follows that there are infinitely many other solutions, which are not useful however since $\htmlClass{sym sym-f}{f}(x)$ comes out negative or imaginary for some values of $x$. Hence, it follows very clearly that Maxwell's attempt to prove a priori that his solution is the only one must fail, since it is not the only one but rather it is the only one that gives purely positive probabilities, and therefore it is the only one that is useful.
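Why the Maxwell form is stationary can be seen in a few lines. The sketch below rests on one reading of the text: since $w_k = \sqrt{k}\,u_k$ corresponds to $f(x)\,dx$, the continuous counterpart of $u_k$ is $f(x)/\sqrt{x}$, and the counterpart of the discrete balance $u_2^2 = u_1 u_3$ is $\frac{f(\xi)f(x+x'-\xi)}{\sqrt{\xi(x+x'-\xi)}} = \frac{f(x)f(x')}{\sqrt{xx'}}$. The parameter values and the comparison function `not_maxwell` are arbitrary, hypothetical choices.

```python
import math

def maxwell(x, C=1.0, h=1.3):
    """Maxwell distribution in energy space: f(x) = C * sqrt(x) * exp(-h*x)."""
    return C * math.sqrt(x) * math.exp(-h * x)

def not_maxwell(x):
    """An arbitrary positive function that is not of Maxwell form."""
    return math.sqrt(x) * math.exp(-x * x)

def gain_minus_loss(x, x_prime, xi, f=maxwell):
    """f(xi) f(eta) / sqrt(xi*eta) - f(x) f(x') / sqrt(x*x'), with eta = x + x' - xi."""
    eta = x + x_prime - xi  # energy left for the second molecule after the collision
    gain = f(xi) * f(eta) / math.sqrt(xi * eta)
    loss = f(x) * f(x_prime) / math.sqrt(x * x_prime)
    return gain - loss

print(gain_minus_loss(0.7, 2.1, 1.5))                 # vanishes up to rounding
print(gain_minus_loss(0.7, 2.1, 1.5, f=not_maxwell))  # does not vanish
```

For the Maxwell form, $f(x)/\sqrt{x}$ is a pure exponential, so both products depend only on the conserved sum $x + x'$ and cancel for every collision, whatever $\psi$ is.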

Boltzmann 1872 / 2025 · reader's edition · every symbol explained for normal humans
after eight LLMs and one ringleader · the glossary is finally complete

Reader's Companion

What Boltzmann actually means — for normal people


🧠 THE BIG PICTURE

⚡ Boltzmann's secret weapon

The discretization trick. Boltzmann has already proved his theorem for continuous energies earlier in the paper, by transforming definite integrals. Here he re-derives it a simpler way: he chops energy into discrete bins (ε, 2ε, 3ε, ..., pε), proves the H-theorem for this artificial system, then takes a limit (ε→0, p→∞) to recover the continuous case. Replacing a continuum by a finite set and passing to the limit is routine today. In an 1872 physics paper it was a bold move.

You're reading one of the earliest uses of discrete energy levels in physics, introduced purely as a calculational device.

📉 What "H" actually is

Boltzmann doesn't call it H; that came later. He calls it E. It's a number that measures how concentrated the molecules are among the energy levels. When the molecules are bunched into a few levels, E is large. When they are spread out as far as the constraints allow, E is small. Boltzmann proves E always decreases until it hits a minimum. That minimum is thermal equilibrium. Modern physicists flip the sign and call it entropy (S), which increases. So Boltzmann's E is −S, up to a positive constant factor and an additive constant.

This sign flip has confused students for 150 years. You're not alone.

⏳ The paradox

Boltzmann's equations come from Newton's laws, which are time-reversible. But his conclusion (E always decreases) is not reversible. How can reversible equations produce irreversible behavior? This paper does not raise the question. Boltzmann's eventual answer is that the result is statistical, not strictly deterministic, and he only develops that answer after Loschmidt presses him on it.

Loschmidt's objection (1876) will make Boltzmann miserable. It also forces the statistical interpretation out of him (1877).

🔢 THE DISCRETE MACHINE

ε — Energy quantum

The size of each energy bin. Boltzmann chooses this number; it's not physical. He makes ε small, then smaller, then lets it go to zero. It's a mathematical crutch. Twenty-eight years later, Planck keeps ε finite and, almost by accident, starts quantum theory.

Boltzmann's ε → 0. Planck's ε stays. History is weird.

🔢 p — Number of energy levels

Boltzmann chops the continuous energy axis into p discrete chunks. He proves the theorem for p=3, then p=4, then asserts that the general case works "likewise" for arbitrary p. Then he lets p → ∞, treating the continuous distribution as the limit of a discrete one.

Today we call this "discretization" and do it without thinking. In 1872 it needed an apology: Boltzmann calls it an abstraction to be accepted as a transition stage.

uₖ — Population of level k

The number of molecules per unit volume with energy k·ε, divided by √k. The actual head count of level k is wₖ = √k·uₖ. Always positive. Changes over time as molecules collide. At equilibrium, the uₖ form a geometric progression: u₂/u₁ = u₃/u₂ = u₄/u₃ = ...

This geometric progression becomes the exponential decay of the Maxwell distribution in the continuous limit.

√k — Why is there a square root?

This is the part that trips everyone up. Boltzmann isn't being fancy: the factor comes from converting velocity space to energy space. In three dimensions, the velocities with speed between v and v+dv fill a shell of volume 4πv² dv. With E = mv²/2, v² ∝ E and dv ∝ dE/√E, so v² dv ∝ √E dE. When you discretize, dE becomes ε and √E becomes √(kε). The √k is the ghost of the velocity Jacobian. It's a phase-space volume factor.

Every time you see √k, whisper: "v² dv".

log — Where does this come from?

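Boltzmann does not explain the logarithm in this paper; E simply appears as the quantity whose time derivative has a definite sign. The counting argument comes five years later, in 1877. Imagine N total molecules, with N·uₖ in level k (here uₖ is a plain fraction, without the √k weights). The number of ways to arrange them is N! / Π (N·uₖ)!. Take the log, use Stirling's approximation (log N! ≈ N log N - N), divide by N, and you get -Σ uₖ log uₖ. Seen this way, the logarithm is counting microstates.

That counting is the statistical definition of entropy. The 1872 paper has the formula; the 1877 paper has the interpretation.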
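The counting argument can be checked numerically. A minimal sketch, with hypothetical occupation numbers and without the √k weights: it compares the exact logarithm of N! / Π nₖ! (computed through the log-gamma function) with the Stirling estimate -N Σ uₖ log uₖ, where uₖ = nₖ/N.

```python
import math

def log_multinomial(counts):
    """log( N! / prod(n_k!) ), via log-gamma so large N does not overflow."""
    n = sum(counts)
    return math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts)

def stirling_estimate(counts):
    """-N * sum(u_k * log(u_k)) with u_k = n_k / N; empty levels contribute 0."""
    n = sum(counts)
    return -n * sum((c / n) * math.log(c / n) for c in counts if c > 0)

counts = [500_000, 300_000, 200_000]  # hypothetical occupation numbers
exact = log_multinomial(counts)
approx = stirling_estimate(counts)
print(exact, approx, exact / approx)
```

The two numbers differ only by terms of order log N, so their ratio tends to 1 as N grows.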

E — Boltzmann's H-function

E = Σ √k · uₖ · log uₖ. This is the discrete entropy (up to a sign and constants). Boltzmann proves dE/dt ≤ 0. This is the H-theorem, though he doesn't call it that yet. It's the first derivation of irreversible behavior from a model of molecular collisions.

Modern notation: H = ∫ f log f dv. Boltzmann's E is H under an older name. The sign flip comes with entropy: S is proportional to -H.

E₁ — The fixed version

As p gets huge and ε tiny, the lower bound (39) on E runs off to negative infinity, so it no longer proves that E has a minimum. Boltzmann's fix is to shift E by -(3/2) log ε · Σ √k uₖ. Because Σ √k uₖ = a is conserved, this only adds a constant, and dE₁/dt = dE/dt. The minimum of the new quantity, E₁, stays finite in the continuous limit. E₁ is the real ancestor of ∫ f log f dv.

This is in the long footnote. Most readers skip it. Don't.

dE/dt — The proof

Boltzmann computes the time derivative of E, substitutes the collision equations, and gets a sum of terms that look like (A - B)·log(B/A). The two factors always have opposite signs: when B > A the logarithm is positive and A - B is negative, and when B < A it is the other way round. So each term is ≤ 0. Multiply by positive collision coefficients, sum them up, and dE/dt ≤ 0. That's the whole H-theorem.

The pattern (x-y)·log(y/x) ≤ 0 does all the work.

💥 COLLISIONS

Bᵢⱼᵏˡ — Collision coefficients

A molecule with energy iε collides with one with energy jε. After collision, they have energies kε and lε. Bᵢⱼᵏˡ is the rate at which this happens. It's always positive. Crucially, Bᵢⱼᵏˡ = Bₖₗⁱʲ — this is detailed balance, a consequence of time-reversibility. Boltzmann doesn't emphasize it, but the whole proof depends on it.

Detailed balance is why reversible equations can prove an irreversible theorem.

ψ — The continuous collision kernel

When ε → 0 and p → ∞, the discrete B coefficients become a continuous function ψ(x,x',ξ). It encodes the probability that a collision between molecules with energies x and x' produces a molecule with energy ξ. This is the collision kernel of the Boltzmann equation.

ψ is determined by the intermolecular potential. Boltzmann leaves it general.

❓ The mystery marks (u₂^{2?})

Those aren't typos. They're in the original 1872 paper. Boltzmann is working out the notation for collision terms on the fly, and he's not sure what indices to write. In one place he writes u₂^{2?} — literally with a question mark. We've preserved it. Even geniuses get confused writing papers.

Boltzmann, 1872: "Wait, is that u₂² or u₂^2? ... eh, I'll fix it later." He never did.

⚖️ CONSERVATION & EQUILIBRIUM

a — Total molecules

The sum Σ √k·uₖ (with weights!) is constant over time. This is conservation of particle number. Boltzmann writes it as u₁ + √2 u₂ + √3 u₃ + ... = a. The √k weights are the velocity→energy Jacobian: they convert uₖ into the actual count wₖ = √k·uₖ.

This is why the Maxwell distribution has a √x factor: it's the ghost of v² dv.

b — Total energy

The sum Σ k·√k·uₖ = b/ε is constant. This is conservation of energy. Together with a, it uniquely determines the equilibrium distribution. Boltzmann solves for the ratio y = u₂/u₁ and shows there's exactly one positive solution.

The uniqueness argument is Descartes' rule of signs: one sign change in the coefficients, exactly one positive root.

y — The equilibrium ratio

At equilibrium, u₂ = y·u₁, u₃ = y²·u₁, u₄ = y³·u₁, etc. This geometric progression is Boltzmann's discrete version of the exponential decay in the Maxwell distribution. In the continuous limit, y = e^{-hε} and h ∝ 1/T.

y is determined by a and b. One positive root. Always.

📈 THE CONTINUOUS LIMIT

wₖ — The bridge to continuum

Boltzmann defines wₖ = √k·uₖ. Why? Because in the continuous limit, wₖ becomes f(x)dx, the number of molecules with energy between x and x+dx. The √k absorbs the Jacobian, making the transition clean: wₖ = C√x e^{-hx} dx.

This is the discrete → continuous handshake.

h — Inverse temperature

In the continuous limit, y = e^{-hε}. As ε → 0, y → 1, but the exponential survives: y^k = e^{-h·kε} → e^{-hx}. In modern notation h = 1/(k_B·T), where k_B is Boltzmann's constant (not the level index k). Here the exponential factor of the Maxwell distribution drops out of a collision model instead of being postulated.

Boltzmann doesn't write k. That's Planck's later addition.

C — Normalization

The constant that makes the total number of molecules come out right. In the discrete case, it's buried in u₁. In the continuous limit, requiring ∫ C√x e^{-hx} dx = a (integrated from 0 to ∞) gives C = 2a·h^{3/2}/√π.

f(x) — The continuous distribution

f(x) dx is the number of molecules with kinetic energy between x and x+dx. The continuous analog of wₖ = √k·uₖ. At equilibrium, f(x) = C√x e^{-hx}. This is the Maxwell distribution in energy space.

🧮 THE VARIATIONAL PROOF (FOOTNOTE)

λ, μ — Lagrange multipliers

In the long footnote, Boltzmann does something completely different: he treats E₁ as a function to minimize subject to the constraints (constant a and b). He adds λ·(constraint a) + μ·(constraint b) to E₁, sets derivatives to zero, and derives the same equilibrium condition (geometric progression). It is one of the earliest constrained-extremum arguments in statistical mechanics, and an ancestor of the maximum entropy principle, 85 years before Jaynes.

Boltzmann: dynamical proof AND variational proof. He really wanted to be sure.

📊 Maximum entropy principle

The footnote contains a buried gem: Boltzmann shows that the equilibrium distribution is the one that minimizes E₁ (that is, maximizes entropy) subject to the constraints of fixed a and b. This idea became a central principle of statistical mechanics, and it is already here in 1872.

Jaynes (1957) made it famous. Boltzmann did it first.

⚠️ WHAT BOLTZMANN LEAVES UNSOLVED

🔄 The reversibility objection

The mechanics underlying Boltzmann's equations is time-reversible. His conclusion (dE/dt ≤ 0) is not. How can this be? Boltzmann doesn't address this here; he's focused on the mathematics. Four years later, Loschmidt will point this out, and Boltzmann will have to add probability to his theory. It's not in this paper.

The 1872 paper is purely dynamical. The probabilistic interpretation comes later.

💨 The assumption

Boltzmann assumes collisions are uncorrelated — the number of collisions between molecules in states i and j is proportional to u_i·u_j. This is the "molecular chaos" assumption. He doesn't justify it; he just uses it. It's the hidden irreversible step.

Modern term: Stosszahlansatz. Boltzmann just calls it "equations (35)".

🧪 Other solutions

At the very end, Boltzmann admits there are infinitely many stationary solutions to his continuous equation besides Maxwell's. They're useless because they're negative or imaginary for some x. But they exist. This is a quiet confession: uniqueness requires extra physical conditions (positivity).