Notes on automatic differentiation#

Preliminaries#

The normalised derivative of order $n$ of a function $x\left( t \right)$ is defined as

(1)#\[x^{\left[ n\right]}\left( t \right) = \frac{1}{n!} x^{\left( n\right)}\left( t \right).\]

General Leibniz rule: given $a\left( t \right) = b\left( t \right) c\left( t \right)$, then

(2)#\[a^{\left[ n\right]}\left( t \right) = \sum_{j=0}^n b^{\left[ n - j\right]}\left( t \right) c^{\left[ j\right]}\left( t \right).\]
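
As a small illustration of (2), the sketch below stores the normalised derivatives of a function at a fixed $t$ in a Python list indexed by order (this storage layout and the name `leibniz_product` are assumptions made for the example, not part of the notes). For a polynomial evaluated at $t = 0$, the normalised derivatives are simply its coefficients.

```python
def leibniz_product(b, c):
    """Normalised derivatives of a = b * c via the general Leibniz rule (2).

    b[k] and c[k] hold the normalised derivatives of order k at a fixed t.
    """
    order = min(len(b), len(c))
    return [sum(b[n - j] * c[j] for j in range(n + 1)) for n in range(order)]


# b(t) = 1 + t and c(t) = 1 + 2t at t = 0, padded with zeros.
b = [1, 1, 0, 0]
c = [1, 2, 0, 0]

# (1 + t)(1 + 2t) = 1 + 3t + 2t^2
a = leibniz_product(b, c)
print(a)  # [1, 3, 2, 0]
```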

Basic arithmetic#

Addition and subtraction#

Given $a\left( t \right) = b\left( t \right) \pm c\left( t \right)$, trivially

(3)#\[a^{\left[ n \right]}\left( t \right) = b^{\left[ n \right]}\left( t \right) \pm c^{\left[ n \right]}\left( t \right).\]

Multiplication#

Given $a\left( t \right) = b\left( t \right) c\left( t \right)$, the derivative $a^{\left[ n \right]}\left( t \right)$ is given directly by the application of the general Leibniz rule (2).

Division#

Given $a\left( t \right) = \frac{b\left( t \right)}{c\left( t \right)}$, we can write

(4)#\[a\left( t \right) c\left( t \right) = b\left( t \right).\]

We can now apply the normalised derivative of order $n$ to both sides, use (2) and re-arrange to obtain:

(5)#\[a^{\left[ n \right]}\left( t \right) = \frac{1}{c^{\left[ 0 \right]}\left( t \right)}\left[ b^{\left[ n \right]}\left( t \right) - \sum_{j=1}^n a^{\left[ n - j \right]}\left( t \right) c^{\left[ j \right]}\left( t \right)\right].\]
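
The recurrence (5) can be sketched as follows, assuming the normalised derivatives are stored in lists indexed by order and that $c^{\left[ 0 \right]} \neq 0$. The name `taylor_div` is hypothetical, and `Fraction` is used only to keep the check exact.

```python
from fractions import Fraction


def taylor_div(b, c):
    """Normalised derivatives of a = b / c following (5); requires c[0] != 0."""
    a = []
    for n in range(min(len(b), len(c))):
        # Sum over the already-computed lower orders of a.
        s = sum(a[n - j] * c[j] for j in range(1, n + 1))
        a.append((b[n] - s) / c[0])
    return a


# 1 / (1 - t) at t = 0 is the geometric series 1 + t + t^2 + ...
b = [Fraction(1), Fraction(0), Fraction(0), Fraction(0)]
c = [Fraction(1), Fraction(-1), Fraction(0), Fraction(0)]
a = taylor_div(b, c)
print(a == [1, 1, 1, 1])  # True
```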

Squaring#

Given $a\left( t \right) = b\left( t \right)^2$, the computation of $a^{\left[ n \right]}\left( t \right)$ is a special case of (2) in which we take advantage of the summation’s symmetry in order to roughly halve the number of multiplications:

(6)#\[\begin{split}a^{\left[ n \right]}\left( t \right) = \begin{cases} 2\sum_{j=0}^{\frac{n}{2}-1} b^{\left[ n - j \right]}\left( t \right) b^{\left[ j \right]}\left( t \right) + \left( b^{\left[ \frac{n}{2} \right]}\left( t \right) \right)^2 \mbox{ if $n$ is even}, \\ 2\sum_{j=0}^{\frac{n-1}{2}} b^{\left[ n - j \right]}\left( t \right) b^{\left[ j \right]}\left( t \right) \mbox{ if $n$ is odd}. \end{cases}\end{split}\]
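
A minimal sketch of (6), under the same assumption that normalised derivatives are stored in a list indexed by order (`taylor_square` is an illustrative name). The even and odd branches mirror the two cases of the formula.

```python
def taylor_square(b):
    """Normalised derivatives of a = b**2 using the symmetric sums in (6)."""
    a = []
    for n in range(len(b)):
        if n % 2 == 0:
            half = n // 2
            # j = 0 .. n/2 - 1, plus the squared middle term.
            a_n = 2 * sum(b[n - j] * b[j] for j in range(half)) + b[half] ** 2
        else:
            # j = 0 .. (n - 1)/2, no middle term.
            a_n = 2 * sum(b[n - j] * b[j] for j in range((n - 1) // 2 + 1))
        a.append(a_n)
    return a


# (1 + t)^2 = 1 + 2t + t^2 at t = 0.
print(taylor_square([1, 1, 0, 0, 0]))  # [1, 2, 1, 0, 0]
```

The result agrees with a direct evaluation of (2), but each order needs about half as many products.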

Square root#

Given $a\left( t \right) =\sqrt{b\left( t \right)}$, we can write

(7)#\[a\left( t \right)^2 = b\left( t \right).\]

We can apply the normalised derivative of order $n$ to both sides, and, with the help of (6), we obtain:

(8)#\[\begin{split}b^{\left[ n \right]}\left( t \right) = \begin{cases} 2\sum_{j=0}^{\frac{n}{2}-1} a^{\left[ n - j \right]}\left( t \right) a^{\left[ j \right]}\left( t \right) + \left( a^{\left[ \frac{n}{2} \right]}\left( t \right) \right)^2 \mbox{ if $n$ is even}, \\ 2\sum_{j=0}^{\frac{n-1}{2}} a^{\left[ n - j \right]}\left( t \right) a^{\left[ j \right]}\left( t \right) \mbox{ if $n$ is odd}. \end{cases}\end{split}\]

We can then isolate $a^{\left[ n \right]}\left( t \right)$ to obtain, for $n > 0$:

(9)#\[\begin{split}a^{\left[ n \right]}\left( t \right) = \begin{cases} \frac{1}{2a^{\left[ 0 \right]}\left( t \right)} \left[ b^{\left[ n \right]}\left( t \right) - 2\sum_{j=1}^{\frac{n}{2}-1} a^{\left[ n - j \right]}\left( t \right) a^{\left[ j \right]}\left( t \right) - \left( a^{\left[ \frac{n}{2} \right]}\left( t \right) \right)^2 \right] \mbox{ if $n$ is even}, \\ \frac{1}{2a^{\left[ 0 \right]}\left( t \right)} \left[ b^{\left[ n \right]}\left( t \right) - 2\sum_{j=1}^{\frac{n-1}{2}} a^{\left[ n - j \right]}\left( t \right) a^{\left[ j \right]}\left( t \right) \right] \mbox{ if $n$ is odd}. \end{cases}\end{split}\]
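
The sketch below applies (9) for orders $n > 0$, assuming the order-zero value $a^{\left[ 0 \right]} = \sqrt{b^{\left[ 0 \right]}}$ is supplied by the caller and is non-zero. The name `taylor_sqrt` and the list-based storage are assumptions of the example.

```python
from fractions import Fraction


def taylor_sqrt(b, a0):
    """Normalised derivatives of a = sqrt(b) following (9); a0 = sqrt(b[0])."""
    a = [a0]
    for n in range(1, len(b)):
        if n % 2 == 0:
            half = n // 2
            # j = 1 .. n/2 - 1, plus the squared middle term.
            s = 2 * sum(a[n - j] * a[j] for j in range(1, half)) + a[half] ** 2
        else:
            # j = 1 .. (n - 1)/2.
            s = 2 * sum(a[n - j] * a[j] for j in range(1, (n - 1) // 2 + 1))
        a.append((b[n] - s) / (2 * a[0]))
    return a


# sqrt(1 + t) at t = 0: 1 + t/2 - t^2/8 + t^3/16 - 5 t^4/128 + ...
b = [Fraction(1), Fraction(1), Fraction(0), Fraction(0), Fraction(0)]
a = taylor_sqrt(b, Fraction(1))
```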

Exponentiation#

Given $a\left( t \right) = b\left( t \right)^\alpha$, with $\alpha \neq 0$, we have

(10)#\[a^\prime\left( t \right) = \alpha b\left( t \right)^{\alpha - 1} b^\prime\left( t \right).\]

By multiplying both sides by $b\left( t \right)$ we obtain

(11)#\[\begin{split}\begin{aligned} b\left( t \right) a^\prime\left( t \right) & = b\left( t \right) \alpha b\left( t \right)^{\alpha - 1} b^\prime\left( t \right) \\ & = \alpha b^\prime\left( t \right) a\left( t \right). \end{aligned}\end{split}\]

We can now apply the normalised derivative of order $n-1$ to both sides, use (2) and re-arrange to obtain, for $n > 0$:

(12)#\[a^{\left[ n \right]}\left( t \right) = \frac{1}{n b^{\left[ 0 \right]}\left( t \right)} \sum_{j=0}^{n-1} \left[ n\alpha - j \left( \alpha + 1 \right) \right] b^{\left[ n - j \right]}\left( t \right) a^{\left[ j \right]}\left( t \right).\]
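
As a sanity check of (12), the following sketch computes the normalised derivatives of $b\left( t \right)^\alpha$ given those of $b\left( t \right)$. The function name `taylor_pow` is hypothetical, and the order-zero value $a^{\left[ 0 \right]} = \left( b^{\left[ 0 \right]} \right)^\alpha$ is passed in explicitly so that the example can stay in exact rational arithmetic.

```python
from fractions import Fraction


def taylor_pow(b, alpha, a0):
    """Normalised derivatives of a = b**alpha following (12); needs b[0] != 0."""
    a = [a0]
    for n in range(1, len(b)):
        s = sum(
            (n * alpha - j * (alpha + 1)) * b[n - j] * a[j] for j in range(n)
        )
        a.append(s / (n * b[0]))
    return a


b = [Fraction(1), Fraction(1), Fraction(0), Fraction(0), Fraction(0)]

# (1 + t)^3 = 1 + 3t + 3t^2 + t^3
cube = taylor_pow(b, 3, Fraction(1))
# (1 + t)^(1/2) = 1 + t/2 - t^2/8 + t^3/16 - ...
root = taylor_pow(b, Fraction(1, 2), Fraction(1))
```

With $\alpha = 1/2$ the result matches the coefficients produced by the square root recurrence (9).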

Exponentials#

Natural exponential#

Given $a\left( t \right) = e^{b\left( t \right)}$, we have

(13)#\[a^\prime\left( t \right) = e^{b\left( t \right)}b^\prime\left( t \right) = a\left( t \right) b^\prime\left( t \right).\]

We can now apply the normalised derivative of order $n-1$ to both sides and use (1) and (2) to obtain, for $n > 0$:

(14)#\[a^{\left[ n \right]}\left( t \right) = \frac{1}{n} \sum_{j=1}^{n} j a^{\left[ n - j \right]}\left( t \right) b^{\left[ j \right]}\left( t \right).\]
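
A minimal sketch of (14) in floating point, assuming list-based storage of the normalised derivatives (`taylor_exp` is an illustrative name). The order-zero value is computed directly with `math.exp`.

```python
import math


def taylor_exp(b):
    """Normalised derivatives of a = exp(b) following (14)."""
    a = [math.exp(b[0])]
    for n in range(1, len(b)):
        a.append(sum(j * a[n - j] * b[j] for j in range(1, n + 1)) / n)
    return a


# b(t) = t at t = 0, so a(t) = e^t and a^[n] = 1 / n!.
a = taylor_exp([0.0, 1.0, 0.0, 0.0, 0.0])
```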

Standard logistic function#

Given $a\left( t \right) = \operatorname{sig} {b\left( t \right)}$, where $\operatorname{sig}\left( x \right)$ is the standard logistic function

(15)#\[\operatorname{sig} \left( x \right) = \frac{1}{1+e^{-x}},\]

we have

(16)#\[a^\prime\left( t \right) = \operatorname{sig}{b\left( t \right)} \left[1 - \operatorname{sig}{b\left( t \right)} \right] b^\prime\left( t \right) = a\left( t \right) \left[1 - a\left( t \right) \right] b^\prime\left( t \right),\]

which, after the introduction of the auxiliary function

(17)#\[c\left( t \right) = a^2\left( t \right) ,\]

becomes

(18)#\[a^\prime\left( t \right) = \left[ a\left( t \right) - c\left( t \right) \right] b^\prime\left( t \right).\]

After applying the normalised derivative of order $n-1$ to both sides, we can use (1), (2) and (3) to obtain, for $n > 0$:

(19)#\[a^{\left[ n \right]}\left( t \right) = \frac{1}{n}\sum_{j=1}^{n} j \left[ a^{\left[ n - j \right]} \left( t \right)- c^{\left[ n - j \right]}\left( t \right)\right] b^{\left[ j \right]}\left( t \right).\]
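
The recurrence (19) needs the normalised derivatives of the auxiliary function $c = a^2$ up to order $n - 1$ only, so $a$ and $c$ can be advanced together, one order at a time. The sketch below assumes this interleaving and uses (2) for $c$; the name `taylor_logistic` is hypothetical.

```python
import math


def taylor_logistic(b):
    """Normalised derivatives of a = sig(b) following (19), with c = a**2."""
    a = [1.0 / (1.0 + math.exp(-b[0]))]
    c = [a[0] ** 2]
    for n in range(1, len(b)):
        a.append(
            sum(j * (a[n - j] - c[n - j]) * b[j] for j in range(1, n + 1)) / n
        )
        # Order n of c = a * a via the general Leibniz rule (2).
        c.append(sum(a[n - j] * a[j] for j in range(n + 1)))
    return a


# sig(t) at t = 0: 1/2 + t/4 - t^3/48 + ...
a = taylor_logistic([0.0, 1.0, 0.0, 0.0])
```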

Logarithms#

Natural logarithm#

Given $a\left( t \right) = \log b\left( t \right)$, we have

(20)#\[a^\prime\left( t \right) = \frac{b^\prime\left( t \right)}{b\left( t \right)},\]

or, equivalently,

(21)#\[b\left( t \right) a^\prime\left( t \right) = b^\prime\left( t \right).\]

We can now apply the normalised derivative of order $n-1$ to both sides, use (1) and (2) and re-arrange to obtain, for $n > 0$:

(22)#\[a^{\left[ n \right]}\left( t \right) = \frac{1}{n b^{\left[ 0 \right]}\left( t \right)} \left[ n b^{\left[ n \right]}\left( t \right) - \sum_{j=1}^{n-1} j b^{\left[ n - j \right]}\left( t \right) a^{\left[ j \right]}\left( t \right) \right].\]
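
A short sketch of (22), assuming $b^{\left[ 0 \right]} > 0$ and list-based storage of the normalised derivatives (`taylor_log` is an illustrative name):

```python
import math


def taylor_log(b):
    """Normalised derivatives of a = log(b) following (22); needs b[0] > 0."""
    a = [math.log(b[0])]
    for n in range(1, len(b)):
        s = sum(j * b[n - j] * a[j] for j in range(1, n))
        a.append((n * b[n] - s) / (n * b[0]))
    return a


# log(1 + t) at t = 0: t - t^2/2 + t^3/3 - t^4/4 + ...
a = taylor_log([1.0, 1.0, 0.0, 0.0, 0.0])
```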

Trigonometric functions#

Tangent#

Given $a\left( t \right) = \tan b\left( t \right)$, we have

(23)#\[a^\prime\left( t \right) = \left[ \tan^2 b\left( t \right) + 1 \right] b^\prime\left( t \right) = a^2\left( t \right)b^\prime\left( t \right) + b^\prime\left( t \right),\]

which, after the introduction of the auxiliary function

(24)#\[c\left( t \right) = a^2\left( t \right) ,\]

becomes

(25)#\[a^\prime\left( t \right) = c\left( t \right) b^\prime\left( t \right) + b^\prime\left( t \right).\]

After applying the normalised derivative of order $n-1$ to both sides, we can use (1), (2) and (3) to obtain, for $n > 0$:

(26)#\[a^{\left[ n \right]}\left( t \right) = b^{\left[ n \right]}\left( t \right) + \frac{1}{n}\sum_{j=1}^{n} j c^{\left[ n - j \right]}\left( t \right) b^{\left[ j \right]}\left( t \right).\]
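
In (26) the term $b^{\left[ n \right]}\left( t \right)$ stands outside the sum, and only orders up to $n - 1$ of $c = a^2$ are required. The sketch below makes both points explicit by advancing $a$ and $c$ together; the function name and the interleaving are assumptions of the example.

```python
import math


def taylor_tan(b):
    """Normalised derivatives of a = tan(b) following (26), with c = a**2."""
    a = [math.tan(b[0])]
    c = [a[0] ** 2]
    for n in range(1, len(b)):
        s = sum(j * c[n - j] * b[j] for j in range(1, n + 1))
        a.append(s / n + b[n])
        # Order n of c = a * a via the general Leibniz rule (2).
        c.append(sum(a[n - j] * a[j] for j in range(n + 1)))
    return a


# tan(t) at t = 0: t + t^3/3 + 2 t^5/15 + ...
a = taylor_tan([0.0, 1.0, 0.0, 0.0, 0.0, 0.0])
```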

Inverse trigonometric functions#

Inverse sine#

Given $a\left( t \right) = \arcsin b\left( t \right)$, we have

(27)#\[a^\prime\left( t \right) = \frac{b^\prime\left( t \right)}{\sqrt{1 - b^2\left( t \right) }},\]

or, equivalently,

(28)#\[a^\prime\left( t \right) \sqrt{1 - b^2\left( t \right) } = b^\prime\left( t \right).\]

We introduce the auxiliary function

(29)#\[c\left( t \right) = \sqrt{1 - b^2\left( t \right) },\]

so that (28) can be rewritten as

(30)#\[a^\prime\left( t \right) c\left( t \right) = b^\prime\left( t \right).\]

Applying the normalised derivative of order $n-1$ to both sides yields, via (1):

(31)#\[\left[a^\prime\left( t \right) c\left( t \right)\right]^{\left[ n - 1 \right]} = n b^{\left[ n \right]} \left( t \right).\]

We can now apply the general Leibniz rule (2) to the left-hand side and re-arrange the terms to obtain, for $n > 0$:

(32)#\[a^{\left[ n \right]}\left( t \right) = \frac{1}{n c^{\left[ 0 \right]}\left( t \right)}\left[ n b^{\left[ n \right]}\left( t \right) - \sum_{j=1}^{n-1} j c^{\left[ n - j \right]}\left( t \right) a^{\left[ j \right]}\left( t \right) \right].\]
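
The following sketch chains the pieces needed by (32): the normalised derivatives of $1 - b^2$ via (2), those of $c = \sqrt{1 - b^2}$ via the square root recurrence (written here with the full, non-halved sum, which is equivalent to (9)), and finally (32) itself. It assumes $\left| b^{\left[ 0 \right]} \right| < 1$; the name `taylor_asin` is hypothetical.

```python
import math


def taylor_asin(b):
    """Normalised derivatives of a = arcsin(b) following (32)."""
    order = len(b)
    # d = 1 - b**2 via the general Leibniz rule (2).
    d = [-sum(b[n - j] * b[j] for j in range(n + 1)) for n in range(order)]
    d[0] += 1.0
    # c = sqrt(d): same recurrence as (9), without exploiting the symmetry.
    c = [math.sqrt(d[0])]
    for n in range(1, order):
        s = sum(c[n - j] * c[j] for j in range(1, n))
        c.append((d[n] - s) / (2 * c[0]))
    a = [math.asin(b[0])]
    for n in range(1, order):
        s = sum(j * c[n - j] * a[j] for j in range(1, n))
        a.append((n * b[n] - s) / (n * c[0]))
    return a


# arcsin(t) at t = 0: t + t^3/6 + 3 t^5/40 + ...
a = taylor_asin([0.0, 1.0, 0.0, 0.0, 0.0, 0.0])
```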

Inverse cosine#

The derivation is identical to that of the inverse sine, apart from a sign change. Given $a\left( t \right) = \arccos b\left( t \right)$, the final result is, for $n > 0$:

(33)#\[a^{\left[ n \right]}\left( t \right) = -\frac{1}{n c^{\left[ 0 \right]}\left( t \right)}\left[ n b^{\left[ n \right]}\left( t \right) + \sum_{j=1}^{n-1} j c^{\left[ n - j \right]}\left( t \right) a^{\left[ j \right]}\left( t \right) \right],\]

with $c\left( t \right)$ defined as:

(34)#\[c\left( t \right) = \sqrt{1 - b^2\left( t \right) }.\]

Inverse tangent#

Given $a\left( t \right) = \arctan b\left( t \right)$, we have

(35)#\[a^\prime\left( t \right) = \frac{b^\prime\left( t \right)}{1 + b^2\left( t \right) },\]

or, equivalently,

(36)#\[a^\prime\left( t \right) \left[1 + b^2\left( t \right) \right] = b^\prime\left( t \right).\]

We introduce the auxiliary function

(37)#\[c\left( t \right) = b^2\left( t \right),\]

so that (36) can be rewritten as

(38)#\[a^\prime\left( t \right) + a^\prime\left( t \right) c\left( t \right) = b^\prime\left( t \right).\]

Applying the normalised derivative of order $n-1$ to both sides yields, via (1) and (3):

(39)#\[n a^{\left[ n \right]} \left( t \right) + \left[a^\prime\left( t \right) c\left( t \right)\right]^{\left[ n - 1 \right]} = n b^{\left[ n \right]} \left( t \right).\]

With the help of the general Leibniz rule (2), after re-arranging we obtain, for $n > 0$:

(40)#\[a^{\left[ n \right]}\left( t \right) = \frac{1}{n \left[ c^{\left[ 0 \right]}\left( t \right) + 1 \right]}\left[ n b^{\left[ n \right]}\left( t \right) - \sum_{j=1}^{n-1} j c^{\left[ n - j \right]}\left( t \right) a^{\left[ j \right]}\left( t \right) \right].\]
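
A minimal sketch of (40), with the auxiliary function $c = b^2$ computed up front via (2). The name `taylor_atan` and the list-based storage are assumptions of the example.

```python
import math


def taylor_atan(b):
    """Normalised derivatives of a = arctan(b) following (40), with c = b**2."""
    order = len(b)
    c = [sum(b[n - j] * b[j] for j in range(n + 1)) for n in range(order)]
    a = [math.atan(b[0])]
    for n in range(1, order):
        s = sum(j * c[n - j] * a[j] for j in range(1, n))
        a.append((n * b[n] - s) / (n * (c[0] + 1.0)))
    return a


# arctan(t) at t = 0: t - t^3/3 + t^5/5 - ...
a = taylor_atan([0.0, 1.0, 0.0, 0.0, 0.0, 0.0])
```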

Two-argument inverse tangent#

Given $a\left( t \right) = \operatorname{arctan2}\left( b\left( t \right), c\left( t \right) \right)$, we have

(41)#\[a^\prime\left( t \right) = \frac{c\left( t \right) b^\prime\left( t \right)-b\left( t \right)c^\prime \left( t \right)} {b^2\left( t \right)+c^2\left( t \right)}.\]

After the introduction of the auxiliary function

(42)#\[d\left( t \right) = b^2\left( t \right)+c^2\left( t \right),\]

(41) can be rewritten as

(43)#\[d\left( t \right)a^\prime\left( t \right) = c\left( t \right) b^\prime\left( t \right)-b\left( t \right)c^\prime \left( t \right).\]

We can now apply the normalised derivative of order $n-1$ to both sides, and, via (2), obtain, for $n > 0$:

(44)#\[\begin{split}\begin{aligned} a^{\left[ n \right]}\left( t \right) &= \frac{1}{nd^{\left[ 0 \right]}\left( t \right)}\left[\vphantom{\sum_{j=1}^{n-1}j\left( \right)} n\left( c^{\left[ 0 \right]}\left( t \right) b^{\left[ n \right]}\left( t \right) - b^{\left[ 0 \right]}\left( t \right) c^{\left[ n \right]}\left( t \right)\right) \right.\\ &\left. + \sum_{j=1}^{n-1}j\left( c^{\left[ n-j \right]}\left( t \right) b^{\left[ j \right]}\left( t \right) - b^{\left[ n-j \right]}\left( t \right) c^{\left[ j \right]}\left( t \right) - d^{\left[ n-j \right]}\left( t \right) a^{\left[ j \right]}\left( t \right) \right) \right]. \end{aligned}\end{split}\]
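
The two-line formula (44) translates into code fairly directly. The sketch below is an illustration under the usual assumptions (list-based storage, hypothetical name `taylor_atan2`, $d^{\left[ 0 \right]} \neq 0$), with $d = b^2 + c^2$ computed via (2) and (3).

```python
import math


def taylor_atan2(b, c):
    """Normalised derivatives of a = arctan2(b, c) following (44)."""
    order = min(len(b), len(c))
    # d = b**2 + c**2 via the Leibniz rule (2) and the sum rule (3).
    d = [
        sum(b[n - j] * b[j] + c[n - j] * c[j] for j in range(n + 1))
        for n in range(order)
    ]
    a = [math.atan2(b[0], c[0])]
    for n in range(1, order):
        s = n * (c[0] * b[n] - b[0] * c[n])
        s += sum(
            j * (c[n - j] * b[j] - b[n - j] * c[j] - d[n - j] * a[j])
            for j in range(1, n)
        )
        a.append(s / (n * d[0]))
    return a


# With b(t) = t and c(t) = 1, arctan2(b, c) reduces to arctan(t).
a = taylor_atan2([0.0, 1.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0])
```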

Hyperbolic functions#

Hyperbolic sine#

Given $a\left( t \right) = \sinh b\left( t \right)$, we have

(45)#\[a^\prime\left( t \right) = b^\prime\left( t \right) \cosh b\left( t \right).\]

We introduce the auxiliary function

(46)#\[c\left( t \right) = \cosh b\left( t \right),\]

so that (45) can be rewritten as

(47)#\[a^\prime\left( t \right) = c\left( t \right) b^\prime\left( t \right).\]

We can now apply the normalised derivative of order $n-1$ to both sides, and, via (2), obtain, for $n > 0$:

(48)#\[a^{\left[ n \right]}\left( t \right) = \frac{1}{n} \sum_{j=1}^{n} j c^{\left[ n - j \right]}\left( t \right) b^{\left[ j \right]}\left( t \right).\]
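
Evaluating (48) at order $n$ requires the normalised derivatives of $c = \cosh b$ up to order $n - 1$. The sketch below assumes these are obtained from the companion recurrence that follows from $\left( \cosh b \right)^\prime = b^\prime \sinh b$ by the same steps, so that both functions are advanced together. The name `taylor_sinh_cosh` is hypothetical.

```python
import math


def taylor_sinh_cosh(b):
    """Normalised derivatives of sinh(b) and cosh(b), advanced in lockstep."""
    a = [math.sinh(b[0])]  # a = sinh(b), recurrence (48)
    c = [math.cosh(b[0])]  # c = cosh(b), companion recurrence (assumed)
    for n in range(1, len(b)):
        a_n = sum(j * c[n - j] * b[j] for j in range(1, n + 1)) / n
        c_n = sum(j * a[n - j] * b[j] for j in range(1, n + 1)) / n
        a.append(a_n)
        c.append(c_n)
    return a, c


# b(t) = t at t = 0: sinh(t) = t + t^3/6 + ..., cosh(t) = 1 + t^2/2 + t^4/24 + ...
a, c = taylor_sinh_cosh([0.0, 1.0, 0.0, 0.0, 0.0])
```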

Hyperbolic cosine#

Given $a\left( t \right) = \cosh b\left( t \right)$, the derivation of $a^{\left[ n \right]}\left( t \right)$ is identical to that for the hyperbolic sine. After the definition of the auxiliary function

(49)#\[s\left( t \right) = \sinh b\left( t \right),\]