Notes on automatic differentiation

Preliminaries

Definition of normalised derivative:

$$x^{[n]}(t) = \frac{1}{n!} x^{(n)}(t). \tag{1}$$

General Leibniz rule: given $a(t) = b(t) c(t)$, then

$$a^{[n]}(t) = \sum_{j=0}^{n} b^{[n-j]}(t) c^{[j]}(t). \tag{2}$$
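
As a rough illustration, the Leibniz rule (2) can be sketched in Python as follows. The function name `product_coeffs` and the list-based representation (index n holds the normalised derivative of order n at a fixed t) are assumptions made for this example only.

```python
def product_coeffs(b, c):
    """Normalised derivatives of a = b * c, one per order, via the Leibniz rule (2).

    b and c are lists whose n-th entry is the normalised derivative of order n.
    """
    order = min(len(b), len(c))
    return [sum(b[n - j] * c[j] for j in range(n + 1)) for n in range(order)]


# b(t) = 1 + t and c(t) = 1 - t expanded around t = 0, so a(t) = 1 - t^2.
a = product_coeffs([1.0, 1.0, 0.0], [1.0, -1.0, 0.0])
```

Each order n costs n + 1 multiplications, so computing all orders up to N is quadratic in N.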

Basic arithmetic

Addition and subtraction

Given $a(t) = b(t) \pm c(t)$, trivially

$$a^{[n]}(t) = b^{[n]}(t) \pm c^{[n]}(t). \tag{3}$$

Multiplication

Given $a(t) = b(t) c(t)$, the derivative $a^{[n]}(t)$ is given directly by the application of the general Leibniz rule (2).

Division

Given $a(t) = \frac{b(t)}{c(t)}$, we can write

$$a(t) c(t) = b(t). \tag{4}$$

We can now apply the normalised derivative of order $n$ to both sides, use (2) and re-arrange to obtain:

$$a^{[n]}(t) = \frac{1}{c^{[0]}(t)} \left[ b^{[n]}(t) - \sum_{j=1}^{n} a^{[n-j]}(t) c^{[j]}(t) \right]. \tag{5}$$
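
The recurrence (5) builds the quotient order by order, since the sum only involves lower-order derivatives of a. A minimal sketch, assuming the same list-of-coefficients representation as an illustration (the name `quotient_coeffs` is hypothetical):

```python
def quotient_coeffs(b, c):
    """Normalised derivatives of a = b / c via the recurrence (5)."""
    a = []
    for n in range(len(b)):
        # The sum uses a[0], ..., a[n-1] only, which are already available.
        acc = b[n] - sum(a[n - j] * c[j] for j in range(1, n + 1))
        a.append(acc / c[0])
    return a


# b(t) = 1 and c(t) = 1 - t around t = 0: a(t) = 1 + t + t^2 + t^3 + ...
a = quotient_coeffs([1.0, 0.0, 0.0, 0.0], [1.0, -1.0, 0.0, 0.0])
```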

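Squaring

Given $a(t) = b(t)^2$, the computation of $a^{[n]}(t)$ is a special case of (2) in which we take advantage of the summation's symmetry in order to halve the computational complexity:

$$a^{[n]}(t) = \begin{cases} 2 \sum_{j=0}^{\frac{n}{2}-1} b^{[n-j]}(t) b^{[j]}(t) + \left( b^{[\frac{n}{2}]}(t) \right)^2 & \text{if $n$ is even}, \\ 2 \sum_{j=0}^{\frac{n-1}{2}} b^{[n-j]}(t) b^{[j]}(t) & \text{if $n$ is odd}. \end{cases} \tag{6}$$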

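Square root

Given $a(t) = \sqrt{b(t)}$, we can write

$$a(t)^2 = b(t). \tag{7}$$

We can apply the normalised derivative of order $n$ to both sides, and, with the help of (6), we obtain:

$$b^{[n]}(t) = \begin{cases} 2 \sum_{j=0}^{\frac{n}{2}-1} a^{[n-j]}(t) a^{[j]}(t) + \left( a^{[\frac{n}{2}]}(t) \right)^2 & \text{if $n$ is even}, \\ 2 \sum_{j=0}^{\frac{n-1}{2}} a^{[n-j]}(t) a^{[j]}(t) & \text{if $n$ is odd}. \end{cases} \tag{8}$$

We can then isolate $a^{[n]}(t)$, which appears only in the $j = 0$ term of each sum, to obtain, for $n > 0$:

$$a^{[n]}(t) = \begin{cases} \frac{1}{2 a^{[0]}(t)} \left[ b^{[n]}(t) - 2 \sum_{j=1}^{\frac{n}{2}-1} a^{[n-j]}(t) a^{[j]}(t) - \left( a^{[\frac{n}{2}]}(t) \right)^2 \right] & \text{if $n$ is even}, \\ \frac{1}{2 a^{[0]}(t)} \left[ b^{[n]}(t) - 2 \sum_{j=1}^{\frac{n-1}{2}} a^{[n-j]}(t) a^{[j]}(t) \right] & \text{if $n$ is odd}. \end{cases} \tag{9}$$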
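
The even/odd split in (9) can be sketched as below. This is an illustrative implementation under the assumption that both sums start at j = 1 (the j = 0 term is the one moved to the left-hand side); the name `sqrt_coeffs` is hypothetical.

```python
import math


def sqrt_coeffs(b):
    """Normalised derivatives of a = sqrt(b) via the recurrence (9)."""
    a = [math.sqrt(b[0])]
    for n in range(1, len(b)):
        if n % 2 == 0:
            # j = 1, ..., n/2 - 1, plus the squared middle term.
            s = 2 * sum(a[n - j] * a[j] for j in range(1, n // 2)) + a[n // 2] ** 2
        else:
            # j = 1, ..., (n - 1)/2.
            s = 2 * sum(a[n - j] * a[j] for j in range(1, (n - 1) // 2 + 1))
        a.append((b[n] - s) / (2 * a[0]))
    return a


# b(t) = 1 + t around t = 0: sqrt(1 + t) = 1 + t/2 - t^2/8 + t^3/16 - 5 t^4/128 + ...
a = sqrt_coeffs([1.0, 1.0, 0.0, 0.0, 0.0])
```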

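Exponentiation

Given $a(t) = b(t)^\alpha$, with $\alpha \neq 0$, we have

$$a'(t) = \alpha b(t)^{\alpha - 1} b'(t). \tag{10}$$

By multiplying both sides by $b(t)$ we obtain

$$b(t) a'(t) = b(t) \alpha b(t)^{\alpha - 1} b'(t) = \alpha b'(t) a(t). \tag{11}$$

We can now apply the normalised derivative of order $n - 1$ to both sides, use (2) and re-arrange to obtain, for $n > 0$:

$$a^{[n]}(t) = \frac{1}{n b^{[0]}(t)} \sum_{j=0}^{n-1} \left[ n \alpha - j (\alpha + 1) \right] b^{[n-j]}(t) a^{[j]}(t). \tag{12}$$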
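
A short sketch of the recurrence (12), assuming a real exponent and a nonzero order-0 coefficient of b. The function name `pow_coeffs` is an assumption for illustration.

```python
def pow_coeffs(b, alpha):
    """Normalised derivatives of a = b**alpha via the recurrence (12)."""
    a = [b[0] ** alpha]
    for n in range(1, len(b)):
        acc = sum((n * alpha - j * (alpha + 1)) * b[n - j] * a[j] for j in range(n))
        a.append(acc / (n * b[0]))
    return a


# b(t) = 1 + t around t = 0.
inv = pow_coeffs([1.0, 1.0, 0.0, 0.0], -1.0)   # 1/(1 + t) = 1 - t + t^2 - t^3 + ...
root = pow_coeffs([1.0, 1.0, 0.0], 0.5)        # sqrt(1 + t) = 1 + t/2 - t^2/8 + ...
```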

Exponentials

Natural exponential

Given $a(t) = e^{b(t)}$, we have

$$a'(t) = e^{b(t)} b'(t) = a(t) b'(t). \tag{13}$$

We can now apply the normalised derivative of order $n - 1$ to both sides, use (1) and (2) and obtain, for $n > 0$:

$$a^{[n]}(t) = \frac{1}{n} \sum_{j=1}^{n} j a^{[n-j]}(t) b^{[j]}(t). \tag{14}$$
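
The following sketch applies (14) directly. Only the order-0 term needs an actual call to the exponential; the name `exp_coeffs` is hypothetical.

```python
import math


def exp_coeffs(b):
    """Normalised derivatives of a = exp(b) via the recurrence (14)."""
    a = [math.exp(b[0])]
    for n in range(1, len(b)):
        a.append(sum(j * a[n - j] * b[j] for j in range(1, n + 1)) / n)
    return a


# b(t) = t around t = 0: exp(t) = 1 + t + t^2/2 + t^3/6 + t^4/24 + ...
a = exp_coeffs([0.0, 1.0, 0.0, 0.0, 0.0])
```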

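Standard logistic function

Given $a(t) = \operatorname{sig}{b(t)}$, where $\operatorname{sig}(x)$ is the standard logistic function

$$\operatorname{sig}(x) = \frac{1}{1 + e^{-x}}, \tag{15}$$

we have

$$a'(t) = \operatorname{sig}{b(t)} \left[ 1 - \operatorname{sig}{b(t)} \right] b'(t) = a(t) \left[ 1 - a(t) \right] b'(t), \tag{16}$$

which, after the introduction of the auxiliary function

$$c(t) = a^2(t), \tag{17}$$

becomes

$$a'(t) = \left[ a(t) - c(t) \right] b'(t). \tag{18}$$

After applying the normalised derivative of order $n - 1$ to both sides, we can use (1), (2) and (3) to obtain, for $n > 0$:

$$a^{[n]}(t) = \frac{1}{n} \sum_{j=1}^{n} j \left[ a^{[n-j]}(t) - c^{[n-j]}(t) \right] b^{[j]}(t). \tag{19}$$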

Logarithms

Natural logarithm

Given $a(t) = \log{b(t)}$, we have

$$a'(t) = \frac{b'(t)}{b(t)}, \tag{20}$$

or, equivalently,

$$b(t) a'(t) = b'(t). \tag{21}$$

We can now apply the normalised derivative of order $n - 1$ to both sides, use (1) and (2) and re-arrange to obtain, for $n > 0$:

$$a^{[n]}(t) = \frac{1}{n b^{[0]}(t)} \left[ n b^{[n]}(t) - \sum_{j=1}^{n-1} j b^{[n-j]}(t) a^{[j]}(t) \right]. \tag{22}$$
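
A minimal sketch of (22), assuming a positive order-0 coefficient of b. The function name `log_coeffs` is an assumption for this example.

```python
import math


def log_coeffs(b):
    """Normalised derivatives of a = log(b) via the recurrence (22)."""
    a = [math.log(b[0])]
    for n in range(1, len(b)):
        acc = n * b[n] - sum(j * b[n - j] * a[j] for j in range(1, n))
        a.append(acc / (n * b[0]))
    return a


# b(t) = 1 + t around t = 0: log(1 + t) = t - t^2/2 + t^3/3 - ...
a = log_coeffs([1.0, 1.0, 0.0, 0.0])
```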

Trigonometric functions

Tangent

Given $a(t) = \tan{b(t)}$, we have

$$a'(t) = \left[ \tan^2{b(t)} + 1 \right] b'(t) = a^2(t) b'(t) + b'(t), \tag{23}$$

which, after the introduction of the auxiliary function

$$c(t) = a^2(t), \tag{24}$$

becomes

$$a'(t) = c(t) b'(t) + b'(t). \tag{25}$$

After applying the normalised derivative of order $n - 1$ to both sides, we can use (1), (2) and (3) to obtain, for $n > 0$:

$$a^{[n]}(t) = \frac{1}{n} \sum_{j=1}^{n} j c^{[n-j]}(t) b^{[j]}(t) + b^{[n]}(t). \tag{26}$$
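
Because the auxiliary function in (24) depends on a itself, the two sequences have to be advanced together: (26) at order n needs c only up to order n - 1, after which c at order n follows from (2). A sketch of this interleaving, with the hypothetical name `tan_coeffs`:

```python
import math


def tan_coeffs(b):
    """Normalised derivatives of a = tan(b) via (26), with c = a**2 updated alongside."""
    a = [math.tan(b[0])]
    c = [a[0] ** 2]
    for n in range(1, len(b)):
        # (26): uses c[0], ..., c[n-1] only.
        a.append(sum(j * c[n - j] * b[j] for j in range(1, n + 1)) / n + b[n])
        # Leibniz rule (2) applied to c = a * a, now that a[n] is known.
        c.append(sum(a[n - j] * a[j] for j in range(n + 1)))
    return a


# b(t) = t around t = 0: tan(t) = t + t^3/3 + 2 t^5/15 + ...
a = tan_coeffs([0.0, 1.0, 0.0, 0.0, 0.0, 0.0])
```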

Inverse trigonometric functions

Inverse sine

Given $a(t) = \arcsin{b(t)}$, we have

$$a'(t) = \frac{b'(t)}{\sqrt{1 - b^2(t)}}, \tag{27}$$

or, equivalently,

$$a'(t) \sqrt{1 - b^2(t)} = b'(t). \tag{28}$$

We introduce the auxiliary function

$$c(t) = \sqrt{1 - b^2(t)}, \tag{29}$$

so that (28) can be rewritten as

$$a'(t) c(t) = b'(t). \tag{30}$$

Applying the normalised derivative of order $n - 1$ to both sides yields, via (1):

$$\left[ a'(t) c(t) \right]^{[n-1]} = n b^{[n]}(t). \tag{31}$$

We can now apply the general Leibniz rule (2) to the left-hand side and re-arrange the terms to obtain, for $n > 0$:

$$a^{[n]}(t) = \frac{1}{n c^{[0]}(t)} \left[ n b^{[n]}(t) - \sum_{j=1}^{n-1} j c^{[n-j]}(t) a^{[j]}(t) \right]. \tag{32}$$

Inverse cosine

The derivation is identical to the inverse sine, apart from a sign change. Given $a(t) = \arccos{b(t)}$, the final result is, for $n > 0$:

$$a^{[n]}(t) = -\frac{1}{n c^{[0]}(t)} \left[ n b^{[n]}(t) + \sum_{j=1}^{n-1} j c^{[n-j]}(t) a^{[j]}(t) \right], \tag{33}$$

with $c(t)$ defined as:

$$c(t) = \sqrt{1 - b^2(t)}. \tag{34}$$

Inverse tangent

Given $a(t) = \arctan{b(t)}$, we have

$$a'(t) = \frac{b'(t)}{1 + b^2(t)}, \tag{35}$$

or, equivalently,

$$a'(t) \left[ 1 + b^2(t) \right] = b'(t). \tag{36}$$

We introduce the auxiliary function

$$c(t) = b^2(t), \tag{37}$$

so that (36) can be rewritten as

$$a'(t) + a'(t) c(t) = b'(t). \tag{38}$$

Applying the normalised derivative of order $n - 1$ to both sides yields, via (1) and (3):

$$n a^{[n]}(t) + \left[ a'(t) c(t) \right]^{[n-1]} = n b^{[n]}(t). \tag{39}$$

With the help of the general Leibniz rule (2), after re-arranging we obtain, for $n > 0$:

$$a^{[n]}(t) = \frac{1}{n \left[ c^{[0]}(t) + 1 \right]} \left[ n b^{[n]}(t) - \sum_{j=1}^{n-1} j c^{[n-j]}(t) a^{[j]}(t) \right]. \tag{40}$$
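
Here the auxiliary function (37) depends only on b, so it can be computed up front before running (40). A sketch under that assumption, with the hypothetical name `atan_coeffs`:

```python
import math


def atan_coeffs(b):
    """Normalised derivatives of a = arctan(b) via (40), with c = b**2 from (2)."""
    order = len(b)
    c = [sum(b[n - j] * b[j] for j in range(n + 1)) for n in range(order)]
    a = [math.atan(b[0])]
    for n in range(1, order):
        acc = n * b[n] - sum(j * c[n - j] * a[j] for j in range(1, n))
        a.append(acc / (n * (c[0] + 1)))
    return a


# b(t) = t around t = 0: arctan(t) = t - t^3/3 + t^5/5 - ...
a = atan_coeffs([0.0, 1.0, 0.0, 0.0, 0.0, 0.0])
```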

Two-argument inverse tangent

Given $a(t) = \operatorname{arctan2}\left( b(t), c(t) \right)$, we have

$$a'(t) = \frac{c(t) b'(t) - b(t) c'(t)}{b^2(t) + c^2(t)}. \tag{41}$$

After the introduction of the auxiliary function

$$d(t) = b^2(t) + c^2(t), \tag{42}$$

(41) can be rewritten as

$$d(t) a'(t) = c(t) b'(t) - b(t) c'(t). \tag{43}$$

We can now apply the normalised derivative of order $n - 1$ to both sides, and, via (2), obtain, for $n > 0$:

$$a^{[n]}(t) = \frac{1}{n d^{[0]}(t)} \left[ n \left( c^{[0]}(t) b^{[n]}(t) - b^{[0]}(t) c^{[n]}(t) \right) + \sum_{j=1}^{n-1} j \left( c^{[n-j]}(t) b^{[j]}(t) - b^{[n-j]}(t) c^{[j]}(t) - d^{[n-j]}(t) a^{[j]}(t) \right) \right]. \tag{44}$$

Hyperbolic functions

Hyperbolic sine

Given $a(t) = \sinh{b(t)}$, we have

$$a'(t) = b'(t) \cosh{b(t)}. \tag{45}$$

We introduce the auxiliary function

$$c(t) = \cosh{b(t)}, \tag{46}$$

so that (45) can be rewritten as

$$a'(t) = c(t) b'(t). \tag{47}$$

We can now apply the normalised derivative of order $n - 1$ to both sides, and, via (2), obtain, for $n > 0$:

$$a^{[n]}(t) = \frac{1}{n} \sum_{j=1}^{n} j c^{[n-j]}(t) b^{[j]}(t). \tag{48}$$

Hyperbolic cosine

Given $a(t) = \cosh{b(t)}$, the process of deriving $a^{[n]}(t)$ is identical to the hyperbolic sine. After the definition of the auxiliary function

$$s(t) = \sinh{b(t)}, \tag{49}$$

the final result, for $n > 0$, is:

$$a^{[n]}(t) = \frac{1}{n} \sum_{j=1}^{n} j s^{[n-j]}(t) b^{[j]}(t). \tag{50}$$
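
Since (48) needs the hyperbolic cosine and (50) needs the hyperbolic sine, one natural way to use them is to compute both sequences in lockstep. The sketch below does this; the name `sinh_cosh_coeffs` and the choice to return both lists are assumptions of this example.

```python
import math


def sinh_cosh_coeffs(b):
    """Normalised derivatives of sinh(b) and cosh(b), advanced together via (48) and (50)."""
    s = [math.sinh(b[0])]
    c = [math.cosh(b[0])]
    for n in range(1, len(b)):
        # Both updates read only orders 0, ..., n-1 of the other sequence.
        s_n = sum(j * c[n - j] * b[j] for j in range(1, n + 1)) / n
        c_n = sum(j * s[n - j] * b[j] for j in range(1, n + 1)) / n
        s.append(s_n)
        c.append(c_n)
    return s, c


# b(t) = t around t = 0: sinh(t) = t + t^3/6 + ..., cosh(t) = 1 + t^2/2 + t^4/24 + ...
s, c = sinh_cosh_coeffs([0.0, 1.0, 0.0, 0.0, 0.0])
```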

Hyperbolic tangent

Given $a(t) = \tanh{b(t)}$, the process of deriving $a^{[n]}(t)$ is identical to the tangent, apart from a sign change. After the definition of the auxiliary function

$$c(t) = a^2(t), \tag{51}$$

the final result, for $n > 0$, is:

$$a^{[n]}(t) = b^{[n]}(t) - \frac{1}{n} \sum_{j=1}^{n} j c^{[n-j]}(t) b^{[j]}(t). \tag{52}$$

Inverse hyperbolic functions

Inverse hyperbolic sine

Given $a(t) = \operatorname{arsinh}{b(t)}$, the process of deriving $a^{[n]}(t)$ is identical to the inverse sine, apart from a sign change. After the definition of the auxiliary function

$$c(t) = \sqrt{1 + b^2(t)}, \tag{53}$$

the final result, for $n > 0$, is:

$$a^{[n]}(t) = \frac{1}{n c^{[0]}(t)} \left[ n b^{[n]}(t) - \sum_{j=1}^{n-1} j c^{[n-j]}(t) a^{[j]}(t) \right]. \tag{54}$$

Inverse hyperbolic cosine

Given $a(t) = \operatorname{arcosh}{b(t)}$, the process of deriving $a^{[n]}(t)$ is identical to the inverse hyperbolic sine, apart from a sign change. After the definition of the auxiliary function

$$c(t) = \sqrt{b^2(t) - 1}, \tag{55}$$

the final result, for $n > 0$, is:

$$a^{[n]}(t) = \frac{1}{n c^{[0]}(t)} \left[ n b^{[n]}(t) - \sum_{j=1}^{n-1} j c^{[n-j]}(t) a^{[j]}(t) \right]. \tag{56}$$

Inverse hyperbolic tangent

Given $a(t) = \operatorname{artanh}{b(t)}$, the process of deriving $a^{[n]}(t)$ is identical to the inverse tangent, apart from a sign change. After the definition of the auxiliary function

$$c(t) = b^2(t), \tag{57}$$

the final result, for $n > 0$, is:

$$a^{[n]}(t) = \frac{1}{n \left[ 1 - c^{[0]}(t) \right]} \left[ n b^{[n]}(t) + \sum_{j=1}^{n-1} j c^{[n-j]}(t) a^{[j]}(t) \right]. \tag{58}$$

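Special functions

Error function

Given $a(t) = \operatorname{erf}{b(t)}$, we have

$$a'(t) = \frac{2}{\sqrt{\pi}} \exp\left[ -b^2(t) \right] b'(t), \tag{59}$$

which, after the introduction of the auxiliary function

$$c(t) = \exp\left[ -b^2(t) \right], \tag{60}$$

becomes

$$a'(t) = \frac{2}{\sqrt{\pi}} c(t) b'(t). \tag{61}$$

After applying the normalised derivative of order $n - 1$ to both sides, we can use (1) and (2) to obtain, for $n > 0$:

$$a^{[n]}(t) = \frac{1}{n} \frac{2}{\sqrt{\pi}} \sum_{j=1}^{n} j c^{[n-j]}(t) b^{[j]}(t). \tag{62}$$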
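
The auxiliary function (60) can itself be obtained by chaining earlier results: the Leibniz rule (2) for the square, followed by the exponential recurrence (14). The sketch below composes them before applying (62); the name `erf_coeffs` and the intermediate list `m` are assumptions of this example.

```python
import math


def erf_coeffs(b):
    """Normalised derivatives of a = erf(b) via (62), with c = exp(-b**2)."""
    order = len(b)
    # m(t) = -b(t)^2, from the Leibniz rule (2).
    m = [-sum(b[n - j] * b[j] for j in range(n + 1)) for n in range(order)]
    # c(t) = exp(m(t)), from the exponential recurrence (14).
    c = [math.exp(m[0])]
    for n in range(1, order):
        c.append(sum(j * c[n - j] * m[j] for j in range(1, n + 1)) / n)
    # (62) proper.
    k = 2 / math.sqrt(math.pi)
    a = [math.erf(b[0])]
    for n in range(1, order):
        a.append(k * sum(j * c[n - j] * b[j] for j in range(1, n + 1)) / n)
    return a


# b(t) = t around t = 0: erf(t) = 2/sqrt(pi) * (t - t^3/3 + t^5/10 - ...)
a = erf_coeffs([0.0, 1.0, 0.0, 0.0, 0.0, 0.0])
```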

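Celestial mechanics

Kepler’s eccentric anomaly

The eccentric anomaly is the bivariate function $E = E(e, M)$ implicitly defined by the transcendental equation

$$M = E - e \sin{E}, \tag{63}$$

with $e \in \left[ 0, 1 \right)$. Given $a(t) = E\left( e(t), M(t) \right)$, we have

$$a'(t) = \frac{\partial E}{\partial e} e'(t) + \frac{\partial E}{\partial M} M'(t), \tag{64}$$

where the partial derivatives are

$$\begin{cases} \frac{\partial E}{\partial e} = \frac{\sin{E}}{1 - e \cos{E}}, \\ \frac{\partial E}{\partial M} = \frac{1}{1 - e \cos{E}}. \end{cases} \tag{65}$$

Expanding the partial derivatives yields

$$a'(t) = \frac{e'(t) \sin{a(t)} + M'(t)}{1 - e(t) \cos{a(t)}}, \tag{66}$$

or, equivalently,

$$a'(t) - a'(t) e(t) \cos{a(t)} = e'(t) \sin{a(t)} + M'(t). \tag{67}$$

We can now introduce the auxiliary functions

$$\begin{cases} c(t) = e(t) \cos{a(t)}, \\ d(t) = \sin{a(t)}, \end{cases} \tag{68}$$

so that (67) can be rewritten as

$$a'(t) - a'(t) c(t) = e'(t) d(t) + M'(t). \tag{69}$$

After applying the normalised derivative of order $n - 1$ to both sides, we can use (1) and (2) and re-arrange to obtain, for $n > 0$:

$$a^{[n]}(t) = \frac{1}{n \left( 1 - c^{[0]}(t) \right)} \left[ n \left( e^{[n]}(t) d^{[0]}(t) + M^{[n]}(t) \right) + \sum_{j=1}^{n-1} j \left( c^{[n-j]}(t) a^{[j]}(t) + d^{[n-j]}(t) e^{[j]}(t) \right) \right]. \tag{70}$$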
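
A sketch of how (70) might be iterated in practice. The auxiliary functions in (68) depend on a, so they are updated after each new order. This example assumes the standard recurrences for the sine and cosine of a(t), which are not derived in these notes (they mirror (48) and (50), with a minus sign for the cosine), and it takes the order-0 value `E0` as an input rather than solving (63). All names are hypothetical.

```python
import math


def kepler_coeffs(e, M, E0):
    """Normalised derivatives of a = E(e(t), M(t)) via (70).

    e, M: lists of normalised derivatives; E0: eccentric anomaly at order 0.
    """
    a = [E0]
    d = [math.sin(E0)]   # d = sin(a), as in (68)
    f = [math.cos(E0)]   # f = cos(a), helper introduced for this sketch
    c = [e[0] * f[0]]    # c = e * cos(a), as in (68)
    for n in range(1, len(e)):
        acc = n * (e[n] * d[0] + M[n])
        acc += sum(j * (c[n - j] * a[j] + d[n - j] * e[j]) for j in range(1, n))
        a.append(acc / (n * (1 - c[0])))
        # Assumed sine/cosine recurrences, then the Leibniz rule (2) for c.
        d_n = sum(j * f[n - j] * a[j] for j in range(1, n + 1)) / n
        f_n = -sum(j * d[n - j] * a[j] for j in range(1, n + 1)) / n
        d.append(d_n)
        f.append(f_n)
        c.append(sum(e[n - j] * f[j] for j in range(n + 1)))
    return a


# Constant eccentricity 0.1 and M(t) = M0 + t, expanded where E = 1.
order = 8
ecc, E0 = 0.1, 1.0
e = [ecc] + [0.0] * (order - 1)
M = [E0 - ecc * math.sin(E0), 1.0] + [0.0] * (order - 2)
a = kepler_coeffs(e, M, E0)
```

A quick sanity check is to evaluate the truncated series at a small t and substitute it back into (63): the residual should be close to zero.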

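Eccentric longitude

The eccentric longitude is the trivariate function $F = F(h, k, \lambda)$ implicitly defined by the transcendental equation

$$\lambda = F + h \cos{F} - k \sin{F}, \tag{71}$$

with $h^2 + k^2 < 1$. Given $a(t) = F\left( h(t), k(t), \lambda(t) \right)$, we have

$$a'(t) = \frac{k'(t) \sin{a(t)} - h'(t) \cos{a(t)} + \lambda'(t)}{1 - h(t) \sin{a(t)} - k(t) \cos{a(t)}}. \tag{72}$$

After the introduction of the auxiliary functions

$$\begin{cases} c(t) = h(t) \sin{a(t)}, \\ d(t) = k(t) \cos{a(t)}, \\ e(t) = \sin{a(t)}, \\ f(t) = \cos{a(t)}, \end{cases} \tag{73}$$

we can then proceed in the same way as explained for the eccentric anomaly. The final result, for $n > 0$, is:

$$a^{[n]}(t) = \frac{1}{n \left( 1 - c^{[0]}(t) - d^{[0]}(t) \right)} \left\{ n \left( k^{[n]}(t) e^{[0]}(t) - h^{[n]}(t) f^{[0]}(t) + \lambda^{[n]}(t) \right) + \sum_{j=1}^{n-1} j \left[ a^{[j]}(t) \left( c^{[n-j]}(t) + d^{[n-j]}(t) \right) + k^{[j]}(t) e^{[n-j]}(t) - h^{[j]}(t) f^{[n-j]}(t) \right] \right\}. \tag{74}$$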

Time functions

Time polynomials

Given the time polynomial of order $n$

$$p_n(t) = \sum_{i=0}^{n} a_i t^i, \tag{75}$$

its derivative of order $j$ is

$$\left( p_n(t) \right)^{(j)} = \sum_{i=j}^{n} (i)_j a_i t^{i-j}, \tag{76}$$

where $(i)_j$ is the falling factorial. The normalised derivative of order $j$ is

$$\left( p_n(t) \right)^{[j]} = \frac{1}{j!} \sum_{i=j}^{n} (i)_j a_i t^{i-j}, \tag{77}$$

which, with the help of elementary relations involving factorials and after re-arranging the indices, can be rewritten as

$$\left( p_n(t) \right)^{[j]} = \sum_{i=0}^{n-j} \binom{i+j}{j} a_{i+j} t^i. \tag{78}$$
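
Formula (78) translates almost verbatim into code. A minimal sketch, where the function name and the convention that `coeffs[i]` holds the coefficient of the i-th power of t are assumptions of this example:

```python
from math import comb


def poly_normalised_derivative(coeffs, j, t):
    """Normalised derivative of order j of the polynomial sum(coeffs[i] * t**i), via (78)."""
    n = len(coeffs) - 1
    return sum(comb(i + j, j) * coeffs[i + j] * t ** i for i in range(n - j + 1))


# p(t) = 1 + 2t + 3t^2 + 4t^3, so p''(t) / 2! = 3 + 12t.
value = poly_normalised_derivative([1, 2, 3, 4], 2, 2)
```

For j larger than the polynomial order the sum is empty and the result is zero, as expected.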