
The Math Behind Deep Learning

(Recall the cross-entropy loss $L = -\sum_i \big(c_i \ln p_i + (1 - c_i)\ln(1 - p_i)\big)$, and note that when differentiating with respect to a fixed $p_i$, all the terms in the sum are constant except the chosen one.) Therefore, we have:

$$\frac{\partial L}{\partial p_i} = -\frac{\partial (c_i \ln p_i)}{\partial p_i} - \frac{\partial \big((1 - c_i)\ln(1 - p_i)\big)}{\partial p_i} = -\frac{c_i}{p_i} - \frac{1 - c_i}{1 - p_i}\,\frac{\partial (1 - p_i)}{\partial p_i}$$

(Applying the partial derivative to the sum and considering that $\ln'(x) = \frac{1}{x}$.)

Therefore, since $\frac{\partial (1 - p_i)}{\partial p_i} = -1$, we have:

$$\frac{\partial L}{\partial p_i} = -\frac{c_i}{p_i} + \frac{1 - c_i}{1 - p_i}$$
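As a quick sanity check, here is a minimal NumPy sketch (illustrative only, not from the book; the `cross_entropy` helper and the sample values are assumptions) comparing the analytic derivative against central finite differences:

```python
import numpy as np

# Cross-entropy loss L = -sum_i [c_i ln p_i + (1 - c_i) ln(1 - p_i)]
def cross_entropy(c, p):
    return -np.sum(c * np.log(p) + (1 - c) * np.log(1 - p))

c = np.array([0.0, 1.0, 0.0])   # targets
p = np.array([0.2, 0.7, 0.1])   # predicted probabilities
eps = 1e-6

# Analytic formula: dL/dp_i = -c_i / p_i + (1 - c_i) / (1 - p_i)
analytic = -c / p + (1 - c) / (1 - p)

# Central finite differences, one coordinate at a time
numeric = np.array([
    (cross_entropy(c, p + eps * np.eye(3)[i]) -
     cross_entropy(c, p - eps * np.eye(3)[i])) / (2 * eps)
    for i in range(3)
])
print(np.allclose(analytic, numeric, atol=1e-4))  # True
```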

Now let's compute the other part, $\frac{\partial p_i}{\partial \text{score}_i}$, where $p_i$ is the softmax function defined as

$$\sigma(x_j) = \frac{e^{x_j}}{\sum_i e^{x_i}}.$$

The derivative is:

$$\frac{\partial \sigma(x_j)}{\partial x_k} = \sigma(x_j)\big(1 - \sigma(x_j)\big) \quad \text{if } j = k$$

and

$$\frac{\partial \sigma(x_j)}{\partial x_k} = -\sigma(x_j)\,\sigma(x_k) \quad \text{if } j \neq k.$$

Using the Kronecker delta $\delta_{jk} = \begin{cases} 1 & \text{for } j = k \\ 0 & \text{otherwise} \end{cases}$ we have:

$$\frac{\partial \sigma(x_j)}{\partial x_k} = \sigma(x_j)\big(\delta_{jk} - \sigma(x_k)\big)$$
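This Kronecker-delta form is the softmax Jacobian written entry by entry. As an illustrative sketch (not from the book; the `softmax` helper and the sample scores are assumptions), the elementwise formula agrees with the compact matrix identity $J = \operatorname{diag}(s) - ss^{\top}$:

```python
import numpy as np

# Softmax Jacobian: J[j, k] = s_j * (delta_jk - s_k)
def softmax(x):
    e = np.exp(x - x.max())   # shift by the max for numerical stability
    return e / e.sum()

s = softmax(np.array([1.0, 2.0, 0.5]))
n = len(s)

# Entry by entry, straight from the Kronecker-delta formula
J_elementwise = np.array([[s[j] * ((j == k) - s[k]) for k in range(n)]
                          for j in range(n)])

# Equivalent vectorized form: diag(s) - s s^T
J_vectorized = np.diag(s) - np.outer(s, s)
print(np.allclose(J_elementwise, J_vectorized))  # True
```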

Therefore, considering that we are computing the partial derivative with respect to a single score, all the components are zero except the chosen one, and we have:

$$\frac{\partial p_i}{\partial \text{score}_i} = p_i(1 - p_i)$$
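In the same illustrative spirit (again an assumption-laden sketch rather than the book's code), a finite-difference check confirms this diagonal term:

```python
import numpy as np

# Diagonal of the softmax Jacobian: d p_i / d score_i = p_i * (1 - p_i)
def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

scores = np.array([1.0, 2.0, 0.5])
p = softmax(scores)
eps = 1e-6
for i in range(len(scores)):
    step = eps * np.eye(len(scores))[i]
    numeric = (softmax(scores + step)[i] - softmax(scores - step)[i]) / (2 * eps)
    assert np.isclose(numeric, p[i] * (1 - p[i]), atol=1e-4)
print("diagonal terms match p_i * (1 - p_i)")
```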

