
Neuron Networks and Fourier Series

Gene Foxwell, Carleton University, gene.foxwell@gmail.com

July 30, 2004

1 Abstract

In this paper we give a brief abstraction of the idea of Artificial Neuron Networks (ANN). We then show how to use an ANN to generate the Fourier Sine Series of a curve which fits a set of given data points. Finally, we end the paper with a proof that the Fourier Sine Series of any continuous function can be approximated by an ANN combined with the method shown.

2 Artificial Neurons

In order to justify the abstraction used, we begin by examining a few important properties of human Neurons [1].

Useful Properties of Human Neurons

1. Neurons communicate through excitatory and inhibitory connections (or synapses).

2. These synapses can be electrical or chemical. Chemical synapses are known to release neurotransmitters which can change the properties of the synapse, i.e. make it more or less receptive to electrical stimulus.

3. These signals are combined at the soma, changing the electrical properties of the Neuron (at rest the neuron has an electric potential created by pumping ions in and out through the cell membrane), and are used by the cell to determine whether to send a signal out along the axon to the dendrites of other Neurons.

Using the above facts about Neurons, we can create an abstract structure which has proven useful for many computational and theoretical purposes. Our abstraction shall be as follows:

Abstraction of the Human Neuron

1. To each connection we assign a value $w_i \in \mathbb{R}$ (which we shall refer to as the weight), where $w_i$ represents how the $i$'th incoming synapse changes its input. So if $x \in \mathbb{R}$ is the input to the $i$'th connection, the synapse will change this signal to $w_i x$.

2. We then combine these signals using a simple summation. Thus, if $\vec{x} \in \mathbb{R}^n$ is the collection of inputs to a Neuron, and $\vec{w} \in \mathbb{R}^n$ is the collection of weights, then we define the following to represent this property of the Neuron:

$$ s(\vec{x}) = \vec{x} \cdot \vec{w} \tag{1} $$

3. We represent how the neuron responds to its signal by a function $f : \mathbb{R} \to \mathbb{R}$. Putting the properties we have so far together, we can represent a Neuron mathematically by a function $N : \mathbb{R}^n \to \mathbb{R}$, which we define below:

$$ N(\vec{x}) = f(s(\vec{x})) \tag{2} $$
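To make the three-step abstraction above concrete, here is a minimal sketch in Python; the choice of `tanh` as the response function $f$ and the example weights are illustrative assumptions, not part of the abstraction itself.

```python
import math

def neuron(x, w, f=math.tanh):
    """Abstract Neuron: weight each input, sum the weighted signals, apply f."""
    s = sum(wi * xi for wi, xi in zip(w, x))  # s(x) = x . w, equation (1)
    return f(s)                               # N(x) = f(s(x)), equation (2)

# Example: two inputs with weights 0.5 and -1.0
print(neuron([1.0, 2.0], [0.5, -1.0]))
```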

Definition Throughout the remainder of this paper, by Artificial Neuron Network (ANN) we shall mean a directed graph $G$ whose vertices represent abstract Neurons as described above and whose edges represent the direction in which a signal is passed from one vertex to another.

Definition The input layer of an ANN is the collection of Neurons which can obtain a signal from a source outside of the ANN, i.e. they receive input from the user.

Definition The output layer of an ANN is the collection of Neurons which are considered part of the final result of the ANN. For example, if the ANN was designed to predict the future direction of a fly, the output layer would be the set of Neurons which represent the final components of that direction.

All ANNs in this paper will be assumed to have no loops. Hence they can be evaluated by simply evaluating the input layer first and propagating the signal from the input layer according to the structure of the graph until the output layer is reached.
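Because the graphs are loop-free, this propagation can be sketched as a short recursive evaluation. The dictionary-based representation below is an assumption made for the example, not a structure prescribed in the paper.

```python
def evaluate(graph, weights, activation, inputs):
    """Forward-propagate a signal through a loop-free ANN.

    graph      -- dict mapping each Neuron to the list of Neurons feeding into it
                  (input-layer Neurons map to an empty list)
    weights    -- dict mapping (source, target) edges to real-valued weights
    activation -- dict mapping each Neuron to its response function f
    inputs     -- dict mapping input-layer Neurons to their external signals
    """
    values = dict(inputs)                 # the input layer is evaluated first

    def value(node):
        if node not in values:            # propagate along the edges of the graph
            s = sum(weights[(src, node)] * value(src) for src in graph[node])
            values[node] = activation[node](s)
        return values[node]

    return {node: value(node) for node in graph}
```

For the networks built in the next section the graph has only an input layer and a single output Neuron, so this recursion bottoms out after one step.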

3 Representing Functions

Consider the problem of a scientist who has collected a sample set of data. Furthermore, let us suppose that the data was collected in such a way that the scientist would like to have an analytic curve which fits the data points. As with any measurement, the scientist can only be sure of the value of each point within some error tolerance $\varepsilon$.

That is, we would like a $C^1$ function $q$ such that for each of our data points $(x^*, y^*)$ we have $\|q(x^*) - y^*\| < \varepsilon$ for some predetermined $\varepsilon$. We shall show that an ANN can be used to solve this problem.

To do this, we begin by considering a few methods via which the function would traditionally be represented.

1. Taylor Series:

$$ q(x) = \sum_{n=0}^{\infty} \frac{q^{(n)}(0)}{n!} x^n \tag{3} $$

The problem with basing a solution on this representation is clear: knowledge of the function itself is explicitly required in order to calculate the coefficients of the series. As we do not have any information other than a few points, this does not offer a very intuitive way to look at the problem.

2. Power Series:

$$ q(x) = \sum_{n=1}^{\infty} a_n x^n \tag{4} $$

In contrast to the situation with Taylor Series, knowledge of the function itself is not explicitly required in order to calculate the coefficients of the series. Our problem with this representation is more subtle. Suppose that our imaginary scientist were required to deal directly with numbers in the data set which were very large, $3.0 \times 10^6$ for example. Then, as $n$ becomes large, it is easy to see that the calculations involved in evaluating the resulting function become unwieldy and subject to computer rounding errors. It is conceivable that a clever programmer could get around this problem with some fancy code, but rather than place the burden of fixing this problem on the person implementing the solution, it would be preferable if we could avoid the problem as much as possible. With this idea in mind we turn to our next representation.

3. Fourier Sine Series:

$$ q(x) = \sum_{n=1}^{\infty} a_n \sin\!\left(\frac{n\pi x}{L}\right) \tag{5} $$

We notice that our scientist, not having been endowed with any supernatural powers, can only have collected a finite amount of data. Provided the scientist has indexed this data using positive values, we can safely assume that the inputs of the data points all lie in some interval $[0, L]$. Furthermore, as $|\sin(x)| \le 1$ for all $x \in \mathbb{R}$, this choice of series allows us to keep the size of the numbers returned by the computer within a reasonable range.

Our choice to represent the function by an infinite series was made because it actually suggests an intuitive way to solve the problem using ANNs. To see why, recall that for a tolerance $\varepsilon > 0$ there exists $N \in \mathbb{Z}^+$ such that for all $k > N$ and all $x \in [0, L]$ we have

$$ \left\| q(x) - \sum_{n=1}^{k} a_n \sin\!\left(\frac{n\pi x}{L}\right) \right\| < \varepsilon. $$

So our solution is to build an ANN whose graph consists of two layers, an input layer and an output layer. The output layer is assumed to contain one Neuron $N_o$ whose representation is the identity function, i.e. $N_o(\vec{x}) = s(\vec{x})$. By giving the input layer $k$ neurons, and giving the $n$'th such neuron (for $n = 1, \ldots, k$) the representation $N_n(x) = \sin\!\left(\frac{n\pi s(x)}{L}\right)$, we can reproduce the series approximation by choosing suitable weights. The problem now reduces to one of finding the weights.
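As a sketch, the network just described amounts to the following; the function name `sine_ann` and the sample weights are my own, chosen only for illustration.

```python
import math

def sine_ann(weights, L):
    """Two-layer ANN: the n'th input Neuron computes sin(n*pi*x/L) and the single
    output Neuron (the identity) forms the weighted sum of those signals."""
    def q(x):
        hidden = [math.sin(n * math.pi * x / L) for n in range(1, len(weights) + 1)]
        return sum(w * h for w, h in zip(weights, hidden))
    return q

# Example: a guessed set of three weights on the interval [0, L] = [0, 10]
q = sine_ann([1.0, 0.5, -0.25], L=10.0)
print(q(2.5))
```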

4 Method Of Gradient Descent

Assume that we begin this task by using the easiest possible method to implement: we guess, and trust that luck will make our guess the correct one. How far off is this guess? We will define an error metric in the following manner. First, recall from the statement of the problem that we have a set of data points $(x, y)$ to begin with. Now let $A$ be the collection of all pairs $(x, y^*)$ such that $x$ comes from the input of the original data set and $y^*$ is the result of evaluating our ANN with the input $x$ given to each of our input Neurons. Assuming that we have $N$ data points, we can evaluate the error using the metric below (which works out to be nothing more than the average squared error).

$$ E(y, y^*) = \frac{1}{N} \sum_{i=1}^{N} (y_i - y_i^*)^2 \tag{6} $$
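In code, equation (6) is simply the mean squared difference between the measured outputs and the network's outputs; `ann` below stands for a network such as the `sine_ann` sketch given earlier.

```python
def error(data, ann):
    """Average squared error of the network over the data set, equation (6)."""
    return sum((y - ann(x)) ** 2 for x, y in data) / len(data)
```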

At this point, we recall the remaining fact about Human Neurons which we have not yet explicitly built into our model: the synapses can adapt, changing their effect on the signal that is received. As we wish for the system to move closer to a correct answer (as much as this is possible given the data), it is natural to use a method which moves the weights in such a way as to be closer to the true answer. This is where the Gradient Descent Algorithm comes in, a common algorithm found in the training of ANNs in computer science. To use it, we should first calculate how the error changes if we change one of the input weights. (Note: in the following calculation $\vec{x} = (x_1, \ldots, x_N)$ is the vector of results from the input Neurons.)



$$ E(y, y^*) = \frac{1}{N} \sum_{i=1}^{N} \left(y_i - N_o(\vec{x})\right)^2 $$

$$ \frac{\partial E}{\partial w_i} = \frac{\partial E}{\partial N_o}\,\frac{\partial N_o}{\partial w_i}, \qquad \frac{\partial E}{\partial N_o} = -(y_i - N_o(\vec{x})), \qquad \frac{\partial N_o}{\partial w_i} = x_i $$

Combining all of this together we get

$$ \frac{\partial E}{\partial w_i} = -(y_i - N_o(\vec{x}))\,x_i \tag{7} $$

(Here the derivative is taken one data point at a time, with the constant factors absorbed into the learning rate introduced below.)

It is useful to recall that, because of the way we designed the ANN, equation (7) can actually be written as $\frac{\partial E}{\partial w_i} = -(y_i - N_o(\vec{x}))\sin\!\left(\frac{i\pi x}{L}\right)$. Using this, we can now make use of the Gradient Descent Algorithm (shown below) to train the network.

5 Gradient Descent Algorithm

Given a set of data points $(x_i, y_i)$ and a tolerance $\varepsilon$:

1. Initialize all weights to random values.

2. Repeat until $\Delta E < \varepsilon$:

   (a) For each weight $w_i$ set $\Delta w_i := 0$.

   (b) For each data point $(x_i, y_i)$:

      i. Set the inputs to $x_i$.

      ii. Compute the outputs.

      iii. For each weight $w_i$, let $\Delta w_i = \Delta w_i + (y_i - N_o(\vec{x}))\sin\!\left(\frac{i\pi x}{L}\right)$.

   (c) For each weight set $w_i = w_i + \alpha \Delta w_i$.

Of key importance in the above algorithm is the value $\alpha$ which appears in the last step. If this value is set too high, the algorithm will overstep the $\varepsilon$-ball around the target data point. Setting it too low will cause the algorithm to run for far too long. A value of $\frac{1}{2}\varepsilon$ tends to work well in most cases.
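Putting sections 3 through 5 together, a minimal Python sketch of the training procedure might look as follows. The `max_epochs` guard, the initialization range for the random weights, and the stopping test on the change in the average error are assumptions I have added to make the sketch runnable; they are not specified in the algorithm above.

```python
import math
import random

def train(data, k, L, alpha, tol, max_epochs=10000):
    """Gradient descent training of the two-layer sine ANN (sections 3-5).

    data  -- list of (x, y) pairs with x in [0, L]
    k     -- number of input Neurons (terms kept from the sine series)
    alpha -- learning rate
    tol   -- stop once the change in the average error falls below tol
    """
    w = [random.uniform(-1.0, 1.0) for _ in range(k)]       # 1. random initial weights

    def output(x):                                           # N_o(x) for the current weights
        return sum(w[n - 1] * math.sin(n * math.pi * x / L) for n in range(1, k + 1))

    def error():
        return sum((y - output(x)) ** 2 for x, y in data) / len(data)

    prev = error()
    for _ in range(max_epochs):                              # 2. repeat until the error stops changing
        dw = [0.0] * k                                       # 2(a) reset the accumulators
        for x, y in data:                                    # 2(b) loop over the data points
            out = output(x)
            for n in range(1, k + 1):                        # 2(b)iii accumulate the updates
                dw[n - 1] += (y - out) * math.sin(n * math.pi * x / L)
        w = [wi + alpha * dwi for wi, dwi in zip(w, dw)]     # 2(c) apply the updates
        cur = error()
        if abs(prev - cur) < tol:
            break
        prev = cur
    return w
```

For instance, calling `train(data, k=5, L=10.0, alpha=0.05, tol=1e-9)` on a list of $(x, y)$ samples would return the trained weights, which are the coefficients of the truncated sine series.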

6 The Main Theorem

We have now shown that, given a set of data points, we can generate a Fourier sine series for a continuous function which approximates the data set. What we would also like is that, given a function and enough data points on that function, this method generates the Fourier sine series of the given function. This is exactly what we show in the theorem below.

Theorem 1. The Fourier sine series of any continuous function $q : [0, L] \to \mathbb{R}$ can be approximated by an ANN.

Proof. Let $\varepsilon > 0$ be arbitrary. On $[0, L]$ we know that $q(x) = \sum_{n=1}^{\infty} a_n \sin\!\left(\frac{n\pi x}{L}\right)$. Since $[0, L]$ is compact, $q([0, L]) := A$ is also compact. Choose $S := \{x_1, \ldots, x_N\} \subset A$ such that if $x' \in A$ then $\exists x'' \in S$ with $\|x' - x''\| < \varepsilon$. We can do this since $A$ is compact. Now, choose $N_0 \in \mathbb{Z}^+$ such that $\left\| \sum_{n=1}^{N_0} a_n \sin\!\left(\frac{n\pi x}{L}\right) - q(x) \right\| < \frac{\varepsilon}{8}$. Construct an ANN in the manner discussed in the paper so far with $N_0$ input Neurons. Define $D$ to be the set $\{(x, y) \in [0, L] \times \mathbb{R} \mid \exists y \in S \text{ with } q(x) = y\}$. Using the Gradient Descent Algorithm, train this network with a tolerance of $\frac{\varepsilon}{8}$, using $D$ as the set of data points. This will give us a function

$$ q'(x) = \sum_{n=1}^{N_0} w_n \sin\!\left(\frac{n\pi x}{L}\right). $$

Now, if for $x^* \in [0, L]$ we have $q(x^*) \in S$, then the following inequality holds:

$$ \|q'(x^*) - q(x^*)\| = \left\| \sum_{n=1}^{N_0} w_n \sin\!\left(\frac{n\pi x^*}{L}\right) - \sum_{n=1}^{N_0} a_n \sin\!\left(\frac{n\pi x^*}{L}\right) + \sum_{n=1}^{N_0} a_n \sin\!\left(\frac{n\pi x^*}{L}\right) - q(x^*) \right\| \le \frac{\varepsilon}{8} + \frac{\varepsilon}{8} = \frac{\varepsilon}{4}. $$

Since $[0, L]$ is compact, both $q'(x)$ and $q(x)$ are uniformly continuous. Hence, we can choose a $\delta > 0$ such that if $\|x - y\| < \delta$ then $\|q(x) - q(y)\| < \frac{\varepsilon}{2}$. Also, we can find $\{x_1, \ldots, x_M\} := B$ in $[0, L]$ such that if $x' \in [0, L]$ then $\exists x'' \in B$ with $\|x' - x''\| < \min\{\delta, \frac{\varepsilon}{8}\}$. Define $T := \{(x_1, q'(x_1)), \ldots, (x_M, q'(x_M))\}$. Using the training set $T \cup D$, train the network with a tolerance of $\min\{\delta, \frac{\varepsilon}{8}\}$. Note that for each $y \in [0, L]$ we can find a point $x^*$ in our training set such that $\|q'(y) - q'(x^*)\| < \frac{\varepsilon}{2}$, due to our choice of tolerances. Finally, for each $y \in [0, L]$ we have:

$$ \|q'(y) - q(y)\| = \|q'(y) - q'(x^*) + q'(x^*) - q(y)\| \le \frac{\varepsilon}{2} + \|q'(x^*) - q(y)\| $$

$$ \|q'(x^*) - q(y)\| = \|q'(x^*) - q(x^*) + q(x^*) - q(y)\| \le \frac{\varepsilon}{4} + \frac{\varepsilon}{4} = \frac{\varepsilon}{2} $$

Hence $\|q'(y) - q(y)\| < \varepsilon$.

7 Bibliography

[1] Dayan, Peter & Abbott, L. F. (2001). Theoretical Neuroscience. MIT Press.
