
Neuron Networks and Fourier Series

Gene Foxwell, Carleton University, gene.foxwell@gmail.com

July 30, 2004

1 Abstract

In this paper we give a brief abstraction of the idea of Artificial Neuron Networks (ANN). We then show how to use an ANN to generate the Fourier Sine Series of a curve which fits a set of given data points. Finally, we end the paper with a proof that the Fourier Sine Series of any continuous function can be approximated by an ANN combined with the method shown.

2 Artificial Neurons

In order to justify the abstraction used, we begin by examining a few important properties of human Neurons [1].

Useful Properties of Human Neurons

1. Neurons communicate through excitatory and inhibitory connections (or synapses).

2. These synapses can be electrical or chemical. Chemical synapses are known to release neurotransmitters which can change the properties of the synapse, i.e. make it more or less receptive to electrical stimulus.

3. These signals are combined at the soma, changing the electrical properties of the Neuron (at rest the neuron has an electric potential created by pumping ions in and out through the cell membrane), and are used by the cell to determine whether to send a signal out along the axon to the dendrites of other Neurons.

Using the above facts about Neurons, we can create an abstract structure which has proven useful for many computational and theoretical purposes. Our abstraction shall be as follows:

Abstraction of the Human Neuron

1. To each connection we assign a value $w_i \in \mathbb{R}$ (which we shall refer to as the weight), where $w_i$ represents how the $i$'th incoming synapse changes its input. So if $x \in \mathbb{R}$ is the input to the $i$'th connection, the synapse will change this signal to $w_i x$.

2. We then combine these signals using a simple summation. Thus, if $\vec{x} \in \mathbb{R}^n$ is the collection of inputs to a Neuron, and $\vec{w} \in \mathbb{R}^n$ is the collection of weights, then we define the following to represent this property of the Neuron:

$$ s(\vec{x}) = \vec{x} \cdot \vec{w} \tag{1} $$

3. We represent how the neuron responds to its signal by a function $f : \mathbb{R} \to \mathbb{R}$. Putting the properties we have so far together, we can represent a Neuron mathematically by a function $N : \mathbb{R}^n \to \mathbb{R}$, which we define below:

$$ N(\vec{x}) = f(s(\vec{x})) \tag{2} $$
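To make the three-step abstraction above concrete, here is a minimal sketch in Python; the choice of `tanh` as the response function $f$ and the example weights are illustrative assumptions, not part of the abstraction itself.

```python
import math

def neuron(x, w, f=math.tanh):
    """Abstract Neuron: weight each input, sum the weighted signals, apply f."""
    s = sum(wi * xi for wi, xi in zip(w, x))  # s(x) = x . w, equation (1)
    return f(s)                               # N(x) = f(s(x)), equation (2)

# Example: two inputs with weights 0.5 and -1.0
print(neuron([1.0, 2.0], [0.5, -1.0]))
```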

Definition Throughout the remainder of this paper, by Artificial Neuron Network (ANN) we shall mean a directed graph $G$ whose vertices represent abstract Neurons as described above and whose edges represent the direction in which a signal is passed from one vertex to another.

Definition The input layer of an ANN is the collection of Neurons which can obtain a signal from a source outside of the ANN, i.e. they receive input from the user.

Definition The output layer of an ANN is the collection of Neurons which are considered part of the final result of the ANN. For example, if the ANN was designed to predict the future direction of a fly, the output layer would be the set of Neurons which represent the final components of that direction.

All ANNs in this paper will be assumed to have no loops. Hence they can be evaluated by simply evaluating the input layer first and propagating the signal from the input layer according to the structure of the graph until the output layer is reached.
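Because the graphs are loop-free, this propagation can be sketched as a short recursive evaluation. The dictionary-based representation below is an assumption made for the example, not a structure prescribed in the paper.

```python
def evaluate(graph, weights, activation, inputs):
    """Forward-propagate a signal through a loop-free ANN.

    graph      -- dict mapping each Neuron to the list of Neurons feeding into it
                  (input-layer Neurons map to an empty list)
    weights    -- dict mapping (source, target) edges to real-valued weights
    activation -- dict mapping each Neuron to its response function f
    inputs     -- dict mapping input-layer Neurons to their external signals
    """
    values = dict(inputs)                 # the input layer is evaluated first

    def value(node):
        if node not in values:            # propagate along the edges of the graph
            s = sum(weights[(src, node)] * value(src) for src in graph[node])
            values[node] = activation[node](s)
        return values[node]

    return {node: value(node) for node in graph}
```

For the networks built in the next section the graph has only an input layer and a single output Neuron, so this recursion bottoms out after one step.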

3 Representing Functions

Consider the problem of a scientist who has collected a sample set of data. Furthermore, let us suppose that the data was collected in such a way that the scientist would like to have an analytic curve which fits the data points. As with any measurement, the scientist can only be sure of the value of each point within some error tolerance $\varepsilon$.

That is, we would like a $C^1$ function $q$ such that for each of our data points $(x^*, y^*)$ we have $\|q(x^*) - y^*\| < \varepsilon$ for some predetermined $\varepsilon$. We shall show that an ANN can be used to solve this problem.

To do this, we begin by considering a few methods via which the function would traditionally be represented.

1. Taylor Series:

$$ q(x) = \sum_{n=0}^{\infty} \frac{q^{(n)}(0)}{n!} x^n \tag{3} $$

The problem with basing a solution on this representation is clear: knowledge of the function itself is explicitly required in order to calculate the coefficients of the series. As we do not have any information other than a few points, this does not offer a very intuitive way to look at the problem.

2. Power Series:

$$ q(x) = \sum_{n=1}^{\infty} a_n x^n \tag{4} $$

In contrast to the situation with Taylor Series, knowledge of the function itself is not explicitly required in order to calculate the coefficients of the series. Our problem with this representation is more subtle. Suppose that our imaginary scientist were required to deal directly with numbers in the data set which were very large, $3.0 \times 10^6$ for example. Then, as $n$ becomes large, it is easy to see that the calculations involved in evaluating the resulting function become unwieldy and subject to computer rounding errors. It is conceivable that a clever programmer could get around this problem with some fancy code, but rather than place the burden of fixing this problem on the person implementing the solution, it would be preferable if we could avoid the problem as much as possible. With this idea in mind we turn to our next representation.

3. Fourier Sine Series:

$$ q(x) = \sum_{n=1}^{\infty} a_n \sin\!\left(\frac{n\pi x}{L}\right) \tag{5} $$

We notice that our scientist, not having been endowed with any supernatural powers, can only have collected a finite amount of data. Provided the scientist has indexed this data using positive values, we can safely assume that the inputs of the data points all lie in some interval $[0, L]$. Furthermore, as $|\sin(x)| \le 1$ for all $x \in \mathbb{R}$, this choice of series allows us to keep the size of the numbers returned by the computer within a reasonable range.

Our choice to represent the function by an infinite series was made because it actually suggests an intuitive way to solve the problem using ANNs. To see why, recall that for a tolerance $\varepsilon > 0$ there exists $N \in \mathbb{Z}^+$ such that for all $k > N$ and all $x \in [0, L]$ we have

$$ \left\| q(x) - \sum_{n=1}^{k} a_n \sin\!\left(\frac{n\pi x}{L}\right) \right\| < \varepsilon. $$

So our solution is to build an ANN whose graph consists of two layers, an input layer and an output layer. The output layer is assumed to contain one Neuron $N_o$ whose representation is the identity function, i.e. $N_o(\vec{x}) = s(\vec{x})$. By giving the input layer $k$ neurons, and giving the $n$'th such neuron (for $n = 1, \ldots, k$) the representation $N_n(x) = \sin\!\left(\frac{n\pi s(x)}{L}\right)$, we can reproduce the series approximation by choosing suitable weights. The problem now reduces to one of finding the weights.
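As a sketch, the network just described amounts to the following; the function name `sine_ann` and the sample weights are my own, chosen only for illustration.

```python
import math

def sine_ann(weights, L):
    """Two-layer ANN: the n'th input Neuron computes sin(n*pi*x/L) and the single
    output Neuron (the identity) forms the weighted sum of those signals."""
    def q(x):
        hidden = [math.sin(n * math.pi * x / L) for n in range(1, len(weights) + 1)]
        return sum(w * h for w, h in zip(weights, hidden))
    return q

# Example: a guessed set of three weights on the interval [0, L] = [0, 10]
q = sine_ann([1.0, 0.5, -0.25], L=10.0)
print(q(2.5))
```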

4 Method Of Gradient Descent

Assume that we begin this task by using the easiest possible method to implement: we guess, and trust that luck will make our guess the correct one. How far off is this guess? We will define an error metric in the following manner. First, recall from the statement of the problem that we have a set of data points $(x, y)$ to begin with. Now let $A$ be the collection of all pairs $(x, y^*)$ such that $x$ comes from the input of the original data set and $y^*$ is the result of evaluating our ANN with the input $x$ given to each of our input Neurons. Assuming that we have $N$ data points, we can evaluate the error using the metric below (which works out to be nothing more than the average squared error).

$$ E(y, y^*) = \frac{1}{N} \sum_{i=1}^{N} (y_i - y_i^*)^2 \tag{6} $$
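In code, equation (6) is simply the mean squared difference between the measured outputs and the network's outputs; `ann` below stands for a network such as the `sine_ann` sketch given earlier.

```python
def error(data, ann):
    """Average squared error of the network over the data set, equation (6)."""
    return sum((y - ann(x)) ** 2 for x, y in data) / len(data)
```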

At this point, we recall the remaining fact about Human Neurons which we have not yet explicitly built into our model: the synapses can adapt, changing their effect on the signal that is received. As we wish for the system to move closer to a correct answer (as much as this is possible given the data), it is natural to use a method which moves the weights in such a way as to be closer to the true answer. This is where the Gradient Descent Algorithm comes in, a common algorithm found in the training of ANNs in computer science. To use it, we should first calculate how the error changes if we change one of the input weights. (Note: in the following calculation $\vec{x} = (x_1, \ldots, x_N)$ is the vector of results from the input Neurons.)



$$ E(y, y^*) = \frac{1}{N} \sum_{i=1}^{N} \left(y_i - N_o(\vec{x})\right)^2 $$

$$ \frac{\partial E}{\partial w_i} = \frac{\partial E}{\partial N_o}\,\frac{\partial N_o}{\partial w_i}, \qquad \frac{\partial E}{\partial N_o} = -(y_i - N_o(\vec{x})), \qquad \frac{\partial N_o}{\partial w_i} = x_i $$

Combining all of this together we get

$$ \frac{\partial E}{\partial w_i} = -(y_i - N_o(\vec{x}))\,x_i \tag{7} $$

(Here the derivative is taken one data point at a time, with the constant factors absorbed into the learning rate introduced below.)

It is useful to recall that, because of the way we designed the ANN, equation (7) can actually be written as $\frac{\partial E}{\partial w_i} = -(y_i - N_o(\vec{x}))\sin\!\left(\frac{i\pi x}{L}\right)$. Using this, we can now make use of the Gradient Descent Algorithm (shown below) to train the network.

5 Gradient Descent Algorithm

Given a set of data points $(x_i, y_i)$ and a tolerance $\varepsilon$:

1. Initialize all weights to random values.

2. Repeat until $\Delta E < \varepsilon$:

   (a) For each weight $w_i$ set $\Delta w_i := 0$.

   (b) For each data point $(x_i, y_i)$:

      i. Set the inputs to $x_i$.

      ii. Compute the outputs.

      iii. For each weight $w_i$, let $\Delta w_i = \Delta w_i + (y_i - N_o(\vec{x}))\sin\!\left(\frac{i\pi x}{L}\right)$.

   (c) For each weight set $w_i = w_i + \alpha \Delta w_i$.

Of key importance in the above algorithm is the value $\alpha$ which appears in the last step. If this value is set too high, the algorithm will overstep the $\varepsilon$-ball around the target data point. Setting it too low will cause the algorithm to run for far too long. A value of $\frac{1}{2}\varepsilon$ tends to work well in most cases.
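Putting sections 3 through 5 together, a minimal Python sketch of the training procedure might look as follows. The `max_epochs` guard, the initialization range for the random weights, and the stopping test on the change in the average error are assumptions I have added to make the sketch runnable; they are not specified in the algorithm above.

```python
import math
import random

def train(data, k, L, alpha, tol, max_epochs=10000):
    """Gradient descent training of the two-layer sine ANN (sections 3-5).

    data  -- list of (x, y) pairs with x in [0, L]
    k     -- number of input Neurons (terms kept from the sine series)
    alpha -- learning rate
    tol   -- stop once the change in the average error falls below tol
    """
    w = [random.uniform(-1.0, 1.0) for _ in range(k)]       # 1. random initial weights

    def output(x):                                           # N_o(x) for the current weights
        return sum(w[n - 1] * math.sin(n * math.pi * x / L) for n in range(1, k + 1))

    def error():
        return sum((y - output(x)) ** 2 for x, y in data) / len(data)

    prev = error()
    for _ in range(max_epochs):                              # 2. repeat until the error stops changing
        dw = [0.0] * k                                       # 2(a) reset the accumulators
        for x, y in data:                                    # 2(b) loop over the data points
            out = output(x)
            for n in range(1, k + 1):                        # 2(b)iii accumulate the updates
                dw[n - 1] += (y - out) * math.sin(n * math.pi * x / L)
        w = [wi + alpha * dwi for wi, dwi in zip(w, dw)]     # 2(c) apply the updates
        cur = error()
        if abs(prev - cur) < tol:
            break
        prev = cur
    return w
```

For instance, calling `train(data, k=5, L=10.0, alpha=0.05, tol=1e-9)` on a list of $(x, y)$ samples would return the trained weights, which are the coefficients of the truncated sine series.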

6 The Main Theorem

We have now shown that, given a set of data points, we can generate a Fourier sine series for a continuous function which approximates the data set. What we would also like is that, given a function and enough data points on that function, this method generates the Fourier sine series of the given function. This is exactly what we show in the theorem below.

Theorem 1. The Fourier sine series of any continuous function $q : [0, L] \to \mathbb{R}$ can be approximated by an ANN.

Proof. Let $\varepsilon > 0$ be arbitrary. On $[0, L]$ we know that $q(x) = \sum_{n=1}^{\infty} a_n \sin\!\left(\frac{n\pi x}{L}\right)$. Since $[0, L]$ is compact, $q([0, L]) := A$ is also compact. Choose $S := \{x_1, \ldots, x_N\} \subset A$ such that if $x' \in A$ then $\exists x'' \in S$ with $\|x' - x''\| < \varepsilon$. We can do this since $A$ is compact. Now, choose $N_0 \in \mathbb{Z}^+$ such that $\left\| \sum_{n=1}^{N_0} a_n \sin\!\left(\frac{n\pi x}{L}\right) - q(x) \right\| < \frac{\varepsilon}{8}$. Construct an ANN in the manner discussed in the paper so far with $N_0$ input Neurons. Define $D$ to be the set $\{(x, y) \in [0, L] \times \mathbb{R} \mid \exists y \in S \text{ with } q(x) = y\}$. Using the Gradient Descent Algorithm, train this network with a tolerance of $\frac{\varepsilon}{8}$, using $D$ as the set of data points. This will give us a function

$$ q'(x) = \sum_{n=1}^{N_0} w_n \sin\!\left(\frac{n\pi x}{L}\right). $$

Now, if for $x^* \in [0, L]$ we have $q(x^*) \in S$, then the following inequality holds:

$$ \|q'(x^*) - q(x^*)\| = \left\| \sum_{n=1}^{N_0} w_n \sin\!\left(\frac{n\pi x^*}{L}\right) - \sum_{n=1}^{N_0} a_n \sin\!\left(\frac{n\pi x^*}{L}\right) + \sum_{n=1}^{N_0} a_n \sin\!\left(\frac{n\pi x^*}{L}\right) - q(x^*) \right\| \le \frac{\varepsilon}{8} + \frac{\varepsilon}{8} = \frac{\varepsilon}{4}. $$

Since $[0, L]$ is compact, both $q'(x)$ and $q(x)$ are uniformly continuous. Hence, we can choose a $\delta > 0$ such that if $\|x - y\| < \delta$ then $\|q(x) - q(y)\| < \frac{\varepsilon}{2}$. Also, we can find $\{x_1, \ldots, x_M\} := B$ in $[0, L]$ such that if $x' \in [0, L]$ then $\exists x'' \in B$ with $\|x' - x''\| < \min\{\delta, \frac{\varepsilon}{8}\}$. Define $T := \{(x_1, q'(x_1)), \ldots, (x_M, q'(x_M))\}$. Using the training set $T \cup D$, train the network with a tolerance of $\min\{\delta, \frac{\varepsilon}{8}\}$. Note that for each $y \in [0, L]$ we can find a point $x^*$ in our training set such that $\|q'(y) - q'(x^*)\| < \frac{\varepsilon}{2}$, due to our choice of tolerances. Finally, for each $y \in [0, L]$ we have:

$$ \|q'(y) - q(y)\| = \|q'(y) - q'(x^*) + q'(x^*) - q(y)\| \le \frac{\varepsilon}{2} + \|q'(x^*) - q(y)\| $$

$$ \|q'(x^*) - q(y)\| = \|q'(x^*) - q(x^*) + q(x^*) - q(y)\| \le \frac{\varepsilon}{4} + \frac{\varepsilon}{4} = \frac{\varepsilon}{2} $$

Hence $\|q'(y) - q(y)\| < \varepsilon$.

7 Bibliography

[1] Dayan, Peter & Abbott, L. F. (2001). Theoretical Neuroscience. MIT Press.
