
Robust data filtering in wind power systems<br />

A. Llombart, C. Pueyo, J.M. Fandos<br />

Fundación CIRCE and Dept. of Electrical Engineering, Universidad de Zaragoza<br />

llombart@unizar.es<br />

J.J. Guerrero<br />

I3A and Dept. of Computer Science and Systems Engineering, Universidad de Zaragoza<br />

jguerrer@unizar.es<br />

Abstract:<br />

The need for automatic filters to treat wind data is clear, given the huge amount of data available in any wind farm analysis. In addition, although some alarms are registered by the wind farm’s SCADA system, the data are difficult to use because of their low quality. In this paper, the problems involved in this task are stated and a new technique to filter the data is presented. A robust statistical technique widely used in other fields, such as computer vision, is adapted to this problem with promising results. The results are tested with real data from wind farms in the Ebro valley in Spain.<br />

Keywords: wind energy, data filtering, robust statistics, least median of squares<br />

1 Introduction<br />

The cost-effective operation and maintenance of wind farms requires accurate and informed condition monitoring of wind turbines. One such technique is to analyse the power performance of the turbines over time.<br />

For the latest wind turbines, e.g. those rated above 800 kW, limited data are available to assess long-term power performance. In addition, in Spain there are concerns about long-term operation at sites with highly complex terrain and high levels of turbulence.<br />

To assess the efficiency of a wind turbine, the relationship between the mean wind speed (taken either at the meteorological mast or at the nacelle) and the power generated by each wind turbine must be found, but the best method to do this is not clear. Some factors complicate this relationship:<br />

1. The time delay in the wind speed propagation and how this affects the correlation between mast and turbine wind speed;<br />

2. The topography of the wind farm site;<br />

3. The presence of nearby obstacles.<br />

Several studies have been carried out on this topic. In [1-3] some methods to estimate wind turbine power are proposed. In [5] different ways to characterize the power curve are discussed and tested. In [6] a method for monitoring and controlling the electrical power production of wind turbines is specified.<br />

All these methods need good data in order to obtain the proposed parameters. In fact, this is one of the most critical steps.<br />

As stated in [9], experience has shown that real measurements are prone to wrong data, so it is essential to have some sort of data check to protect the models against the influence of erroneous measurements. The authors of [9] use a range check, a stationary check and a confidence check with predefined confidence bands to detect wrong data.<br />

Many circumstances can affect the quality of the data:<br />

• Sensor accuracy<br />

• Electromagnetic interference (EMI)<br />

• Information processing errors<br />

• Storage faults<br />

• Faults in the communication systems<br />

• Alarms in the wind turbine<br />

• etc.<br />

Some of these factors introduce an error component in the data value. Others make the data disappear and, depending on the measurement system, the missing data are replaced by top or bottom values.<br />

Wrong data do not usually appear in huge amounts, but they exist and could seriously affect the results of either a modelling process or a wind resource assessment.<br />

Focusing on wind power characterization, the most important data to consider are:<br />

• Wind speed<br />

• Power production<br />

Wind speed can be affected by the shadow effect caused by the presence of obstacles, which makes the measured wind speed decrease. This phenomenon can be clearly observed in figure 1 (data in zone 1). As for the power production, if each power data point is the mean value of the power produced by a wind turbine during ten minutes, it must be assured that the wind turbine has worked properly during those ten minutes. Otherwise the data do not represent the normal operation of the turbine (figure 1, data in zone 2). Sometimes the effect of the different problems is not so clear. In figure 1, the data in zones 3 and 4 are probably bad, but this cannot be assured without considering more information.<br />

[Figure 1: Original set of data considered. Axes: wind speed (m/s) versus power (kW); zones 1 to 4 are marked on the plot.]<br />

It might be thought that the problem of power data quality can be solved accurately, since there are alarm records in the SCADA system of the wind farm. However, these records usually do not exist for the whole period in question, and at other times the accuracy of the recorded alarms cannot be assured.<br />

Figure 2 shows that some data in zones 1 and 2 have not been filtered out. So, even when the alarm records are taken into account, some wrong data still remain.<br />

[Figure 2: Data set after filtering those affected by the alarms recorded at the SCADA system. Axes: wind speed (m/s) versus power (kW).]<br />
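The alarm-based filtering discussed here can be sketched in a few lines. This is only an illustration under stated assumptions: the `Record` type, the representation of alarms as `(begin, end)` intervals in seconds and the ten-minute window constant are hypothetical, and do not describe the actual SCADA export format.<br />

```python
from dataclasses import dataclass

WINDOW = 600.0  # length of one averaging window: ten minutes, in seconds


@dataclass
class Record:
    start: float       # start time of the ten-minute window (s), hypothetical field
    wind_speed: float  # mean wind speed over the window (m/s)
    power: float       # mean power over the window (kW)


def affected_by_alarm(record, alarms):
    """True if the record's window overlaps any (begin, end) alarm interval."""
    end = record.start + WINDOW
    return any(a_begin < end and a_end > record.start for a_begin, a_end in alarms)


def filter_by_alarms(records, alarms):
    """Keep only the records whose averaging window is free of alarms."""
    return [r for r in records if not affected_by_alarm(r, alarms)]


records = [Record(0, 5.0, 100.0), Record(600, 6.0, 0.0), Record(1200, 7.0, 250.0)]
alarms = [(650.0, 700.0)]  # an alarm lasting 50 s inside the second window
print([r.start for r in filter_by_alarms(records, alarms)])
```

A filter of this kind can only remove what the alarm log knows about, which is why wrong data remain when the log is incomplete or inaccurate.<br />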

There is a huge number of data filtering techniques, but their application to wind systems is not well documented. Moreover, data coming from wind systems are not regular: the changes in the mean values of the wind speed are not easily characterized, and this is reflected in the power data, so filtering those data is not easy. As a result, most people in the wind sector filter the data manually.<br />

The large volume of data recorded by the SCADA system means that filtering these data with the current manual method is hugely time consuming (40 man-hours for the example presented). The method proposed in this paper can reduce the time required to 20% of this.<br />

In this paper the behaviour of data coming from wind systems is outlined. To this end, both the behaviour of the different variables to be measured and the relationship between the variability of the power and the wind speed are analyzed.<br />

We present an approach to filter wind data using the Least Median of Squares (LMedS) method. It detects and rejects bad measurements at the same time as it estimates the wind power curve, working in a robust and reliable way.<br />

We validate the robust method for power curve characterization in real scenarios, comparing it with other methods. We use raw power generation and wind speed data, in which spurious measurements are numerous. The results are tested using real data from wind farms in the Ebro valley in Spain.<br />

2 Parameters and measurement systems<br />

The data needed to characterize the behaviour of a wind turbine are:<br />

• Power production<br />

• Wind speed<br />

• Air density (temperature and pressure)<br />

• Wind direction (optional, used only in some models)<br />

• Humidity (optional, used only in some models)<br />

The most problematic parameters are the wind speed and the power production. This is due to the random nature and quick variation of the wind speed, which introduce fluctuations in the relationship between the two parameters. Other parameters, such as temperature, pressure and humidity, are not problematic.<br />

Wind speed and direction are measured with anemometers and wind vanes. These are positioned on a meteorological mast at a height quite close to that of the nacelle. In order to model the variation of wind speed with height, further measurements are taken 10 meters below the first and, normally, 10 meters above ground level. If the measurements are taken at the nacelle, the instruments are located on its rear part.<br />


The power production is registered by the power meter at the turbine and then recorded in the SCADA system of the wind farm.<br />

Characterizing the power curve means establishing the relationship between power production and wind speed, so the quality of both data sets must be assured. There are two options for the wind speed:<br />

• Wind speed taken at the meteorological mast<br />

• Wind speed taken at the nacelle<br />

In [5, 6] it is demonstrated that the wind speed taken at the nacelle is more suitable for obtaining this relationship, so it is the only case considered in this paper.<br />

3 Models to be considered<br />

As explained in section 4, the robust statistical method needs a model in order to work. We have therefore considered two different models to estimate the power curve as a function of the wind speed.<br />

The first model uses a polynomial regression and a wind speed partition. The power curve of a wind turbine has three clearly defined parts. First, between 0 m/s and the cut-in wind speed, the production is zero. Then the power output increases following, in theory, a cubic relationship with wind speed up to the rated wind speed. Finally, depending on the wind turbine model, the power output is regulated to stay almost constant up to the cut-out or furling wind speed. This three-part shape suggests that different polynomial relationships should be used for different wind speed ranges.<br />

We use a third-order polynomial to fit the power (P) as a function of the wind speed (V) in each part:<br />

$$P = a_0 + a_1 V + \cdots + a_3 V^3 \qquad (1)$$<br />

For an ideal power curve, the partitioning of the curve for the purposes of fitting a polynomial is fairly clear. However, if a turbine is a significant distance from the meteorological tower, the effective power curve observed may not be so clearly partitioned and may differ from turbine to turbine depending on the distance.<br />
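To make the partitioned polynomial model concrete, the following sketch evaluates a piecewise third-order polynomial. The partition limits (cut-in 4 m/s, rated 14 m/s, cut-out 25 m/s), the 800 kW rated power and all coefficients are hypothetical values chosen for illustration, not parameters from the paper.<br />

```python
def horner(coeffs, v):
    """Evaluate a0 + a1*v + a2*v**2 + a3*v**3 for coeffs = (a0, a1, a2, a3)."""
    result = 0.0
    for a in reversed(coeffs):
        result = result * v + a
    return result


def piecewise_power(v, partitions):
    """Power (kW) at wind speed v (m/s).

    partitions is a list of (v_low, v_high, coeffs); each range is half-open,
    [v_low, v_high). Outside every range the output is taken as zero.
    """
    for v_low, v_high, coeffs in partitions:
        if v_low <= v < v_high:
            return horner(coeffs, v)
    return 0.0


# Hypothetical partition: zero below cut-in, pure cubic up to rated speed,
# constant rated power up to cut-out.
RATED_KW = 800.0
partitions = [
    (0.0, 4.0, (0.0, 0.0, 0.0, 0.0)),
    (4.0, 14.0, (0.0, 0.0, 0.0, RATED_KW / 14.0 ** 3)),
    (14.0, 25.0, (RATED_KW, 0.0, 0.0, 0.0)),
]

for v in (2.0, 7.0, 20.0, 30.0):
    print(v, round(piecewise_power(v, partitions), 1))
```

In practice the coefficients of each part would be estimated from the data rather than fixed by hand.<br />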

The second model is based on the bin method, which is outlined in the IEC 61400-12 standard. In this case the data are grouped into bins 0.5 m/s wide. We use a continuous polygonal curve, assuming a linear relation between the power and the wind speed within each bin. For the moment, partitioning by wind direction has not been considered, because no wind direction data are available from the anemometer at the turbine but only from the meteorological tower, where the wind speed does not predict the power of the wind turbine so well [5].<br />
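One possible reading of the bin model is sketched below: the data are grouped into 0.5 m/s bins, each non-empty bin is summarized by its mean wind speed and mean power, and the polygonal curve interpolates linearly between those points. The function names and the clamping outside the covered range are assumptions made for this illustration.<br />

```python
from collections import defaultdict

BIN_WIDTH = 0.5  # m/s


def bin_means(speeds, powers, width=BIN_WIDTH):
    """Return sorted (mean speed, mean power) pairs, one per non-empty bin."""
    bins = defaultdict(list)
    for v, p in zip(speeds, powers):
        bins[int(v // width)].append((v, p))
    nodes = []
    for key in sorted(bins):
        pts = bins[key]
        mean_v = sum(v for v, _ in pts) / len(pts)
        mean_p = sum(p for _, p in pts) / len(pts)
        nodes.append((mean_v, mean_p))
    return nodes


def polygonal_power(v, nodes):
    """Piecewise-linear interpolation between nodes; clamps outside the range."""
    if v <= nodes[0][0]:
        return nodes[0][1]
    for (v0, p0), (v1, p1) in zip(nodes, nodes[1:]):
        if v <= v1:
            return p0 + (p1 - p0) * (v - v0) / (v1 - v0)
    return nodes[-1][1]


# Four illustrative measurements falling into two adjacent bins.
speeds = [5.1, 5.3, 5.6, 5.8]
powers = [100.0, 120.0, 150.0, 170.0]
nodes = bin_means(speeds, powers)
print(nodes)
print(polygonal_power(5.45, nodes))
```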

4 The robust statistical method<br />

We have unknown wrong data points (outliers) in data sets with redundant data, and the outliers must be filtered out in order to compute the best solution. This situation is similar to some computer vision applications, where robust techniques have been used with very good results and are in fact mandatory in practice.<br />

4.1 Robust estimation<br />

Usually we have many measurements, and an estimation method can be used to process all of them, exploiting redundancy to get a better result. The classical method is the Least Mean of Squares (LMS) estimator. The LMS method assumes that all the measurements can be interpreted with the same model, which makes it very sensitive to out-of-norm data, or outliers. It has a breakdown point of 0% of spurious data, which means that a single outlier can destroy the curve fitting [4]. LMS minimizes the sum of the squared errors over all the measurements; if a measurement is far from the correct value, its squared error dominates the summation and therefore dominates the fit. Figure 3 shows a simple example in which a line is fitted to noisy points. Using LMS, a single outlier can destroy the fit.<br />

[Figure 3: Fitting of a line from noisy points by LMS method. The plot shows the points and the LMS line.]<br />
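The sensitivity of the least-squares fit to a single outlier is easy to reproduce. The sketch below uses made-up points lying exactly on a line (they are not the data of figure 3) and replaces one measurement by a bottom value, as a faulty measurement system might do.<br />

```python
def least_squares_line(xs, ys):
    """Closed-form least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Illustrative points lying exactly on y = 0.3 * x.
xs = list(range(1, 11))
ys_clean = [0.3 * x for x in xs]

# Same points, but the last measurement is replaced by a bottom value.
ys_outlier = ys_clean[:-1] + [0.0]

slope_clean, _ = least_squares_line(xs, ys_clean)
slope_outlier, _ = least_squares_line(xs, ys_outlier)
print(f"slope without outlier: {slope_clean:.3f}")
print(f"slope with one outlier: {slope_outlier:.3f}")
```

With these ten points, one wrong value is enough to pull the fitted slope from 0.3 down to roughly 0.14.<br />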

Several authors have tried to make this estimator robust by replacing the square with something else, such as the absolute value, while leaving the summation untouched. However, the key issue is to prevent outliers from having any influence on the result.<br />

Robust estimators provide well-founded methods to detect outliers, giving trustworthy results even if a certain amount of data is contaminated [4]. From the existing robust estimation methods [7, 8], we have chosen the LMedS method. Compared with the LMS method, the LMedS method replaces the sum by the median, which is very robust; unfortunately it has no analytical solution, and an iterative solution is required.<br />

The LMedS method searches the space of solutions obtained from subsets containing the minimum number of data points. Suppose that a polynomial regression with 4 coefficients is used to fit the power curve. A minimum of 4 measurements is then needed to fit the model, and if there are n measurements in total, the space of solutions is obtained from the combinations of n elements taken 4 at a time, giving m possible solutions:<br />

$$m = \frac{n!}{4!\,(n-4)!} \qquad (2)$$<br />
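A quick calculation shows how fast this number grows. The helper name `count_subsets` is made up for this sketch; the value 52,560 corresponds to one year of ten-minute records (365 days times 144 records per day).<br />

```python
import math


def count_subsets(n, p=4):
    """Number of distinct subsets of p measurements among n (equation 2 for p = 4)."""
    return math.comb(n, p)


for n in (100, 1000, 52560):  # 52560 = 365 days * 144 ten-minute records
    print(n, count_subsets(n))
```

Already for n = 1000 there are more than 4 × 10^10 subsets, which motivates replacing the exhaustive enumeration with random sampling.<br />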

The algorithm to obtain an estimate with this method can be summarized as follows:<br />

1) Form the m subsets of the minimum number of measurements required to fit the curve.<br />

2) For each subset S, compute a power curve $P_S$ in closed form.<br />

3) For each solution $P_S$, compute the median $M_S$ of the squared residuals with respect to all the measurements.<br />

4) Store the solution $P_S$ that gives the least median $M_S$.<br />

With many measurements this exhaustive search is computationally too expensive. A practical solution is to select at random enough subsets of the minimum number of measurements required to fit the power curve, so as to guarantee a reasonable probability of not failing. Thus, the first step is replaced by a Monte Carlo technique that randomly selects m subsets of 4 measurements.<br />
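The randomized search can be sketched as follows. This is a minimal illustration, not the authors' implementation: the polynomial through each subset is evaluated in Lagrange form instead of solving for its coefficients, subsets with a repeated wind speed are simply skipped, and the names `lagrange_eval` and `lmeds` and the fixed seed are assumptions of this sketch.<br />

```python
import random
from statistics import median


def lagrange_eval(subset, v):
    """Evaluate at v the unique polynomial through the (x, y) points of subset."""
    total = 0.0
    for j, (xj, yj) in enumerate(subset):
        term = yj
        for k, (xk, _) in enumerate(subset):
            if k != j:
                term *= (v - xk) / (xj - xk)
        total += term
    return total


def lmeds(points, p=4, m=19, seed=0):
    """Return (best_subset, least_median) over m random subsets of p points."""
    rng = random.Random(seed)
    best_subset, best_median = None, float("inf")
    for _ in range(m):
        subset = rng.sample(points, p)
        if len({x for x, _ in subset}) < p:
            continue  # repeated abscissa: no unique polynomial through the subset
        # Median of the squared residuals with respect to ALL the measurements.
        med = median((y - lagrange_eval(subset, x)) ** 2 for x, y in points)
        if med < best_median:
            best_subset, best_median = subset, med
    return best_subset, best_median


# Line-fitting example (p = 2): 20 points on y = 2x + 1, four of them corrupted.
points = [(float(x), 2.0 * x + 1.0) for x in range(20)]
for i in (3, 8, 12, 17):
    points[i] = (points[i][0], 100.0)

subset, least_median = lmeds(points, p=2, m=50, seed=1)
print(subset, least_median)
```

The defaults `p=4` and `m=19` mirror the cubic model and the worked example in the text; the demonstration uses `p=2` so that the result can be checked by eye.<br />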

A selection of m subsets is good if, in at least one subset, all the measurements are good. If $P_{ns}$ is the probability that a measurement is not spurious, and $P_m$ is the accepted probability of missing the computation, i.e., of not reaching a good solution, the number of subsets to consider can be computed as:<br />

$$m = \frac{\log P_m}{\log\left(1 - P_{ns}^{4}\right)} \qquad (3)$$<br />

For example, if we accept a small probability of failure $P_m = 0.001$, with an estimated probability of good measurements $P_{ns} = 75\%$, the number of subsets m should be 19.<br />

In the example of figure 3, the LMedS method obtains a good line provided that, in at least one of the subsets of two points, the outlier is not selected.<br />
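The figure of 19 subsets can be checked numerically. The sketch below evaluates the ratio of logarithms for subsets of 4 measurements and rounds up to the next integer; the function name `required_subsets` is an assumption of this example.<br />

```python
import math


def required_subsets(p_miss, p_good, subset_size=4):
    """Smallest integer m such that (1 - p_good**subset_size)**m <= p_miss."""
    return math.ceil(math.log(p_miss) / math.log(1.0 - p_good ** subset_size))


ratio = math.log(0.001) / math.log(1.0 - 0.75 ** 4)
print(round(ratio, 2))                 # about 18.16 before rounding up
print(required_subsets(0.001, 0.75))   # 19
```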

4.2 Rejection of wrong data<br />

The search in the space of solutions using the median gives a robust solution on which the spurious data, or outliers, have no influence. In addition, the outliers can easily be detected as the points with the highest residuals, assuming that the noise of the inliers is Gaussian. The standard deviation of the error can be estimated from the least median $M_S$ of the squared residuals as [4]:<br />

$$\hat{\sigma} = 1.48\left[1 + \frac{5}{n-4}\right]\sqrt{M_S} \qquad (4)$$<br />

This allows a threshold to be defined to separate the outliers from the inliers. Taking, for example, a probability of 99% of accepting a good measurement, the threshold is fixed at $2.57\hat{\sigma}$. A more detailed explanation of this method can be found in [4].<br />
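The rejection step can be sketched as follows, assuming the usual form of the robust scale estimate, $\hat{\sigma} = 1.48\,[1 + 5/(n-4)]\,\sqrt{M_S}$, where $M_S$ is the least median of the squared residuals and 4 is the number of model coefficients. The function names and the sample residuals are made up for illustration.<br />

```python
import math


def robust_sigma(least_median, n, p=4):
    """Robust standard deviation from the least median of squared residuals.

    Assumes the form 1.48 * (1 + 5 / (n - p)) * sqrt(least_median), with p the
    number of model coefficients and n the number of measurements.
    """
    return 1.48 * (1.0 + 5.0 / (n - p)) * math.sqrt(least_median)


def split_outliers(residuals, sigma, k=2.57):
    """Return (inlier indices, outlier indices) for the threshold k * sigma."""
    inliers = [i for i, r in enumerate(residuals) if abs(r) <= k * sigma]
    outliers = [i for i, r in enumerate(residuals) if abs(r) > k * sigma]
    return inliers, outliers


sigma = robust_sigma(least_median=4.0, n=104)
print(round(sigma, 3))
print(split_outliers([0.5, -1.0, 10.0], sigma=1.0))
```

The multiplier `k` is a parameter because the 99% acceptance level of 2.57 is only one possible choice.<br />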

5 Experimental results<br />

5.1 Methodology<br />

To test the quality of the proposed filtering method, one year of historical data from a wind farm has been used.<br />

The wind farm chosen for the tests is situated in Aragon (Spain). For confidentiality reasons we are not allowed to present the data in detail. Therefore, to show the type of terrain and the location of the wind turbines whose data are used in the study, a simplified map is presented (Fig. 4).<br />

Figure 4. A simplified view of the wind farm<br />

The map shows that the terrain is extremely complex, that the turbines are positioned in two lines (following, more or less, the straight lines drawn on the map), and that the prevailing wind direction is indicated by the big arrow in the top left corner.<br />


The analysis has been divi<strong>de</strong>d <strong>in</strong>to different tests:<br />

1 Filter<strong>in</strong>g us<strong>in</strong>g the alarm records <strong>in</strong> the SCADA<br />

system<br />

This <strong>filter<strong>in</strong>g</strong> method consists of mark<strong>in</strong>g as ‘wrong<br />

<strong>data</strong>’, all those <strong>data</strong> po<strong>in</strong>ts that are affected by the<br />

alarm record of the SCADA system.<br />

2 Filter<strong>in</strong>g us<strong>in</strong>g the alarm records at the SCADA<br />

system plus a classical statistical method<br />

Once the alarm records have been consi<strong>de</strong>red, the<br />

<strong>data</strong> are divi<strong>de</strong>d <strong>in</strong> b<strong>in</strong>s. The mean value (μ) and the<br />

standard <strong>de</strong>viation (σ) are calculated for each b<strong>in</strong>.<br />

Then, all the <strong>data</strong> that are bigger than μ + 3σ or<br />

smaller than μ – 3σ are consi<strong>de</strong>red as wrong <strong>data</strong>.<br />

3 Filter<strong>in</strong>g us<strong>in</strong>g the alarm records at the SCADA<br />

system plus a robust statistical method<br />

Once the alarm records have been consi<strong>de</strong>red, the<br />

robust method, expla<strong>in</strong>ed <strong>in</strong> section 4, is applied.<br />

4 Filtering using only a classical statistical method<br />

Starting from the original data set, the mean value (μ) and the standard deviation (σ) are calculated for each bin. Then, all the data that are greater than μ + 3σ or smaller than μ – 3σ are considered wrong data.<br />

5 Filtering using only a robust statistical method<br />

Starting from the original data set, the robust method explained in section 4 is applied.<br />

In both tests with robust filtering, the threshold taken in practice to reject outliers is 3σ̂.<br />
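The two rejection rules used in the tests above can be sketched per bin as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The bins are assumed to be wind speed bins. The robust model of section 4 is replaced here by a constant power level per bin, for which the least median of squares estimate is the midpoint of the shortest half of the sorted data. σ̂ is assumed to be the usual robust scale estimate 1.4826 (1 + 5/(n − 1)) √(med r²) given by Rousseeuw and Leroy [4]. The function names, the 1 m/s bin width and the sample values are hypothetical.

```python
import numpy as np


def classical_mask(power):
    """Classical rule: keep points within mu +/- 3*sigma of the bin mean."""
    mu, sigma = power.mean(), power.std()
    return np.abs(power - mu) <= 3.0 * sigma


def lmeds_location(power):
    """1-D least median of squares estimate: midpoint of the shortest half."""
    x = np.sort(power)
    n = x.size
    h = n // 2 + 1                        # number of points in each half
    widths = x[h - 1:] - x[:n - h + 1]    # width of every window of h points
    i = int(np.argmin(widths))
    return 0.5 * (x[i] + x[i + h - 1])


def robust_mask(power):
    """Robust rule: keep points within 3*sigma_hat of the LMedS location."""
    r = power - lmeds_location(power)
    n = power.size
    # Assumed robust scale estimate (constant model, so one parameter).
    sigma_hat = 1.4826 * (1.0 + 5.0 / (n - 1)) * np.sqrt(np.median(r ** 2))
    return np.abs(r) <= 3.0 * sigma_hat


def filter_by_bins(speed, power, mask_fn, width=1.0):
    """Apply mask_fn inside each wind speed bin; True marks accepted data."""
    keep = np.zeros(speed.size, dtype=bool)
    bins = np.floor(speed / width).astype(int)
    for b in np.unique(bins):
        idx = np.flatnonzero(bins == b)
        if idx.size >= 3:                 # smaller bins stay marked as rejected
            keep[idx] = mask_fn(power[idx])
    return keep


# Hypothetical bin: 14 valid points near 500 kW and 6 spurious points at 0 kW.
speed = np.full(20, 8.3)
power = np.concatenate([np.zeros(6), np.arange(490.0, 518.0, 2.0)])
print(filter_by_bins(speed, power, classical_mask).sum())  # 20: nothing rejected
print(filter_by_bins(speed, power, robust_mask).sum())     # 14: zeros rejected
```

In this made-up bin the spurious points inflate σ to about 231 kW, so the classical rule accepts everything, while the median-based scale stays close to the spread of the valid points.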

5.2 Results<br />

The methods explained in section 5.1 have been applied, and the results obtained are shown in the following figures.<br />

The results obtained using the first method are shown in figure 2. As explained in section 1, this method is not suitable for the task required.<br />

The second method gives much better results (see figure 5), but some data that are clearly wrong have not been detected. These wrong data belong mainly to zones 1 and 2 of figure 1.<br />

[Figure 5: Data set after applying filtering method 2. Axes: wind speed (m/s), 0 to 25; power (kW), 0 to 900.]<br />

The third method, which includes the robust method, meets the proposed objective quite well (figure 6).<br />

[Figure 6: Data set after applying filtering method 3. Axes: wind speed (m/s), 0 to 25; power (kW), 0 to 900.]<br />

Having demonstrated the suitability of the proposed robust statistical method, we now show how robust it is. To do so, the classical statistical filter and the robust filter have each been applied to the original data set, without first using the alarm records. The results are shown in figures 7 and 8 respectively.


[Figure 7: Data set after applying filtering method 4. Axes: wind speed (m/s), 0 to 25; power (kW), 0 to 900.]<br />

[Figure 8: Data set after applying filtering method 5. Axes: wind speed (m/s), 0 to 25; power (kW), 0 to 900.]<br />

It is clear that the proposed method behaves well even in extremely bad conditions, in which the classical method does not work so well.<br />
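An idealised calculation, not taken from the measurements, suggests why the classical rule degrades on unfiltered data. Assume, hypothetically, that the valid points of a bin have negligible spread and sit at power 0 on a shifted scale, and that a fraction ε of the points in the bin are spurious, all lying at a distance d from the valid ones. Using the population standard deviation:

```latex
\begin{align*}
\mu &= \varepsilon d, \qquad
\sigma^2 = \varepsilon d^2 - \varepsilon^2 d^2 = \varepsilon(1-\varepsilon)\,d^2 \\
\text{spurious point rejected}
  &\iff (1-\varepsilon)\,d > 3\sigma = 3d\sqrt{\varepsilon(1-\varepsilon)} \\
  &\iff 1-\varepsilon > 9\varepsilon
  \iff \varepsilon < 0.1
\end{align*}
```

Under these assumptions the μ ± 3σ rule cannot reject a cluster of spurious points once it makes up 10% or more of a bin, because the cluster itself inflates σ. An estimate based on the median is not affected in this way as long as the valid points remain the majority.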

Conclusions<br />

Characterisation of the wind turbine power curves in relation to the wind speed recorded in real scenarios is a problem that must be solved in order to optimise the operation and maintenance of the wind farm.<br />

Fitting the models requires data whose quality is assured. It has been shown that spurious data are numerous in wind power data sets, so the application of automatic filtering techniques is essential to deal with this problem.<br />

The Least Median of Squares (LMedS) method has been proposed in order to filter the raw data taken from a real wind farm. The results obtained are good and the method has shown very good robustness in rejecting bad data.<br />

The proposed technique eliminates the need to perform the various filtering steps that are usually done manually to reject outliers, reducing the time required for the process to 20%.<br />

Acknowledgement<br />

This work is being carried out thanks to the Spanish Ministerio de Ciencia y Tecnología (projects: DPI2003–09731 and CIT-020500-2005-30).<br />

References<br />

[1] S. Li, D. C. Wunsch, E. A. O’Hair and M. G. Giesselmann, “Wind turbine power estimation by neural networks with Kalman filter training on a SIMD parallel machine”, International Joint Conference on Neural Networks, Vol. 5, pp. 3430-3434, July 1999.<br />

[2] S. Li, D. C. Wunsch, E. A. O’Hair and M. G. Giesselmann, “Using Neural Networks to Estimate Wind Turbine Power Generation”, IEEE Trans. on Energy Conversion, Vol. 16, Nº 3, pp. 276-282, September 2001.<br />

[3] S. Kélouwani and K. Agbossou, “Nonlinear Model Identification of Wind Turbine with a Neural Network”, IEEE Trans. on Energy Conversion, Vol. 19, Nº 3, pp. 607-612, September 2004.<br />

[4] P. Rousseeuw and A. Leroy, Robust Regression and Outlier Detection (John Wiley, New York, 1987).<br />

[5] A. Llombart, S. J. Watson, D. Llombart and J.M. Fandos, “Power Curve Characterization I: improving the bin method”, ICREPQ, 2005.<br />

[6] M. Sanz-Badía, F. J. Val and A. Llombart, 2001. Método para el control de producción en aerogeneradores eléctricos. Patent Nº ES2198212. Extended to PCT, Argentina and Chile.<br />

[7] Z. Zhang, “Parameter Estimation Techniques: A Tutorial with Application to Conic Fitting”, Rapport de recherche RR-2676, I.N.R.I.A., Sophia-Antipolis, France (1995).<br />

[8] M. A. Fischler and R. C. Bolles, “Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography”, Comm. of the ACM, Vol. 24, pp. 381-395, 1981.<br />

[9] T. S. Nielsen, H. Madsen and J. Tofting, “Experiences with statistical methods for wind power prediction”, in Proceedings of the European Wind Energy Conference, Nice, France, 1999, pp. 1066-1069.
