
QUANTILE OPTIMIZATION FOR HEAVY-TAILED DISTRIBUTIONS USING ASYMMETRIC SIGNUM FUNCTIONS

JAE HO KIM∗, WARREN B. POWELL†, AND RICARDO A. COLLADO‡

Abstract. In this article, we present a provably convergent algorithm for computing the quantile of a continuous random variable that does not require the existence of the expectation or the storage of all of the sample realizations. We then present an algorithm for optimizing the quantile of a random function satisfying some strict monotonicity and differentiability properties. This random function may be characterized by a heavy-tailed distribution for which the expectation is not defined. The algorithm is illustrated in the context of electricity trading in the presence of storage, where electricity prices are known to be heavy-tailed with infinite variance.

Key words. quantile optimization, stochastic gradient, signum functions, energy systems

1. Introduction. For the past several decades, academic research in stochastic control and optimization has focused mainly on maximizing or minimizing the expectation of a random process [12, 19, 21, 33, 34, 37, 38]. A separate line of research has focused on worst-case outcomes under the umbrella of robust optimization [3]. However, when we are dealing with a non-stationary, complex, or heavy-tailed system, the expectation is difficult or impossible to compute, and it is often meaningless even to try to approximate it. In those cases, it is rational to optimize our policy based on quantiles, which are zeroth-order statistics derived directly from the probability distribution. The cumulative distribution function (CDF) and the inverse CDF, also known as the quantile function, can almost always be computed in a practical manner. Optimizing expectations is especially difficult in various financial markets, including equities, commodities, and electricity markets, many of which are notorious for being non-stationary, extremely volatile, and heavy-tailed [2, 4, 5, 8]. For example, daily volatilities of 20-30% are common in electricity markets, while the equity market also exhibits occasional heavy-tailed crashes, as happened during the 1987 meltdown, the LTCM crisis in 1998, the dot-com bubble burst in 2001, and of course, the 2008 financial meltdown.

A heavy-tailed distribution is defined by the structure of the decline in probabilities of large deviations. In a heavy-tailed environment, the usual statistical tools at our disposal can be tricked into producing erroneous results from a finite sample of observations, leading us to wrong conclusions. For example, suppose we want to compute the variance from a finite number of i.i.d. samples whose underlying distribution is heavy-tailed, so that the second and higher-order moments are not finite. If we use the standard empirical formula for computing the variance, we will get "a number," thus giving the illusion of finiteness, but the estimates typically show a high level of instability and grow steadily as we add more and more samples [43]. In this article, we present a framework for optimizing the median or any other quantile of a stochastic process subject to some strict monotonicity properties, and we show how it can be applied to optimizing a trading policy in the electricity markets. Since the quantile function always exists and is relatively stable even in a highly volatile and non-stationary environment, optimizing the quantile is a robust procedure, and almost always a reasonable thing to do in a complex, volatile, and heavy-tailed environment, or in a situation where the worst-case scenario is unknown or too extreme.

∗email: jaek@princeton.edu
†email: powell@princeton.edu / Author supported in part by a grant from SAP and AFOSR contract FA9550-08-1-0195.
‡email: rcollado@princeton.edu
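The instability of empirical moments under heavy tails is easy to reproduce. The following simulation (our illustration, not part of the paper) draws i.i.d. standard Cauchy samples, for which no moments exist: the empirical median stays pinned near the true median 0, while the empirical variance returns "a number" that is unstable and dominated by the few largest draws.

```python
import math
import random

def cauchy_sample(rng):
    # Standard Cauchy via the inverse CDF: tan(pi * (U - 1/2)).
    return math.tan(math.pi * (rng.random() - 0.5))

def empirical_variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def empirical_median(xs):
    s = sorted(xs)
    n = len(s)
    return s[n // 2] if n % 2 else 0.5 * (s[n // 2 - 1] + s[n // 2])

rng = random.Random(7)
xs = []
for n in (100, 10_000, 1_000_000):
    while len(xs) < n:
        xs.append(cauchy_sample(rng))
    # The median estimate stabilizes near 0; the variance estimate does not.
    print(n, empirical_median(xs), empirical_variance(xs))
```

The median column concentrates at the usual O(n^{-1/2}) rate, while the variance column keeps jumping as single extreme draws enter the sample — the "illusion of finiteness" described above.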

For the past decade, many countries have deregulated their electricity markets. Electricity prices in deregulated markets are known to be very volatile, and as a result participants in those markets face large risks. Recently, there has been growing interest among electricity market participants in the possibility of using storage devices to mitigate the effect of volatility in electricity prices. However, storage devices have capacity limits and significant conversion losses, and it is unclear how one should use a storage device to maximize its value. Fortunately, unlike stock prices, electricity prices in deregulated markets are well known to exhibit reversion to a long-term "average," allowing us to make directional bets [6, 7, 26, 32, 39]. The value of a storage device is determined by how good we are at making such bets: buying electricity when we believe the price will go up and selling electricity when we believe the price will go down. When using a storage device, we automatically lose 10-30% of the energy through conversion losses. The conversion loss can be seen as a transaction cost, and hence storage is valuable only if the price process is volatile enough that we can sell at a price sufficiently higher than the price at which we buy to make up for the conversion loss.

The statistics community has long recognized that, in the context of linear regression, minimizing the mean-squared or absolute-value error is an arbitrary choice. For certain applications, minimizing some quantile of the error terms instead can be a viable alternative. This approach is generally referred to as quantile regression, which minimizes the empirical expectation of a loss function on a given finite sample set via deterministic optimization. A comprehensive review of quantile regression and the relevant literature can be found in [20]. Meanwhile, the signal processing community has recognized that when the data is non-stationary and highly volatile, applying median filters [1] based on order statistics yields better results than applying traditional filters based on minimizing empirical expectations.

The method described in this article is different from the ones described in [20] or [1]. Instead of applying order statistics and deterministic algorithms to a given data set, we use an adaptation of the recursive algorithm pioneered by Robbins and Monro. In their seminal 1951 paper [36], Robbins and Monro addressed the problem of estimating the α-quantile of a distribution function using response/no-response data and a recursive algorithm that looks quite similar to Newton's method. Just as in the classical Newton's method, Robbins and Monro's method depends on a stepsize sequence with a set of properties that the authors named type 1/n. Crucial to this method is the existence of the expectation of the related random variables used in the model. This quantile estimation problem is a simple application of a fundamental recursive method developed in [36] for stochastic root-finding, which is considered by many to be the foundation of stochastic approximation. The Robbins-Monro recursive algorithm has spawned a large body of research on stepsizes, stopping conditions, and related root-finding methods; see [15, 16, 23, 24, 27]. Later, the method was generalized to the use of adaptive stepsizes with stochastic properties. See [22, 41] for an overview of the development of adaptive stepsizes and extensions of the Robbins-Monro method.

While general algorithms for estimating quantiles require storing the history of observations, our adaptation of the Robbins-Monro method performs quantile optimization without this requirement, making our implementation much more compact. The first contribution of this article is the development of an algorithm that avoids the need to store the history of our process by replacing the stochastic gradient with the asymmetric signum function. Unlike the stochastic approximation algorithms derived from Robbins and Monro [36], we do not assume that the expectation of the random variable or random function exists. This is a fundamental difference that makes our algorithm applicable to heavy-tailed random variables and random functions whose mean and variance are not necessarily well defined.

Our next contribution is the development of a quantile optimization problem with an objective function that satisfies some strict monotonicity and differentiability properties. We use the quantile algorithm to develop a recursive method with a stochastic stepsize to solve our quantile optimization problem. We then apply our optimization method to trading electricity in the spot market in the presence of storage. This application is in a heavy-tailed environment where the expectation is not necessarily well defined and the application of other methods would fail.

This article is organized as follows. In §2, we present a provably convergent algorithm that allows us to find any quantile of a continuous random variable. In §3, we present a framework for optimizing the quantile of a random function satisfying some strict monotonicity and differentiability properties that are defined in §3.1. In §4, we apply our quantile optimization method to trading electricity in the spot market in the presence of storage. In §5, we compare the trading strategy for maximum profit based on quantiles to the standard trading policy based on the hour of the day. In §6, we summarize our conclusions.

2. Computing the Quantile of a Random Variable. In this section, we provide a simple, provably convergent algorithm that allows us to compute the quantile of a continuous random variable without storing the history of observations. Let (Ω, F) be a sample space with set of all possible outcomes Ω. Let X1, X2, . . . be a sequence of continuous i.i.d. random variables on Ω such that −∞ < Xn < ∞ for all n ≥ 1. For z ∈ ℝ and n ≥ 1, let Fn(z) = P[Xn ≤ z] be the right-continuous CDF of the random variable Xn. The i.i.d. property of (Xn)_{n≥1} guarantees that all the CDFs Fn are identical, so we simply denote them by F. For any α ∈ (0, 1), the α-quantile of Xn is defined by

    qα := inf {z ∈ ℝ : P[Xn ≤ z] ≥ α} = inf {z ∈ ℝ : F(z) ≥ α}.

We can see from the last equality that qα is well defined due to the right-continuity of F.

We consider a filtration F0 ⊆ · · · ⊆ Fn ⊆ · · · ⊆ F where each Fn is generated by the information given up to time n:

    Fn = σ(x0, . . . , xn),

where xi is the outcome of the random variable Xi, 1 ≤ i ≤ n. We assume that for each n ≥ 1, Xn is Fn-measurable, but we do not assume that E[Xn] exists. For example, (Xn)_{n≥1} can be i.i.d. Cauchy random variables.

Given Fn, the most common and straightforward method for approximating qα is to sort (Xi)_{1≤i≤n} in increasing order and pick the ⌈nα⌉-th smallest number. However, in order to find the exact quantile using this method, we would need an infinite amount of memory to store all of (Xi)_{1≤i≤n} as n → ∞, which is not practical. We present an algorithm that only requires us to store one estimate of the quantile at any given time, and we show that this estimate converges to the true quantile as n → ∞.


For α ∈ (0, 1) define the asymmetric signum function as

    sgnα(u) = { 1 − α,  if u ≥ 0,
              { −α,     if u < 0.
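In code (Python is our choice; the paper gives no implementation), the definition is a two-way branch:

```python
def sgn_alpha(u: float, alpha: float) -> float:
    """Asymmetric signum: 1 - alpha for u >= 0, -alpha for u < 0."""
    return 1.0 - alpha if u >= 0 else -alpha
```

Its key property, used repeatedly below, is that E[sgnα(y − X)] = F(y) − α, which vanishes exactly when y is the α-quantile of X.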

Theorem 2.1. Let Y0 ∈ ℝ be some finite number and

    Yn = Yn−1 − γn−1 sgnα(Yn−1 − Xn),

where γn−1 ≥ 0 a.s. and (γn)_{n=0}^∞ is a stochastic stepsize sequence adapted to the filtration F0 ⊆ · · · ⊆ Fn ⊆ · · · ⊆ F satisfying

    ∑_{n=0}^∞ γn = ∞ a.s.

and

    ∑_{n=0}^∞ (γn)² < ∞ a.s.

Then,

    lim_{n→∞} Yn = qα a.s.

Proof. We know that Yn is Fn-measurable and

    (Yn − qα)² = (Yn−1 − qα)² − 2γn−1 (Yn−1 − qα) · sgnα(Yn−1 − Xn) + (γn−1)² λn,

where

    λn := [sgnα(Yn−1 − Xn)]²,

and hence 0 ≤ λn ≤ 1. Since

    E[sgnα(Yn−1 − Xn) | Fn−1]
        = (1 − α) P[Xn ≤ Yn−1 | Fn−1] − α P[Xn > Yn−1 | Fn−1]
        = (1 − α) P[Xn ≤ Yn−1 | Fn−1] − α (1 − P[Xn ≤ Yn−1 | Fn−1])
        = P[Xn ≤ Yn−1 | Fn−1] − α
        ≤ 0 if and only if Yn−1 ≤ qα,

we know that

    E[(Yn−1 − qα) · sgnα(Yn−1 − Xn) | Fn−1] ≥ 0.    (2.1)

Therefore,

    E[(Yn − qα)² | Fn−1] ≤ (Yn−1 − qα)² + (γn−1)², ∀n ≥ 1.    (2.2)

Next, define

    Wn := (Yn − qα)² + ∑_{m=n}^∞ (γm)², ∀n ∈ ℕ+.    (2.3)

Then, from (2.2),

    E[Wn+1 | Fn] ≤ Wn, ∀n ∈ ℕ+,

showing that (Wn)_{n≥1} is a positive supermartingale. Then, by the martingale convergence theorem,

    lim_{n→∞} (Yn − qα)² = lim_{n→∞} Wn = W∞ a.s.,    (2.4)

where 0 ≤ W∞ < ∞. Moreover, since

    0 ≤ E[Wn | F0] ≤ W0 < ∞, ∀n ∈ ℕ+,

by the dominated convergence theorem,

    E[W∞ | F0] = E[lim_{N→∞} WN | F0] = lim_{N→∞} E[WN | F0].    (2.5)

Define

    βn−1,n := E[(Yn−1 − qα) · sgnα(Yn−1 − Xn) | Fn−1].

From (2.1), βn−1,n ≥ 0, and hence

    β0,n := E[(Yn−1 − qα) · sgnα(Yn−1 − Xn) | F0]
          = E[E[(Yn−1 − qα) · sgnα(Yn−1 − Xn) | Fn−1] | F0]
          = E[βn−1,n | F0] ≥ 0.

Next, from (2.3),

    WN − (Y0 − qα)²
        = (YN − qα)² − (Y0 − qα)² + ∑_{m=N}^∞ (γm)²
        = ∑_{n=1}^N [(Yn − qα)² − (Yn−1 − qα)²] + ∑_{m=N}^∞ (γm)²
        = −2 ∑_{n=1}^N γn−1 (Yn−1 − qα) · sgnα(Yn−1 − Xn) + ∑_{n=1}^N (γn−1)² λn + ∑_{m=N}^∞ (γm)².

Therefore, from (2.5),

    E[W∞ | F0] − (Y0 − qα)²
        = lim_{N→∞} E[WN − (Y0 − qα)² | F0]
        = −2 lim_{N→∞} ∑_{n=1}^N γn−1 E[(Yn−1 − qα) · sgnα(Yn−1 − Xn) | F0]
              + lim_{N→∞} [ ∑_{n=1}^N (γn−1)² λn + ∑_{m=N}^∞ (γm)² ]
        = −2 ∑_{n=1}^∞ γn−1 β0,n + ∑_{n=1}^∞ (γn−1)² λn.


Since the left-hand side is finite and

    ∑_{n=1}^N (γn−1)² λn + ∑_{m=N}^∞ (γm)² ≤ ∑_{n=0}^∞ (γn)² < ∞,

we must have

    ∑_{n=1}^∞ γn−1 β0,n < ∞.

Since β0,n ≥ 0 and

    ∑_{n=1}^∞ γn−1 = ∞,

we conclude that

    lim inf_{n→∞} β0,n = lim inf_{n→∞} E[βn−1,n | F0] = 0.

Therefore there is a subsequence (n∗) ⊆ (n)_{n=0}^∞ such that

    lim_{n∗→∞} β0,n∗ = lim_{n∗→∞} E[βn∗−1,n∗ | F0] = 0.

However, since βn∗−1,n∗ is a non-negative random variable, this implies that βn∗−1,n∗ goes to 0 with probability 1 as n∗ → ∞. In other words,

    lim_{n∗→∞} βn∗−1,n∗ = lim_{n∗→∞} (Yn∗−1 − qα) E[sgnα(Yn∗−1 − Xn∗) | Fn∗−1] = 0 a.s.

Since

    E[sgnα(Yn∗−1 − Xn∗) | Fn∗−1] = 0 if and only if Yn∗−1 = qα,

we obtain

    lim_{n∗→∞} Yn∗ = qα a.s.    (2.6)

This proves our goal, but only for the subsequence (Yn∗). In order to get the convergence of the whole sequence (Yn)_{n=0}^∞ we have to look back at equation (2.4), from which we obtain that

    lim_{n→∞} |Yn − qα| = √W∞ a.s.,    (2.7)

where 0 ≤ W∞ < ∞. Let Y⁺n, Y⁻n be the random variables

    Y⁺n = { Yn      if Yn ≥ qα,
          { Y⁺n−1   otherwise,

and

    Y⁻n = { Yn      if Yn < qα,
          { Y⁻n−1   otherwise.

We can see from (2.7) that

    lim_{n→∞} (Y⁺n − qα) = √W∞ a.s.

and

    lim_{n→∞} (qα − Y⁻n) = √W∞ a.s.,

which implies that lim_{n→∞} Y⁺n = qα + √W∞ a.s. and lim_{n→∞} Y⁻n = qα − √W∞ a.s. By (2.6) we can conclude that at least one of Y⁺n or Y⁻n converges almost surely to qα, which implies that W∞ = 0 and so

    lim_{n→∞} Y⁺n = qα = lim_{n→∞} Y⁻n a.s.

Clearly,

    Yn = { Y⁺n   if Yn ≥ qα,
         { Y⁻n   otherwise,

and so

    lim_{n→∞} Yn = qα a.s.,

as we set out to prove. ∎
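A minimal sketch of the Theorem 2.1 recursion (our transcription; γn−1 = c/n with a hand-picked scale c is one admissible stepsize satisfying the two summability conditions). We feed it standard Cauchy draws, whose mean does not exist, and track the 0.75-quantile, whose true value is tan(0.25π) = 1:

```python
import math
import random

def sgn_alpha(u, alpha):
    # Asymmetric signum from the definition above.
    return 1.0 - alpha if u >= 0 else -alpha

def streaming_quantile(samples, alpha, y0=0.0, scale=1.0):
    """Y_n = Y_{n-1} - gamma_{n-1} * sgn_alpha(Y_{n-1} - X_n), gamma_{n-1} = scale/n.
    Stores only the current estimate, never the history of observations."""
    y = y0
    for n, x in enumerate(samples, start=1):
        y -= (scale / n) * sgn_alpha(y - x, alpha)
    return y

rng = random.Random(1)
draws = (math.tan(math.pi * (rng.random() - 0.5)) for _ in range(200_000))
est = streaming_quantile(draws, alpha=0.75, scale=10.0)
print(est)  # target: the true 0.75-quantile, tan(0.25*pi) = 1
```

The scale c = 10 is an arbitrary tuning choice here; §3.3 below discusses a data-driven way to pick both the starting point Y0 and the scale.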

3. Optimizing the Quantile of a Random Function. In this section, we show how to optimize the quantile of an almost everywhere differentiable random function. We show that a monotonicity condition on a random function and its derivative with respect to a sample realization allows us to optimize the quantile of the random function using a simple, provably convergent algorithm. We also show that the well-known newsvendor problem has the aforementioned monotonicity.

3.1. Monotonicity Assumptions. Let Ω be the set of all possible outcomes and let X ⊆ ℝ be a compact control space. We start by defining our control variable x ∈ X as a scalar for a clear and concise presentation of our concepts and the proof of convergence for our algorithm. We then generalize our process to the vector-valued case. Let f : X × Ω → ℝ be a function satisfying the following conditions:

(C1) f(·, ω) is convex and continuous for all ω ∈ Ω;
(C2) f(·, ω) is differentiable almost everywhere for all ω ∈ Ω;
(C3) f(x, ·) is a real-valued random variable for all x ∈ X.

For example, f(x, ω) may be piecewise linear with respect to x ∈ X with a finite number of break-points for every fixed ω ∈ Ω. Let f(x) denote the random variable whose specific sample realization is f(x, ω). Also, let Qα denote the operator such that for any random variable X,

    QαX = inf {z ∈ ℝ : P[X ≤ z] ≥ α}


gives the α-quantile of the random variable X.

For x ∈ X let Fx(z) = P[f(x) ≤ z] be the right-continuous CDF of the random variable f(x). Then the α-quantile of f(x) is given by

    Qα f(x) = inf {z ∈ ℝ : P[f(x) ≤ z] ≥ α} = inf {z ∈ ℝ : Fx(z) ≥ α},

where, as we saw before, Qα f(x) is guaranteed to exist due to the right-continuity of Fx.

Let z∗ := Qα f(x). If Fx is not continuous at z∗, then the fact that Fx is nondecreasing and bounded helps us see that

    P[f(x) < z∗] = lim_{z→z∗−} Fx(z) < lim_{z→z∗+} Fx(z) = Fx(z∗) = P[f(x) ≤ z∗].    (3.1)

From (3.1) and the definition of the distribution of a random variable, we conclude that z∗ is in the image of f(x). Unfortunately, if Fx is continuous at z∗, we cannot readily conclude that z∗ is in the image of f(x). This is an important property for us, and thus we add it as a requirement on the function f:

(C4) Qα f(x) is in the image of f(x) for every x ∈ X.

The following propositions state the monotonicity assumptions that allow us to interchange the derivative ∂/∂x and the quantile operator Qα.

Proposition 3.1. Assume that

    f(x, ω1) ≤ f(x, ω2) if and only if (∂/∂x) f(x, ω1) ≤ (∂/∂x) f(x, ω2),    (3.2)

∀ω1, ω2 ∈ Ω and ∀x ∈ X. Then,

    Qα (∂/∂x) f(x) = (∂/∂x) Qα f(x), ∀x ∈ X.

Proposition 3.2. Assume that

    f(x, ω1) ≤ f(x, ω2) if and only if (∂/∂x) f(x, ω1) ≥ (∂/∂x) f(x, ω2),    (3.3)

∀ω1, ω2 ∈ Ω and ∀x ∈ X. Then,

    Q1−α (∂/∂x) f(x) = (∂/∂x) Qα f(x), ∀x ∈ X.

Proof. By condition (C4) on f(x) there is a sample realization ω∗ ∈ Ω such that

    Qα f(x) = f(x, ω∗).

Let

    A := {ω ∈ Ω | f(x, ω) ≤ f(x, ω∗)}

and

    B := {ω ∈ Ω | f(x, ω) > f(x, ω∗)}.


Then, if (3.2) holds,

    (∂/∂x) f(x, ω1) ≤ (∂/∂x) f(x, ω∗) < (∂/∂x) f(x, ω2), ∀ω1 ∈ A, ω2 ∈ B,

and hence

    Qα (∂/∂x) f(x) = (∂/∂x) f(x, ω∗) = (∂/∂x) Qα f(x).

On the other hand, if (3.3) holds,

    (∂/∂x) f(x, ω1) ≥ (∂/∂x) f(x, ω∗) > (∂/∂x) f(x, ω2), ∀ω1 ∈ A, ω2 ∈ B,

and hence

    Q1−α (∂/∂x) f(x) = (∂/∂x) f(x, ω∗) = (∂/∂x) Qα f(x). ∎

We can illustrate Proposition 3.2 through the following example. Suppose X = [0, 1] and

    f(x) = (1 − x) U, ∀x ∈ X,

where U ∼ U(0, 1) is a uniform random variable. Then, ∀ω1, ω2 ∈ Ω and ∀x ∈ X,

    f(x, ω1) = (1 − x) U(ω1) ≤ (1 − x) U(ω2) = f(x, ω2)

if and only if U(ω1) ≤ U(ω2), and thus

    (∂/∂x) f(x, ω1) = −U(ω1) ≥ −U(ω2) = (∂/∂x) f(x, ω2),

satisfying (3.3). Next, Qα f(x) = α(1 − x) and hence

    (∂/∂x) Qα f(x) = −α, ∀0 < α < 1.

On the other hand,

    (∂/∂x) f(x) = −U and hence Q1−α (∂/∂x) f(x) = −α,

satisfying Proposition 3.2.
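The interchange in this example can also be checked numerically. In the sketch below (ours), we compare the empirical α-quantile of f(x) against α(1 − x), and the empirical (1 − α)-quantile of the pathwise derivative −U against −α:

```python
import math
import random

def emp_quantile(xs, alpha):
    # inf{z : empirical CDF at z >= alpha}, i.e. the ceil(alpha*n)-th smallest value.
    s = sorted(xs)
    return s[max(0, math.ceil(alpha * len(s)) - 1)]

rng = random.Random(3)
us = [rng.random() for _ in range(100_000)]   # samples of U ~ Uniform(0, 1)
x, alpha = 0.4, 0.3

q_f = emp_quantile([(1 - x) * u for u in us], alpha)   # ≈ alpha * (1 - x) = 0.18
q_d = emp_quantile([-u for u in us], 1 - alpha)        # ≈ -alpha = -0.3
print(q_f, q_d)
```

With 100,000 samples both empirical quantiles land within sampling error of the closed-form values, confirming Q1−α (∂/∂x) f(x) = (∂/∂x) Qα f(x) at this x.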

Note that the classical stochastic gradient algorithm for optimizing the expectation of a function works because the expectation operator can be interchanged with the differential operator as long as the expectation is finite. We do not need this condition when optimizing quantiles. Our goal is to develop an algorithm, analogous to the classical stochastic gradient algorithm, that allows us to optimize the quantile of a function.
us to optimize the quantile of a function.


3.2. Quantile Optimization. We begin by stating the algorithm for quantile optimization and proving its convergence.

From this point on we assume that the function f and α ∈ (0, 1) are such that

    argmin_{x∈X} Qα f(x)    (3.4)

is a real number. For example, one way to guarantee (3.4) is to assume that there is a metric on Ω that makes the CDF Fx(z) : X × ℝ → ℝ uniformly continuous. Then we can conclude that Qα f(·) is a lower semi-continuous function and that (3.4) is well defined.

Theorem 3.3. Let f : X × Ω → ℝ be a function as defined before. Define

    xα := argmin_{x∈X} Qα f(x).

Let x0 ∈ X be fixed. For n ∈ ℕ+ and xn−1 ∈ X, let fn(xn−1) be the random variable whose distribution is that of f(xn−1). Define Fn as the sigma-algebra on Ω generated by the information given up to time n:

    Fn = σ(f1(x0), . . . , fn(xn−1)).

Denote

    f′n(xn−1) := (∂/∂x) fn(x, ω) |x=xn−1.

Let

    xn = { xn−1 − γn−1 sgnα(f′n(xn−1)),    if (3.2) holds for f(x),
         { xn−1 − γn−1 sgn1−α(f′n(xn−1)),  if (3.3) holds for f(x),

∀n ∈ ℕ+, where γn−1 ≥ 0 a.s. and (γn)_{n=0}^∞ is a stochastic stepsize sequence adapted to the filtration F1 ⊆ F2 ⊆ · · · ⊆ F satisfying

    ∑_{n=0}^∞ γn = ∞ a.s.    (3.5)

and

    ∑_{n=0}^∞ (γn)² < ∞ a.s.    (3.6)

Then,

    lim_{n→∞} xn = xα a.s.

Proof. Notice first that by construction the sequence of random variables (fn(xn−1))_{n=1}^∞ is adapted to the filtration F1 ⊆ F2 ⊆ · · · ⊆ F. Assume (3.2) holds, let x0 ∈ X and

    xn = xn−1 − γn−1 sgnα(f′n(xn−1)).

Then,

    (xn − xα)² = (xn−1 − xα)² − 2γn−1 (xn−1 − xα) · sgnα(f′n(xn−1)) + (γn−1)² λn,

where

    λn := [sgnα(f′n(xn−1))]².

From assumption (3.2), if xn−1 ≤ xα,

    Qα f′n(xn−1) = (∂/∂x) Qα f(x) |x=xn−1 ≤ 0,

and if xn−1 > xα,

    Qα f′n(xn−1) = (∂/∂x) Qα f(x) |x=xn−1 > 0.

Therefore,

    E[sgnα(f′n(xn−1)) | Fn−1]
        = (1 − α) P[f′n(xn−1) ≥ 0 | Fn−1] − α P[f′n(xn−1) < 0 | Fn−1]
        = (1 − α) P[f′n(xn−1) ≥ 0 | Fn−1] − α (1 − P[f′n(xn−1) ≥ 0 | Fn−1])
        = P[f′n(xn−1) ≥ 0 | Fn−1] − α
        ≤ 0 if and only if xn−1 ≤ xα,

and hence

    E[(xn−1 − xα) · sgnα(f′n(xn−1)) | Fn−1] ≥ 0.

Then,

    E[(xn − xα)² | Fn−1] ≤ (xn−1 − xα)² + (γn−1)², ∀n ≥ 1.

Using the same proof technique used in §2, we can show that

    lim_{n→∞} xn = xα a.s.

Similarly, if assumption (3.3) holds instead, we can let x0 ∈ X and

    xn = xn−1 − γn−1 sgn1−α(f′n(xn−1)).

Then,

    (xn − xα)² = (xn−1 − xα)² − 2γn−1 (xn−1 − xα) · sgn1−α(f′n(xn−1)) + (γn−1)² λn,

where

    λn := [sgn1−α(f′n(xn−1))]².

Under assumption (3.3), if xn−1 ≤ xα,

    Q1−α f′n(xn−1) = (∂/∂x) Qα f(x) |x=xn−1 ≤ 0,

and if xn−1 > xα,

    Q1−α f′n(xn−1) = (∂/∂x) Qα f(x) |x=xn−1 > 0.

Therefore,

    E[sgn1−α(f′n(xn−1)) | Fn−1]
        = α P[f′n(xn−1) ≥ 0 | Fn−1] − (1 − α) P[f′n(xn−1) < 0 | Fn−1]
        = α P[f′n(xn−1) ≥ 0 | Fn−1] − (1 − α)(1 − P[f′n(xn−1) ≥ 0 | Fn−1])
        = P[f′n(xn−1) ≥ 0 | Fn−1] − (1 − α)
        ≤ 0 if and only if xn−1 ≤ xα,

and hence

    E[(xn−1 − xα) · sgn1−α(f′n(xn−1)) | Fn−1] ≥ 0.

Again,

    E[(xn − xα)² | Fn−1] ≤ (xn−1 − xα)² + (γn−1)², ∀n ≥ 1,

and we can show that

    lim_{n→∞} xn = xα a.s. ∎
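To make the update concrete, here is a sketch (ours) of the Theorem 3.3 recursion applied to the earlier example f(x) = (1 − x)U on X = [0, 1], which satisfies (3.3). Since f′n = −Un < 0 at every step, sgn1−α always returns −(1 − α) and the iterate climbs toward the minimizer xα = 1 of Qα f(x) = α(1 − x). The projection onto [0, 1] is our addition to keep the iterates inside the compact X, which the theorem assumes:

```python
import random

def sgn_a(u, a):
    # Asymmetric signum sgn_a.
    return 1.0 - a if u >= 0 else -a

def optimize_quantile(deriv_sample, x0, alpha, case, n_iter, scale, lo, hi):
    """Theorem 3.3 update: sgn_alpha under (3.2), sgn_{1-alpha} under (3.3)."""
    a = alpha if case == "(3.2)" else 1.0 - alpha
    x = x0
    for n in range(1, n_iter + 1):
        g = deriv_sample(x)              # one sample of f'_n(x_{n-1})
        x -= (scale / n) * sgn_a(g, a)   # gamma_{n-1} = scale / n
        x = min(max(x, lo), hi)          # projection onto X = [lo, hi] (our addition)
    return x

rng = random.Random(5)
# f(x) = (1 - x) * U with U ~ Uniform(0, 1); f'(x, omega) = -U(omega), i.e. case (3.3).
xstar = optimize_quantile(lambda x: -rng.random(), x0=0.2, alpha=0.3,
                          case="(3.3)", n_iter=5000, scale=1.0, lo=0.0, hi=1.0)
print(xstar)  # reaches the boundary minimizer x^alpha = 1.0
```

The example is deliberately trivial (the optimum sits on the boundary of X), but the same loop applies whenever one of the monotonicity conditions lets the derivative quantile drive the search.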

While we have shown the proof for scalar x, the algorithm can be readily extended to the multivariable case as long as the monotonicity assumption holds in each of the variables and corresponding partial derivatives. For a simple illustration, suppose f(x, ω) is convex in the vector x = (x¹, x², . . . , xᵐ) ∈ X for some m-dimensional compact space X ⊂ ℝᵐ and fixed ω ∈ Ω.

Theorem 3.4. For each 1 ≤ i ≤ m, assume that either

    (∂/∂xⁱ) f(x, ω1) ≤ (∂/∂xⁱ) f(x, ω2) if and only if f(x, ω1) ≤ f(x, ω2)    (3.7)

holds or

    (∂/∂xⁱ) f(x, ω1) ≥ (∂/∂xⁱ) f(x, ω2) if and only if f(x, ω1) ≤ f(x, ω2)    (3.8)

holds, ∀ω1, ω2 ∈ Ω and ∀x ∈ X. For n ∈ ℕ+ and xn−1 ∈ X, let fn(xn−1) be the Fn-measurable random variable whose distribution is that of f(xn−1). Then, if we let

    x0 = (x¹0, x²0, . . . , xᵐ0) for some x0 ∈ X

and

    xⁱn = { xⁱn−1 − γⁱn−1 sgnα((∂/∂xⁱ) fn(x) |x=xn−1),    if (3.7) holds,
          { xⁱn−1 − γⁱn−1 sgn1−α((∂/∂xⁱ) fn(x) |x=xn−1),  if (3.8) holds,

while the stochastic stepsizes γⁱn ≥ 0 satisfy

    ∑_{n=0}^∞ γⁱn = ∞ a.s.

and

    ∑_{n=0}^∞ (γⁱn)² < ∞ a.s.,

∀1 ≤ i ≤ m, then

    lim_{n→∞} xn = xα a.s.,

where

    xα = argmin_{x∈X} Qα f(x).

3.3. Initialization and the Scaling of Stepsizes. We have shown that the algorithms for finding a quantile of a random variable and for optimizing a quantile of a random function converge as long as the stepsizes γn ≥ 0 a.s. satisfy (3.5) and (3.6). Various stepsizes for optimizing the expectation of a random function have been studied extensively; a comprehensive review of the literature can be found in [9, 33]. The typical approach for finding the optimal stepsizes in the case of optimizing expectations is as follows. Suppose (Xi)_{1≤i≤n} are i.i.d. random variables with finite variance and we want to find the stepsizes (γi−1)_{1≤i≤n} that give the best estimate of E[Xi]. Then, the mean-squared error

    min_{γ0,...,γn−1} E[ ( ∑_{i=1}^n γi−1 Xi − E[Xi] )² ]

is minimized when γn−1 = 1/n, ∀n. When (Xi)_{1≤i≤n} are not i.i.d., different stepsizes can give better results.

However, when computing quantiles using asymmetric signum functions, it is difficult to determine what the optimal stepsizes should be, even in the i.i.d. case, because it is unclear what we should try to minimize: we cannot minimize the expectation or the variance in a heavy-tailed environment. Thus, instead of trying to identify stepsizes that are optimal in some sense, we present an ad-hoc method for practical use. Let (X_i)_{1≤i≤n} be i.i.d. random variables and assume we want to compute some α-quantile. Suppose, for example, that the α-quantile of X_i is 10^6. Then, if we use the algorithm

    Y_n = Y_{n−1} − γ_{n−1} sgn_α (Y_{n−1} − X_n)

with Y_0 = 0 and γ_{n−1} = 1/n, it will take prohibitively long for the algorithm to give us a reasonable estimate. Thus, finding a reasonable initialization point Y_0 and stepsizes γ_{n−1} is critical. We propose initially taking sample realizations (X_i)_{1≤i≤k} for some small k, sorting them in increasing order, and picking the kα-th smallest number; we let it be Y_0. Next, we let

    γ_{n−1} := σ / n,      (3.9)

where

    σ = (1/2) [ .75-quantile of the set (X_i)_{1≤i≤k} − .25-quantile of the set (X_i)_{1≤i≤k} ]

is a scaling factor.

For the quantile optimization problem, first pick points (x_m)_{1≤m≤M} evenly dividing the compact space X, for some small M. For each x_m, we can generate (F_i(x_m))_{1≤i≤k} to find an approximation to the α-quantile of F(x_m). Then, we let our initialization point x_0 be the one that gives the minimum α-quantile. That is, initialization requires generating M × k samples. Next, we let the stepsize be given by equation (3.9), where we replace X_i with F_i(x_0). When optimizing expectations using the standard Robbins-Monro algorithm, the scaling factor has to strike a balance between the units of x ∈ X and the scale of the stochastic gradient, which can be highly volatile, ranging between large negative and large positive values. When optimizing the quantile using our algorithm, however, the stepsizes have to be scaled based on x alone. If we have a rough idea of where the optimal x may fall, we can scale the stepsize accordingly.
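The warm start and stepsize rule (3.9) can be sketched in a few lines; the function names and the Pareto test distribution below are our own choices, not the paper's:

```python
import random

def sgn_alpha(u, alpha):
    # asymmetric signum: the drift of the recursion vanishes exactly where
    # P[X <= Y] = alpha, i.e., at the alpha-quantile
    return (1.0 - alpha) if u > 0 else -alpha

def estimate_quantile(samples, alpha, k=100):
    # warm-start Y0 at the empirical alpha-quantile of the first k samples
    # and use the scaled stepsize gamma_{n-1} = sigma/n of (3.9)
    srt = sorted(samples[:k])
    y = srt[max(0, int(k * alpha) - 1)]                           # Y0
    sigma = 0.5 * (srt[int(0.75 * k) - 1] - srt[int(0.25 * k) - 1])
    for n, x in enumerate(samples[k:], start=1):
        y -= (sigma / n) * sgn_alpha(y - x, alpha)
    return y

random.seed(1)
pareto = [random.random() ** -2 for _ in range(50000)]  # P[X <= y] = 1 - y**-0.5
median_est = estimate_quantile(pareto, alpha=0.5)       # true median is 4
```

Even though the distribution has no mean, the recursion only sees the sign of Y − X, so the heavy tail never enters the step sizes.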

3.4. Newsvendor Problem. To develop an understanding of the properties of the algorithm, we begin by illustrating it in the context of a simple newsvendor problem [31]. Given a random demand D ∈ R_+, we usually want to compute the optimal stocking quantity x ∈ R_+ that maximizes the expected profit

    G(x) := E[ p min{x, D} − cx ],

where p > 0 is the unit price, c > 0 is the unit cost, and p > c. For this simple problem, the optimal solution

    x^* = Q_{(p−c)/p}(D)

is the (p−c)/p-quantile of the random variable D, which can be computed using the algorithm shown in §2. However, instead of maximizing the expected profit, suppose a manager wants to maximize some α-quantile of the profit,

    q_α := Q_α [ p min{x, D} − cx ].

This is equivalent to minimizing

    q_{1−α} := Q_{1−α} [ cx − p min{x, D} ].

Let Ω be the set of all possible outcomes. For each ω ∈ Ω, let D(ω) be the sample realization of the demand and denote

    f(x, ω) := cx − p min{x, D(ω)} =
        x(c − p),       if x ≤ D(ω),
        cx − pD(ω),     if x > D(ω).

In order to enforce compactness of the control set, we restrict x to [0, M], where M is some number big enough to capture any feasible stocking quantity. For example, in the newsvendor problem we could set M to be an upper bound on the number of newspapers produced on any given day.

Let X = [0, M]. Then,

    ∂f(x, ω)/∂x =
        c − p,   if x < D(ω),
        c,       if x > D(ω).


In general the derivative ∂f(x, ω)/∂x is not defined when x = D(ω), but this still agrees with condition (C2), which requires f(x, ·) to be differentiable almost everywhere. Next, ∀ω_1, ω_2 ∈ Ω and ∀x ∈ X, we know that f(x, ω_1) ≤ f(x, ω_2) if and only if D(ω_1) ≥ D(ω_2), which in turn is true if and only if

    ∂f(x, ω_1)/∂x ≤ ∂f(x, ω_2)/∂x.

Let (D_n)_{n≥1} be i.i.d. random variables, where D_n is F_n-measurable. If we let x_0 = 0 and

    x_n = x_{n−1} − (1/n) sgn_α ( f'_n(x_{n−1}) )
        = x_{n−1} + α/n,         if x_{n−1} ≤ D_n,
          x_{n−1} − (1−α)/n,     if x_{n−1} > D_n,      (3.10)

then

    lim_{n→∞} x_n = argmax_{x∈X} Q_α [ p min{x, D} − cx ]   a.s.

from Theorem 3.3. This implies

    argmax_{x∈X} Q_α [ p min{x, D} − cx ] = Q_α D.
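The recursion can be simulated directly. The sketch below is our own, not the paper's experiments; it combines (3.10) with the warm start and scaled stepsize of §3.3, since the literal choice x_0 = 0, γ_{n−1} = 1/n converges very slowly for heavy-tailed demand:

```python
import random

def optimize_stocking_quantile(alpha, demands, k=100):
    # recursion (3.10): step up by gamma*alpha when x <= D_n, down by
    # gamma*(1-alpha) otherwise, so x_n -> Q_alpha(D).  We warm-start at the
    # empirical alpha-quantile of the first k demands and use the scaled
    # stepsize gamma_{n-1} = sigma/n of (3.9).
    srt = sorted(demands[:k])
    x = srt[max(0, int(k * alpha) - 1)]
    sigma = 0.5 * (srt[int(0.75 * k) - 1] - srt[int(0.25 * k) - 1])
    for n, d in enumerate(demands[k:], start=1):
        if x <= d:
            x += sigma * alpha / n
        else:
            x -= sigma * (1.0 - alpha) / n
    return x

random.seed(3)
demands = [random.random() ** -2 for _ in range(100000)]  # Pareto, tail index 1/2
x_star = optimize_stocking_quantile(0.5, demands)          # true Q_0.5(D) = 4
```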

For illustration, suppose p = 10, c = 1, and (D_n)_{n≥1} is i.i.d. with a Pareto distribution whose tail index is 1/2:

    P[D_n ≤ y] =
        1 − (1/y)^{1/2},   if y ≥ 1,
        0,                 if y < 1,

for all n ≥ 1. For given x ≥ 1, the expected profit is

    E[p min{x, D_n}] − cx = E[10 min{x, D_n}] − x
                          = 5 ∫_{y=1}^{x} y^{−1/2} dy + 10x P[D_n > x] − x
                          = 10 y^{1/2} |_{y=1}^{x} + 10 x^{1/2} − x
                          = 20 x^{1/2} − 10 − x.

Then, we obtain a maximum expected profit of 90 if

    x^* = Q_{(p−c)/p}(D_n) = Q_{9/10}(D_n) = 100.

However, while x^* = 100 maximizes the expected profit, the risk that we lose a lot of money is very high. For our newsvendor example, the probability that we lose at least $50 while following our optimal policy is given by

    P[ p min{x^*, D_n} − cx^* ≤ −50 ] = P[ 10 min{100, D_n} ≤ 50 ]
                                      = P[ D_n ≤ 5 ] = 1 − (1/5)^{1/2} ≈ .55.


A policy that has a 55% chance of losing 50% or more of the initial investment of 100 is a disastrous policy, even if the expected return on investment is 90%. In a heavy-tailed environment, focusing on the expected profit leads one to take on imprudent risks. The above policy has only a 30% chance of making a return of 10% or more. Instead of trying to maximize the expectation, suppose we want to maximize

some quantile. Since

    argmax_{x∈X} Q_α [ p min{x, D_n} − cx ] = Q_α(D_n) = 1/(1 − α)^2,

we maximize the median profit if x^* = 4. Then, the median profit is 36 while the

expected profit is 26 and the minimum profit we can make is 6. That is, we have a 100% chance of making more than a 150% profit on the initial investment, and our expected return on investment is 650%. The median return on our investment is 900%. Again, when x^* = 4, we have a 50% chance of making a profit of 36 or more, while we only have a 30% chance of making a profit of 10 or more when x^* = 100. In fact, x^* = 100 maximizes the 0.9-quantile of the profit. This example illustrates that it is not always reasonable to try to maximize the expectation, even when we can compute it. The merit of the more conservative policy derived from quantile optimization becomes even more conspicuous if we conduct independent experiments multiple times and observe what happens to the cumulative

profit. Let

    U_i = 10 min{100, D_i} − 100

be the profit of a business that follows the expectation-maximizing policy (x^* = 100) at the i-th iteration, and let

    V_n = Σ_{i=1}^{n} U_i

be the cumulative profit of the business after the n-th iteration. The left plot in Figure 3.1 shows ten different sample paths of V_n as well as their average. Next, let

    X_i = 10 min{4, D_i} − 4

be the profit of a business that follows the median-maximizing policy (x^* = 4) at the i-th iteration, and let

    Y_n = Σ_{i=1}^{n} X_i

be the cumulative profit. The right plot in Figure 3.1 shows several different sample paths following the median-maximizing policy. By comparing the two plots, we can see that while V_n grows much faster than Y_n, and thus is much more profitable on average, it is also much more volatile. We can observe how risky the expectation-maximizing policy can be; one sample path of V_n reaches below −$700 after the 11th iteration, implying that the business would have gone bankrupt if it did not have initial capital in excess of $700.

For the above example, we happened to know the closed-form expression for the quantiles, but there are many applications where we do not have a formula in closed form. Often, we do not even know which distribution we are sampling from. In such


cases, we must rely on the algorithm (3.10). Figure 3.2 shows the convergence of the algorithm (3.10) for α = 0.5, 0.684, and 0.8, where x_n → 4, 10, and 25, respectively. The horizontal axis is shown on a logarithmic scale. Table 3.1 shows the expected profit and the probability of loss for different quantiles.

Fig. 3.1. Cumulative profit obtained by following the expectation-maximizing policy (left) and by following the median-maximizing policy (right), for ten sample paths.

Fig. 3.2. Convergence of x_n when α = 0.5, 0.684, and 0.8.


                 x^*    Expected Profit    Probability of Loss
    α = 0.5        4          26                   0
    α = 0.684     10          43.2                 0
    α = 0.8       25          65                   0.37
    α = 0.9      100          90                   0.68

    Table 3.1: Optimizing the quantile of the profit for the newsvendor problem
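The entries of Table 3.1 follow from the closed forms derived above; a quick check, with helper names of our own choosing:

```python
# Closed-form check of Table 3.1 for the Pareto demand with tail index 1/2
# (P[D <= y] = 1 - y**-0.5 for y >= 1), price p = 10, cost c = 1.

def quantile_D(alpha):
    return (1.0 - alpha) ** -2           # Q_alpha(D) = (1 - alpha)^{-2}

def expected_profit(x):
    return 20.0 * x ** 0.5 - 10.0 - x    # E[10 min(x, D)] - x, for x >= 1

def loss_probability(x):
    # profit < 0  <=>  10 D < x  <=>  D < x/10 (possible only when x > 10)
    return max(0.0, 1.0 - (x / 10.0) ** -0.5) if x > 10 else 0.0

for alpha in (0.5, 0.684, 0.8, 0.9):
    x = quantile_D(alpha)
    print(alpha, round(x), round(expected_profit(x), 1), round(loss_probability(x), 2))
```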

4. Trading Electricity in the Presence of Storage. Trading on the electricity spot market introduces the challenge of dealing with spot prices, which are well known to be heavy-tailed [2, 4, 8]. In this section, we show how quantile optimization can be used to derive a trading strategy in the presence of storage. At each time t, we sign a contract and determine whether to buy or sell electricity during the time interval [t, t + 1). We propose a trading strategy and a performance metric that can be assumed to be approximately i.i.d., thus allowing us to apply our quantile optimization algorithm. We use electricity spot market price data from Ercot and PJM West Hub. Ercot is a utility that covers most of Texas. PJM West covers the Chicago metropolitan area. Ercot and PJM represent competitive electricity markets where the price is settled by the market based on supply and demand in real time. The electricity price in Ercot is particularly volatile because of the significant role wind energy plays in the market. While demand for electricity is always random, reliance on intermittent energy sources such as wind or solar makes the supply side of the equation random as well, making a large mismatch between supply and demand more likely and resulting in violent price swings.

4.1. Model. Throughout this section, we assume the amount of electricity we can deliver to the market by discharging our storage for a one-hour period is one megawatt-hour. We assume that the round-trip efficiency of our storage is 0 < ρ < 1. For most commercially available storage devices the value of ρ is between .65 and .80 [40]. We must import 1/ρ megawatt-hours of electricity when charging our storage device for every megawatt-hour of energy that we want to release. Throughout this section, we assume that we have an infinite-capacity storage device. Moreover, we assume that we are sufficiently small compared to the overall market, so that we can always discharge our storage and sell one megawatt-hour of electricity to the market, and we can always buy one megawatt-hour of electricity and charge our storage if we choose to do so.

State Variables:
Let t ∈ N_+ be a discrete time index corresponding to the decision epoch.

    p_t = price of electricity per megawatt-hour traded during the time interval [t, t + 1), ∀t.

When we discharge our storage by one unit to sell one megawatt-hour of electricity, we earn p_t. When we buy electricity to charge our storage, we are buying 1/ρ megawatt-hours of electricity, so we have to pay p_t/ρ.

    S_t = the electricity price history over the last 100 hours = (p_{t'})_{t−99≤t'≤t}.

The history S_t is needed to compute the quantile of the price process. We interpret S_t as the state of the system at time t.

Decision (Action) Variable:
x_t = decision that tells us to discharge (sell), hold, or charge (buy) our storage at time t, with x_t ∈ {−1, 0, 1}: x_t = −1 indicates discharge, x_t = 0 indicates hold, and x_t = 1 indicates charge.

Exogenous Process:
p̂_t = p_t − p_{t−1} = random variable that captures the evolution of the price process (p_t)_{t≥0}.

Let Ω be the set of all possible outcomes and let F be a σ-algebra on the set, with filtration F_t generated by the information given up to time t:

    F_t = σ(S_0, x_0, p̂_1, S_1, x_1, p̂_2, S_2, x_2, ..., p̂_t, S_t, x_t).

P is the probability measure on the measurable space (Ω, F). We have defined the state of our system at time t as all variables that are F_t-measurable and needed to compute our decision at time t.

Storage Transition Function:

    R_{t+1} = R_t + x_t.

If x_t = 1, we import electricity from the market and store it with conversion factor ρ. That is, we purchase 1/ρ megawatt-hours of electricity from the market, but it only fills one megawatt-hour worth of electricity in our storage due to the conversion loss. If x_t = −1, the potential energy in the storage is converted into electricity with a conversion factor of 1 and sold to the market.

4.2. Electricity Price Behavior. At time t, let p_t^{(i)} denote the order statistics of the past hundred hours of data (p_{t'})_{t−99≤t'≤t} sorted in increasing order:

    p_t^{(1)} ≤ p_t^{(2)} ≤ ... ≤ p_t^{(100)}.

Then, let

    q_t := min { b ∈ {1, 2, ..., 100} : p_t^{(b)} ≥ p_t }      (4.1)

be the index of p_t in the order statistics. For the hourly electricity spot market price from Texas Ercot and PJM West Hub,

    P[q_t ≤ b] ≈ b/100,  ∀b ∈ {1, 2, ..., 100}.      (4.2)

Next, Table 4.1 shows the behavior of the electricity price for Texas Ercot. One can see that for lower values of q_t, the probability that the price will increase grows, and vice versa. The results are similar for prices from the PJM West Hub, as shown
and vice versa. The results are similar <strong>for</strong> prices <strong>for</strong> the PJM West hub, as shown


    b =                                   10    20    30    40    50    60    70    80    90
    P[p_{t+1} ≥ p_t/0.70 | q_t ≤ b] =    .41   .30   .24   .20   .17   .15   .14   .13   .12
    P[p_{t+1} ≥ p_t/0.75 | q_t ≤ b] =    .44   .33   .27   .23   .20   .18   .16   .15   .14
    P[p_{t+1} ≥ p_t/0.80 | q_t ≤ b] =    .47   .37   .32   .28   .25   .22   .21   .19   .18
    P[p_{t+1} ≥ p_t/0.90 | q_t ≤ b] =    .55   .47   .44   .40   .38   .35   .33   .32   .30
    P[p_{t+1} ≤ p_t | q_t > b]      =    .52   .53   .54   .56   .57   .59   .62   .66   .74

    Table 4.1: Dependence of electricity price behavior on q_t for Texas Ercot

    b =                                   10    20    30    40    50    60    70    80    90
    P[p_{t+1} ≥ p_t/0.70 | q_t ≤ b] =   .173  .132  .121  .119  .115  .110  .106  .102  .098
    P[p_{t+1} ≥ p_t/0.75 | q_t ≤ b] =   .211  .174  .153  .151  .148  .144  .139  .134  .129
    P[p_{t+1} ≥ p_t/0.80 | q_t ≤ b] =   .249  .205  .193  .191  .190  .186  .181  .176  .171
    P[p_{t+1} ≥ p_t/0.90 | q_t ≤ b] =   .393  .339  .323  .320  .317  .310  .304  .297  .289
    P[p_{t+1} ≤ p_t | q_t > b]      =    .52   .54   .55   .56   .58   .60   .65   .72   .83

    Table 4.2: Dependence of electricity price behavior on q_t for PJM West Hub

in Table 4.2. This is because we may assume that the electricity price is median-reverting, as shown in [18]. According to [18], the electricity price is heavy-tailed and non-stationary, so the empirical mean cannot be used to characterize the behavior of the price. Instead of using standard Gaussian jump-diffusion processes with mean-reversion, [18] shows that the price behavior is more appropriately modelled as median-reverting with an underlying heavy-tailed process. Thus, if the current price level is at the higher end of the prices realized in the past one hundred hours, there is a high probability that the price will go down due to the median-reversion, and vice versa.

4.3. A Robust Trading Policy. We believe that a robust policy for trading electricity is one whose performance is consistent across different sample paths of the price process. A policy optimized on past data should remain optimal going into the future. In other words, if we have a parameterized policy, the parameters that were optimal for year 2008 and the parameters that were optimal for year 2009 should be almost identical. This implies that by the end of year 2008, we would know what policy we will use to trade in year 2009, and we would know that the policy will perform well. Our goal is to devise a robust trading policy that exploits the fact that the electricity price is median-reverting, as shown in the previous section.

In prior research, we have shown that the electricity price is heavy-tailed [18], which implies that our trading policy should only depend on zeroth-order statistics, i.e., the quantile function. Since we do not want to have negative exposure to heavy-tailed price movements, we must always buy low and sell high, which is possible because the price process is median-reverting.

In the previous section, we have shown that we can ascertain how high or low the current price level is by comparing it to the percentile constructed from the order statistics. Given a pair of thresholds 1 ≤ x^L ≤ 50 and 51 ≤ x^H ≤ 100, our policy is

    X_t^π(S_t) =
        1,    if q_t ≤ x^L,
        −1,   if q_t ≥ x^H,      (4.3)
        0,    otherwise,

where

    π := (x^L, x^H).
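The policy (4.3) is straightforward to implement from the rolling 100-hour window; a sketch with hypothetical function names:

```python
def q_index(history):
    # q_t of (4.1): 1-based rank of the current price p_t (history[-1])
    # within the last 100 hourly prices
    assert len(history) == 100
    p_t = history[-1]
    for b, price in enumerate(sorted(history), start=1):
        if price >= p_t:            # smallest b with p^{(b)} >= p_t
            return b
    return 100

def policy(history, x_low, x_high):
    # trading policy (4.3): buy (+1) when the current price ranks at or
    # below x_low, sell (-1) at or above x_high, otherwise hold (0)
    q = q_index(history)
    if q <= x_low:
        return 1
    if q >= x_high:
        return -1
    return 0
```

For example, with the thresholds (36, 68) from Section 5, a current price that is the highest of the last 100 hours triggers a sell.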

x^L determines when to buy electricity. Of course, the lower the threshold x^L, the higher the probability that the price will increase by more than a factor of 1/ρ, as shown in Tables 4.1 and 4.2. On the other hand, for smaller values of x^L we buy electricity less frequently, and thus have less opportunity to make money, as implied by (4.2). Similarly, the higher the threshold x^H, the higher the probability that the price will decrease, but we will be able to sell electricity less frequently, reducing the opportunities to make money.

We define the objective function

    f_{t+1}(x^L, x^H, ω) =
        x^L · (p_{t+1} − (1/ρ) p_t)(ω),        if q_t ≤ x^L,
        0,                                      if x^L < q_t < x^H,      (4.4)
        (100 − x^H) · (p_t − p_{t+1})(ω),       if x^H ≤ q_t.

Note that

    P[q_t ≤ x^L] ≈ x^L/100  and  P[q_t ≥ x^H] ≈ (100 − x^H)/100,

as shown in (4.2). Notice that when q_t ≤ x^L, we buy electricity and hence would make a profit (in a mark-to-market sense) if p_{t+1} − (1/ρ) p_t ≥ 0 and have a loss if p_{t+1} − (1/ρ) p_t < 0. On the other hand, when x^H ≤ q_t, we would have correctly sold electricity if p_{t+1} − p_t ≤ 0, and incur a loss in opportunity cost if p_{t+1} − p_t ≥ 0. Clearly (4.4) reflects these observations and allows us to calculate the one-step contribution of buying, selling, or holding electricity in our storage device.
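The one-step contribution (4.4) translates directly into code; again a sketch with our own naming:

```python
def one_step_contribution(x_low, x_high, q_t, p_t, p_next, rho):
    # one-step objective (4.4): mark-to-market gain of buying when
    # q_t <= x_low and of selling when q_t >= x_high, weighted by how often
    # each action is taken (x_low and 100 - x_high, per (4.2))
    if q_t <= x_low:
        return x_low * (p_next - p_t / rho)      # bought 1/rho MWh at p_t
    if q_t >= x_high:
        return (100 - x_high) * (p_t - p_next)   # sold; gain if price falls
    return 0.0
```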

Proposition 4.1. Given F_t, define the objective function f_{t+1}(x^L, x^H, ω) as in (4.4). Then x^L satisfies the monotonicity condition (3.2), while x^H satisfies (3.3).

Proof. Let f_{t+1}(x^L, x^H) denote the random variable whose specific sample realization is f_{t+1}(x^L, x^H, ω). Then, for every ω ∈ Ω, f_{t+1}(x^L, x^H, ω) is a concave function of (x^L, x^H). When q_t ≤ x^L, we buy electricity and hence would make a profit (in a mark-to-market sense) if p_{t+1} − (1/ρ) p_t ≥ 0 and have a loss if p_{t+1} − (1/ρ) p_t < 0. When x^H ≤ q_t, we would have correctly sold electricity if p_{t+1} − p_t ≤ 0, and incur a loss in opportunity cost if p_{t+1} − p_t ≥ 0.

Next, we know that

    ∂f_{t+1}(x^L, x^H, ω)/∂x^L =
        (p_{t+1} − (1/ρ) p_t)(ω),   if q_t ≤ x^L,
        0,                          otherwise,

and

    ∂f_{t+1}(x^L, x^H, ω)/∂x^H =
        (p_{t+1} − p_t)(ω),   if x^H ≤ q_t,
        0,                    otherwise.

For some ω^1, ω^2 ∈ Ω,

    f_{t+1}(x^L, x^H, ω^1) ≤ f_{t+1}(x^L, x^H, ω^2)

if and only if one of the following three conditions holds:
1. q_t ≤ x^L and (p_{t+1} − (1/ρ) p_t)(ω^1) ≤ (p_{t+1} − (1/ρ) p_t)(ω^2);
2. x^L < q_t < x^H;
3. x^H ≤ q_t and (p_t − p_{t+1})(ω^1) ≤ (p_t − p_{t+1})(ω^2).
These conditions hold true if and only if

    ∂f_{t+1}(x^L, x^H, ω^1)/∂x^L ≤ ∂f_{t+1}(x^L, x^H, ω^2)/∂x^L

and

    ∂f_{t+1}(x^L, x^H, ω^1)/∂x^H ≥ ∂f_{t+1}(x^L, x^H, ω^2)/∂x^H.

Therefore, x^L satisfies (3.2) while x^H satisfies (3.3).

Then, our goal is to compute (x^{L*}, x^{H*}) such that

    (x^{L*}, x^{H*}) := argmax_{1 ≤ x^L ≤ 50, 50 < x^H ≤ 100} Q_α [ f_{t+1}(x^L, x^H) ].      (4.5)


Fig. 4.1. x_t^L computed from (4.5) as t → ∞, for α = 0.5 and ρ = 0.7.

                α = 0.3    α = 0.4    α = 0.5
    ρ = 0.70      36.2       43.6       49.4
    ρ = 0.75      37.1       44.6       50.5
    ρ = 0.80      39.0       46.3       51.6

    Table 4.3: x^L computed from PJM West Hub data

Table 4.3 shows x^L computed from the above algorithm using PJM West Hub price data, for different quantile levels α and round-trip conversion factors ρ. The lower the value of ρ, the greater the conversion loss, and hence our decision to buy must be more conservative, leading to a lower x^L. And of course, a larger value of α leads to more risk taking and hence a higher x^L. We show results for α ≤ 0.5 because for larger α we end up with policies that take too much risk, buy electricity too readily, and end up not making a reasonable amount of profit. Similarly, Table 4.4 shows x^L computed from the above algorithm using Texas Ercot price data. Next, Table 4.5 shows x^H computed from PJM West and Texas Ercot data for some α ≤ 0.6. For α > 0.6, we end up with policies that take too much risk, sell our electricity too easily, and end up not making a reasonable amount of profit.

5. Maximum Profit Trading Policy. In the previous section, we have shown how to compute x^L and x^H based on our risk appetite, determined by α. If we want to find out exactly how much risk we would have had to accept in the past to maximize


                α = 0.3    α = 0.4    α = 0.5
    ρ = 0.70      32.6       39.6       46.0
    ρ = 0.75      33.7       40.8       47.2
    ρ = 0.80      35.5       42.2       49.0

    Table 4.4: x^L computed from Texas Ercot data

                   α = 0.3    α = 0.4    α = 0.5    α = 0.6
    PJM West         87.9       75.2       62.4       50.1
    Texas Ercot      85.4       72.1       59.7       45.5

    Table 4.5: x^H computed from Texas Ercot and PJM West data

our profit, we can first find the x^L and x^H that maximize the cumulative profit

    C(x^L, x^H) := − Σ_{t=0}^{T} p_t x_t,
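The deterministic search can be sketched as a grid scan. This is our own illustration, not the paper's code: it charges p_t/ρ on purchases per the model of §4.1 (the displayed C omits the ρ factor) and only sells from nonempty storage:

```python
import itertools

def cumulative_profit(prices, x_low, x_high, rho):
    # run policy (4.3) over a price series and accumulate cash flows:
    # pay p_t/rho when buying, earn p_t when selling (terminal storage
    # value is ignored in this sketch)
    profit = 0.0
    storage = 0
    for t in range(100, len(prices)):
        window = prices[t - 99 : t + 1]
        q = sum(1 for p in window if p < prices[t]) + 1   # rank of p_t, as in (4.1)
        if q <= x_low:
            profit -= prices[t] / rho
            storage += 1
        elif q >= x_high and storage > 0:
            profit += prices[t]
            storage -= 1
    return profit

def best_thresholds(prices, rho):
    # deterministic search over the threshold grid, as in Section 5
    return max(itertools.product(range(1, 51), range(51, 101)),
               key=lambda xs: cumulative_profit(prices, xs[0], xs[1], rho))
```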

through deterministic search. Then, the α that comes closest to producing these x^L and x^H tells us what our risk appetite should have been in order to have maximized our profit. Assuming ρ = .75, for year 2006, when the median price of the year was $47.1 in Texas Ercot, x^L = 39 and x^H = 71 would have given us the highest profit of $7.6 per hour, and these correspond to α ≈ 0.36. In other words, we should have been fairly conservative in controlling the probability of losses in order to maximize our profit. This is because the heavy-tailed spikes in the electricity prices have a disproportionate impact on profits and losses. Instead of trading often and trying to make a profit as often as possible, we must be patient and wait long enough for large price deviations to occur. These large deviations occur often enough that we can eventually buy and sell electricity at a good profit. For year 2007, when the median price of the year was $48.3, x^L = 36 and x^H = 68 would have been optimal, giving us a profit of $7.2 per hour. This again corresponds to α ≈ 0.36. Thus, even though the electricity price is highly volatile and non-stationary, with wildly differing price distributions over time, the risk appetite that would have been optimal for year 2006 was still optimal for year 2007. Results are similar for years 2008 and 2009 and for the data from the PJM West Hub. The trading policy based on optimizing the quantiles is truly robust: it is not sensitive to the particular sample paths of the data, because the policy captures the core characteristics of the data. The hourly electricity spot price is median-reverting and heavy-tailed.

Table 5.1 shows the performance of the above trading policy from year 2007 to 2009 in Ercot. We can see that the difference between the profit we can make using the optimal parameters (x^L, x^H) and the profit we can make using the parameters computed from the previous year's data is negligible, implying that the quantile-based trading policy is robust.


    year    median price    optimal (x^L, x^H)    optimal profit    last year's (x^L, x^H)    profit
    2007       $48.3            (36, 68)            $7.2/hour             (39, 71)          $7.1/hour
    2008       $48.7            (39, 68)           $17.0/hour             (36, 68)         $16.8/hour
    2009       $23.5            (39, 69)            $6.3/hour             (39, 68)          $6.2/hour

    Table 5.1: Performance of the quantile-based trading policy in Ercot

6. Conclusion. In this article, we have presented a simple, provably convergent algorithm for optimizing the quantile of a continuous random function whose expectation may not exist. The algorithm follows the spirit of Robbins and Monro [36] and does not require us to store and sort the data. In the real world, we must often deal with non-stationary and heavy-tailed data. In that case, a policy based on higher-order statistics such as the expectation and the variance can be unstable at best or enormously risky, as we have shown with our newsvendor example. We also showed that, in the electricity spot market, a simple trading policy based on quantile optimization is robust and generates good profit.

REFERENCES<br />

[1] G. R. Arce and J. L. Paredes, Recursive weighted median filters admitting negative weights<br />

and their <strong>optimization</strong>, IEEE Trans. Signal Processing, 48 (3), 2000, pp. 768–779.<br />

[2] J. P. Bouchaud, Power laws in economics and finance: some ideas from physics, Quantitative<br />

Finance, 1 (1), 2001, pp. 105–112.<br />

[3] D. Bertsimas and A. Thiele, A robust <strong>optimization</strong> approach to inventory theory, Operations<br />

Research, 54 (1), 2006, pp. 150–168<br />

[4] W. Breymann, A. Dias, and P. Embrecths, Dependence structure <strong>for</strong> multivariate highfrequency<br />

data in finance, Quantitative Finance, 3 (1), 2003, pp. 1–14.<br />

[5] H. N. E. Bystrom, Extreme value theory and extremely large electricity price changes, International<br />

Review of Economics & Finance, 14 (1), 2002, pp. 41–55.<br />

[6] A. Cartea and M. G. Figueroa, Pricing in electricity markets: a mean-reverting jumpdiffusion<br />

model with seasonality Applied Mathematical Finance, 12 (4), 2005, pp. 313–335.<br />

[7] A. Eydeland and K. Wolyniec, Energy and Power Risk Management, John Wiley & Sons,<br />

New Jersey, 2003.<br />

[8] J. D. Farmer and F. Lillio, On the origin of power law tails in price fluctuations, Quantitative<br />

Finance, 4 (1), 2004, C7–C11.<br />

[9] A. P. George and W. B. Powell, Adaptive stepsizes <strong>for</strong> recursive estimation with applications<br />

in approximate dynamic programming, Machine Learning, 65 (1), 2006, pp. 167–198.<br />

[10] B. M. Hill, A simple general approach to inference about the tail of a distribution, Annals of<br />

Statistics, 3 (5), 1975, pp. 1163–1174.<br />

[11] P. Heidelberger and P. A. W. Lewis, <strong>Quantile</strong> Estimation in Dependent Sequences, Operations<br />

Research, 32 (1), 1984, pp. 185–209.<br />

[12] D. P. Heyman and M. J. Sobel, Stochastic Models in Operations Research, Dover Publications,<br />

New York, 2003.<br />

[13] T. Hsing, On tail index estimation using dependent data, Annals of Statistics, 19 (3), 1991,<br />

pp. 1547–1569.<br />

[14] R. Huisman and R. Maiheu, Regime jumps in electricity prices, Energy Economics, 25 (5),<br />

2003, pp. 425–434.<br />

[15] D. Hutchinson and J. C. Spall, Stopping small-sample stochastic approximation, in American<br />

Control Conference, 2009, AAC’09, IEEE, pp. 26–31.<br />

[16] V. R. Joseph, Y. Tian, and C. f. Wu, Adaptive designs <strong>for</strong> stochastic root-finding, Satistica<br />

Sinica, 17 (4), 2007, pp. 1549–1565.<br />

[17] A. I. Kibzun and V. Y. Kurbakovskiy, Guaranteeing approach to solving quantile <strong>optimization</strong><br />

problems, Annals of Operations Research, 30 (1), 1991, pp. 81–93.


26 QUANTILE OPTIMIZATION FOR HEAVY-TAILED DISTRIBUTIONS<br />

[18] J. H. Kim and W. B. Powell, An hour-ahead prediction model for heavy-tailed spot prices, submitted to Energy Economics, 2011.

[19] A. J. Kleywegt, A. Shapiro, and T. Homem-de-Mello, The sample average approximation method for stochastic discrete optimization, SIAM Journal on Optimization, 12 (2), 2002, pp. 479–502.

[20] R. Koenker, Quantile Regression, Cambridge University Press, New York, 2005.

[21] H. J. Kushner and G. Yin, Stochastic Approximation and Recursive Algorithms and Applications, Springer-Verlag, New York, 2003.

[22] T. L. Lai, Stochastic approximation: invited paper, Annals of Statistics, 31 (2), 2003, pp. 391–406.

[23] T. L. Lai and H. Robbins, Adaptive design and stochastic approximation, The Annals of Statistics, 7 (6), 1979, pp. 1196–1221.

[24] T. L. Lai and H. Robbins, Consistency and asymptotic efficiency of slope estimates in stochastic approximation schemes, Probability Theory and Related Fields, 56 (3), 1981, pp. 329–360.

[25] B. LeBaron, Stochastic volatility as a simple generator of apparent financial power laws and long memory, Quantitative Finance, 1 (6), 2001, pp. 621–631.

[26] J. J. Lucia and E. S. Schwartz, Electricity prices and power derivatives: Evidence from the Nordic power exchange, Review of Derivatives Research, 5 (1), 2002, pp. 5–20.

[27] A. Nemirovski, A. Juditsky, G. Lan, and A. Shapiro, Robust stochastic approximation approach to stochastic programming, SIAM Journal on Optimization, 19 (4), 2009, pp. 1574–1609.

[28] L. Peng, Estimating the mean of a heavy-tailed distribution, Statistics and Probability Letters, 52 (3), 2001, pp. 255–264.

[29] L. Peng, Empirical-likelihood based confidence interval for the mean with a heavy-tailed distribution, Annals of Statistics, 32 (3), 2004, pp. 1192–1214.

[30] L. Peng and Y. Qi, Confidence regions for high quantiles of a heavy-tailed distribution, Annals of Statistics, 34 (4), 2006, pp. 1964–1986.

[31] N. C. Petruzzi and M. Dada, Pricing and the newsvendor problem: A review with extensions, Operations Research, 47 (2), 1999, pp. 183–194.

[32] D. Pilipovic, Energy Risk: Valuing and Managing Energy Derivatives, McGraw-Hill, New York, 2007.

[33] W. B. Powell, Approximate Dynamic Programming: Solving the Curses of Dimensionality, John Wiley and Sons, New York, 2007.

[34] M. L. Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming, John Wiley and Sons, New York, 1994.

[35] S. Resnick and C. Starica, Tail index estimation for dependent data, Annals of Applied Probability, 8 (4), 1998, pp. 1156–1183.

[36] H. Robbins and S. Monro, A stochastic approximation method, Annals of Mathematical Statistics, 22 (3), 1951, pp. 400–407.

[37] A. Ruszczyński and A. Shapiro, Stochastic Programming, Handbooks in Operations Research and Management Science, 10, Elsevier, 2003.

[38] A. Shapiro, D. Dentcheva, and A. Ruszczyński, Lectures on Stochastic Programming: Modeling and Theory, MPS-SIAM Series on Optimization, 9, MPS-SIAM, Philadelphia, 2009.

[39] E. S. Schwartz, Stochastic behavior of commodity prices: Implications for valuation and hedging, The Journal of Finance, 52 (3), 1997, pp. 923–974.

[40] R. Sioshansi, P. Denholm, T. Jenkin, and J. Weiss, Estimating the value of electricity storage in PJM: Arbitrage and some welfare effects, Energy Economics, 31 (2), 2009, pp. 269–277.

[41] P. Sunehag, J. Trumpf, S. V. N. Vishwanathan, and N. Schraudolph, Variable metric stochastic approximation theory, arXiv preprint arXiv:0908.3529v1, 2009.

[42] N. N. Taleb, Black swans and the domains of statistics, The American Statistician, 61 (3), 2007, pp. 198–200.

[43] N. N. Taleb, Errors, robustness, and the fourth quadrant, International Journal of Forecasting, 25 (4), 2009, pp. 744–759.

[44] L. Yin, R. Yang, M. Gabbouj, and Y. Neuvo, Weighted median filters: A tutorial, IEEE Transactions on Circuits and Systems, 43 (3), 1996, pp. 157–192.
