Quantile optimization for heavy-tailed distributions ... - CASTLE Lab
QUANTILE OPTIMIZATION FOR HEAVY-TAILED DISTRIBUTIONS USING ASYMMETRIC SIGNUM FUNCTIONS

JAE HO KIM∗, WARREN B. POWELL†, AND RICARDO A. COLLADO‡

Abstract. In this article, we present a provably convergent algorithm for computing the quantile of a continuous random variable that requires neither the existence of the expectation nor the storage of all of the sample realizations. We then present an algorithm for optimizing the quantile of a random function satisfying some strict monotonicity and differentiability properties. This random function may be characterized by a heavy-tailed distribution where the expectation is not defined. The algorithm is illustrated in the context of electricity trading in the presence of storage, where electricity prices are known to be heavy-tailed with infinite variance.

Key words. quantile optimization, stochastic gradient, signum functions, energy systems
1. Introduction. For the past several decades, academic research in stochastic control and optimization has focused mainly on maximizing or minimizing the expectation of a random process [12, 19, 21, 33, 34, 37, 38]. A separate line of research has focused on worst-case outcomes under the umbrella of robust optimization [3]. However, when we are dealing with a non-stationary and complex or heavy-tailed system, the expectation is difficult or impossible to compute, and it is often meaningless to even try to approximate it. In those cases, it is rational to optimize our policy based on quantiles, which are zeroth-order statistics derived directly from the probability distribution. The cumulative distribution function (CDF) and the inverse CDF, also known as the quantile function, can almost always be computed in a practical manner. Optimizing expectations is especially difficult in various financial markets including equities, commodities, and electricity markets, many of which are notorious for being non-stationary, extremely volatile, and heavy-tailed [2, 4, 5, 8]. For example, daily volatilities of 20-30% are common in electricity markets, while the equity market also exhibits occasional heavy-tailed crashes, as happened during the 1987 meltdown, the LTCM crisis in 1998, the bursting of the dot-com bubble in 2001, and, of course, the 2008 financial meltdown.
A heavy-tailed distribution is defined by the structure of the decline in probabilities for large deviations. In a heavy-tailed environment, the usual statistical tools at our disposal can be tricked into producing erroneous results from a finite sample of observations, leading to wrong conclusions. For example, suppose we want to compute the variance from a finite number of i.i.d. samples whose underlying distribution is heavy-tailed, so that the second and higher-order moments are not finite. If we use the standard empirical formula for computing the variance, we will get "a number," giving the illusion of finiteness, but the estimates typically show a high level of instability and grow steadily as we add more and more samples [43]. In this article, we present a framework for optimizing the median or any other quantile of a stochastic process subject to some strict monotonicity properties and show how it can be applied to optimizing a trading policy in the electricity markets. Since the quantile function always exists and is relatively stable even in a highly volatile and non-stationary environment, optimizing the quantile is a robust procedure, and almost always a reasonable thing to do in a complex, volatile, and heavy-tailed environment or in a situation where the worst-case scenario is unknown or too extreme.

∗ email: jaek@princeton.edu
† email: powell@princeton.edu. Author supported in part by a grant from SAP and AFOSR contract FA9550-08-1-0195.
‡ email: rcollado@princeton.edu
For the past decade, many countries have deregulated their electricity markets. Electricity prices in deregulated markets are known to be very volatile, and as a result participants in those markets face large risks. Recently, there has been growing interest among electricity market participants in the possibility of using storage devices to mitigate the effect of the volatility in electricity prices. However, storage devices have capacity limits and significant conversion losses, and it is unclear how one should use a storage device to maximize its value. Fortunately, unlike stock prices, electricity prices in deregulated markets are well known to exhibit reversion to the long-term "average," allowing us to make directional bets [6, 7, 26, 32, 39]. The value of a storage device is determined by how good we are at making such bets: buying electricity when we believe the price will go up and selling electricity when we believe the price will go down. When using a storage device, we automatically lose 10-30% of the energy through conversion losses. The conversion loss can be seen as a transaction cost, and hence the storage is valuable only if the price process is volatile enough that we can sell at a price sufficiently higher than the price at which we buy to make up for the loss due to conversion.
The statistics community has long recognized that, in the context of linear regression, minimizing the mean-squared or absolute-value error is an arbitrary choice. For certain applications, minimizing some quantile of the error terms instead can be a viable alternative. This approach is generally referred to as quantile regression, which minimizes the empirical expectation of a loss function on a given finite sample set via deterministic optimization. A comprehensive review of quantile regression and the relevant literature can be found in [20]. Meanwhile, the signal processing community has recognized that when the data is non-stationary and highly volatile, applying median filters [1] based on order statistics yields better results than applying traditional filters based on minimizing empirical expectations.

The method described in this article is different from the ones described in [20] or [1]. Instead of applying order statistics and deterministic algorithms to a given data set, we use an adaptation of the recursive algorithm pioneered by Robbins and Monro. In their seminal 1951 paper [36], Robbins and Monro addressed the problem of estimating the α-quantile of a distribution function using response/no-response data and a recursive algorithm that looks quite similar to Newton's method. Just as in the classical Newton's method, Robbins and Monro's method depends on a stepsize sequence with a set of properties that the authors named type 1/n. Crucial to this method is the existence of the expectation of the related random variables used in the model. This quantile estimation problem is a simple application of a fundamental recursive method developed in [36] for stochastic root-finding, which is considered by many to be the foundation of stochastic approximation. The Robbins-Monro recursive algorithm has spawned a large body of research on stepsizes, stopping conditions, and related root-finding methods; see [15, 16, 23, 24, 27]. Later, the method was generalized to the use of adaptive stepsizes with stochastic properties. See [22, 41] for an overview of the development of adaptive stepsizes and extensions of the Robbins-Monro method.
While general algorithms for estimating quantiles require storing the history of observations, our adaptation of the Robbins-Monro method performs quantile optimization without this requirement, making our implementation much more compact. The first contribution of this article is the development of an algorithm which avoids the need to store the history of our process by replacing the stochastic gradient with the asymmetric signum function. Unlike the stochastic approximation algorithms that are derived from Robbins and Monro [36], we do not assume that the expectation of the random variable or random function exists. This is a fundamental difference that makes our algorithm applicable to heavy-tailed random variables and random functions whose mean and variance are not necessarily well-defined.

Our next contribution is the development of a quantile optimization problem with an objective function that satisfies some strict monotonicity and differentiability properties. We use the quantile algorithm to develop a recursive method with a stochastic stepsize to solve our quantile optimization problem. Then we apply our optimization problem to trading electricity in the spot market in the presence of storage. This application is in a heavy-tailed environment where the expectation is not necessarily well defined and the application of other methods would fail.

This article is organized as follows. In §2, we present a provably convergent algorithm that allows us to find any quantile of a continuous random variable. In §3, we present a framework for optimizing the quantile of a random function satisfying some strict monotonicity and differentiability properties that are defined in §3.1. In §4, we apply our quantile optimization method to trading electricity in the spot market in the presence of storage. In §5, we compare the trading strategy for maximum profit based on quantiles to the standard trading policy based on the hour of the day. In §6, we summarize our conclusions.
2. Computing the Quantile of a Random Variable. In this section, we provide a simple, provably convergent algorithm that allows us to compute the quantile of a continuous random variable without storing the history of observations. Let (Ω, F) be a sample space with set of all possible outcomes Ω. Let X₁, X₂, . . . be a sequence of continuous i.i.d. random variables on Ω such that −∞ < X_n < ∞, ∀n ≥ 1. For z ∈ ℝ and n ≥ 1, let F_n(z) = P[X_n ≤ z] be the right-continuous CDF of the random variable X_n. The i.i.d. property of (X_n)_{n≥1} guarantees that all the CDFs F_n are identical, so we can simply denote them by F. For any α ∈ (0, 1), the α-quantile of X_n is defined by

    q_α := inf{z ∈ ℝ : P[X_n ≤ z] ≥ α} = inf{z ∈ ℝ : F(z) ≥ α}.

We can see from the last equality that q_α is well defined due to the right-continuity of F.

We consider a filtration F₀ ⊆ · · · ⊆ F_n ⊆ · · · ⊆ F where each F_n is generated by the information given up to time n:

    F_n = σ(x₀, . . . , x_n),

where x_i is the outcome of the random variable X_i, 0 ≤ i ≤ n. We assume that for each n ≥ 0, X_n is F_n-measurable, but we do not assume that E[X_n] exists. For example, (X_n)_{n≥0} can be i.i.d. Cauchy random variables.

Given F_n, the most common and straightforward method for approximating q_α is to sort (X_i)_{0≤i≤n} in increasing order and pick the ⌈nα⌉-th smallest number. However, in order to find the exact quantile using this method, we would need an infinite amount of memory to store all (X_i)_{0≤i≤n} as n → ∞, which is not practical. We present an algorithm that only requires us to store one estimate of the quantile at any given time and show that it converges to the true quantile as n → ∞.
For α ∈ (0, 1) define the asymmetric signum function as

    sgn_α(u) = { 1 − α, if u ≥ 0,
               { −α,    if u < 0.

Theorem 2.1. Let Y₀ ∈ ℝ be some finite number and

    Y_n = Y_{n−1} − γ_{n−1} sgn_α(Y_{n−1} − X_n),

where γ_{n−1} ≥ 0 a.s. and (γ_n)_{n=0}^∞ is a stochastic stepsize sequence adapted to the filtration F₀ ⊆ · · · ⊆ F_n ⊆ · · · ⊆ F satisfying

    Σ_{n=0}^∞ γ_n = ∞ a.s.

and

    Σ_{n=0}^∞ (γ_n)² < ∞ a.s.

Then,

    lim_{n→∞} Y_n = q_α a.s.
Proof. We know that Y_n is F_n-measurable and

    (Y_n − q_α)² = (Y_{n−1} − q_α)² − 2γ_{n−1}(Y_{n−1} − q_α)·sgn_α(Y_{n−1} − X_n) + (γ_{n−1})² λ_n,

where

    λ_n := sgn_α²(Y_{n−1} − X_n),

and hence 0 ≤ λ_n ≤ 1. Since

    E[sgn_α(Y_{n−1} − X_n) | F_{n−1}] = (1 − α) P[X_n ≤ Y_{n−1} | F_{n−1}] − α P[X_n > Y_{n−1} | F_{n−1}]
                                      = (1 − α) P[X_n ≤ Y_{n−1} | F_{n−1}] − α (1 − P[X_n ≤ Y_{n−1} | F_{n−1}])
                                      = P[X_n ≤ Y_{n−1} | F_{n−1}] − α
                                      ≤ 0 if and only if Y_{n−1} ≤ q_α,

we know that

    E[(Y_{n−1} − q_α)·sgn_α(Y_{n−1} − X_n) | F_{n−1}] ≥ 0.    (2.1)

Therefore,

    E[(Y_n − q_α)² | F_{n−1}] ≤ (Y_{n−1} − q_α)² + (γ_{n−1})², ∀n ≥ 1.    (2.2)

Next, define

    W_n := (Y_n − q_α)² + Σ_{m=n}^∞ (γ_m)², ∀n ∈ ℕ₊.    (2.3)
Then, from (2.2),

    E[W_{n+1} | F_n] ≤ W_n, ∀n ∈ ℕ₊,

showing that (W_n)_{n≥1} is a positive supermartingale. Then, by the martingale convergence theorem,

    lim_{n→∞} (Y_n − q_α)² = lim_{n→∞} W_n = W_∞ a.s.,    (2.4)

where

    0 ≤ W_∞ < ∞.

Moreover, since

    0 ≤ E[W_n | F₀] ≤ W₀ < ∞, ∀n ∈ ℕ₊,

by the dominated convergence theorem,

    E[W_∞ | F₀] = E[lim_{N→∞} W_N | F₀] = lim_{N→∞} E[W_N | F₀].    (2.5)

Define

    β_{n−1,n} := E[(Y_{n−1} − q_α)·sgn_α(Y_{n−1} − X_n) | F_{n−1}].
From (2.1), β_{n−1,n} ≥ 0, and hence

    β_{0,n} := E[(Y_{n−1} − q_α)·sgn_α(Y_{n−1} − X_n) | F₀]
             = E[E[(Y_{n−1} − q_α)·sgn_α(Y_{n−1} − X_n) | F_{n−1}] | F₀]
             = E[β_{n−1,n} | F₀] ≥ 0.

Next, from (2.3),

    W_N − (Y₀ − q_α)² = (Y_N − q_α)² − (Y₀ − q_α)² + Σ_{m=N}^∞ (γ_m)²
                      = Σ_{n=1}^N [(Y_n − q_α)² − (Y_{n−1} − q_α)²] + Σ_{m=N}^∞ (γ_m)²
                      = −2 Σ_{n=1}^N γ_{n−1}(Y_{n−1} − q_α)·sgn_α(Y_{n−1} − X_n) + Σ_{n=1}^N (γ_{n−1})² λ_n + Σ_{m=N}^∞ (γ_m)².

Therefore, from (2.5),

    E[W_∞ | F₀] − (Y₀ − q_α)² = lim_{N→∞} E[W_N − (Y₀ − q_α)² | F₀]
        = lim_{N→∞} [ Σ_{n=1}^N (γ_{n−1})² λ_n + Σ_{m=N}^∞ (γ_m)² − 2 Σ_{n=1}^N γ_{n−1} E[(Y_{n−1} − q_α)·sgn_α(Y_{n−1} − X_n) | F₀] ]
        = Σ_{n=1}^∞ (γ_{n−1})² λ_n − 2 Σ_{n=1}^∞ γ_{n−1} β_{0,n}.
Since the left-hand side is finite and

    Σ_{n=1}^N (γ_{n−1})² λ_n + Σ_{m=N}^∞ (γ_m)² ≤ Σ_{m=0}^∞ (γ_m)² < ∞,

we must have

    Σ_{n=1}^∞ γ_{n−1} β_{0,n} < ∞.

Since β_{0,n} ≥ 0 and

    Σ_{n=1}^∞ γ_{n−1} = ∞,

we conclude that

    lim inf_{n→∞} β_{0,n} = lim inf_{n→∞} E[β_{n−1,n} | F₀] = 0.
Therefore there is a subsequence (n*) ⊆ (n)_{n=0}^∞ such that

    lim_{n*→∞} β_{0,n*} = lim_{n*→∞} E[β_{n*−1,n*} | F₀] = 0.

However, since β_{n*−1,n*} is a non-negative random variable, this implies that β_{n*−1,n*} goes to 0 with probability 1 as n* → ∞. In other words,

    lim_{n*→∞} β_{n*−1,n*} = lim_{n*→∞} (Y_{n*−1} − q_α) E[sgn_α(Y_{n*−1} − X_{n*}) | F_{n*−1}] = 0 a.s.

Since

    E[sgn_α(Y_{n*−1} − X_{n*}) | F_{n*−1}] = 0 if and only if Y_{n*−1} = q_α,

    lim_{n*→∞} Y_{n*} = q_α a.s.    (2.6)

This proves our goal, but only for the subsequence (Y_{n*}). In order to get the convergence of the whole sequence (Y_n)_{n=0}^∞ we have to look back at equation (2.4), from which we obtain that

    lim_{n→∞} |Y_n − q_α| = √W_∞ a.s.,    (2.7)

where 0 ≤ W_∞ < ∞. Let Y_n⁺, Y_n⁻ be the random variables

    Y_n⁺ = { Y_n,       if Y_n ≥ q_α,
           { Y_{n−1}⁺,  otherwise,

and

    Y_n⁻ = { Y_n,       if Y_n < q_α,
           { Y_{n−1}⁻,  otherwise.

We can see from (2.7) that

    lim_{n→∞} (Y_n⁺ − q_α) = √W_∞ a.s.

and

    lim_{n→∞} (q_α − Y_n⁻) = √W_∞ a.s.,

which implies that lim_{n→∞} Y_n⁺ = q_α + √W_∞ a.s. and lim_{n→∞} Y_n⁻ = q_α − √W_∞ a.s. By (2.6) we can conclude that at least one of Y_n⁺ or Y_n⁻ converges almost surely to q_α, which implies that W_∞ = 0 and so

    lim_{n→∞} Y_n⁺ = q_α = lim_{n→∞} Y_n⁻ a.s.

Clearly,

    Y_n = { Y_n⁺, if Y_n ≥ q_α,
          { Y_n⁻, otherwise,

and so

    lim_{n→∞} Y_n = q_α a.s.,

as we set out to prove. □
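The recursion of Theorem 2.1 is easily implemented. The following Python sketch is illustrative: the stepsize γ_n = a/n (which satisfies both stepsize conditions) and the sample distributions are assumptions made for the demonstration, not prescriptions from the theorem.

```python
import math
import random

def sgn_alpha(u, alpha):
    # Asymmetric signum function: 1 - alpha for u >= 0, -alpha for u < 0.
    return 1.0 - alpha if u >= 0 else -alpha

def quantile_recursion(stream, alpha, y0=0.0, a=1.0):
    # Y_n = Y_{n-1} - gamma_{n-1} * sgn_alpha(Y_{n-1} - X_n) with gamma_n = a/n,
    # so that sum gamma_n diverges while sum gamma_n^2 stays finite.
    y = y0
    for n, x in enumerate(stream, start=1):
        y -= (a / n) * sgn_alpha(y - x, alpha)
    return y

random.seed(7)

# 0.9-quantile of Uniform(0, 1): the true value is 0.9.
y_unif = quantile_recursion((random.random() for _ in range(200_000)), alpha=0.9)

# Median of a standard Cauchy stream: the true value is 0,
# even though E[X_n] does not exist.
y_cauchy = quantile_recursion(
    (math.tan(math.pi * (random.random() - 0.5)) for _ in range(200_000)),
    alpha=0.5, a=2.0)

print(y_unif, y_cauchy)
```

Note that only the current estimate is stored; the history of observations is never kept, and the heavy-tailed Cauchy outliers cause no instability because the update magnitude is bounded by γ_n max(α, 1 − α).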
3. Optimizing the Quantile of a Random Function. In this section, we show how to optimize the quantile of an almost everywhere differentiable random function. We show that a monotonicity condition relating a random function and its derivative with respect to a sample realization allows us to optimize the quantile of the random function using a simple, provably convergent algorithm. We also show that the well-known newsvendor problem has the aforementioned monotonicity.

3.1. Monotonicity Assumptions. Let Ω be the set of all possible outcomes and let X ⊆ ℝ be a compact control space. We start by defining our control variable x ∈ X as a scalar for a clear and concise presentation of our concepts and the proof of convergence for our algorithm. Then, we generalize our process to the vector-valued case. Let f : X × Ω → ℝ be a function satisfying the following conditions:
(C1) f(·, ω) is convex and continuous for all ω ∈ Ω;
(C2) f(·, ω) is differentiable almost everywhere for all ω ∈ Ω;
(C3) f(x, ·) is a real-valued random variable for all x ∈ X.
For example, f(x, ω) may be piecewise-linear with respect to x ∈ X with a finite number of break-points for all fixed ω ∈ Ω. Let f(x) denote the random variable whose specific sample realization is f(x, ω). Also, let Q_α denote the operator such that for any random variable X,

    Q_α X = inf{z ∈ ℝ : P[X ≤ z] ≥ α}
gives the α-quantile of the random variable X.

For x ∈ X let F_x(z) = P[f(x) ≤ z] be the right-continuous CDF of the random variable f(x). Then the α-quantile of f(x) is given by

    Q_α f(x) = inf{z ∈ ℝ : P[f(x) ≤ z] ≥ α} = inf{z ∈ ℝ : F_x(z) ≥ α},

where, as we saw before, Q_α f(x) is guaranteed to exist due to the right-continuity of F_x.

Let z* := Q_α f(x). If F_x is not continuous at z*, then the fact that F_x is nondecreasing and bounded helps us see that

    P[f(x) < z*] = lim_{z→z*−} F_x(z) < lim_{z→z*+} F_x(z) = F_x(z*) = P[f(x) ≤ z*].    (3.1)

From (3.1) and the definition of the distribution of a random variable we conclude that z* is in the image of f(x). Unfortunately, if F_x is continuous at z*, we cannot readily conclude that z* is in the image of f(x). This is an important property for us and thus we add it as a requirement for the function f:
(C4) Q_α f(x) is in the image of f(x) for every x ∈ X.
The following propositions show the monotonicity assumptions that allow us to interchange the derivative ∂/∂x and the quantile operator Q_α.

Proposition 3.1. Assume that

    f(x, ω₁) ≤ f(x, ω₂) if and only if ∂f(x, ω₁)/∂x ≤ ∂f(x, ω₂)/∂x,    (3.2)

∀ω₁, ω₂ ∈ Ω and ∀x ∈ X. Then,

    Q_α [∂f(x)/∂x] = ∂[Q_α f(x)]/∂x, ∀x ∈ X.

Proposition 3.2. Assume that

    f(x, ω₁) ≤ f(x, ω₂) if and only if ∂f(x, ω₁)/∂x ≥ ∂f(x, ω₂)/∂x,    (3.3)

∀ω₁, ω₂ ∈ Ω and ∀x ∈ X. Then,

    Q_{1−α} [∂f(x)/∂x] = ∂[Q_α f(x)]/∂x, ∀x ∈ X.

Proof. By condition (C4) of f(x) there is a sample realization ω* ∈ Ω such that

    Q_α f(x) = f(x, ω*).

Let

    A := {ω ∈ Ω | f(x, ω) ≤ f(x, ω*)}

and

    B := {ω ∈ Ω | f(x, ω) > f(x, ω*)}.

Then, if (3.2) holds,

    ∂f(x, ω₁)/∂x ≤ ∂f(x, ω*)/∂x < ∂f(x, ω₂)/∂x, ∀ω₁ ∈ A, ω₂ ∈ B,

and hence

    Q_α [∂f(x)/∂x] = ∂f(x, ω*)/∂x = ∂[Q_α f(x)]/∂x.

On the other hand, if (3.3) holds,

    ∂f(x, ω₁)/∂x ≥ ∂f(x, ω*)/∂x > ∂f(x, ω₂)/∂x, ∀ω₁ ∈ A, ω₂ ∈ B,

and hence

    Q_{1−α} [∂f(x)/∂x] = ∂f(x, ω*)/∂x = ∂[Q_α f(x)]/∂x. □
We can illustrate Proposition 3.2 through the following example. Suppose X = [0, 1] and

    f(x) = (1 − x) U, ∀x ∈ X,

where U ∼ U(0, 1) is a uniform random variable. Then, ∀ω₁, ω₂ ∈ Ω and ∀x ∈ X,

    f(x, ω₁) = (1 − x) U(ω₁) ≤ (1 − x) U(ω₂) = f(x, ω₂)

if and only if U(ω₁) ≤ U(ω₂), and thus

    ∂f(x, ω₁)/∂x = −U(ω₁) ≥ −U(ω₂) = ∂f(x, ω₂)/∂x,

satisfying (3.3). Next, Q_α f(x) = α(1 − x) and hence

    ∂[Q_α f(x)]/∂x = −α, ∀0 < α < 1.

On the other hand,

    ∂f(x)/∂x = −U and hence Q_{1−α}[∂f(x)/∂x] = −α,

satisfying Proposition 3.2.
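A quick Monte Carlo check of this example (an illustrative sketch; the sample size and the value of α are arbitrary assumptions): the empirical (1 − α)-quantile of realizations of ∂f/∂x = −U should sit near ∂[Q_α f(x)]/∂x = −α.

```python
import random

random.seed(0)
alpha = 0.3
n = 100_000

# Realizations of the derivative: d/dx f(x, omega) = -U(omega) for every x.
derivs = sorted(-random.random() for _ in range(n))

# Empirical (1 - alpha)-quantile of -U; by Proposition 3.2 this should be
# close to d/dx Q_alpha f(x) = -alpha.
q_deriv = derivs[int((1 - alpha) * n)]
print(q_deriv)
```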
Note that the classical stochastic gradient algorithm for optimizing the expectation of a function works because the expectation operator can be interchanged with the differential operator as long as the expectation is finite. We do not need this condition when optimizing quantiles. Our goal is to develop an algorithm, analogous to the classical stochastic gradient algorithm, that allows us to optimize the quantile of a function.
3.2. Quantile Optimization. We begin by stating the algorithm for quantile optimization and proving its convergence.

From this point on we assume that the function f and α ∈ (0, 1) are such that

    argmin_{x∈X} Q_α f(x)    (3.4)

is a real number. For example, one way to accomplish (3.4) is to assume that there is a metric on Ω that makes the CDF F_x(z) : X × ℝ → ℝ uniformly continuous. Then we can conclude that Q_α f(·) is a lower semi-continuous function and that (3.4) is well defined.
Theorem 3.3. Let f : X × Ω → ℝ be a function as defined before. Define

    x^α := argmin_{x∈X} Q_α f(x).

Let x₀ ∈ X be fixed. For n ∈ ℕ₊ and x_{n−1} ∈ X, let f_n(x_{n−1}) be the random variable whose distribution is that of f(x_{n−1}). Define F_n as the sigma-algebra on Ω generated by the information given up to time n:

    F_n = σ(f₁(x₀), . . . , f_n(x_{n−1})).

Denote

    f′_n(x_{n−1}) := ∂f_n(x, ω)/∂x |_{x=x_{n−1}}.

Let

    x_n = { x_{n−1} − γ_{n−1} sgn_α(f′_n(x_{n−1})),     if (3.2) holds for f(x),
          { x_{n−1} − γ_{n−1} sgn_{1−α}(f′_n(x_{n−1})), if (3.3) holds for f(x),

∀n ∈ ℕ₊, where γ_{n−1} ≥ 0 a.s. and (γ_n)_{n=0}^∞ is a stochastic stepsize sequence adapted to the filtration F₁ ⊆ F₂ ⊆ · · · ⊆ F satisfying

    Σ_{n=0}^∞ γ_n = ∞ a.s.    (3.5)

and

    Σ_{n=0}^∞ (γ_n)² < ∞ a.s.    (3.6)

Then,

    lim_{n→∞} x_n = x^α a.s.

Proof. Notice first that by construction the sequence of random variables (f_n(x_{n−1}))_{n=1}^∞ is adapted to the filtration F₁ ⊆ F₂ ⊆ · · · ⊆ F. Assume (3.2) holds, let x₀ ∈ X and

    x_n = x_{n−1} − γ_{n−1} sgn_α(f′_n(x_{n−1})).
Then,

    (x_n − x^α)² = (x_{n−1} − x^α)² − 2γ_{n−1}(x_{n−1} − x^α)·sgn_α(f′_n(x_{n−1})) + (γ_{n−1})² λ_n^α,

where

    λ_n^α := sgn_α²(f′_n(x_{n−1})).

From assumption (3.2), if x_{n−1} ≤ x^α,

    Q_α f′_n(x_{n−1}) = ∂[Q_α f(x)]/∂x |_{x=x_{n−1}} ≤ 0,

and if x_{n−1} > x^α,

    Q_α f′_n(x_{n−1}) = ∂[Q_α f(x)]/∂x |_{x=x_{n−1}} > 0.

Therefore,

    E[sgn_α(f′_n(x_{n−1})) | F_{n−1}] = (1 − α) P[f′_n(x_{n−1}) ≥ 0 | F_{n−1}] − α P[f′_n(x_{n−1}) < 0 | F_{n−1}]
                                      = (1 − α) P[f′_n(x_{n−1}) ≥ 0 | F_{n−1}] − α (1 − P[f′_n(x_{n−1}) ≥ 0 | F_{n−1}])
                                      = P[f′_n(x_{n−1}) ≥ 0 | F_{n−1}] − α
                                      ≤ 0 if and only if x_{n−1} ≤ x^α,

and hence

    E[(x_{n−1} − x^α)·sgn_α(f′_n(x_{n−1})) | F_{n−1}] ≥ 0.

Then,

    E[(x_n − x^α)² | F_{n−1}] ≤ (x_{n−1} − x^α)² + (γ_{n−1})², ∀n ≥ 1.

Using the same proof technique used in §2, we can show that

    lim_{n→∞} x_n = x^α a.s.
Similarly, if assumption (3.3) holds instead, we can let x₀ ∈ X and

    x_n = x_{n−1} − γ_{n−1} sgn_{1−α}(f′_n(x_{n−1})).

Then,

    (x_n − x^α)² = (x_{n−1} − x^α)² − 2γ_{n−1}(x_{n−1} − x^α)·sgn_{1−α}(f′_n(x_{n−1})) + (γ_{n−1})² λ_n^{1−α},

where

    λ_n^{1−α} := sgn_{1−α}²(f′_n(x_{n−1})).

Under assumption (3.3), if x_{n−1} ≤ x^α,

    Q_{1−α} f′_n(x_{n−1}) = ∂[Q_α f(x)]/∂x |_{x=x_{n−1}} ≤ 0,

and if x_{n−1} > x^α,

    Q_{1−α} f′_n(x_{n−1}) = ∂[Q_α f(x)]/∂x |_{x=x_{n−1}} > 0.

Therefore,

    E[sgn_{1−α}(f′_n(x_{n−1})) | F_{n−1}] = α P[f′_n(x_{n−1}) ≥ 0 | F_{n−1}] − (1 − α) P[f′_n(x_{n−1}) < 0 | F_{n−1}]
                                          = α P[f′_n(x_{n−1}) ≥ 0 | F_{n−1}] − (1 − α)(1 − P[f′_n(x_{n−1}) ≥ 0 | F_{n−1}])
                                          = P[f′_n(x_{n−1}) ≥ 0 | F_{n−1}] − (1 − α)
                                          ≤ 0 if and only if x_{n−1} ≤ x^α,

and hence

    E[(x_{n−1} − x^α)·sgn_{1−α}(f′_n(x_{n−1})) | F_{n−1}] ≥ 0.

Again,

    E[(x_n − x^α)² | F_{n−1}] ≤ (x_{n−1} − x^α)² + (γ_{n−1})², ∀n ≥ 1,

and we can show that

    lim_{n→∞} x_n = x^α a.s. □
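The recursion of Theorem 3.3 can be illustrated on the example f(x) = (1 − x)U from §3.1, which satisfies (3.3) and hence uses the sgn_{1−α} update. Since X = [0, 1] is compact and the minimizer of Q_α f(x) = α(1 − x) is the boundary point x = 1, the sketch below projects each iterate back onto X; this projection step is an implementation assumption, not part of the theorem statement.

```python
import random

def sgn_alpha(u, a):
    # Asymmetric signum function.
    return 1.0 - a if u >= 0 else -a

random.seed(3)
alpha = 0.3
x = 0.0                                # x_0 in X = [0, 1]
for n in range(1, 2001):
    u = random.random()                # sample realization of U
    deriv = -u                         # f'(x, omega) = -U(omega), for any x
    x -= (1.0 / n) * sgn_alpha(deriv, 1.0 - alpha)   # (3.3) holds: sgn_{1-alpha}
    x = min(max(x, 0.0), 1.0)          # project back onto the compact X
print(x)                               # drifts to the minimizer x = 1
```

Because the derivative realizations are negative almost surely, every step pushes x up by γ_{n−1}(1 − α), and the projected iterates reach and remain at x = 1.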
While we have shown the proof for scalar x, the algorithm can be readily extended to the multivariate case as long as the monotonicity assumption holds in each of the variables and corresponding partial derivatives. For a simple illustration, suppose f(x⃗, ω) is convex in x⃗ = (x¹, x², . . . , xᵐ) ∈ X for some m-dimensional compact space X ⊂ ℝᵐ and fixed ω ∈ Ω.

Theorem 3.4. For each 1 ≤ i ≤ m, assume that either

    ∂f(x⃗, ω₁)/∂xⁱ ≤ ∂f(x⃗, ω₂)/∂xⁱ if and only if f(x⃗, ω₁) ≤ f(x⃗, ω₂)    (3.7)

holds true or

    ∂f(x⃗, ω₁)/∂xⁱ ≥ ∂f(x⃗, ω₂)/∂xⁱ if and only if f(x⃗, ω₁) ≤ f(x⃗, ω₂)    (3.8)

holds true, ∀ω₁, ω₂ ∈ Ω and ∀x⃗ ∈ X. For n ∈ ℕ₊ and x⃗_{n−1} ∈ X, let f_n(x⃗_{n−1}) be the F_n-measurable random variable whose distribution is that of f(x⃗_{n−1}). Then, if we let

    x⃗₀ = (x¹₀, x²₀, . . . , xᵐ₀) for some x⃗₀ ∈ X

and

    xⁱ_n = { xⁱ_{n−1} − γⁱ_{n−1} sgn_α(∂f_n(x⃗)/∂xⁱ |_{x⃗=x⃗_{n−1}}),     if (3.7) holds true,
           { xⁱ_{n−1} − γⁱ_{n−1} sgn_{1−α}(∂f_n(x⃗)/∂xⁱ |_{x⃗=x⃗_{n−1}}), if (3.8) holds true,

while the stochastic stepsizes γⁱ_n ≥ 0 satisfy

    Σ_{n=0}^∞ γⁱ_n = ∞ a.s.

and

    Σ_{n=0}^∞ (γⁱ_n)² < ∞ a.s.,

∀1 ≤ i ≤ m, then

    lim_{n→∞} x⃗_n = x⃗^α a.s.,

where

    x⃗^α = argmin_{x⃗∈X} Q_α f(x⃗).
3.3. Initialization and the Scaling of Stepsizes. We have shown that the algorithms for finding a quantile of a random variable and for optimizing a quantile of a random function converge as long as the stepsize γ_n ≥ 0 a.s. satisfies (3.5) and (3.6). Stepsizes for optimizing the expectation of a random function have been studied extensively; a comprehensive review of the literature can be found in [9, 33]. The typical approach for finding the optimal stepsizes in the case of optimizing expectations is as follows. Suppose (X_i)_{1≤i≤n} are i.i.d. random variables with finite variance and we want to find the stepsizes (γ_{i−1})_{1≤i≤n} that give the best estimate of E[X_i]. Then, the mean-squared error

    min_{γ₀,...,γ_{n−1}} E[ ( Σ_{i=1}^n γ_{i−1} X_i − E[X_i] )² ]

is minimized when γ_{n−1} = 1/n, ∀n. When (X_i)_{1≤i≤n} are not i.i.d., different stepsizes can give better results.

However, when computing quantiles using asymmetric signum functions, it is difficult to determine what the optimal stepsizes should be, even in the i.i.d. case, because it is unclear what we should try to minimize: we cannot minimize the expectation or variance in a heavy-tailed environment. Thus, instead of trying to identify stepsizes that are optimal in some sense, we present an ad-hoc method for practical use. Let (X_i)_{1≤i≤n} be i.i.d. random variables and assume we want to compute some α-quantile. Suppose, for example, that the α-quantile of X_i is 10⁶. Then, if we use the algorithm

    Y_n = Y_{n−1} − γ_{n−1} sgn_α(Y_{n−1} − X_n)

with Y₀ = 0 and γ_{n−1} = 1/n, it will take prohibitively long for the algorithm to give us a reasonable estimate. Thus, finding a reasonable initialization point Y₀ and stepsizes γ_{n−1} is critical. We propose initially taking sample realizations (X_i)_{1≤i≤k} for some small k, sorting them in increasing order, picking the ⌈kα⌉-th smallest number, and letting it be Y₀. Next, we let

    γ_{n−1} := σ/n,    (3.9)

where

    σ = (1/2) | (.75-quantile of the set (X_i)_{1≤i≤k}) − (.25-quantile of the set (X_i)_{1≤i≤k}) |

is a scaling factor.

For the quantile optimization problem, first pick points (x_m)_{1≤m≤M} evenly dividing the compact space X for some small M. For each x_m, we can generate (F_i(x_m))_{1≤i≤k} to find an approximation to the α-quantile of F(x_m). Then, we let our initialization point x₀ be the one that gives the minimum α-quantile. That is, initialization requires generating M × k samples. Next, we let the stepsize be given by equation (3.9), where we replace X_i with F_i(x₀). When optimizing expectations using the standard Robbins-Monro algorithm, the scaling factor has to strike a balance between the units of x ∈ X and the scale of the stochastic gradient, which can be highly volatile, ranging between large negative values and large positive values. However, when optimizing the quantile using our algorithm, the stepsizes have to be scaled based on x alone. If we have a rough idea of where the optimal x may fall, we can scale the stepsize accordingly.
3.4. Newsvendor Problem. To develop an understanding of the properties of the algorithm, we begin by illustrating it in the context of a simple newsvendor problem [31]. Given a random demand D ∈ R_+, we usually want to compute the optimal stocking quantity x ∈ R_+ that maximizes the expected profit

G(x) := E[p min{x, D} − cx],

where p > 0 is the unit price, c > 0 is the unit cost, and p > c. For this simple problem, the optimal solution

x* = Q_{(p−c)/p}(D)

is the (p−c)/p-quantile of the random variable D, which can be computed using the algorithm shown in §2. However, instead of maximizing the expected profit, suppose a manager wants to maximize some α-quantile of the profit,

q_α := Q_α [p min{x, D} − cx].

This is equivalent to minimizing

q_{1−α} := Q_{1−α} [cx − p min{x, D}].
Let Ω be the set of all possible outcomes. For each ω ∈ Ω, let D(ω) be the sample realization of the demand and denote

f(x, ω) := cx − p min{x, D(ω)} = { x(c − p) if x ≤ D(ω); cx − pD(ω) if x > D(ω) }.
In order to enforce compactness of the control set, we restrict x ∈ [0, M], where M is some number big enough to capture any feasible stocking quantity. For example, in the newsvendor problem we could set M to be an upper bound on the number of newspapers produced on any given day.
Let X = [0, M]. Then,

∂/∂x f(x, ω) = { c − p if x < D(ω); c if x > D(ω) }.

In general, the derivative ∂/∂x f(x, ω) is not defined when x = D(ω), but this still agrees with condition (C2) of f(x, ·) being differentiable almost everywhere.
Next, ∀ω_1, ω_2 ∈ Ω and ∀x ∈ X, we know that f(x, ω_1) ≤ f(x, ω_2) if and only if D(ω_1) ≥ D(ω_2), which again is true if and only if

∂/∂x f(x, ω_1) ≤ ∂/∂x f(x, ω_2).
Let (D_n)_{n≥1} be i.i.d. random variables. If we let x_0 = 0 and

x_n = x_{n−1} − (1/n) sgn_α (f′_n (x_{n−1}))
    = { x_{n−1} + α/n if x_{n−1} ≤ D_n; x_{n−1} − (1−α)/n if x_{n−1} > D_n },   (3.10)

where D_n is F_n-measurable, then

lim_{n→∞} x_n = argmax_{x∈X} Q_α [p min{x, D} − cx]   a.s.

from Theorem 3.3. This implies

argmax_{x∈X} Q_α [p min{x, D} − cx] = Q_α D.
For illustration, suppose p = 10, c = 1, and (D_n)_{n≥1} is i.i.d. with a Pareto distribution whose tail index is 1/2:

P[D_n ≤ y] = { 1 − (1/y)^{1/2} if y ≥ 1; 0 if y < 1 }

for all n ≥ 1. For given x ≥ 1, the expected profit is

E[p min{x, D_n}] − cx = E[10 min{x, D_n}] − x
                      = 5 ∫_{y=1}^{x} y^{−1/2} dy + 10x P[D_n > x] − x
                      = 10 y^{1/2} |_{y=1}^{x} + 10 x^{1/2} − x
                      = 20 x^{1/2} − 10 − x.
Then, we obtain a maximum expected profit of 90 if

x* = Q_{(p−c)/p} D_n = Q_{9/10} D_n = 100.

However, while x* = 100 maximizes the expected profit, the risk that we lose a lot of money is very high. For our newsvendor example, the probability that we lose at least $50 while following our optimal policy is given by

P[p min{x*, D_n} − cx* ≤ −50] = P[10 min{100, D_n} ≤ 50]
                              = P[D_n ≤ 5] = 1 − (1/5)^{1/2} ≈ .55.
A policy that has a 55% chance of losing 50% or more of the initial investment of 100 is a disastrous policy, even if the expected return on investment is 90%. In a heavy-tailed environment, focusing on the expected profit leads one to take on imprudent risks. The above policy has only a 30% chance of making a 10% return or more. Instead of trying to maximize the expectation, suppose we want to maximize some quantile. Since

argmax_{x∈X} Q_α [p min{x, D_n} − cx] = Q_α D_n = 1/(1−α)^2,
we maximize the median profit by choosing x* = 4. Then, the median profit is 36 while the expected profit is 26, and the minimum profit we can make is 6. That is, we have a 100% chance of earning at least a 150% return on the initial investment, our expected return on investment is 650%, and the median return is 900%. Again, when x* = 4, we have a 50% chance of making a profit of 36 or more, while we only have a 30% chance of making a profit of 10 or more when x* = 100. In fact, x* = 100 maximizes the 0.9-quantile of the profit. This example illustrates that it is not always reasonable to try to maximize the expectation, even when we can compute it. The merit of the more conservative policy derived from quantile optimization becomes even more conspicuous if we conduct independent experiments multiple times and observe what happens to the cumulative profit. Let
profit. Let<br />
Ui = 10 min {100, Di} − 100<br />
be the profit of a business that follows the expectation-maximizing policy (x∗ = 100)<br />
at the ith iteration. Let<br />
n<br />
Vn =<br />
be the cumulative profit of a business after the n th iteration. Left plot on the Figure<br />
3.1 shows ten different sample paths of Vn as well as their average.<br />
Next, let<br />
i=1<br />
Ui<br />
X_i = 10 min{4, D_i} − 4

be the profit of a business that follows the median-maximizing policy (x* = 4) at the i-th iteration. Let

Y_n = Σ_{i=1}^{n} X_i

be the cumulative profit. The right plot of Figure 3.1 shows several different sample paths following the median-maximizing policy. Comparing the two plots, we see that while V_n grows much faster than Y_n, and is thus much more profitable on average, it is also far more volatile. We can observe how risky the expectation-maximizing policy can be: one sample path of V_n falls below −$700 after the 11th iteration, implying that the business would have gone bankrupt if it did not have initial capital in excess of $700.

Fig. 3.1. Cumulative profit obtained by following the expectation-maximizing policy (left) and by following the median-maximizing policy (right), for ten sample paths.

For the above example, we happened to know closed-form expressions for the quantiles, but there are many applications where we do not, and often we do not even know which distribution we are sampling from. In such cases, we must rely on the algorithm (3.10). Figure 3.2 shows the convergence of the algorithm (3.10) for α = 0.5, 0.684, and 0.8, where x_n → 4, 10, and 25, respectively. The horizontal axis is shown on a logarithmic scale.

Fig. 3.2. Convergence of x_n when α = 0.5, 0.684, and 0.8.

Table 3.1 shows the expected profit and the probability of loss for different quantiles.
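Recursion (3.10) can be run directly on the Pareto example. The sketch below uses the σ/n stepsize scaling of §3.3 with an illustrative scale of 7 (roughly half the interquartile range of this demand distribution) rather than raw 1/n steps, which would move far too slowly at this scale; the starting point and projection bound are also our own choices:

```python
import numpy as np

def newsvendor_quantile(alpha, n_iters=500_000, scale=7.0, seed=2):
    """Recursion (3.10): x_n = x_{n-1} - gamma_n * sgn_alpha(f'_n(x_{n-1})).
    Since f'_n < 0 when x <= D_n and f'_n > 0 otherwise, the update raises x by
    alpha*gamma_n on under-stocking and lowers it by (1-alpha)*gamma_n on over-stocking."""
    rng = np.random.default_rng(seed)
    demand = rng.uniform(size=n_iters) ** -2.0      # Pareto demand, tail index 1/2
    x = 1.0
    for n, d_n in enumerate(demand, start=1):
        gamma = scale / n
        x += gamma * alpha if x <= d_n else -gamma * (1.0 - alpha)
        x = min(max(x, 0.0), 1000.0)                # project onto a compact set X
    return x

print(newsvendor_quantile(0.5))   # should approach Q_{0.5} D = 4
```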
           x*    Expected Profit    Probability of Loss
α = 0.5     4         26                   0
α = 0.684   10        43.2                 0
α = 0.8     25        65                   0.37
α = 0.9     100       90                   0.68

Table 3.1. Optimizing the quantile of the profit for the newsvendor problem.
4. Trading Electricity in the Presence of Storage. Trading on the electricity spot market introduces the challenge of dealing with spot prices, which are well known to be heavy-tailed [2, 4, 8]. In this section, we show how quantile optimization can be used to derive a trading strategy in the presence of storage. At each time t, we sign a contract and determine whether to buy or sell electricity during the time interval [t, t+1). We propose a trading strategy and a performance metric that can be assumed to be approximately i.i.d., thus allowing us to apply our quantile optimization algorithm. We use electricity spot market price data from Ercot and PJM West Hub. Ercot is the grid operator that covers most of Texas; PJM West covers the Chicago metropolitan area. Ercot and PJM represent competitive electricity markets where the price is settled by the market based on supply and demand in real time. The electricity price in Ercot is particularly volatile because of the significant role wind energy plays in the market. While the demand for electricity is always random, reliance on intermittent energy such as wind or solar makes the supply side of the equation random as well, making a large mismatch between supply and demand more likely and resulting in violent price swings.
4.1. Model. Throughout this section, we assume the amount of electricity we can deliver to the market by discharging our storage for a one-hour period is one megawatt-hour. We assume that the round-trip efficiency of our storage is 0 < ρ < 1. For most commercially available storage devices, the value of ρ is between .65 and .80 [40]. We must import 1/ρ megawatt-hours of electricity when charging our storage device for every megawatt-hour of energy that we want to release. Throughout this section, we assume that we have an infinite-capacity storage device. Moreover, we assume that we are sufficiently small compared to the overall market, so that we can always discharge our storage and sell one megawatt-hour of electricity to the market, and we can always buy one megawatt-hour of electricity and charge our storage if we choose to do so.

State Variables:
Let t ∈ N_+ be a discrete time index corresponding to the decision epoch.

p_t = price of electricity per megawatt-hour traded during the time interval [t, t+1), ∀t.

When we discharge our storage one unit to sell one megawatt-hour of electricity, we earn p_t. When we buy electricity to charge our storage, we are buying 1/ρ megawatt-hours of electricity, so we have to pay p_t/ρ.

S_t = the electricity price history over the last 100 hours = (p_{t′})_{t−99≤t′≤t}. The history S_t is needed to compute the quantile of the price process. We interpret S_t as the state of the system at time t.
Decision (Action) Variable:
x_t = decision that tells us to discharge (sell), hold, or charge (buy) our storage at time t, with x_t ∈ {−1, 0, 1}, where x_t = −1 indicates discharge, x_t = 0 indicates hold, and x_t = 1 indicates charge.

Exogenous Process:
p̂_t = p_t − p_{t−1} = random variable that captures the evolution of the price process (p_t)_{t≥0}.

Let Ω be the set of all possible outcomes and let F be a σ-algebra on this set, with the filtration F_t generated by the information given up to time t:

F_t = σ (S_0, x_0, p_1, S_1, x_1, p_2, S_2, x_2, ..., p_t, S_t, x_t).

P is the probability measure on the measure space (Ω, F). We have defined the state of our system at time t as all variables that are F_t-measurable and needed to compute our decision at time t.
Storage Transition Function:

R_{t+1} = R_t + x_t.

If x_t = 1, we import electricity from the market and store it with some conversion factor ρ. That is, we purchase 1/ρ megawatt-hours of electricity from the market, but it only fills one megawatt-hour's worth of electricity in our storage due to the conversion loss. If x_t = −1, the potential energy in the storage is converted into electricity with a conversion factor of 1 and sold to the market.
4.2. Electricity Price Behavior. At time t, let p_t^{(i)} denote the order statistics of the past hundred hours of data (p_{t′})_{t−99≤t′≤t}, sorted in increasing order:

p_t^{(1)} ≤ p_t^{(2)} ≤ ... ≤ p_t^{(100)}.

Then, let

q_t := min{ b ∈ {1, 2, ..., 100} : p_t^{(b)} ≥ p_t }   (4.1)

be the index of p_t in the order statistics. For the hourly electricity spot market prices from Texas Ercot and PJM West Hub,

P[q_t ≤ b] ≈ b/100, ∀b ∈ {1, 2, ..., 100}.   (4.2)
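The statistic (4.1) and the empirical relation (4.2) can be sketched as follows. Lacking the Ercot/PJM data here, we use synthetic i.i.d. lognormal prices as a stand-in; for a continuous i.i.d. series the rank of the newest observation in a 100-hour window is uniform by exchangeability, so (4.2) holds in expectation:

```python
import numpy as np

def rank_index(window):
    """q_t from (4.1): the smallest b with p_t^{(b)} >= p_t, where p_t is the most
    recent price in the window and p_t^{(1)} <= ... <= p_t^{(100)} are its order statistics."""
    order = np.sort(window)
    return int(np.searchsorted(order, window[-1], side="left")) + 1   # 1-based index

rng = np.random.default_rng(3)
prices = rng.lognormal(mean=3.5, sigma=0.5, size=20_000)              # stand-in spot prices
q = np.array([rank_index(prices[t - 99 : t + 1]) for t in range(99, len(prices))])

# Empirical counterpart of (4.2): P[q_t <= b] is roughly b/100
for b in (10, 50, 90):
    print(b, (q <= b).mean())
```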
Next, Table 4.1 shows the behavior of the electricity price for Texas Ercot. One can see that for lower values of q_t, the probability that the price will increase grows, and vice versa. The results are similar for PJM West Hub prices, as shown
b =                                   10    20    30    40    50    60    70    80    90
P[p_{t+1} ≥ (1/0.7) p_t | q_t ≤ b]   .41   .30   .24   .20   .17   .15   .14   .13   .12
P[p_{t+1} ≥ (1/0.75) p_t | q_t ≤ b]  .44   .33   .27   .23   .20   .18   .16   .15   .14
P[p_{t+1} ≥ (1/0.8) p_t | q_t ≤ b]   .47   .37   .32   .28   .25   .22   .21   .19   .18
P[p_{t+1} ≥ (1/0.9) p_t | q_t ≤ b]   .55   .47   .44   .40   .38   .35   .33   .32   .30
P[p_{t+1} ≤ p_t | q_t > b]           .52   .53   .54   .56   .57   .59   .62   .66   .74

Table 4.1. Dependence of electricity price behavior on q_t for Texas Ercot.
b =                                   10     20     30     40     50     60     70     80     90
P[p_{t+1} ≥ (1/0.7) p_t | q_t ≤ b]   .173   .132   .121   .119   .115   .110   .106   .102   .098
P[p_{t+1} ≥ (1/0.75) p_t | q_t ≤ b]  .211   .174   .153   .151   .148   .144   .139   .134   .129
P[p_{t+1} ≥ (1/0.8) p_t | q_t ≤ b]   .249   .205   .193   .191   .190   .186   .181   .176   .171
P[p_{t+1} ≥ (1/0.9) p_t | q_t ≤ b]   .393   .339   .323   .320   .317   .310   .304   .297   .289
P[p_{t+1} ≤ p_t | q_t > b]           .52    .54    .55    .56    .58    .60    .65    .72    .83

Table 4.2. Dependence of electricity price behavior on q_t for PJM West Hub.
in Table 4.2. This is because we may assume that the electricity price is median-reverting, as shown in [18]. According to [18], the electricity price is heavy-tailed and non-stationary, so the empirical mean cannot be used to characterize the behavior of the price. Instead of using standard Gaussian jump-diffusion processes with mean-reversion, [18] shows that the price behavior is more appropriately modelled as median-reverting with an underlying heavy-tailed process. Thus, if the current price level is at the higher end of the prices realized in the past one hundred hours, there is a high probability that the price will go down due to the median-reversion, and vice versa.
4.3. A Robust Trading Policy. We believe that a robust policy for trading electricity is one whose performance is consistent across different sample paths of the price process. A policy optimized on past data should remain optimal going into the future. In other words, if we have a parameterized policy, the parameters that were optimal for year 2008 and the parameters that are optimal for year 2009 should be almost identical. This implies that by the end of year 2008, we would know what policy we will use to trade in year 2009, and we would know that the policy will perform well. Our goal is to devise a robust trading policy that exploits the fact that the electricity price is median-reverting, as shown in the previous section.
In prior research, we have shown that the electricity price is heavy-tailed [18], which implies that our trading policy should only depend on zeroth-order statistics, i.e., the quantile function. Since we do not want negative exposure to heavy-tailed price movements, we must always buy low and sell high, which is possible because the price process is median-reverting.
In the previous section, we showed that we can ascertain how high or low the current price level is by comparing it to the percentile constructed from the order statistics. Given a pair of thresholds 1 ≤ x^L ≤ 50 and 51 ≤ x^H ≤ 100, our policy is

X_t^π (S_t) = { 1 if q_t ≤ x^L; −1 if q_t ≥ x^H; 0 otherwise },   (4.3)

where

π := (x^L, x^H).
x^L determines when to buy electricity. Of course, the lower the threshold x^L, the higher the probability that the price will increase by more than a factor of 1/ρ, as shown in Tables 4.1 and 4.2. On the other hand, for smaller values of x^L we will buy electricity less frequently, and thus have fewer opportunities to make money, as implied by (4.2). Similarly, the higher the threshold x^H, the higher the probability that the price will decrease, but we will be able to sell electricity less frequently, reducing the opportunities to make money.
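Policy (4.3), combined with the storage transition R_{t+1} = R_t + x_t, can be sketched as below. The thresholds and the synthetic prices are illustrative, and unlike the idealized model, this sketch refuses to discharge an empty store:

```python
import numpy as np

def policy(window, x_low, x_high):
    """Policy (4.3): charge when the current price ranks at or below x_low among the
    last 100 hourly prices, discharge when it ranks at or above x_high, else hold."""
    q_t = int(np.searchsorted(np.sort(window), window[-1], side="left")) + 1
    if q_t <= x_low:
        return 1      # buy / charge
    if q_t >= x_high:
        return -1     # sell / discharge
    return 0          # hold

rng = np.random.default_rng(4)
prices = rng.lognormal(3.5, 0.5, size=5_000)   # stand-in hourly spot prices
rho = 0.75
storage, cash = 0.0, 0.0
for t in range(99, len(prices)):
    x_t = policy(prices[t - 99 : t + 1], x_low=37, x_high=70)
    if x_t == -1 and storage < 1:              # cannot discharge an empty store
        x_t = 0
    storage += x_t                              # R_{t+1} = R_t + x_t
    cash += prices[t] if x_t == -1 else (-prices[t] / rho if x_t == 1 else 0.0)
print(storage, cash)
```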
We define the objective function in the following way:

f_{t+1}(x^L, x^H, ω) = { x^L · (p_{t+1} − (1/ρ) p_t)(ω) if q_t ≤ x^L; 0 if x^L < q_t < x^H; (100 − x^H) · (p_t − p_{t+1})(ω) if x^H ≤ q_t }.   (4.4)

Note that

P[q_t ≤ x^L] ≈ x^L/100  and  P[q_t ≥ x^H] ≈ (100 − x^H)/100,
as shown in (4.2). Notice that when q_t ≤ x^L, we buy electricity, and hence we would make a profit (in a mark-to-market sense) if p_{t+1} − (1/ρ) p_t ≥ 0 and have a loss if p_{t+1} − (1/ρ) p_t < 0. On the other hand, when x^H ≤ q_t, we would have correctly sold electricity if p_{t+1} − p_t ≤ 0, and incur an opportunity-cost loss if p_{t+1} − p_t ≥ 0. Clearly, (4.4) reflects these observations and allows us to calculate the one-step contribution of buying, selling, or holding electricity in our storage device.
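The one-step contribution (4.4) transcribes directly into code (the numerical inputs in the example are made up):

```python
def one_step_objective(x_low, x_high, q_t, p_t, p_next, rho):
    """f_{t+1}(x^L, x^H, omega) from (4.4): mark-to-market gain of buying when
    q_t <= x^L, selling when q_t >= x^H, and zero in the hold region."""
    if q_t <= x_low:
        return x_low * (p_next - p_t / rho)
    if q_t >= x_high:
        return (100 - x_high) * (p_t - p_next)
    return 0.0

# Buy branch: q_t = 35 <= x^L = 40, so f = 40 * (32 - 20/0.75) ≈ 213.33
print(one_step_objective(40, 70, 35, 20.0, 32.0, 0.75))
# Sell branch: q_t = 85 >= x^H = 70, so f = 30 * (20 - 18) = 60.0
print(one_step_objective(40, 70, 85, 20.0, 18.0, 0.75))
```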
Proposition 4.1. Given F_t, define the objective function f_{t+1}(x^L, x^H, ω) as in (4.4). Then x^L satisfies the monotonicity condition (3.2) while x^H satisfies (3.3).

Proof. Let f_{t+1}(x^L, x^H) denote the random variable of which a specific sample realization is f_{t+1}(x^L, x^H, ω). Then, for ∀ω ∈ Ω, f_{t+1}(x^L, x^H, ω) is a concave function of (x^L, x^H). When q_t ≤ x^L, we buy electricity, and hence we would make a profit (in a mark-to-market sense) if p_{t+1} − (1/ρ) p_t ≥ 0 and have a loss if p_{t+1} − (1/ρ) p_t < 0. When x^H ≤ q_t, we would have correctly sold electricity if p_{t+1} − p_t ≤ 0, and incur an opportunity-cost loss if p_{t+1} − p_t ≥ 0.
Next, we know that

∂/∂x^L f_{t+1}(x^L, x^H, ω) = { (p_{t+1} − (1/ρ) p_t)(ω) if q_t ≤ x^L; 0 else }

and
∂/∂x^H f_{t+1}(x^L, x^H, ω) = { (p_{t+1} − p_t)(ω) if x^H ≤ q_t; 0 else }.

For some ω^1, ω^2 ∈ Ω,

f_{t+1}(x^L, x^H, ω^1) ≤ f_{t+1}(x^L, x^H, ω^2)

if and only if one of the following three conditions holds:
1. q_t ≤ x^L and (p_{t+1} − (1/ρ) p_t)(ω^1) ≤ (p_{t+1} − (1/ρ) p_t)(ω^2),
2. x^L < q_t < x^H,
3. x^H ≤ q_t and (p_t − p_{t+1})(ω^1) ≤ (p_t − p_{t+1})(ω^2).
These conditions hold true if and only if

∂/∂x^L f_{t+1}(x^L, x^H, ω^1) ≤ ∂/∂x^L f_{t+1}(x^L, x^H, ω^2)

and

∂/∂x^H f_{t+1}(x^L, x^H, ω^1) ≥ ∂/∂x^H f_{t+1}(x^L, x^H, ω^2).

Therefore, x^L satisfies (3.2) while x^H satisfies (3.3).
Then, our goal is to compute (x^{L*}, x^{H*}) such that

(x^{L*}, x^{H*}) := argmax_{1 ≤ x^L ≤ 50, 50 < x^H ≤ 100} Q_α [f_{t+1}(x^L, x^H)].
Fig. 4.1. x^L_t computed from (4.5) as t → ∞, for α = 0.5 and ρ = 0.7.

           α = 0.3   α = 0.4   α = 0.5
ρ = 0.7     36.2      43.6      49.4
ρ = 0.75    37.1      44.6      50.5
ρ = 0.8     39.0      46.3      51.6

Table 4.3. x^L computed from PJM West Hub data.
Table 4.3 shows x^L for different values of α and of the round-trip conversion factor ρ. The lower the value of ρ, the greater the conversion loss, and hence the more conservative our decision to buy must be, leading to a lower x^L. And of course, a larger value of α leads to more risk taking and hence a higher x^L. We show results for α ≤ 0.5 because for larger α, we end up with policies that take too much risk, buy electricity too readily, and fail to make a reasonable profit. Similarly, Table 4.4 shows x^L computed from the above algorithm using Texas Ercot price data. Next, Table 4.5 shows x^H computed from PJM West and Texas Ercot data for α ≤ 0.6. For α > 0.6, we end up with policies that take too much risk, sell electricity too easily, and fail to make a reasonable profit.
5. Maximum Profit Trading Policy. In the previous section, we have shown how to compute x^L and x^H based on our risk appetite, determined by α. If we want to find out exactly how much risk we would have had to accept in the past to maximize
           α = 0.3   α = 0.4   α = 0.5
ρ = 0.7     32.6      39.6      46.0
ρ = 0.75    33.7      40.8      47.2
ρ = 0.8     35.5      42.2      49.0

Table 4.4. x^L computed from Texas Ercot data.
              α = 0.3   α = 0.4   α = 0.5   α = 0.6
PJM West       87.9      75.2      62.4      50.1
Texas Ercot    85.4      72.1      59.7      45.5

Table 4.5. x^H computed from Texas Ercot and PJM West data.
our profit, we can first find the x^L and x^H that maximize the cumulative profit

C(x^L, x^H) := − Σ_{t=0}^{T} p_t x_t,
through deterministic search. Then, the α that comes closest to producing this x^L and x^H tells us what our risk appetite should have been in order to maximize profit. Assuming ρ = .75, for year 2006, when the median price of the year was $47.1 in Texas Ercot, x^L = 39 and x^H = 71 would have given us the highest profit of $7.6 per hour, and these correspond to α ≈ 0.36. In other words, we should have been fairly conservative in controlling the probability of losses in order to maximize our profit. This is because the heavy-tailed spikes in electricity prices have a disproportionate impact on profits and losses. Instead of trading often and trying to make a profit as often as possible, we must be patient and wait long enough for large price deviations to occur. These large deviations occur often enough that we can eventually buy and sell electricity at a good profit. For year 2007, when the median price of the year was $48.3, x^L = 36 and x^H = 68 would have been optimal, giving us a profit of $7.2 per hour. This again corresponds to α ≈ 0.36. Thus, even though the electricity price is highly volatile and non-stationary, with wildly differing price distributions over time, the risk appetite that would have been optimal for year 2006 was still optimal for year 2007. Results are similar for years 2008 and 2009 and for the data from PJM West Hub. The trading policy based on optimizing the quantiles is truly robust: it is not sensitive to the particular sample paths of the data. This is because the policy captures the core characteristics of the data: the hourly electricity spot price is median-reverting and heavy-tailed.
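The deterministic search can be sketched as a brute-force grid scan of the backtested hourly profit. Synthetic prices again stand in for the Ercot data; following the storage model, buys cost p_t/ρ and sells earn p_t (and leftover inventory is left unvalued in this sketch):

```python
import numpy as np

def rank_series(prices):
    # q_t for every t: rank of the current price in the trailing 100-hour window
    return np.array([
        int(np.searchsorted(np.sort(prices[t - 99 : t + 1]), prices[t], side="left")) + 1
        for t in range(99, len(prices))
    ])

def hourly_profit(prices, q, x_low, x_high, rho=0.75):
    """Backtest policy (4.3): pay p_t / rho on each buy, earn p_t on each sell."""
    storage, cash = 0.0, 0.0
    for q_t, p_t in zip(q, prices[99:]):
        if q_t <= x_low:
            storage += 1
            cash -= p_t / rho
        elif q_t >= x_high and storage >= 1:
            storage -= 1
            cash += p_t
    return cash / len(q)

rng = np.random.default_rng(5)
prices = rng.lognormal(3.5, 0.5, size=3_000)
q = rank_series(prices)
grid = [(x_l, x_h) for x_l in range(5, 51, 5) for x_h in range(55, 101, 5)]
best = max(grid, key=lambda th: hourly_profit(prices, q, *th))
print(best, hourly_profit(prices, q, *best))
```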
Table 5.1 shows the performance of the above trading policy from year 2007 to 2009 in Ercot. We can see that the difference between the profit we can make using the optimal parameters (x^L, x^H) and the profit we can make using the parameters computed from the previous year's data is negligible, implying that the quantile-based trading policy is robust.
year   median price   optimal (x^L, x^H)   optimal profit   last year's (x^L, x^H)   profit
2007   $48.3          (36, 68)             $7.2/hour        (39, 71)                 $7.1/hour
2008   $48.7          (39, 68)             $17.0/hour       (36, 68)                 $16.8/hour
2009   $23.5          (39, 69)             $6.3/hour        (39, 68)                 $6.2/hour

Table 5.1. Performance of the quantile-based trading policy in Ercot.
6. Conclusion. In this article, we have presented a simple, provably convergent algorithm for optimizing the quantile of a continuous random function that may not have a finite expectation. The algorithm follows the spirit of Robbins and Monro [36] and does not require us to store and sort the data. In the real world, we must often deal with non-stationary and heavy-tailed data. In that case, a policy based on higher-order statistics such as the expectation and the variance can be unstable at best, or enormously risky, as we have shown with our newsvendor example. We also showed that in the electricity spot market, a simple trading policy based on quantile optimization is robust and generates good profit.
REFERENCES

[1] G. R. Arce and J. L. Paredes, Recursive weighted median filters admitting negative weights and their optimization, IEEE Trans. Signal Processing, 48 (3), 2000, pp. 768–779.
[2] J. P. Bouchaud, Power laws in economics and finance: some ideas from physics, Quantitative Finance, 1 (1), 2001, pp. 105–112.
[3] D. Bertsimas and A. Thiele, A robust optimization approach to inventory theory, Operations Research, 54 (1), 2006, pp. 150–168.
[4] W. Breymann, A. Dias, and P. Embrechts, Dependence structure for multivariate high-frequency data in finance, Quantitative Finance, 3 (1), 2003, pp. 1–14.
[5] H. N. E. Bystrom, Extreme value theory and extremely large electricity price changes, International Review of Economics & Finance, 14 (1), 2002, pp. 41–55.
[6] A. Cartea and M. G. Figueroa, Pricing in electricity markets: a mean-reverting jump-diffusion model with seasonality, Applied Mathematical Finance, 12 (4), 2005, pp. 313–335.
[7] A. Eydeland and K. Wolyniec, Energy and Power Risk Management, John Wiley & Sons, New Jersey, 2003.
[8] J. D. Farmer and F. Lillo, On the origin of power law tails in price fluctuations, Quantitative Finance, 4 (1), 2004, pp. C7–C11.
[9] A. P. George and W. B. Powell, Adaptive stepsizes for recursive estimation with applications in approximate dynamic programming, Machine Learning, 65 (1), 2006, pp. 167–198.
[10] B. M. Hill, A simple general approach to inference about the tail of a distribution, Annals of Statistics, 3 (5), 1975, pp. 1163–1174.
[11] P. Heidelberger and P. A. W. Lewis, Quantile estimation in dependent sequences, Operations Research, 32 (1), 1984, pp. 185–209.
[12] D. P. Heyman and M. J. Sobel, Stochastic Models in Operations Research, Dover Publications, New York, 2003.
[13] T. Hsing, On tail index estimation using dependent data, Annals of Statistics, 19 (3), 1991, pp. 1547–1569.
[14] R. Huisman and R. Mahieu, Regime jumps in electricity prices, Energy Economics, 25 (5), 2003, pp. 425–434.
[15] D. Hutchinson and J. C. Spall, Stopping small-sample stochastic approximation, in American Control Conference, 2009, AAC'09, IEEE, pp. 26–31.
[16] V. R. Joseph, Y. Tian, and C. F. Wu, Adaptive designs for stochastic root-finding, Statistica Sinica, 17 (4), 2007, pp. 1549–1565.
[17] A. I. Kibzun and V. Y. Kurbakovskiy, Guaranteeing approach to solving quantile optimization problems, Annals of Operations Research, 30 (1), 1991, pp. 81–93.
[18] J. H. Kim and W. B. Powell, An hour-ahead prediction model for heavy-tailed spot prices, submitted to Energy Economics, 2011.
[19] A. J. Kleywegt, A. Shapiro, and T. Homem-de-Mello, The sample average approximation method for stochastic discrete optimization, SIAM Journal on Optimization, 12 (2), 2002, pp. 479–502.
[20] R. Koenker, Quantile Regression, Cambridge University Press, New York, 2005.
[21] H. J. Kushner and G. Yin, Stochastic Approximation and Recursive Algorithms and Applications, Springer-Verlag, New York, 2003.
[22] T. L. Lai, Stochastic approximation: invited paper, Annals of Statistics, 31 (2), 2003, pp. 391–406.
[23] T. L. Lai and H. Robbins, Adaptive design and stochastic approximation, The Annals of Statistics, 7 (6), 1979, pp. 1196–1221.
[24] T. L. Lai and H. Robbins, Consistency and asymptotic efficiency of slope estimates in stochastic approximation schemes, Probability Theory and Related Fields, 56 (3), 1981, pp. 329–360.
[25] B. LeBaron, Stochastic volatility as a simple generator of apparent financial power laws and long memory, Quantitative Finance, 1 (6), 2001, pp. 621–631.
[26] J. J. Lucia and E. S. Schwartz, Electricity prices and power derivatives: Evidence from the Nordic power exchange, Review of Derivatives Research, 5 (1), 2002, pp. 5–20.
[27] A. Nemirovski, A. Juditsky, G. Lan, and A. Shapiro, Robust stochastic approximation approach to stochastic programming, SIAM Journal on Optimization, 19 (4), 2009, pp. 1574–1609.
[28] L. Peng, Estimating the mean of a heavy-tailed distribution, Statistics and Probability Letters, 52 (3), 2001, pp. 255–264.
[29] L. Peng, Empirical-likelihood based confidence interval for the mean with a heavy-tailed distribution, Annals of Statistics, 32 (3), 2004, pp. 1192–1214.
[30] L. Peng and Y. Qi, Confidence regions for high quantiles of a heavy-tailed distribution, Annals of Statistics, 34 (4), 2006, pp. 1964–1986.
[31] N. C. Petruzzi and M. Dada, Pricing and the newsvendor problem: A review with extensions, Operations Research, 47 (2), 1999, pp. 183–194.
[32] D. Pilipovic, Energy Risk: Valuing and Managing Energy Derivatives, McGraw-Hill, New York, 2007.
[33] W. B. Powell, Approximate Dynamic Programming: Solving the Curses of Dimensionality, John Wiley and Sons, New York, 2007.
[34] M. L. Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming, John Wiley and Sons, New York, 1994.
[35] S. Resnick and C. Starica, Tail index estimation for dependent data, Annals of Applied Probability, 8 (4), 1998, pp. 1156–1183.
[36] H. Robbins and S. Monro, A stochastic approximation method, Annals of Mathematical Statistics, 22 (3), 1951, pp. 400–407.
[37] A. Ruszczyński and A. Shapiro, Stochastic Programming, Handbooks in Operations Research and Management Science, 10, Elsevier, 2003.
[38] A. Shapiro, D. Dentcheva, and A. Ruszczyński, Lectures on Stochastic Programming: Modeling and Theory, MPS-SIAM Series on Optimization, 9, MPS-SIAM, Philadelphia, 2009.
[39] E. S. Schwartz, Stochastic behavior of commodity prices: Implications for valuation and hedging, The Journal of Finance, 52 (3), 1997, pp. 923–974.
[40] R. Sioshansi, P. Denholm, T. Jenkin, and J. Weiss, Estimating the value of electricity storage in PJM: Arbitrage and some welfare effects, Energy Economics, 31 (2), 2009, pp. 269–277.
[41] P. Sunehag, J. Trumpf, S. V. N. Vishwanathan, and N. Schraudolph, Variable metric stochastic approximation theory, arXiv preprint arXiv:0908.3529v1, 2009.
[42] N. N. Taleb, Black swans and the domains of statistics, The American Statistician, 61 (3), 2007, pp. 198–200.
[43] N. N. Taleb, Errors, robustness, and the fourth quadrant, International Journal of Forecasting, 25 (4), 2009, pp. 744–759.
[44] L. Yin, R. Yang, M. Gabbouj, and Y. Neuvo, Weighted median filters: A tutorial, IEEE Transactions on Circuits and Systems, 43 (3), 1996, pp. 157–192.