NON-ADDICTIVE SAMPLE BASED FACTS<br />
A dimensional model applied to audience analysis<br />
Nuno Datia, Helder Pita<br />
Instituto Superior de Engenharia de Lisboa (ISEL),<br />
Departamento de Engenharia Electrónica e Telecomunicações e de Computadores (DEETC), Lisboa, Portugal<br />
datia@isel.ipl.pt,hp@isel.ipl.pt<br />
João Moura-Pires<br />
Faculdade de Ciências e Tecnologia da Universidade Nova de Lisboa (FCT-UNL),<br />
Departamento de Informática, Monte da Caparica, Portugal<br />
jmp@di.fct.unl.pt<br />
Keywords: Data warehouse, data mart, dimension modelling, non-additive facts, online analytical processing, people<br />
meter data, decision support.<br />
Abstract: In every Online Analytical Processing (OLAP) cube there are certain types of measures, generally referred<br />
to as facts, that must be treated carefully because of their non-additive nature: they cannot be added<br />
directly to present a summarised result. This paper focuses on a specific type of non-additive<br />
fact, the quota sample based fact. These are, generally, ratios that need to be normalised against a reference value<br />
calculated from a subset of the quota sample. This process guarantees the representativeness of the measure.<br />
However, the reference value is not static, as it changes with the chosen subset. This subset is user dependent,<br />
resulting from the restrictions the user imposes on the data through the analysis interface. Using the audience analysis<br />
domain, where almost all performance indicators are non-additive, this paper discusses a specific OLAP-oriented<br />
dimensional model capable of addressing the singularities of non-additive facts. The model is star schema<br />
based, which addresses both efficiency and simplicity, and is targeted at the audience analysis of generic TV programs.<br />
1 INTRODUCTION<br />
The data warehouse, along with the operational system,<br />
is today the foundation of an organisation’s data<br />
centre. Whereas the operational system seeks to hold accurate,<br />
current data, the data warehouse has a much<br />
broader job: to hold a series of snapshots of data over<br />
time. The subject is sufficiently mature that it is possible<br />
to summarise the most valuable work in just two schools<br />
of thought: the bottom-up approach, proposed by<br />
Kimball [9], and the top-down approach, proposed<br />
by Inmon [6]. Despite their divergences,<br />
both agree that the operational system and data warehouse<br />
worlds are different. Those differences lead to a change<br />
in both the development methodology and the<br />
model used to store the data. The key characteristics<br />
of a data warehouse are [6]: (i) subject-oriented; (ii)<br />
integrated; (iii) time-variant; and (iv) non-volatile.<br />
Dimensional modelling consists of a simple, performance-oriented<br />
relational model capable of storing<br />
the events of the business. It consists of two<br />
types of tables:<br />
• Fact tables, where the numerical performance<br />
measurements of the business are stored;<br />
• Dimension tables, which store the textual attributes<br />
that give context to the facts.<br />
The fact table expresses the many-to-many relationships<br />
between dimensions. The primary key of the<br />
fact table is, generally, the set of foreign keys to the<br />
dimensions. Dimensions thus give context to the measures<br />
stored in the fact table.<br />
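The relationship between fact and dimension tables described above can be illustrated with a minimal sketch.<br />
It uses Python's standard sqlite3 module; the table names, column names and values are hypothetical and are not<br />
the schema used by the authors. The primary key of the fact table is the set of foreign keys to the dimensions.<br />

```python
import sqlite3


def build_star_schema(conn):
    """Create two hypothetical dimension tables and one fact table."""
    conn.executescript("""
        CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, day TEXT);
        CREATE TABLE dim_program (program_key INTEGER PRIMARY KEY, name TEXT);
        -- The fact table key is the set of foreign keys to the dimensions.
        CREATE TABLE fact_audience (
            date_key INTEGER REFERENCES dim_date(date_key),
            program_key INTEGER REFERENCES dim_program(program_key),
            weight REAL,
            PRIMARY KEY (date_key, program_key)
        );
    """)


def demo():
    conn = sqlite3.connect(":memory:")
    build_star_schema(conn)
    conn.executemany("INSERT INTO dim_date VALUES (?, ?)",
                     [(1, "2001-01-01"), (2, "2001-01-02")])
    conn.executemany("INSERT INTO dim_program VALUES (?, ?)",
                     [(10, "News"), (20, "Film")])
    conn.executemany("INSERT INTO fact_audience VALUES (?, ?, ?)",
                     [(1, 10, 0.25), (1, 20, 0.5), (2, 10, 0.125)])
    # The dimension gives context (the program name) to the stored measure.
    return conn.execute("""
        SELECT p.name, SUM(f.weight)
        FROM fact_audience f JOIN dim_program p USING (program_key)
        GROUP BY p.name ORDER BY p.name
    """).fetchall()


rows = demo()
```

Here the additive measure weight can safely be summed over the Date dimension for each program.<br />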
From a conceptual point of view, a dimensional model<br />
can be represented as an N-dimensional sparse space,<br />
where the axes represent dimensions and the intersections<br />
of those axes represent measures. This space<br />
is normally referred to as an Online Analytical Processing<br />
cube [2]. The cube is manipulated<br />
through a series of operations [7] (slice, pivot, drill<br />
down and drill up) and an aggregation function that summarises<br />
the facts, whose nature dictates the set of functions<br />
that can be used. Facts can be [9]:<br />
• Additive, can be summed up through all of the<br />
dimensions;
• Semi-additive, can be summed up only for a subset<br />
of the dimensions;<br />
• Non-additive, cannot be summed for any of the<br />
dimensions.<br />
Non-additivity is not directly related to the measure<br />
being numerical; it relates to the usefulness<br />
(and accuracy) of the summed value, which may simply<br />
not make sense to the business. Ratios are an example<br />
of a non-additive fact.<br />
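A small worked example may help to see why ratios are non-additive. The group sizes below are hypothetical<br />
numbers chosen only for illustration: summing the two ratios gives a value with no business meaning, whereas<br />
the ratio of the summed components is the meaningful figure.<br />

```python
def share(viewers, population):
    """Ratio fact: fraction of a group that watched."""
    return viewers / population


# Hypothetical groups: (viewers, population)
groups = {"A": (30, 100), "B": (10, 400)}

# Adding the ratios directly: 0.3 + 0.025, not a valid share of anything.
sum_of_ratios = sum(share(v, p) for v, p in groups.values())

# Adding the additive components first, then taking the ratio: 40 / 500.
total_viewers = sum(v for v, _ in groups.values())
total_population = sum(p for _, p in groups.values())
ratio_of_sums = share(total_viewers, total_population)
```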
The audience analysis domain shares with OLAP and<br />
data warehousing all of the characteristics listed above,<br />
and the volumes of data are similar in size. However,<br />
the audience analysis world tends to be closed and proprietary,<br />
and its applications do not directly apply<br />
generic OLAP techniques. Programs like<br />
[10] and [13] use pivot-table-like interfaces, typical<br />
of OLAP, but do not use other OLAP reporting<br />
facilities. The latter, [13], is developed by one of the<br />
leading software houses in audience analysis; it uses<br />
a proprietary text file format to store information, not<br />
an OLAP engine. In this paper, the authors evaluate<br />
the feasibility of applying general OLAP techniques to<br />
the audience analysis domain. The purpose is to determine<br />
whether it is possible to use general, non-proprietary<br />
solutions to achieve the same degree of freedom in<br />
audience analysis as the tools referred to above. In particular,<br />
we discuss a dimensional model that addresses the singularities<br />
of the audience analysis performance indicators,<br />
almost all of which are quota sample based and non-additive.<br />
Some of the values needed to compute<br />
these indicators, mainly those related to representativeness,<br />
are only known at runtime, after the user has defined the<br />
restrictions of the analysis.<br />
We could not find any related work on<br />
dimensional models and quota sample based facts. The<br />
main data warehouse and OLAP literature [6, 9] does not<br />
address this kind of fact. Being a relatively closed<br />
world, audience analysis is not an easy domain in which to<br />
survey the state of the art [14]. The few publicly available<br />
articles are related to data mining and audience<br />
pattern analysis, not to the application of data warehouse and OLAP<br />
techniques.<br />
This paper is organised as follows: section 2 describes<br />
the data, the requirements and some performance<br />
indicators used in audience analysis; section 3<br />
presents the dimensional model, using Kimball’s approach,<br />
for a generic TV content analysis data mart;<br />
finally, in section 4 the authors draw some conclusions<br />
from their work.<br />
2 PROBLEM DOMAIN<br />
2.1 The Data<br />
The meter system seldom produces data for the<br />
entire viewers’ panel, since several practical<br />
problems may occur, ranging from meter misuse<br />
to data communication problems, which endanger<br />
the desired panel representativeness. To<br />
adjust the panel representativeness, a weighting procedure<br />
is used that attempts to correct undesirable<br />
non-representative tendencies in the data. This is the Rim<br />
Weighting algorithm [4] (also known as Iterative Proportional<br />
Fitting [1]), which provides a daily weight<br />
for each viewer in an attempt to recover the representativeness<br />
of the panel sample.<br />
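The paper only names the weighting algorithm, so the following is a generic sketch of iterative proportional<br />
fitting, not the procedure used by the meter system. The panel members, the attributes and the target shares<br />
are hypothetical. Each pass rescales the weights so that the weighted share of every category matches its<br />
target, one attribute at a time, until the margins stabilise.<br />

```python
def rim_weights(panel, targets, iterations=100):
    """Rake unit weights so weighted shares approach the target shares.

    panel   -- list of dicts, one per panel member (attribute -> value)
    targets -- {attribute: {value: desired population share}}
    """
    weights = [1.0] * len(panel)
    for _ in range(iterations):
        for attr, shares in targets.items():
            total = sum(weights)
            # Current weighted total of each category of this attribute.
            current = {}
            for w, person in zip(weights, panel):
                current[person[attr]] = current.get(person[attr], 0.0) + w
            # Rescale each member so the category reaches its target share.
            for idx, person in enumerate(panel):
                value = person[attr]
                weights[idx] *= shares[value] * total / current[value]
    return weights


def weighted_share(panel, weights, attr, value):
    """Weighted fraction of the panel holding a given attribute value."""
    matching = sum(w for w, p in zip(weights, panel) if p[attr] == value)
    return matching / sum(weights)


# Hypothetical five-member panel and population shares.
panel = [
    {"sex": "M", "age": "young"},
    {"sex": "M", "age": "young"},
    {"sex": "M", "age": "old"},
    {"sex": "F", "age": "young"},
    {"sex": "F", "age": "old"},
]
targets = {"sex": {"M": 0.5, "F": 0.5}, "age": {"young": 0.4, "old": 0.6}}
weights = rim_weights(panel, targets)
```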
People meter data have three distinct components:<br />
(i) socio-demographic information; (ii) TV content<br />
data; and (iii) viewing data. The socio-demographic information,<br />
which includes characteristics such as age, occupation<br />
and social class, is important to determine<br />
the panel representativeness, but may also be<br />
used for other purposes (mostly advertising).<br />
The TV content is characterized by its type,<br />
duration and corresponding TV channel. Finally,<br />
the meter data measure the viewing behavior of<br />
each viewer, represented by a sequence of<br />
watch/non-watch indicators for each second of<br />
the day, as illustrated by figure 1. These indicators are<br />
registered independently for each channel.<br />
Figure 1: Representation of a daily viewing pattern for an<br />
individual.<br />
2.2 Requirements<br />
The requirements were gathered through the analysis<br />
of several audience reports and through on-site,<br />
day-to-day use of the audience analysis program<br />
Telereport [13]. The reports are sufficiently broad to cover<br />
a range of heterogeneous users, from<br />
television to advertising. They cover the most important<br />
aspects of general TV content performance analysis.<br />
However, they do not address the singularities<br />
of advertising spots. Audience analysts are mainly interested<br />
in determining the audiences by channel<br />
and/or program, for a set of targets and time periods;<br />
the analysis is never carried out for individual<br />
viewers. Viewers are arranged in targets, which closely<br />
relate to socio-demographic information. For example,<br />
a typical target is AB, consisting of high-income<br />
viewers. Time periods range from minute-by-minute<br />
analysis to standard audience periods of 2<br />
or more hours. Periods longer than a day must<br />
be treated carefully, as the corrective weights change<br />
daily. More details can be found in [3].<br />
2.3 Performance Indicators<br />
Audience analysis defines more than a dozen performance<br />
indicators [3]. However, on a regular basis,<br />
audience analysts tend to use only a few. For demonstration<br />
purposes, this paper uses only one: TV rating<br />
(rat).<br />
TV rating gives the percentage of the population<br />
that, for a given period, has watched a program/channel.<br />
It gives the probability of an individual<br />
from the population becoming a viewer. Let P be<br />
the daily viewers’ panel and W the set of corrective<br />
weights. The rating for a single minute can be calculated<br />
as<br />
\[ \mathit{rat}_{minute} = \sum_{i=0}^{|V|} V_i \ast W(i), \quad V \subset P \qquad (1) \]<br />
where V is the subset of individuals watching television<br />
on a specific channel. (1) can be extended to<br />
accommodate a set of minutes M<br />
\[ \mathit{rat}_{period} = \frac{\sum_{m \in M} \mathit{rat}_{minute}(m)}{|M|} \qquad (2) \]<br />
However, (2) does not fully address the representativeness<br />
for a target. By definition, most of the performance<br />
indicators relate to a reference value and can<br />
be displayed as percentages. When one says that program<br />
X got a 20% rating, that is the percentage of<br />
the viewers from the panel who share the desired socio-demographic<br />
values and happened to be watching TV<br />
during the analysis period. Let Tsd be the subset of<br />
the panel that shares the desired socio-demographic values<br />
sd. One can determine the sum of the corrective<br />
weights TW for the target as follows<br />
\[ TW = \sum_{T_{sd} \in P} W(i) \qquad (3) \]<br />
Normalising (2) using (3) gives the representative<br />
value for the desired target<br />
\[ \mathit{rat} = \frac{\mathit{rat}_{period}}{TW} \qquad (4) \]<br />
The rating calculated from (4) is a valid measure, not<br />
only for the panel’s viewers, but also for the entire<br />
population of TV viewers.<br />
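Equations (1) to (4) can be followed step by step in a short sketch. The viewer identifiers, weights and viewing<br />
minutes are hypothetical, and the sketch assumes that only viewers belonging to the target contribute to the<br />
per-minute sum, which is how we read the normalisation in (4).<br />

```python
def rating(minutes, weights, target):
    """Rating of a target over a period, following equations (1) to (4).

    minutes -- list of sets: ids of the individuals watching in each minute
    weights -- dict: id -> daily corrective weight W(i)
    target  -- set of ids sharing the desired socio-demographic values
    """
    # (3): reference value, the sum of the weights of the whole target.
    tw = sum(weights[i] for i in target)
    # (1): weighted viewers of the target, minute by minute.
    rat_minute = [sum(weights[i] for i in viewers & target)
                  for viewers in minutes]
    # (2): average over the minutes of the period.
    rat_period = sum(rat_minute) / len(minutes)
    # (4): normalise against the reference value.
    return rat_period / tw


# Hypothetical panel of four individuals; individual 4 is outside the target.
weights = {1: 100, 2: 200, 3: 100, 4: 400}
target = {1, 2, 3}
minutes = [{1, 2}, {2, 4}]
rat = rating(minutes, weights, target)
```

In this example the target weighs 400, the two minutes contribute 300 and 200, and the rating is 250 / 400.<br />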
3 THE MODEL<br />
3.1 Dimensions<br />
From the requirements, each performance indicator is<br />
given context by: (i) Time, detailed to the minute;<br />
(ii) Date, giving access to the daily corrective weight;<br />
(iii) Program, the object of the performance analysis;<br />
and (iv) Socio-demographic information, enabling<br />
target analysis. Date is a regular dimension,<br />
whose characteristics and typical attributes are described<br />
in [9] and [6]. The Time dimension represents the<br />
daily regularity, but also typical time periods and a<br />
domain-dependent referential. The periods considered<br />
are the TV standard ones (see footnote 1): ”02h30-08h00”, ”08h00-12h00”,<br />
”12h00-14h30”, ”14h30-18h30”, ”18h30-20h00”,<br />
”20h00-23h00”, ”23h00-24h30”, ”24h30-26h30”.<br />
The Socio-demographic dimension contains some of<br />
the most used demographic information, such as<br />
Gender, Region, Social Class, Occupation, and Housewife<br />
status. The age information is generally divided<br />
into 5 to 7 intervals. The authors decided<br />
to use the seven-interval division: ”[4-14]”, ”[15-24]”,<br />
”[25-34]”, ”[35-44]”, ”[45-54]”, ”[55-64]”,<br />
”>64”. The dimension is populated with all the possible<br />
combinations of attribute values. For the data used<br />
(see footnote 2), this gives 6720 combinations.<br />
The Program dimension contains all the information<br />
related to a TV show: its name, description, type,<br />
classification and duration. TV shows are classified<br />
with a taxonomy of at most three levels. The first level<br />
indicates the topmost classification, e.g. Fiction. The second<br />
level classifies the show inside the first level, e.g.<br />
Fiction-Film. Finally, the last level gives the most<br />
detailed classification for the show, e.g. Fiction-Film-Comedy.<br />
Notice that not all shows have a 2nd<br />
or 3rd level of detail. Classification is an enclosed hierarchy<br />
inside the program dimension [9]. However,<br />
it is neither very deep nor of unknown depth, so it can be<br />
modelled simply with 3 attributes, one for each level,<br />
with no loss of generality.<br />
With these dimensions, the granularity of the fact<br />
table was fixed at the minute. Note that such a model<br />
does not address the needs of advertising<br />
spots, which are out of the scope of this work.<br />
1 For Portuguese TV audience analysis, by the year 2006.<br />
2 From 2001.<br />
3.2 Facts<br />
From a relational point of view, an OLAP cube is the<br />
projection of a relation R, where X1, X2, ..., Xn are<br />
the key attributes and K is the remaining attribute. From<br />
an OLAP point of view, X1, X2, ..., Xn become the<br />
axes of the cube and each value of attribute K lies at<br />
the intersection of those axes. We can express K as a<br />
function<br />
\[ K : f(X_1, X_2, \ldots, X_n) \qquad (5) \]<br />
If one of the axes is omitted from (5), e.g. Xn, we<br />
are performing an operation similar to a dimensional<br />
reduction<br />
\[ K_{new} : f(X_1, X_2, \ldots, X_{n-1}) \qquad (6) \]<br />
However, in (5), the result set does not contain<br />
duplicates, because the key set is contained in<br />
the projection result. In (6), to guarantee<br />
distinct tuples, K needs to be aggregated for each<br />
distinct tuple of the projection set PS, defined as<br />
\[ PS = \{ (x_1, x_2, x_3, \ldots, x_n) \mid x_n \in dom(X_n) \} \]<br />
Using the sum as the<br />
aggregation function, the attribute K is summarised as<br />
\[ \sum_{x_n \in dom(X_n)} f(x_1, x_2, \ldots, x_{n-1}, x_n) \qquad (7) \]<br />
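The reduction in (6) and (7) can be sketched with a dictionary standing in for the cube. The keys and values<br />
below are hypothetical; the function drops the last axis and sums K over all the values of that axis, so that<br />
each remaining key tuple appears only once.<br />

```python
from collections import defaultdict


def drop_last_axis(cube):
    """Sum K over the last key attribute, as in equation (7).

    cube -- dict mapping key tuples (x1, ..., xn) to the value of K
    """
    reduced = defaultdict(float)
    for key, value in cube.items():
        # Tuples that differ only in xn collapse into the same reduced key.
        reduced[key[:-1]] += value
    return dict(reduced)


# Hypothetical cube with axes (date, program, social class).
cube = {
    ("2001-01-01", "News", "AB"): 2.0,
    ("2001-01-01", "News", "C"): 3.0,
    ("2001-01-01", "Film", "AB"): 1.5,
}
reduced = drop_last_axis(cube)
```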
The facts in the OLAP cube do not always<br />
point directly to a single attribute. Their definition can be<br />
based upon a mathematical operation applied to one<br />
or more data elements. These are generally referred<br />
to as derived data or derived facts [5]. If the operation<br />
defines a ratio (see footnote 3), the application of (7) results in a meaningless<br />
value, because ratios are non-additive.<br />
The performance indicators cannot be pre-calculated<br />
and stored directly in the fact table; they must be implemented<br />
as derived facts.<br />
To be able to properly calculate the performance<br />
indicators for the domain, e.g. rat, a corrective<br />
weight must be used, as can be seen in (4). Since the<br />
targets of a query are user dependent, determined at<br />
runtime by the user’s restrictions over the Socio-demographic<br />
dimension, one cannot pre-determine the<br />
right weight to use. It is necessary to store all the<br />
possible weights, using (3) to calculate them. For a cube with 4<br />
key attributes, we are dealing with a theoretical<br />
value of 15 possible combinations, \( \sum_{k=1}^{4} \binom{4}{k} = 15 \). If one<br />
decided to store all the possible weights as facts, then for<br />
each tuple of the fact table 14 unnecessary values would be<br />
stored, as only one is valid for each query. Besides,<br />
the number of possible targets increases this number.<br />
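The figure of 15 can be checked by enumerating the non-empty subsets of the four key attributes that a query<br />
may restrict. This is only a counting aid; using the four dimension names of the model as the key attributes<br />
is our assumption for the illustration.<br />

```python
from itertools import combinations
from math import comb

# Assumed key attributes: the four dimensions of the model.
DIMENSIONS = ("Date", "Time", "Program", "Socio-demographic")


def restriction_sets(dimensions):
    """All non-empty subsets of the key attributes."""
    return [subset
            for k in range(1, len(dimensions) + 1)
            for subset in combinations(dimensions, k)]


subsets = restriction_sets(DIMENSIONS)
# Same count from the closed form: C(4,1) + C(4,2) + C(4,3) + C(4,4).
count = sum(comb(len(DIMENSIONS), k) for k in range(1, len(DIMENSIONS) + 1))
```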
This approach is also not feasible because, for some<br />
indicators, it is necessary to store the weights for<br />
non-viewers, that is, individuals who did not watch<br />
television during the analysis period. Fact tables store<br />
only one type of occurrence, in this case, the fact that<br />
some individual watched television; it is not good<br />
practice to store the opposite fact too. One must use<br />
a more straightforward approach, using some domain<br />
knowledge.<br />
3 Ratios are not the only type of non-additive facts.<br />
The data are quota sample based, which means the<br />
sample was designed to be a representative subset of<br />
the population with regard to some descriptive characteristics.<br />
In this sense, it is impossible for an individual<br />
to hold two or more values simultaneously for<br />
an attribute. For example, an individual who is present<br />
in the target males aged 4 to 14 cannot<br />
be part of the target females aged<br />
4 to 14.<br />
Property 1. Let Ta be a target with a restriction over<br />
a socio-demographic attribute a. For any two distinct values v1 and v2 of a,<br />
\[ T_{a=v_1} \cap T_{a=v_2} = \emptyset \]<br />
If the weights are normalised, which can be done during<br />
the ETL (see footnote 4) process, then for any two or more disjoint<br />
subsets that together cover the panel, their weights sum to one.<br />
Property 2. Let Tb be a target with a restriction<br />
over a two-valued socio-demographic attribute b. Then<br />
\[ \sum_{T_{b=0} \in P} W(i) + \sum_{T_{b=1} \in P} W(i) = 1 \]<br />
is always true.<br />
Knowing that, it is possible to create a fact table that<br />
stores the weights for all possible targets, including<br />
the non-viewers. That fact table shares<br />
the Date and Socio-demographic dimensions. Each tuple<br />
in the fact table represents the reference value for<br />
a specific combination of socio-demographic values<br />
on a given day, obtained by applying (3) to each distinct combination.<br />
Figure 2 illustrates the Contact star schema.<br />
Figure 2: Illustration of the contact star schema model.<br />
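A minimal sketch of how such a contact fact table could be populated and queried is given below. The data<br />
structures, attribute values and weights are hypothetical and do not reproduce the authors' ETL process. One<br />
tuple is kept per day and socio-demographic combination, and the reference value of any target is obtained at<br />
query time by summing the tuples that satisfy the user's restriction.<br />

```python
from collections import defaultdict


def build_contact_facts(panel):
    """Sum the daily weights per (date, socio-demographic combination).

    panel -- iterable of (date, sd_values, weight), one row per individual
    """
    facts = defaultdict(float)
    for date, sd_values, weight in panel:
        facts[(date, sd_values)] += weight
    return dict(facts)


def target_weight(facts, date, predicate):
    """Reference value TW for the target selected by the predicate."""
    return sum(weight for (day, sd_values), weight in facts.items()
               if day == date and predicate(sd_values))


# Hypothetical panel for one day, with weights already normalised to sum to one.
panel = [
    ("d1", ("M", "4-14"), 0.125),
    ("d1", ("M", "4-14"), 0.125),
    ("d1", ("F", "4-14"), 0.25),
    ("d1", ("F", "15-24"), 0.5),
]
facts = build_contact_facts(panel)
males = target_weight(facts, "d1", lambda sd: sd[0] == "M")
females = target_weight(facts, "d1", lambda sd: sd[0] == "F")
```

With normalised weights, the two disjoint targets of the two-valued attribute sum to one, as stated in Property 2.<br />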
The former model provides the weights for all the contacted<br />
individuals, viewers or not, for one day. To calculate<br />
the performance indicators for the domain, e.g.<br />
rating, it is also necessary to determine the weights of<br />
the viewers. To address this issue, it is necessary to<br />
create another fact table that stores the viewers’ daily<br />
corrective weights. Since this table is indexed by all<br />
of the dimensions discussed so far, Date, Time, Program<br />
and Socio-demographic, a tuple represents<br />
a contact of a set of viewers sharing equal socio-demographic<br />
values, for one minute, a particular program/channel<br />
and a specific day. From<br />
(4), it is necessary to store the weights and the number<br />
of minutes of the interval. Figure 3 illustrates the<br />
audience star schema, the main portion of the overall<br />
model.<br />
4 Extraction, Transformation and Loading.<br />
Figure 3: Illustration of the audience star schema model.<br />
3.3 Optimization of the model<br />
The model is not optimized. The theoretical number of<br />
tuples per day is nearly 9.7 million (1440 minutes<br />
x 6720 socio-demographic combinations). Even<br />
for the more realistic 1/3 of that value, the table grows by<br />
40 million tuples per month. Any optimization of the<br />
model must be weighed in terms of cost vs. benefit,<br />
as it is necessary to create auxiliary aggregate models<br />
based on the requirements.<br />
The first aggregate takes advantage of the typical<br />
targets used in TV analysis. Not all of the socio-demographic<br />
combinations are interesting, so only<br />
8 targets were considered here: Universe, all of the<br />
viewers; Class AB, the wealthiest viewers; 4:14,<br />
young children and teenagers; Housewife, viewers<br />
who are responsible for acquiring essential products<br />
for the house; Adults, viewers above 14 years old;<br />
ABC1 25:34, young active working viewers from the<br />
wealthiest social classes, aged 25 to 34;<br />
ABC1 15:34, young viewers from the wealthiest social<br />
classes, aged 15 to 34; ABC<br />
25:54, active working viewers from all social strata<br />
except the lower classes.<br />
This aggregate represents a 70% reduction in the<br />
tuples needed, compared to the model illustrated in<br />
figure 3. It is necessary to create a new dimension to<br />
accommodate these targets. The aggregate’s<br />
model shares the Date dimension with the others. The<br />
fact table stores the sum of the daily weights for the<br />
target.<br />
Another possible optimization is to aggregate by time<br />
periods. Several reports calculate the indicators for<br />
specific periods, e.g. prime time. In this model,<br />
the Time dimension is replaced by a new dimension,<br />
TimePeriod, a projection of the former with only the<br />
8-valued attribute Period. Consequently, the Program<br />
dimension is incompatible with the new aggregate’s<br />
granularity. From Program is derived another new<br />
dimension, Channel, with only one attribute, channel.<br />
This represents a 180-fold reduction in the first<br />
dimension (from 1440 minutes to 8 periods) and a 60% reduction in the second. The remaining<br />
dimensions of the main model (figure 3) are compatible.<br />
The fact table stores the sum of the daily<br />
weights for a specific channel, period, date and socio-demographic<br />
combination, together with the time interval in<br />
minutes. Figure 4 illustrates the overall model. The<br />
darker rectangles represent dimensions; the remaining<br />
ones are the fact tables.<br />
Figure 4: Illustration of the overall model. Bounded tables<br />
are aggregate specific and only exist to increase the model’s<br />
performance.<br />
3.4 Evaluation<br />
The evaluation of the model was based not on performance<br />
but on flexibility and simplicity. One<br />
of the main goals was to show that generic<br />
OLAP tools can fulfil the needs of the audience analysis<br />
domain, with the same degree of freedom and leading<br />
to the same results. A series of reports was produced<br />
using Telereport [13] and the results were kept as reference<br />
values. Those reports are expected to be representative<br />
of the daily needs of an audience analyst.<br />
For lack of space, we do not address the ETL process<br />
or its evaluation. It suffices to say that the OLAP cubes<br />
were created and populated with one year of data, using<br />
SQL Server Data Transformation Services [12]<br />
for the ETL and SQL Server Analysis Services [11]<br />
as the multidimensional engine. For each report, an<br />
MDX query was developed to mimic it and then executed<br />
against our data. Every execution confirmed the<br />
expected reference values, demonstrating that, for the<br />
tested reports, the model is adequate. The benefit of<br />
the aggregates was not tested in terms of performance, but<br />
rather of simplicity. Listing 1 illustrates the<br />
code needed to implement one of the reports. In this particular<br />
case, the query displays the rating for the<br />
main targets, by time period, for one specific channel.<br />
-- Reference weight of the current target, read from the ContactTarget aggregate<br />
WITH MEMBER Measures.TargetTotal as<br />
'LookupCube("ContactTarget",<br />
"(Measures.weight," + membertostr(Target.currentmember) + ")")'<br />
-- Rating: viewers' weight normalised by the target weight and the number of minutes<br />
MEMBER Measures.rat as<br />
'Measures.weight / Measures.TargetTotal / Measures.num_min',<br />
FORMAT_STRING = 'Percent'<br />
SELECT {Target.currentmember} ON COLUMNS,<br />
time.Period.Members ON ROWS<br />
FROM AudienceTarget<br />
WHERE (Measures.rat, Program.Canal.[2])<br />
Listing 1: The rating for the main targets by time period<br />
for channel 2, using the target aggregate.<br />
The code is rather simple because each target<br />
weight is pre-calculated in the ContactTarget aggregate.<br />
The LookupCube function looks up the value for<br />
each target by querying it. In the absence of this cube,<br />
each target weight is calculated at runtime by looking up<br />
each socio-demographic variable that makes up the target.<br />
Not only does the execution time rise, but so does the<br />
size of the query code.<br />
4 CONCLUSION<br />
This work presents a dimensional model capable<br />
of addressing the specificities of quota sample based data.<br />
The goal was twofold: first, to demonstrate that it is<br />
possible to address audience analysis requirements<br />
using non-proprietary repositories and technologies;<br />
second, to present a possible solution for other domains<br />
where data are also quota sample based. In the<br />
audience domain, the data tendencies are corrected by<br />
a daily individual weight that must be taken into account<br />
if the indicators are meant to be representative<br />
of the entire population, not just the panel’s individuals.<br />
To deal with the audience performance indicators,<br />
mostly non-additive and quota dependent, it is necessary<br />
to normalise each one against a reference value calculated<br />
from the daily corrective weights. The presented<br />
solution relies on the creation of an auxiliary contact table<br />
that stores daily weights for each possible combination<br />
of socio-demographic values. In this way the authors transform<br />
the non-additive facts into additive<br />
ones, sacrificing the ability to pre-calculate their<br />
values and store them directly in a fact table. All of<br />
the calculations must be done at runtime. To ensure an<br />
efficient solution, it is necessary to create a series of domain-dependent<br />
aggregates with specific dimensional<br />
models. The performance indicator results obtained with<br />
the proprietary program and with generic OLAP tools<br />
using the discussed model matched.<br />
The authors think the same methodology is appropriate<br />
for other domains if the data are quota sample based<br />
and the performance indicator values are always relative<br />
to the subset of the sample used in their calculation.<br />
REFERENCES<br />
[1] Yvonne M. M. Bishop, S. E. Fienberg, and P. W. Holland.<br />
Discrete Multivariate Analysis: Theory and Practice.<br />
The MIT Press, 1975.<br />
[2] E. F. Codd, S. B. Codd, and C. T. Salley. Providing<br />
OLAP to user-analysts. Technical report,<br />
http://www.arborsoft.com/essbase/wht_ppr/coddps.zip, 1993.<br />
[3] Nuno Datia. Aplicação de técnicas de apoio à decisão<br />
a dados de audimetria. Master’s thesis, Faculdade de<br />
Ciências e Tecnologia - Universidade Nova de Lisboa,<br />
2006.<br />
[4] W. Edwards Deming and Frederick F. Stephan. On a<br />
least squares adjustment of a sampled frequency table<br />
when the expected marginal totals are known. Annals<br />
of Mathematical Statistics, 11(4):427–444, 1940.<br />
[5] C. Imhoff, N. Galemmo, and J.G. Geiger. Mastering<br />
Data Warehouse Design: Relational and Dimensional<br />
Techniques. Wiley, 2003.<br />
[6] WH Inmon. Building the data warehouse. John Wiley<br />
& Sons, Inc. New York, NY, USA, 2005.<br />
[7] N. Jukic, B. Jukic, and M. Malliaris. Online Analytical<br />
Processing (OLAP) for Decision Support. In<br />
Handbook on Decision Support Systems. Springer,<br />
2008.<br />
[8] R. Kimball, L. Reeves, M. Ross, and W. Thornthwaite.<br />
The Data Warehouse Lifecycle Toolkit. Wiley, 1998.<br />
[9] R. Kimball and M. Ross. The Data Warehouse Toolkit:<br />
The Complete Guide to Dimensional Modeling. Wiley,<br />
2002.<br />
[10] MediaSoft Kimono. http://www.kubik.it/kimono_en.html, last accessed May 2008.<br />
[11] SQL Server Analysis Services. http://technet.microsoft.com/pt-br/sqlserver/bb671220(en-us).aspx, last accessed May 2008.<br />
[12] SQL Server Data Transformation Services. http://www.microsoft.com/technet/prodtechnol/sql/2000/deploy/dtssql2k.mspx, last accessed May 2008.<br />
[13] Markdata Telereport. http://www.markdata.net/v2/, last accessed May 2008.<br />
[14] Rene Weber. Methods to Forecast Television Viewing<br />
Patterns for Target Audiences. In Communication Research<br />
in Europe and Abroad –Challenges of the First<br />
Decade, 2003.