31.10.2012

NON-ADDITIVE SAMPLE BASED FACTS - Deetc

• Semi-additive, can be summed only over a subset of the dimensions;

• Non-additive, cannot be summed over any of the dimensions.

Non-additivity is not directly related to whether the measure is numerical; it relates to the usefulness (and accuracy) of the summed value. The sum may simply make no sense to the business. Ratios are an example of a non-additive fact.
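To make the distinction concrete, the following minimal Python sketch uses a hypothetical fact table in which each row holds two additive measures per region: a weighted number of viewers and a weighted target population. The rating derived from them is a ratio and therefore non-additive. Summing it across regions gives a meaningless value, whereas summing the additive components first and dividing afterwards gives the correct total. The class, field and function names are illustrative assumptions, not part of the model described in this paper.

```python
from dataclasses import dataclass


@dataclass
class RegionFact:
    region: str
    viewers: float     # additive: weighted number of viewers (hypothetical)
    population: float  # additive: weighted target population (hypothetical)

    def rating(self) -> float:
        # Non-additive: a ratio of two additive measures.
        return self.viewers / self.population


def naive_total_rating(facts: list[RegionFact]) -> float:
    # Wrong: sums the ratios across the region dimension.
    return sum(f.rating() for f in facts)


def correct_total_rating(facts: list[RegionFact]) -> float:
    # Right: aggregates the additive components first, then divides.
    return sum(f.viewers for f in facts) / sum(f.population for f in facts)


facts = [RegionFact("North", 30.0, 100.0), RegionFact("South", 10.0, 200.0)]
print(f"{naive_total_rating(facts):.4f}")    # 0.3500
print(f"{correct_total_rating(facts):.4f}")  # 0.1333
```

Keeping the numerator and denominator as separate additive measures, and computing the ratio only after aggregation, is what lets the same facts be rolled up along any dimension.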

The audience analysis domain shares all of the characteristics listed above with OLAP and data warehousing, and its data volumes are also of similar size. However, the audience analysis world tends to be closed and proprietary, and its applications do not directly apply generic OLAP techniques. Programs such as [10] and [13] use pivot-table-like interfaces, typical of OLAP, but do not use the other OLAP reporting facilities. The latter, [13], is developed by one of the leading software houses in audience analysis; it stores information in a proprietary text file format rather than in an OLAP engine. In this paper, the authors evaluate the feasibility of applying general OLAP techniques to the audience analysis domain. The purpose is to determine whether general, non-proprietary solutions can achieve the same degree of freedom in audience analysis as the referred tools. In particular, we discuss a dimensional model that addresses the singularity of audience analysis performance indicators, almost all of which are quota-sample based and non-additive. Some of the values needed to compute these indicators, mainly those related to representativeness, are only known at runtime, after the users have finished specifying the analysis restrictions.
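A minimal sketch of why such indicators cannot simply be precomputed: assume a hypothetical panel in which each panelist carries a daily weight and a watched/not-watched flag for some programme, and take the rating to be the weighted audience divided by the weighted universe of the target. The universe (the denominator) only exists once the user has chosen the restriction. The `Panelist` type, the `rating` function and the sample values are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Panelist:
    age: int
    weight: float   # daily weight produced by the weighting step
    watched: bool   # watched the programme under analysis


def rating(panel: list[Panelist], in_target: Callable[[Panelist], bool]) -> float:
    """Weighted rating of a programme for a target chosen at query time."""
    members = [p for p in panel if in_target(p)]
    # The universe depends on the user's restriction, so it can only
    # be computed once that restriction is known.
    universe = sum(p.weight for p in members)
    if universe == 0:
        return 0.0
    audience = sum(p.weight for p in members if p.watched)
    return audience / universe


panel = [
    Panelist(age=25, weight=1.5, watched=True),
    Panelist(age=40, weight=0.5, watched=False),
    Panelist(age=30, weight=1.0, watched=False),
    Panelist(age=60, weight=2.0, watched=True),
]
print(f"{rating(panel, lambda p: True):.2f}")         # 0.70
print(f"{rating(panel, lambda p: p.age < 35):.2f}")   # 0.60
print(f"{rating(panel, lambda p: p.age >= 35):.2f}")  # 0.80
```

The ratings of the two age groups (0.60 and 0.80) do not add up to the overall rating (0.70); the overall value can only be recovered from the underlying weighted sums.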

We could not find any related work on dimensional models for quota-sample-based facts. The main data warehouse and OLAP literature [6, 9] does not address this kind of fact. Moreover, because audience analysis is a relatively closed world, it is not an easy domain in which to build a state of the art [14]. The few publicly available articles relate to data mining and audience pattern analysis, not to the application of data warehouse and OLAP techniques.

This paper is organised as follows: section 2 describes the data, the requirements and some performance indicators used in audience analysis; section 3 presents the dimensional model, using Kimball's approach, for a generic TV content analysis datamart; finally, in section 4 the authors draw some conclusions from their work.

2 PROBLEM DOMAIN

2.1 The Data

The meter system seldom produces data for the entire viewer panel, since several practical problems may occur, ranging from meter misuse to data communication problems, which endanger the desired panel representativeness. To adjust the panel representativeness, a weighting procedure is used that attempts to correct undesirable non-representative tendencies in the data. This is the Rim Weighting algorithm [4] (also known as Iterative Proportional Fitting [1]), which assigns a daily weight to each viewer in an attempt to recover the representativeness of the panel sample.
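Rim weighting is commonly described as raking: starting from uniform weights, the weights are rescaled one socio-demographic variable at a time so that the weighted totals match known population targets, and the cycle repeats until every marginal agrees. The sketch below is a generic textbook version of iterative proportional fitting, not the implementation used by any meter operator. The variable names, categories and targets are hypothetical, and the code assumes every category present in the panel has a target and that a positive solution exists.

```python
from collections import defaultdict


def weighted_totals(respondents, weights, var):
    """Weighted total per category of one variable."""
    totals = defaultdict(float)
    for person, w in zip(respondents, weights):
        totals[person[var]] += w
    return dict(totals)


def rim_weight(respondents, targets, max_iter=100, tol=1e-9):
    """Rim weighting (raking / iterative proportional fitting).

    respondents: list of dicts, variable name -> category of that panelist.
    targets: dict, variable name -> {category: target weighted total}.
    Returns one weight per respondent.
    """
    weights = [1.0] * len(respondents)
    for _ in range(max_iter):
        largest_adjustment = 0.0
        for var, category_targets in targets.items():
            # Current weighted totals, computed before rescaling this variable.
            totals = weighted_totals(respondents, weights, var)
            # Rescale so that this variable's weighted totals hit the targets.
            for i, person in enumerate(respondents):
                factor = category_targets[person[var]] / totals[person[var]]
                weights[i] *= factor
                largest_adjustment = max(largest_adjustment, abs(factor - 1.0))
        # Stop once a full cycle no longer changes the weights noticeably.
        if largest_adjustment < tol:
            break
    return weights


# Hypothetical five-person panel and population targets (in thousands).
panel = [
    {"sex": "M", "age": "15-34"},
    {"sex": "M", "age": "35+"},
    {"sex": "F", "age": "15-34"},
    {"sex": "F", "age": "35+"},
    {"sex": "F", "age": "15-34"},
]
targets = {"sex": {"M": 50.0, "F": 50.0}, "age": {"15-34": 50.0, "35+": 50.0}}

weights = rim_weight(panel, targets)
print(weighted_totals(panel, weights, "sex"))
print(weighted_totals(panel, weights, "age"))
```

Because the weights are recomputed daily from the panelists who actually reported, each viewer's weight is itself a daily attribute rather than a fixed property of the panel member.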

People meter data have three distinct components: (i) socio-demographic information; (ii) TV content data; (iii) visualisation data. The socio-demographic data, which include characteristics such as age, occupation and social class, provide important information for determining panel representativeness, but may also be used for other purposes (mostly for advertising). TV content is characterized by its type, its duration and the corresponding TV channel. Finally, meter data measure the viewing behavior of each viewer, represented by a sequence of watch/non-watch indicators for each second of the day, as illustrated by figure 1. These indicators are registered independently for each channel.

Figure 1: Representation of a daily viewing pattern for an individual.
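The per-second, per-channel viewing record described above can be sketched as one flag array of 86 400 entries per channel and per individual. The class, method and channel names below are hypothetical; a production system would more likely store compressed viewing intervals or bitmaps, but the flat array makes the shape of the data explicit.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86 400 watch/non-watch indicators per channel


class DailyViewing:
    """One individual's day: a per-second watch flag, kept separately per channel."""

    def __init__(self) -> None:
        self.channels: dict[str, bytearray] = {}

    def record(self, channel: str, start: int, end: int) -> None:
        """Mark seconds [start, end) as watched on the given channel."""
        flags = self.channels.setdefault(channel, bytearray(SECONDS_PER_DAY))
        lo, hi = max(start, 0), min(end, SECONDS_PER_DAY)
        if hi > lo:
            flags[lo:hi] = b"\x01" * (hi - lo)

    def seconds_watched(self, channel: str, start: int = 0,
                        end: int = SECONDS_PER_DAY) -> int:
        """Number of watched seconds on a channel within [start, end)."""
        flags = self.channels.get(channel)
        if flags is None:
            return 0
        return sum(flags[start:end])


day = DailyViewing()
day.record("CH1", 20 * 3600, 20 * 3600 + 30 * 60)       # 20:00 to 20:30
day.record("CH2", 20 * 3600 + 30 * 60, 21 * 3600)       # 20:30 to 21:00
print(day.seconds_watched("CH1"))                        # 1800
print(day.seconds_watched("CH1", 20 * 3600 + 15 * 60))  # 900
```

Keeping one array per channel mirrors the fact that the indicators are registered independently for each channel.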

2.2 Requirements

The requirements were gathered through the analysis of several audience reports and through on-site, day-by-day usage of the audience analysis program Telereport [13]. The reports are sufficiently broad to embrace a series of heterogeneous users, ranging from television to advertising. They cover the most important aspects of general TV content performance analysis. However, they do not address the singularities of advertising spots. Audience analysts are mainly interested in determining the audiences by channel
