Panel monitoring using Senstools 3.3


August 2005

Introduction 

‣ QDA panels 

‣ function, setup and maintenance 

‣ quality criteria 

‣ which requirements 

‣ how to monitor 

‣ monitoring panelist performance using Senstools

QDA or descriptive panels 

‣ trained assessors (6 to 12)

‣ product characterization by a fixed vocabulary

‣ individual assessments, specified presentation design, controlled environment

‣ often repeated measures (each product is rated two or three times by all assessors)

Structure of descriptive data

‣ 3-mode data structure of conventional profiling: N products are rated by K assessors on M attributes for p presentations (replicates)

‣ the data from one assessor k form an (N × M) data matrix X_k (often with replicates)
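The 3-mode structure can be sketched in a few lines of Python. This is a minimal illustration, not part of Senstools: the function name `build_three_mode` is hypothetical, the rows are a small excerpt of the data example in this deck (assessors 1 and 4, products 4 and 5, first three attributes), and replicates are simply averaged.

```python
from collections import defaultdict
from statistics import mean

# Excerpt of the data example: (replica, assessor, product, flower, rose, evergreen)
ATTRIBUTES = ["flower", "rose", "evergreen"]
ROWS = [
    (1, 1, 4, 22, 12, 24),
    (2, 1, 4, 25, 7, 7),
    (1, 1, 5, 2, 1, 1),
    (2, 1, 5, 1, 1, 2),
    (1, 4, 4, 18, 0, 11),
    (2, 4, 4, 12, 25, 4),
    (1, 4, 5, 16, 0, 0),
    (2, 4, 5, 18, 0, 0),
]


def build_three_mode(rows, attributes):
    """Return {assessor k: N x M matrix X_k}, averaged over replicates."""
    cells = defaultdict(list)
    for _replica, assessor, product, *ratings in rows:
        for attribute, value in zip(attributes, ratings):
            cells[assessor, product, attribute].append(value)
    assessors = sorted({key[0] for key in cells})
    products = sorted({key[1] for key in cells})
    # One row per product (N), one column per attribute (M)
    return {
        k: [[mean(cells[k, n, m]) for m in attributes] for n in products]
        for k in assessors
    }


X = build_three_mode(ROWS, ATTRIBUTES)
print(X[1])  # matrix of assessor 1: rows are products 4 and 5
```

Keeping the replicates in `cells` rather than averaging them straight away is what makes the within-product variance available later.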

The dataset 

‣ 10 assessors (‘sets’), 10 products (‘objects’), 2 presentations

‣ 11 attributes: flower, rose, evergreen, wood, burnt, alcohol, pungent, medicinal, sulphur, grape, bitter

‣ in total 2200 data points (10 × 10 × 2 × 11)

Dataset courtesy of Compusense: data from an expert wine panel. The original attribute names have been recoded and only part of the data is used.

Data example 

Replica  Assessor  Product  flower  rose  evergreen  wood  burnt  alcohol  pungent  medicinal  sulphur
      1         1        4      22    12         24     1      2       71       61          8        1
      2         1        4      25     7          7     1      1       50       16          8        1
      1         1        5       2     1          1     1      1       91       62         10        2
      2         1        5       1     1          2    10      1       47       20         10        1
      1         1        7      21     0          1     1      1       74       50         10        1
      2         1        7       1     6          8     0      2       62       38          7        0
      1         1        8      24     1         36     1      1       50       25          6        1
      2         1        8       1     1          2     9      1       61       43          7       14
      1         1       10      49    17          8     1      1       74       62          6        9
      2         1       10      21     2         11     2      1       74       62          8        1
      1         1       11       1     1          8     1      1       61       42          1        0
      2         1       11      18     1          7     1      1       51       20          6        1
      1         1       13      50    25         10     1      2       74       60         13        4
      2         1       13      43    15          7     2      0       99       86          2        1
      1         1       17      39    20          8     1      8       89       74          7        1
      2         1       17      47    19          6     1      1       72       31          7        1
      1         1       18       0     1          1     9     15       52       25          9       26
      2         1       18      21     1          7     9     13       71       33          8       31
      1         1       20      51    18          8     1      4       87       86         10        2
      2         1       20      15     1          6     1      0       57       11          7        1
      1         4        4      18     0         11     0      0       47       80          0       22
      2         4        4      12    25          4     0      0       47       82          0        8
      1         4        5      16     0          0     0      0       43       78          0       17
      2         4        5      18     0          0     0      5       77       81          0        0
      1         4        7      15     0          7     0      0       57       64          0        0
      2         4        7       9     0          0     0      4       65       78         34        0
      1         4        8       8     0          8     0      2       78       83          0        0
      2         4        8      10     0         14     0      0       75       80         14        0
      1         4       10      12    28          0     0      0       69       77          0        5
      2         4       10      28    25          6     0      0       78       89          0        0
      1         4       11      13    15         17     0      0       73       82          0        0
      2         4       11      17    24          7     0      0       57       56          0        0
      1         4       13      21    28          0     0      2       43       81          6       12
      2         4       13      27    31          4     0      0       57       86          0       20

Panel performance measures 

‣ Reliability and repeatability of assessors: do they give the same ratings to the same products? This requires repeated measures!

‣ Validity: do the assessors rate the products in a similar way (correlations between each individual assessor and the others: do they agree)?

‣ Discrimination: do the individual assessors and the panel as a whole rate different products as different?

Reliability 

‣ to establish whether an individual can discriminate between products (i.e. give the same ratings to identical products and different ratings to different products) we need a measure of variability

‣ when products are rated more than once, we can compute the variance of ratings of the same product (within-product variance, or MS within) and of ratings of different products (between-product variance, or MS between)

‣ an assessor is considered to be discriminating when the within-product variance is smaller than the between-product variance

‣ this is the ‘assessor statistic’: the ratio MS between / MS within

‣ without replicates there is no MS within
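As a rough illustration of the assessor statistic, the sketch below computes MS between / MS within as a one-way analysis of variance for a single assessor and a single attribute. It is not the Senstools implementation: the function name `assessor_f_ratio` is hypothetical, and the ratings are the sulphur scores of assessor 1 copied from the data example (two replicates per product).

```python
from statistics import mean


def assessor_f_ratio(ratings_by_product):
    """Return MS_between / MS_within for one assessor and one attribute.

    ratings_by_product maps a product code to its list of replicate ratings.
    """
    groups = list(ratings_by_product.values())
    n_products = len(groups)
    n_ratings = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)
    # Between-product sum of squares: product means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-product sum of squares: replicates around their product mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (n_products - 1)
    ms_within = ss_within / (n_ratings - n_products)
    if ms_within == 0:
        # Identical replicates everywhere: the ratio is undefined,
        # reported here as infinity (an assumption of this sketch)
        return float("inf")
    return ms_between / ms_within


# Sulphur ratings of assessor 1, keyed by product code
SULPHUR_ASSESSOR_1 = {
    4: [1, 1], 5: [2, 1], 7: [1, 0], 8: [1, 14], 10: [9, 1],
    11: [0, 1], 13: [4, 1], 17: [1, 1], 18: [26, 31], 20: [2, 1],
}

f_ratio = assessor_f_ratio(SULPHUR_ASSESSOR_1)
print(round(f_ratio, 2))
```

For this assessor the ratio is well above 1, mainly because product 18 is rated much higher on sulphur than the other products in both replicates.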

Assessor statistics in Senstools 

Assessor statistics for sulphur: ratio of between- to within-product variance for each assessor (Mssq object / Mssq error), based on repeated measures

‣ ratio > 1: more variance for different products than for the same products

‣ ratio = 1: same variance for different products as for the same products

Visualization assessor statistics in Senstools 

view by attribute (for each assessor)


Visualization assessor statistics in Senstools 

view by subject (for each attribute) 

11 attributes


Validity 

‣ for each subject: all subjects must show comparable rating behavior (they have been trained for that), so the ratings of the subjects should at least correlate positively

‣ furthermore, the ‘between-subject’ variance should be small (MS panelist / MS error < 1)

‣ for the panel as a whole: panelists should not disagree too much; in other words, there should be little or no interaction between subject and product


Correlations between individual and panel 

For each subject, the object ratings for each attribute are standardized and the correlation coefficients are computed between the subject and the rest of the panel.

In this example there are 11 attributes and 10 products, resulting in 110 z-scores for each subject.

[Figure: Agreement Between Assessors (Correlations). Vertical axis: number of subjects (1 to 4); horizontal axis: correlation coefficient (-1.0 to 1.1), with negative correlations on the left and positive correlations on the right. The plot shows the distribution of the correlations of subject X with the panel without subject X (a total of 10 coefficients when there are 10 subjects).]
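The leave-one-out correlation described above might be sketched as follows. The exact computation in Senstools is not documented here, so this is an assumption: each subject's ratings are z-scored per attribute across products, the z-scores are concatenated into one profile, and that profile is correlated with the mean profile of the remaining subjects. The panel data, subject labels and function names are hypothetical.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize one attribute across products for one subject."""
    m, s = mean(values), pstdev(values)
    return [0.0 if s == 0 else (v - m) / s for v in values]


def pearson(x, y):
    """Plain Pearson correlation between two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def profile(ratings):
    """Concatenate the z-scores of all attributes (fixed attribute order)."""
    scores = []
    for attribute in sorted(ratings):
        scores.extend(zscores(ratings[attribute]))
    return scores


def correlation_with_rest(panel):
    """Correlate each subject with the mean profile of all other subjects."""
    profiles = {subject: profile(r) for subject, r in panel.items()}
    result = {}
    for subject, own in profiles.items():
        others = [p for other, p in profiles.items() if other != subject]
        rest = [mean(column) for column in zip(*others)]
        result[subject] = pearson(own, rest)
    return result


# Hypothetical panel: mean rating per product (4 products, 2 attributes).
# Subject C uses both scales in the opposite direction to the others.
PANEL = {
    "A": {"sweet": [10, 20, 30, 40], "sour": [40, 30, 20, 10]},
    "B": {"sweet": [12, 18, 33, 41], "sour": [35, 32, 18, 12]},
    "C": {"sweet": [40, 30, 20, 10], "sour": [10, 20, 30, 40]},
    "D": {"sweet": [8, 22, 28, 44], "sour": [42, 27, 22, 8]},
}

r = correlation_with_rest(PANEL)
for subject in sorted(r):
    print(subject, round(r[subject], 2))
```

In this toy panel subject C correlates negatively with the rest, which is the pattern the correlation plot is meant to reveal.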


Between subject variance 

‣ the ‘between-subject’ variance should be small: all subjects should give similar ratings to a product for a given attribute (MS panelist / MS error small)

‣ for each object and attribute, the variability between assessors is computed (the object statistics in Senstools) and represented graphically: see the example for attribute bitter

[Figure: object statistics for attribute bitter, for objects 4, 5, 7, 8, 10, 11, 13, 17, 18 and 20; vertical axis ticks at 0.1, 1, 10 and 100. Annotations: no disagreement for object 5; much disagreement for object 11.]


Variance between subjects

‣ example of the individual ratings on attribute bitter for object 5 (F-ratio 1.1) and object 11 (F-ratio 14.9)

[Figure: individual ratings on a 0 to 100 scale, one series per judge (legend: Judge 1, 4, 9, 16, 25, 36, 49, 64, 81 and 100).]

Visualization of variance between subjects (repeated measures)

view by attribute (for each object)

Visualization of variance between subjects (repeated measures)

view by object (for each attribute)


Interaction effects 

‣ for the panel as a whole: panelists should not disagree too much; in other words, there should be little or no interaction between subject and product

‣ Senstools shows the F-ratios of the assessor and object variance for the MS error and MS interaction terms
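To make the interaction term concrete, the sketch below runs a balanced two-way analysis of variance (assessor × product, with replicates) for one attribute and returns F-ratios against the error term. It is a textbook decomposition, not the Senstools code: which denominators Senstools uses for each plotted ratio is not stated here, and the function name and the two toy data sets are hypothetical.

```python
from statistics import mean


def safe_ratio(numerator, denominator):
    """Divide, reporting infinity when the denominator is zero."""
    return float("inf") if denominator == 0 else numerator / denominator


def two_way_f_ratios(data):
    """data[assessor][product] = list of replicate ratings (balanced design)."""
    assessors = sorted(data)
    products = sorted(data[assessors[0]])
    a, p = len(assessors), len(products)
    r = len(data[assessors[0]][products[0]])
    cell = {(s, o): mean(data[s][o]) for s in assessors for o in products}
    grand = mean(cell.values())
    ass_mean = {s: mean(cell[s, o] for o in products) for s in assessors}
    obj_mean = {o: mean(cell[s, o] for s in assessors) for o in products}
    # Sums of squares for the two main effects, the interaction and the error
    ss_ass = p * r * sum((ass_mean[s] - grand) ** 2 for s in assessors)
    ss_obj = a * r * sum((obj_mean[o] - grand) ** 2 for o in products)
    ss_int = r * sum(
        (cell[s, o] - ass_mean[s] - obj_mean[o] + grand) ** 2
        for s in assessors for o in products
    )
    ss_err = sum(
        (x - cell[s, o]) ** 2
        for s in assessors for o in products for x in data[s][o]
    )
    ms_ass = ss_ass / (a - 1)
    ms_obj = ss_obj / (p - 1)
    ms_int = ss_int / ((a - 1) * (p - 1))
    ms_err = ss_err / (a * p * (r - 1))
    return {
        "F_ass": safe_ratio(ms_ass, ms_err),
        "F_obj": safe_ratio(ms_obj, ms_err),
        "F_int": safe_ratio(ms_int, ms_err),
    }


# Two assessors who differ in level but rank the products the same way
ADDITIVE = {"A": {1: [10, 12], 2: [20, 22]}, "B": {1: [15, 17], 2: [25, 27]}}
# Two assessors who rank the products in opposite order
CROSSOVER = {"A": {1: [10, 12], 2: [20, 22]}, "B": {1: [20, 22], 2: [10, 12]}}

additive = two_way_f_ratios(ADDITIVE)
crossover = two_way_f_ratios(CROSSOVER)
print(additive)
print(crossover)
```

In the additive case the interaction F-ratio is zero while the product effect is large; in the crossover case the product effect vanishes in the panel mean and all the variance moves into the interaction, which is exactly the situation a panel leader wants to detect.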


Interaction effects F-ratio objects 

[Figure: RM ANOVA Object Interaction F Ratios by Attribute. F-ratio axis with ticks at 0.1, 1, 10 and 100; series F obj and F int obj; attributes on the horizontal axis: flowers, rose, evergreen, wood, burnt, alcohol, pungent, medicinal, sulphur, grape, bitter. Annotations: significant interactions between assessors and products; significant difference between objects; no significant difference between objects.]


Interaction effects F-ratio assessors 

[Figure: RM ANOVA Set Interaction F Ratios by Attribute. F-ratio axis with ticks at 0.1, 1, 10 and 100; series F ass and F int ass; attributes on the horizontal axis: flowers, rose, evergreen, wood, burnt, alcohol, pungent, medicinal, sulphur, grape, bitter. Annotations: significant interactions between assessors and products; significant difference between assessors; no significant difference between assessors.]


Interaction effects burnt, sulphur & wood 

[Figure: attribute ratings for burnt, sulphur and wood by subject (Subj 1 to Subj 10), three panels on a 0 to 100 scale, for objects 4, 5, 7, 8, 10, 11, 13, 17, 18 and 20. Panels: burnt, no significant interaction; sulphur, significant interaction; wood, significant interaction.]


Table of means 

           obj 4  obj 5  obj 7  obj 8  obj 10  obj 11  obj 13  obj 17  obj 18  obj 20
flowers       17     15     19     18      26      16      28      22      13      24
rose          13     13     18     13      18      15      23      16      10      17
evergreen     17     11     10     16      12      13      13      15      10      15
wood           8      3      6      4       4       3       4       3      12       2
burnt          9      3      4      7       3       3       5       3       9       4
alcohol       51     54     53     55      62      51      61      62      54      56
pungent       55     55     54     59      65      53      67      62      59      60
medicinal      4      3      9      5       5       2       6       4       6       4
sulphur       20      3      5      4       3       2       7       2      25       6
grape         37     39     41     39      41      41      39      39      30      41

Colour coding in the original slide: red = significant at 1%, blue = significant at 5%


To summarize 

‣ without individual reliability there is no valid panel and no valid measuring instrument

‣ full diagnosis requires repeated measures

‣ the performance of assessors can be diagnosed very accurately, given the right data and the right analysis tools

Questions about panel monitoring or Senstools v3.3?

Contact Pieter Punter at pieter@opp.nl 

see also: Example-GPA Using Senstools v3.3
