4.3 Hedonic analysis (M9b)

4.3.1 Method description

The hedonic evaluation test involves asking consumers to:

  1. rate their preference from 1 (I dislike extremely) to 9 (I like very much) for three to four sensory attributes specific to the test product. Overall preference is asked at the beginning of the questionnaire so as not to influence the consumer and to stay closer to typical consumption conditions.
  2. give additional information such as sex, age and organic consumption frequency in order to characterise the population sample.
  3. give, after evaluating each product, additional sensory descriptors that describe it.

4.3.1.1 Determine differences of appreciation between samples

Regarding samples, the objectives of the hedonic tests are to

  1. determine differences of appreciation for a given attribute between the samples, based on the scores (note) given by the judges, and
  2. determine the appreciation of samples based on the descriptors given by the judges.

Differences between samples based on the scores given by the judges.

The data distribution determines the type of tests that should be used to analyze the data set.

  • If the distribution is Normal, an analysis of variance (ANOVA) with a judge effect and a germplasm effect can be performed:

\(Y_{ij} = \alpha_i + \beta_j + \varepsilon_{ij}; \quad \varepsilon_{ij} \sim \mathcal{N} (0,\sigma^2)\)

with \(Y_{ij}\) the score from 1 to 9 given by assessor \(i\) to germplasm \(j\), \(\alpha_i\) the effect of assessor (judge) \(i\), \(\beta_j\) the effect of the germplasm \(j\) tasted, and \(\varepsilon_{ij}\) the residual.

Multiple comparisons of germplasm means are then performed, in order to obtain a final ranking based on consumers’ preferences.

  • If the data set does not follow a Normal distribution, a Friedman test on ranks should be used to determine whether the varieties are perceived differently by the assessors.
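Both routes can be sketched in base R on a small hypothetical data set (six judges each scoring three germplasms; the numbers are invented for illustration and are not data_hedonic). The additive model is fitted with aov(), which matches the stats::lm(note ~ juges + germplasm) call printed by model_hedonic() later in this section. Tukey's HSD is only a base-R stand-in for the PPBstats mean-comparison method (section 3.1.2.1.3), and friedman.test() requires every judge to have scored every germplasm.

```r
# Hypothetical toy data (not data_hedonic): 6 judges x 3 germplasms,
# scores on the 1-9 hedonic scale
toy <- data.frame(
  juges     = factor(rep(1:6, each = 3)),
  germplasm = factor(rep(c("germ-1", "germ-2", "germ-3"), times = 6)),
  note      = c(7, 5, 8,
                6, 4, 7,
                8, 6, 9,
                5, 4, 6,
                7, 6, 8,
                6, 3, 7)
)

# Normal case: additive ANOVA Y_ij = alpha_i (judge) + beta_j (germplasm) + e_ij
fit <- aov(note ~ juges + germplasm, data = toy)
summary(fit)
shapiro.test(residuals(fit))   # one way to check Normality of the residuals

# Multiple comparison of germplasm means (Tukey HSD as a base-R stand-in)
hsd <- TukeyHSD(fit, which = "germplasm")
hsd$germplasm

# Final ranking of germplasms by mean score
germ_means <- sort(tapply(toy$note, toy$germplasm, mean), decreasing = TRUE)
germ_means

# Non-Normal case: Friedman test on ranks, with judges as blocks (rows)
# and germplasms as groups (columns); needs a complete judge x germplasm table
scores <- tapply(toy$note, list(toy$juges, toy$germplasm), mean)
ft <- friedman.test(scores)
ft
```

In this toy table every judge ranks germ-3 first and germ-2 last, so the Friedman statistic reaches its maximum for 6 judges and 3 germplasms, 6 x (3 - 1) = 12 on 2 degrees of freedom.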

Appreciation of samples based on the descriptors given by the judges.

To do so, a Correspondence Analysis (CA) is performed on the table of descriptors cited for each sample.
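The core of a CA can be sketched in base R as a singular value decomposition of the standardized residuals of a sample x descriptor count table. The table below is hypothetical, and this is not the FactoMineR::CA implementation used by PPBstats, although the squared singular values should correspond to the eigenvalues FactoMineR reports in $eig.

```r
# Hypothetical contingency table: rows = samples, columns = descriptors,
# cells = number of times a descriptor was cited for a sample
N <- matrix(
  c(10, 2, 1,
     3, 8, 2,
     1, 2, 9,
     4, 4, 4),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("germ-", 1:4), c("sucree", "acide", "juteuse"))
)

P  <- N / sum(N)          # correspondence matrix
r  <- rowSums(P)          # row masses (samples)
cc <- colSums(P)          # column masses (descriptors)

# Standardized residuals (P - r c^T) / sqrt(r c^T), then SVD
S   <- (P - outer(r, cc)) / sqrt(outer(r, cc))
dec <- svd(S)

eig <- dec$d^2                  # principal inertias (eigenvalues)
eig <- eig[eig > 1e-12]         # drop the trivial zero dimension
pct <- 100 * eig / sum(eig)     # % of inertia carried by each dimension

# Principal coordinates of samples (rows) and descriptors (columns)
row_coord <- (dec$u / sqrt(r)) %*% diag(dec$d)
col_coord <- (dec$v / sqrt(cc)) %*% diag(dec$d)

# Total inertia equals the chi-square statistic divided by the grand total
chi2 <- suppressWarnings(chisq.test(N, correct = FALSE))$statistic
all.equal(sum(eig), unname(chi2) / sum(N))
```

Samples and descriptors with nearby coordinates on the first dimensions are associated, which is what the CA biplot displays.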

4.3.1.2 Judge profiles

Another objective of the analysis is to determine judge profiles based on the scores given and on additional information such as sex, age, organic consumption frequency, etc.

This is done with a Hierarchical Clustering on Principal Components (HCPC), which identifies groups of judges with similar preferences after a Principal Component Analysis (PCA).
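The idea behind HCPC can be approximated in base R: impute missing scores, run a PCA on the judge x germplasm table, then apply Ward clustering to the retained principal components. This is a minimal sketch on a hypothetical table, using prcomp() and hclust() rather than FactoMineR::PCA and FactoMineR::HCPC; HCPC can also choose the number of clusters automatically and consolidate the partition with k-means, which this sketch does not do (k = 3 is an arbitrary choice here).

```r
# Hypothetical judge x germplasm score table; NA = germplasm not tasted
scores <- matrix(
  c(7,  5,  7,  8,  6,
    7,  8, NA,  6, NA,
    6,  4, NA, NA,  3,
    6, NA,  5,  7,  6,
    6,  5,  7,  6,  3,
   NA,  5,  7,  6,  2,
    8,  7,  3,  4,  8,
    9,  8,  2,  3,  7),
  nrow = 8, byrow = TRUE,
  dimnames = list(paste0("judge-", 1:8), paste0("germ-", 1:5))
)

# 1. Replace each missing score by the mean of its column (germplasm)
imp <- apply(scores, 2, function(x) { x[is.na(x)] <- mean(x, na.rm = TRUE); x })

# 2. PCA on the centred and scaled table
pca <- prcomp(imp, center = TRUE, scale. = TRUE)
ncp <- 3                                  # number of components kept
coords <- pca$x[, 1:ncp]

# 3. Ward hierarchical clustering of judges on the principal components
hc <- hclust(dist(coords), method = "ward.D2")

# 4. Cut the tree into groups of judges
clust <- cutree(hc, k = 3)
clust
```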

4.3.2 Steps with PPBstats

For hedonic analysis, you can follow these steps (Figure 4.2):

  • Format the data with format_data_PPBstats()
  • Describe the data with plot()
  • Run the model with model_hedonic()
  • Check the model outputs with check_model() and its graphs to decide whether the analysis can continue
  • Get mean comparisons on the scores given by the judges for each factor with mean_comparisons() and visualise them with plot()
  • Format the data for the CA and HCPC biplots with biplot_data() and visualise them with plot()

4.3.3 Format the data

library(PPBstats)
data(data_hedonic)
head(data_hedonic)
##   sample germplasm location juges note           descriptors Age Sexe
## 1    832    germ-1    loc-1     1    7        douce; juteuse  21    F
## 2    412    germ-1    loc-1     1    8       juteuse; sucree  21    F
## 3    465    germ-2    loc-1     1    5                 acide  21    F
## 4    108    germ-3    loc-1     1    7                sucree  21    F
## 5    967    germ-4    loc-1     1    8                sucree  21    F
## 6    619    germ-5    loc-1     1    6 peau epaisse; juteuse  21    F
##   Bio.Non.Bio Circuit Departement
## 1           1   1;2;3          30
## 2           1   1;2;3          30
## 3           1   1;2;3          30
## 4           1   1;2;3          30
## 5           1   1;2;3          30
## 6           1   1;2;3          30

The data frame must have the following columns: sample, germplasm, location, juges, note and descriptors. The descriptors must be separated by “;”. Any other column can be added as a supplementary variable.

Then, you must format your data with format_data_PPBstats() and type = "data_organo_hedonic". The argument threshold can be set in order to keep only descriptors that have been cited several times: descriptors cited threshold times or fewer are removed. For example, with threshold = 2, only descriptors cited more than twice are kept.

data_hedonic = format_data_PPBstats(data_hedonic, type = "data_organo_hedonic", threshold = 2)
## Warning in format_data_PPBstats.data_organo_hedonic(data, threshold): The following samples are not kept because they have been already tasted (i.e. germplasm and location combinaison already exist):
## sample 412 by juge 1 on row 2
## Warning in format_data_PPBstats.data_organo_hedonic(data, threshold): The
## following row in data have been remove because there are no descriptors :9
## The following descriptors have been remove because there were less or equal to 2 occurences : aciduee,  acidulee, classique,  classique, classique , cremeuse, croquante, epicee,  equilibree,  farineuse,  ferme,  fondante, legere, molle,  parfumee, salee, sucree
## data has been formated for PPBstats functions.
names(data_hedonic)
## [1] "data"        "var_sup"     "descriptors"
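The threshold rule can be illustrated in base R on a hypothetical descriptors column: split each entry on ";", count the occurrences of each descriptor and keep those cited more than threshold times, as the warning above reports. This sketch also trims surrounding whitespace before counting; in the outputs of this section, entries such as "douce" and " douce" are counted as different descriptors, and trimming would merge them. It is not the code of format_data_PPBstats().

```r
# Hypothetical descriptors column, ";"-separated as required
descriptors <- c("douce; juteuse", "juteuse; sucree", "acide",
                 "sucree", "sucree", "peau epaisse; juteuse", "")

# Split on ";" and trim spaces so " juteuse" and "juteuse" are the same
split_desc <- lapply(strsplit(descriptors, ";"), trimws)
counts <- table(unlist(split_desc))
counts <- counts[names(counts) != ""]   # ignore rows without descriptors

threshold <- 2
# Keep descriptors cited strictly more than `threshold` times
kept <- names(counts)[counts > threshold]
kept   # "juteuse" "sucree"
```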

data_hedonic is a list of three elements:

  • data: the data formatted to run the ANOVA and the multivariate analyses, regarding
    • samples
head(data_hedonic$data$data_sample)
##         sample germplasm location juges note Age Sexe Bio.Non.Bio Circuit
## 1 loc-1:germ-1    germ-1    loc-1     1    7  21    F           1   1;2;3
## 2 loc-1:germ-2    germ-2    loc-1     1    5  21    F           1   1;2;3
## 3 loc-1:germ-3    germ-3    loc-1     1    7  21    F           1   1;2;3
## 4 loc-1:germ-4    germ-4    loc-1     1    8  21    F           1   1;2;3
## 5 loc-1:germ-5    germ-5    loc-1     1    6  21    F           1   1;2;3
## 6 loc-1:germ-6    germ-6    loc-1     2    7  30    F           1        
##   Departement      acide acidulee charnue      douce  douce equilibree
## 1          30 0.00000000        0       0 0.03703704      0          0
## 2          30 0.02941176        0       0 0.00000000      0          0
## 3          30 0.00000000        0       0 0.00000000      0          0
## 4          30 0.00000000        0       0 0.00000000      0          0
## 5          30 0.00000000        0       0 0.00000000      0          0
## 6          11 0.00000000        0       0 0.00000000      0          0
##   farineuse ferme fraiche fruitee goutue  goutue juteuse   juteuse neutre
## 1         0     0       0       0      0       0       0 0.3333333      0
## 2         0     0       0       0      0       0       0 0.0000000      0
## 3         0     0       0       0      0       0       0 0.0000000      0
## 4         0     0       0       0      0       0       0 0.0000000      0
## 5         0     0       0       0      0       0       0 0.3333333      0
## 6         0     0       0       0      0       0       0 0.0000000      0
##   parfumee  peau epaisse peau epaisse     sucree  sucree tendre
## 1        0             0   0.00000000 0.00000000       0      0
## 2        0             0   0.00000000 0.00000000       0      0
## 3        0             0   0.00000000 0.02272727       0      0
## 4        0             0   0.00000000 0.02272727       0      0
## 5        0             0   0.02040816 0.00000000       0      0
## 6        0             0   0.00000000 0.02272727       0      0
    • judges
head(data_hedonic$data$data_juges)
##   juges loc-1:germ-1 loc-1:germ-2 loc-1:germ-3 loc-1:germ-4 loc-1:germ-5
## 1     1            7            5            7            8            6
## 2     2            7            8           NA            6           NA
## 3     4            6            4           NA           NA            3
## 4     5            6           NA            5            7            6
## 5     6            6            5            7            6            3
## 6     7           NA            5            7            6            2
##   loc-1:germ-6 germplasm location Age Sexe Bio.Non.Bio Circuit Departement
## 1           NA        NA       NA  NA   NA          NA      NA          NA
## 2            7         6        1   9    2           3       1           2
## 3            7         6        1   1    2           3       2          NA
## 4            4         6        1  NA    1           1       1          NA
## 5            7         6        1  21    3           3       2          NA
## 6            6         6        1  13    2           3       6           2
  • var_sup: the supplementary variables used in the multivariate analyses
data_hedonic$var_sup
## [1] "germplasm"   "location"    "Age"         "Sexe"        "Bio.Non.Bio"
## [6] "Circuit"     "Departement"
  • descriptors: the vector of descriptors kept after applying the threshold when formatting the data.
data_hedonic$descriptors
##  [1] "acide"         "acidulee"      "charnue"       "douce"        
##  [5] " douce"        "equilibree"    "farineuse"     "ferme"        
##  [9] "fraiche"       "fruitee"       "goutue"        " goutue"      
## [13] "juteuse"       " juteuse"      "neutre"        "parfumee"     
## [17] " peau epaisse" "peau epaisse"  "sucree"        " sucree"      
## [21] "tendre"

4.3.4 Describe the data

First, you can describe the data regarding the scores given:

p_note = plot(data_hedonic, plot_type = "boxplot", x_axis = "germplasm",
               in_col = "location", vec_variables = "note"
               )
## Warning in reshape_data_split_x_axis_in_col(d, variable, labels_on,
## x_axis, : 6 rows have been deleted for note because of only NA on the row
## for these variables.
p_note
## $note
## $note$`germplasm-1|location-1`

## 
## $note$`germplasm-2|location-1`

You can also describe the descriptors, for example for each germplasm:

descriptors = data_hedonic$descriptors

p_des = plot(data_hedonic, plot_type = "radar", in_col = "germplasm", 
                         vec_variables = descriptors
                         )
p_des

4.3.5 Run the model

To run the model on the dataset, use the function model_hedonic().

out_hedonic = model_hedonic(data_hedonic)
## Warning in model_hedonic(data_hedonic): Rows in column "note" has been
## deleted because of NA.
## Warning in model_hedonic(data_hedonic): Some rows have been removed because
## there are no descriptors.
## Warning in PCA(data_juges_hcpc, quanti.sup = id_quanti.sup, quali.sup =
## id_quali.sup, : Missing values are imputed by the mean of the variable: you
## should use the imputePCA function of the missMDA package

out_hedonic is a list with three elements:

  • model: the result of the ANOVA run on note
out_hedonic$model
## 
## Call:
## stats::lm(formula = note ~ juges + germplasm, data = data_sample)
## 
## Coefficients:
##     (Intercept)           juges2           juges4           juges5  
##         6.31772          0.34588         -1.35400         -1.04203  
##          juges6           juges7           juges8           juges9  
##        -0.93390         -1.45714         -2.35549          0.39943  
##         juges10          juges11          juges12          juges13  
##         0.75797          0.56610         -3.26724          0.39943  
##         juges14          juges15          juges16          juges17  
##        -1.19356         -1.35250          1.60000          0.39943  
##         juges19          juges20          juges23          juges24  
##         1.20006         -1.20183         -4.60341          1.14451  
##         juges25          juges26          juges27          juges28  
##         0.90629         -1.35400         -0.42967         -1.70036  
##         juges31          juges32          juges33          juges34  
##        -0.43390         -0.10400          2.27689          1.14451  
##         juges35          juges36          juges37          juges38  
##         0.55797         -1.26724         -0.10057         -0.42057  
##         juges39          juges41          juges42          juges43  
##        -1.60341          0.52540          0.64750          1.46789  
##         juges44          juges45          juges46          juges47  
##        -0.66903         -1.19878         -1.20183         -0.46056  
##         juges49          juges50          juges51          juges52  
##        -0.60057         -0.52690          0.93973         -1.70183  
##         juges53          juges54          juges56          juges57  
##         0.89520         -1.43390          2.11465         -1.58120  
##         juges58          juges59          juges62          juges63  
##        -2.39327         -1.76724         -0.97311          0.56610  
##         juges64          juges65          juges66          juges68  
##         0.14451          1.11465         -1.74361          0.52689  
##         juges69          juges70          juges71          juges72  
##         1.14451         -0.43390         -1.09371          0.65909  
##         juges73          juges74  germplasmgerm-2  germplasmgerm-3  
##         0.56610         -1.13931          0.07555          0.56763  
## germplasmgerm-4  germplasmgerm-5  germplasmgerm-6  
##         0.98436         -0.21612          0.28569
anova(out_hedonic$model)
## Analysis of Variance Table
## 
## Response: note
##            Df Sum Sq Mean Sq F value    Pr(>F)    
## juges      61 325.98  5.3439  2.3286 1.071e-05 ***
## germplasm   5  32.83  6.5657  2.8610   0.01657 *  
## Residuals 169 387.84  2.2949                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  • CA: the result of the correspondence analysis run with FactoMineR::CA on the data set with the supplementary variables
out_hedonic$CA
## **Results of the Correspondence Analysis (CA)**
## The row variable has  226  categories; the column variable has 21 categories
## The chi square of independence between the two variables is equal to 317.7171 (p-value =  1 ).
## *The results are available in the following objects:
## 
##    name                description                                
## 1  "$eig"              "eigenvalues"                              
## 2  "$col"              "results for the columns"                  
## 3  "$col$coord"        "coord. for the columns"                   
## 4  "$col$cos2"         "cos2 for the columns"                     
## 5  "$col$contrib"      "contributions of the columns"             
## 6  "$row"              "results for the rows"                     
## 7  "$row$coord"        "coord. for the rows"                      
## 8  "$row$cos2"         "cos2 for the rows"                        
## 9  "$row$contrib"      "contributions of the rows"                
## 10 "$quanti.sup$coord" "coord. for supplementary continuous var." 
## 11 "$quanti.sup$cos2"  "cos2 for supplementary continuous var."   
## 12 "$quali.sup$coord"  "coord. for supplementary categorical var."
## 13 "$quali.sup$cos2"   "cos2 for supplementary categorical var."  
## 14 "$call"             "summary called parameters"                
## 15 "$call$marge.col"   "weights of the columns"                   
## 16 "$call$marge.row"   "weights of the rows"
  • HCPC: the result of the principal component analysis run with FactoMineR::PCA on the data set with the supplementary variables, followed by FactoMineR::HCPC. It is a list of three elements: res.pca, res.hcpc and clust.
out_hedonic$HCPC$res.pca
## **Results for the Principal Component Analysis (PCA)**
## The analysis was performed on 63 individuals, described by 13 variables
## *The results are available in the following objects:
## 
##    name               
## 1  "$eig"             
## 2  "$var"             
## 3  "$var$coord"       
## 4  "$var$cor"         
## 5  "$var$cos2"        
## 6  "$var$contrib"     
## 7  "$ind"             
## 8  "$ind$coord"       
## 9  "$ind$cos2"        
## 10 "$ind$contrib"     
## 11 "$quali.sup"       
## 12 "$quali.sup$coord" 
## 13 "$quali.sup$v.test"
## 14 "$call"            
## 15 "$call$centre"     
## 16 "$call$ecart.type" 
## 17 "$call$row.w"      
## 18 "$call$col.w"      
##    description                                          
## 1  "eigenvalues"                                        
## 2  "results for the variables"                          
## 3  "coord. for the variables"                           
## 4  "correlations variables - dimensions"                
## 5  "cos2 for the variables"                             
## 6  "contributions of the variables"                     
## 7  "results for the individuals"                        
## 8  "coord. for the individuals"                         
## 9  "cos2 for the individuals"                           
## 10 "contributions of the individuals"                   
## 11 "results for the supplementary categorical variables"
## 12 "coord. for the supplementary categories"            
## 13 "v-test of the supplementary categories"             
## 14 "summary statistics"                                 
## 15 "mean of the variables"                              
## 16 "standard error of the variables"                    
## 17 "weights for the individuals"                        
## 18 "weights for the variables"
out_hedonic$HCPC$res.hcpc
## **Results for the Hierarchical Clustering on Principal Components**
##    name                   
## 1  "$data.clust"          
## 2  "$desc.var"            
## 3  "$desc.var$quanti.var" 
## 4  "$desc.var$quanti"     
## 5  "$desc.axes"           
## 6  "$desc.axes$quanti.var"
## 7  "$desc.axes$quanti"    
## 8  "$desc.ind"            
## 9  "$desc.ind$para"       
## 10 "$desc.ind$dist"       
## 11 "$call"                
## 12 "$call$t"              
##    description                                             
## 1  "dataset with the cluster of the individuals"           
## 2  "description of the clusters by the variables"          
## 3  "description of the cluster var. by the continuous var."
## 4  "description of the clusters by the continuous var."    
## 5  "description of the clusters by the dimensions"         
## 6  "description of the cluster var. by the axes"           
## 7  "description of the clusters by the axes"               
## 8  "description of the clusters by the individuals"        
## 9  "parangons of each clusters"                            
## 10 "specific individuals"                                  
## 11 "summary statistics"                                    
## 12 "description of the tree"
head(out_hedonic$HCPC$clust)
##   loc-1:germ-1 loc-1:germ-2 loc-1:germ-3 loc-1:germ-4 loc-1:germ-5
## 1        7.000         5.00     7.000000     8.000000     6.000000
## 2        7.000         8.00     6.769231     6.000000     5.692308
## 4        6.000         4.00     6.769231     6.871795     3.000000
## 5        6.000         6.05     5.000000     7.000000     6.000000
## 6        6.000         5.00     7.000000     6.000000     3.000000
## 7        6.125         5.00     7.000000     6.000000     2.000000
##   loc-1:germ-6 germplasm location Age Sexe Bio.Non.Bio Circuit Departement
## 1     6.128205        NA       NA  NA   NA          NA      NA          NA
## 2     7.000000         6        1   9    2           3       1           2
## 4     7.000000         6        1   1    2           3       2          NA
## 5     4.000000         6        1  NA    1           1       1          NA
## 6     7.000000         6        1  21    3           3       2          NA
## 7     6.000000         6        1  13    2           3       6           2
##       clust
## 1 cluster 6
## 2 cluster 5
## 4 cluster 4
## 5 cluster 3
## 6 cluster 4
## 7 cluster 4
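Two of the statistics printed above can be recomputed by hand, which helps to read them. The germplasm F value is the ratio of the germplasm mean square to the residual mean square, and the chi-square of the CA summary has (226 - 1) x (21 - 1) = 4500 degrees of freedom. The numbers below are copied from the outputs above; the sketch only uses base R distribution functions.

```r
# From the ANOVA table of out_hedonic$model
ss_germ <- 32.83;  df_germ <- 5
ss_res  <- 387.84; df_res  <- 169

ms_germ <- ss_germ / df_germ    # germplasm mean square
ms_res  <- ss_res / df_res      # residual mean square
f_germ  <- ms_germ / ms_res     # F statistic for germplasm
p_germ  <- pf(f_germ, df_germ, df_res, lower.tail = FALSE)

# From the CA summary: 226 row categories, 21 column categories
chi2  <- 317.7171
df_ca <- (226 - 1) * (21 - 1)   # degrees of freedom of the independence test
p_ca  <- pchisq(chi2, df_ca, lower.tail = FALSE)

c(F = f_germ, p = p_germ, df_ca = df_ca, p_ca = p_ca)
```

A chi-square statistic of 317.7 is far below its expected value (4500) under independence, hence the printed p-value of 1: the test detects no association between samples and descriptors, and the CA is read descriptively.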

4.3.6 Check and visualize model outputs

The tests to check the model are explained in section 3.1.2.1.2.

4.3.6.1 Check the model

out_check_hedonic = check_model(out_hedonic)

out_check_hedonic is a list with two elements:

  • hedonic: the same object as out_hedonic
  • data_ggplot: a list containing information for ggplot:
    • data_ggplot_residuals: a list containing:
      • data_ggplot_normality
      • data_ggplot_skewness_test
      • data_ggplot_kurtosis_test
      • data_ggplot_qqplot
    • data_ggplot_variability_repartition_pie
    • data_ggplot_var_intra
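The residual checks stored in data_ggplot_residuals can be sketched with base R: a Shapiro-Wilk normality test, moment-based skewness and excess kurtosis, and the coordinates of a Normal Q-Q plot. The residuals below are simulated for illustration (with the real model, residuals(out_hedonic$model) would be used), and the exact tests implemented by PPBstats are those described in section 3.1.2.1.2, which may differ from this sketch.

```r
# Simulated residuals for illustration; with PPBstats output one could use
# res <- residuals(out_hedonic$model)
set.seed(1)
res <- rnorm(200)

shapiro <- shapiro.test(res)            # Shapiro-Wilk normality test

# Moment-based sample skewness and excess kurtosis (both ~0 if Normal)
m  <- mean(res)
s2 <- mean((res - m)^2)
skew   <- mean((res - m)^3) / s2^1.5
exkurt <- mean((res - m)^4) / s2^2 - 3

# Theoretical vs sample quantiles for a Normal Q-Q plot
qq <- qqnorm(res, plot.it = FALSE)

c(shapiro_p = shapiro$p.value, skewness = skew, excess_kurtosis = exkurt)
```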

4.3.6.2 Visualize outputs

Once the computation is done, you can visualize the results with plot():

p_out_check_hedonic = plot(out_check_hedonic)

p_out_check_hedonic is a list with:

  • residuals of the ANOVA model
    • histogram: histogram of the distribution of the residuals
    p_out_check_hedonic$residuals$histogram
    ## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
    • qqplot: quantile-quantile plot of the residuals against a Normal distribution
    p_out_check_hedonic$residuals$qqplot

  • variability_repartition: pie chart showing the share of the sum of squares (SumSq) of each factor of the ANOVA model

p_out_check_hedonic$variability_repartition

  • variance_intra_germplasm: distribution of the residuals for each germplasm, which represents the assessor variation plus the intra-germplasm variance of the ANOVA model.
p_out_check_hedonic$variance_intra_germplasm

  • CA_composante_variance: variance captured by each dimension of the CA
p_out_check_hedonic$CA_composante_variance

  • HCPC_composante_variance: variance captured by each dimension of the PCA performed before the HCPC
p_out_check_hedonic$PCA_composante_variance
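The shares shown in the variability_repartition pie can be recomputed from the sums of squares of the ANOVA table printed for out_hedonic$model, assuming, as the description says, that the pie is based on those sums of squares.

```r
# Sums of squares copied from anova(out_hedonic$model)
ss <- c(juges = 325.98, germplasm = 32.83, Residuals = 387.84)

share <- ss / sum(ss)      # fraction of the total sum of squares
round(100 * share, 1)      # juges 43.7, germplasm 4.4, Residuals 51.9

# pie(share) would draw the corresponding pie chart in base R
```

On these data, the judge effect accounts for about 44% of the total variation and the germplasm effect for about 4%.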

4.3.7 Get and visualize mean comparisons on note

The methods used to compute mean comparisons are explained in section 3.1.2.1.3.

4.3.7.1 Get mean comparisons on note

Get mean comparisons with mean_comparisons().

out_mean_comparisons_hedonic = mean_comparisons(out_check_hedonic)

out_mean_comparisons_hedonic is a list of one element for further use with ggplot: data_ggplot_LSDbarplot_germplasm

4.3.7.2 Visualize mean comparisons on note

p_out_mean_comparisons_hedonic = plot(out_mean_comparisons_hedonic)

p_out_mean_comparisons_hedonic is a list of one element containing barplots:

For each element of the list, there are as many graphs as needed, with nb_parameters_per_plot parameters per graph. Letters are displayed on each bar: parameters that do not share a letter are significantly different, given the type I error (alpha) and the alpha correction. The type I error (alpha) and the alpha correction are displayed in the title.

  • germplasm: mean comparison for germplasm
pg = p_out_mean_comparisons_hedonic$germplasm
names(pg)
## [1] "1"
pg$`1`
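The barplot is built from data_ggplot_LSDbarplot_germplasm, which suggests a least significant difference (LSD) approach. The LSD between two germplasm means can be sketched from the residual mean square and degrees of freedom of the ANOVA table above. Two assumptions are made here: a balanced design with about 39 scores per germplasm (236 scores in total divided by 6 germplasms), and a Bonferroni correction chosen only as an example of alpha correction; the method actually used by PPBstats is described in section 3.1.2.1.3.

```r
# From the ANOVA table of out_hedonic$model
ms_res <- 2.2949
df_res <- 169
alpha  <- 0.05
k <- 6     # number of germplasms
n <- 39    # ASSUMPTION: balanced design, ~236 scores / 6 germplasms

# Fisher's LSD: minimal difference between two means to be significant
lsd <- qt(1 - alpha / 2, df_res) * sqrt(2 * ms_res / n)

# Example of alpha correction: Bonferroni over the k(k-1)/2 pairwise tests
alpha_bonf <- alpha / choose(k, 2)
lsd_bonf <- qt(1 - alpha_bonf / 2, df_res) * sqrt(2 * ms_res / n)

c(lsd = lsd, lsd_bonf = lsd_bonf)
```

Two germplasms whose means differ by more than the (corrected) LSD would end up with no letter in common; a stricter correction widens the LSD and therefore yields fewer distinct letters.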

4.3.8 Get and visualize biplots regarding samples (CA) and judges (HCPC)

The biplot displays the percentage of the total variation explained by each of its two axes. It is only useful if these two first dimensions capture enough of the total variation (the more, the better); if they capture little of it, the biplot should not be interpreted.
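The percentage of variation carried by the two axes of a biplot is computed from the eigenvalues of the analysis. The sketch below uses a PCA on R's built-in USArrests data set, not hedonic data; FactoMineR objects such as out_hedonic$CA and out_hedonic$HCPC$res.pca store the same kind of information (percentage and cumulative percentage of variance) in their $eig element.

```r
# Example on a built-in data set (not hedonic data)
pca <- prcomp(USArrests, scale. = TRUE)

eig     <- pca$sdev^2              # eigenvalues
pct     <- 100 * eig / sum(eig)    # % of total variation per dimension
cum_pct <- cumsum(pct)             # cumulative %

cum_pct[2]   # % of total variation represented on the axes 1-2 biplot
```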

4.3.8.1 Get biplot

Get the biplots regarding samples (CA) and judges (HCPC):

out_biplot_hedonic = biplot_data(out_check_hedonic)

4.3.8.2 Visualize biplot

p_out_biplot_hedonic = plot(out_biplot_hedonic)

p_out_biplot_hedonic is a list of two elements:

  • the CA biplot, where descriptors are represented by red triangles and samples by blue text, with points coloured according to the sample.
p_out_biplot_hedonic$ca_biplot

  • the HCPC biplot, which is a list of two elements: one with the variables and the supplementary variables, and the other with the groups of judges detected by the HCPC.
p_out_biplot_hedonic$hcpc_biplot
## $var

## 
## $cluster