3.4 Describe the data

Once the data have been collected, a first step is to describe them with plot(). Nine types of plot are available through the plot_type argument:

  • presence absence matrix (plot_type = "pam"), which represents the combinations of germplasm \(\times\) location
  • histogram (plot_type = "histogramm")
  • barplot, where standard deviation (sd) error bars are displayed
  • boxplot
  • interaction
  • biplot
  • radar
  • raster
  • map

Then you must choose which factor to represent on the x axis (x_axis argument), the factor to display in color (in_col argument), and of course the variables to describe (vec_variables argument).

It is possible to tune the number of factor levels displayed per plot (nb_parameters_per_plot_x_axis and nb_parameters_per_plot_in_col arguments) and, for biplot and radar, the labels and their size (labels_on and labels_size arguments).

Note that descriptive plots can also be based on versions within the data set. See section 3.8 for more details.

3.4.1 Format the data

Load two data sets to look at some examples:

data("data_model_GxE")
data_model_GxE = format_data_PPBstats(data_model_GxE, type = "data_agro")
## data has been formated for PPBstats functions.
data("data_model_bh_GxE")
data_model_bh_GxE = format_data_PPBstats(data_model_bh_GxE, type = "data_agro")
## Warning in format_data_PPBstats.data_agro(data): Column "long" is needed to
## get map and not present in data.
## Warning in format_data_PPBstats.data_agro(data): Column "lat" is needed to
## get map and not present in data.
## data has been formated for PPBstats functions.
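
The warnings above indicate that the "long" and "lat" columns are needed for the map plot. As a minimal sketch in base R (not the package's own check), the presence of these columns can be verified before formatting; toy_data and its values are made up for illustration.

```r
# Columns the warnings above report as required for plot_type = "map"
required_for_map <- c("lat", "long")

# Toy data (hypothetical values) without coordinates
toy_data <- data.frame(
  location = c("loc1", "loc2"),
  y1       = c(1.2, 3.4)
)

# Names of the required columns that are absent from the data
missing_cols <- setdiff(required_for_map, colnames(toy_data))
missing_cols
```

An empty result would mean that both coordinate columns are present.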

3.4.2 presence absence matrix

The presence absence matrix may differ from the planned experimental design because of missing values (NA). The plot represents the presence/absence matrix of G \(\times\) E combinations.

p = plot(
  data_model_GxE, plot_type = "pam",
  vec_variables = c("y1", "y2")
  )
names(p)
## [1] "y1" "y2"
p$y1

A score of 3 means that a given germplasm is replicated three times in a given environment.

p = plot(
  data_model_bh_GxE, plot_type = "pam",
  vec_variables = c("y1", "y2")
  )
p$y1

Here there are many 0s, meaning that many germplasms are not present in at least two locations. A score of 1 means that a given germplasm is present once in a given location. A score of 2 means that a given germplasm is replicated twice in a given location.
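
The scores can be reproduced by hand. The sketch below is not the package's implementation: it uses base R and a small made-up data frame (toy_data, hypothetical values) to count the non-missing observations of y1 for each germplasm \(\times\) location combination.

```r
# Toy data (hypothetical values): three germplasms in two locations
toy_data <- data.frame(
  germplasm = factor(c("g1", "g1", "g1", "g2", "g2", "g3")),
  location  = factor(c("loc1", "loc1", "loc2", "loc1", "loc2", "loc2")),
  y1        = c(10.2, 11.5, 9.8, NA, 12.1, 8.7)
)

# Keep only the rows where y1 was measured, then count per combination.
# Factor levels are kept, so absent combinations appear with a count of 0.
observed <- toy_data[!is.na(toy_data$y1), ]
pam <- table(observed$germplasm, observed$location)
pam
```

In this toy example, g1 is replicated twice in loc1, while g2 has a score of 0 in loc1 because its only value there is NA.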

3.4.3 histogram

p = plot(
  data_model_GxE, plot_type = "histogramm",
  vec_variables = c("y1", "y2")
  )
p$y1
## $`-NA|-NA`
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

3.4.4 barplot

p = plot(
  data_model_GxE, plot_type = "barplot",
  vec_variables = c("y1", "y2"),
  x_axis = "germplasm"
  )

Note that for each element of the following list, there are as many graphs as needed, with nb_parameters_per_plot_x_axis parameters per graph.

names(p$y1)
## [1] "germplasm-1|-NA" "germplasm-2|-NA" "germplasm-3|-NA" "germplasm-4|-NA"
p$y1$`germplasm-1|-NA`
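
To make the naming of the list elements concrete, the following base R sketch splits factor levels into consecutive groups of a fixed size and builds names of the same form as above. It only illustrates the general idea and is an assumption, not the package's code; chunk_levels() is a hypothetical helper and the level names are made up.

```r
# Hypothetical helper: split levels into consecutive groups of size `n`
# and name each group "<factor>-<index>|-NA" (no in_col factor)
chunk_levels <- function(levels, factor_name, n) {
  groups <- split(levels, ceiling(seq_along(levels) / n))
  names(groups) <- paste0(factor_name, "-", seq_along(groups), "|-NA")
  groups
}

# Twenty made-up germplasm names, five per plot
germplasm_levels <- paste0("germ-", 1:20)
groups <- chunk_levels(germplasm_levels, "germplasm", n = 5)

names(groups)
lengths(groups)
```

With twenty levels and five levels per plot, four groups are produced, one per graph.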

p = plot(
  data_model_GxE, plot_type = "barplot",
  vec_variables = c("y1", "y2"),
  x_axis = "germplasm",
  in_col = "location"
  )

Note that for each element of the following list, there are as many graphs as needed, with nb_parameters_per_plot_x_axis and nb_parameters_per_plot_in_col parameters per graph.

names(p$y1)
## [1] "germplasm-1|location-1" "germplasm-2|location-1"
## [3] "germplasm-3|location-1" "germplasm-4|location-1"
p$y1$`germplasm-1|location-1`
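
A barplot of this kind summarises each group by its mean, with the spread shown as error bars. The base R sketch below computes the mean and standard deviation per germplasm on made-up values. It assumes that the error bars represent one standard deviation, which should be checked against the package documentation.

```r
# Toy data (hypothetical values): two germplasms, three observations each
toy_data <- data.frame(
  germplasm = rep(c("g1", "g2"), each = 3),
  y1        = c(10, 12, 14, 20, 20, 23)
)

# Mean and standard deviation of y1 for each germplasm
values_by_germplasm <- split(toy_data$y1, toy_data$germplasm)
bar_mean <- sapply(values_by_germplasm, mean)
bar_sd   <- sapply(values_by_germplasm, sd)

data.frame(germplasm = names(bar_mean), mean = bar_mean, sd = bar_sd)
```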

3.4.5 boxplot

p = plot(
  data_model_GxE, plot_type = "boxplot",
  vec_variables = c("y1", "y2"),
  x_axis = "germplasm"
  )

Note that for each element of the following list, there are as many graphs as needed, with nb_parameters_per_plot_x_axis parameters per graph.

names(p$y1)
## [1] "germplasm-1|-NA" "germplasm-2|-NA" "germplasm-3|-NA" "germplasm-4|-NA"
p$y1$`germplasm-1|-NA`

p = plot(
  data_model_GxE, plot_type = "boxplot",
  vec_variables = c("y1", "y2"),
  x_axis = "germplasm",
  in_col = "location"
  )

Note that for each element of the following list, there are as many graphs as needed, with nb_parameters_per_plot_x_axis and nb_parameters_per_plot_in_col parameters per graph.

names(p$y1)
## [1] "germplasm-1|location-1" "germplasm-2|location-1"
## [3] "germplasm-3|location-1" "germplasm-4|location-1"
p$y1$`germplasm-1|location-1`
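
For reference, the quantities a boxplot usually displays can be computed directly. This base R sketch uses fivenum() on made-up values; how whiskers and outliers are actually drawn depends on the plotting function and is not shown here.

```r
# Toy data (hypothetical values): two germplasms, five observations each
toy_data <- data.frame(
  germplasm = rep(c("g1", "g2"), each = 5),
  y1        = c(1, 2, 3, 4, 5, 10, 20, 30, 40, 100)
)

# Tukey's five-number summary of y1 for each germplasm
box_summary <- sapply(split(toy_data$y1, toy_data$germplasm), fivenum)
rownames(box_summary) <- c("min", "lower_hinge", "median", "upper_hinge", "max")
box_summary
```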

3.4.6 interaction

p = plot(
  data_model_GxE, plot_type = "interaction",
  vec_variables = c("y1", "y2"),
  x_axis = "germplasm",
  in_col = "location"
  )

Note that for each element of the following list, there are as many graphs as needed, with nb_parameters_per_plot_x_axis and nb_parameters_per_plot_in_col parameters per graph.

names(p$y1)
## [1] "germplasm-1|location-1" "germplasm-2|location-1"
## [3] "germplasm-3|location-1" "germplasm-4|location-1"
p$y1$`germplasm-1|location-1`
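
An interaction plot draws one line per level of the colour factor through the mean of each level of the x axis factor. As a sketch in base R on made-up values (not the package's code), the underlying cell means can be computed as follows.

```r
# Toy data (hypothetical values): two germplasms, two locations,
# two replicates per combination
toy_data <- data.frame(
  germplasm = rep(c("g1", "g2"), times = 4),
  location  = rep(c("loc1", "loc2"), each = 4),
  y1        = c(10, 14, 12, 16, 20, 15, 22, 17)
)

# Mean of y1 for each germplasm (rows) x location (columns) combination
cell_means <- tapply(
  toy_data$y1,
  list(toy_data$germplasm, toy_data$location),
  mean
)
cell_means
```

Here g1 is below g2 in loc1 but above it in loc2, so the two lines would cross, which is the visual signature of a germplasm \(\times\) location interaction.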

It is also possible to put on the x axis the date in Julian days, which is automatically calculated by format_data_PPBstats(). Note that this is possible only for plot_type = "histogramm", "barplot", "boxplot" and "interaction".

p = plot(
  data_model_GxE, plot_type = "interaction",
  vec_variables = c("y1", "y2"),
  x_axis = "date_julian",
  in_col = "location"
)
## Warning in plot_descriptive_data(x, plot_type, x_axis, in_col,
## vec_variables, : x_axis = "date_julian" is a special feature that will
## display julian day for a given variable automatically calculated from
## format_data_PPBstats().
p$y1$`y1$date_julian-1|location-1`
## geom_path: Each group consists of only one observation. Do you need to
## adjust the group aesthetic?
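
As an illustration of the conversion, the sketch below turns dates into day-of-year numbers in base R. It assumes that "Julian day" here means the day of the year (1 to 365 or 366); format_data_PPBstats() may compute it differently, and the dates are made up.

```r
# Made-up dates
measure_dates <- as.Date(c("2017-01-01", "2017-03-01", "2017-12-31"))

# "%j" is the day of the year as a zero-padded decimal number
date_julian <- as.integer(format(measure_dates, "%j"))
date_julian
## [1]   1  60 365
```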

3.4.7 biplot

p = plot(
  data_model_GxE, plot_type = "biplot",
  vec_variables = c("y1", "y2", "y3"),
  in_col = "germplasm", labels_on = "germplasm"
  )

The names of the list correspond to the pairs of variables displayed. Note that for each element of the following list, there are as many graphs as needed, with nb_parameters_per_plot_in_col parameters per graph.

names(p)
## [1] "y1 - y2" "y1 - y3" "y2 - y3"
p$`y1 - y2`$`-NA|germplasm-1`
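
The list names are all the pairs that can be formed from vec_variables. A base R sketch that reproduces the same names with combn() is given below; it illustrates the enumeration only and is not taken from the package.

```r
vec_variables <- c("y1", "y2", "y3")

# All unordered pairs of variables, one pair per column
variable_pairs <- combn(vec_variables, 2)

# Join each pair with " - " to get names of the same form as above
pair_names <- apply(variable_pairs, 2, paste, collapse = " - ")
pair_names
## [1] "y1 - y2" "y1 - y3" "y2 - y3"
```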

3.4.8 radar

A radar plot can be displayed either for all variables and a given factor:

p = plot(
  data_model_GxE, plot_type = "radar",
  vec_variables = c("y1", "y2", "y3"),
  in_col = "location"
  )
p
## $`all-variables`

or for each variable for two given factors:

p = plot(
  data_model_GxE, plot_type = "radar",
  vec_variables = c("y1", "y2", "y3"),
  x_axis = "location",
  in_col = "germplasm"
  )
p$y1

3.4.9 raster

Raster plots can be drawn for factor variables. Note that when there is no single value for a given x_axis level, the columns block, X and Y are added in order to have a single value.

p = plot(
  data_model_GxE, 
  plot_type = "raster", 
  vec_variables = c("desease", "vigor"), 
  x_axis = "germplasm"
)
## Warning in fun_raster(data, vec_variables, x_axis,
## nb_parameters_per_plot_x_axis): There are no single value for each x_axis,
## therefore block, X and Y colums have been added in order to have single
## value.
p$`germplasm-block-X-Y-9|-NA`
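
The idea behind this warning can be sketched in base R: when a germplasm appears on several rows, combining it with block, X and Y gives each row its own key. The data below are made up and the code is an assumption about the principle, not the package's implementation.

```r
# Toy data (hypothetical values): each germplasm appears twice
toy_data <- data.frame(
  germplasm = c("g1", "g1", "g2", "g2"),
  block     = c(1, 2, 1, 1),
  X         = c("A", "A", "B", "C"),
  Y         = c(1, 1, 2, 2),
  vigor     = factor(c("low", "high", "high", "medium"))
)

# germplasm alone does not identify a single value of vigor
germplasm_is_unique <- !any(duplicated(toy_data$germplasm))

# Combining germplasm with block, X and Y yields one key per row
key <- with(toy_data, paste(germplasm, block, X, Y, sep = "-"))
key_is_unique <- !any(duplicated(key))

c(germplasm = germplasm_is_unique, key = key_is_unique)
```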

3.4.10 map

You can display a map of the locations if your data include the latitude and longitude of each location. When using a map, do not forget to credit: Map tiles by Stamen Design, under CC BY 3.0. Data by OpenStreetMap, under ODbL.

p = plot(
  data_model_GxE, plot_type = "map", labels_on = "location"
)
p$map

You can also add pies for given variables:

p = plot(
  data_model_GxE, vec_variables = c("y1", "desease"),
  plot_type = "map"
)
p$pies_on_map_y1

p$pies_on_map_desease
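
For a factor variable such as desease, a pie per location presumably shows the share of each level at that location. This is an assumption about what the pies represent; the base R sketch below computes such shares on made-up values.

```r
# Toy data (hypothetical values)
toy_data <- data.frame(
  location = c("loc1", "loc1", "loc1", "loc2", "loc2"),
  desease  = c("low", "low", "high", "high", "high")
)

# Counts per location (rows) and level (columns), then row-wise proportions
counts <- table(toy_data$location, toy_data$desease)
shares <- prop.table(counts, margin = 1)
shares
```

Each row sums to 1, so one row corresponds to the slices of one pie.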