## 3.8 Family 4 of analysis : specific research questions on one farm or more

### 3.8.1 Version

In this section, the main data set is the one used in $$G \times E$$ models:

data("data_model_GxE")
data_model_GxE = format_data_PPBstats(data_model_GxE, type = "data_agro")
## data has been formated for PPBstats functions.

#### 3.8.1.1 Reponse to selection

For a given trait, selection differential corresponds to the difference between mean of the selected spikes and mean of the bulk (i.e. spikes that have not been selected). Response to selection correponds to the difference between mean of spikes coming from the selected spikes and the spikes coming from the bulk (Figure 3.18). Selection differential ($$S$$) and response to selection ($$R$$) are linked with the realized heritability ($$h^2_r$$): $$R = h^2_r \times S$$.

##### 3.8.1.1.1 Format data
data("data_version_SR")
data_version_SR = format_data_PPBstats(data_version_SR, type = "data_agro_version")
## data has been formated for PPBstats functions.

Where group represents differential selection (S) or reponse to selection (R) and version represents bouquet or vrac.

##### 3.8.1.1.2 Describe the data
p = plot(data_model_GxE, vec_variables = "y1",
data_version =data_version_SR, plot_type = "barplot")
p$y1$loc-1:germ-1
## Warning: Removed 2 rows containing missing values (geom_bar).

##### 3.8.1.1.3 Hierarchical Bayesian intra-location model

Version can be studied through the Hierarchical Bayesian intra-location model presented in section 3.5.4.

Regarding pairs of entries in a given environment, it is possible to get comparison of paris of entries in a given location. This is useful if you want to compare two versions within a group. For exemple:

data(data_version)
data_version = format_data_PPBstats(data_version, type = "data_agro_version")
## data has been formated for PPBstats functions.
head(data_version)
##   year location germplasm group version
## 1 2005    loc-1   germ-25     1      v1
## 2 2005    loc-1   germ-26     1      v2
## 3 2005    loc-1    germ-1     2      v1
## 4 2005    loc-1   germ-12     2      v2
## 5 2005    loc-2   germ-25     3      v1
## 6 2005    loc-2   germ-26     3      v2

Here, in location loc-1, germ-25 and germ-26 on one hand and germ-1 and germ-12 on the other hand are two version belonging to the same groupe.

Lets’ make the plots:

load("./data_PPBstats/out_mean_comparisons_model_bh_intra_location_mu.RData")

p_barplot_mu_version = plot(
out_mean_comparisons_model_bh_intra_location_mu,
data_version = data_version,
plot_type = "barplot"
)
p_barplot_mu_version$data_mean_comparisons$loc-1:2005
## $1 The stars corresponds to the pvalue: pvalue stars $$< 0.001$$ *** $$[0.001 , 0.05]$$ ** $$[0.05 , 0.01]$$ * $$> 0.01$$ . The pvalue is computed as describe in Section ?? if the parameters have been estimated with the model. For environments where MCMC did not converge or without environments, it is a @ref(t.test) which is perform. p_barplot_mu_version$data_env_with_no_controls
## $loc-25:2005 ## NULL p_barplot_mu_version$data_env_whose_param_did_not_converge
## NULL

#### 3.8.1.2 Migrant residant {#migrant-residant} (Home away)

##### 3.8.1.2.1 Format data
data("data_version_MR")
data_version_MR = format_data_PPBstats(data_version_MR, type = "data_agro_version")
## data has been formated for PPBstats functions.

Where group represents migrant or residant and version represents the location where come the germplasm from.

##### 3.8.1.2.2 Describe the data
p = plot(data_model_GxE, vec_variables = "y1", data_version = data_version_MR, plot_type = "barplot")
p$y1$loc-1

##### 3.8.1.2.3 Model

!!! TO DO !!!

The model is based on frequentist statistics (section 3.1.2.1). The tests to check the model are explained in section 3.1.2.1.2. The method to compute mean comparison are explained in section 3.1.2.1.3.

### 3.8.2 Variance intra : to estimate variance within germplasm

#### 3.8.2.1 Theory of the model

The model is based on bayesian statistics (section 3.1.2.2).

The phenotypic value $$Y_{ijkl}$$ for a given variable $$Y$$, germplasm $$i$$, environment $$j$$, plot $$k$$ and individual $$l$$ is modeled as :

$$Y_{ijkl} = \mu_{ijk} + \varepsilon_{ijkl} ; \quad \varepsilon_{ijkl} \sim \mathcal{N} (0,\sigma^2_{ij})$$,

Where $$Y_{ijkl}$$ is the phenotypic value for germplasm $$i$$, environment $$j$$, plot $$k$$ and individual $$l$$; $$\mu_{ijk}$$ is the mean of population $$i$$ in environnement $$j$$ and plot $$k$$ (nested in environment $$j$$); $$\varepsilon_{ijkl}$$ is what is not explained by the model in germplasm $$i$$, environment $$j$$, plot $$k$$ and individual $$l$$.

$$\varepsilon_{ijkl}$$ is taken from a normal distribution, centered on 0 with variance $$\sigma^2_{ij}$$

With priors : $$\sigma^2_{ij} \sim 1/Gamma(10^{-6},10^{-6})$$ and $$\mu_{ijk} \sim N(\mu_{.j.},10^{6})$$

$$\sigma^2_{ij}$$ correspond to the intra-germplasm variance.

No specific experimental design is needed as long as there are several individuals measured for a given germplasm.

#### 3.8.2.2 Steps with PPBstats

For variance intra analysis, you can follow these steps (Figure (fig:main-workflow)) :

• Format the data with format_data_PPBstats
• Run the model with model_bh_intra_location
• Check model outputs with graphs to know if you can continue the analysis with check_model
• Get mean comparisons for each factor with mean_comparisons and vizualise it with plot

#### 3.8.2.3 Format the data

data(data_model_bh_variance_intra)
data_model_bh_variance_intra = format_data_PPBstats(data_model_bh_variance_intra, type = "data_agro")
## Warning in format_data_PPBstats.data_agro(data): Column "long" is needed to
## get map and not present in data.
## Warning in format_data_PPBstats.data_agro(data): Column "lat" is needed to
## get map and not present in data.
## data has been formated for PPBstats functions.

#### 3.8.2.4 Run the model

To run the model , used the function model_bh_intra_location. You can run it on one variable.

#out_vi = model_bh_intra_location(data_model_bh_variance_intra, variable = "spike_weight", nb_iterations = 100)

#### 3.8.2.5 Check and visualize model outputs

The tests to check the model are explained in section 3.1.2.2.2.

##### 3.8.2.5.1 Check the model

Once the model is run, it is necessary to check if the outputs can be taken with confidence. This step is needed before going ahead in the analysis (in fact, object used in the next functions must come from check_model).

#out_check_vi = check_model(out_vi)
##### 3.8.2.5.2 Visualize outputs

Once the computation is done, you can visualize the results with plot

#p_out_check_vi = plot(out_check_vi)

#### 3.8.2.6 Get and visualize mean comparisons

The method to compute mean comparison are explained in section ??.

##### 3.8.2.6.1 Get mean comparisons

Get mean comparisons with mean_comparisons.

#out_mean_comparisons_vi = mean_comparisons(out_check_vi, parameter = "mu", p.adj = "soft.bonf")
##### 3.8.2.6.2 Visualize outputs

Once the computation is done, you can visualize the results with plot

#p_out_mc_vi = plot(out_mean_comparisons_vi)

#### 3.8.2.7 Apply the workflow to several variables

If you wish to apply the variance_intra workflow to several variables, you can use lapply with the following code :

workflow_model_variance_intra = function(x){
out_vi = model_bh_intra_location(data_variance_intra, variable = x)

out_check_vi = check_model(out_vi)
p_out_check_vi = plot(out_check_vi)

out_mean_comparisons_vi = mean_comparisons(out_check_vi, p.adj = "bonferroni")
p_out_mean_comparisons_vi = plot(out_mean_comparisons_vi)

out = list(
"out_vi" = out_vi,
"out_check_vi" = out_check_vi,
"p_out_check_vi" = p_out_check_vi,
"out_mean_comparisons_vi" = out_mean_comparisons_vi,
"p_out_mean_comparisons_vi" = p_out_mean_comparisons_vi
)

return(out)
}

## Not run because of memory and time issues !
# vec_variables = c("y1", "y2", "y3")
#
# out = lapply(vec_variables, workflow_model_variance_intra)
# names(out) = vec_variables