2.3 Unipart network for location analysis

This section deals with unipart networks that represent germplasm diffusion relationships between locations. The network can be built for each germplasm or for each year.

2.3.1 Steps with PPBstats

  • Format the data with format_data_PPBstats()
  • Get descriptive plots with plot()

2.3.2 Format the data

The required format is a data frame with the following compulsory columns, stored as factors:

  • "location_parent" : the location associated with the parent seed lot
  • "location_child" : the location associated with the child seed lot
  • "relation_year_start" : the year when the relationship starts
  • "relation_year_end" : the year when the relationship stops

Optional columns are:

  • "germplasm_parent" : the germplasm associated with the parent seed lot
  • "year_parent" : the year of the last relationship of the parent seed lot
  • "germplasm_child" : the germplasm associated with the child seed lot
  • "year_child" : the year of the last relationship of the child seed lot

Further optional columns, "long_parent", "lat_parent", "long_child" and "lat_child", are needed to get a map representation.

Note that a data frame in the format used for unipart networks for seed lots can also be used.
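
As an illustration of the compulsory columns, here is a minimal base R sketch that builds a small data frame and converts the columns to factors. The location names and years are made up, and the object is not passed to format_data_PPBstats() here, so whether these four columns alone are accepted by the function is an assumption to check against the package documentation.

```r
# Toy data: the location names and years are made up for illustration.
d <- data.frame(
  location_parent     = c("loc-1", "loc-1", "loc-2"),
  location_child      = c("loc-1", "loc-2", "loc-3"),
  relation_year_start = c(2007, 2008, 2009),
  relation_year_end   = c(2008, 2008, 2009),
  stringsAsFactors = FALSE
)

# Convert every compulsory column to a factor, as the format requires.
compulsory <- c("location_parent", "location_child",
                "relation_year_start", "relation_year_end")
d[compulsory] <- lapply(d[compulsory], factor)

# Inspect the column types.
str(d)
```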

The format of the data is checked by the function format_data_PPBstats() with the following arguments:

  • type : "data_network"
  • network_part : "unipart"
  • vertex_type : "location"
  • network_split : "germplasm" or "relation_year_start".

The function returns a list of igraph objects created with igraph::graph_from_data_frame().

data(data_network_unipart_sl)
head(data_network_unipart_sl)
##          seed_lot_parent         seed_lot_child relation_type
## 1 germ-8_loc-1_2007_0001 germ-8_loc-1_2008_0001     selection
## 2 germ-8_loc-1_2008_0001 germ-8_loc-1_2009_0001  reproduction
## 3 germ-8_loc-1_2009_0001 germ-8_loc-2_2009_0001     diffusion
## 4 germ-8_loc-1_2008_0001 germ-8_loc-1_2009_0001     selection
## 5 germ-1_loc-1_2005_0001 germ-8_loc-1_2006_0001  reproduction
## 6 germ-6_loc-1_2005_0001 germ-8_loc-1_2006_0001  reproduction
##   relation_year_start relation_year_end germplasm_parent location_parent
## 1                2007              2008           germ-8           loc-1
## 2                2008              2009           germ-8           loc-1
## 3                2009              2009           germ-8           loc-1
## 4                2008              2009           germ-8           loc-1
## 5                2005              2006           germ-1           loc-1
## 6                2005              2006           germ-6           loc-1
##   year_parent alt_parent long_parent lat_parent germplasm_child
## 1        2007         50    0.616363   44.20314          germ-8
## 2        2008         50    0.616363   44.20314          germ-8
## 3        2009         50    0.616363   44.20314          germ-8
## 4        2008         50    0.616363   44.20314          germ-8
## 5        2005         50    0.616363   44.20314          germ-8
## 6        2005         50    0.616363   44.20314          germ-8
##   location_child year_child alt_child long_child lat_child
## 1          loc-1       2008        50   0.616363  44.20314
## 2          loc-1       2009        50   0.616363  44.20314
## 3          loc-2       2009       360   3.087025  45.77722
## 4          loc-1       2009        50   0.616363  44.20314
## 5          loc-1       2006        50   0.616363  44.20314
## 6          loc-1       2006        50   0.616363  44.20314
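
To give an idea of what an igraph object built from a table of locations looks like, the following sketch calls igraph::graph_from_data_frame() directly on a toy edge list. The locations are made up, and this is only an illustration of the igraph call, not the code run inside format_data_PPBstats(). The block does nothing if igraph is not installed.

```r
# Toy diffusion table; loc-1, loc-2 and loc-3 are made-up locations.
edges <- data.frame(
  location_parent = c("loc-1", "loc-1", "loc-2"),
  location_child  = c("loc-2", "loc-3", "loc-3")
)

if (requireNamespace("igraph", quietly = TRUE)) {
  # The first two columns are read as the "from" and "to" vertices.
  g <- igraph::graph_from_data_frame(edges, directed = TRUE)
  print(igraph::vcount(g))  # number of locations (vertices)
  print(igraph::ecount(g))  # number of parent-to-child relations (edges)
}
```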

2.3.3 Format and describe the data for each germplasm

For network_split = "germplasm", it returns a list with as many elements as there are germplasms in the data, plus a first element in which all germplasms are merged.

net_unipart_location_g = format_data_PPBstats(
  type = "data_network",
  data = data_network_unipart_sl, 
  network_part = "unipart", 
  vertex_type =  "location",
  network_split = "germplasm")
## data has been formated for PPBstats functions.
names(net_unipart_location_g)
##  [1] "germ-10 / germ-11 / germ-12 / germ-13 / germ-2 / germ-3 / germ-4 / germ-5 / germ-8 / germ-9"
##  [2] "germ-10"                                                                                    
##  [3] "germ-11"                                                                                    
##  [4] "germ-12"                                                                                    
##  [5] "germ-13"                                                                                    
##  [6] "germ-2"                                                                                     
##  [7] "germ-3"                                                                                     
##  [8] "germ-4"                                                                                     
##  [9] "germ-5"                                                                                     
## [10] "germ-8"                                                                                     
## [11] "germ-9"
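
The shape of this list (one merged element followed by one element per germplasm) can be pictured with a base R sketch on toy data. Splitting on "germplasm_parent" and the way the merged name is built are assumptions made for illustration. PPBstats may split the data differently internally.

```r
# Toy data with made-up germplasm and location names.
d <- data.frame(
  germplasm_parent = c("germ-1", "germ-1", "germ-2"),
  location_parent  = c("loc-1", "loc-2", "loc-1"),
  location_child   = c("loc-2", "loc-3", "loc-3"),
  stringsAsFactors = FALSE
)

# One data frame per germplasm ...
per_germplasm <- split(d, d$germplasm_parent)

# ... preceded by one element holding all germplasms together.
merged_name <- paste(names(per_germplasm), collapse = " / ")
all_parts <- c(setNames(list(d), merged_name), per_germplasm)

names(all_parts)
```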

The different representations are done with the plot() function.

For a network representation, set plot_type = "network". Diffusion events are displayed as curves between locations, and the number of diffusions is shown on a color scale. in_col can be set to customize the color of the vertices.

p_net = plot(net_unipart_location_g, plot_type = "network", 
                          labels_on = "location", labels_size = 4)
names(p_net) # one element per germplasm, the first element with all the data
## [1] "germ-10 / germ-11 / germ-12 / germ-13 / germ-2 / germ-3 / germ-4 / germ-5 / germ-8 / germ-9"
## [2] "germ-10"                                                                                    
## [3] "germ-11"                                                                                    
## [4] "germ-12"                                                                                    
## [5] "germ-13"                                                                                    
## [6] "germ-8"                                                                                     
## [7] "germ-9"
p_net$`germ-2`
## NULL
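
The quantity shown on the color scale, the number of diffusions between two locations, can be approximated by hand. The sketch below counts rows with relation_type equal to "diffusion" for each pair of locations in a toy table. The data are made up, and the exact counting rule used by plot() is an assumption here.

```r
# Toy relations between made-up locations.
d <- data.frame(
  location_parent = c("loc-1", "loc-1", "loc-1", "loc-2", "loc-2"),
  location_child  = c("loc-2", "loc-2", "loc-3", "loc-3", "loc-2"),
  relation_type   = c("diffusion", "diffusion", "diffusion",
                      "diffusion", "reproduction"),
  stringsAsFactors = FALSE
)

# Keep only the diffusion events.
moves <- d[d$relation_type == "diffusion", ]

# Count the events for each (parent, child) pair of locations.
moves$n <- 1
n_diffusion <- aggregate(n ~ location_parent + location_child,
                         data = moves, FUN = sum)
n_diffusion
```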

2.3.4 Format and describe the data for each year

For network_split = "relation_year_start", it returns a list with as many elements as there are years in the data, plus a first element in which all years are merged.

net_unipart_location_y = format_data_PPBstats(
  type = "data_network",
  data = data_network_unipart_sl,
  network_part = "unipart", 
  vertex_type =  "location",
  network_split = "relation_year_start")
## data has been formated for PPBstats functions.
names(net_unipart_location_y)
## [1] "2007-2008-2009" "2007"           "2008"           "2009"

The different representations are done with the plot() function.

For a network representation, set plot_type = "network". Diffusion events are displayed as curves between locations, and the number of diffusions is shown on a color scale. in_col can be set to customize the color of the vertices.

p_net = plot(net_unipart_location_y, plot_type = "network", 
                          labels_on = "location", labels_size = 4)
names(p_net) # one element per year, the first element with all the data
## [1] "2007-2008-2009" "2007"           "2008"           "2009"
p_net$`2007-2008-2009`
## $network

With plot_type = "barplot", the plots show the number of germplasms received or given by each location.

p_bar = plot(net_unipart_location_y, plot_type = "barplot", x_axis = "location", in_col = "germplasm")
names(p_bar) # one element per year, the first element with all the data
## [1] "2007-2008-2009" "2007"           "2008"           "2009"
p_bar = p_bar$`2007-2008-2009`
p_bar$barplot$received

p_bar$barplot$given
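
As a rough picture of what the "received" and "given" barplots summarize, the sketch below tabulates a toy set of diffusions by location and germplasm in base R. The data are made up, and the tabulation is an assumption about the counts being displayed. It may not match how PPBstats computes them.

```r
# Toy diffusions; each row is one germplasm moving between made-up locations.
d <- data.frame(
  germplasm_parent = c("germ-1", "germ-2", "germ-1"),
  location_parent  = c("loc-1", "loc-1", "loc-2"),
  location_child   = c("loc-2", "loc-2", "loc-3"),
  stringsAsFactors = FALSE
)

# Diffusions leaving each location ("given") and arriving at each
# location ("received"), broken down by germplasm.
given    <- table(d$location_parent, d$germplasm_parent)
received <- table(d$location_child,  d$germplasm_parent)

given
received
```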

Locations present in the network can be displayed on a map with plot_type = "map". When using these maps, do not forget to give credit: Map tiles by Stamen Design, under CC BY 3.0. Data by OpenStreetMap, under ODbL.

p_map = PPBstats:::plot.data_network(net_unipart_location_y[1], plot_type = "map", labels_on = "location")
# Note: to do this for all elements of the list, use
# plot(net_unipart_location_y, plot_type = "map", labels_on = "location")
# Here PPBstats:::plot.data_network is called on a single element only, to avoid
# querying the map server too often, as it may fail if there are too many queries
# ::: is needed because the function is an S3 method and is not exported
p_map$`2007-2008-2009`
## $map

Information about a variable can also be displayed on the map as pies, with plot_type = "map" and by setting the arguments data_to_pie and vec_variables:

# y1 is a quantitative variable
p_map_pies_y1 = PPBstats:::plot.data_network(net_unipart_location_y[1], data_to_pie, plot_type = "map", vec_variables = "y1")
p_map_pies_y1$`2007-2008-2009`
## $y1_map_with_pies

# y2 is a qualitative variable
p_map_pies_y2 = PPBstats:::plot.data_network(net_unipart_location_y[1], data_to_pie, plot_type = "map", vec_variables = "y2")
p_map_pies_y2$`2007-2008-2009`
## $y2_map_with_pies