2.1 Introduction

Describing the topology of seed circulation networks gives insight into how exchanges are organized within a PPB programme or a Community Seed Bank (Vernooy, Shrestha, and Sthapit 2015; Pautasso et al. 2013). The analysis can be done at several geographical or organizational scales, for example local, regional or national.

Two types of network are handled within PPBstats:

  • unipart networks, where a node can be
    • a seed lot (i.e. a combination of a germplasm in a given location in a given year); edges are relationships such as diffusion, mixture, reproduction, crosses or selection.
    • a location; edges are diffusion events between locations.
  • bipart networks, where a node can be a location or a germplasm

2.1.1 Workflow and function relations in PPBstats regarding network analysis

The workflow is very simple, as only descriptive analyses can be done based on the network format (Figure 2.1).

Figure 2.1: Decision tree with objectives and analyses carried out in PPBstats regarding network analysis. M refers to methods.

Figure 2.2 displays the functions and their relationships. Table 2.1 describes each of the functions.

You can get more information on each function by typing ?function_name in your R session.

Figure 2.2: Main functions used in the workflow.

Table 2.1: Main function descriptions.

| function name | description |
|---|---|
| format_data_PPBstats | Check and format the data to be used in PPBstats functions |
| plot | Build ggplot objects to visualize output |

2.1.2 Data format

Three formats are possible:

  1. unipart network that represents the relationships between seed lots: a data frame with the following compulsory columns:
    • "seed_lot_parent" : name of the parent seed lot in the relationship
    • "seed_lot_child" : name of the child seed lot in the relationship
    • "relation_type" : the type of relationship between the seed lots
    • "relation_year_start" : the year when the relationship starts
    • "relation_year_end" : the year when the relationship stops
    • "germplasm_parent" : the germplasm associated with the parent seed lot
    • "location_parent" : the location associated with the parent seed lot
    • "year_parent" : the year of the last relationship of the parent seed lot
    • "germplasm_child" : the germplasm associated with the child seed lot
    • "location_child" : the location associated with the child seed lot
    • "year_child" : the year of the last relationship of the child seed lot

Optional columns are "long_parent", "lat_parent", "long_child" and "lat_child", which allow a map representation, and supplementary variables tagged with "_parent", "_child" or "_relation".

  2. unipart network that represents germplasm diffusion relationships between locations: a data frame with the following compulsory columns (same as above): "location_parent", "location_child", "relation_year_start", "relation_year_end". Optional columns are "germplasm_parent", "year_parent", "germplasm_child" and "year_child". Other optional columns are "long_parent", "lat_parent", "long_child" and "lat_child", which allow a map representation.

  3. bipart network that represents “which location has which germplasm in which year”: a data frame with the following compulsory columns: "germplasm", "location", "year". Optional columns are "long" and "lat", which allow a map representation.
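The compulsory columns listed above can be checked by hand before formatting the data. The snippet below is a minimal base R sketch of such a check; `compulsory_columns` and `missing_columns()` are hypothetical names introduced for illustration and are not part of PPBstats.

```r
# Compulsory columns for each network format, as listed above.
compulsory_columns <- list(
  unipart_seed_lots = c(
    "seed_lot_parent", "seed_lot_child", "relation_type",
    "relation_year_start", "relation_year_end",
    "germplasm_parent", "location_parent", "year_parent",
    "germplasm_child", "location_child", "year_child"
  ),
  unipart_location = c(
    "location_parent", "location_child",
    "relation_year_start", "relation_year_end"
  ),
  bipart = c("germplasm", "location", "year")
)

# Hypothetical helper (not a PPBstats function): names of the compulsory
# columns that are absent from `data` for the given format.
missing_columns <- function(data, format) {
  setdiff(compulsory_columns[[format]], colnames(data))
}

# A one-row bipart table: no compulsory column is missing.
d <- data.frame(germplasm = "germ-2", location = "loc-1", year = 2008)
missing_columns(d, "bipart")

# The same table lacks every column of the unipart location format.
missing_columns(d, "unipart_location")
```

An empty result means the table carries all compulsory columns for that format. It says nothing about the other checks that format_data_PPBstats() may perform.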

Note that format 1 can be converted to formats 2 and 3, as summarized in Table 2.2. When a bipart network is derived from a unipart network for seed lots, only the reproduction and diffusion relations are used.

Table 2.2: Possible analysis (in column) regarding network format (in row).

| | unipart for seed lots analysis | unipart for location analysis | bipart for germplasm and location analysis |
|---|---|---|---|
| unipart for seed lots format | X | X | X |
| unipart for location format | | X | |
| bipart for germplasm and location format | | | X |
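To make the conversion from the seed lot format to the bipart format more concrete, here is a rough base R sketch of the idea. It is an illustration under assumptions, not the code used inside PPBstats. It assumes that both the parent and the child side of each kept relation contribute a germplasm/location/year triplet, and the toy table `sl` is made up.

```r
# Toy seed-lot relations with only the columns needed here (made-up data).
sl <- data.frame(
  relation_type    = c("selection", "reproduction", "diffusion"),
  germplasm_parent = c("germ-8", "germ-8", "germ-8"),
  location_parent  = c("loc-1", "loc-1", "loc-1"),
  year_parent      = c(2007, 2008, 2009),
  germplasm_child  = c("germ-8", "germ-8", "germ-8"),
  location_child   = c("loc-1", "loc-1", "loc-2"),
  year_child       = c(2008, 2009, 2009),
  stringsAsFactors = FALSE
)

# Keep reproduction and diffusion relations only.
kept <- sl[sl$relation_type %in% c("reproduction", "diffusion"), ]

# Stack parent and child sides into germplasm/location/year triplets
# (assumption: both sides are used).
parent <- kept[, c("germplasm_parent", "location_parent", "year_parent")]
child  <- kept[, c("germplasm_child", "location_child", "year_child")]
names(parent) <- names(child) <- c("germplasm", "location", "year")

# Drop duplicated triplets.
bipart <- unique(rbind(parent, child))
bipart
```

The result has the three compulsory columns of the bipart format, one row per distinct "which location has which germplasm in which year" statement.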

The format of the data is checked by the function format_data_PPBstats().

The following arguments can be used:

  • type : "data_network"
  • network_part : "unipart" or "bipart"
  • vertex_type :
    • for unipart network : "seed_lots" or "location"
    • for bipart network : c("germplasm", "location")
  • network_split : for network_part = "unipart" and vertex_type = "location", split of the data that can be "germplasm" or "relation_year_start".

The possible values of the arguments for each network format are displayed in Table 2.3.

Table 2.3: Possible values of argument (in column) regarding network format (in row).

| | network_part | vertex_type | network_split |
|---|---|---|---|
| unipart for seed lots format | unipart or bipart | seed_lots or location or c("germplasm", "location") | NULL or germplasm or relation_year_start |
| unipart for location format | unipart | location | germplasm or relation_year_start |
| bipart for germplasm and location format | bipart | c("germplasm", "location") | NULL |

The following sections give examples for each network format. The function returns a list of igraph objects created with igraph::graph_from_data_frame().
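For readers unfamiliar with igraph, the following minimal sketch shows what igraph::graph_from_data_frame() does with a small edge table. The three relations are made up for illustration, and the call is a plain igraph example, not the exact call made inside format_data_PPBstats().

```r
library(igraph)

# Three made-up seed-lot relations: one row per edge.
edges <- data.frame(
  seed_lot_parent = c("germ-8_loc-1_2007_0001",
                      "germ-8_loc-1_2008_0001",
                      "germ-8_loc-1_2009_0001"),
  seed_lot_child  = c("germ-8_loc-1_2008_0001",
                      "germ-8_loc-1_2009_0001",
                      "germ-8_loc-2_2009_0001"),
  relation_type   = c("selection", "reproduction", "diffusion"),
  stringsAsFactors = FALSE
)

# The first two columns give the ends of each edge;
# the remaining columns become edge attributes.
g <- graph_from_data_frame(edges, directed = TRUE)

vcount(g)            # number of seed lots (vertices)
ecount(g)            # number of relations (edges)
E(g)$relation_type   # edge attribute carried over from the data frame
```

Here the graph has four vertices and three directed edges, and the relation type is stored as an edge attribute.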

2.1.2.1 unipart for seed lots data

data(data_network_unipart_sl)
head(data_network_unipart_sl)
##          seed_lot_parent         seed_lot_child relation_type
## 1 germ-8_loc-1_2007_0001 germ-8_loc-1_2008_0001     selection
## 2 germ-8_loc-1_2008_0001 germ-8_loc-1_2009_0001  reproduction
## 3 germ-8_loc-1_2009_0001 germ-8_loc-2_2009_0001     diffusion
## 4 germ-8_loc-1_2008_0001 germ-8_loc-1_2009_0001     selection
## 5 germ-1_loc-1_2005_0001 germ-8_loc-1_2006_0001  reproduction
## 6 germ-6_loc-1_2005_0001 germ-8_loc-1_2006_0001  reproduction
##   relation_year_start relation_year_end germplasm_parent location_parent
## 1                2007              2008           germ-8           loc-1
## 2                2008              2009           germ-8           loc-1
## 3                2009              2009           germ-8           loc-1
## 4                2008              2009           germ-8           loc-1
## 5                2005              2006           germ-1           loc-1
## 6                2005              2006           germ-6           loc-1
##   year_parent alt_parent long_parent lat_parent germplasm_child
## 1        2007         50    0.616363   44.20314          germ-8
## 2        2008         50    0.616363   44.20314          germ-8
## 3        2009         50    0.616363   44.20314          germ-8
## 4        2008         50    0.616363   44.20314          germ-8
## 5        2005         50    0.616363   44.20314          germ-8
## 6        2005         50    0.616363   44.20314          germ-8
##   location_child year_child alt_child long_child lat_child
## 1          loc-1       2008        50   0.616363  44.20314
## 2          loc-1       2009        50   0.616363  44.20314
## 3          loc-2       2009       360   3.087025  45.77722
## 4          loc-1       2009        50   0.616363  44.20314
## 5          loc-1       2006        50   0.616363  44.20314
## 6          loc-1       2006        50   0.616363  44.20314
  • unipart for seed lots format
net_unipart_sl = format_data_PPBstats(
  type = "data_network",
  data = data_network_unipart_sl, 
  network_part = "unipart", 
  vertex_type =  "seed_lots")
## data has been formated for PPBstats functions.
length(net_unipart_sl)
## [1] 1
head(net_unipart_sl)
## [[1]]
## IGRAPH 7f2b739 DN-- 81 94 -- 
## + attr: name (v/c), germplasm (v/c), location (v/c), year (v/c),
## | alt (v/c), long (v/c), lat (v/c), format (v/c), relation_type
## | (e/c)
## + edges from 7f2b739 (vertex names):
## [1] germ-8_loc-1_2007_0001->germ-8_loc-1_2008_0001
## [2] germ-8_loc-1_2008_0001->germ-8_loc-1_2009_0001
## [3] germ-8_loc-1_2009_0001->germ-8_loc-2_2009_0001
## [4] germ-8_loc-1_2008_0001->germ-8_loc-1_2009_0001
## [5] germ-1_loc-1_2005_0001->germ-8_loc-1_2006_0001
## [6] germ-6_loc-1_2005_0001->germ-8_loc-1_2006_0001
## + ... omitted several edges
  • unipart for location format
net_unipart_location_g = format_data_PPBstats(
  type = "data_network",
  data = data_network_unipart_sl, 
  network_part = "unipart", 
  vertex_type =  "location",
  network_split = "germplasm")
## data has been formated for PPBstats functions.

For network_split = "germplasm", it returns a list whose first element merges all germplasms, followed by one element per germplasm in the data.

names(net_unipart_location_g)
##  [1] "germ-10-germ-11-germ-12-germ-13-germ-2-germ-3-germ-4-germ-5-germ-8-germ-9"
##  [2] "germ-10"                                                                  
##  [3] "germ-11"                                                                  
##  [4] "germ-12"                                                                  
##  [5] "germ-13"                                                                  
##  [6] "germ-2"                                                                   
##  [7] "germ-3"                                                                   
##  [8] "germ-4"                                                                   
##  [9] "germ-5"                                                                   
## [10] "germ-8"                                                                   
## [11] "germ-9"
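The structure of the list shown above (a merged element first, then one element per germplasm) can be mimicked in base R. This is only a sketch of the list layout on made-up data. Splitting on `germplasm_parent` and joining the names with "-" are assumptions chosen to match the names printed above, not a description of the PPBstats internals.

```r
# Made-up diffusion events between locations for two germplasms.
d <- data.frame(
  location_parent  = c("loc-1", "loc-4", "loc-1"),
  location_child   = c("loc-2", "loc-1", "loc-3"),
  germplasm_parent = c("germ-2", "germ-2", "germ-3"),
  stringsAsFactors = FALSE
)

# One table per germplasm (assumption: the split uses germplasm_parent).
per_germplasm <- split(d, d$germplasm_parent)

# The merged table comes first, named after all germplasms joined by "-".
merged_name <- paste(names(per_germplasm), collapse = "-")
all_tables <- c(setNames(list(d), merged_name), per_germplasm)

names(all_tables)
```

Each element of such a list can then be retrieved by name, for example `all_tables[["germ-2"]]`.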
net_unipart_location_y = format_data_PPBstats(
  type = "data_network",
  data = data_network_unipart_sl,
  network_part = "unipart", 
  vertex_type =  "location",
  network_split = "relation_year_start")
## data has been formated for PPBstats functions.

For network_split = "relation_year_start", it returns a list whose first element merges all years, followed by one element per year in the data.

names(net_unipart_location_y)
## [1] "2007-2008-2009" "2007"           "2008"           "2009"
  • bipart for germplasm and location format
net_bipart = format_data_PPBstats(
  type = "data_network",
  data = data_network_unipart_sl, 
  network_part = "bipart", 
  vertex_type =  c("germplasm", "location")
  )
## data has been formated for PPBstats functions.

For a bipart network, it returns a list whose first element merges all years, followed by one element per year in the data. If no year is provided in the data, all the information is merged.

names(net_bipart)
## [1] "2005-2006-2007-2008-2009" "2005"                    
## [3] "2006"                     "2007"                    
## [5] "2008"                     "2009"
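Since each call above returns a named list of igraph objects, the elements can be summarized with the usual list tools. The sketch below uses a hand-built stand-in list (`nets`, made-up data) so that it runs without PPBstats. With real output, the list returned by format_data_PPBstats() would take its place.

```r
library(igraph)

# Stand-in for a returned list: all years merged first, then one network per year.
nets <- list(
  "2008-2009" = graph_from_data_frame(
    data.frame(from = c("loc-1", "loc-4"), to = c("loc-2", "loc-1"))
  ),
  "2008" = graph_from_data_frame(
    data.frame(from = "loc-1", to = "loc-2")
  ),
  "2009" = graph_from_data_frame(
    data.frame(from = "loc-4", to = "loc-1")
  )
)

# Number of vertices and edges in each element of the list.
n_vertices <- sapply(nets, vcount)
n_edges    <- sapply(nets, ecount)
n_vertices
n_edges

# A single network can be picked by its name.
g_2008 <- nets[["2008"]]
```

Comparing the per-year counts with the merged element gives a quick view of how the network changes from one year to the next.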

2.1.2.2 unipart for location data

data(data_network_unipart_location)
head(data_network_unipart_location)
##   location_parent location_child relation_year_start relation_year_end
## 1           loc-1          loc-2                2009              2009
## 2           loc-4          loc-1                2009              2009
## 3           loc-4          loc-2                2009              2009
## 4           loc-1          loc-2                2008              2009
## 5           loc-1          loc-3                2007              2007
## 6           loc-1          loc-4                2007              2007
##   germplasm_parent year_parent alt_parent long_parent lat_parent
## 1           germ-2        2009         50    0.616363   44.20314
## 2           germ-2        2009        110    4.835659   45.76404
## 3           germ-2        2009        110    4.835659   45.76404
## 4           germ-2        2008         50    0.616363   44.20314
## 5           germ-2        2007         50    0.616363   44.20314
## 6           germ-2        2007         50    0.616363   44.20314
##   germplasm_child year_child alt_child long_child lat_child
## 1          germ-2       2009       360   3.087025  45.77722
## 2          germ-2       2009        50   0.616363  44.20314
## 3          germ-2       2009       360   3.087025  45.77722
## 4          germ-2       2009       360   3.087025  45.77722
## 5          germ-2       2007       170   2.352222  48.85661
## 6          germ-2       2007       110   4.835659  45.76404
net_unipart_location_g = format_data_PPBstats(
  type = "data_network",
  data = data_network_unipart_location, 
  network_split = "germplasm",
  network_part = "unipart", 
  vertex_type =  "location")
## data has been formated for PPBstats functions.

For network_split = "germplasm", it returns a list whose first element merges all germplasms, followed by one element per germplasm in the data.

names(net_unipart_location_g)
##  [1] "germ-10-germ-11-germ-12-germ-13-germ-2-germ-3-germ-4-germ-5-germ-6-germ-7"
##  [2] "germ-10"                                                                  
##  [3] "germ-11"                                                                  
##  [4] "germ-12"                                                                  
##  [5] "germ-13"                                                                  
##  [6] "germ-2"                                                                   
##  [7] "germ-3"                                                                   
##  [8] "germ-4"                                                                   
##  [9] "germ-5"                                                                   
## [10] "germ-6"                                                                   
## [11] "germ-7"
net_unipart_location_y = format_data_PPBstats(
  type = "data_network",
  data = data_network_unipart_location, 
  network_split = "relation_year_start",
  network_part = "unipart", 
  vertex_type =  "location")
## data has been formated for PPBstats functions.

For network_split = "relation_year_start", it returns a list whose first element merges all years, followed by one element per year in the data.

names(net_unipart_location_y)
## [1] "2007-2008-2009" "2007"           "2008"           "2009"

2.1.2.3 bipart for germplasm and location data

data(data_network_bipart)
head(data_network_bipart)
##   germplasm location year alt     long       lat
## 1    germ-2    loc-1 2008  50 0.616363 44.203142
## 2    germ-2    loc-1 2009  50 0.616363 44.203142
## 3    germ-1    loc-1 2005  50 0.616363 44.203142
## 4   germ-14    loc-1 2005  50 0.616363 44.203142
## 5    germ-2    loc-1 2006  50 0.616363 44.203142
## 6    germ-2    loc-1 2007  50 0.616363 44.203142
net_bipart = format_data_PPBstats(
  type = "data_network",
  data = data_network_bipart, 
  network_part = "bipart", 
  vertex_type =  c("germplasm", "location")
  )
## data has been formated for PPBstats functions.

For a bipart network, it returns a list whose first element merges all years, followed by one element per year in the data. If no year is provided in the data, all the information is merged.

names(net_bipart)
## [1] "2005-2006-2007-2008-2009" "2005"                    
## [3] "2006"                     "2007"                    
## [5] "2008"                     "2009"

References

Vernooy, R., P. Shrestha, and B. Sthapit. 2015. Community Seed Banks: Origins, Evolution and Prospects. Issues in Agricultural Biodiversity. Earthscan for Routledge.

Pautasso, M., G. Aistara, A. Barnaud, S. Caillon, P. Clouvel, O. Coomes, M. Delêtre, et al. 2013. “Seed exchange networks for agrobiodiversity conservation. A review.” Agronomy for Sustainable Development 33.