ORIGINAL RESEARCH article

Management of genetic diversity in the era of genomics

Theo H. E. Meuwissen*

  • 1 Department of Animal and Aquacultural Sciences, Norwegian University of Life Sciences, Ås, Norway
  • 2 NOFIMA, Ås, Norway
  • 3 The Roslin Institute and R(D)SVS, The University of Edinburgh, Edinburgh, United Kingdom

Management of genetic diversity aims to (i) maintain heterozygosity, which ameliorates inbreeding depression and limits the loss of genetic variation at loci that may become important in the future; and (ii) avoid genetic drift, which prevents deleterious recessives (e.g., rare disease alleles) from drifting to high frequency and prevents random drift of (functional) traits. In the genomics era, genomic data allow many alternative measures of inbreeding and genomic relationships. Genomic relationships/inbreeding can be classified as (i) homozygosity/heterozygosity-based (e.g., the molecular kinship matrix); (ii) genetic drift-based, i.e., based on changes of allele frequencies; or (iii) IBD-based, i.e., SNPs are used in linkage analyses to identify IBD segments. Here, alternative measures of inbreeding/relationship were used to manage genetic diversity in genomic optimal contribution (GOC) selection schemes. Contrary to classic inbreeding theory, drift- and homozygosity-based inbreeding were found to differ substantially in GOC schemes unless diversity management was based on IBD. When a homozygosity-based measure of relationship was used, inbreeding management moved allele frequencies toward 0.5, giving a low rate of increase in homozygosity for the panel used for management (but not for unmanaged neutral loci) at the expense of high genetic drift. When genomic relationship matrices were based on drift, following VanRaden and as in GCTA, drift was low at the expense of a high rate of increase in homozygosity. The use of IBD-based relationship matrices for inbreeding management limited both drift and the homozygosity-based rate of inbreeding to their target values. Genetic improvement per percent of inbreeding was highest when GOC used IBD-based relationships, irrespective of the inbreeding measure used.
Genomic relationships based on runs of homozygosity resulted in a very high initial improvement per percent of inbreeding, but also in substantial discrepancies between drift- and homozygosity-based rates of inbreeding, and in a drift that exceeded its target value. The discrepancy between drift- and homozygosity-based rates of inbreeding was caused by a covariance between the initial allele frequency and the subsequent change in frequency, an effect that becomes stronger when whole genome sequence data are used.

Management of genetic diversity is usually directed at maintaining the diversity that was present in some population, which serves as a reference point against which future diversity is compared. This reference population may be a population in the past or the current population. In the absence of genomic data, the accumulated change in diversity was predicted to be a loss, and could only be described by inbreeding coefficients ($F$) based on pedigree data. These coefficients are the expected loss of genetic variance relative to the reference population, in which all alleles are assumed to be drawn at random with replacement, i.e., the classical base population. This description as a loss of variance strictly applies to additive traits, but the allele frequency within an individual at a locus (i.e., 0, ½ or 1) is itself an additive trait. From this perspective, the management of genetic diversity comes down to the management of inbreeding, in particular controlling the rate of inbreeding ($\Delta F$) or, equivalently, the effective population size $N_e = 1/(2\Delta F)$ (Falconer and Mackay, 1996).

Optimal management of inbreeding in breeding schemes is achieved by optimal contribution (OC) selection (Meuwissen, 1997; Woolliams et al., 2015), which, by construction, maximizes the genetic gain for a given rate of inbreeding. In the era of genomics, Sonesson et al. (2012) concluded that genomic selection requires genomic control of inbreeding, i.e., genomic optimal contribution selection (GOC). With OC, the management of diversity within the population uses the quadratic form $\tfrac{1}{2}\mathbf{c}'\mathbf{A}\mathbf{c}$, where $\mathbf{A}$ is Wright's numerator relationship matrix and $\mathbf{c}$ is the vector of fractional contributions of the candidates to the next generation; with GOC, a genomic relationship matrix $\mathbf{G}$ replaces $\mathbf{A}$. This corresponds directly to the substantial literature on the use of similarity matrices and the fractional contributions of species as measures of species diversity (e.g., Leinster and Cobbold, 2012). The similarity matrices in OC use the idea of relationships, which are the scaled (co)variances of breeding values between all pairs of individuals in a population, past and present; this links OC to the wider canon of genetic theory.

In the pre-genomics era, relationships were based on pedigree, and pedigree-based coefficients of kinship describe the probability of identity-by-descent (IBD) at neutral loci that are unlinked to any loci under selection. Within this subset of loci, IBD redistributes genotype frequencies away from Hardy-Weinberg proportions toward homozygosity, giving $p_0^2(1-F)+p_0F$, $2p_0(1-p_0)(1-F)$, and $(1-p_0)^2(1-F)+(1-p_0)F$ for the genotypes AA, Aa and aa, respectively, where $p_0$ is the original frequency of the A allele (Falconer and Mackay, 1996). This redistribution of genotype frequencies links the change in heterozygosity [expected to reduce by a factor $(1-F)$], the within-line genetic variance [also reducing by $(1-F)$], and the genetic drift variance of allele frequencies [$p_0(1-p_0)F$] to the inbreeding coefficient describing the IBD of sampled alleles. These expected changes do not hold for loci linked to the causal variants of complex traits (QTL), where allele and genotype frequencies may change non-randomly in ways that cannot be explained by IBD predicted from pedigree alone.
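As a quick numerical check of these classical expectations, the short Python sketch below (the function name and example values are illustrative) computes the three genotype frequencies for a neutral locus and confirms that they sum to one, leave the allele frequency unchanged, and reduce heterozygosity by the factor $(1-F)$.

```python
def inbred_genotype_freqs(p0: float, F: float) -> tuple:
    """Expected frequencies of AA, Aa and aa at a neutral locus with inbreeding F,
    where p0 is the base-population frequency of allele A."""
    f_AA = p0**2 * (1 - F) + p0 * F
    f_Aa = 2 * p0 * (1 - p0) * (1 - F)
    f_aa = (1 - p0)**2 * (1 - F) + (1 - p0) * F
    return f_AA, f_Aa, f_aa


p0, F = 0.3, 0.25
f_AA, f_Aa, f_aa = inbred_genotype_freqs(p0, F)
total = f_AA + f_Aa + f_aa                # genotype frequencies sum to one
allele_freq = f_AA + f_Aa / 2             # allele frequency is still p0
het_ratio = f_Aa / (2 * p0 * (1 - p0))    # heterozygosity relative to HWE: 1 - F
print(total, allele_freq, het_ratio)
```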

When defining inbreeding as the correlation between uniting gametes, Wright (1922) assumed the infinitesimal model, which implies infinitesimal selection pressures per locus and hence random changes in allele frequency. However, the genome is of finite size, and for complex traits with many QTL, selection pressures will extend to neutral loci in linkage disequilibrium (LD) across the genome; these associations with loci under selection result in non-random changes of allele frequencies. This is particularly the case for genomic selection schemes, which use marker panels that are large (though not infinitely large), dense and genome-wide, and designed to be in LD with all QTL, and in which selection acts directly on the markers included in the panel. In this setting unlinked neutral loci are likely to be rare, so the classical theory appears redundant.

Despite the apparent loss of a unifying paradigm, genomics offers a choice of tools for describing genetic diversity that are wider in scope than the classical genetic variance and inbreeding, for example tools based on genomic relationships (VanRaden, 2008), runs of homozygosity (de Cara et al., 2013; Luan et al., 2014; Rodríguez-Ramilo et al., 2015), and linkage analysis (Fernando and Grossman, 1989; Meuwissen et al., 2011). Some genomic measures may be better suited to some purposes than others, so the question arises of what the purpose of managing diversity in breeding schemes is, in addition to which tools to use. Furthermore, when considering tools for genomic inbreeding, there is a need to distinguish which aspect of inbreeding they depict (IBD, heterozygosity/homozygosity, or genetic drift), since in (genomic) selection schemes their expectations may differ from those derived from random allele frequency changes, which result in the genotype frequencies $p_0^2(1-F)+Fp_0$, $2p_0(1-p_0)(1-F)$, and $(1-p_0)^2(1-F)+F(1-p_0)$.

Most molecular genetic measures of inbreeding are based on the allelic identity of marker loci, and do not directly separate IBD from identity-by-state (IBS). Genomic relationship matrices that are variants of VanRaden (2008) compensate for this by measuring squared changes in allele frequency relative to a set of reference frequencies. For the purpose of managing changes in diversity relative to the reference population, these would be the frequencies in the base generation (Sonesson et al., 2012), although often the frequencies in the current "generation" are used (Powell et al., 2010), or simply those of the subset of the population for which genomic data are available; see Legarra (2016) for further discussion of these issues. Provided the base generation is used to define the reference frequencies at neutral unlinked loci ($p_{0,k}$ for locus $k$), the expectation of $\mathbf{G}_{VR2}$ (Method 2; VanRaden, 2008) is $\mathbf{A}$, with all loci equally weighted after standardization using the base generation frequencies. In comparison, $\mathbf{G}_{VR1}$ (Method 1) can be viewed as simply re-weighting the loci by $2p_{0,k}(1-p_{0,k})$: for a single locus, $\mathbf{G}_{VR1}$ and $\mathbf{G}_{VR2}$ yield identical relationship estimates, and over many loci $\mathbf{G}_{VR2}$ uses the simple mean of the single-locus estimates, whereas $\mathbf{G}_{VR1}$ uses the weighted mean with weights $2p_{0,k}(1-p_{0,k})$. Extending the argument of Woolliams et al. (2015) for $\mathbf{G}_{VR1}$: since $\mathbf{G}_{VR2}$ is based on the squares of standardized allele frequency changes, managing diversity using $\mathbf{G}_{VR2}$ will constrain these squared standardized changes, and this measure of inbreeding will be denoted $F_{drift}$ [see Eq. (1B) in the Materials and Methods section for a more precise definition]. When 0.5 is used as the base frequency for all loci, as is sometimes proposed, the relationship matrix $\mathbf{G}_{VR0.5}$ is proportional to homozygosity and molecular coancestry (Toro et al., 2014).
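Hence, $\mathbf{G}_{VR0.5}$ (denoted $\mathbf{G}_{0.5}$ hereafter) may be used to measure homozygosity-based inbreeding, $F_{hom}$, and hence the loss of heterozygosity, since $1-F_{hom}$ is the proportion of base heterozygosity retained.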
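The weighted-versus-simple-mean relationship between $\mathbf{G}_{VR1}$ and $\mathbf{G}_{VR2}$, and the link between the 0.5-frequency matrix and homozygosity, can be checked numerically. The Python sketch below assumes allele counts coded 0/1/2 in a matrix `M` (individuals × loci) and known base frequencies `p0`; the random data and the function names are illustrative only.

```python
import numpy as np


def grm_vr2(M, p0):
    """VanRaden Method 2: genotypes standardized per locus, averaged over loci."""
    X = (M - 2 * p0) / np.sqrt(2 * p0 * (1 - p0))
    return X @ X.T / M.shape[1]


def grm_vr1(M, p0):
    """VanRaden Method 1: centred genotypes, scaled by sum_k 2 p0_k (1 - p0_k)."""
    Z = M - 2 * p0
    return Z @ Z.T / np.sum(2 * p0 * (1 - p0))


rng = np.random.default_rng(1)
n_ind, n_snp = 6, 50
p0 = rng.uniform(0.1, 0.9, n_snp)                # assumed base-population frequencies
M = rng.binomial(2, p0, size=(n_ind, n_snp))     # allele counts 0/1/2

G2 = grm_vr2(M, p0)
G1 = grm_vr1(M, p0)
# Single-locus estimates: at one locus, Methods 1 and 2 coincide
G_k = np.stack([grm_vr2(M[:, [k]], p0[[k]]) for k in range(n_snp)])
w = 2 * p0 * (1 - p0)                            # Method 1 weights per locus

# Setting every base frequency to 0.5 gives the homozygosity-based matrix
G05 = grm_vr1(M, np.full(n_snp, 0.5))
hom_fraction = np.mean(M != 1, axis=1)           # proportion of homozygous loci
```

In this sketch the diagonal of the 0.5-frequency matrix equals twice each individual's proportion of homozygous loci, which is why it tracks homozygosity rather than drift.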

The use of a genomic relationship matrix based on linkage analysis, $\mathbf{G}_{LA}$, for inbreeding management was suggested and studied by Toro et al. (1998), Wang (2001), Pong-Wong and Woolliams (2007), Fernandez et al. (2005), and Villanueva et al. (2005). Here the inheritance of the marker alleles is used to determine the probabilities of inheriting the maternal or paternal allele of a parent at the marker loci, instead of assuming the 50/50 inheritance probabilities used in $\mathbf{A}$. $\mathbf{G}_{LA}$ thus requires pedigree and marker information, and IBD relationships are relative to the (assumed) unrelated and non-inbred base population, as in $\mathbf{A}$. In this way, IBD is evaluated directly by $\mathbf{G}_{LA}$, rather than being an expectation for neutral unlinked loci as described above for $\mathbf{G}_{VR2}$. If two (base) individuals are unrelated in $\mathbf{A}$, then they are unrelated in $\mathbf{G}_{LA}$, whereas the other measures also estimate (non-zero) relationships among base population individuals. The marker data account for Mendelian segregation, which may deviate from 50/50 probabilities through linkage drag from loci under selection or through selective advantage. $\mathbf{G}_{LA}$ can be constructed by a tabular method similar to that for the pedigree-based relationship matrix (Fernando and Grossman, 1989), and software for the simultaneous linkage analysis of an entire chromosome is available (e.g., LDMIP (Linkage Disequilibrium Multilocus Iterative Peeling); Meuwissen and Goddard, 2010). $\mathbf{G}_{LA}$ specifically describes IBD across the genome; hence this IBD-based estimate of inbreeding will be denoted $F_{IBD}$.
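For orientation, the recursive tabular method for the pedigree-based $\mathbf{A}$ is sketched below in Python. $\mathbf{G}_{LA}$ follows the same kind of recursion but replaces the fixed ½ Mendelian sampling weights with marker-informed inheritance probabilities, which this sketch does not attempt. The pedigree encoding (parents listed before offspring, -1 for an unknown parent) and the function name are assumptions for illustration.

```python
import numpy as np


def tabular_A(sire, dam):
    """Pedigree relationship matrix A by the tabular method.

    sire[i], dam[i] are indices of parents (< i), or -1 if unknown;
    animals must be ordered so that parents precede their offspring.
    """
    n = len(sire)
    A = np.zeros((n, n))
    for i in range(n):
        s, d = sire[i], dam[i]
        # Off-diagonals: average of the parents' relationships with animal j
        for j in range(i):
            a_js = A[j, s] if s >= 0 else 0.0
            a_jd = A[j, d] if d >= 0 else 0.0
            A[i, j] = A[j, i] = 0.5 * (a_js + a_jd)
        # Diagonal: 1 + F_i, where F_i is half the relationship between the parents
        A[i, i] = 1.0 + (0.5 * A[s, d] if s >= 0 and d >= 0 else 0.0)
    return A


# Two founders (0, 1), two full sibs (2, 3), and an offspring of the full sibs (4)
sire = [-1, -1, 0, 0, 2]
dam = [-1, -1, 1, 1, 3]
A = tabular_A(sire, dam)
print(A)
```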

A run of homozygosity (ROH) is an uninterrupted sequence of homozygous markers (McQuillan et al., 2008). The exact definition of an ROH differs among studies, since ancillary constraints are added on the minimum length of an ROH measured in markers and/or cM, the minimum marker density, and in some cases an allowance for a few heterozygous genotypes arising from genotyping errors. The idea is that a run of homozygous markers indicates an IBD segment, since it is unlikely that many consecutive homozygous markers are IBS by chance alone. The total length of ROH relative to the total genome length provides an estimate of $F_{IBD}$ from the DNA itself, and this estimate will be denoted $F_{ROH}$. The reference population for $F_{ROH}$ is unclear, although by varying the constraint on ROH length the emphasis can be shifted from old inbreeding, with short ROHs, to recent inbreeding, with long ROHs (Keller et al., 2011). $F_{ROH}$ may miss some relevant inbreeding, since IBD segments shorter than the minimum length are neglected. On the one hand, $F_{ROH}$ is an IBD-based measure of inbreeding, as it attempts to identify IBD segments (especially when ROHs are long); on the other hand, it is a homozygosity-based measure, since it is actually based on the homozygosity of haplotypes (especially when ROHs are short). However, $F_{ROH}$ is a measure of inbreeding in a single individual and is unsuitable as a measure of IBD within the population as a whole. Therefore, integrating ROH into a GOC framework requires a pairwise measure to form a similarity matrix, $\mathbf{G}_{ROH}$ (de Cara et al., 2013).
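A minimal Python sketch of ROH detection and $F_{ROH}$ for one individual on one chromosome is given below. The thresholds `min_snps` and `min_len_cm` are hypothetical values, no heterozygous genotypes are tolerated within a run, and the pairwise $\mathbf{G}_{ROH}$ of de Cara et al. (2013) is not reproduced.

```python
import numpy as np


def find_roh(genotypes, pos_cm, min_snps=20, min_len_cm=1.0):
    """Return (first, last) SNP indices of runs of homozygous genotypes (0 or 2)."""
    hom = np.append(genotypes != 1, False)   # sentinel closes a run at the end
    runs, start = [], None
    for k, is_hom in enumerate(hom):
        if is_hom and start is None:
            start = k
        elif not is_hom and start is not None:
            last = k - 1
            long_enough = (last - start + 1 >= min_snps
                           and pos_cm[last] - pos_cm[start] >= min_len_cm)
            if long_enough:
                runs.append((start, last))
            start = None
    return runs


def f_roh(genotypes, pos_cm, genome_length_cm, **thresholds):
    """Total ROH length divided by the genome length covered."""
    runs = find_roh(genotypes, pos_cm, **thresholds)
    return sum(pos_cm[b] - pos_cm[a] for a, b in runs) / genome_length_cm


pos = np.arange(101, dtype=float)     # one 100 cM chromosome, 1 SNP per cM
geno = np.ones(101, dtype=int)        # heterozygous everywhere ...
geno[10:50] = 0                       # ... except a 40-SNP homozygous segment
geno[70:75] = 2                       # and a 5-SNP run, too short to count
runs = find_roh(geno, pos)
F = f_roh(geno, pos, genome_length_cm=100.0)
print(runs, F)
```

Raising `min_len_cm` in this sketch shifts the emphasis toward long, recent IBD segments, mirroring the trade-off described above.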

The aims of this study are to: (i) re-examine the goals of the management of genetic diversity in breeding schemes, and the molecular genetic parameters that may be incorporated into these goals; and (ii) compare alternative genomic- and pedigree-based measures of inbreeding and relationships for addressing these goals. In doing so, the different tools discussed above and some novel variants will be compared for their ability to generate gain in breeding schemes while measures of inbreeding are constrained. Finally, conclusions are drawn on the practical implementation of these tools for managing diversity, and on how the outcomes depend on whether whole genome sequence (WGS) data or marker panels are considered.

Materials and Methods

The Goals of the Management of Genetic Diversity

Managed populations, such as livestock, generally have many desirable characteristics (related to production, reproduction, disease resistance, etc.). Some of these characteristics are to be improved (the breeding goal traits) without jeopardizing the others; safeguarding the latter is the aim of the management of inbreeding. Specifically, breeding programs aim to change allele frequencies at the QTL in the desired direction. This ultimately results in loss of variation at the QTL as fixation approaches, but provided these changes are in the right direction, this loss of variation is not a problem. However, genetic drift away from the reference population and loss of variation at loci that are neutral for the breeding goal are to be avoided, for the following reasons. Firstly, to alleviate the risk of inbreeding depression through decreased heterozygosity, particularly for traits that are not under artificial selection but are needed for the healthy functioning of the animals. Secondly, deleterious recessive alleles may drift to high frequencies and then occur more frequently in their deleterious or lethal homozygous form; although mentioned separately, this is a specific manifestation of inbreeding depression. In the genomics era, deleterious recessives may be identified and mapped (Charlier et al., 2008), and once mapped they may be selected against (at the cost of selection pressure) or potentially gene-edited. Nonetheless, simultaneous selection against many genetic defects diverts substantial selection pressure away from other traits in the breeding goal. Thirdly, loss of variation arising from selective sweeps for the current goal may erase variation for traits that are not currently of interest but may be valued in the future, and so limit future selection opportunities. Fourthly, genetic drift, in the sense of random changes of allele frequencies and thus random changes of trait values, may itself be deleterious.
This encompasses traits both outside and within the current breeding goal; within it, drift is observed as variability in the selection response. Moreover, large random changes in allele frequency may disrupt positive additive-by-additive interactions between QTL that have arisen over many generations of natural and/or artificial selection (similar to recombination losses in crossbreeding; Kinghorn, 1980). In addition, random allele frequency changes may result in the loss of rare alleles, which implies a permanent loss of variation.

Measures for Management of Inbreeding

Whilst genomics offers molecular measures for direct monitoring, most obviously heterozygosity and frequency changes measured from a panel of anonymous markers, the strategy for managing these diverse problems using genomics does not follow directly. For example, increasing heterozygosity per se, achieved by moving allele frequencies of marker loci toward ½, is not solely beneficial: while it potentially ameliorates the first and third problems above, it is deleterious for the second and fourth. Both the empirical heterozygosity and the change of frequencies from drift can be considered as measures of inbreeding and diversity. Wright (1922) states that a natural inbreeding coefficient moves between 0 and 1 as heterozygosity under random mating moves between its initial state and 0. Therefore, if locus $k$ has initial frequency $p_{0,k}$ and current frequency $p_{t,k}$, a measure of inbreeding is $1-H_{t,k}/H_{0,k} = 1-[2p_{t,k}(1-p_{t,k})]/[2p_{0,k}(1-p_{0,k})]$, which can be generalized by averaging over loci to obtain $F_{hom}$, i.e.,

$$F_{hom} = 1 - \frac{1}{N_{SNP}}\sum_{k=1}^{N_{SNP}} \frac{H_{t,k}}{H_{0,k}} \tag{1A}$$

where $N_{SNP}$ is the total number of loci. $F_{hom}$ can be negative when heterozygosity increases because allele frequencies move toward 0.5. Similarly, drift can be measured as $\delta p_{t,k}^2 = (p_{t,k}-p_{0,k})^2$, scaled by its expected value under complete random inbreeding, i.e., $\delta p_{t,k}^2/[p_{0,k}(1-p_{0,k})]$, and averaged over loci to obtain $F_{drift}$, i.e.,

$$F_{drift} = \frac{1}{N_{SNP}}\sum_{k=1}^{N_{SNP}} \frac{(p_{t,k}-p_{0,k})^2}{p_{0,k}(1-p_{0,k})} \tag{1B}$$

which is never negative. $F_{drift}$ is similar to the definition of $F_{ST}$ (Holsinger and Weir, 2009), here applied to a single population over time instead of a sample of populations, and it is this empirical measure that is directly addressed when using $\mathbf{G}_{VR2}$.
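The two empirical measures can be computed directly from base and current allele frequencies. The Python sketch below (illustrative frequencies and function names) shows that pulling frequencies toward ½ makes $F_{hom}$ negative while $F_{drift}$ remains positive.

```python
import numpy as np


def f_hom(p0, pt):
    """One minus the mean ratio of current to base heterozygosity over loci."""
    return 1.0 - np.mean((2 * pt * (1 - pt)) / (2 * p0 * (1 - p0)))


def f_drift(p0, pt):
    """Mean over loci of the squared frequency change scaled by p0 (1 - p0)."""
    return np.mean((pt - p0) ** 2 / (p0 * (1 - p0)))


p0 = np.array([0.2, 0.5, 0.8])
pt = np.array([0.3, 0.5, 0.7])        # frequencies pulled toward 1/2
print(f_hom(p0, pt), f_drift(p0, pt))
```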

For locus $k$ in the set of neutral loci, with frequency $p_{0,k}$ in the base population and frequency $p_{t,k} = p_{0,k}+\delta p_{t,k}$ in generation $t$, twice the allele frequency in generation $t$ satisfies $2p_{t,k}^2 + H_{t,k} = 2(p_{0,k}+\delta p_{t,k})$, where $H_{t,k} = 2(p_{0,k}+\delta p_{t,k})(1-p_{0,k}-\delta p_{t,k})$; this holds for all loci assuming random mating. For a sufficiently large subset of neutral loci with the same base frequency $p_0$, if $E[\delta p_{t,k}\mid p_0] = 0$, then taking expectations over this subset gives $2E[p_{t,k}^2] + E[H_{t,k}] = 2p_0$, and so $2(E[p_{t,k}^2]-p_0^2) + E[H_{t,k}] = 2p_0(1-p_0)$. Since $E[p_{t,k}] = p_0$, the first term is $2\,\mathrm{var}(p_{t,k})$ and the second is the expected heterozygosity $E[H_{t,k}]$; dividing through by $H_0 = 2p_0(1-p_0)$ gives

$$\frac{\mathrm{var}(p_{t,k})}{p_0(1-p_0)} + \frac{E[H_{t,k}]}{H_0} = 1 \tag{2}$$

Therefore, if $E[\delta p_{t,k}\mid p_0] = 0$ over the range $0 < p_0 < 1$, then, since $\mathrm{var}(p_{t,k})/[p_0(1-p_0)]$ is the expectation of $F_{drift}$ and $E[H_{t,k}]/H_0$ that of $1-F_{hom}$, $F_{drift}$ and $F_{hom}$ are equivalent irrespective of the initial frequency $p_0$ (Falconer and Mackay, 1996): i.e., drift- and homozygosity-based inbreeding are expected to be the same if allele frequency changes are on average 0 irrespective of the initial frequency.
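This equivalence can be illustrated by simulating pure neutral drift, where $E[\delta p_{t,k}\mid p_0]=0$ holds by construction. The Python sketch below uses binomial (Wright-Fisher) sampling of 200 genes per generation at independent loci; the population size, number of loci, range of base frequencies and seed are illustrative assumptions, not the settings of this study. Both measures should lie close to each other and to the classical expectation $1-(1-1/200)^{20}$.

```python
import numpy as np


def wright_fisher(p0, n_genes, generations, rng):
    """Neutral drift: each generation, draw n_genes alleles binomially."""
    p = p0.copy()
    for _ in range(generations):
        p = rng.binomial(n_genes, p) / n_genes
    return p


rng = np.random.default_rng(2024)
n_loci, n_genes, t = 5000, 200, 20        # 200 genes = 100 diploid parents
p0 = rng.uniform(0.1, 0.9, n_loci)
pt = wright_fisher(p0, n_genes, t, rng)

F_drift = np.mean((pt - p0) ** 2 / (p0 * (1 - p0)))
F_hom = 1.0 - np.mean(pt * (1 - pt) / (p0 * (1 - p0)))
F_expected = 1.0 - (1.0 - 1.0 / n_genes) ** t
print(F_drift, F_hom, F_expected)
```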

Using a form of GOC related to $\mathbf{G}_{VR1}$ (see Discussion), de Beukelaer et al. (2017) explored the management of diversity and derived its consequences for the rate of increase in homozygosity, $2(\delta p_{t,k}^2 + 2\delta p_{t,k}(p_0-\tfrac{1}{2}))/H_{t,k}$. They suggested (as supported by the results below) that the term $\delta p_{t,k}(p_0-\tfrac{1}{2})$, which represents a covariance between the allele frequency change $\delta p_{t,k}$ and the initial frequency $p_{0,k}$ across loci $k$, may be non-zero. Consequently, $E[\delta p_{t,k}\mid p_0] \neq 0$, Eq. (2) no longer holds, and $F_{drift} \neq F_{hom}$. Supplementary Information 1 shows that, for a general set of loci with $E[\delta p_{t,k}] = 0$ over the set (not necessarily with the same initial frequency), any deviation from Eq. (2) must be explained by a covariance between allele frequency changes and the original frequencies, $\mathrm{cov}(\delta p_{t,k}; p_{0,k})$, and shows:

i.e., if there is a covariance between initial allele frequencies and frequency changes, homozygosity- and drift-based inbreeding are no longer equal. This covariance will therefore be important in determining the impact of genomic management, which aims to manage both the increase of homozygosity and genetic drift.

Supplementary Information 1 explores why completely random selection of parents (i.e., with no management) generates no covariance, and how different broad management goals for diversity may generate covariances of different signs. In particular, with completely random selection, most markers drift toward the nearest extreme, with the smaller change in frequency, but a minority move to the opposite extreme, with the larger change in frequency, giving a net result of no covariance. When GOC is based on $\mathbf{G}_{VR2}$, these latter large allele frequency changes are penalized more heavily, since they enter the elements of $\mathbf{G}_{VR2}$, and consequently $\tfrac{1}{2}\mathbf{c}'\mathbf{G}\mathbf{c}$, as $\delta p_{t,k}^2$. Hence, the hypothesis tested below is that $\mathbf{G}_{VR2}$ favours the movement of the minor allele frequency (MAF) toward 0 and, more generally, the movement of allele frequencies away from intermediate values toward the nearest extreme, resulting in $\mathrm{cov}(\delta p_{t,k}; p_{0,k}) > 0$ and $\mathrm{var}(p_{t,k})/[p_0(1-p_0)] + E[H_{t,k}/H_{0,k}] < 1$, contrary to the expectation in Eq. (2).

Conversely, if $\mathbf{G}_{0.5}$ is used in GOC, there will be pressure to move allele frequencies toward 0.5, increasing heterozygosity (Li and Horvitz, 1953). Supplementary Information 1 shows that this results in $\mathrm{cov}(\delta p_{t,k}; p_{0,k}) < 0$, and thus $F_{hom} < 0$, $F_{drift} > 0$, and $\mathrm{var}(p_{t,k})/[p_0(1-p_0)] + E[H_{t,k}/H_{0,k}] > 1$, again contrary to the expectation in Eq. (2). Furthermore, these considerations imply that the covariance $\mathrm{cov}(\delta p_{t,k}; p_{0,k})$ is a property of the active management of diversity using squared frequency changes, as in $\mathbf{G}_{VR2}$ (or $\mathbf{G}_{VR1}$), and not a consequence of directional selection. This hypothesis is tested below in two ways: firstly, by combining the management of diversity using $\mathbf{G}_{VR2}$ with randomly generated EBVs; and secondly, by using a panel of markers for managing diversity that is distinct from the panel used for estimating GEBVs for genomic selection.
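The link between the sign of the covariance and the gap between $F_{hom}$ and $F_{drift}$ can be made concrete with a per-locus identity that follows directly from the definitions above (an illustrative derivation, not necessarily the form given in Supplementary Information 1): $1-H_{t,k}/H_{0,k} = \delta p_{t,k}^2/[p_{0,k}(1-p_{0,k})] + \delta p_{t,k}(2p_{0,k}-1)/[p_{0,k}(1-p_{0,k})]$, so the second term is the per-locus excess of homozygosity-based over drift-based inbreeding. The Python sketch below applies it to three hand-made patterns of frequency change (illustrative values only).

```python
import numpy as np


def f_hom(p0, pt):
    return 1.0 - np.mean(pt * (1 - pt) / (p0 * (1 - p0)))


def f_drift(p0, pt):
    return np.mean((pt - p0) ** 2 / (p0 * (1 - p0)))


def hom_minus_drift(p0, pt):
    """Per-locus F_hom - F_drift = delta (2 p0 - 1) / [p0 (1 - p0)]."""
    delta = pt - p0
    return delta * (2 * p0 - 1) / (p0 * (1 - p0))


p0 = np.array([0.2, 0.4, 0.6, 0.8])
step = 0.1 * np.sign(p0 - 0.5)
patterns = {
    "toward_extremes": p0 + step,                            # cov(delta, p0) > 0
    "toward_half": p0 - step,                                # cov(delta, p0) < 0
    "no_covariance": p0 + 0.1 * np.array([1, -1, -1, 1]),    # cov(delta, p0) = 0
}
for name, pt in patterns.items():
    print(name, f_hom(p0, pt), f_drift(p0, pt))
```

Movement toward the extremes (the pattern hypothesised for $\mathbf{G}_{VR2}$) gives $F_{hom} > F_{drift}$, whereas movement toward ½ (the pattern hypothesised for $\mathbf{G}_{0.5}$) gives $F_{hom} < 0 < F_{drift}$.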

The term $\delta p_{t,k}^2/[p_{0,k}(1-p_{0,k})]$ appearing in $F_{drift}$ can be viewed as an approximation to the squared total selection intensity ($i^2$) applied to the marker, where $i \approx \delta p_{t,k}/\sqrt{p_{0,k}(1-p_{0,k})}$. The approximation arises because the total selection intensity applied to a marker is not linear in frequency (see Liu and Woolliams, 2010). For example, after the initial generation, the intensity applied to alleles moved toward ½ is overestimated, since the denominator of $i$ increases over time, which reduces the actual intensity applied. The opposite holds for alleles moved toward the nearest extreme. Therefore, a further hypothesis is that a relationship matrix built upon $i^2$ rather than $\delta p_{t,k}^2$, denoted $\mathbf{G}_{i(p)}$, may remove the covariance between the change in frequency and the initial frequency that is generated using $\mathbf{G}_{VR2}$. More details on this and on the calculation of $\mathbf{G}_{i(p)}$ are given in Supplementary Information 2.
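The direction of this bias can be illustrated by comparing the base-frequency approximation with a version whose denominator is updated continuously as the frequency changes, i.e., the integral of $dp/\sqrt{p(1-p)}$, which equals $2[\arcsin\sqrt{p_t}-\arcsin\sqrt{p_0}]$. Treating this integral as the "total intensity" is an assumption made here for illustration; it is not necessarily how $\mathbf{G}_{i(p)}$ is computed in Supplementary Information 2.

```python
import numpy as np


def linear_intensity(p0, pt):
    """Approximation i ~ (pt - p0) / sqrt(p0 (1 - p0)), using the base frequency only."""
    return (pt - p0) / np.sqrt(p0 * (1 - p0))


def integrated_intensity(p0, pt):
    """Integral of dp / sqrt(p (1 - p)) from p0 to pt: same scaling, but with the
    denominator updated continuously as the frequency changes."""
    return 2 * (np.arcsin(np.sqrt(pt)) - np.arcsin(np.sqrt(p0)))


# From p0 = 0.2: a move toward 1/2 versus a move to the nearest extreme
print(linear_intensity(0.2, 0.4), integrated_intensity(0.2, 0.4))
print(linear_intensity(0.2, 0.0), integrated_intensity(0.2, 0.0))
```

In this sketch the base-frequency approximation overstates the intensity for the move toward ½ and understates it for the move to the extreme, while the two agree for small changes.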

In classical theory, the equivalence of $F_{drift}$ with $F_{hom}$ under random mating is an outcome of considering, and managing by, IBD. Genomic relationship matrices based on allele frequency changes, or functions of these changes, no longer consider IBD, as they consider only IBS. Supplementary Information 3 considers the IBD properties of the linkage analysis relationship matrix $\mathbf{G}_{LA}$, which is derived from the markers. For the management of diversity over generations using $\mathbf{G}_{LA}$, Supplementary Information 3 concludes that $\delta p_{t,k}$ will be determined by the properties of the base population and not by linkage disequilibrium generated in the course of the selection process. Therefore, the covariance between the change in frequency and its initial value is potentially avoided. This leads to a further hypothesis tested below: that if $\mathbf{G}_{LA}$ replaces $\mathbf{G}_{VR2}$ in GOC, then $F_{drift} = F_{hom}$ and $\mathrm{var}(p_{t,k})/[p_0(1-p_0)] + E[H_{t,k}/H_{0,k}] = 1$, as expected from Eq. (2); i.e., consideration of IBD restores the equivalence of $F_{drift}$ and $F_{hom}$ for a set of neutral markers. The same hypothesis may be advanced if $\mathbf{A}$ or the ROH-based $\mathbf{G}_{ROH}$ replaces $\mathbf{G}_{LA}$, given their focus on approximating IBD; however, both are approximations to the true genomic IBD tracked by $\mathbf{G}_{LA}$, so the equivalence may only be approximate.

In summary, there is a range of hypotheses to be tested on three categories of relationship matrix: those based on drift, i.e., on changes in allele frequency or functions of them ($\mathbf{G}_{VR1}$, $\mathbf{G}_{VR2}$, and $\mathbf{G}_{i(p)}$); those based on homozygosity, exemplified by $\mathbf{G}_{0.5}$; and those based on IBD ($\mathbf{G}_{LA}$ and $\mathbf{A}$). A relationship matrix based on ROH, $\mathbf{G}_{ROH}$, is a hybrid of the latter two, targeting IBD by measuring the homozygosity of haplotypes.

Breeding Structure and Genomic Architecture

A computer simulation study was conducted to compare these alternative GOC methods. The simulations mimicked a breeding scheme using sib-testing, such as those used for disease challenges in fish breeding, similar to that of Sonesson et al. (2012). The scheme had a nucleus in which selection of candidates was based entirely on their genomic data, and performance recording was carried out solely on the full-sibs of the selection candidates, which were also genotyped. This scheme may be considered extreme in the sense that the candidates themselves have no performance records; it is practiced in aquaculture to prevent disease infections within the breeding population. There were 2000 young fish per generation, and every full-sib family was split in two: half of the sibs became selection candidates and the other half test-sibs. The actual number of families and their sizes depended on the optimal contributions of the parents.

The genome consisted of 10 chromosomes, each 1 Morgan long. Base population genomes were simulated for a population of effective size $N_e = 100$ for 400 ($=4N_e$) generations, with SNP mutations occurring at a rate of $10^{-8}$ per base pair per generation under the infinite-sites model. This resulted in WGS data for base population genomes that were in mutation-drift-linkage disequilibrium balance. The historical population size was chosen to equal the effective population size targeted for the breeding schemes, so as to avoid any effect of a sudden large change in effective population size. The simulation yielded 33,129 segregating SNP loci, a relatively small number owing to the small effective size of 100. From these loci, $N_{SNP} = 7000$ were randomly sampled as marker loci for obtaining GEBV by genomic selection (Panel M); another distinct sample of 7000 loci was randomly sampled as additive QTL, each of which obtained an allelic effect sampled from the Normal distribution (Panel Q); and a further distinct sample of 7000 SNP loci was randomly sampled to act as "neutral loci" (Panel N), which were used to assess allele frequency changes and loss of heterozygosity at neutral (anonymous) WGS loci not involved in either genomic prediction or diversity management. In the majority of schemes, Panel M was used for constructing genomic relationship matrices both for obtaining EBVs and for diversity management. However, to test whether the non-neutrality of the SNPs used for genomic prediction interfered with their simultaneous use for diversity management, a further distinct panel of 7000 randomly picked loci (Panel D) was used for diversity management in some schemes.
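The sampling of non-overlapping panels described above can be sketched as follows. The seed, the use of a single random permutation, and the variable names are illustrative assumptions rather than the code used in the study.

```python
import numpy as np

rng = np.random.default_rng(7)
n_segregating, n_per_panel = 33_129, 7000
order = rng.permutation(n_segregating)           # random order of all segregating SNPs
# Consecutive, non-overlapping slices of the permutation give disjoint panels
panels = {
    name: np.sort(order[i * n_per_panel:(i + 1) * n_per_panel])
    for i, name in enumerate(["M", "Q", "N", "D"])
}
qtl_effects = rng.normal(0.0, 1.0, n_per_panel)  # Normal allelic effects for Panel Q
print({name: idx[:5] for name, idx in panels.items()})
```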

True breeding values were obtained by summing the effects of the QTL alleles across the loci in Panel Q, and were then scaled such that the total genetic variance in the base population was $\sigma_g^2 = 1$. Phenotypes were obtained by adding a randomly sampled environmental effect with variance $\sigma_e^2 = 1.5$, giving a heritability of $1/(1+1.5) = 0.4$. After the initial 400 unselected generations used to simulate the base population ($t = 0$), the breeding schemes described below were run for 20 generations, the first of which comprised random selection in order to create an initial sib-family structure.
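A minimal sketch of this scaling step in Python follows. The QTL frequencies, the use of independent (LD-free) genotypes, and the sample sizes are illustrative assumptions; only $\sigma_g^2 = 1$, $\sigma_e^2 = 1.5$, Normal allelic effects with equal variance, and the resulting heritability of 0.4 come from the text.

```python
import numpy as np

rng = np.random.default_rng(11)
n_ind, n_qtl = 5000, 500                  # illustrative sizes, not those of the study
sigma2_g, sigma2_e = 1.0, 1.5

p_q = rng.uniform(0.05, 0.95, n_qtl)                 # assumed QTL frequencies
Q = rng.binomial(2, p_q, size=(n_ind, n_qtl))        # QTL allele counts, no LD assumed
alpha = rng.normal(0.0, 1.0, n_qtl)                  # equal-variance allelic effects

tbv = Q @ alpha
tbv = tbv * np.sqrt(sigma2_g) / tbv.std()            # scale to sigma2_g in the base
y = tbv + rng.normal(0.0, np.sqrt(sigma2_e), n_ind)  # phenotypes

h2_target = sigma2_g / (sigma2_g + sigma2_e)
h2_realized = np.var(tbv) / np.var(y)
print(h2_target, h2_realized)
```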

Genomic Estimates of Breeding Values

GEBV ($\hat{\mathbf{g}}$) were obtained by the SNP-BLUP method (Meuwissen et al., 2001), in which BLUP estimates of SNP effects were obtained from a random regression on the SNP genotypes of Panel M, coded as $X_{ik} = -2p_{0,k}/\sqrt{2p_{0,k}(1-p_{0,k})}$, $(1-2p_{0,k})/\sqrt{2p_{0,k}(1-p_{0,k})}$, or $(2-2p_{0,k})/\sqrt{2p_{0,k}(1-p_{0,k})}$ for the homozygote, heterozygote, and alternative homozygote genotypes, respectively, of the $k$th SNP of animal $i$, where $p_{0,k}$ is the frequency in generation 0 of a randomly chosen reference allele of the $k$th SNP. The model for the BLUP estimation of the SNP effects was:

$$\mathbf{y} = \mathbf{1}\mu + \mathbf{X}\mathbf{b} + \mathbf{e}$$

where $\mathbf{y}$ is a vector of records; $\mu$ is the overall mean; $\mathbf{X}$ is the matrix of genotype codes described above; $\mathbf{b}$ is a vector of random SNP effects [a priori, $\mathbf{b} \sim MVN(\mathbf{0}, \sigma_g^2 N_{SNP}^{-1}\mathbf{I})$]; and $\mathbf{e}$ is a vector of random residuals [a priori, $\mathbf{e} \sim N(\mathbf{0}, \sigma_e^2\mathbf{I})$]. GEBV were obtained as $\hat{\mathbf{g}} = \mathbf{X}\hat{\mathbf{b}}$, where $\hat{\mathbf{b}}$ denotes the BLUP estimates of the SNP effects. This model is often implemented as GBLUP using VanRaden (2008) Model 2, which assumes that all loci explain an equal proportion of the genetic variance. When simulating true breeding values, the variances of allelic effects were equal across loci, which implies that high-MAF QTL explain more variance than low-MAF QTL. Hence, there is a discrepancy between the simulation model and the model used for analysis; however, such discrepancies always occur with real data. To separate the effects of selection and inbreeding management, one of the schemes described below used GEBVs sampled each generation from a Normal distribution.
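The SNP-BLUP model above can be solved through its mixed-model equations, where the prior variance $\sigma_g^2/N_{SNP}$ of each SNP effect implies a shrinkage parameter $\lambda = \sigma_e^2 N_{SNP}/\sigma_g^2$. The Python sketch below solves these equations directly on simulated data; the data sizes, seed and overall mean of 5 are illustrative assumptions, and a direct solve is only practical for small panels (large-scale implementations typically use iterative solvers or the equivalent GBLUP form).

```python
import numpy as np


def snp_blup(y, X, sigma2_g, sigma2_e):
    """Solve the mixed-model equations of y = 1 mu + X b + e,
    with b ~ N(0, sigma2_g / N_SNP * I) and e ~ N(0, sigma2_e * I)."""
    n, n_snp = X.shape
    lam = sigma2_e * n_snp / sigma2_g            # sigma2_e / var(b_k)
    W = np.column_stack([np.ones(n), X])         # design matrix for [mu, b]
    lhs = W.T @ W
    lhs[1:, 1:] += lam * np.eye(n_snp)           # shrink SNP effects, not mu
    sol = np.linalg.solve(lhs, W.T @ y)
    return sol[0], sol[1:]


rng = np.random.default_rng(3)
n, n_snp = 600, 300
p0 = rng.uniform(0.1, 0.9, n_snp)
M = rng.binomial(2, p0, size=(n, n_snp))
X = (M - 2 * p0) / np.sqrt(2 * p0 * (1 - p0))    # standardized genotype coding
b_true = rng.normal(0.0, np.sqrt(1.0 / n_snp), n_snp)
g_true = X @ b_true
y = 5.0 + g_true + rng.normal(0.0, np.sqrt(1.5), n)

mu_hat, b_hat = snp_blup(y, X, sigma2_g=1.0, sigma2_e=1.5)
gebv = X @ b_hat
print(mu_hat, np.corrcoef(gebv, g_true)[0, 1])
```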

Assessing the Rates of Inbreeding at Neutral Loci

$F_{hom}$ and $F_{drift}$ were calculated for each scheme, and since discrepancies were anticipated (Supplementary Information 1), $\Delta F$ was also calculated from both heterozygosity and drift, giving $\Delta F_{hom}$ and $\Delta F_{drift}$. The calculations described below were done for all schemes using Panel N, whose loci were both functionally neutral (not influencing the breeding goal traits) and algorithmically neutral (not involved in breeding value prediction or diversity management). The calculations were repeated for Panel M, and for Panel D when it was used.

Heterozygosity

The calculation was based on classical models in which, for generation $t$, $\frac{1}{N_{SNP}}\sum_k H_{t,k}/H_{0,k} = 1-F_{hom} = (1-\Delta F)^t$, where $\Delta F$ is the rate of inbreeding and $N_{SNP}$ is the number of loci in the panel. A log transformation yields the linear relationship $\log\left(\sum_k H_{t,k}/H_{0,k}\right) - \log(N_{SNP}) = t\log(1-\Delta F) \approx -t\,\Delta F$, where the approximation holds for small $\Delta F$ when natural logarithms are used. This regression on $t$ was calculated and provided both a test of constant $\Delta F_{hom}$ and an estimate of $\Delta F_{hom}$ as $(-1)\times$ the slope of the regression.

At time $t$, $F_{drift}$ was calculated as $\frac{1}{N_{SNP}}\sum_k (p_{t,k}-p_{0,k})^2/[p_{0,k}(1-p_{0,k})]$. Analogously with heterozygosity, classical theory was followed by taking logs of $(1-F_{drift})$, with $\Delta F_{drift}$ estimated as $(-1)\times$ the slope of the regression on $t$.
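Both rates can be estimated with the same regression. The Python sketch below applies it to a noise-free series generated with $\Delta F = 0.005$ (an illustrative value corresponding to $N_e = 100$); in practice `F_t` would be the observed $F_{hom}$ or $F_{drift}$ per generation. Note that $(-1)\times$ the slope equals $-\log(1-\Delta F)$, which is close to, but slightly larger than, $\Delta F$.

```python
import numpy as np


def delta_f_from_series(F_t, t):
    """Rate of inbreeding as -1 x slope of log(1 - F_t) regressed on t."""
    slope, _intercept = np.polyfit(t, np.log(1.0 - np.asarray(F_t)), deg=1)
    return -slope


t = np.arange(1, 20)
F_t = 1 - (1 - 0.005) ** t          # classical expectation with Delta F = 0.005
dF = delta_f_from_series(F_t, t)
Ne = 1 / (2 * dF)                   # effective population size, N_e = 1/(2 Delta F)
print(dF, Ne)
```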

Optimum Contribution Selection Methods

In optimal contribution selection, the rate of inbreeding is constrained by constraining the increase of the group coancestry of the selected parents, $\bar{G} = \tfrac{1}{2}\mathbf{c}'\mathbf{G}\mathbf{c}$, where $\mathbf{G}$ denotes the relationship matrix of interest for managing diversity among the selection candidates, and $\mathbf{c}$ is the vector of contributions of the selection candidates to the next generation, which is proportional to their numbers of offspring. The group coancestry is therefore the average coancestry (half the relationship) among all pairs of parents, including self-pairings, weighted by the fraction of offspring from each pair assuming completely random mating. Subject to this constraint, the genetic level of the selected animals weighted by their numbers of offspring, $\bar{g} = \mathbf{c}'\hat{\mathbf{g}}$, is maximized. Hence, the optimisation is as follows:

A number of relationship matrices were investigated for managing diversity:

(i) the pedigree-based relationship matrix A;
(ii) the genomic relationship matrix G_VR2 = XX′/N_SNP (VanRaden, 2008; Model 2), constructed using Panel M;
(iii) the genomic relationship matrix G_VR1 = ZZ′/Σ_k H_0,k (VanRaden, 2008; Model 1), constructed using Panel M, where Z_ij = −2p_0j, 1 − 2p_0j, or 2 − 2p_0j;
(iv) G_0.5, a homozygosity-based relationship matrix whose elements (i, j) are proportional to the expected homozygosity of the progeny of animals i and j (Toro et al., 2014);
(v) G_LA, constructed from Panel M using linkage analysis (Fernando and Grossman, 1989; Meuwissen et al., 2011);
(vi) a novel relationship matrix G_i(p), constructed from squared total applied intensities using Panel M (see Supplementary Information 2);
(vii) the genomic relationship matrix G_ROH, based on ROH assessed using Panel M following the method of de Cara et al. (2013) (see Supplementary Information 2);
(viii) the genomic relationship matrix G_VR2 constructed using Panel D instead of Panel M.

In this replicated simulation study, calculating G_LA with LDMIP (Meuwissen and Goddard, 2010) was computationally too demanding. A haplotype-based approach was therefore adopted as an approximation (see Supplementary Information 2).
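The two VanRaden (2008) matrices can be sketched as follows. This sketch rests on several assumptions. Genotypes are coded as 0, 1, or 2 copies of the counted allele, so that Z_ij = g_ij − 2p_0j. X is taken to be Z with each column scaled by √(2p_0,k(1 − p_0,k)), as in VanRaden's second method; X is not defined in this section. The function name is hypothetical. When all base frequencies equal 0.5 the two matrices coincide, which gives a simple check.

```python
import math

def vanraden_matrices(genotypes, p0):
    """Return (G_VR1, G_VR2) for 0/1/2 genotypes (animals x loci) and base frequencies p0."""
    n_animals, n_snp = len(genotypes), len(p0)
    w = [2 * p * (1 - p) for p in p0]  # H_0,k = 2p(1 - p)
    # Centred genotypes: entries -2p, 1 - 2p, or 2 - 2p.
    z = [[g - 2 * p for g, p in zip(row, p0)] for row in genotypes]
    # Assumed standardization for X: each locus scaled by sqrt(2p(1 - p)).
    x = [[zk / math.sqrt(wk) for zk, wk in zip(row, w)] for row in z]
    sum_w = sum(w)
    g_vr1 = [[sum(a * b for a, b in zip(z[i], z[j])) / sum_w for j in range(n_animals)]
             for i in range(n_animals)]
    g_vr2 = [[sum(a * b for a, b in zip(x[i], x[j])) / n_snp for j in range(n_animals)]
             for i in range(n_animals)]
    return g_vr1, g_vr2

genotypes = [[0, 1, 2], [1, 1, 0], [2, 0, 1]]
g1, g2 = vanraden_matrices(genotypes, [0.5, 0.5, 0.5])    # equal frequencies: G_VR1 == G_VR2
g1b, g2b = vanraden_matrices(genotypes, [0.1, 0.5, 0.8])  # unequal frequencies: they differ
print(g1b[0][0], g2b[0][0])
```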

Implementation of Selection Procedures

Each simulated selection scheme is denoted by the relationship matrix used in GOC, followed in parentheses by the marker panels used for SNP-BLUP and for building the relationship matrix. The first symbol in parentheses refers to EBV estimation and the second to diversity management.

The panel for SNP-BLUP was either “M”, or “∼” when randomly generated GEBV were used. The “∼” option implements a scheme without directional selection and tests whether the observed results are due to selection or to diversity management. The panel for managing inbreeding was “M” or “D”, or “∼” when A was used, because A requires no marker panel.

A total of 9 schemes contribute to the results presented:
• 6 schemes of the form G (M,M), where G is G_VR1, G_VR2, G_0.5, G_LA, G_i(p), or G_ROH;
• 3 further schemes: A (M,∼), G_VR2 (M,D), and G_VR2 (∼,M).

The schemes are summarized in Table 1.


Table 1. The relationship matrices and marker panels that were used for the alternative breeding schemes.

For all schemes, the target ΔF was set to 0.005 per generation via the parameter K, so the target effective population size was N_e = 1/(2ΔF) = 100. The group coancestry of the parents in generation t was therefore set to

$$K_t = K_{t-1} + 0.005\,(1 - K_{t-1}),$$

where $K_0 = \tfrac{1}{2}\bar{G}$ and Ḡ denotes the average relationship among all candidates in generation 1 (the first generation with GOC selection).

Each scheme was replicated 100 times, with a new base population generated for each replicate as described above. To reduce simulation error, all alternative breeding schemes were simulated on each replicate of the initial generations, using the same Panels M, Q, N, and D and the same QTL effects. In each generation, males and females mated at random, with mating proportions guided by the optimum contributions c.
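The target schedule can be generated directly from the recursion. It is equivalent to the closed form K_t = 1 − (1 − K_0)(1 − ΔF)^t, so the distance to complete coancestry shrinks by a factor 1 − ΔF per generation. The starting average relationship below is a made-up value, and the function name is hypothetical.

```python
def coancestry_targets(k0, delta_f, generations):
    """Targets K_t = K_{t-1} + delta_f * (1 - K_{t-1}) for t = 1..generations."""
    targets = [k0]
    for _ in range(generations):
        previous = targets[-1]
        targets.append(previous + delta_f * (1 - previous))
    return targets

mean_relationship = 0.02  # hypothetical average relationship among candidates
k = coancestry_targets(0.5 * mean_relationship, 0.005, 20)
ne = 1 / (2 * 0.005)      # target effective population size
print(k[20], ne)
```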

G_LA and A are mathematically guaranteed to be positive definite. G_VR1, G_VR2, G_0.5, and G_i(p) are only guaranteed to be positive semi-definite (all eigenvalues λ_i ≥ 0). This is because they are cross-products of SNP genotype matrices (X or Z), with one eigenvalue of zero due to the centring of the genotypes. For these semi-definite matrices, a small value (α = 0.01) was added to the leading diagonal. This made them invertible and positive definite, as required by the optimal contribution algorithm of Meuwissen (1997).

In contrast, G_ROH is not guaranteed to be positive semi-definite, because its elements are calculated one by one. Large negative eigenvalues of G_ROH were observed empirically (results not shown). When G_ROH was inverted with a general matrix inversion routine, the achieved ΔF was much larger than 0.005 per generation. G_ROH was therefore made positive definite by adding substantial values of α to its diagonal, chosen by trial and error. Starting from α = 0.05, positive definiteness was tested by inversion using Cholesky decomposition. If this failed, α was doubled when α < 1 and increased by 1 otherwise, until inversion succeeded.
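The trial-and-error inflation of the diagonal can be sketched with a hand-written Cholesky factorization, used purely as a positive-definiteness test. The 3 × 3 matrix is a made-up non-positive-definite example, not an actual G_ROH; its smallest eigenvalue is about −0.22. With the stated schedule, α runs 0.05, 0.1, 0.2, 0.4 and stops at 0.4. The function names are hypothetical.

```python
import math

def is_positive_definite(matrix):
    """Attempt a Cholesky factorization; return False as soon as a non-positive pivot appears."""
    n = len(matrix)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                pivot = matrix[i][i] - s
                if pivot <= 0.0:
                    return False
                L[i][j] = math.sqrt(pivot)
            else:
                L[i][j] = (matrix[i][j] - s) / L[j][j]
    return True

def inflate_diagonal(matrix, alpha=0.05):
    """Add alpha to the diagonal until Cholesky succeeds: double alpha below 1, else add 1."""
    n = len(matrix)
    while True:
        trial = [[matrix[i][j] + (alpha if i == j else 0.0) for j in range(n)] for i in range(n)]
        if is_positive_definite(trial):
            return alpha, trial
        alpha = alpha * 2 if alpha < 1 else alpha + 1

g_roh = [[1.0, 0.9, 0.1],
         [0.9, 1.0, 0.9],
         [0.1, 0.9, 1.0]]  # hypothetical, not positive definite
alpha, inflated = inflate_diagonal(g_roh)
print(alpha)
```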

Figure 1 depicts the distribution of MAF for the SNPs in the WGS of the founder population (t = 0) observed in the simulations. The four SNP panels are random samples of these SNPs:
• M, the SNP-BLUP panel;
• N, the neutral marker panel;
• Q, the QTL panel;
• D, a second marker panel for genetic diversity management.

The MAF distribution is typical of whole genome sequence data, with very many SNPs carrying rare alleles and relatively few SNPs at intermediate allele frequencies.


Figure 1. Histogram of the minor allele frequencies (MAF) of the SNPs in the whole genome sequence of the founder population ( t = 0) observed in the simulations following 4000 generations of mutation and random selection.

Equivalence of F_drift and F_hom

Table 2 shows the drift- and homozygosity-based rates of inbreeding for the alternative breeding schemes, together with the deviations F_hom − F_drift in generation 20. Under classical inbreeding theory with random mating, the expectation is F_hom = F_drift = 1 − (1 − 0.005)^20 ≈ 0.095.

With two sexes, however, there will be deviations that depend on the number of mating parents. These numbers are shown in Figure 2, and the parents were approximately equally divided between males and females in each generation. Following Robertson (1965), this decreases F_hom at generation 20 below the random mating expectation by approximately 1/(2T), where T is the total number of parents. At generation 20, classical theory therefore expects F_drift to exceed F_hom by ∼0.001 for schemes G_ROH (M,M) and A (M,∼), by ∼0.005 for G_LA (M,M), and by ∼0.01 for G_VR2 (M,M).
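The classical numbers in this paragraph follow from two one-line formulas, sketched below with hypothetical function names. The parent totals T = 500, 100, and 50 are not read from Figure 2. They are simply the values at which 1/(2T) equals the quoted deviations of ∼0.001, ∼0.005, and ∼0.01.

```python
def expected_inbreeding(delta_f, t):
    """Classical F_t = 1 - (1 - delta_f)^t under a constant rate of inbreeding."""
    return 1 - (1 - delta_f) ** t

def non_random_mating_offset(total_parents):
    """Approximate reduction of F_hom below the random-mating expectation, 1/(2T)."""
    return 1 / (2 * total_parents)

f20 = expected_inbreeding(0.005, 20)
offsets = {t_parents: non_random_mating_offset(t_parents) for t_parents in (500, 100, 50)}
print(round(f20, 4), offsets)
```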


Table 2. Rates of increase of homozygosity (ΔF_hom) and drift (ΔF_drift), and the deviation F_hom − F_drift in generation 20, for different types of diversity measures for Panels M and N.


Figure 2. The total number of selected parents for each generation for different breeding schemes. The total is the number of animals with optimal contributions >0 required to achieve a fractional increase in the OC constraint of 0.005.

The deviations of F_hom − F_drift from 0 were significant for all schemes, for both the SNP-BLUP Panel M and the neutral Panel N. These deviations would imply significant departures from the classical Eq. (2). The deviation for G_LA (M,M) was closest to the classical expectation, and closer still after accounting for the degree of non-random mating present. Among the remaining schemes, A (M,∼) aligned most closely with classical expectations.

The results based on ROH, which attempts to mimic IBD, appear more similar to those of G_0.5 (M,M), which manages homozygosity: in both, F_drift exceeds F_hom. The deviations for G_0.5 (M,M) are much larger, however. For Panel M, F_hom − F_drift = −0.347, more than a third of the maximum inbreeding coefficient of 1.

G_VR2 (M,M), a commonly used GOC scheme, showed a large deviation in the opposite direction to G_0.5 (M,M). Here F_hom − F_drift = 0.147 for Panel M and 0.053 for Panel N, i.e., an excess loss of heterozygosity relative to drift. Supplementary Information 1 shows that this discrepancy must arise from a covariance between the direction of allele frequency change and the initial frequency, with a stronger drift toward the extremes than classical theory predicts. Figure 3 illustrates this covariance for a randomly chosen replicate and shows the fitted regression line (P < 0.001). For this replicate, F_hom − F_drift = 0.055 in Panel N, arising from a correlation of only 0.040.

For G_VR1 (M,M), which weights the Panel M loci in proportion to 2p_0,k(1 − p_0,k) relative to G_VR2 (M,M), this covariance was weaker but still present. The results for G_VR2 (M,D) showed what happens when the panel used for managing diversity (D) is distinct from that used for SNP-BLUP (M). The covariance in Panel M then became similar to that in Panel N, because Panel M was no longer directly managed for its diversity. The outcome for the unmanaged neutral Panel N was almost identical to that for G_VR2 (M,M).

The hypothesis was that the covariance arises solely as a property of management by G_VR2, rather than as a consequence of directional selection. This was confirmed by the results for G_VR2 (∼,M), where F_hom still exceeded F_drift. Managing the intensity in scheme G_i(p) (M,M) did not remove the covariance. In contrast to the other “drift” schemes, however, it reversed the sign of the covariance, so that F_drift exceeded F_hom. This accords with the hypothesis that intensity management introduces an increased “cost” of moving toward the extremes compared with G_VR2 (M,M).
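The link between the discrepancy and the covariance can be made explicit. Write p_t,k = p_0,k + d_k and expand H_t,k/H_0,k. This gives the exact identity F_hom − F_drift = 2 × mean_k(x_k y_k), where:
• y_k = d_k/√(p_0,k(1 − p_0,k)) is the standardized change;
• x_k = (p_0,k − ½)/√(p_0,k(1 − p_0,k)) is one definition of the standardized initial frequency.

The mean of products equals the covariance only when the mean standardized change is zero. This derivation is a sketch consistent with the numbers in Figure 3, not a reproduction of Supplementary Information 1. The frequencies below are hypothetical, with minor alleles moving toward their nearest extreme, and the function name is likewise hypothetical.

```python
import math

def inbreeding_and_covariance_term(p0, pt):
    """Return F_hom, F_drift and mean_k(x_k * y_k) for base (p0) and current (pt) frequencies."""
    n = len(p0)
    f_hom = 1 - sum((b * (1 - b)) / (a * (1 - a)) for a, b in zip(p0, pt)) / n
    f_drift = sum((b - a) ** 2 / (a * (1 - a)) for a, b in zip(p0, pt)) / n
    mean_xy = 0.0
    for a, b in zip(p0, pt):
        sd = math.sqrt(a * (1 - a))
        y = (b - a) / sd    # standardized change in allele frequency
        x = (a - 0.5) / sd  # standardized initial frequency (deviation from 1/2)
        mean_xy += x * y / n
    return f_hom, f_drift, mean_xy

# Hypothetical loci whose minor alleles drift toward the nearest extreme.
p0 = [0.1, 0.2, 0.7, 0.9]
pt = [0.05, 0.15, 0.8, 0.97]
f_hom, f_drift, mean_xy = inbreeding_and_covariance_term(p0, pt)
print(f_hom - f_drift, 2 * mean_xy)
```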


Figure 3. The covariance between the standardized change in allele frequency at t = 20 and the standardized frequency at t = 0 for the 7000 SNP loci in Panel N, for a randomly chosen replicate. For locus k, standardization is by √(p_0,k(1 − p_0,k)). The solid black line is the fitted linear regression y = 0.0083 + 0.0070x, with SEs of 0.0042 and 0.0021, respectively, and a Pearson correlation r = 0.040. For this replicate, F_drift = 0.123, F_hom = 0.178, and twice the covariance was 0.0555. The upper x-axis shows the untransformed frequency.

Managing the Rates of Inbreeding

Table 2 shows ΔF_drift and ΔF_hom for the different schemes for Panels M and N, and Figure 4 shows F_drift and F_hom over time. In Figure 4, log(1 − F_drift) is approximately linear in generation for all schemes. In contrast, log(1 − F_hom) shows marked curvilinearity for some schemes, e.g., G_ROH (M,M).


Figure 4. Changes over time in the inbreeding coefficients F_drift and F_hom for the neutral loci of Panel N, plotted on a logarithmic scale on which a constant rate of inbreeding gives a linear trend: (A) natural logarithm of (1 − F_hom); (B) natural logarithm of (1 − F_drift).

For G_VR2 (M,M), ΔF_drift for Panel M was directly controlled and was on target at 0.005. ΔF_hom, however, was more than double this target because of the covariance described above. For Panel N, ΔF_drift was greater and ΔF_hom was smaller than for Panel M, so the difference was less extreme. The increase in ΔF_drift was due to Panel N's LD with the QTL that was not accounted for by its LD with Panel M. The decrease in ΔF_hom arose because the allele frequencies of loci in Panel N were subject to weaker regulation, owing to their imperfect LD with loci in Panel M.

The same pattern of differences between ΔF_drift and ΔF_hom was observed in a less extreme form with G_VR2 (∼,M). Here the imperfect LD between Panels M and N still matters, but the favored marker alleles in Panel M change randomly from generation to generation.

For G_VR1 (M,M), ΔF_drift in Panel M exceeded the target (Table 2). This is because F_drift and F_hom weight all loci in a panel equally, whereas the management weights the drift at locus k by 2p_0,k(1 − p_0,k). Consequently, LD with QTL is more weakly constrained for loci with low MAF in Panel M, which is where the impact of the covariance is greatest (Figure 3). This also explains the lower ΔF_hom observed for G_VR1 (M,M).

The results for G_i(p) (M,M) in Table 2 reflect the changed sign of the covariance, in that ΔF_hom was less than ΔF_drift. Unlike with G_VR2 (M,M), the constraint applied was only indirectly related to F_drift or F_hom, so the achieved rates were not expected to meet the target. Nevertheless, ΔF_hom was close to the target for Panel M.

As with G_i(p) (M,M), the simulated management for the homozygosity-based measures, G_0.5 (M,M) and G_ROH (M,M), did not explicitly control F_drift or F_hom. However, ΔF_hom was close to the desired target for G_ROH (M,M), as measured in both Panels M and N. G_ROH (M,M) showed a curvilinear time trend for F_hom, mainly because ΔF_hom was negative during the first few generations. After this, F_hom increased with time, and by the end of the period it was rising faster than for G_LA (M,M). In contrast, the trend for F_drift was approximately linear.

The accelerating ΔF_hom may be caused by ROH failing to accumulate inbreeding as haplotypes recombine. Recombination reduces the length of IBD segments below the thresholds implicit in ROH methods, whereas this older inbreeding is still captured by F_hom. To test this, the minimum length of a contributing ROH was halved from ∼7 to ∼3.5 Mb, but the results were nearly identical to those shown in Table 3 (results not shown). G_0.5 (M,M) had the highest F_drift, because it explicitly promotes allele frequency changes toward intermediate frequencies for all loci in the managed panel.


Table 3. Genetic gain (and its SE) after 20 generations of selection expressed in initial genetic standard deviation units, and inbreeding measured by homozygosity for Panel N of neutral loci at generation 20 for comparison.

In contrast to all other schemes, ΔF_drift for G_LA (M,M) was within 2% of the target for both Panels M and N (Table 2), although ΔF_hom was below the target for both panels. The discrepancy for ΔF_hom is complicated by the dynamic pattern in the number of parents selected in this scheme (Figure 2). In early generations, the expected F_hom is close to that under random mating. In later generations, it is ∼0.005 below random mating, because of the non-random mating introduced by the smaller number of parents. Estimating ΔF_hom from observed heterozygosity will therefore underestimate the true value, which explains a substantial part of the observed deviation from the target of 0.005. Figure 4 shows that G_LA (M,M) had the lowest F_drift and F_hom in generation 20, with near-constant rates.

The results for A (M,∼) were qualitatively similar, except that both ΔF_hom and ΔF_drift exceeded the target rates by 40% in both panels. This is due to the hitch-hiking of neutral loci with changes in QTL frequencies, arising from the LD generated within families. Expectations of IBD based on the pedigree do not account for this hitch-hiking.

Genetic Gain

Table 3 shows the genetic gains achieved by the schemes after 20 generations of selection. Figure 5 shows the gain achieved over time as a function of F_drift and F_hom for the neutral markers in Panel N. Figure 5 allows comparisons at the same F_drift or F_hom, which partly offsets the unequal rates of inbreeding observed among the schemes.


Figure 5. Genetic gain, G_t, plotted against inbreeding for generations 1–20, where inbreeding is transformed to a logarithmic scale as −log(1 − F_t) for F_hom (A) or F_drift (B). For ΔF = 0.005, the target after 20 generations is shown (−log(1 − F_t) = 0.1).

The genetic gains were very similar (within 0.3%) for G_VR2 (M,M) and G_VR2 (M,D). The latter differs only in using a second, unambiguously neutral marker panel for inbreeding management. The difference in their rates of inbreeding at the neutral loci of Panel N was also small (Tables 2, 3). This indicates that separate marker panels for gain and for diversity are unnecessary for such schemes.

The G_LA (M,M) scheme yielded significantly more genetic gain than G_VR2 (M,M), at lower F_drift and F_hom. G_ROH (M,M) and A (M,∼) yielded substantially more gain, but their F_drift was also higher. The A (M,∼) scheme yielded the highest genetic gain of all the schemes compared. However, compared with its closest competitors, G_LA (M,M) and G_ROH (M,M), it also yielded more F_drift and/or F_hom.

Figure 5 makes clear that the ranking of the schemes for achieved gain differs according to whether drift or homozygosity is considered. For example, G_ROH (M,M) and G_i(p) (M,M) yielded relatively high gains given F_hom but relatively low gains given F_drift. G_VR2 (M,M) showed the opposite pattern, with low gains given F_hom and relatively high gains given F_drift. The gain for G_ROH (M,M) in early generations was accompanied by negative F_hom (Figure 5A).

The G_LA (M,M) and A (M,∼) schemes performed relatively well in both plots of Figure 5, with G_LA (M,M) appearing in both plots to yield slightly more gain per unit of inbreeding than A (M,∼). The gain of A (M,∼) is high relative to its inbreeding, but its rates of inbreeding were substantially larger than the target rate; in Figure 5, its curves extend far beyond the target. The G_LA (M,M) scheme closely achieves the target rate of inbreeding for ΔF_hom and ΔF_drift (Table 2), while converting inbreeding efficiently into genetic gain.

Moreover, generation 20 gains of G_LA (M,M) were compared with the gains of A (M,∼) and G_ROH (M,M) interpolated to the same overall inbreeding (the average of F_hom and F_drift). G_LA (M,M) yielded the higher gain in 65 and 62 out of 100 replicates, respectively. Generation 20 gains of G_LA (M,M) were thus significantly higher than those of A (M,∼) and G_ROH (M,M) (P < 0.01) at the same averaged inbreeding level.

Number of Parents

Figure 2 shows the number of selected parents across generations. The schemes that use IBD-based relationship matrices (A, G_LA) and G_ROH select the most parents. The number of parents selected by G_ROH (M,M) may be artificially large because of the additions to the leading diagonal of G_ROH (on average 8.7) needed to make it positive definite. This process made G_ROH diagonally dominant. Reducing c′G_ROH c was then driven by selecting more parents to dilute the impact of these diagonal elements, rather than by avoiding the selection of related animals.

Non-positive-definite G_ROH matrices could be inverted to obtain optimal contributions c, but these yielded much too high rates of inbreeding (results not shown). This was probably because the contributions c found gave a negative c′G_ROH c, which is meaningless as a coancestry, while the realized inbreeding was high and positive.

Schemes using matrices constructed following the methodology of VanRaden (2008) (G_VR1, G_VR2, G_i(p), and G_0.5) selected the fewest parents. This implies that these schemes can select parents that are relatively unrelated by their respective measures, and that differences in relationships are relatively large in these matrices. Comparing Table 2 with Figure 2 suggests how these schemes select relatively few parents: they exploit the opportunities these schemes offer to induce covariances between allele frequency changes and initial frequencies, which in turn affect the frequencies of heterozygotes.
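A small calculation illustrates why a heavily inflated diagonal pushes the optimiser toward many parents. With G + αI, the coancestry ½c′(G + αI)c contains the extra term ½αΣc_i². For n equally contributing parents, this term equals α/(2n), however related the parents are. The sketch uses the average inflation α = 8.7 reported above and assumes equal contributions summing to 1. It is an illustration with a hypothetical function name, not the optimisation used in the study.

```python
def diagonal_term(alpha, n_parents):
    """Part of 1/2 c'(G + alpha I) c contributed by alpha I when n_parents contribute equally."""
    c = [1.0 / n_parents] * n_parents  # equal contributions summing to 1
    return 0.5 * alpha * sum(ci * ci for ci in c)

for n in (50, 100, 500, 1000):
    print(n, diagonal_term(8.7, n))
```

The term falls from 0.087 at n = 50 to 0.0087 at n = 1000. These changes are large relative to the targeted per-generation increase in coancestry of 0.005.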

Genetic Variance

Figure 6 shows the genetic variance of the trait, calculated from the true breeding values of the individuals. The G_0.5 (M,M) scheme loses substantial genetic variance at an early stage and maintains this relatively low genetic variance throughout the 20 generations of selection. Striving for allele frequencies of 0.5 at the loci in Panel M therefore does not maintain variation at the QTL in Panel Q, in accord with the results for Panel N in Table 2. The relatively low variance for A (M,∼) at generation 20 is a consequence of its relatively high genetic gain combined with its relatively high rates of inbreeding. By generation 20, the G_LA (M,M) scheme has lost the least genetic variance, because its rates of inbreeding did not exceed the target. This may explain why G_LA (M,M) is very efficient in turning inbreeding into gain at the end of the selection period (Figure 5).


Figure 6. The trait genetic variance of the individuals plotted over time.

Equivalence of Measures F_hom and F_drift

In the classical work of Wright (1922), two natural measures of inbreeding were introduced. One is concerned with the extent of drift (here represented by F_drift and ΔF_drift) and the other with heterozygosity (here represented by F_hom and ΔF_hom). In classical theory, with neutral loci unlinked to QTL, these perspectives are identical and directly linked to the occurrence of IBD.

The results of this study show that these measures of inbreeding can differ substantially in genomic optimal contribution schemes, even when there are no QTL in the genome [G_VR2 (∼,M); Table 2]. This is because management in these schemes is commonly directed at the observed homozygosity or drift of the monitored marker loci. Schemes that limit the rate of increase in homozygosity (represented here by G_0.5) induce a negative covariance between the change in allele frequency and the initial frequency. Compared with classical expectations, an excess of minor alleles moves toward intermediate frequencies. Conversely, schemes that manage drift by limiting changes in allele frequency (e.g., using G_VR2) induce a positive covariance between the change in allele frequency and the initial frequency. Here an excess of minor alleles tends to move toward the nearest extreme. Consequently, systematic discrepancies occur between ΔF_drift and ΔF_hom.

These discrepancies are a property of the inbreeding management and not of selection per se. They were unaffected by whether random GEBV were used in the scheme, or whether separate SNP panels were used for generating GEBV and for managing inbreeding. In contrast to management based on the IBS allele frequencies of monitored markers, management based on IBD re-established the equivalence of ΔF_drift and ΔF_hom in the simulations. This held whether IBD was obtained via genomic information (G_LA) or approximately (A, uninfluenced by markers). It did not hold for G_ROH, which targets IBD but is based on the homozygosity of haplotypes.

The origin of these covariances between allele frequency changes and initial frequencies can be seen from the form of the relationship matrices. It is explored in detail in Supplementary Information 1.

The negative covariance arising from G_0.5 occurs because G_0.5 explicitly measures allele frequencies as deviations from 0.5, not from the base frequency p_0,k. Consequently, gains in this measure of diversity (but not necessarily in IBD, as discussed later) are obtained by moving frequencies toward 0.5, offsetting any opposing changes prompted by the selection objectives.

The positive covariance, for example with G_VR2, arises because GOC with G_VR2 constrains the square of the change. Drift of an allele to the more distant extreme is therefore penalized more heavily than under completely random drift. This inevitably promotes shifts to the nearest extreme, and more strongly so the further p_0 deviates from ½.

G_VR1 is a re-weighting of the loci in G_VR2 by w_k/Σ_k w_k for locus k, where w_k = 2p_0,k(1 − p_0,k). It places more weight on frequency changes at loci initially closer to ½. The discrepancy between F_drift and F_hom would therefore be expected to be smaller for G_VR1 than for G_VR2, as observed in the simulations (Table 2 and Figure 4).

Management based on the total intensity applied over time (G_i(p)) penalizes deviations toward the extremes more heavily than deviations toward intermediate frequencies, as di/dp = [p(1 − p)]^(−1/2) (Liu and Woolliams, 2010). This changed the sign of the discrepancy, although its magnitude was smaller than with G_VR2.
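The two penalty shapes invoked above can be compared numerically. This is a simplified single-locus illustration, not the study's GOC objective. Drift-based management is represented by the squared change standardized by p_0(1 − p_0). The intensity scale is represented by i(p) = 2 arcsin √p, which is one function whose derivative is [p(1 − p)]^(−1/2). Both choices, the starting frequency p_0 = 0.2, and the function names are assumptions for the example.

```python
import math

def drift_penalty(p0, p1):
    """Squared change standardized by p0(1 - p0), as in drift-based management."""
    return (p1 - p0) ** 2 / (p0 * (1 - p0))

def intensity(p):
    """2*asin(sqrt(p)): one antiderivative of [p(1 - p)]^(-1/2), assumed for illustration."""
    return 2 * math.asin(math.sqrt(p))

def intensity_penalty(p0, p1):
    """Squared change on the intensity scale."""
    return (intensity(p1) - intensity(p0)) ** 2

p0 = 0.2
# Drift to either extreme: the distant extreme (1) costs far more than the nearest (0).
to_zero, to_one = drift_penalty(p0, 0.0), drift_penalty(p0, 1.0)
# Equal steps of 0.05: symmetric on the drift scale, asymmetric on the intensity scale.
toward_extreme = intensity_penalty(p0, p0 - 0.05)
toward_middle = intensity_penalty(p0, p0 + 0.05)
print(to_zero, to_one, toward_extreme, toward_middle)
```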

G_VR2, which was used by Sonesson et al. (2012), controlled ΔF_drift and met the target for the panel used (Table 2). ΔF_hom, however, was much greater because of the covariance discussed above. This agrees with the findings of de Beukelaer et al. (2017), who suggested that the covariance between the change in frequency and its initial value could be the cause.

However, these authors also reset the reference allele frequencies in the G_VR1 matrix to the current frequencies in every generation. This implies that the changes in allele frequency in each generation are constrained without reference to their accumulated change over earlier generations. In a continuous selection scheme, allele frequency changes in successive generations are positively correlated. The variance of the change in allele frequency within a generation may thus have been on target, while the variance of the cumulative change over generations exceeds the target because of these positive correlations, as observed in their study. This methodological difference will have affected all findings on GOC in the study of de Beukelaer et al. (2017).
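The effect of positively correlated per-generation changes on the cumulative change follows from Var(Σ_t d_t) = Σ_s Σ_t Cov(d_s, d_t). Purely for illustration, the sketch assumes an equal per-generation variance and an autoregressive correlation ρ^|s−t| between generations; the function name is hypothetical. With ρ = 0 the result is T times the per-generation variance, and any positive ρ exceeds it.

```python
def cumulative_variance(per_generation_variance, rho, generations):
    """Variance of the sum of per-generation changes with Corr(d_s, d_t) = rho ** |s - t|."""
    total = 0.0
    for s in range(generations):
        for t in range(generations):
            total += per_generation_variance * rho ** abs(s - t)
    return total

uncorrelated = cumulative_variance(1.0, 0.0, 20)
correlated = cumulative_variance(1.0, 0.3, 20)
print(uncorrelated, correlated)
```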

Sonesson et al. (2012) found that G_VR2 schemes achieved their target rate of inbreeding, based on IBD measured using loci with 2N alleles scattered across the genome. Details of the founder populations used in their study were presented in Sonesson and Meuwissen (2009). These details reveal that their SNP-BLUP marker panel was selected for intermediate frequencies, to mimic a typical SNP-chip marker panel. This is very different from the SNP-BLUP panel used here, which was a random sample of whole genome sequence data and hence dominated by extreme allele frequencies (Figure 1).

The strength of the covariance underlying the discrepancy between F_drift and F_hom depends on the distribution of (p_0 − ½). Any discrepancy in Sonesson et al. (2012) would therefore have been much reduced. In terms of the current results, their situation was most similar to using G_VR1, where loci at intermediate frequencies are weighted more heavily.

Two conclusions follow from these considerations:
(i) the discrepancies between the different measures of the rate of inbreeding are extreme in WGS data, owing to its extreme allele frequencies (Figure 1);
(ii) the discrepancies are a property of the panel used to manage diversity and not of the remaining loci, as the IBD alleles used by Sonesson et al. (2012) have low MAF by construction.

Hence, for typical SNPs from chips, discrepancies between F_drift and F_hom are expected to be present but smaller than those in Table 2.
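One way to see why the panel's frequency spectrum matters uses the identity F_hom − F_drift = 2 × mean_k(x_k y_k). Here y_k is the standardized change and x_k = (p_0,k − ½)/√(p_0,k(1 − p_0,k)). For a given correlation between x and the standardized change, a wider spread of x gives a larger discrepancy. Since x_k² = 1/(4p_0,k(1 − p_0,k)) − 1, rare alleles dominate this spread. The two frequency lists below are hypothetical stand-ins for a chip-like and a WGS-like panel, not data from the study. This is a heuristic sketch with a hypothetical function name, not the derivation in Supplementary Information 1.

```python
def mean_squared_standardized_frequency(freqs):
    """Mean over loci of x_k^2 = (p0 - 1/2)^2 / [p0 (1 - p0)]."""
    return sum((p - 0.5) ** 2 / (p * (1 - p)) for p in freqs) / len(freqs)

chip_like = [0.3, 0.4, 0.5, 0.6, 0.7]     # hypothetical intermediate frequencies
wgs_like = [0.01, 0.02, 0.05, 0.5, 0.98]  # hypothetical, mostly rare alleles
print(mean_squared_standardized_frequency(chip_like),
      mean_squared_standardized_frequency(wgs_like))
```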

Management of Diversity

An important property of a tool for managing diversity is that it meets its targets predictably. This can be examined for the marker panel, for the unmanaged neutral markers, and for both F_drift and F_hom. In this respect, G_VR2 meets the target, but only for F_drift and only in the managed marker panel, not in the unmanaged panel. G_LA, in contrast, meets the target (with only minor deviations) for both F_drift and F_hom in both panels. All other matrices failed to meet the target rate to a greater or lesser degree. They would need to be calibrated, possibly in every generation, to meet the targets set at neutral loci. In practice, such calibration would require simulations of the practical breeding scheme that are as realistic as possible, using the current situation as a starting point.

A key management objective in breeding schemes is to generate gain efficiently from the genetic variance in the breeding objectives while conserving the variation at (currently) neutral loci. Here, the IBD-related schemes performed best when assessed against F_drift or F_hom of neutral loci. On the average of F_drift and F_hom, G_LA was more efficient than G_ROH. G_ROH also gave different values of ΔF_hom and ΔF_drift, would require regular calibration, and, in the current implementation following de Cara et al. (2013), always required a very large number of parents, which in practice would usually demand additional scheme resources.

Henryon et al. (2019) observed that using A appeared to be more efficient than using G_VR2, and this was confirmed here. The differences between schemes using G_LA and A were small when plotted against F_drift or F_hom. However, G_LA was the only scheme tested here that combined high efficiency with rates of inbreeding close to, and not exceeding, the target rate of 0.005. This supports the conclusion of Sonesson et al. (2012) that genomic selection requires genomic control.

One consequence of entering the genomics era is that the meaning of diversity, and its management in practice, are more open to discussion, because the pedigree is no longer the only tool for measuring and managing it. For example, the number of polymorphic loci could be used as a measure, reflecting concerns over the disappearance of known rare alleles from a scheme. In the pedigree inbreeding framework, the measure used is the fraction of variance expected to have been lost from the reference base. In the genomic era, the measure might instead be defined simply as the genetic variance based on IBS, to be maximized. There is then scope for increasing diversity by directionally selecting loci toward intermediate frequencies as an objective. These measures have been explored elsewhere (see Howard et al., 2017, for a review).

In general, attaching values (e.g., selection index weights) to genetic diversity is a very difficult task (e.g., Brisbane and Gibson, 1994; Wray and Goddard, 1994; Goddard, 2009; Jannink, 2010; Howard et al., 2017). The difficulty is especially clear given the goals of diversity management outlined above, where diversity is required for many (hypothetical) traits simultaneously. Breeders generally have a clearer idea of their target rate of inbreeding than of what weight to give a diversity measure. Although the choice of target rate of inbreeding remains somewhat arbitrary, guidelines have been developed over the years (see Woolliams et al., 2015, for a review).

Here, it is argued that many populations, such as livestock or zoo populations, have an over-riding objective beyond the breeding goals that underlie selection on EBV. This objective is to manage, over time, the risks associated with unmeasured attributes of a reference population, such as unrecognized deleterious recessives, drift in desirable holistic qualities, or epistatic variance. In this respect, all approaches used in this study refer directly back to the established reference (base) population.

As mentioned above, other perspectives may be advanced, such as increasing the genetic variance at neutral loci by increasing heterozygosity (e.g., de Beukelaer et al., 2017). This could be achieved by promoting allele frequency changes toward intermediate values, as exemplified by G_0.5 in this study. However, this raises two issues that require further consideration. Firstly, changes in allele frequency result from multiple copies of a subset of base generation alleles, so increasing frequency promotes IBD-based inbreeding, analogously to changing a QTL frequency. Secondly, if carried out with a marker panel, increasing the heterozygosity of the marker loci does not necessarily increase heterozygosity at unmonitored neutral loci, which is the objective.

In these simulations, G_0.5 (M,M) nearly avoided any overall loss of heterozygosity in the marker panel during selection. This was accompanied by much greater drift and more loss of heterozygosity at the unmonitored neutral loci than was achieved with IBD-based inbreeding management. In contrast, the IBD used in G_LA carries information on the unobserved heterozygosity and drift across all unmonitored genome positions. It remains only a hypothesis that managing heterozygosity and drift using IBS might perform better than using IBD when WGS data are available, with or without selection, although some studies have considered its use (Eynard et al., 2015, 2016; Gómez-Romano et al., 2016).
The question of how to weight F_hom and F_drift across all loci in the genome, when a key objective is to manage unknown or unmonitored risks, remains open.

This study has focused on schemes in which loss of genetic diversity is managed alongside the maximization of genetic gain. Other schemes may be pure conservation schemes, in which no genetic change (gain) is desired but the goals of genetic management are the same:
• conserve genetic variation;
• avoid inbreeding depression;
• avoid the occurrence of recessive diseases;
• avoid random changes in phenotypic traits due to drift from a valued reference population.

Strictly, with purely random selection, drift- and homozygosity-based inbreeding are expected to be the same [Eq. (2); Falconer and Mackay, 1996]. However, minimising allele frequency changes, or minimising the loss of heterozygosity, using IBS may still produce discrepancies between drift- and homozygosity-based inbreeding measures, arising from the covariances described above. In fact, the potential covariance between the change in allele frequency and the initial frequency is expected to increase, since the inbreeding management term is more important in pure conservation schemes. This would also hold for GOC schemes with selection that aim for an N_e higher than our goal of N_e = 100. The greater potential for discrepancy argues for using IBD-based measures of relationship (G_LA, or a more conservative use of A) to maintain diversity in such genetic conservation schemes.

The approach adopted here has not favored genetic variation at some neutral loci over others a priori. Of course, a weighted genomic relationship matrix could be implemented, and/or multiple relationship matrices with associated constraints could be used to control genomic variation simultaneously at different types of loci (Dagnachew and Meuwissen, 2016; Gómez-Romano et al., 2016). For example, a general G matrix covering the entire genome could be combined with an additional G matrix controlling genetic diversity at, e.g., the major histocompatibility complex, which is essential to the immune response of the animals. Alternatively, regions of the genome may be sought where average heterozygosity is to be increased (or reduced) under the assumption that diversity is especially (or not) important in these regions. Regions with known recessive defects may be prioritized for diversity management, but direct inclusion of the known defects in the breeding goal seems more effective in controlling their frequencies. In practice, such regions with special emphasis for diversity management would need to be known a priori, and prioritization may only be effective if WGS were used for the relationships because, as shown here, what happens in a sample of loci does not necessarily predict what happens at loci outside that subset. Causative alleles of quantitative traits are quite evenly distributed across the genome (Wood et al., 2014), and, as argued here, the main goals of diversity management address many anonymous, unknown loci and hypothetical traits simultaneously, which makes it very hard to achieve a worthwhile prioritization of genomic regions for diversity management.
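
A weighted genomic relationship matrix, and a region-specific matrix used alongside a genome-wide one, might be sketched as follows. The sketch assumes VanRaden's (2008) first matrix generalized with per-locus weights w, G = Z diag(w) Z' / (2 Σ w_k p_k(1 - p_k)) with Z = M - 2p, which is an illustrative choice and not necessarily the weighting used in the cited studies. It also assumes the usual OC expression c'Gc/2 for the average coancestry implied by contributions c (Meuwissen, 1997). Setting w to 1 inside a window and 0 elsewhere gives a region-specific matrix; the genotypes and the "MHC-like" window below are hypothetical.

```python
import numpy as np


def genomic_relationship(m, p, w=None):
    """Weighted VanRaden-type G (all-ones weights give VanRaden's first matrix).

    m: (animals x loci) genotypes coded 0/1/2; p: base allele frequencies;
    w: optional per-locus weights, e.g. 1 inside a region of interest and 0 outside.
    """
    m = np.asarray(m, dtype=float)
    p = np.asarray(p, dtype=float)
    w = np.ones(m.shape[1]) if w is None else np.asarray(w, dtype=float)
    z = m - 2.0 * p  # centre genotypes on base allele frequencies
    return (z * w) @ z.T / (2.0 * np.sum(w * p * (1.0 - p)))


def group_coancestry(c, g):
    """Average coancestry c'Gc / 2 implied by contributions c (summing to 1)."""
    c = np.asarray(c, dtype=float)
    return 0.5 * c @ g @ c


# Hypothetical genotypes for 3 candidates at 4 loci; loci 0-1 stand in for an MHC-like region.
m = np.array([[0, 2, 1, 1], [2, 0, 1, 1], [1, 1, 2, 0]])
p = np.full(4, 0.5)
g_genome = genomic_relationship(m, p)
g_region = genomic_relationship(m, p, w=[1, 1, 0, 0])

# In an OC scheme, each of these coancestries could be given its own upper bound.
c = np.array([0.25, 0.25, 0.5])
print(group_coancestry(c, g_genome), group_coancestry(c, g_region))  # 0.125 genome-wide, 0.0 in the region
```

Here the heavy use of the third candidate raises genome-wide coancestry but not coancestry in the hypothetical region, so separate constraints on the two matrices can bind differently.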

• Contrary to classic inbreeding theory, inbreeding of unmanaged neutral loci as measured by drift (F drift) and by homozygosity (F hom) can differ very substantially, due to a covariance between the change in allele frequency and its initial frequency, leading to non-zero expected changes in frequency of a sign and magnitude determined by the initial frequency. Discrepancy between F drift and F hom occurs when inbreeding management is based on genomic relationship matrices (or similarity matrices) derived using IBS, but not when they are derived using IBD, which acts as a unifying concept for F drift and F hom.

• The covariance generated is expected to be larger for WGS data where allele frequencies are extreme with typical MAF close to 0, than for SNP (chip) panels where allele frequencies are generally closer to ½.

• The (genomic) selection component of OC schemes does not cause the difference between F drift and F hom .

• Using the same or a different panel for estimating GEBVs than for management of diversity in OC schemes makes only very small differences to genetic gain and the inbreeding in unmonitored neutral loci.

• Measures of genomic relationship can be classified as those based on allele frequency change (e.g., G VR2) and directed at F drift; those based on homozygosity (e.g., G 0.5) and directed at F hom; those based on IBD (e.g., G LA); or combinations of these (e.g., G ROH). The choice of relationship matrix depends very much on the objective it should serve.

• OC schemes that limit F drift by directly limiting allele frequency changes, such as those using G VR2, result in low Δ F drift at the expense of high Δ F hom. Schemes using G VR1 are less extreme in this respect than those using G VR2.

• OC schemes that limit Δ F hom (e.g., using G 0.5) result in very low Δ F hom at the expense of high Δ F drift, but both F hom and F drift may exceed targets at unmonitored neutral loci.

• The OC scheme using G LA , an IBD based relationship matrix, was the only scheme investigated here that managed homozygosity and drift based inbreeding within the target rate of 0.5%, yielding an effective population size ∼100; for all other schemes, either Δ F drift or Δ F hom or both exceeded their target.

• The OC scheme using G LA yielded the highest gain per unit of inbreeding across both measures of inbreeding, closely followed by the scheme using A. The latter yielded high gain per unit of F but grossly exceeded the target rates of inbreeding.

• The use of G LA in practice requires the development of fast algorithms for its calculation.
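
The relationship matrices that need only genotypes and base allele frequencies can be contrasted numerically. The sketch assumes that G VR1 and G VR2 follow VanRaden's (2008) first and second methods (the second, per-locus standardized form matches the off-diagonal elements of the GCTA matrix) and takes G 0.5 to be the first method with every allele frequency set to 0.5; these forms are assumptions for illustration, and the genotypes are hypothetical. G LA and G ROH are not sketched, since they require linkage analysis or detection of IBD segments. The last function applies the standard relation N e = 1/(2ΔF), which links the 0.5% target rate to N e of about 100.

```python
import numpy as np


def g_vr1(m, p):
    """VanRaden (2008) method 1: ZZ' / (2 * sum p(1 - p)), with Z = M - 2p."""
    z = m - 2.0 * p
    return z @ z.T / (2.0 * np.sum(p * (1.0 - p)))


def g_vr2(m, p):
    """VanRaden (2008) method 2: each locus standardized by 2p(1 - p), then averaged."""
    z = (m - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))
    return z @ z.T / m.shape[1]


def g_05(m):
    """Assumed form of G_0.5: method 1 with every allele frequency set to 0.5."""
    return g_vr1(m, np.full(m.shape[1], 0.5))


def ne_from_delta_f(delta_f):
    """Effective population size implied by a rate of inbreeding: Ne = 1 / (2 dF)."""
    return 1.0 / (2.0 * delta_f)


m = np.array([[0.0, 2.0], [2.0, 0.0]])  # hypothetical genotypes: 2 animals x 2 loci
p = np.array([0.1, 0.5])                # hypothetical base allele frequencies
print(g_vr1(m, p), g_vr2(m, p), g_05(m), sep="\n\n")
print(ne_from_delta_f(0.005))           # target rate of 0.5% per generation
```

In this toy case the rare allele at the first locus dominates G VR2 (the second animal's diagonal is 10, against about 6.2 in G VR1), illustrating how standardizing each locus by 2p(1 - p) makes the matrix most sensitive to frequency changes at low-MAF loci, while G 0.5 ignores the base frequencies altogether.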

Data Availability Statement

The datasets generated for this study are available on request to the corresponding author.

Author Contributions

TM contributed to study design, performed the simulations, and wrote the draft manuscript. AS developed the simulation software and contributed to discussions and the writing of the manuscript. GG contributed to discussions and the writing of the manuscript. JW contributed to study design, alternative schemes and methods, and discussions and writing of the manuscript. All authors approved the final version of the manuscript.

We are grateful for funding from the Norwegian Research Council (Grant 226275/E40). JW would like to acknowledge funding from the European Commission under Grant Agreement 677353 (IMAGE) and BBSRC Institute Strategic Programme BBS/E/D/30002275.

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Acknowledgments

We would like to thank three reviewers for their very helpful comments.

Supplementary Material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fgene.2020.00880/full#supplementary-material

Brisbane, J. R., and Gibson, J. P. (1994). Balancing selection response and rate of inbreeding by including genetic relationships in selection decisions. World Congr. Genet. Appl. Livest. Prod. 19:135.

Charlier, C., Coppieters, W., Rollin, F., Desmecht, D., Agerholm, J. S., Cambisano, N., et al. (2008). Highly effective SNP-based association mapping and management of recessive defects in livestock. Nat. Genet. 40, 449–454. doi: 10.1038/ng.96

Dagnachew, B. S., and Meuwissen, T. H. E. (2016). A fast iterative algorithm for large scale optimal contribution selection. Genet. Sel. Evol. 48:70.

de Beukelaer, H., Badke, Y., Fack, V., and deMeyer, G. (2017). Moving beyond managing realized genomic relationship in long-term genomic selection. Genetics 206, 1127–1138. doi: 10.1534/genetics.116.194449

de Cara, M. A. R., Villanueva, B., Toro, M. A., and Fernández, J. (2013). Using genomic tools to maintain diversity and fitness in conservation programmes. Mol. Ecol. 22, 6091–6099. doi: 10.1111/mec.12560

Eynard, S. E., Windig, J. J., Hiemstra, S. J., and Calus, M. P. (2016). Whole-genome sequence data uncover loss of genetic diversity due to selection. Genet. Sel. Evol. 48:33.

Eynard, S. E., Windig, J. J., Leroy, G., van Binsbergen, R., and Calus, M. P. (2015). The effect of rare alleles on estimated genomic relationships from whole genome sequence data. BMC Genet. 16:24. doi: 10.1186/s12863-015-0185-0

Falconer, D. S., and Mackay, T. F. C. (1996). Introduction to Quantitative Genetics. Harlow: Pearson Education Limited.

Fernandez, J., Villanueva, B., Pong-Wong, R., and Toro, M. A. (2005). Efficiency of the use of pedigree and molecular marker information in conservation programs. Genetics 170, 1313–1321. doi: 10.1534/genetics.104.037325

Fernando, R. L., and Grossman, M. (1989). Marker assisted selection using best linear unbiased prediction. Genet. Sel. Evol. 21, 467–477.

Goddard, M. E. (2009). Genomic selection: prediction of accuracy and maximisation of long term response. Genetica 136, 245–257. doi: 10.1007/s10709-008-9308-0

Gómez-Romano, F., Villanueva, B., Fernández, J., Woolliams, J. A., and Pong-Wong, R. (2016). The use of genomic coancestry matrices in the optimisation of contributions to maintain genetic diversity at specific regions of the genome. Genet. Sel. Evol. 48:2.

Henryon, M., Liu, H., Berg, P., Su, G., Nielsen, H. M., Gebregewergis, G. T., et al. (2019). Pedigree relationships to control inbreeding in optimum-contribution selection realise more genetic gain than genomic relationships. Genet. Sel. Evol. 51:39.

Holsinger, K. E., and Weir, B. S. (2009). Genetics in geographically structured populations: defining, estimating and interpreting F(ST). Nat. Rev. Genet. 10, 639–650. doi: 10.1038/nrg2611

Howard, J. T., Pryce, J. E., Baes, C., and Maltecca, C. (2017). Invited review: inbreeding in the genomics era: Inbreeding, inbreeding depression, and management of genomic variability. J. Dairy Sci. 100, 6009–6024. doi: 10.3168/jds.2017-12787

Jannink, J. L. (2010). Dynamics of long-term genomic selection. Genet. Sel. Evol. 42:35.

Keller, M. C., Visscher, P. M., and Goddard, M. E. (2011). Quantification of inbreeding due to distant ancestors and its detection using dense single nucleotide polymorphism data. Genetics 189, 237–249. doi: 10.1534/genetics.111.130922

Kinghorn, B. P. (1980). The expression of recombination loss in quantitative traits. J. Anim. Breed. Genet. 97, 138–143. doi: 10.1111/j.1439-0388.1980.tb00919.x

Legarra, A. (2016). Comparing estimates of genetic variance across different relationship models. Theor. Popul. Biol. 107, 26–30. doi: 10.1016/j.tpb.2015.08.005

Leinster, T., and Cobbold, C. A. (2012). Measuring diversity: the importance of species similarity. Ecology 93, 477–489. doi: 10.1890/10-2402.1

Li, C. C., and Horvitz, D. G. (1953). Some methods of estimating the inbreeding coefficient. Am. J. Hum. Genet. 5, 107–117.

Liu, A. Y., and Woolliams, J. A. (2010). Continuous approximations for optimizing allele trajectories. Genet. Res. 92, 157–166. doi: 10.1017/s0016672310000145

Luan, T., Yu, X., Dolezal, M., Bagnato, A., and Meuwissen, T. H. (2014). Genomic prediction based on runs of homozygosity. Genet. Sel. Evol. 46:64.

McQuillan, R., Leutenegger, A. L., Abdel-Rahman, R., Franklin, C. S., Pericic, M., Barac-Lauc, L., et al. (2008). Runs of homozygosity in European populations. Am. J. Hum. Genet. 83, 359–372.

Meuwissen, T. H. E. (1997). Maximizing the response of selection with a pre-defined rate of inbreeding. J. Anim. Sci. 75, 934–940.

Meuwissen, T. H. E., and Goddard, M. E. (2010). The use of family relationships and linkage disequilibrium to impute phase and missing genotypes in up to whole-genome sequence density genotypic data. Genetics 185, 1441–1449. doi: 10.1534/genetics.110.113936

Meuwissen, T. H. E., Hayes, B. J., and Goddard, M. E. (2001). Prediction of total genetic value using genome-wide dense marker maps. Genetics 157, 1819–1829.

Meuwissen, T. H. E., Luan, T., and Woolliams, J. A. (2011). The unified approach to the use of genomic and pedigree information in genomic evaluations revisited. J. Anim. Breed. Genet. 128, 429–439. doi: 10.1111/j.1439-0388.2011.00966.x

Pong-Wong, R., and Woolliams, J. A. (2007). Optimisation of contribution of candidate parents to maximise genetic gain and restricting inbreeding using semidefinite programming. Genet. Sel. Evol. 39, 3–25.

Powell, J. E., Visscher, P. M., and Goddard, M. E. (2010). Reconciling the analysis of IBD and IBS in complex trait studies. Nat. Rev. Genet. 11, 800–805. doi: 10.1038/nrg2865

Robertson, A. (1965). The interpretation of genotypic ratios in domestic animal populations. Anim. Prod. 7, 319–324. doi: 10.1017/s0003356100025770

Rodríguez-Ramilo, S. T., Fernández, J., Toro, M. A., Hernández, D., and Villanueva, B. (2015). Genome-wide estimates of coancestry, inbreeding and effective population size in the Spanish Holstein population. PLoS One 10:e0124157. doi: 10.1371/journal.pone.0124157

Sonesson, A. K., and Meuwissen, T. H. E. (2009). Testing strategies for genomic selection in aquaculture breeding programs. Genet. Sel. Evol. 41:37.

Sonesson, A. K., Woolliams, J. A., and Meuwissen, T. H. E. (2012). Genomic selection requires genomic control of inbreeding. Genet. Sel. Evol. 44:27.

Toro, M. A., Silio, L., Rodriganez, J., and Rodriguez, C. (1998). The use of molecular markers in conservation programmes of live animals. Genet. Sel. Evol. 30:585. doi: 10.1186/1297-9686-30-6-585

Toro, M. A., Villanueva, B., and Fernandez, J. (2014). Genomics applied to management strategies in conservation programmes. Livestock Sci. 166, 48–53. doi: 10.1016/j.livsci.2014.04.020

VanRaden, P. M. (2008). Efficient methods to compute genomic predictions. J. Dairy Sci. 91, 4414–4423. doi: 10.3168/jds.2007-0980

Villanueva, B., Pong-Wong, R., Fernandez, J., and Toro, M. A. (2005). Benefits from marker-assisted selection under an additive polygenic genetic model. J. Anim. Sci. 83, 1747–1752. doi: 10.2527/2005.8381747x

Wang, J. (2001). Optimal marker-assisted selection to increase the effective size of small populations. Genetics 157, 867–874.

Wood, A. R., Esko, T., Yang, J., Vedantam, S., Pers, T. H., Gustafsson, S., et al. (2014). Defining the role of common variation in the genomic and biological architecture of adult human height. Nat. Genet. 46, 1173–1186.

Woolliams, J. A., Berg, P., Dagnachew, B. S., and Meuwissen, T. H. E. (2015). Genetic contributions and their optimization. J. Anim. Breed. Genet. 132, 89–99. doi: 10.1111/jbg.12148

Wray, N. R., and Goddard, M. E. (1994). Increasing long term response to selection. Genet. Sel. Evol. 26:431. doi: 10.1186/1297-9686-26-5-431

Wright, S. (1922). Coefficients of inbreeding and relationships. Amer. Nat. 56, 330–338.

Keywords : inbreeding, genetic drift, optimum contribution selection, genetic diversity, genomic relationships, genetic gain

Citation: Meuwissen THE, Sonesson AK, Gebregiwergis G and Woolliams JA (2020) Management of Genetic Diversity in the Era of Genomics. Front. Genet. 11:880. doi: 10.3389/fgene.2020.00880

Received: 16 May 2019; Accepted: 17 July 2020; Published: 13 August 2020.

Copyright © 2020 Meuwissen, Sonesson, Gebregiwergis and Woolliams. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY) . The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Theo H. E. Meuwissen, [email protected]

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.

A graph of the new genome map, with 18 rows of 12 color bands, each band representing a single nucleotide variant. The bands are colored orange, purple, pink, blue, yellow, green, dark blue, and other colors.

Scientists Unveil a More Diverse Human Genome

The “pangenome,” which collated genetic sequences from 47 people of diverse ethnic backgrounds, could greatly expand the reach of personalized medicine.

The new pangenome reference resembles a corn maze, with alternative paths and side trails that allow scientists to explore a broader range of the genetic diversity found in people the world over. Credit: Darryl Leja/NHGRI

By Elie Dolgin

Published May 10, 2023; updated May 12, 2023

More than 20 years after scientists first released a draft sequence of the human genome, the book of life has been given a long-overdue rewrite.

A more accurate and inclusive edition of our genetic code was published on Wednesday, marking a major step toward a deeper understanding of human biology and personalized medicine for people from a wide range of racial and ethnic backgrounds.

Unlike the previous reference, which was largely based on the DNA of one mixed-race man from Buffalo, with inputs from a few dozen other individuals, mostly of European descent, the new “pangenome” incorporates near-complete genetic sequences from 47 men and women of diverse origins, including African Americans, Caribbean Islanders, East Asians, West Africans and South Americans.

The revamped genome map represents a crucial tool for scientists and clinicians hoping to identify genetic variations associated with disease. It also promises to deliver treatments that can benefit all people, regardless of their race, ethnicity or ancestry, researchers said.

“It’s been long needed — and they’ve done a very good job,” said Ewan Birney, a geneticist and deputy director general of the European Molecular Biology Laboratory, who was not involved in the effort. “This will improve our fine-grained understanding of variation, and then that research will open new opportunities toward clinical applications.”

Powered by the latest in DNA sequencing technology, the pangenome collates all 47 unique genomes into a single resource, providing the most detailed picture yet of the code that powers our cells. Gaps in the earlier reference are now filled, with nearly 120 million previously missing DNA letters added to the three-billion-letter-long code.

Frontiers for Young Minds

What Is Genetic Diversity and Why Does it Matter?

All living things on Earth contain a unique code within them, called DNA. DNA is organised into genes, similar to the way letters are organised into words. Genes give our bodies instructions on how to function. However, the exact DNA code is different even between individuals within the same species. We call this genetic diversity. Genetic diversity causes differences in the shape of bird beaks, in the flavours of tomatoes, and even in the colour of your hair! Genetic diversity is important because it gives species a better chance of survival. However, genetic diversity can be lost when populations get smaller and isolated, which decreases a species’ ability to adapt and survive. In this article, we explore the importance of genetic diversity, discuss how it is formed and maintained in wild populations, how it is lost and why that is dangerous, and what we can do to conserve it.

Why is Everything and Everyone a Little Bit Different?

Earth contains millions of different species that all look different from one another. While some species look more similar to each other than others, like lions and tigers, they will still have differences between them. Even within each species, individuals look similar to each other but they are not identical. These differences and similarities are because of many small differences between individuals’ genes. All organisms have DNA and each individual’s DNA is organised into genes. These contain the instructions to build our bodies. This is similar to the way that letters are combined to make words that then make a story. DNA can be seen as the letters, genes the words, and their instructions are the story. Small differences in DNA might change blue eyes to green, or a butterfly’s wings from black to white, like how a word can change when you replace a letter.

The combined differences in the DNA of all individuals in a species make up the genetic diversity of that species. Genetic diversity causes individuals to have different characteristics, which we can see even in our groceries. Although all tomatoes belong to the same species, the tomatoes we eat are hugely diverse, ranging from giant beefsteak tomatoes to tiny cherry tomatoes. There are also hundreds of apple varieties ( Figure 1 ), that range from red to green, tart to sweet, and some apples even have pink flesh inside! Genetic diversity is what makes these types of tomatoes and apples look so different [ 1 ]. Genetic diversity is also seen in animals. For example, dogs can be large enough to pull sledges or small enough to sit nicely on your lap. All dogs are from the same species, but they look different because of genetic diversity! Though often more difficult to see, genetic diversity is also extremely important in wild animals and plants.

Figure 1 - An example of genetic diversity in the food we eat. All these apples are one species. Different alleles of the genes that control their colour cause the apples to be green, yellow, red, or almost purple. Differences in the alleles that control flavour make each type taste different.

How is Genetic Diversity Generated?

Changes to an individual’s DNA are called mutations ( Figure 2 ). Mutations can arise when mistakes are made while cells are copying DNA, like making a spelling mistake when copying a word. These mutations make up a species’ genetic diversity. Over generations, more and more mistakes are made, leading to more mutations. Most mutations are either harmful or have no impact at all, but sometimes these mutations can cause changes that are helpful for a species. The individuals that have these helpful mutations might have greater chances of survival, and have more babies as a result [ 2 ]. This is adaptation. When a mother and a father have babies, the DNA of their baby is a mix of the parents’ DNA. Babies have two copies of every gene in their DNA, one from each parent. Copies of the same gene with different mutations are called alleles. When parents make a sperm or an egg, alleles in each parent are shuffled and recombined, and only one allele of a gene ends up in each sperm or egg cell. When the reshuffled alleles from a mother and a father are combined when sperm and eggs join, new mixes of alleles are created in the babies [ 2 , 3 ]. The mixing of alleles allows for new combinations of mutations and characteristics, adding to a species’ genetic diversity ( Figure 2 ).

Figure 2 - (A) Genetic diversity is generated when mutations create new alleles over time. Mixing alleles from parents creates new combinations of alleles in their babies. Organisms that can clone themselves, like bacteria, can pass alleles to each other. Each coloured dot represents a different allele. (B) Genetic diversity can be lost when habitat loss divides populations or when buildings or highways isolate populations. (C) Creating protected areas where individuals from different populations can migrate and spread their genes can help a species to maintain its genetic diversity.

Not all species need a mother and a father to make a baby. Bacteria can clone themselves ( Figure 2 ) and directly pass their alleles from a parent to its identical clone [ 3 ]. Any mistakes in the parent’s DNA will be passed on to the clone. Amazingly, bacteria can also give alleles to each other, even if they are not related! This is one way that simple organisms like bacteria can increase their genetic diversity without relying on the mixing of alleles between a mother and a father [ 4 ].

Why is Genetic Diversity Important?

When a species has a lot of differences in its DNA, we say that genetic diversity is high [ 2 ]. In species with high genetic diversity, there are lots of mutations in the DNA, which cause differences in the way individuals look as well as differences in important traits that we cannot see [ 2 ]. Some of these differences help individuals survive in particular environments; for example, some types of apples can grow better in hotter environments, thanks to their genes. The variety of characteristics in species with high genetic diversity means they are more likely to cope successfully with changes in their environment. A great example of this is seen in the peppered moths during the Industrial Revolution [ 4 ]. Natural genetic diversity in peppered moths produced different wing colours, ranging from light to dark. Before the Industrial Revolution, peppered moths with light wings were more common because they had the best camouflage on white tree trunks. The Industrial Revolution caused a lot of air pollution that started to cover tree trunks, making them black. Light-winged moths were no longer camouflaged and were easy prey for birds. But dark-winged individuals were now hidden! This meant that dark moths had an advantage and were more likely to live long enough to have babies. The babies of dark moths were also dark because of the alleles they inherited from their parents, so they were also more likely to survive. The dark moths had higher fitness and became more common as a result [ 4 ].

What Happens When Genetic Diversity is Low?

When few mutations are found in the DNA of a species, genetic diversity is said to be low [ 2 ]. Low genetic diversity means that there is a limited variety of alleles for genes within that species and so there are not many differences between individuals. This can mean that there are fewer opportunities to adapt to environmental changes. Low genetic diversity often occurs due to habitat loss. For example, when a species’ habitat is destroyed or broken up into small pieces, populations become small. Small, fragmented populations can lead to loss of genetic diversity because fewer individuals can survive in the remaining habitat so fewer individuals breed to pass on their alleles. In small populations, the choice of mates is also limited. Over time, individuals will all become related and will be forced to mate with relatives. This is inbreeding. Inbred animals often have two identical alleles for their genes because the same allele was passed on from both parents. If this allele has harmful mutations, an inbred baby can be unhealthy. This is called inbreeding depression [ 2 ].

If genetic diversity gets too low, species can go extinct and be lost forever. This is due to the combined effects of inbreeding depression and failure to adapt to change. In such cases, the introduction of new alleles can save a population. This is called genetic rescue [ 2 ]. In the 1990s, conservation scientists had to use genetic rescue to save the Florida panther, which was threatened with extinction due to low genetic diversity ( Figure 3 ) [ 5 ]. Very few Florida panthers remained and their genetic diversity was extremely low. Many Florida panther babies were sick because of inbreeding depression. A closely related panther population with high genetic diversity was present in Texas. Texan panthers were moved to Florida to have babies with the Florida panthers. This increased genetic diversity because of the mixing of alleles we spoke about before. Soon after the Texan panthers arrived, many healthy kittens were born [ 5 ].

Figure 3 - (A) The Florida panther was once widespread, with high genetic diversity. (B) Hunting and habitat loss reduced population size and resulted in very low genetic diversity and inbreeding. (C) Eight female panthers from Texas were moved to Florida to breed with Florida panthers. (D) When the Texas and Florida panthers bred, new alleles were introduced into the population, helping the Florida panther population become bigger and healthier over time.

What’s Happening to Genetic Diversity Around the World?

We hear a lot about the loss of species in the world, but we are also seeing a loss of genetic diversity within species. The increasing number of people on Earth and our increasing use of natural resources have reduced space and resources for wild species. Over time, many wild animal and plant populations have become smaller or more isolated. Many species have also gone through local extinctions. This has led to a global loss of genetic diversity. Scientists think that the genetic diversity within species may have declined by as much as 6% globally since the Industrial Revolution [ 6 ]. This means that many species are less able to adapt when facing new challenges, like climate change, pollution, and new diseases. If too much genetic diversity is lost, more and more species could become unhealthy and in need of conservation actions similar to the Florida panther. However, there are steps we can take to conserve and restore genetic diversity across many species.

How Do We Stop Genetic Diversity Loss?

We must preserve and protect genetic diversity. This can be done through the conservation of our remaining wild populations [ 2 ]. We can use nature reserves and wildlife bridges to reconnect wild populations that have become separated by our cities and highways. We can also restore habitats, because this will allow wild populations to get bigger. Sometimes we can even remove harmful stressors and pests so that populations can naturally regrow. We can also reintroduce species that have been lost from habitats they used to live in. Taken together, these strategies can help stop genetic diversity loss. It is important to protect genetic diversity because it is the foundation for healthy species. Healthy species are necessary for human health and for the health of the whole planet!

Gene: A section of DNA that contains the instructions for a trait.

Genetic Diversity: The overall diversity in the DNA between the individuals of a species.

Mutation: A change in an organism’s DNA. This can be a change of a single letter or a much bigger change of hundreds of letters at once.

Adaptation: The process of a species changing in order to better survive in its environment.

Alleles: Different variations of a gene caused by mutations. Many species have two alleles for every gene, one copy from each parent.

Inbreeding: Breeding between closely related individuals. Inbreeding often happens when populations are small and there are few options for mating. Inbred individuals are usually less healthy.

Inbreeding Depression: Inbred individuals share ancestors and are more likely to have identical copies of genes. If these genes contain harmful mutations, the mutations will be expressed and reduce the health of inbred individuals.

Genetic Rescue: A conservation strategy in which new individuals are moved into a population to increase genetic diversity and improve population health.

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

[1] Meyer, R., and Purugganan, M. 2013. Evolution of crop species: genetics of domestication and diversification. Nat. Rev. Genet. 14:840–52. doi: 10.1038/nrg3605

[2] Frankham, R., Ballou, J. D., and Briscoe, D. A. 2002. Introduction to Conservation Genetics. Cambridge: Cambridge University Press. p. 617.

[3] Emamalipour M., Seidi K., Zununi V. S., Jahanban-Esfahlan A., Jaymand M., Majdi H., et al. 2020. Horizontal gene transfer: from evolutionary flexibility to disease progression. Front. Cell. Dev. Biol. 8:229. doi: 10.3389/fcell.2020.00229

[4] Cook, L. M., and Saccheri, I. J. 2013. The peppered moth and industrial melanism: evolution of a natural selection case study. Heredity 110:207–12. doi: 10.1038/hdy.2012.92

[5] Johnson, W. E., Onorato, D. P., Roelke, M. E., Land, E. D., Cunningham, M., Belden, R. C., et al. 2010. Genetic restoration of the Florida panther. Science 329:1641–5. doi: 10.1126/science.1192891

[6] Leigh, D. M., Hendry, A. P., Vázquez-Domínguez, E., and Friesen, V. L. 2019. Estimated six per cent loss of genetic variation in wild populations since the industrial revolution. Evol. Appl. 12:1505–12. doi: 10.1111/eva.12810

Genes (Basel)

Genetic Diversity, Conservation, and Utilization of Plant Genetic Resources

Romesh Kumar Salgotra

1 School of Biotechnology, Sher-e-Kashmir University of Agricultural Sciences and Technology of Jammu, Chatha, Jammu 180009, India

Bhagirath Singh Chauhan

2 Queensland Alliance for Agriculture and Food Innovation (QAAFI), The University of Queensland, Gatton, QLD 4343, Australia

Associated Data

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Plant genetic resources (PGRs) are the total hereditary material, which includes all the alleles of various genes, present in a crop species and its wild relatives. They are a major resource that humans depend on to increase farming resilience and profit. Hence, the demand for genetic resources will increase as the world population increases. There is a need to conserve and maintain the genetic diversity of these valuable resources for sustainable food security. Due to environmental changes and genetic erosion, some valuable genetic resources have already become extinct. Landraces, wild relatives, wild species, genetic stocks, advanced breeding material, and modern varieties are some of the important plant genetic resources. These diverse resources have contributed to maintaining sustainable biodiversity. New crop varieties with desirable traits have been developed using these resources. Novel genes/alleles linked to a trait of interest are transferred into commercially cultivated varieties using biotechnological tools. Diversity should be maintained as a genetic resource for the sustainable development of new crop varieties. Additionally, advances in biotechnological tools, such as next-generation sequencing, molecular markers, in vitro culture technology, cryopreservation, and gene banks, help in the precise characterization and conservation of rare and endangered species. Genomic tools help in the identification of quantitative trait loci (QTLs) and novel genes in plants that can be transferred through marker-assisted selection and marker-assisted backcross breeding. This article focuses on recent developments in maintaining the diversity of genetic resources, their conservation, and their sustainable utilization to secure global food security.

1. Introduction

Genetic diversity is the amount of genetic variability present among individuals of a variety or a population within a species. It is the product of the recombination of genetic material (DNA) during inheritance, mutation, gene flow, and genetic drift [ 1 ], and it results in variation in DNA sequences, epigenetic profiles, protein structure or isoenzymes, physiological properties, and morphological properties. The diversity among plant and animal populations is determined by the hereditary material present in the reproducing members of the population. Genetic diversity is the main driving force for the selection and evolution of populations. Within crop species, the selection of individuals can be natural or artificial, depending on the variation present [ 2 ]. Genetic diversity can be distilled down to the alleles of a gene present in the population, their effects, and their distribution. Genetic diversity is crucial for a healthy population, as it maintains different genes that could confer resistance to pests, diseases, or other stress conditions. It also enables populations to adapt to various biotic and abiotic stresses. Under environmental changes, different crop varieties survive because their genetic variation enables them to adapt, whereas varieties with little or no genetic diversity can become susceptible to biotic and abiotic stresses. Genetic diversity also helps breeders maintain crossbred varieties and sustain their desirable traits, such as quality characteristics and tolerance to various stresses.
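One common way to put a number on "the alleles of a gene present in the population ... and their distribution" is expected heterozygosity (Nei's gene diversity), He = 1 − Σ p_i², where p_i is the frequency of allele i. The sketch below is illustrative only: the measure is not prescribed by the article, and the allele sample is hypothetical.

```python
from collections import Counter


def allele_frequencies(alleles):
    """Relative frequency of each allele in a list of sampled alleles."""
    counts = Counter(alleles)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def expected_heterozygosity(freqs):
    """Nei's gene diversity He = 1 - sum(p_i^2), assuming random mating."""
    return 1.0 - sum(p * p for p in freqs.values())


# Hypothetical sample: 10 diploid plants typed at one locus (20 alleles).
sample = ["A"] * 10 + ["B"] * 6 + ["C"] * 4
freqs = allele_frequencies(sample)
print(freqs)                                     # {'A': 0.5, 'B': 0.3, 'C': 0.2}
print(round(expected_heterozygosity(freqs), 2))  # 0.62

# A monomorphic locus carries no diversity at all.
print(expected_heterozygosity({"A": 1.0}))       # 0.0
```

He is the probability that two alleles drawn at random from the population differ, so it rises with both the number of alleles and the evenness of their frequencies.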

In general, plant genetic resources (PGRs) are the total hereditary material, which includes all the alleles of various genes, present in a crop species, including horticultural and medicinal plants, and their wild relatives. They can also be defined as any type of reproductive or vegetative propagating material of a plant species. PGRs include newly developed varieties, cultivated crop varieties, landraces, modern cultivars, obsolete cultivars, breeding stocks, wild forms, weedy forms, wild species of cultivated crops, and genetic stocks, including current breeders’ lines, elite lines, and mutants. These are the building blocks for the genetic improvement of agricultural and industrial crops [ 3 ]. Most of the agro-industry and agro-processing sectors also rely on PGRs. PGRs are the pillars of crop development programs, and world food security depends on the extent of genetic diversity present in them [ 4 ]. PGRs are used in crop improvement programs, particularly in varietal development programs. They are also used in systematic studies, such as evolutionary biology, cytogenetics, biochemistry, physiology, phylogenetics, ecology, pathology, and molecular studies. PGRs encompass all cultivated species and their wild relatives, traditional cultivars, landraces, and advanced breeding lines [ 5 ]. The demand for PGRs will increase in the future to feed the ever-increasing global population. Moreover, the unprecedented increase in the world population has resulted in the over-exploitation of PGRs, which has led to the genetic erosion of important germplasm from its habitats. Food security issues are of global significance, and genetic resources are being lost at alarming rates through genetic erosion driven by anthropogenic pressures such as the over-exploitation of PGRs, population growth, and climate change.

In addition, with the development and introduction of high-yielding crop varieties, genetic diversity among plant genotypes is declining [ 6 ]. The situation is further exacerbated by the frequent recurrence of biotic and abiotic stresses, resulting in a huge loss of PGRs. To avoid this catastrophic situation, these valuable resources need to be protected from genetic erosion and used judiciously. To meet current as well as future global challenges, PGRs need to be explored, collected, conserved, and utilized sustainably. The survey, exploration, collection, preservation, and sustainable utilization of PGRs in an organized way is the responsibility of all nations. Researchers, policymakers, and planners have already begun to plan for the proper conservation and sustainable utilization of PGRs for the benefit of society [ 7 ].

Maintaining diversity in PGRs is vital for the development and genetic improvement of crop varieties. Presently, the extinction rate of plant species is soaring, and life on Earth is facing a sixth mass extinction event caused by climate change and anthropogenic activities, which may lead to ecological collapse [ 8 ]. The Leipzig Declaration [ 9 ] emphasized saving seed and planting material to avoid genetic vulnerability and food shortages under adverse conditions. The diverse gene pool of plant species, including wild species, landraces, and breeding stock, could hold the tools for survival and adaptation under adverse climatic conditions [ 10 ]. The conservation and sustainable utilization of these valuable resources is crucial to ensure food security for future generations. Genetic resources should be easily available to plant breeders for the continuous development of new crop varieties. PGRs are an important reservoir of genes for resistance to diseases and insect pests, from which improved, resistant crop varieties can be developed. In the present scenario of climate change, PGRs have played a significant role in the development of climate-resilient crop varieties to strengthen food security [ 11 ]. Using PGRs, breeders are developing crop varieties with better yield and quality traits along with resistance to biotic and abiotic stresses, such as diseases, insect pests, flooding, salinity, and drought. Developing countries also rely on PGRs to create more diverse crops. The need for PGRs has risen continually for developing varieties of different crops such as cereals, pulses, vegetables, fruits, and ornamentals [ 11 ].

Today, the conservation and sustainable use of PGRs is a priority of the global community to solve issues surrounding food security and other problems arising from population growth. These resources could vanish completely in the future if proper and stringent PGR conservation practices and policies are not implemented [ 11 ]. This challenge can be overcome by bringing all stakeholders, including farmers, ethnobotanists, holders of indigenous knowledge, plant breeders, NGOs, seed banks, and policymakers, together to share information, create awareness of PGR diversity, develop new technologies, and deploy systematic and scientific conservation. Biotechnological techniques, such as cryopreservation, molecular markers, high-throughput sequencing, and genetic engineering, have improved the conservation of endangered and rare PGRs. Priority should also be given to the exploration of local germplasm and underutilized crop species and to the maximum utilization of traditional local landraces, with the involvement of local people. The Convention on Biological Diversity (CBD) and the International Undertaking on Plant Genetic Resources [ 12 ] work in harmony for the conservation and sustainable utilization of PGRs under the umbrella of the Earth Summit, the United Nations Conference on Environment and Development (UNCED). A well-planned, strategic, and forward-looking vision is required for the conservation and sustainable utilization of these genetic resources.

2. Importance of Genetic Diversity in Plant Genetic Resources

Genetic diversity is the genetic base for crop improvement [ 13 ]. Diverse PGRs enable plant breeders to develop or improve crop varieties with desirable qualities. When developing new cultivars, due consideration must be given to farmers’ preferences, such as high yield, quality, and resistance to diseases and insect pests. In ancient times, humans selected desirable genotypes based on the natural genetic variability in a population [ 14 ]. Breeding priorities for new crop varieties shift over time as environmental conditions change. Plant breeders develop climate-resilient varieties that combine desirable traits, including resistance to various biotic and abiotic stresses. Genetic diversity in the form of mutant lines, wild species, breeding stocks, etc., is used for the improvement and development of modern crop varieties [ 13 ]. For the development of climate-resilient varieties, novel genes conferring tolerance to biotic and abiotic stresses need to be conserved for future use in breeding programs. Plant genotypes carrying genes for quality traits and aesthetic properties should also be preserved in the available germplasm. Genetic diversity within and between plant species allows plant breeders to select superior genotypes, which can then be used to develop genetic stocks for hybridization programs or released as crop varieties [ 13 ]. Genetic diversity enables PGRs to adapt to varied climatic conditions [ 14 , 15 ]. Moreover, the negative impact of inbreeding in populations can be reduced by enhancing genetic diversity. Higher levels of genetic diversity in PGRs support resilience to adverse environmental changes and sustain ecosystem integrity, community structure, and ecosystem functions [ 16 , 17 ]. Genetic diversity also allows plant breeders to use genetically diverse parents in breeding programs to improve the productivity of agricultural and horticultural crop varieties [ 18 ] (Figure 1).

Figure 1. Different sources of genetic diversity and their potential utilization in the development of new crop varieties.

Genetic diversity in plant species depends on the heritable variation present within and between populations. It arises from variation in the nucleotide sequence of DNA, chromosome mutations, and recombination during sexual reproduction [ 19 ]. In sexually reproducing plant species, the F1 and advanced generations are developed by crossing two or more diverse parents. Because of recombination during meiosis, the offspring of two genetically diverse parents carry new combinations of alleles and are therefore genetically dissimilar to their parents. It is this genetic material of individuals that underlies the variability within, as well as between, species [ 20 ]. Generally, genetic diversity can be observed at three levels: diversity between species, diversity between populations within a species, and diversity between individuals within a population. Genetic variability provides evolutionary flexibility, resilience, and adaptability in plant species [ 21 ]. To identify diverse parents for plant breeding programs, breeders and biotechnologists use a multitude of techniques to characterize the genetic diversity of germplasm [ 22 ]. Phenotypic or morphological, biochemical or allozyme, and molecular techniques are used both to characterize the genetic diversity of PGRs and to identify superior genotypes.
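As a rough illustration of how molecular marker data can guide the choice of diverse parents, the sketch below scores hypothetical band presence/absence profiles with the Jaccard distance and returns the most dissimilar pair. The genotype names, marker data, and choice of distance are assumptions for illustration, not the article's method. Jaccard is one common choice for dominant markers because it ignores bands absent from both profiles.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """1 - (bands present in both / bands present in either profile)."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return 0.0 if either == 0 else 1.0 - both / either


def most_divergent_pair(profiles):
    """Return the pair of genotype names with the largest marker distance."""
    return max(
        combinations(profiles, 2),
        key=lambda pair: jaccard_distance(profiles[pair[0]], profiles[pair[1]]),
    )


# Hypothetical band presence (1) / absence (0) at six marker loci.
profiles = {
    "line_A": [1, 1, 0, 1, 0, 1],
    "line_B": [1, 1, 0, 1, 0, 0],
    "landrace_C": [0, 0, 1, 0, 1, 1],
}

for x, y in combinations(profiles, 2):
    print(x, y, round(jaccard_distance(profiles[x], profiles[y]), 3))
print(most_divergent_pair(profiles))  # ('line_B', 'landrace_C')
```

Real analyses use far more markers, and a single distance only ranks candidate parents; it says nothing about their agronomic value.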

3. Factors Affecting Genetic Diversity

Genetic diversity changes over time owing to several factors. The main factors responsible for changes in genetic diversity are mutation, selection, genetic drift, and gene flow. Over time, natural and artificial selection play a substantial role in choosing superior genotypes, which significantly affects the gene and genotype frequencies of the population [ 23 ]. According to Charles Darwin’s theory of evolution (1859), favorable genotypes are selected and passed on to subsequent generations [ 24 , 25 , 26 ]. Under domestication, farmers and breeders select superior genotypes and neglect undesirable ones, which reduces the frequency of inferior alleles over generations. During evolution, various morphological, physiological, and biochemical changes take place in plant species, and under domestication these changes can take different directions depending on the part of the plant that is used. Some plant species have lost sexual reproduction during selection for large tubers or roots, because this selection is associated with polyploid types that are sterile. Some polyploid plant species, such as allohexaploid wheat and potato, show diploidization behavior during sexual reproduction. Some crops have been converted from perennials into annuals. In the domestication process, wild species are genetically transformed into modern cultivars through natural and artificial selection [ 23 ]. Over time, some domesticated cultivars become susceptible to diseases and pests, which can be remedied by incorporating genes from wild relatives [ 27 ]. During domestication, breeders have selected desirable traits according to their preferences [ 27 , 28 ]. Plant breeders typically choose crop varieties with high yield, resistance to biotic and abiotic stresses, wide adaptation, a non-shattering nature, large seeds, early maturity, good quality traits, etc. [ 29 , 30 ]. The main factors affecting genetic diversity are addressed in the following subsections.
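The reduction of inferior alleles under selection can be sketched with the standard one-locus, two-allele viability selection model, which the article does not itself give. The genotype fitnesses, the starting frequency, and the number of generations below are hypothetical. When every aa plant is removed, q falls as q0 / (1 + t·q0), so a rare recessive allele declines only slowly because most of its copies are hidden in heterozygotes.

```python
def next_q(q, w_AA, w_Aa, w_aa):
    """One generation of viability selection at a single biallelic locus.

    q is the frequency of allele a; the w_* arguments are relative genotype
    fitnesses. Assumes random mating (Hardy-Weinberg proportions before selection).
    """
    p = 1.0 - q
    w_bar = p * p * w_AA + 2 * p * q * w_Aa + q * q * w_aa  # mean fitness
    return (p * q * w_Aa + q * q * w_aa) / w_bar


# Hypothetical case: breeders discard every aa plant (s = 1); a starts at 0.5.
q0 = 0.5
q = q0
for _ in range(10):
    q = next_q(q, w_AA=1.0, w_Aa=1.0, w_aa=0.0)
print(f"{q:.3f}")  # 0.083, i.e. q0 / (1 + 10 * q0)
```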

3.1. Mutation

Mutations are sudden heritable changes in the nucleotide sequence of DNA. Mutation is the source of the genetic variation that affects the phenotype in crop species. Genetic diversity caused by mutations can have neutral, positive, or negative effects on various characteristics of a plant species. Along with selection and genetic drift, mutation is a principal cause of changes in allele frequencies in a population. From the beginning, natural or spontaneous mutations have played a significant role in creating the genetic variation that has supported food security [ 31 ]. Mutations are the ultimate source of plant evolution, particularly when plants frequently encounter environmental changes. Mutation rates may increase rapidly in response to environmental changes, or even to demographic changes in a geographical area related to the socio-economic conditions of its human population. Stress-inducible mutagenesis, triggered by various external inputs, has been observed and can accelerate adaptive evolution in plants. Many kinds of genetic change have been observed during mutagenesis, such as insertions, deletions, copy number variations, gross chromosomal rearrangements, and the movement of mobile elements. Early plant breeders used natural mutations as the main source of genetic variation for improving and developing crop varieties. Modern technologies have accelerated this process by inducing mutations through mutagenesis. The concept of mutation breeding was introduced to create more genetic diversity among crop species and to improve traits such as disease and insect pest resistance, tolerance to abiotic stresses, and nutritional quality in crop varieties [ 32 ].
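How mutation alone moves allele frequencies can be illustrated with the textbook model of recurrent one-way mutation. In this model, a fraction u of A alleles mutates to a in each generation, so p_t = p_0(1 − u)^t. The rate u = 1e-5 and the 1000 generations are hypothetical round numbers. The sketch suggests that mutation by itself shifts allele frequencies slowly. Its main role is to supply the new variation on which selection and drift then act.

```python
def mutate(p0, u, generations):
    """Frequency of allele A after recurrent one-way mutation A -> a at rate u."""
    p = p0
    for _ in range(generations):
        p *= 1.0 - u  # a fraction u of the remaining A alleles mutates to a
    return p


p0, u, t = 1.0, 1e-5, 1000  # hypothetical starting frequency, rate, generations
p_iter = mutate(p0, u, t)
p_closed = p0 * (1.0 - u) ** t  # closed form p_t = p_0 (1 - u)^t
print(f"{p_iter:.3f} {p_closed:.3f}")  # 0.990 0.990
```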

3.2. Selection

Natural and artificial selection act on the phenotypic characteristics of plant species. The phenotype of a plant depends on heritable and non-heritable components, and the genotype–environment interaction also plays a significant role. The selection of superior genotypes depends on the genetic variation available in the plant species: artificial selection is effective only when sufficient genetic variation is present in the population. Genetic improvement depends on the magnitude of genetic variability present in the population, as well as on the nature of the association between different traits. For example, the association of yield with other characteristics of a plant species enables several traits to be selected at a time [ 33 ]. Substantial genetic variation in the population allows plant breeders to select effectively and so realize the maximum genetic yield potential of crop varieties [ 34 ]. It also helps in selecting better parents for hybridization programs. Hence, the effectiveness of selection in a population depends on the degree of genetic variation present.
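The point that artificial selection works only when heritable variation exists is often made quantitative with the breeder's equation, R = h²S. Here S is the selection differential (how far the selected parents exceed the population mean) and h² = V_A / V_P is narrow-sense heritability. This is a standard quantitative-genetics result rather than something stated in the article. The variance components and S below are hypothetical, and V_P is simplified to V_A + V_E.

```python
def narrow_sense_heritability(v_additive, v_phenotypic):
    """h^2 = V_A / V_P: the share of phenotypic variance that responds to selection."""
    return v_additive / v_phenotypic


def response_to_selection(h2, selection_differential):
    """Breeder's equation R = h^2 * S (expected gain in the offspring mean)."""
    return h2 * selection_differential


S = 10.0  # hypothetical: selected parents exceed the population mean by 10 units
v_e = 30.0  # hypothetical environmental variance
for v_a in (20.0, 0.0):
    v_p = v_a + v_e  # simplification: ignores dominance, epistasis and G x E variance
    h2 = narrow_sense_heritability(v_a, v_p)
    print(f"V_A={v_a}: h2={h2:.2f}, R={response_to_selection(h2, S):.1f}")
# V_A=20.0: h2=0.40, R=4.0
# V_A=0.0: h2=0.00, R=0.0
```

With no additive genetic variance, even a large selection differential yields no gain, which is the situation the paragraph above warns about.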

3.3. Migration

Migration is the movement of alleles from one species to another or from one population to another. It occurs through the movement of pollen, the dispersal of seeds, and the transfer of planting material such as rhizomes, suckers, and other vegetative propagating materials. The rate of migration is affected by reproduction cycles and by the dispersal of seeds and pollen. Migration can also occur when germplasm is moved from one area to another, which results in the mixing of alleles through pollen and seeds [ 35 ].
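Gene flow of this kind is often summarized with the one-way (continent–island) migration model. In each generation, a fraction m of a population's alleles comes from a source population whose allele frequency p_m stays fixed, so p' = (1 − m)p + m·p_m. The model and all numbers below are illustrative assumptions, not taken from the article.

```python
def migrate(p, p_migrants, m):
    """One generation of gene flow: a fraction m of the alleles comes from migrants."""
    return (1.0 - m) * p + m * p_migrants


p0, p_migrants, m, generations = 0.1, 0.6, 0.05, 20  # hypothetical values
p = p0
for _ in range(generations):
    p = migrate(p, p_migrants, m)

# Closed form: the gap to the migrant frequency shrinks by (1 - m) each generation.
p_closed = p_migrants + (p0 - p_migrants) * (1.0 - m) ** generations
print(f"{p:.3f} {p_closed:.3f}")  # 0.421 0.421
```

Repeated gene flow pulls the receiving population's allele frequency toward that of the source, so it can add new alleles while making the populations more alike.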

3.4. Genetic Drift

Genetic drift is a mechanism by which the allele frequencies of a population change over generations because of random sampling error. Sampling error changes allele frequencies by chance, which ultimately changes genetic diversity over generations. Each pollen grain carries a different combination of alleles, and whether it is carried by insects, wind, humans, or other means to a compatible flower is largely determined by chance. Thus, in every reproduction cycle, some of the genetic diversity in a crop species is lost through these chance events [ 36 ].
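The chance sampling described above is usually modeled with the Wright–Fisher model, an idealization not used in the article. It assumes N diploid plants, random mating, no selection, mutation, or migration, and 2N alleles drawn at random from the previous generation. Under these assumptions, expected heterozygosity decays as H_t = H_0(1 − 1/(2N))^t, so small populations lose diversity faster. The sketch compares a simulation with that formula. N, t, the seed, and the replicate count are arbitrary illustrative choices.

```python
import random


def wright_fisher(p0, n_plants, generations, rng):
    """Allele frequency after drift when 2N alleles are resampled each generation."""
    two_n = 2 * n_plants
    p = p0
    for _ in range(generations):
        # Binomial sampling of 2N alleles: the "sampling error" that causes drift.
        count = sum(1 for _ in range(two_n) if rng.random() < p)
        p = count / two_n
    return p


def expected_heterozygosity_after(h0, n_plants, generations):
    """Wright-Fisher expectation H_t = H_0 * (1 - 1/(2N))**t."""
    return h0 * (1.0 - 1.0 / (2 * n_plants)) ** generations


rng = random.Random(1)     # fixed seed so the run is reproducible
N, T, REPS = 10, 20, 2000  # hypothetical population size, generations, replicates
final = [wright_fisher(0.5, N, T, rng) for _ in range(REPS)]
mean_h = sum(2 * p * (1 - p) for p in final) / REPS
print(f"simulated H_t = {mean_h:.3f}")
print(f"expected  H_t = {expected_heterozygosity_after(0.5, N, T):.3f}")  # 0.179
```

Starting from H_0 = 0.5, a population of only 10 plants is expected to lose roughly two thirds of its heterozygosity in 20 generations, and many replicate populations fix one allele outright.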

4. Factors That Cause Genetic Vulnerability

Over the past century, genetic diversity in wild populations has been declining globally [ 16 , 37 ]. Genetically distinct populations of most species are also declining due to the shrinkage of geographic ranges and a lack of proper management and conservation practices [ 38 , 39 , 40 ]. Most genetic diversity is lost due to infrastructure development, climate change, habitat fragmentation, population reduction, overgrazing, and overharvesting [ 41 ]. The following subsections describe further major factors responsible for the genetic vulnerability of genetic resources.

4.1. Narrow Genetic Base of Crop Varieties

The main reason for genetic erosion and vulnerability is the cultivation of genetically uniform cultivars with a narrow genetic base. Indigenous or traditional crop varieties have a broad genetic base, and these cultivars can tolerate various biotic and abiotic stresses [ 42 ]. Traditional crop varieties have a low genotype–environment interaction, enabling them to withstand disease epidemics, insect pest outbreaks, and other adverse environmental conditions [ 43 ]. Moreover, because of their broad genetic base, traditional varieties are less prone to infection by pathogen races than modern released varieties that share common parents. Hybrids have been developed by crossing genetically uniform inbred lines, which has significantly decreased genetic diversity. Additionally, most high-yielding crop varieties have been developed by crossing common parents with similar genetic backgrounds, which can significantly narrow the genetic base of the varieties [ 22 , 44 ].

4.2. Widespread Cultivation of Dominant Varieties

The widespread cultivation of a single crop variety over a large area causes genetic vulnerability. Such varieties may perform well for a short period but may become susceptible to several diseases and pests. Vertical resistance is conferred by one or a few genes (monogenic or oligogenic resistance), whereas horizontal resistance is conferred by many genes (polygenic resistance). The vertical resistance of a modern cultivar may protect it against a disease at first, but the cultivar will become susceptible if the pathogen evolves. A new race of the pathogen or a new biotype of the insect pest can overcome the monogenic resistance of the modern cultivar, and the variety then becomes susceptible to that particular race or biotype.

4.3. Unplanned Introduction of New Plant Species

Sometimes, new high-yielding plant material is introduced and used in a breeding program without proper screening for disease and insect pest resistance, which may result in an unpredicted disease epidemic. An example is the use of the Texas male sterile (TMS) cytoplasm of maize to develop hybrid maize in the USA, which culminated in an epidemic in 1970. The new maize hybrids had all the desirable characteristics and resistance to most of the common maize diseases. TMS hybrids were widely cultivated in the USA, covering more than 90% of the maize area. However, these hybrids were susceptible to a new race (race T) of the fungus Helminthosporium maydis, which causes southern corn leaf blight. The disease spread widely and caused severe losses across the US maize crop. If the TMS source of male sterility had been tested and screened properly before use in hybrid breeding programs, or if the monoculture of TMS hybrids had been avoided, the spread of this epidemic could have been countered [ 45 ].

5. Conservation of Plant Genetic Resources

From the beginning of agriculture, farmers selected, cultivated, and conserved seeds of locally adapted plants, known as “landraces” [ 46 ]. This practice continued until the rediscovery of Gregor Mendel’s work at the beginning of the 20th century, which led to the introduction of breeding programs for the development of high-yielding and stress (biotic and abiotic)-tolerant crop varieties. In the middle of the last century, these programs laid the foundation for the “Green Revolution” and brought about an exponential increase in agricultural production. However, this led to the replacement of landraces and the expansion of monoculture cropping systems. Over 75% of the genetic diversity in PGRs and 90% of crop varieties have been lost from farmers’ fields [ 47 ]. It is now of paramount importance that the remaining PGRs be conserved to sustain agricultural production in this era of climate change, global environmental problems, and booming population growth [ 48 ].

Since the 16th century, more than 80,000 plant species have been collected and preserved in about 3400 gardens across the world [ 49 ]. The main objective of this effort is to conserve PGR diversity and the wild species of crop plants for use in breeding programs. In the mid-20th century, PGRs for food and agriculture (PGRFA) began to be preserved ex situ in specialized repositories, often termed gene banks. These gene banks focus on inter- and intra-specific crop diversity. Presently, more than 17,000 regional, national, and international institutions deal with the conservation and sustainable use of PGRFA [ 49 ]. Additionally, 711 gene banks and 16 regional/international institutions/centers, spread over 90 countries, conserve more than 5.4 million accessions from over 7051 genera. The focus is to conserve crop species, including crops’ wild relatives, landraces, modern cultivars, genetic stocks, and breeding materials [ 50 ]. In addition, various international treaties have been implemented in harmony with the CBD for the conservation, sustainable utilization, equitable benefit-sharing, and safe handling of genetic resources.

5.1. International Treaty on Plant Genetic Resources for Food and Agriculture

The International Treaty on Plant Genetic Resources for Food and Agriculture (ITPGRFA) came into force in 2004. ITPGRFA works in harmony with the CBD for sustainable agriculture and food security. The objective of the treaty is the conservation and sustainable use of plant genetic resources for food and agriculture and the fair and equitable sharing of the benefits arising from their use. The conservation and sustainable use of PGRFA are essential to achieving sustainable agriculture and food security, for present and future generations, and are indispensable for crop genetic improvement in adapting to unpredictable environmental changes and human needs.

5.2. Nagoya Protocol

The Nagoya Protocol, which came into force in 2014, aims to ensure access to genetic resources and the fair and equitable sharing of benefits arising from their utilization. By ensuring benefit-sharing, the Nagoya Protocol creates incentives to conserve and sustainably use genetic resources and therefore enhances the contribution of biodiversity to development and human well-being.

5.3. Svalbard Global Seed Vault

The Svalbard Global Seed Vault, situated in Norway, safeguards duplicate seed samples from almost every country in the world. The Seed Vault is owned and run by the Norwegian Ministry of Agriculture and Food on behalf of the Kingdom of Norway and is established as a service to the world community. The Global Crop Diversity Trust provides support for the ongoing operations of the Seed Vault, as well as funding for the preparation and shipment of seeds from developing countries to the facility. The Nordic Genetic Resource Center (NordGen) operates the facility and maintains a public online database of samples stored in the Seed Vault. It provides insurance against both incremental and catastrophic loss of crop diversity held in traditional genebanks around the world. The Seed Vault offers long-term protection for one of the most important natural resources on Earth. Its main purpose is to back up genebank collections to secure the foundation of our future food supply.

5.4. The Cartagena Protocol on Biosafety

The goal of the Cartagena Protocol on Biosafety is to provide safety in the handling of genetic resources, particularly genetically modified organisms. It is an international agreement that aims to ensure the safe handling, transport, and use of living modified organisms (LMOs) resulting from modern biotechnology that may have adverse effects on biological diversity, while also taking into account risks to human health.

The ever-increasing demand resulting from the explosive growth of the human population worldwide, together with global warming, has forced the world community to consider the sustainable use of PGRs. The conservation of PGRs, including landraces, obsolete varieties, breeding material, wild species, and their wild relatives, is of utmost importance to secure future food security [ 44 ]. The loss of valuable genetic resources prompted the world community to explore, collect, and preserve PGRs and maintain genetic diversity, and to sign the CBD in Rio de Janeiro in 1992. The importance of PGRs and biodiversity conservation was the main international issue discussed at the convention [ 44 ]. The CBD has three main objectives: (i) the conservation of biodiversity, (ii) the sustainable use of its components, and (iii) the equitable sharing of benefits arising from the use of genetic resources. There is an urgent need to conserve genetic resources for the welfare of human beings and future food security, and to avoid the loss of valuable novel genes. Effective policies should be implemented to prevent the extinction of valuable PGRs. There are various methods to conserve biodiversity, such as (i) in situ conservation, (ii) ex situ conservation, and (iii) biotechnological strategies/approaches (Figure 2). Conserving the genetic diversity of PGRs in situ, on farms and in fields, also raises awareness in society at large about the importance of agrobiodiversity. In situ and ex situ conservation are complementary strategies to prevent the mass erosion of genetic resources. The utilization of crop genetic diversity is necessary for the development and release of new, well-adapted, and improved varieties for global food security.

Figure 2. Different strategies used for in situ and ex situ conservation of plant genetic resources.

5.5. In-Situ Conservation

In in situ conservation, genetic resources are conserved in their natural habitat: plant species are conserved and maintained in the original location where they are found [ 51 ]. The process of evolution is allowed to occur naturally, with minimal human intervention. Many wild plant species are conserved in this way, especially forest species and wild fruit crops. In situ conservation permits plant species to evolve so that genetic diversity can be fostered. It works via two methods: (i) on-farm/field conservation and (ii) genetic reserve conservation. Although both are concerned with conserving and maintaining the diversity of genetic resources, on-farm conservation concerns traditional crop varieties or farming systems, while genetic reserve conservation deals with wild species in natural habitats [ 4 , 11 ]. In genetic reserve conservation, the area is defined by a location where genetic diversity is maintained through active, long-term conservation, such as a forest reserve. In on-farm conservation, locally developed landraces are sustainably managed, and farmers also conserve wild relatives and weedy forms within the existing farming system. Because farmers select desirable plants for further cultivation, a continuous process of evolution takes place. The in situ method allows open pollination among different genotypes, so the resulting population carries many alleles. However, to guard against natural calamities and the adverse effects of climate change, in situ and ex situ conservation should be adopted as complementary approaches [ 4 , 11 ].

5.6. Ex-Situ Conservation

Ex situ conservation is the conservation of genetic resources outside their natural habitat. It includes seed gene banks, plant tissue culture, cryopreservation, greenhouses, etc. It is used to conserve endangered and overexploited genetic resources whose natural habitats face destruction and degradation and which might otherwise go extinct. Ex situ conservation is therefore an alternative method of conserving valuable genetic resources [ 52 ], protecting PGRs from extinction caused by natural calamities, human interference, climate change, and overexploitation. Collected genetic resources should be evaluated and characterized to avoid duplication, documented, and conserved under artificial conditions that keep them safe from external threats [ 4 ]. Among the various ex situ techniques, seed storage is the most convenient and easiest method for long-term conservation [ 4 , 11 ]. Orthodox seeds of food crops are suited to this method because they tolerate low temperatures and severe dehydration. About 45% of the accessions stored ex situ are seeds of cereal crops such as rice, wheat, maize, oat, triticale, rye, sorghum, and barley, followed by food legumes (15%), forages (9%), and vegetables (7%) [ 46 , 49 ].

Collected seeds are generally conserved in two ways: as a base collection or as an active collection. A base collection consists of seed samples maintained for long-term conservation; the samples are stored at −18 to −20 °C for the maximum period of seed viability [ 53 ], and their moisture content should be between 3% and 7%, depending on the species. In an active collection, seed samples are stored for immediate use, for 10–20 years, and should retain at least 65% viability. Their moisture content varies with the species, ranging from 7% to 11% for seeds with good storability and from 3% to 8% for seeds with poor storability, and also depends on the storage temperature [ 54 ]. Depending on storage duration, seed storage is further divided into three basic types: (i) long-term storage, in which seed samples are held in base-collection facilities at −18 to −20 °C; (ii) medium-term storage, for no more than 5 years, at 0–10 °C and 20–30% relative humidity; and (iii) short-term storage, for 1 year to 18 months, at 20–22 °C and 45–50% relative humidity, conditions under which seeds can remain viable for up to two years [ 54 ]. For long-term ex situ conservation, seed storage is the lowest-cost and most widely adopted method. It involves drying the seeds and then storing them at low temperature. However, recalcitrant seeds and vegetatively propagated species do not survive low temperatures as orthodox seeds do, a limitation that is particularly significant for the conservation of forest and tree species.
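The storage categories in this paragraph can be collected into a small lookup table. The Python sketch below is a hypothetical encoding of only the temperature and relative-humidity ranges quoted above, plus the 65% viability threshold for active collections; the names StorageClass, classify_storage, and meets_active_collection_viability are illustrative assumptions, and seed moisture content and other gene-bank parameters are deliberately left out.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class StorageClass:
    """One seed-storage category, using the ranges quoted in the text (hypothetical encoding)."""
    name: str
    temp_min_c: float
    temp_max_c: float
    rh_min_pct: Optional[float] = None  # None: the text gives no relative-humidity range
    rh_max_pct: Optional[float] = None
    duration: str = ""


STORAGE_CLASSES = [
    StorageClass("long-term", -20.0, -18.0, duration="maximum viability (base collection)"),
    StorageClass("medium-term", 0.0, 10.0, 20.0, 30.0, duration="not more than 5 years"),
    StorageClass("short-term", 20.0, 22.0, 45.0, 50.0, duration="1 year to 18 months"),
]


def classify_storage(temp_c: float, rh_pct: Optional[float] = None) -> List[str]:
    """Return the categories whose temperature (and, if given, RH) range contains the condition."""
    matches = []
    for sc in STORAGE_CLASSES:
        if not sc.temp_min_c <= temp_c <= sc.temp_max_c:
            continue
        # Only check humidity when both the caller and the category specify it
        if rh_pct is not None and sc.rh_min_pct is not None:
            if not sc.rh_min_pct <= rh_pct <= sc.rh_max_pct:
                continue
        matches.append(sc.name)
    return matches


def meets_active_collection_viability(germination_pct: float) -> bool:
    """Active-collection samples should retain at least 65% viability (per the text)."""
    return germination_pct >= 65.0


print(classify_storage(-18.0))       # ['long-term']
print(classify_storage(5.0, 25.0))   # ['medium-term']
print(classify_storage(21.0, 60.0))  # [] (RH outside the short-term range)
```

Ranges are treated as inclusive; a condition that falls between categories (for example, 15 °C) matches none of them, which simply reflects the gaps between the ranges given in the text.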
Novel PGRs can also be conserved in home gardens for future use in breeding programs. Ex situ conservation enables novel genes/alleles to be conserved and ensures their sustainable use in crop improvement programs. The ex situ conservation of PGRs began in the mid-20th century to slow the rapid loss of biodiversity that followed the spread of modern high-yielding crop varieties, as farmers replaced their traditional cultivars with improved ones [ 53 ]. Ex situ methods also help protect and conserve wild relatives [ 55 , 56 ], and several institutes use them to conserve important PGRs [ 3 , 53 ] ( Table 1 ).

Table 1. Important research institutes conserving and maintaining PGRs.

S. No. | International Research Institute | Mandate/Crops
1 | International Rice Research Institute (IRRI), Los Baños, Philippines | Rice
2 | Centro Internacional de Mejoramiento de Maíz y Trigo (CIMMYT), El Batán, Mexico | Maize and wheat (triticale, barley, sorghum)
3 | Centro Internacional de Agricultura Tropical (CIAT), Palmira, Colombia | Cassava and beans (also maize and rice), in collaboration with CIMMYT and IRRI
4 | International Institute of Tropical Agriculture (IITA), Ibadan, Nigeria | Grain legumes, roots and tubers, farming systems, cassava, banana, yam
5 | Centro Internacional de la Papa (CIP), Lima, Peru | Potato, Andean roots and tubers
6 | International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad, India | Sorghum, groundnut, pearl millet, Bengal gram, red gram
7 | West Africa Rice Development Association (WARDA), Monrovia, Liberia | Regional cooperative rice research in collaboration with IITA and IRRI
8 | International Plant Genetic Resources Institute (IPGRI), Rome, Italy | Genetic conservation
9 | National Bureau of Plant Genetic Resources, New Delhi, India | Fruits, tubers, medicinal and aromatic crops, spices, bulbous crops
10 | The Asian Vegetable Research and Development Center (AVRDC), Taiwan | Tomato, onion, peppers, Chinese cabbage
11 | International Center for Tropical Agriculture (CIAT), Colombia | Cassava
12 | The New Zealand Institute for Plant and Food Research Limited, New Zealand | Kiwifruit (Actinidia spp.)
13 | Svalbard Global Seed Vault, Norway | All crops from different countries

5.7. Biotechnological Approaches

Plant biotechnology tools provide new opportunities for conserving genetic resources through various in vitro culture techniques. Cell and tissue culture and other micropropagation techniques have greatly contributed to the storage and transport of PGRs [ 57 ], because they allow PGRs to be multiplied rapidly and in large numbers under aseptic conditions for further conservation and transport. In addition, next-generation sequencing (NGS), cell fusion techniques, recombinant DNA technologies, proteomics and structural biology, protein engineering, and genome editing have opened new avenues for conserving genetic resources with greater precision. These technologies help conserve rare and endangered species, ornamental species, forest species, medicinal species, and other vegetatively propagated plant material [ 51 ]. When the biological material of PGRs (such as seeds or organs) cannot be propagated and stored using traditional methods, biotechnological tools such as in vitro culture, cryopreservation, and molecular biology can be used. Reproductive barriers in some endangered and rare plant species can also sometimes be overcome through biotechnological interventions [ 58 ]. The following subsections describe the main biotechnological techniques for conserving PGRs that cannot be maintained under conventional storage systems. These techniques also support the short-, medium-, and long-term conservation of elite and pathogen-free plants.

5.7.1. In Vitro Propagation

In in vitro gene banks, PGRs are maintained on an artificial nutrient medium. This is an alternative method for conserving vegetatively propagated plant genetic material [ 59 , 60 ]. In vitro conservation is well recognized by global agencies, such as the International Board for Plant Genetic Resources (IBPGR), as a means of safely transporting germplasm under regulated phytosanitary control. Its main advantages are insect- and disease-free material, mass multiplication, no genetic erosion, reduced space and labor requirements, and a shorter time to obtain new plants, and it allows quality planting material to be produced throughout the year. In in vitro methods, callus is induced from explants such as seeds, leaves, tubers, shoots, and nodes, and whole plants are regenerated from it. Once cultures are established and the plant material has been multiplied sufficiently, it must be maintained effectively, which can be done by regularly subculturing the plants onto fresh media. However, repeated subculturing carries a risk of microbial contamination and of somaclonal variation.

The successful production and propagation of genetically stable plants from cultures is a prerequisite for in vitro conservation. Shoot cultures are used for slow-growth storage to avoid somaclonal variation, and slow-growth storage is well suited to the medium-term conservation of PGRs [ 61 , 62 , 63 ]. In this method, the target germplasm is kept under plant tissue culture conditions on nutrient gels for 1 to 15 years with intermittent subculturing. Growth can be slowed in several ways, such as combining low light intensity with lower temperatures or shortening the photoperiod. For example, slow growth can be maintained at 0–5 °C for cold-tolerant species and at 15–20 °C for tropical species. Growth retardants can also be added to the culture medium, and the oxygen supply can be reduced to varying degrees, to slow the growth of plantlets [ 64 ].

5.7.2. Cryopreservation

Cryopreservation is the storage of plant tissue at ultra-low temperatures (−196 °C), mostly in liquid nitrogen. Because cellular metabolism and cell division effectively stop at this temperature, material that is otherwise difficult to store, such as recalcitrant seeds and vegetatively propagated species, can be kept for long periods. No subculturing is required, and the risk of somaclonal variation is reduced [ 65 , 66 ]. Cryopreservation provides cost-effective, safe, long-term conservation and can be applied to a wide range of plant species. It can also be combined with cryotherapy to eradicate systemic plant pathogens; for this purpose, meristems or shoot apices are recommended because they show high viability after extended storage and are generally virus-free [ 67 ]. The first step in cryopreservation is to remove all freezable water from the tissues, by osmotic dehydration or a physical approach, before ultra-rapid freezing [ 68 ]. Freezable water can be removed by freeze-induced dehydration or by vitrification. In vitrification, ice crystal formation is avoided, and the liquid phase is converted directly into an amorphous (glassy) state [ 69 ].

5.7.3. DNA Banks

Advances in molecular biology have provided a complementary approach to conserving endangered and rare species. Conserving genetic resources as DNA is a cost-effective form of PGR conservation. Many species are difficult to conserve by other means and are close to extinction, and DNA storage may be one of the best alternatives for conserving the genetic diversity of these resources, which may carry novel genes/alleles that could contribute to future food security. In a DNA bank, genomic fragments comprising individual genes or entire genomes are conserved in a gene library or a library of DNA samples, and genetic information can be stored as DNA, RNA, or cDNA. These libraries are a primary source of important germplasm for future scientific research worldwide. DNA conservation is particularly useful where genetic material is difficult to conserve or is threatened in the wild or by climate change [ 70 ]. For short- and medium-term storage, DNA can be kept at −20 °C for up to 2 years; for long-term storage, it can be kept at −70 °C or in liquid nitrogen. DNA banks include the Australian Plant DNA Bank of Southern Cross University, the Royal Botanic Garden (UK), the Leslie Hill Molecular Systematics Laboratory, and the US Missouri Botanical Garden. Among these, the Royal Botanic Garden (UK) has the oldest and most comprehensive DNA bank, with more than 20,000 DNA samples representing all plant families. Unlike other PGR conservation techniques, DNA conservation cannot by itself regenerate a whole plant from the conserved DNA or recover the original genotype. Instead, the banked DNA must first be introduced artificially into somatic cells, from which whole plants are regenerated using in vitro tissue culture techniques [ 51 ].

5.7.4. Digital Sequence Information

Digitized molecular data are vital to many areas of scientific research and to the use of genetic resources. Substantial advances in DNA sequencing over recent decades hold great potential to enhance food security and the sustainable use of global biodiversity, to the benefit of the world's poorest people. Digital Sequence Information (DSI) plays a crucial role in catalyzing research that can contribute to international societal and biodiversity conservation targets. However, provider countries have raised concerns over access to their genetic resources and the absence of benefit sharing. Open access to DSI may exacerbate these concerns, and this is leading to more policy interventions and to restricted access to genetic resources and DSI. Moreover, the benefits arising from DSI are difficult to identify and share, a problem compounded by the lack of clear international governance and legislation, which in turn has led to reluctance to make DSI publicly and freely available.

6. Utilization of Plant Genetic Resources in Crop Improvement

Landraces, which preceded modern cultivated varieties, contain more genetic diversity than modern varieties, which are developed for specific traits such as high yield, disease resistance, insect pest resistance, stress tolerance, and improved nutritional quality. Plant breeders select diverse parents from PGRs for crossing programs to develop new crop varieties [ 71 ]. A new variety takes at least 8–11 years to develop and may remain in cultivation for 5–6 years, but it can be improved further by incorporating novel genes/alleles from wild relatives or wild species. Wild relatives and landraces are rich sources of novel genes for resistance to biotic and abiotic stresses, and they are readily crossable with cultivated varieties [ 72 ]. PGRs can be used in breeding programs in four ways: (i) developing pre-breeding materials for use in traditional breeding methods; (ii) developing genetic stocks as sources of resistance to biotic and abiotic stresses and of quality traits; (iii) characterizing and identifying PGRs carrying male sterility for hybrid development [ 73 ]; and (iv) developing modern cultivars by transferring genes of interest from different genetic resources into popular varieties. PGRs are also used to increase genetic variation in breeding populations, to introduce genes that reduce the genetic bottlenecks of current varieties, and to develop hybrids as well as composite and synthetic varieties.

Genomic Tools for Efficient Use of Plant Genetic Resources

With the advent of modern biotechnological techniques, plant breeders have become significantly more efficient at developing and improving crop varieties. Next-generation sequencing (NGS), high-throughput sequencing (HTS), and high-throughput phenotyping (HTP) enable more efficient use of PGRs. Among the available techniques, DNA-based methods are the most reliable and widely used in crop improvement programs because, unlike morphological and biochemical markers, molecular markers are not influenced by environmental conditions or the developmental stage of the plant [ 74 ]. Molecular markers, such as restriction fragment length polymorphisms (RFLPs), random amplified polymorphic DNA (RAPD), amplified fragment length polymorphisms (AFLPs), inter-simple sequence repeats (ISSRs), Diversity Arrays Technology (DArT), simple sequence repeats (SSRs), and single nucleotide polymorphisms (SNPs), are widely used to characterize the genetic diversity present within and between plant populations; among them, SSR markers are widely used to characterize genotypes [ 22 , 75 ]. DNA-based markers are also more efficient for evaluating the genetic diversity of endangered and rare species, and a major advantage is that even a small sample of plant material is sufficient for the analysis. Molecular markers have been widely used to study genetic diversity, and core collections of PGR accessions have been developed in crops such as rice [ 66 , 76 , 77 ], wheat [ 78 ], mungbean [ 79 ], soybean [ 80 ], common bean [ 81 , 82 ], pigeon pea [ 83 ], chilli [ 84 ], potato [ 85 ], carrot [ 86 ], tomato [ 87 ], oil palm [ 88 ], cotton [ 89 ], mulberry [ 90 ], barnyard millet [ 91 ], legume crops [ 92 ], and other vegetable and horticultural crops [ 93 ].
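Marker-based diversity studies of this kind usually summarize each locus with simple allele-frequency statistics. As an illustration only (not the method of any study cited here), the Python sketch below computes two widely used ones for a multi-allelic SSR locus: expected heterozygosity (Nei's gene diversity), He = 1 − Σ p_i², and the polymorphism information content of Botstein et al. (1980), PIC = He − Σ_{i<j} 2 p_i² p_j². The genotypes are invented for the example, and diploid, codominant scoring is assumed.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Hashable, Iterable, Tuple


def allele_frequencies(genotypes: Iterable[Tuple[Hashable, Hashable]]) -> Dict[Hashable, float]:
    """Allele frequencies from diploid, codominant genotypes such as SSR allele sizes."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def expected_heterozygosity(freqs: Dict[Hashable, float]) -> float:
    """Nei's gene diversity: He = 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in freqs.values())


def polymorphism_information_content(freqs: Dict[Hashable, float]) -> float:
    """PIC = He - sum over allele pairs i < j of 2 * p_i^2 * p_j^2."""
    correction = sum(2 * (pi ** 2) * (pj ** 2) for pi, pj in combinations(freqs.values(), 2))
    return expected_heterozygosity(freqs) - correction


# Invented SSR genotypes (allele sizes in bp) for five accessions
genotypes = [(150, 154), (150, 150), (154, 158), (150, 158), (154, 154)]
freqs = allele_frequencies(genotypes)
print(freqs)                                                   # {150: 0.4, 154: 0.4, 158: 0.2}
print(f"He  = {expected_heterozygosity(freqs):.3f}")           # He  = 0.640
print(f"PIC = {polymorphism_information_content(freqs):.3f}")  # PIC = 0.563
```

PIC is always somewhat lower than He for the same locus because it discounts matings that cannot reveal which parental allele was inherited; averaging either statistic over many loci gives a simple summary of the diversity in a collection.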

With advances in high-throughput sequencing, SNP markers are now preferred in crop improvement programs. Various QTLs have been identified using biparental populations developed from PGRs [ 94 , 95 , 96 ]. Besides the molecular characterization of PGR germplasm for genetic diversity studies, molecular markers are widely used in plant breeding for (i) marker-assisted testing of breeding materials, including parental selection, assessment of genetic diversity, studies of heterosis, identification of genomic regions under selection, and assessment of cultivar purity and identity [ 35 , 97 , 98 , 99 , 100 , 101 , 102 ]; (ii) marker-assisted recurrent selection (MARS) [ 103 ]; (iii) marker-assisted backcross breeding (MABB) [ 104 ]; (iv) marker-assisted gene pyramiding [ 105 ]; and (v) genomic selection for complex traits [ 106 ]. Genes for biotic and abiotic stress resistance and for quality traits have been introgressed into important crops through marker-assisted selection (MAS) and MABC/MABB approaches [ 66 , 75 , 92 ] ( Table 2 ).
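The advantage of marker-assisted backcrossing can be illustrated with the standard expectation for recurrent-parent genome (RPG) recovery: after n backcrosses (counting the F1 as n = 0), the expected RPG proportion is 1 − (1/2)^(n+1), i.e., 75% in BC1 and about 99% only by BC6. The hypothetical Python simulation below contrasts this expectation with foreground selection (keeping progeny that carry the donor allele at a target locus) followed by background selection (keeping the carrier closest to the recurrent parent). It assumes unlinked marker loci, one selected parent per generation, and arbitrary population sizes, so it ignores linkage drag and is a sketch rather than a breeding design.

```python
import random
from typing import List


def expected_rpg(n_backcrosses: int) -> float:
    """Expected recurrent-parent genome proportion after n backcrosses (F1 = 0): 1 - (1/2)^(n+1)."""
    return 1.0 - 0.5 ** (n_backcrosses + 1)


def backcross(parent: List[int], rng: random.Random) -> List[int]:
    """Cross an individual to the recurrent parent.

    parent[i] == 1 means locus i is heterozygous (one donor allele); 0 means homozygous for
    the recurrent-parent allele. A heterozygous locus passes on the donor allele with
    probability 1/2; loci are treated as unlinked (a simplifying assumption).
    """
    return [1 if het and rng.random() < 0.5 else 0 for het in parent]


def rpg(individual: List[int]) -> float:
    """Observed RPG proportion: a heterozygous locus is half donor, half recurrent."""
    return 1.0 - sum(individual) / (2 * len(individual))


def simulate_mabb(n_loci: int = 200, n_progeny: int = 100,
                  n_generations: int = 3, seed: int = 1) -> List[float]:
    """Return the RPG of the selected plant in each backcross generation (BC1, BC2, ...)."""
    rng = random.Random(seed)
    parent = [1] * n_loci  # F1: heterozygous at every locus; locus 0 is the target gene
    history = []
    for _ in range(n_generations):
        progeny = [backcross(parent, rng) for _ in range(n_progeny)]
        carriers = [p for p in progeny if p[0] == 1]  # foreground selection on the target locus
        if not carriers:
            raise RuntimeError("no progeny carried the target allele; increase n_progeny")
        parent = max(carriers, key=rpg)  # background selection: closest to the recurrent parent
        history.append(rpg(parent))
    return history


for gen, observed in enumerate(simulate_mabb(), start=1):
    print(f"BC{gen}: expected {expected_rpg(gen):.3f}, selected plant {observed:.3f}")
```

Because each backcross can only remove donor segments, the RPG of the selected plant never decreases from one generation to the next; how far it exceeds the expectation depends on the assumed population size and number of markers.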

Table 2. Improvement of various crops using biotechnological tools.

Crop | Molecular Breeding Approach | Trait(s) Improved | Reference
Rice | MABB | Bacterial blight resistance | [ ]
Rice | MABB | Semi-dwarf stature and bacterial blight resistance | [ , ]
Rice | MABB | Blast resistance | [ ]
Wheat | MABB | Stripe rust resistance | [ ]
Wheat | MABB | Stem rust resistance | [ , ]
Wheat | MAS | Leaf rust resistance | [ ]
Maize | MABB | Quality improvement | [ ]
Cowpea | MABC | Cowpea mosaic virus (CpMV) resistance | [ ]
Common bean | MABB | Improved drought adaptation | [ ]
Common bean | MABC | Anthracnose resistance | [ ]
Soybean | MAS and MABC | Genotypes resistant to soybean cyst nematode and multiple diseases | [ ]
Soybean | MABB | Powdery mildew resistance | [ ]
Soybean | MABC | Soybean mosaic virus (SMV) resistance | [ ]
Soybean | MABC | Kunitz trypsin inhibitor-free | [ ]
Soybean | MABC | Elimination of lipoxygenase-2 | [ ]
Peanut | MABC | Introgression lines with higher yield and increased rust resistance | [ ]
Peanut | MABC | Nematode resistance | [ , ]
Peanut | MABC | Enhanced oleic acid content | [ , ]
Chickpea | MABC | Fusarium wilt resistance | [ ]
Chickpea | MABC | Ascochyta blight resistance | [ ]
Chickpea | MABC | Drought tolerance | [ ]
Chickpea | MABC | Elimination of lipoxygenase-2 | [ ]

MAS, marker-assisted selection; MABB, marker-assisted backcross breeding; MABC, marker-assisted backcrossing.

Biotechnological tools have been used efficiently to improve susceptible crop varieties. However, for the sustainable utilization of genetic resources, advanced techniques such as NGS, high-throughput genotyping, and HTP should be used to develop new crop varieties that ensure food security in the near future.

7. Conclusions

To meet the ever-increasing demand for food production, crop diversification, climate-resilient farming, and related needs, PGRs should be used sustainably to ensure future food security. Efficient use of PGRs can help meet these needs, and one of the major challenges for the PGR community is to improve access to PGR collections, for example by increasing the information available about them, through conservation, and by participating in pre-breeding activities. Another major challenge is the growing demand for PGRs in the wider farming and breeding community. Maintaining genetic diversity and ensuring the conservation and sustainable use of PGRs should be priorities for national and international communities. Proper monitoring of genetic erosion and genetic vulnerability is crucial for protecting rare and endangered plant species. Because no single conservation technique is perfect, in situ and ex situ conservation need to be practiced as complementary approaches. Technical and financial support should be provided to farmers and local people for the proper conservation of plant genetic resources. For the management and sustainable use of PGRs, the capacities of local communities, indigenous people, farmers, breeders, extension workers, and other stakeholders, including entrepreneurs and small-scale enterprises, should be strengthened. The evaluation, characterization, and documentation of endemic plant species and their exact habitats should be prioritized, and more frameworks and policies should be implemented for the sustainable conservation of landraces and their wild relatives. Biotechnological tools should be used to characterize and conserve plant genetic resources and to utilize them in breeding programs, and allele/gene mining for important traits in wild species and crop wild relatives should be given more importance.
The effective utilization of plant genetic resources would help overcome the constraints that limit crop productivity. High-throughput genotyping and phenotyping techniques should be used for the sustainable utilization of genetic resources for future food security.

Funding Statement

This research received no external funding.

Author Contributions

Conceptualization, R.K.S.; Writing—Original Draft Preparation, R.K.S.; Writing—Review and Editing, B.S.C.; Supervision, R.K.S.; Funding Acquisition, B.S.C. All authors have read and agreed to the published version of the manuscript.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Conflicts of Interest

The authors declare no conflict of interest.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

What Is Genetic Diversity and Why Does it Matter?

  • December 2021
  • Frontiers for Young Minds 9:656168

Melissa Minter at The Royal Society for the Protection of Birds

Colette Blyth at University of Adelaide

Laura Bertola at National Centre for Biological Sciences



Genetic Diversity and Molecular Evolution


A special issue of Diversity (ISSN 1424-2818).

Deadline for manuscript submissions: closed (30 August 2014)


Dear Colleagues,

Genetic diversity is fundamental to species survival, to the continued evolution of new species, and to adaptation to changing environments. The study of genetic diversity is important for conservation biologists because ecosystems with high genetic diversity are generally the healthiest, most stable, and most able to adapt to changing environmental conditions. Genetic diversity is necessary for evolution to occur, and it is genetic variation that natural selection acts on. The maintenance of biodiversity also has practical applications, because a high degree of genetic variation in economically valuable commodities (e.g., agricultural crops and livestock) increases their resistance to pests and diseases. Since their advent, molecular techniques have been widely used to characterize genetic diversity and to study molecular evolution, resulting in many new discoveries and overturning many traditional views of genetic diversity and evolution. It is therefore a good time to summarize the current status of genetic diversity and molecular evolution research and to discuss future perspectives in this field.

Dr. Genlou Sun Guest Editor


  • genetic diversity and conservation
  • population genetics
  • nucleotides diversity and ecogenomics
  • molecular evolution and phylogeny

Published Papers (9 papers)

Perspective. Published: 10 February 2022

A roadmap to increase diversity in genomic studies

Segun Fatumo (ORCID: 0000-0003-4525-3362), Tinashe Chikowore, Ananyo Choudhury, Muhammad Ayub (ORCID: 0000-0002-7111-1571), Alicia R. Martin & Karoline Kuchenbaecker

Nature Medicine 28, 243–250 (2022)

Two decades ago, the sequence of the first human genome was published. Since then, advances in genome technologies have resulted in whole-genome sequencing and microarray-based genotyping of millions of human genomes. However, genetic and genomic studies are predominantly based on populations of European ancestry. As a result, the potential benefits of genomic research—including better understanding of disease etiology, early detection and diagnosis, rational drug design and improved clinical care—may elude the many underrepresented populations. Here, we describe factors that have contributed to the imbalance in representation of different populations and, leveraging our experiences in setting up genomic studies in diverse global populations, we propose a roadmap to enhancing inclusion and ensuring equal health benefits of genomics advances. Our Perspective highlights the importance of sincere, concerted global efforts toward genomic equity to ensure the benefits of genomic medicine are accessible to all.

Author information

Authors and affiliations

The African Computational Genomics (TACG) Research Group, MRC/UVRI and LSHTM, Entebbe, Uganda

Segun Fatumo

The Department of Non-communicable Disease Epidemiology, London School of Hygiene and Tropical Medicine, London, UK

Sydney Brenner Institute for Molecular Bioscience, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa

Tinashe Chikowore & Ananyo Choudhury

MRC/Wits Developmental Pathways for Health Research Unit, Department of Paediatrics, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa

Tinashe Chikowore

Division of Psychiatry, University College London, London, UK

Muhammad Ayub & Karoline Kuchenbaecker

Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA

Alicia R. Martin

Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA

UCL Genetics Institute, University College London, London, UK

Karoline Kuchenbaecker

Corresponding author

Correspondence to Segun Fatumo.

Ethics declarations

Competing interests

A.R.M. has consulted for 23andMe and Illumina, and has received speaker fees from Genentech, Pfizer, and Illumina. All other authors declare no competing interests.

Peer review

Peer review information

Nature Medicine thanks Ambroise Wonkam and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Karen O’Leary was the primary editor on this article and managed its editorial process and peer review in collaboration with the rest of the editorial team.

About this article

Cite this article

Fatumo, S., Chikowore, T., Choudhury, A. et al. A roadmap to increase diversity in genomic studies. Nat Med 28, 243–250 (2022). https://doi.org/10.1038/s41591-021-01672-4

Received: 17 October 2021

Accepted: 21 December 2021

Published: 10 February 2022

Issue Date: February 2022

DOI: https://doi.org/10.1038/s41591-021-01672-4

Genomics Institute

Bear DNA study to measure impact of conservation actions on genetic diversity

Project to produce roadmap for implementing genomics into conservation-management plans.

Grizzly bear (photo by Aditya Datta)

August 20, 2024

By Mike Peña

The National Science Foundation will fund research at UC Santa Cruz that will examine the DNA of brown bears in the lower 48 states, where the iconic beast’s numbers have seen catastrophic declines over the last century. The research project will use genetic-sequencing technologies to study the effects of this rapid population decline, as well as the impacts of previous conservation-management actions.

Globally, current rates of extinction across all species are up to 1,000 times higher than historical rates, with over one million species currently threatened with extinction—many within decades. These losses have disproportionately affected megafauna, especially carnivores like the brown bear ( Ursus arctos ). There are approximately 200,000 brown bears of various subspecies left in the world, distributed across northern Eurasia and North America.

Grizzly bears, a subspecies of brown bear, have been listed as threatened in the lower 48 U.S. states under the Endangered Species Act since 1975. In the late 1800s, the brown-bear population there was estimated at over 50,000; today, there are fewer than 2,000.

In addition to habitat degradation and land-use change, humans have long hunted brown bears for multiple reasons. Compared with the evolutionary timescales that have historically shaped biodiversity, human-caused changes unfold over far shorter timescales. Without proper monitoring and meaningful measures of change, how these rapid anthropogenic changes will affect global species diversity remains unclear, according to Joanna Kelley, a professor of ecology and evolutionary biology.

“Though genomics can greatly aid conservation initiatives, a lack of testing has limited its impact over the last decade,” said Kelley, the project’s lead investigator. “At times, management recommendations are made based on genetic studies with little thought given to the limitations of the analyses, or the practical considerations for on-the-ground management.”

Enter conservation genomics

Genetics was first applied as a conservation tool in the late 1980s, when genetic markers were used to delineate breeding and management groups for "species-survival plans" in zoos and to identify parentage in order to reconstruct pedigrees. As genetics gave way to genomics (whole-genome, array-based and reduced-representation sequencing), many in the conservation community adopted various approaches to estimate population genetic diversity and "gene flow" among locally defined populations.

The United Nations Convention on Biological Diversity stated in a draft of its post-2020 Global Biodiversity Framework that "genetic diversity of wild and domesticated species [should be] safeguarded, with at least 90 percent of genetic diversity within all species maintained." However, beyond tracking managed groups of species, few studies have examined how genomics can effectively inform the conservation of wild populations, Kelley said.
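The 90 percent target can be made concrete with the classical drift result for an idealized Wright-Fisher population of constant effective size N_e, in which expected heterozygosity declines as H_t = H_0(1 - 1/(2N_e))^t. The sketch below is illustrative only: it assumes no mutation, migration or selection, and the N_e values and the function names `heterozygosity_retained` and `generations_to_threshold` are hypothetical choices for this example, not part of the project described here.

```python
"""Expected loss of heterozygosity under pure genetic drift.

Illustrative sketch only: assumes an idealized Wright-Fisher population with a
constant effective size Ne and no mutation, migration or selection. The Ne
values used below are hypothetical, not estimates for any real bear population.
"""


def heterozygosity_retained(ne: float, generations: int) -> float:
    """Return H_t / H_0 = (1 - 1/(2*Ne))**t, the expected fraction of the
    initial heterozygosity remaining after `generations` of drift."""
    if ne <= 0.5:
        raise ValueError("Ne must exceed 0.5 so the per-generation factor is positive")
    if generations < 0:
        raise ValueError("generations must be non-negative")
    return (1.0 - 1.0 / (2.0 * ne)) ** generations


def generations_to_threshold(ne: float, threshold: float = 0.9) -> int:
    """Return the first generation t at which H_t / H_0 drops to or below
    `threshold` (0.9 mirrors a '90 percent maintained' target)."""
    if ne <= 0.5:
        raise ValueError("Ne must exceed 0.5 so the per-generation factor is positive")
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    factor = 1.0 - 1.0 / (2.0 * ne)  # fraction of heterozygosity kept each generation
    t, retained = 0, 1.0
    while retained > threshold:
        t += 1
        retained *= factor
    return t


if __name__ == "__main__":
    for ne in (50, 500, 5000):  # hypothetical effective population sizes
        t = generations_to_threshold(ne)
        print(f"Ne = {ne:>5}: expected heterozygosity reaches 90% after {t} generations")
```

Under these assumptions, an idealized population with N_e = 50 crosses the 90 percent line after 11 generations and one with N_e = 500 after 106. Because effective size is usually smaller than census size, a census count alone (such as the fewer than 2,000 bears mentioned above) would understate how quickly diversity can erode.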

Comprehensive data and collaboration

For this project, Kelley will use an extensive set of historic and modern brown bear samples to characterize genomic diversity over the last 200 years, how it has changed, and how or if management decisions have impacted the genomic landscape of the species. Her goal is to design conservation-action plans and investigate the consequences of past bottlenecks and long-term small population sizes. She will also recommend potential future management actions using genomic tools.

For the conservation community more broadly, Kelley's team will quantify the limitations of various types of genomic data for estimating population-genetic statistics and clarify how these statistics relate to conservation status and management actions. For instance, they will assess how management decisions would change depending on which datasets are used, providing guidelines to the conservation community.

Over the five-year project, Kelley will partner with the U.S. Fish and Wildlife Service (USFWS), the U.S. Geological Survey (USGS), UC Riverside, the University of Montana, and the Washington State University Bear Center. She intends the genomic-monitoring system from this work to be put to use in the field immediately.

“A great example of this kind of database that’s relatable to us is 23andMe or Ancestry, where, when you get a new sample, you can place it on a map to determine genetic ancestry,” Kelley said. “If we can do the same thing and essentially build a 23andMe for bears, we can figure out where bears come from, what is their genetic ancestry. And if we’re thinking about conservation action, like management, we know where populations should be moved from and to.”

The final phase of the project will be the creation of a comprehensive, user-friendly genomic dataset spanning brown-bear populations in the lower 48 states, which will allow genomic data to be integrated into annual reports and strengthen future conservation-management recommendations.

Kelley said the funding for this research comes at a critical time, given pending "rewilding" efforts in the Bitterroot ecosystem in Montana and the North Cascades in Washington state. "We will investigate both historic and modern brown-bear populations, with a specific focus on populations for which conservation-management action is actively occurring," she explained. "This research will go beyond the published page and be integrated into management decisions and policy. We are honored to be collaborating with conservation-action partners who have decades of experience in this realm."

The project will draw its DNA samples from the longitudinal monitoring done by the federal agencies, from biological collections provided by the Smithsonian's National Museum of Natural History and the University of Montana Zoological Museum, and from modern genetic data. About $1.8 million has been awarded across all partners to fund the project, with just over $500,000 going to UC Santa Cruz.

This research is one of 10 projects receiving a total of $16 million in funding under the Partnership to Advance Conservation Science and Practice program, a first-of-its-kind collaboration between the National Science Foundation and the Paul G. Allen Family Foundation. Now in its second year, the program is designed to catalyze deep collaboration between researchers advancing basic science and conservation partners engaging in on-the-ground efforts.

New insights shed light on the enigma of genetic diversity and species complexity

Published: 19 August 2024

Zuobin Zhu, Conghui Han & Shi Huang

Acknowledgement

This work was supported by Xuzhou Basic Research Project (KC23009), Xuzhou Key Research and Development Plan (KC22096), and the National Natural Science Foundation of China (81701390).

Author information

Authors and affiliations

Xuzhou Engineering Research Center of Medical Genetics and Transformation, Key Laboratory of Genetic Foundation and Clinical Application, Xuzhou Medical University, Xuzhou, 221004, China

Zuobin Zhu & Shi Huang

Department of Urology, Xuzhou Clinical School of Xuzhou Medical University, Xuzhou Central Hospital, Xuzhou, 221009, China

Conghui Han

Center for Medical Genetics, School of Life Sciences, Central South University, Changsha, 410078, China

Corresponding authors

Correspondence to Zuobin Zhu, Conghui Han or Shi Huang.

About this article

Zhu, Z., Han, C. & Huang, S. New insights shed light on the enigma of genetic diversity and species complexity. Sci. China Life Sci. (2024). https://doi.org/10.1007/s11427-023-2610-2

Received: 09 October 2023

Accepted: 04 May 2024

Published: 19 August 2024

DOI: https://doi.org/10.1007/s11427-023-2610-2

COMMENTS

  1. Determinants of genetic diversity

    Romiguier, J. et al. Comparative population genomics in animals uncovers the determinants of genetic diversity. Nature 515, 261-263 (2014). This study shows a comparative analysis of patterns of ...

  2. Genetic diversity goals and targets have improved, but remain

    Genetic diversity and adaptive potential within populations of all [wild and domestic] species is safeguarded, and all genetically distinct populations are maintained by 2030, and at least 99% of genetic diversity within populations is maintained by 2050 ... Taft HR, McCoskey DN, Miller JM, et al. Research-management partnerships: an ...

  3. The importance of genomic variation for biodiversity ...

    Genetic diversity influences biodiversity, and thus NCP, in two main ways: (1) through standing genetic variation (that is, the particular combination of genes and alleles present at a given time ...

  4. Global genetic diversity status and trends: towards a suite of

    Ne determines the influence of genetic drift based on the amount of genetic diversity: when Ne is small, genetic diversity is lost from a population faster over time and the random fluctuations in allele frequency caused by genetic drift can overrule the effect of natural selection (Charlesworth, 2009).

  5. Diversity in Human Genetics

    Diversity in Human Genetics. Most human genetic studies have been conducted in European populations, and the results often do not transfer well to other populations. This gap in research ...

  6. Insights into human genetic variation and population history ...

    Insights into human genetic variation and population history from 929 diverse genomes. ... To add to our understanding of human genetic diversity, Bergström et al. generated whole-genome sequences surveying individuals in the Human Genome Diversity Project, which is a panel of global populations that has been ...

  7. Genetic diversity loss in the Anthropocene

    Although genetic diversity is a key dimension of biodiversity, it has been overlooked in international conservation initiatives. Only in 2021 did the United Nations (UN) Convention on Biological Diversity propose to preserve at least 90% of all species' genetic diversity (10, 11). Recent meta-analyses of animal populations with genetic marker samples have been used as proxies to quantify ...

  8. Global Commitments to Conserving and Monitoring Genetic Diversity Are

    Genetic diversity also underlies resilient and diverse ecosystems and is a resource for innovation and a margin of safety to protect the welfare of society in ... (Cook and Sgrò 2017), determine how frequently genetic research findings are used by nature management agencies (Bowman et al. 2016), improve dissemination and accessibility of ...

  9. Genetic diversity goals and targets have improved, but remain

    Genetic diversity among and within populations of all species is necessary for people and nature to survive and thrive in a changing world. Over the past three years, commitments for conserving genetic diversity have become more ambitious and specific under the Convention on Biological Diversity's (CBD) draft post-2020 global biodiversity framework (GBF). This Perspective article comments on ...

  10. Management of Genetic Diversity in the Era of Genomics

    1 Department of Animal and Aquacultural Sciences, Norwegian University of Life Sciences, Ås, Norway; 2 NOFIMA, Ås, Norway; 3 The Roslin Institute and R(D)SVS, The University of Edinburgh, Edinburgh, United Kingdom; Management of genetic diversity aims to (i) maintain heterozygosity, which ameliorates inbreeding depression and loss of genetic variation at loci that may become of importance in ...

  11. Diversity and inclusion in genomic research: why the uneven progress?

    Why do genomic research in diverse populations? Motivations to conduct research in the context of genetic diversity are numerous. Increased inclusion facilitates the understanding of health disparities, new discoveries in biology, more accurate matching of diverse patients with safe and effective treatments, improved interpretation of genetic tests, and better tracing of human history.

  12. Embracing Genetic Diversity to Improve Black Health

    Banda Y, Kvale MN, Hoffmann TJ, et al. Characterizing race/ethnicity and genetic ancestry for 100,000 subjects in the genetic epidemiology research on adult health and aging (GERA) cohort ...

  13. Global genomic diversity for All of Us

    Global genomic diversity for All of Us. Nature Reviews Genetics 25, 303 (2024). The National Institutes of Health (NIH) All of Us research programme has reported the data ...

  14. Genomic diversity improves disease discovery for all

    Genomic research that represents the global population is essential to ensure that discoveries are applicable to the broadest range of people. Despite growth in human genomic analyses over the past ~15 years, information from individuals of European ancestry still dominates genetic studies. Thus, representation across the spectrum of genetic ...

  15. Scientists Unveil a More Diverse Human Genome

    Published May 10, 2023; updated May 12, 2023. More than 20 years after scientists first released a draft sequence of the human genome, the book of life has been given a long-overdue rewrite. A more ...

  16. What Is Genetic Diversity and Why Does it Matter?

    Genetic diversity is important because it gives species a better chance of survival. However, genetic diversity can be lost when populations get smaller and isolated, which decreases a species' ability to adapt and survive. In this article, we explore the importance of genetic diversity, discuss how it is formed and maintained in wild ...

  17. Genetic Diversity: Its Importance and Measurements

    Genetic diversity helps organisms adapt to environmental variability. Organisms live in complex environments that vary on spatial and temporal scales and are characterized by several factors such as ...

  18. Genetic Diversity, Conservation, and Utilization of Plant Genetic

    Genetic diversity within and between plant species allows plant breeders to select superior genotypes, which can then be used for the development of genetic stock for hybridization programs or the release of a crop variety. ... Digitized molecular data are vital to numerous aspects of scientific research and genetic resource use. Substantial ...

  19. Ambitious survey of human diversity yields millions of undiscovered

    The work has also identified gaps in genetics research on non-white populations. The findings were published on 19 February in a package of papers in Nature (1, 2), Communications Biology (3) and ...

  20. What Is Genetic Diversity and Why Does it Matter?

    Genetic diversity has become a hot research topic due to its importance in the health of species and ecosystems (66). Similarly, gene polymorphism is an essential topic in research because it ...

  21. Special Issue : Genetic Diversity and Molecular Evolution

    Genetic diversity is fundamental to species survival, to the continued evolution of new species and adaptation to changing environments. ... Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this ...

  22. Diversity and scale: Genetic architecture of 2068 traits in the VA

    We present genome-wide associations for 2068 traits from 635,969 participants in the Department of Veterans Affairs Million Veteran Program, a longitudinal study of diverse United States Veterans. Systematic analysis revealed 13,672 genomic risk loci; 1608 were only significant after including non-European populations.

  23. A roadmap to increase diversity in genomic studies

    Given the immense and wide-reaching benefits of increasing diversity in genetic research, funders should reconsider such restrictions. In addition to eligibility restrictions, fewer researchers in ...

  24. Bear DNA study to measure impact of conservation actions on genetic

    The research project will use genetic-sequencing technologies to study the effects of this rapid population decline, as well as the impacts of previous conservation-management actions. ... The International Union for Conservation of Nature stated in its most recent Global Biodiversity Framework that "genetic diversity of wild and domesticated ...

  25. New insights shed light on the enigma of genetic diversity ...

    A global catalog of whole-genome diversity from 233 primate species. Science 380, 906-913. Orr, H.A. (2000). Adaptation and the cost of complexity. Evolution 54, 13-20.