Tracking crop varieties using genotyping-by-sequencing markers: a case study using cassava (Manihot esculenta Crantz)

Ismail Y. Rabbi1*, Peter A. Kulakow1, Joseph A. Manu-Aduening2, Ansong A. Dankyi3, James Y. Asibuo2, Elizabeth Y. Parkes1, Tahirou Abdoulaye1, Gezahegn Girma1, Melaku A. Gedil1, Punna Ramu4, Byron Reyes5 and Mywish K. Maredia6

Abstract

Background: Accurate identification of crop cultivars is crucial in assessing the impact of crop improvement research outputs. Two commonly used identification approaches, elicitation of variety names from farmer interviews and morphological plant descriptors, have inherent uncertainty levels. Genotyping-by-sequencing (GBS) was used in a case study as an alternative method to track released varieties in farmers' fields, using cassava, a clonally propagated root crop widely grown in the tropics, and often disseminated through extension services and informal seed systems. A total of 917 accessions collected from 495 farming households across Ghana were genotyped at 56,489 SNP loci along with a "reference library" of 64 accessions of released varieties and popular landraces.

Results: Accurate cultivar identification and ancestry estimation were accomplished through two complementary clustering methods: (i) distance-based hierarchical clustering; and (ii) model-based maximum likelihood admixture analysis. Subsequently, 30 % of the identified accessions from farmers' fields were matched to specific released varieties represented in the reference library. ADMIXTURE analysis revealed that the optimum number of major varieties was 11, which matched the hierarchical clustering results. The majority of the accessions (69 %) belonged purely to one of the 11 groups, while the remaining accessions showed two or more ancestries. Further analysis using subsets of SNP markers reproduced the results obtained from the full set of markers, suggesting that GBS can be done at higher DNA multiplexing, thereby reducing the costs of variety fingerprinting. A large proportion of discrepancies was observed between genetically unique cultivars as identified by markers and variety names as elicited from farmers. Clustering results from ADMIXTURE analysis were validated using the assumption-free Discriminant Analysis of Principal Components (DAPC) method.
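
To illustrate the distance-based matching of field accessions to a reference library described above, the following is a minimal Python sketch, not the pipeline used in the study. It assumes genotypes are coded as alternate-allele dosages (0, 1, 2, or missing) and uses a simple identity-by-state style distance. The function names, the toy genotypes, and the 0.05 distance threshold are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Alternate-allele dosage per SNP: 0, 1, 2, or None when the call is missing.
Genotype = List[Optional[int]]


def ibs_distance(a: Genotype, b: Genotype) -> float:
    """Mean dosage difference over SNPs called in both samples, scaled to [0, 1]."""
    diffs = [abs(x - y) / 2 for x, y in zip(a, b)
             if x is not None and y is not None]
    if not diffs:
        raise ValueError("no SNPs called in both samples")
    return sum(diffs) / len(diffs)


def match_to_reference(sample: Genotype,
                       library: Dict[str, Genotype],
                       threshold: float = 0.05) -> Tuple[Optional[str], float]:
    """Return the closest reference variety, or None if it is farther than threshold.

    The threshold is an assumed tolerance for genotyping error between
    clones of the same variety; it is not a value taken from the study.
    """
    best_name, best_dist = min(
        ((name, ibs_distance(sample, geno)) for name, geno in library.items()),
        key=lambda pair: pair[1],
    )
    if best_dist <= threshold:
        return best_name, best_dist
    return None, best_dist


# Toy data (hypothetical): two reference varieties and two field accessions.
library = {
    "released_A": [0, 1, 2, 2, 0, 1, 0, 2],
    "landrace_B": [2, 1, 0, 0, 2, 1, 2, 0],
}
field = {
    "farm_001": [0, 1, 2, 2, 0, 1, None, 2],  # identical to released_A where called
    "farm_002": [1, 1, 1, 2, 2, 0, 2, 1],     # close to neither reference
}

for accession, genotype in field.items():
    variety, distance = match_to_reference(genotype, library)
    print(accession, variety or "unmatched", round(distance, 3))
```

In this sketch, an accession is assigned to a released variety only when its nearest reference genotype falls within the tolerance; everything else is left unmatched. A real analysis would work from tens of thousands of SNPs and would choose the threshold empirically, for example from the distances observed among replicate samples of the same clone.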

Conclusion: We show that genome-wide SNP markers from increasingly affordable GBS methods, coupled with complementary cluster analyses, are a powerful tool for fine-scale population structure analysis and variety identification. Moreover, the ancestry estimation provides a framework for quantifying the contribution of exotic germplasm or older improved varieties to the genetic background of contemporary improved cultivars.

Keywords: Cassava, Variety identification, Impact assessment, Genotyping-by-sequencing, Ancestry estimations