Discovery of genetic diversity left behind by the bottleneck of cotton domestication

Background
It is quite likely that most people go about their daily lives without considering the many ways in which they are enjoying the use of “single-celled epidermal seed trichomes”. These seed hairs, colloquially termed cotton fibers, form the foundation of the world’s most important textile plant and a highly flexible industrial product, being found in everything from our blue jeans and t-shirts to cotton swabs to our paper currency. Yet, as with an ear of corn or a head of broccoli, nature does not contain plants that look like modern annualized row crops, with long, strong, fine, white fibers bursting forth from “cotton bolls” (botanically, capsules) at maturity. Instead, these highly derived modern forms were domesticated over millennia of strong directional selection from wild plants that look, well, wildly dissimilar from them (Fig. 1).

Fig. 1. Illustration of the sequential genetic bottlenecks that winnow diversity during crop domestication, in this case for upland cotton, Gossypium hirsutum. Shown from left to right are the transitions from wild forms to early cultigens, to landraces, and eventually to modern cultivars, over a history spanning approximately 4000 years.
Gossypium hirsutum, or upland cotton, has a domestication history that traces to about 4000 years ago (Viot and Wendel, 2023). Modern cultivated forms account for most global cotton today. Thousands of years of selection under domestication have transformed wild cotton from a perennial shrub or small tree with small capsules and seeds bearing short, brownish fiber into a modern annual row-crop plant with long, strong, white cotton fibers. This long domestication process has entailed sequential genetic bottlenecks, leading to a small gene pool in the crop plant relative to the wild populations. Although this generality is clear, it has yet to be adequately quantified, and it remains unknown how much diversity actually exists in nature. Wild cotton is thus an important genomic resource for cotton cultivar improvement.

Motivation 
Wild cotton today is a relatively uncommon plant. This rarity results from both intentional human activities, including habitat destruction and intentional eradication as a putative reservoir of boll weevil, and unintentional processes such as increased competition from other plants in fire- and flood-controlled habitats and natural population turnover due to storms and natural succession. Truly wild cotton has a large aggregate but locally sporadic natural distribution in the drier coastal areas of the American tropics and subtropics, including areas of the northern Yucatan Peninsula, SW Florida, SW Puerto Rico, and a number of additional islands throughout the Caribbean. Initial domestication is thought to have occurred perhaps 4,000 years ago in the northern Yucatan (Viot and Wendel, 2023), with subsequent spread as a semi-domesticated plant throughout both its natural range and far beyond, to its present status as the world’s most important fiber plant. Accompanying the many millennia of human-mediated dispersal and plant improvement was a complicated history of repeated escape of semi-domesticates back into natural settings, as well as hybridization with a second, independently domesticated cotton, G. barbadense. These reestablished or naturalized semi-wild cottons are often difficult to distinguish from the rarer, truly wild populations. Thus, understanding the genomic diversity of wild cotton is challenging. It is also important for our appreciation of the true scope of the natural gene pool, the portion of it that was captured by the domestication process, and, by implication, the total diversity that was never incorporated into the crop plant and thus represents a precious natural gene pool reservoir.
Recently, we used broad population sampling of a multitude of cotton varieties, wild populations, and semi-domesticated and feral forms to provide a comprehensive phylogenomic framework for global cotton diversity (Yuan et al., 2021). Using whole genome resequencing data, over 800 samples were resolved into four genetic groups based on their domestication status (e.g., truly wild vs feral/primitively domesticated) and genetic relationships: one group includes wild forms; two groups include primitively domesticated and feral populations, termed landrace1 (LR1, mostly from Caribbean islands and adjacent landmasses) and landrace2 (LR2, from Central America); and the fourth comprises modern crop cultivars from the major cotton-growing regions of the world. We note that our sampling of wild cottons was limited to the relatively scant material that exists in germplasm collections and thus came from relatively few localities, which likely fails to capture the full scope of genetic diversity within wild G. hirsutum.

In this study
Near the northern limit of the natural range of wild cotton, Mound Key, located off the SW coast of Florida (USA), is a small, isolated (51-hectare) island that harbors a population of G. hirsutum that appears truly wild (Fig. 2), being fully integrated into the native vegetation and having morphological characteristics that typify wild plants (shrubby to tree-like habit, small flowers and capsules, sparse and shorter brownish seed hairs). We collected samples from 25 individuals and used whole genome sequencing to compare genetic variation in this population with the more global sampling of cotton diversity in Yuan et al. (2021).

Fig. 2. Collection of Mound Key cotton. a) Sampling with assistance from Park Manager Zachary Lozano, Park Biologists Justin Lamb and Karen Rogers, and biologist Jennifer McGann. b) Flower morphology of the Mound Key cotton population. c) The multi-branching shrub form of Mound Key cotton.
Analysis of the data showed that, interestingly, Mound Key cotton comprised a genetic group that was distinct from other known wild, landrace, and domesticated cottons (Fig. 3). This surprising discovery suggests that the Mound Key cottons are not feral but are truly wild, representing a pocket of previously unrecognized wild genetic diversity. It is remarkable that sampling of wild cotton plants from one tiny island near the margin of the natural range of G. hirsutum, in an area long suffering from the ecological impacts of habitat destruction, reveals a pocket of unsuspected and apparently novel genetic diversity. By implication, one wonders how much more diversity remains to be discovered in remnant natural populations.

Fig. 3. Population genetic structure analysis, using genomic SNPs, of Mound Key cotton and four a priori designated groups of Gossypium hirsutum: cultivars, landraces 1 and 2, and wild cottons. Each group is represented by a different color and shape in a) the PCA and b) the rooted neighbor-joining phylogenetic tree (with two G. mustelinum samples as the outgroup). c) LEA genetic structure for three or four ancestral populations (K = 3 and K = 4). Each bar is labeled with an individual sample name (i.e., Accession/Group_SampleID) and is filled with colored segments whose lengths correspond to the proportions of ancestral population signals.

Implications
Our discovery of novel diversity in relictual plant populations might not have been surprising had we been studying species with life history characteristics that promote retention of diversity (e.g., high effective population size, outcrossing). Instead, cotton populations are widely scattered, typically quite small, and highly inbred. Our interpretation is that wild G. hirsutum exists as a series of loosely connected inbreeding populations that disperse widely to form geographically isolated populations, as exemplified by our findings for wild cotton. That these pockets of diversity are worth preserving, apart from their intrinsic value as integrated components of natural ecosystems, is also justified by their potential agronomic importance. Consider, for example, the significance of the observation that the Mound Key cotton plants (Fig. 2) were completely submerged for eight or more hours by the Hurricane Ian storm surge in late September 2022, yet they are robust and flowering two years later. This speaks to their potential for revealing mechanisms of adaptation to salt stress in agronomic settings. Zooming out from this microcosmic example, wild relatives of the plants that sustain humankind represent critical genomic resources (Ebert and Engels, 2020) that deserve prompt attention in the face of ongoing habitat loss, both for our understanding of eco-evolutionary processes and for long-term genomic resource management and utilization.

References
Viot, C. R., & Wendel, J. F. (2023). Evolution of the cotton genus, Gossypium, and its domestication in the Americas. Critical Reviews in Plant Sciences, 42(1), 1–33.
Yuan, D., Grover, C. E., Hu, G., Pan, M., Miller, E. R., Conover, J. L., Hunt, S. P., Udall, J. A., & Wendel, J. F. (2021). Parallel and intertwining threads of domestication in allopolyploid cotton. Advanced Science, 8(10), 2003634.
Ebert, A. W., & Engels, J. M. M. (2020). Plant biodiversity and genetic resources matter! Plants, 9(12), 1706.
