Sequence capture in lodgepole pine and interior spruce

K.A. Hodgins, S. Yeaman, K. Nurkowski, R.D. Mellway, J.A. Holliday, L.H. Rieseberg, S.N. Aitken

Local adaptation is common in widespread conifer species, and current reforestation policy reflects this through local seed sourcing and breeding programs. However, as the climate changes, local tree populations may become maladapted to their environments. Our goal is to identify the genes responsible for climatic adaptation in western Canada’s two most economically important conifers, lodgepole pine (Pinus contorta) and interior spruce (Picea glauca, P. engelmannii, and their hybrids). Because the genomes of these species are very large (>20 Gb), the Adaptree Project is using sequence capture to target sequencing to regions of interest. To identify these regions, we assembled a de novo transcriptome for each species and conducted an RNAseq expression study. We retained a single isoform of each gene that was expressed in our RNAseq study or had gene ontology terms potentially related to climate. We also included candidate adaptive loci identified in previous studies and loci currently being mapped in white spruce (Arborea project). To avoid the reduced hybridization efficiency that results when probes span intron/exon boundaries, we aligned the transcripts to the draft white spruce genome (SMarTForests Project) and the draft loblolly pine genome (PineRefSeq Project) and identified these boundaries. We removed repetitive sequences as well as mitochondrial and chloroplast genes to avoid capturing sequences that have many copies within a single cell. For resequencing, we identified 70,834 exons from 28,437 genes in pine (6,732 sequences with unidentified exon boundaries) and 75,799 exons from 35,957 genes in spruce (10,531 sequences with unidentified exon boundaries). We also identified non-coding, non-repetitive sequences from the draft genome of white spruce and from low-coverage whole-genome shotgun sequence of lodgepole pine.
These putatively neutral target regions will allow us to control for demographic history during our search for loci under selection. We ran a preliminary test of the NimbleGen sequence capture protocol on 12 interior spruce and 12 lodgepole pine individuals. More than 90% of the target sequences had reads aligning to them, and 40-60% of the reads were on target, demonstrating that the protocol enriches our libraries for the regions of interest. From these data we identified ~600,000 SNPs in each species. These SNPs will be used to analyze allele-environment and allele-phenotype associations. We will select ~25,000 of these SNPs and genotype a further ~5,000 individuals per species. The objective of this project is to improve seed-transfer policy in response to climate change by comparing the adaptive genetic portfolio of seedlots from seed orchards and breeding programs to the climatic distribution and landscape genomics of natural populations.
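
The two enrichment figures quoted for the preliminary capture test (the share of target sequences with aligned reads and the share of reads on target) can be illustrated with a minimal sketch. This is not the project's pipeline: the function names, the interval representation, and the toy coordinates below are all hypothetical, and a real analysis would presumably work from alignment files rather than hand-written tuples. The sketch assumes half-open coordinates and at least one read and one target.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: (contig, start, end), half-open coordinates.
Interval = Tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two intervals lie on the same contig and share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def capture_summary(
    targets: Dict[str, Interval], reads: List[Interval]
) -> Tuple[float, float]:
    """Return (fraction of reads on target, fraction of targets hit by any read)."""
    hit_targets = set()
    on_target_reads = 0
    for read in reads:
        # Names of every target this read overlaps (simple linear scan).
        hits = [name for name, target in targets.items() if overlaps(read, target)]
        if hits:
            on_target_reads += 1
            hit_targets.update(hits)
    return on_target_reads / len(reads), len(hit_targets) / len(targets)


# Toy, made-up data purely for illustration (not from the study).
toy_targets = {
    "exon_A": ("scaffold1", 100, 300),
    "exon_B": ("scaffold1", 500, 650),
    "exon_C": ("scaffold2", 0, 200),
}
toy_reads = [
    ("scaffold1", 150, 250),  # inside exon_A
    ("scaffold1", 280, 380),  # overlaps the end of exon_A
    ("scaffold1", 700, 800),  # off target
    ("scaffold2", 50, 150),   # inside exon_C
    ("scaffold3", 10, 110),   # off target
]

reads_on_target, targets_covered = capture_summary(toy_targets, toy_reads)
print(f"reads on target: {reads_on_target:.0%}, targets with reads: {targets_covered:.0%}")
```

The linear scan over targets keeps the sketch short; with tens of thousands of exons and millions of reads, an interval index or a dedicated interval tool would be the more practical choice.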