User Guide

Data Analysis (SNP Calling)

Data analysis services may be requested on your project page by checking “SNP call” in the Deliverables window. If you request SNP calling, you must enter a link to the appropriate reference genome in the Approved Species window. If you enter the reference sequence of a related species, that species should be no more than ~5% diverged from the species being analyzed. If no reference genome is available, enter “none” in the “link to reference” field.
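The guide does not say how the ~5% divergence is measured. As a rough illustration only, the sketch below assumes the simplest measure, the p-distance (proportion of differing sites) between two sequences that are already aligned, and checks it against a 0.05 cutoff. The function names are hypothetical; a real divergence estimate would normally be based on a proper alignment of many loci.

```python
def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two pre-aligned sequences.

    Columns where either sequence has a gap ('-') or an ambiguous base ('N')
    are skipped, so only unambiguous aligned positions are compared.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "-N" or b in "-N":
            continue  # skip gaps and ambiguous bases
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def reference_is_close_enough(seq_a, seq_b, max_divergence=0.05):
    # ~5% is the guideline from this guide; p-distance is an assumed measure
    return p_distance(seq_a, seq_b) <= max_divergence


ref = "ACGTACGTACGTACGTACGT"
sample = "ACGTACGTACGAACGTACGT"  # differs at 1 of 20 sites
print(p_distance(ref, sample))                 # 0.05
print(reference_is_close_enough(ref, sample))  # True
```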

Two different public pipelines are used to call SNPs:

(1) TASSEL-GBS v. 3.0 (http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0090346), which uses a reference sequence to identify single-copy loci for SNP calling. SNPs are then called among the members of the population submitted for genotyping. Because SNPs are not called against the reference sequence, the major alleles in your population may or may not match the corresponding reference alleles.
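To illustrate why a population's major allele need not be the reference allele, the toy sketch below takes the genotype calls at a single hypothetical site, finds the most frequent allele among the genotyped samples, and compares it with a reference base. The genotype strings ("C/T", with "N/N" for a missing call) and the function name are assumptions for illustration; this is not TASSEL-GBS code.

```python
from collections import Counter


def major_allele(genotypes):
    """Most frequent allele among diploid genotype calls such as 'A/G'.

    Missing calls ('N/N' or './.') are ignored. Ties are broken
    alphabetically so the result is deterministic.
    """
    counts = Counter()
    for gt in genotypes:
        for allele in gt.split("/"):
            if allele not in {"N", "."}:
                counts[allele] += 1
    if not counts:
        return None  # every call at this site was missing
    # highest count first, then alphabetical order for ties
    return min(counts, key=lambda allele: (-counts[allele], allele))


# Hypothetical site: the reference carries T, but the population is mostly C
calls = ["C/C", "C/T", "C/C", "N/N", "T/T", "C/C"]
ref_allele = "T"
pop_major = major_allele(calls)
print(pop_major, pop_major == ref_allele)  # C False
```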

(2) UNEAK (http://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1003215), a network-based SNP calling pipeline that does not require a reference genome sequence. In general, we feel that the reference-based TASSEL-GBS pipeline produces better SNP calls.
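The core idea of a network-based, reference-free caller can be sketched as follows: treat sequence tags as nodes, join tags of equal length that differ at exactly one base, and keep only isolated two-tag networks (reciprocal pairs) as candidate SNPs. This is a simplified sketch under those assumptions, not the UNEAK implementation; in particular it ignores tag read counts and any error-rate pruning the real pipeline applies. The function names and tags are hypothetical.

```python
from itertools import combinations


def hamming(a, b):
    # number of mismatched positions between two equal-length strings
    return sum(x != y for x, y in zip(a, b))


def reciprocal_tag_pairs(tags):
    """Return (tag_a, tag_b, position, allele_a, allele_b) for reciprocal pairs.

    Tags are nodes; an edge joins two equal-length tags with a single
    mismatch. Only networks made of exactly two tags are kept. Tags with
    more than one one-mismatch neighbour (e.g. from repeats or sequencing
    errors) are discarded.
    """
    neighbours = {tag: set() for tag in tags}
    for a, b in combinations(tags, 2):
        if len(a) == len(b) and hamming(a, b) == 1:
            neighbours[a].add(b)
            neighbours[b].add(a)
    pairs = []
    for a, b in combinations(tags, 2):
        if neighbours[a] == {b} and neighbours[b] == {a}:
            pos = next(i for i, (x, y) in enumerate(zip(a, b)) if x != y)
            pairs.append((a, b, pos, a[pos], b[pos]))
    return pairs


tags = [
    "TGCAGAAACT", "TGCAGAAGCT",                # isolated pair: kept
    "TGCAGTTTCA",                              # no neighbour: ignored
    "TGCAGCCCCA", "TGCAGCCCGA", "TGCAGCCCTA",  # three-way network: discarded
]
print(reciprocal_tag_pairs(tags))  # [('TGCAGAAACT', 'TGCAGAAGCT', 7, 'A', 'G')]
```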

Deliverables.

The analyzed data consist of an analysis report (PDF); a keyfile that relates barcodes to sample IDs; SNP calls per DNA sample, with both unfiltered and filtered SNPs provided in VCF format; a SAM file listing the sequence tags and, if a reference sequence was used for the analysis, their alignment coordinates; and text files with heterozygosity estimates and sample-level DNA QC data (the number of sequences obtained per DNA sample). The analysis report explains the included files and the analysis procedure in detail, and presents sequence alignment statistics (if applicable), coverage and missingness statistics, and an MDS (multidimensional scaling) plot of the genotypes.
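If you want to recompute per-sample missingness and heterozygosity yourself, both can be derived from the GT field of a delivered VCF file. The minimal sketch below assumes a standard VCF layout (FORMAT in column 9, samples from column 10 onward) and diploid or haploid GT calls; the function name is hypothetical, and its numbers may differ from the facility's own QC files if those use different definitions.

```python
def per_sample_stats(lines):
    """Return {sample: (call_rate, observed_heterozygosity)} from VCF lines.

    Only the GT subfield is read. A call is missing if any allele is '.',
    and heterozygous if its alleles differ. Heterozygosity is computed
    over non-missing calls only.
    """
    samples, called, het = [], {}, {}
    total_sites = 0
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip meta-information lines and blanks
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample IDs follow the FORMAT column
            called = {s: 0 for s in samples}
            het = {s: 0 for s in samples}
            continue
        total_sites += 1
        gt_index = fields[8].split(":").index("GT")  # locate GT within FORMAT
        for sample, value in zip(samples, fields[9:]):
            alleles = value.split(":")[gt_index].replace("|", "/").split("/")
            if "." in alleles:
                continue  # missing call
            called[sample] += 1
            if len(set(alleles)) > 1:
                het[sample] += 1
    return {
        s: (called[s] / total_sites if total_sites else 0.0,
            het[s] / called[s] if called[s] else 0.0)
        for s in samples
    }


vcf = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\n"
    "1\t100\t.\tA\tG\t.\tPASS\t.\tGT:DP\t0/1:5\t0/0:7\n"
    "1\t200\t.\tC\tT\t.\tPASS\t.\tGT:DP\t./.:0\t1/1:4\n"
)
print(per_sample_stats(vcf.splitlines()))  # {'S1': (0.5, 1.0), 'S2': (1.0, 0.0)}
```

For a delivered file, an open file handle can be passed instead of the list of lines, e.g. `per_sample_stats(open("your_snps.vcf"))`, where the file name is a placeholder.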

Data filtering.

We provide both unfiltered and filtered SNPs; the filtered set keeps sites with a minor allele frequency (MAF) > 1% and site missingness < 20%. This basic filtering eliminates many sequencing errors and low-coverage loci. However, these thresholds are not appropriate for all populations, and they do not remove SNPs called from aligned paralogs, which typically appear as loci with very high heterozygosity. You will therefore need to re-filter the SNPs with TASSEL v5.0 (http://www.maizegenetics.net/#!tassel/c17q9), using the criteria most biologically relevant to the genotyped population (e.g., a MAF threshold appropriate for your particular mapping population, and removal of overly heterozygous loci in diploid species).
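The arithmetic behind these filters can be sketched for a single biallelic site. The code below assumes genotypes coded as alternate-allele dosages (0, 1, 2, with None for missing), applies the MAF > 1% and missingness < 20% thresholds quoted above, and adds an optional maximum-heterozygosity cutoff. The function names and the 0.5 cutoff in the example are hypothetical; TASSEL v5.0 remains the tool this guide recommends for the actual re-filtering.

```python
def site_summary(genotypes):
    """Return (missingness, MAF, observed heterozygosity) for one biallelic site.

    genotypes: alternate-allele dosages 0, 1 or 2 per sample, None if missing.
    """
    if not genotypes:
        raise ValueError("no samples")
    called = [g for g in genotypes if g is not None]
    missingness = 1 - len(called) / len(genotypes)
    if not called:
        return missingness, 0.0, 0.0
    alt_freq = sum(called) / (2 * len(called))  # diploid: two alleles per call
    maf = min(alt_freq, 1 - alt_freq)
    het = sum(1 for g in called if g == 1) / len(called)
    return missingness, maf, het


def keep_site(genotypes, min_maf=0.01, max_missing=0.20, max_het=None):
    """Keep a site only if MAF > min_maf and missingness < max_missing.

    max_het is an optional, user-chosen (hypothetical) cap on observed
    heterozygosity, e.g. to drop likely paralog-derived SNPs in a diploid.
    """
    missingness, maf, het = site_summary(genotypes)
    if maf <= min_maf or missingness >= max_missing:
        return False
    if max_het is not None and het > max_het:
        return False
    return True


sites = {
    "good": [0, 1, 2, 0, 1, 2, 0, 0, 2, 1],
    "all_het": [1] * 10,
    "high_missing": [0, 2, None, None, None, 0, 2, 1, 0, 2],
    "monomorphic": [0] * 10,
}
print([name for name, g in sites.items() if keep_site(g)])               # ['good', 'all_het']
print([name for name, g in sites.items() if keep_site(g, max_het=0.5)])  # ['good']
```

Note that the default thresholds keep the "all_het" site, which is exactly the paralog problem described above; only the extra heterozygosity cutoff removes it. In an inbred mapping population, where few heterozygous calls are expected, a much stricter cutoff may be sensible.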

Version 2 Updated 2016-02-12

Genomic Diversity Facility LIMS (Retired) User Guide