The data package contains four datasets:
- No genome information was available for C. vestalis. To build a draft reference genome and to develop SNP assays, we sequenced the entire genome of C. vestalis using a single lane of paired-end sequencing (2x100 bp) on an Illumina HiSeq 2000 (Illumina Inc., U.S.A.) instrument. The SNP discovery panel consisted of eight C. vestalis females, one from each of eight fields. Before assembly, Illumina reads were trimmed using an in-house Perl script that trims the sequence as soon as two consecutive bases have a quality score lower than 20. Reads shorter than 50 bp after trimming were removed from the analysis.
- From our list of putative SNPs across the C. vestalis genome, we selected 100 SNPs for genotyping assay development. We first selected the 200 largest scaffolds; they varied in length from 17 to 58 kb and contained a total of 7,878 SNPs. We then removed SNPs with a minor allele frequency (MAF) < 0.2, SNPs that had another SNP within 50 bp up- or downstream, and SNPs with more than two alleles.
- SNP genotypes of 139 Cotesia vestalis females collected in Western Taiwan at 98 polymorphic SNPs
- Individual paired-end reads were aligned against the artificial Cotesia vestalis reference genome obtained from the de novo genome assembly using BWA. The resulting BAM file was then used for the identification of putative SNPs using SAMTOOLS and varFilter from the samtools.pl utility. We only considered nucleotide substitutions and ignored small indels. We retained SNPs that had a mapping quality higher than 20, a minimum read depth of 3, and a maximum read depth of 90 (3x the average read depth, a strategy to avoid orthologous SNPs, e.g. in multi-copy genes).
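
The read-trimming rule and the assay SNP selection criteria described in the list above can be illustrated with a minimal Python sketch. This is not the in-house Perl script or the authors' selection code; it is a re-implementation under stated assumptions. Quality scores are assumed to be already decoded to integers, the read is assumed to be cut just before the first low-quality pair, and the 50 bp neighbour check is assumed to run against all putative SNPs on the same scaffold before any other filter. The names `trim_read`, `keep_read`, `Snp`, and `select_candidates` are hypothetical.

```python
from dataclasses import dataclass

MIN_BASE_QUALITY = 20   # quality threshold stated in the description
MIN_READ_LENGTH = 50    # minimum read length (bp) after trimming


def trim_read(seq, quals, min_q=MIN_BASE_QUALITY):
    """Cut the read at the first pair of adjacent bases that both score below min_q.

    Assumption: the bases before the first low-quality pair are kept.
    """
    for i in range(len(quals) - 1):
        if quals[i] < min_q and quals[i + 1] < min_q:
            return seq[:i], quals[:i]
    return seq, quals


def keep_read(seq, min_len=MIN_READ_LENGTH):
    """Reads shorter than min_len after trimming are discarded."""
    return len(seq) >= min_len


@dataclass(frozen=True)
class Snp:
    scaffold: str
    pos: int        # position on the scaffold (bp)
    n_alleles: int  # number of observed alleles
    maf: float      # minor allele frequency


def select_candidates(snps, min_maf=0.2, min_spacing=50):
    """Keep biallelic SNPs with MAF >= min_maf and no neighbour within min_spacing bp.

    Assumption: neighbours are looked up among all input SNPs on the same
    scaffold, including SNPs that fail the other filters.
    """
    positions = {}
    for s in snps:
        positions.setdefault(s.scaffold, []).append(s.pos)

    kept = []
    for s in snps:
        crowded = any(
            p != s.pos and abs(p - s.pos) <= min_spacing
            for p in positions[s.scaffold]
        )
        if s.maf >= min_maf and s.n_alleles <= 2 and not crowded:
            kept.append(s)
    return kept
```

The same pattern would extend to the depth and mapping-quality thresholds used for SNP discovery, although in the described workflow those were applied with varFilter rather than custom code.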