Bioinformatic Methods

Unigene V1 Assembly

Publicly available ESTs were downloaded from NCBI on February 1, 2010. Vector sequence was trimmed from these ESTs with cross_match. The sequences for each species were then assembled with the software package cap3, using a -p (overlap percent identity cutoff) value of 90. This is a very stringent assembly: it favors a lower error rate, at the cost of possibly leaving more redundancy among the contigs.
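
As a rough illustration of the assembly step, the cap3 call for one species could be built as in the sketch below. Only the -p value of 90 comes from the text above; the function names and the file name ests.fasta are hypothetical, and the exact invocation used for this assembly is not documented here.

```python
import subprocess


def cap3_command(fasta_path, percent_identity=90):
    """Build the argument list for a cap3 run on one species' trimmed ESTs.

    cap3 takes the read file first, followed by options; -p sets the
    overlap percent identity cutoff (90 is the value reported in the text).
    """
    return ["cap3", str(fasta_path), "-p", str(percent_identity)]


def run_cap3(fasta_path, log_path, percent_identity=90):
    """Run cap3 and save its standard output to log_path.

    Assumes a cap3 executable is on the PATH. cap3 writes its result files
    (contigs, singlets, ace) next to the input file on its own.
    """
    with open(log_path, "w") as log:
        subprocess.run(cap3_command(fasta_path, percent_identity),
                       stdout=log, check=True)
```

A call such as run_cap3("ests.fasta", "ests.cap3.log") would be repeated once per species.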

SSRs

A multi-step procedure has been developed at the Clemson University Genomics Institute to facilitate SSR mining. The following steps are performed on each contig of an assembly:

  1. The script searches for repetitive patterns in the contig consensus sequence that match one of the following criteria:
    • 2 base pair motif (dinucleotide) repeated at least 5 times
    • 3 base pair motif (trinucleotide) repeated at least 4 times
    • 4 base pair motif (tetranucleotide) repeated at least 3 times
    • 5 base pair motif (pentanucleotide) repeated at least 3 times
  2. Primer3 is run with default parameters. The output is entered in the Excel spreadsheet and includes the forward primer, the reverse primer, the melting temperature for the forward primer, the melting temperature for the reverse primer, and the product size between the primers.
  3. Underlying evidence of polymorphism is reported in the Alignment column. The software looks for one of four characteristics:
    • a gap of 2 bp or more in the consensus sequence within the SSR region
    • multiple 1 bp gaps in the consensus sequence within the SSR region
    • a gap at either end of the consensus, with another repeat of the motif at the corresponding region of an underlying sequence
    • a gap of 2 bp or more in an underlying sequence
     Additionally, these sequences have enough flanking sequence for primers that amplify the SSR region. This analysis helps to filter the list of potential SSRs down to a manageable number, but manual examination and selection are still beneficial.
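
The repeat criteria in step 1 can be sketched in a few lines of Python. This is not the Clemson script itself, only a minimal illustration of the thresholds listed above. The function names are hypothetical, and two behaviors are assumptions rather than documented choices: motifs that are themselves repeats of a shorter unit (such as AAA or ATAT) are skipped, and only the unambiguous bases A, C, G and T are considered.

```python
# Minimum repeat count per motif length, taken from the criteria in step 1.
MIN_REPEATS = {2: 5, 3: 4, 4: 3, 5: 3}


def is_primitive(motif):
    """Return True if the motif is not a repeat of a shorter unit."""
    n = len(motif)
    return not any(n % k == 0 and motif == motif[:k] * (n // k)
                   for k in range(1, n))


def find_ssrs(consensus):
    """Return sorted (start, end, motif, repeat_count) tuples.

    Coordinates are 0-based and end-exclusive.
    """
    seq = consensus.upper()
    hits = []
    for size, min_rep in MIN_REPEATS.items():
        i = 0
        while i + size <= len(seq):
            motif = seq[i:i + size]
            if set(motif) <= set("ACGT") and is_primitive(motif):
                # Count consecutive copies of the motif starting at i.
                count = 1
                while seq[i + count * size:i + (count + 1) * size] == motif:
                    count += 1
                if count >= min_rep:
                    end = i + count * size
                    hits.append((i, end, motif, count))
                    i = end  # resume the scan after this repeat
                    continue
            i += 1
    return sorted(hits)
```

For example, find_ssrs("GGCATATATATATCCG") returns [(3, 13, "AT", 5)], while four copies of AT fall below the dinucleotide threshold and yield no hit.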

SNPs

PolyBayes version 3.0 was run utilizing the assembly and quality values associated with the sequences.

Specific parameters were:

  • polybayes.pl -inputFormat ace -readPhdFiles -filterParalogs -screenSnps -prescreenSnps -noconsiderAnchor

Primer3 was run with default parameters. The output is entered in the Excel spreadsheet and includes the forward primer, the reverse primer, the melting temperature for the forward primer, the melting temperature for the reverse primer, and the product size between the primers.
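
The five reported values correspond to tags in Primer3's Boulder-IO output (TAG=VALUE lines, with a lone "=" ending each record). The sketch below shows one way they could be pulled out for a spreadsheet row. It assumes the tag names used by Primer3 2.x (for example PRIMER_LEFT_0_SEQUENCE); the release used for this analysis may name them differently. The function names and dictionary keys are hypothetical.

```python
def parse_boulder_record(text):
    """Parse one Boulder-IO record into a dict of tag -> value strings."""
    record = {}
    for line in text.splitlines():
        line = line.strip()
        if line == "=":  # record terminator
            break
        if not line:
            continue
        tag, _, value = line.partition("=")
        record[tag] = value
    return record


def primer_row(record, rank=0):
    """Extract the five spreadsheet fields for the primer pair at `rank`.

    Tag names follow Primer3 2.x conventions (an assumption, see above).
    """
    return {
        "forward_primer": record[f"PRIMER_LEFT_{rank}_SEQUENCE"],
        "reverse_primer": record[f"PRIMER_RIGHT_{rank}_SEQUENCE"],
        "forward_tm": float(record[f"PRIMER_LEFT_{rank}_TM"]),
        "reverse_tm": float(record[f"PRIMER_RIGHT_{rank}_TM"]),
        "product_size": int(record[f"PRIMER_PAIR_{rank}_PRODUCT_SIZE"]),
    }
```

Calling primer_row(parse_boulder_record(text)) on one record of Primer3 output gives the values for the top-ranked pair; a KeyError indicates that no pair was returned for that sequence or that the tag names differ.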