July 7 2011

Multigenome Release - D.fasciculatum, P.pallidum


genbank records
P.pallidum, P.pallidum mitochondrion, P.pallidum ribosomal
added CHADO Phylogeny module tables for storing tree data

Created scripts

loaded NCBI taxonomy from NCBI taxonomy dump files (for dicty nodes only) as phylogenetic tree
only 'species' nodes have a record in the organism table
added phylogenetic nodes and organisms for pallidum and fasciculatum strains missing from NCBI taxonomy
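
A minimal sketch of how a subtree could be pulled out of an NCBI taxonomy dump and how the 'species' nodes could be picked for the organism table. It is not the script that was actually used. It assumes the nodes.dmp layout (tax_id, parent tax_id, rank, each field followed by a tab-pipe delimiter). The function names and the toy tax ids below are made up for illustration.

```python
from collections import defaultdict

def parse_nodes(lines):
    """Parse nodes.dmp-style lines into {tax_id: (parent_id, rank)}."""
    nodes = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        # Fields are pipe-delimited and padded with tabs.
        fields = [f.strip() for f in line.split("|")]
        tax_id, parent_id, rank = int(fields[0]), int(fields[1]), fields[2]
        nodes[tax_id] = (parent_id, rank)
    return nodes

def subtree(nodes, root_id):
    """Return the set of tax ids at or below root_id."""
    children = defaultdict(list)
    for tax_id, (parent_id, _rank) in nodes.items():
        if tax_id != parent_id:  # the dump's root node is its own parent
            children[parent_id].append(tax_id)
    kept, stack = set(), [root_id]
    while stack:
        current = stack.pop()
        if current in kept:
            continue
        kept.add(current)
        stack.extend(children[current])
    return kept

def species_ids(nodes, ids):
    """Only nodes ranked 'species' would get an organism record."""
    return sorted(i for i in ids if nodes[i][1] == "species")

# Toy data with invented ids: node 10 stands in for the clade to keep.
TOY_NODES = [
    "1\t|\t1\t|\tno rank\t|",
    "10\t|\t1\t|\tclass\t|",
    "20\t|\t10\t|\tgenus\t|",
    "21\t|\t20\t|\tspecies\t|",
    "22\t|\t20\t|\tspecies\t|",
    "30\t|\t1\t|\tgenus\t|",
    "31\t|\t30\t|\tspecies\t|",
]

nodes = parse_nodes(TOY_NODES)
kept = subtree(nodes, 10)
print(sorted(kept))
print(species_ids(nodes, kept))
```

Strains missing from the dump would still have to be added by hand as extra nodes under their species.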

For each GenBank record (keeping strain information)

loaded scaffolds (supercontigs)
  • named after genbank record
  • added genbank dbxref
  • added description from genbank (e.g. "Polysphondylium pallidum PN500 unplaced genomic scaffold PPL_scaffold2, whole genome shotgun sequence.")
loaded contigs (fake for mitochondrial and ribosomal)
  • named after genbank record
  • added genbank dbxref
loaded genes
  • added gene product (excluded "hypothetical protein")
loaded mRNA & polypeptide features
  • added SACGB dbxref
  • added genbank dbxref
  • added EC dbxref (mitochondrial genes)
  • added 'codon start/translation_start' prop
  • added references (different for main genome and for ribosomal/mitochondrial genomes)
loaded tRNA, rRNA features
  • added references (different for main genome and for ribosomal/mitochondrial genomes)
created blast databases (ppal, dfas)
Gene Pages, Genome index pages, Contig pages, GBrowse for each genome
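
Two of the per-gene rules in the list above (skipping "hypothetical protein" products, and choosing a reference by genome) could be sketched roughly as below. This is an illustration only, not the loader that was used. The function names, the genome labels and the placeholder reference keys are assumptions. It also assumes one reference per genome kind, which the notes do not spell out.

```python
def gene_product(product):
    """Return the product text to store, or None if it is uninformative."""
    if product is None:
        return None
    cleaned = " ".join(product.split())  # collapse wrapped whitespace
    if not cleaned or cleaned.lower() == "hypothetical protein":
        return None
    return cleaned

def build_gene_record(name, product, genome_kind, references):
    """Assemble a plain dict describing what would be loaded for one gene.

    genome_kind is a hypothetical label: 'main', 'mitochondrial' or 'ribosomal'.
    references maps each label to a placeholder reference identifier.
    """
    record = {"name": name, "reference": references[genome_kind]}
    stored = gene_product(product)
    if stored is not None:
        record["product"] = stored
    return record

# Placeholder identifiers, not real citations.
REFERENCES = {
    "main": "REF_MAIN",
    "mitochondrial": "REF_MITO",
    "ribosomal": "REF_RIBO",
}

print(build_gene_record("geneA", "hypothetical protein", "main", REFERENCES))
print(build_gene_record("geneB", "ATP synthase\n subunit", "mitochondrial", REFERENCES))
```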

Image:Pallidum.png Image:Fasciculatum.png

Needs to be done

Recheck (with new data loaded by strains)

imported ESTs (ppal) [1]
  • about 90% aligned to the genome (4034 of 4452)
  • TODO: add reference (Gray,M.W. TBestDB [2] Polysphondylium pallidum)

Adding Strain annotations batchwise into database

  • Yulia created script to enter strain info from Excel table
  • Batch entry needs to continue whenever the SC receives high-volume deposits (some are still in the pipeline)
  • Entering 80 or 100 strains one by one would take a lot of time and is more error-prone
  • Since users ask for strains as soon as they see them in a new publication, entry also needs to happen promptly after the strains are stored.
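
A rough sketch of what batch entry from a spreadsheet could look like, assuming the Excel table is first exported to CSV. This is not Yulia's script. The column names (strain_name, species) and the function name are hypothetical. The idea is to validate the whole batch up front, so that bad rows are reported together instead of being caught one at a time during manual entry.

```python
import csv
import io

REQUIRED = ("strain_name", "species")  # hypothetical column names

def read_strain_batch(handle):
    """Split a CSV export into (valid_rows, errors) without touching a database."""
    reader = csv.DictReader(handle)
    valid, errors, seen = [], [], set()
    # Data rows start on line 2 when the file has no blank lines.
    for row_number, row in enumerate(reader, start=2):
        record = {k: (v or "").strip() for k, v in row.items() if k is not None}
        missing = [c for c in REQUIRED if not record.get(c)]
        if missing:
            errors.append((row_number, "missing " + ", ".join(missing)))
            continue
        if record["strain_name"] in seen:
            errors.append((row_number, "duplicate strain_name"))
            continue
        seen.add(record["strain_name"])
        valid.append(record)
    return valid, errors

SAMPLE = (
    "strain_name,species\n"
    "strainA,D. discoideum\n"
    ",D. discoideum\n"
    "strainA,D. discoideum\n"
    "strainB,P. pallidum\n"
)

valid, errors = read_strain_batch(io.StringIO(SAMPLE))
print([r["strain_name"] for r in valid])
print(errors)
```

The valid rows would then be handed to whatever code writes the strain records, ideally in a single transaction so that a failed batch can be rolled back.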

Divya's Project

  • Creating a per-gene web page for Harry's data that can later be added as a tab.
  • Graphs are created using Flot


