July 7 2011

From DictyWiki

Revision as of 15:28, 7 July 2011 by Pfey@northwestern.edu
Revision as of 16:03, 7 July 2011 by YuliaBushmanova (Multigenome Release - D.fasciculatum, P.pallidum)

Revision as of 16:03, 7 July 2011

Multigenome Release - D.fasciculatum, P.pallidum

HOW CAN WE MANAGE A SMOOTH RELEASE SOON??

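GenBank records
D.fasciculatum [http://www.ncbi.nlm.nih.gov/nuccore/ADHC00000000.1]
P.pallidum [http://www.ncbi.nlm.nih.gov/nuccore/ADBJ00000000.1], P.pallidum mitochondrion [http://www.ncbi.nlm.nih.gov/nuccore/AY700145], P.pallidum ribosomal [http://www.ncbi.nlm.nih.gov/nuccore/DQ340388]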
added CHADO Phylogeny module tables for storing tree data

Created scripts

loaded NCBI taxonomy from NCBI taxonomy dump files (for dicty nodes only) as phylogenetic tree
only 'species' nodes have record in organism table
added phylogenetic nodes and organisms for pallidum and fasciculatum strains missing from NCBI taxonomy
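
The taxonomy step above can be sketched roughly as follows. This is only an illustration, not the script that was actually written: it assumes the standard NCBI `nodes.dmp` layout (pipe-delimited, with tax_id, parent tax_id and rank as the first three fields), and the function names and the toy tax IDs in the usage lines are made up for the example. It keeps the subtree under a chosen root and lists the 'species' nodes, which are the only ones that would get an organism record.

```python
import io
from collections import defaultdict


def read_nodes(handle):
    """Parse nodes.dmp-style lines into {tax_id: (parent_id, rank)}."""
    nodes = {}
    for line in handle:
        if not line.strip():
            continue
        # Fields are pipe-delimited and padded with tabs.
        fields = [field.strip() for field in line.split("|")]
        nodes[int(fields[0])] = (int(fields[1]), fields[2])
    return nodes


def subtree(nodes, root_id):
    """Return the nodes at or below root_id (hypothetical helper)."""
    children = defaultdict(list)
    for tax_id, (parent_id, _rank) in nodes.items():
        children[parent_id].append(tax_id)
    kept = {}
    stack = [root_id]
    while stack:
        tax_id = stack.pop()
        if tax_id in kept or tax_id not in nodes:
            continue  # the real root node is its own parent, so guard against loops
        kept[tax_id] = nodes[tax_id]
        stack.extend(children[tax_id])
    return kept


def species_ids(nodes):
    """Tax IDs that would get a row in the organism table."""
    return sorted(t for t, (_parent, rank) in nodes.items() if rank == "species")


# Toy data with invented tax IDs, in nodes.dmp layout.
TOY_DUMP = (
    "1\t|\t1\t|\tno rank\t|\n"
    "10\t|\t1\t|\tclass\t|\n"
    "20\t|\t10\t|\tgenus\t|\n"
    "30\t|\t20\t|\tspecies\t|\n"
    "31\t|\t20\t|\tspecies\t|\n"
    "40\t|\t1\t|\tgenus\t|\n"
    "41\t|\t40\t|\tspecies\t|\n"
)

all_nodes = read_nodes(io.StringIO(TOY_DUMP))
dicty_like = subtree(all_nodes, 10)
print(sorted(dicty_like), species_ids(dicty_like))
```

Strains missing from the dump would then be added as extra nodes under their species before the tree is stored.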

For each GenBank record (keeping strain information)

loaded scaffolds (supercontigs)
  • named after genbank record
  • added genbank dbxref
  • added description from GenBank (e.g. "Polysphondylium pallidum PN500 unplaced genomic scaffold PPL_scaffold2, whole genome shotgun sequence.")
loaded contigs (fake for mitochondrial and ribosomal)
  • named after genbank record
  • added genbank dbxref
loaded genes
  • added gene product (excluded "hypothetical protein")
loaded mRNA & polypeptide features
  • added SACGB dbxref
  • added genbank dbxref
  • added EC dbxref (mitochondrial genes)
  • added 'codon start/translation_start' prop
  • added references (different for main genome and for ribosomal/mitochondrial genomes)
loaded tRNA, rRNA features
  • added references (different for main genome and for ribosomal/mitochondrial genomes)
imported ESTs (ppal) [http://www.ncbi.nlm.nih.gov/nucest/EC763026.1]
  • about 90% aligned to the genome (4034 of 4452)
  • TODO: add reference (Gray,M.W. TBestDB [2] Polysphondylium pallidum)
created blast databases (ppal, dfas)
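
As a small illustration of the gene product rule in the list above, the check could look something like this. It is a sketch under assumptions, not the loader's actual code: the function name is hypothetical, and the whitespace and case normalisation is an extra step that the notes do not mention.

```python
def product_to_store(product):
    """Return the product name to store, or None when it should be skipped."""
    if product is None:
        return None
    # Collapse runs of whitespace so line-wrapped qualifier values compare cleanly.
    cleaned = " ".join(product.split())
    if not cleaned or cleaned.lower() == "hypothetical protein":
        return None
    return cleaned


for value in ["hypothetical protein", "actin", "  Hypothetical   protein "]:
    print(repr(value), "->", repr(product_to_store(value)))
```

Only names that survive the check would be attached to the gene as its product.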

Adding Strain annotations batchwise into database

  • Yulia created script to enter strain info from Excel table
  • Batch entry needs to stay in place for whenever the SC receives high-volume deposits (there are still some in the pipeline)
  • Entering 80 or 100 strains one by one would take a lot of time and is more error-prone
  • Since users ask for strains as soon as they see them in a new publication, strains also need to be entered promptly after they are stored.
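
To make the batch idea concrete, here is a minimal sketch of loading a strain table in one transaction. It is not Yulia's script. It assumes the Excel table has been exported to CSV, uses an in-memory SQLite table as a stand-in for the real database, and the column names (`name`, `genotype`) and strain names are invented for the example.

```python
import csv
import io
import sqlite3


def make_demo_db():
    """Create an in-memory stand-in table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE strain (name TEXT PRIMARY KEY, genotype TEXT)")
    return conn


def load_strains(conn, handle):
    """Insert every CSV row in a single transaction; return the row count."""
    rows = [
        (row["name"].strip(), row["genotype"].strip())
        for row in csv.DictReader(handle)
    ]
    # The connection context manager commits on success and rolls back
    # on any error, so a bad row never leaves a half-loaded batch behind.
    with conn:
        conn.executemany(
            "INSERT INTO strain (name, genotype) VALUES (?, ?)", rows
        )
    return len(rows)


conn = make_demo_db()
sheet = io.StringIO("name,genotype\nstrain_A,geno_1\nstrain_B,geno_2\n")
print(load_strains(conn, sheet), "strains loaded")
```

Loading the whole sheet at once means a duplicate or malformed entry rejects the batch as a unit, which is easier to spot and correct than a partial load.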


Divya's Project

  • Creating a per-gene web page for Harry's data that can later be added as a tab.