June 9 2011

From DictyWiki

Revision as of 15:37, 7 June 2011 by YuliaBushmanova (Importing D.fasciculatum, P.pallidum)

Dicty Meeting 14-18 Aug

  1. Early Bird registration: 14 June
  2. Abstract deadline: 15 June
  3. Petra: writing abstract for gene curation - sending draft imminent
  4. SAB meeting: time to get started
  5. Drink coasters as gifts - because we got the grant? [1] Round, midweight: 1200 for $320.00 plus shipping; 2500 for $378.00; 5000 for $495.00

GO prep for Protein2GO

General issues

  1. Discussed outstanding issues with Emily and Tony at GO meeting, and got login for curation tool.
  2. Question of annotations to papers not in PubMed: the GOA tool allows annotations with DOI identifiers. Sidd is looking into extending our table to accept DOI numbers. I will then fix these annotations, updating our internal refs to DOI numbers; this solution would also work for any new annotations in this category.
  3. Planned release at the end of June, so that dictyBase can accept the file from GOA and the appending of tRNA, trxA/B and some other annotations goes smoothly.
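
A minimal sketch of how a reference column could accept both PubMed IDs and DOIs, as discussed in item 2. The function name and the regular expressions are assumptions for illustration only; this is not dictyBase or GOA code, and the DOI pattern is a loose approximation.

```python
import re

# Hypothetical patterns: a PubMed ID is all digits; a DOI starts with "10.",
# a numeric registrant code, a slash, and a non-empty suffix.
PMID_RE = re.compile(r"^\d+$")
DOI_RE = re.compile(r"^10\.\d{4,9}/\S+$")


def normalize_reference(raw: str) -> str:
    """Return a prefixed reference ID such as 'PMID:123' or 'DOI:10.1000/x'."""
    value = raw.strip()
    # Drop an existing prefix so bare and prefixed inputs are both accepted.
    for prefix in ("PMID:", "DOI:"):
        if value.upper().startswith(prefix):
            value = value[len(prefix):].strip()
            break
    if PMID_RE.match(value):
        return f"PMID:{value}"
    if DOI_RE.match(value):
        return f"DOI:{value}"
    raise ValueError(f"unrecognized reference: {raw!r}")
```

Anything that is neither a PubMed ID nor a DOI raises an error, so internal-only references would surface during the fix-up rather than slip through.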

Textpresso GO annotations

  1. PubMed Central does not seem to be a good option, as the percentage of available papers is too low (40%)
  2. Updated way to load papers: we need to manually load them into our folder, and Arun gets access to regularly upload whatever is new from there
  3. Petra needs to look at categories and maybe update with Dicty Assays etc., but this is easier when doing paper curation again soon.

Gene Curation Update

  • about 30 complicated genes with changes to go for Bob. Almost there!
  • Writing abstract for meeting, thinking about slides and numbers needed that I can't get myself easily, thinking about paper to write...

Release 2-20

  • Setting up GAF workflow
  • Fix caching issue ?
  • Add Harry's data ?
  • Add display of ortholog EC numbers to the gene page

Software development future

Q: How do we fit these into the timeline, in sync with our plan chart?

Importing D.fasciculatum, P.pallidum

genbank records
D.fasciculatum (ADHC00000000.1)
P.pallidum (ADBJ00000000.1), P.pallidum mitochondrion (AY700145), P.pallidum ribosomal (DQ340388)
loaded scaffolds (supercontigs)
  • named after genbank record
  • added genbank dbxref
  • add description from genbank (e.g. "Polysphondylium pallidum PN500 unplaced genomic scaffold PPL_scaffold2, whole genome shotgun sequence.")
  • TODO: add reference
  • NOTE: not searchable by dbxref/name
loaded contigs (fake for mitochondrial and ribosomal)
  • named after genbank record
  • added genbank dbxref
  • TODO: add reference
  • NOTE: not searchable by dbxref/name, mitochondrial genome does not have contigs, need to create artificial one to display in gbrowse.
loaded genes
  • added gene product (excl. "hypothetical protein")
  • TODO: add reference
loaded mRNA & polypeptide features
  • added SACGB dbxref
  • added genbank dbxref
  • added EC dbxref (mitochondrial genes)
  • added 'codon start/translation_start' prop
  • TODO: add reference
loaded tRNA, rRNA features
  • TODO: add reference
imported ESTs (ppal) [2]
  • aligned to genome 90% (4034 of 4452)
  • TODO: add reference (Gray,M.W. TBestDB [3] Polysphondylium pallidum)
created blast databases (ppal, dfas)
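
The import conventions above (name after the GenBank record, add a GenBank dbxref, copy the description, skip placeholder gene products) can be sketched roughly as follows. The `Feature` class and the helper names are hypothetical and do not reflect the actual loader or schema; the "GenBank:" dbxref prefix is also an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Feature:
    """Hypothetical in-memory stand-in for a loaded feature."""
    name: str
    type: str
    description: Optional[str] = None
    dbxrefs: List[str] = field(default_factory=list)


def scaffold_from_genbank(accession: str, definition: str) -> Feature:
    """Name the scaffold after its GenBank record and attach a dbxref."""
    return Feature(
        name=accession,
        type="supercontig",
        description=definition,
        dbxrefs=[f"GenBank:{accession}"],
    )


def gene_product(product: str) -> Optional[str]:
    """Return the product, or None for the 'hypothetical protein' placeholder."""
    cleaned = product.strip()
    if cleaned.lower() == "hypothetical protein":
        return None
    return cleaned


def percent_aligned(aligned: int, total: int) -> float:
    """Share of ESTs aligned to the genome, in percent."""
    return 100.0 * aligned / total


# The accession and the description are both taken from these notes, but
# pairing them here is for illustration only.
scaffold = scaffold_from_genbank(
    "GL290984",
    "Polysphondylium pallidum PN500 unplaced genomic scaffold PPL_scaffold2, "
    "whole genome shotgun sequence.",
)
print(scaffold.name, scaffold.dbxrefs)
print(f"{percent_aligned(4034, 4452):.1f}% of ESTs aligned")
```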

To discuss

naming - where to store strain names (Polysphondylium pallidum PN500, Dictyostelium fasciculatum SH3)? In the species column? (It is also used in the URL.) - We don't have the D. discoideum strain stored anywhere! And discoideum is probably the species for which we are most likely to get another strain. [Petra]
ESTs, mitochondria loading? (ppal)
abbreviations (3 letters for DDB/DDB_G) - DFA/DFA_G, PPA/PPA_G
conventional naming is "scaffold_#", which can be confusing for multiple species
used more meaningful GenBank locus/accession, e.g. GL290984
added GenBank external id; search has to be modified to allow search by dbxrefs on features not tied to genes (features can currently be searched only by sequence ID, e.g. DDB0231574)
not imported for D.purpureum, data available for all organisms
used GenBank locus/accession as name, not true for mitochondrial genome (e.g. ADBJ01000006)
linkouts: to GenBank (via protein id), to SACGB [4]: via locus tag (results in search) [5] or via inner id (can be derived from fasta)?
skip "hypothetical protein" gene product?
import references and tie to features? or genes? or both
citations, track names
D.purpureum
    • submitted to GenBank, should gene models be updated? ([6])
    • has assembly information (we do not show gaps/contigs currently)
  • existing search is hardcoded to search discoideum or purpureum data:
    • Gene Names/Synonyms - discoideum only
    • Gene IDs - any
    • ESTs - any
    • dictyBaseDPIDs - any but name comes from SITE_NAME env variable
    • external ids - dicty or all, depending on SITE_NAME env variable
    • Gene Product - not activated on multigenome, searches dicty only.
  • search results display is not suited for cross-species search
Existing search
  • can be rewritten to use species parameter, making search species-specific
  • would take less time but limit functionality
  • would require both sites to use the same search
New search
  • can be written to use both databases in order to make search species-specific
  • would require complete rethinking of search strategies
  • would allow main site to use old search for the transition period
    • GenBank linkouts?
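
To make the "species parameter" option concrete, here is a rough sketch of a search that is species-specific when a species is given and cross-species otherwise. The `Gene` record, the `search_genes` function and the sample data are all invented for illustration; the real search reads from the databases and the SITE_NAME environment variable, which this sketch does not model.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Gene:
    gene_id: str
    name: str
    species: str


def search_genes(genes: List[Gene], query: str,
                 species: Optional[str] = None) -> List[Gene]:
    """Case-insensitive substring match on gene name.

    With species=None every species is searched; otherwise only genes of
    the requested species are returned.
    """
    needle = query.lower()
    return [
        g for g in genes
        if needle in g.name.lower()
        and (species is None or g.species == species)
    ]


# Made-up records; the IDs only imitate the DDB_G / PPA_G prefix idea.
GENES = [
    Gene("DDB_G0000001", "exampleA", "discoideum"),
    Gene("PPA_G0000001", "exampleA", "pallidum"),
    Gene("DDB_G0000002", "otherB", "discoideum"),
]

print([g.gene_id for g in search_genes(GENES, "example")])
print([g.gene_id for g in search_genes(GENES, "example", species="pallidum")])
```

Passing the species explicitly, rather than reading it from the environment, is what would let one code base serve both sites.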

Stock center strains

List of REMI strains from Christopher Quang Dung Dinh/Adam Kuspa contains links to chromosomal locations (e.g. [7]). Chromosome 2 coordinates are already incorrect due to the shift. External data is out of our control, but it is possible to store this data on our side by creating a new feature representing a single mutation point; this feature would have a location on the chromosome and would be linked to the strain. Plan and estimate:

Write middleware for handling new feature
Figure out data model (5 days)
Figure out software interface (5 days)
Write software adaptor working with two schemas (3 weeks)
Display (gbrowse?, strain page) (2 weeks)
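
As a starting point for the data model step, a minimal sketch of a mutation-point feature linked to a strain, plus a helper showing why storing it on our side helps when coordinates shift. Class, field and function names are assumptions, and the shift logic is a simplification (a single offset applied from one position onward), not a description of the actual chromosome 2 change.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class MutationPoint:
    """Hypothetical feature: one insertion site tied to one strain."""
    strain_id: str
    chromosome: str
    position: int  # 1-based coordinate on the chromosome


def apply_shift(points: List[MutationPoint], chromosome: str,
                start: int, offset: int) -> List[MutationPoint]:
    """Return a new list where points on `chromosome` at or after `start`
    are moved by `offset`; all other points are returned unchanged."""
    return [
        replace(p, position=p.position + offset)
        if p.chromosome == chromosome and p.position >= start
        else p
        for p in points
    ]


# Invented strains and coordinates, for illustration only.
points = [
    MutationPoint("strainA", "2", 1000),
    MutationPoint("strainB", "2", 5000),
    MutationPoint("strainC", "3", 5000),
]
shifted = apply_shift(points, chromosome="2", start=3000, offset=250)
print([p.position for p in shifted])
```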