MAY 12 2011

From DictyWiki

Revision as of 14:31, 12 May 2011 by YuliaBushmanova (Multigenome release 2-19)
Revision as of 14:43, 12 May 2011 by YuliaBushmanova

Revision as of 14:43, 12 May 2011


GO prep for Protein2GO

General issues

  1. Need to ask Emily about Chr2 repeat genes and annotations in GOA -- Reply from Emily: "If dictyBase's gp2protein state that the 2 gene identifiers should be mapped to the same UniProtKB accession, shouldn't dictyBase be able to use this mapping to supply the 2 gene identifiers with annotations from the UniProtKB accession?"
  2. Non-protein-coding genes (RNAs): We have at least 35 that have manual, experimental annotation(s). We need to add them to the file; if we need to annotate more, we will annotate in an xls spreadsheet in GAF format and submit it to Sidd.
  3. Evidence code expansion: We will link to the GO evidence code page [1]
  4. References: GOA will use GO references when there is no PMID (where we have so far used internal refs). The question is whether we translate these back to our internal refs when we import the file, or use the GO refs and link out. Sidd says: conversion first, then maybe link out later?
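For the RNA gene annotations that would be kept in a spreadsheet in GAF format, a minimal Python sketch of what one GAF 2.0 row could look like is shown here. GAF 2.0 rows have 17 tab-separated columns. The gene ID, symbol, GO term, and PMID below are placeholders for illustration only, not real dictyBase annotations, and the column key names are our own labels.

```python
# Column order of a GAF 2.0 row (17 tab-separated columns).
# The key names are illustrative labels, not part of the format itself.
GAF_COLUMNS = [
    "db", "db_object_id", "db_object_symbol", "qualifier", "go_id",
    "reference", "evidence_code", "with_from", "aspect",
    "db_object_name", "db_object_synonym", "db_object_type",
    "taxon", "date", "assigned_by", "annotation_extension",
    "gene_product_form_id",
]


def gaf_line(annotation):
    """Return one tab-separated GAF row; columns not supplied are left empty."""
    return "\t".join(annotation.get(col, "") for col in GAF_COLUMNS)


# Placeholder annotation: every ID here is hypothetical.
example = {
    "db": "dictyBase",
    "db_object_id": "DDB_G0000000",   # placeholder gene ID
    "db_object_symbol": "exampleRNA",  # placeholder symbol
    "go_id": "GO:0003674",             # molecular_function root, as a stand-in
    "reference": "PMID:0000000",       # placeholder reference
    "evidence_code": "IMP",
    "aspect": "F",
    "db_object_type": "ncRNA",
    "taxon": "taxon:44689",            # Dictyostelium discoideum
    "date": "20110512",
    "assigned_by": "dictyBase",
}

print("!gaf-version: 2.0")
print(gaf_line(example))
```

Rows built this way could be pasted into the spreadsheet or written straight to a file for submission; the empty columns (qualifier, with/from, and so on) are kept so the column count stays at 17.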

Textpresso GO annotations

Skype call Summary

Review dicty Paper Pipeline and Textpresso

Current Textpresso for dicty probably used more by dictyBase users than by curators

Papers have been added to the Textpresso corpus as curated; last year not so many because curation focus was on gene models

PubMed searches using keywords (e.g. Dictyostelium) find papers, PDFs are downloaded manually and relevant genes attached

This is a bottleneck; is there an easier way to do this, e.g. upload multiple papers at once?

Other Options for Paper Download

Can use scp or another file transfer protocol, or give Arun an account on a machine at dictyBase and he can get the papers via a script

Preprints are okay for Textpresso, so can download the full text as soon as it's available

Alternatively, automated downloads from PMC or directly from journal web sites could be put into place, although downloading from journal sites involves more site-specific details

dictyBase could provide Textpresso with relevant PMIDs and Textpresso could set up the download pipeline

PMC has a six-month delay, but right now that should not be a problem - we can revisit that in the future
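Whichever transfer route is chosen, a batch hand-off needs a check of which PMIDs on the list still lack a PDF. A minimal Python sketch of that check follows; it assumes, purely for illustration, that PDFs are stored in one directory and named `<PMID>.pdf`. The function name and the naming convention are hypothetical, not part of any existing dictyBase or Textpresso pipeline.

```python
from pathlib import Path


def missing_pdfs(pmids, pdf_dir):
    """Return the PMIDs (as strings) that have no <PMID>.pdf file in pdf_dir.

    Assumes the hypothetical convention that each paper is saved as
    "<PMID>.pdf"; the order of the input list is preserved.
    """
    present = {path.stem for path in Path(pdf_dir).glob("*.pdf")}
    return [str(pmid) for pmid in pmids if str(pmid) not in present]
```

The returned list could then be handed to whoever runs the download script, and the directory copied in one step with scp once it is complete.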

Future Plans

Once new papers are in the corpus, perform CCC search on dicty papers from April 2010 - November 2010. This will allow dicty curators to look over the results and determine how well the searches are working for them.

Update dicty gene list (last update was August 2008), add synonyms, and consider if there are any gene names that might contribute to false positives (e.g., ER for TAIR)

Time frame: mid-June for generating source files for testing search results

Since then

  1. Petra sent up to date list with gene names and synonyms
  2. Petra sent PMIDs for uncurated 2010 papers

Gene Curation Update

about 200 complicated genes to go

Multigenome release 2-19

  • ready for release
  • should publication page header be changed/removed? It now states "dictyBase Curated Paper", which can be misleading [2]

Release 2-20

  • Setting GAF workflow
  • Fix caching issue ?
  • Add Harry's data ?
  • Add display of orthologs EC numbers to gene page display

Software development future

Q: How do we fit these into the timeline, in sync with our plan chart?

Importing D.fasciculatum

  • Data loading (scaffolds, contigs, genes) [7 days]
  • GBrowse setup [2 days]
  • Gene Page setup [3 days + color adjustments]
  • Search: cross species? (i.e. gene product search) [7 - 14 days]
  • BLAST: link out to D.fasciculatum site [3 days]
  • discoideum alignments (also have to be updated for purpureum) [5 days]
  • orthologs?
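To help place this import on the timeline, the estimates listed above can be totalled. The short Python sketch below only adds up the bracketed numbers; the colour adjustments and the orthologs item have no estimate in the list and are left out.

```python
# (low, high) estimates in days, copied from the list above.
# "color adjustments" and "orthologs?" have no estimate and are excluded.
estimates = {
    "Data loading": (7, 7),
    "GBrowse setup": (2, 2),
    "Gene Page setup": (3, 3),
    "Search": (7, 14),
    "BLAST link out": (3, 3),
    "discoideum alignments": (5, 5),
}

low = sum(lo for lo, _ in estimates.values())
high = sum(hi for _, hi in estimates.values())
print(f"{low} to {high} days")  # prints "27 to 34 days"
```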

Stock center strains

The list of REMI strains from Christopher Quang Dung Dinh/Adam Kuspa contains links to chromosomal locations (e.g. [3]). Chromosome 2 coordinates are already incorrect due to the shift. External data is out of our control, but we can store this data on our side by creating a new feature that represents a single mutation point; this feature would have a location on the chromosome and be linked to the strain. Plan and estimate:

Write middleware for handling new feature
Figure out data model (5 days)
Figure out software interface (5 days)
Write software adaptor working with two schemas (3 weeks)
Display (gbrowse?, strain page) (2 weeks)
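As a starting point for the data model discussion, here is a minimal Python sketch of a single-point mutation feature linked to a strain, with a helper that re-maps coordinates after a shift such as the one on chromosome 2. All class, field, and function names are hypothetical, and the strain ID and coordinates in the usage example are invented; the real model would live in the database schemas mentioned above.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class InsertionSite:
    """A single mutation point, stored on our side and linked to a strain."""
    strain_id: str    # identifier of the linked strain (hypothetical format)
    chromosome: str
    position: int     # 1-based coordinate on the chromosome


def apply_shift(site, chromosome, start, offset):
    """Return a copy of site moved by offset if it lies on the given
    chromosome at or after start; otherwise return site unchanged."""
    if site.chromosome == chromosome and site.position >= start:
        return replace(site, position=site.position + offset)
    return site


# Invented example values, for illustration only.
site = InsertionSite(strain_id="REMI_0001", chromosome="2", position=1_500_000)
shifted = apply_shift(site, chromosome="2", start=1_000_000, offset=250)
print(shifted.position)  # prints 1500250
```

Keeping the coordinate on our own feature means a later shift can be applied in one place, while the strain record and its external link stay untouched.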