June 9 2011
GO prep for Protein2GO
General issues
- Well on track. Emily just sent another list of problematic annotations; I sent the corrections to Sidd to implement.
Textpresso GO annotations
Skype call Summary
Review dicty Paper Pipeline and Textpresso
dictyBase: Papers have been added to the Textpresso corpus as they were curated; fewer were added last year because the curation focus was on gene models. PubMed searches using keywords (e.g. Dictyostelium) find papers; PDFs are downloaded manually and the relevant genes attached. This is a bottleneck. Is there an easier way to upload, for example multiple papers at once?
Other Options for Paper Download
Can use scp or another file transfer protocol, or give Arun an account on a machine at dictyBase and he can get the papers via a script
Preprints are okay for Textpresso, so can download the full text as soon as it's available
Alternatively, automated downloads from PMC or directly from journal web sites could be put into place, although downloading from journal sites involves more site-specific details
dictyBase could provide Textpresso with relevant PMIDs and Textpresso could set up the download pipeline
PMC has a six-month delay, but right now that should not be a problem; we can revisit this in the future
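As a rough illustration of the PMID-driven option above, the sketch below builds NCBI E-utilities ELink URLs that ask which PubMed IDs have a corresponding PMC record. It only constructs the URLs and makes no network calls. The function names, the batch size, and the PMIDs in the usage note are placeholders, not part of any agreed pipeline.

```python
from urllib.parse import urlencode

# NCBI E-utilities ELink endpoint (maps records between Entrez databases).
EUTILS_ELINK = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi"


def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def pmc_lookup_urls(pmids, batch_size=100):
    """Build one ELink URL per batch of PMIDs (hypothetical helper).

    Each PMID is passed as its own `id` parameter so that the response
    keeps a separate link set per PMID.
    """
    urls = []
    for batch in chunked(list(pmids), batch_size):
        params = [("dbfrom", "pubmed"), ("db", "pmc")]
        params += [("id", pmid) for pmid in batch]
        urls.append(f"{EUTILS_ELINK}?{urlencode(params)}")
    return urls
```

For example, `pmc_lookup_urls(["111", "222", "333"], batch_size=2)` yields two URLs, the first covering the placeholder IDs 111 and 222. Fetching the URLs and downloading the full text would be a separate step.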
Future Plans
Once new papers are in the corpus, perform CCC search on dicty papers from April 2010 - November 2010. This will allow dicty curators to look over the results and determine how well the searches are working for them.
Update dicty gene list (last update was August 2008), add synonyms, and consider if there are any gene names that might contribute to false positives (e.g., ER for TAIR)
Time frame: mid-June for generating source files for testing search results
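One possible way to pre-screen the gene list for names likely to cause false positives is sketched below. It flags names that are very short or that match a list of common terms. The length threshold, the term list, and all names other than "ER" (mentioned above) are illustrative assumptions; curators would still need to review the flagged names.

```python
def flag_ambiguous_names(gene_names, common_terms, min_length=3):
    """Return gene names that may produce false positives in text searches.

    A name is flagged if it is shorter than `min_length` characters or if
    it matches (case-insensitively) an entry in `common_terms`.
    """
    common = {term.lower() for term in common_terms}
    flagged = []
    for name in gene_names:
        if len(name) < min_length or name.lower() in common:
            flagged.append(name)
    return flagged


# Illustrative input only; not taken from the dicty gene list.
candidates = ["ER", "geneA", "cat", "x1"]
print(flag_ambiguous_names(candidates, common_terms=["er", "cat"]))
```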
Since then
- Petra sent an up-to-date list of gene names and synonyms
- Petra sent PMIDs for uncurated 2010 papers
Gene Curation Update
About 190 complicated genes with changes still to go. A more comprehensive update will follow when finally done.
Multigenome release 2-19
- ready for release
- Should the publication page header be changed or removed? It now states "dictyBase Curated Paper", which can be misleading [1]
Release 2-20
- Set up GAF workflow
- Fix caching issue?
- Add Harry's data?
- Add display of ortholog EC numbers to the gene page
Software development future
- Hardware upgrade
- Core OS/VM upgrade
- Which one to choose? Ubuntu/CentOS?
- Ubuntu timeline
- CentOS timeline
- Probably Ubuntu; it was suggested to go to Hardy and then to Lucid.
- But how do we take advantage of 64-bit systems?
- Tentative release list
Q: How do we fit these into the timeline, in sync with our plan chart?
Importing D. fasciculatum
- Data loading (scaffolds, contigs, genes) [7 days]
- GBrowse setup [2 days]
- Gene Page setup [3 days + color adjustments]
- Search: cross species? (i.e. gene product search) [7 - 14 days]
- BLAST: link out to D. fasciculatum site [3 days]
- D. discoideum alignments?
- orthologs?
Stock center strains
The list of REMI strains from Christopher Quang Dung Dinh/Adam Kuspa contains links to chromosomal locations (e.g. [2]). Chromosome 2 coordinates are already incorrect due to the shift. External data is out of our control, but it is possible to store this data on our side by creating a new feature representing a single mutation point; this feature would have a location on the chromosome and would be linked to the strain. Plan and estimate:
- Write middleware for handling new feature
- Figure out data model (5 days)
- Figure out software interface (5 days)
- Write software adaptor working with two schemas (3 weeks)
- Display (gbrowse?, strain page) (2 weeks)
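To make the idea of a locally stored mutation-point feature more concrete, here is a minimal sketch of what such a record and a coordinate remapping step might look like. The class name, field names, and the simple "shift everything at or after a position" rule are assumptions for illustration; the actual data model and schema mapping are what the plan above sets out to work out.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class InsertionSite:
    """Hypothetical record: one mutation point linked to a strain."""
    strain_id: str
    chromosome: str
    position: int  # coordinate on the chromosome


def apply_shift(site, chromosome, shift_start, offset):
    """Return a copy of `site` with its position corrected for a shift.

    Only sites on `chromosome` at or after `shift_start` are moved;
    all other sites are returned unchanged.
    """
    if site.chromosome == chromosome and site.position >= shift_start:
        return replace(site, position=site.position + offset)
    return site


# Illustrative values only; not real strain data or real shift offsets.
site = InsertionSite(strain_id="strain-1", chromosome="2", position=5000)
print(apply_shift(site, chromosome="2", shift_start=1000, offset=250))
```

Keeping the location on our side in this way would let coordinates be updated together with the rest of the chromosome, instead of depending on external links.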