MAR 10 2011

From DictyWiki


Revision as of 17:50, 9 March 2011

GO issues to discuss and/or solve before doing Protein2GO

ISS

  1. Delete ISS annotations from InterPro2GO (4704)
  2. ISS to CBS:SignalP, CBS:TargetP, CBS:TMHMM (519): keep them and create a GO ref, or delete them as well? We now have domains, and I don't think we would annotate like this going forward. Emily also suggests converting these to ISM.
  3. ISS to orthologs [solved]: GO_REF:0000024 [1] will be widened so these can stay as they are.

IC

Most of our IC annotations have an internal ref (dictyBase_REF:4), and there is no GO-REF for IC annotations. Since I saw at least one of those annotations that should have a PMID rather than an internal ref, I will look at each quickly and make a table for Sidd of any annotations that need changing. For those that are inferred from ISS (and maybe IEA, we will see) we'll have to write a new GO ref.

IEA

We should delete ALL IEAs. This includes InterPro2GO, SPKW2GO (seems very limited), IEA by BLAST. We don't have many left of those. GOA will provide InterPro2GO and SPKW2GO? [2]. We curators will not be able to manipulate (delete) IEAs anymore, and need to tell Emily to delete them.
Questions: Can we hide (filter out) IEAs that are redundant with experimental annotations? And can we also hide manual 'ND' annotations if there is an IEA or any other annotation?
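As a way to make the two filtering questions concrete, here is a minimal Python sketch. It is not existing dictyBase code: the `Annotation` record, the `visible_annotations` function and the gene ID in the usage note are hypothetical, and it assumes that "redundant" means same gene and same GO term, and that an ND annotation is hidden only when the gene has some other annotation in the same aspect (P, F or C).

```python
from typing import List, NamedTuple

# GO experimental evidence codes.
EXPERIMENTAL = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}


class Annotation(NamedTuple):
    """Hypothetical minimal record; a real GAF line has more columns."""
    gene_id: str
    go_id: str
    evidence: str
    aspect: str  # "P", "F" or "C"


def visible_annotations(annotations: List[Annotation]) -> List[Annotation]:
    """Return the annotations that would still be shown on a gene page."""
    # (gene, term) pairs that are backed by experimental evidence.
    exp_terms = {
        (a.gene_id, a.go_id) for a in annotations if a.evidence in EXPERIMENTAL
    }
    # (gene, aspect) pairs that have at least one non-ND annotation.
    aspects_with_data = {
        (a.gene_id, a.aspect) for a in annotations if a.evidence != "ND"
    }
    shown = []
    for a in annotations:
        # Assumption: an IEA is redundant if the same gene has an
        # experimental annotation to exactly the same term.
        if a.evidence == "IEA" and (a.gene_id, a.go_id) in exp_terms:
            continue
        # Assumption: an ND is hidden if the gene has any other
        # annotation in the same aspect.
        if a.evidence == "ND" and (a.gene_id, a.aspect) in aspects_with_data:
            continue
        shown.append(a)
    return shown
```

For a made-up gene with an IDA and an IEA to 'nucleus' (GO:0005634), an IEA to a molecular function term, and ND annotations for function and process, this keeps the IDA, the function IEA and the process ND, and hides the redundant IEA and the function ND.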

ND

A few years back, when we annotated orthologs, we added ND annotations to make them comprehensive by the date indicated. But even now many genes with ND annotations also have some IEAs. Once we get more IEAs and regular updates on them, there will be a legacy of ND annotations that should be deleted. Maybe we should delete them all in this clean-up process.

General issues

  1. References: Since GOA will use GO references when there is no PMID (where we have so far used internal refs), the question is whether we translate them back to our internal refs when we import the file, or whether we use the GO refs and link out. Sidd says: convert first, then maybe link out later?
  2. Non protein-coding genes (RNAs): We have at least 35 that have manual, exp annotation(s), and we need to keep those somehow. Need to tag them onto the file, and if we need to annotate more, will annotate in xls spreadsheet in GAF format and submit to Sidd.
  3. If there are redundant IEAs, e.g. an experimental annotation to 'nucleus' and IEAs to the same term from InterPro and SPKW, we should not show them on the gene page.
  4. Evidence code expansion: So far, our evidence codes on the GO tab link to this page: [3]. We need to add the new evidence codes. But this does not include the hierarchical nature, and maybe we should link to the GO evidence code page? [4]
  5. Emily asks if we want just D. discoideum annotations or also those to other Dictyostelium species (assuming there are just IEAs). Sidd already said one species per file, and for now it should be D. discoideum only. I checked how many annotations there are for Dpur in QuickGO and there are just 47 [5]. I don't think it makes sense to import these few now (as it needs work to accommodate them), and we can rethink this when we do annotation pipelines for all species we'll have databases for.

Gene Curation Update

Update as of 16-FEB-2011, 6:00 PM CST

  • Curated models: 10747
  • Pseudogenes: 451
  • Skipped: 260
  • Deleted: 277 (this never included those that are deleted in mergers, at least for me as I have that in one line, so definitely is quite a bit more)
  • Annotated RTE/TE: 517
  • Total (taken care of): 12252
  • Inspected/Curated genes (minus deleted): 11975
  • Not dealt with: 755
    • Skipped by Bob but inspected once and 'difficult' (these could be automatically annotated with 'skipped by Bob/date' so we can easily find them again and inspect): 227
    • From list with changes: 143 (85 have just one change that is often without consequences, only 33 have 3 and more changes)
    • Still necessary to inspect as minimum: 495 (not Bob's privately skipped and not those with 3 or more changes)
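The tallies above can be cross-checked with a few lines of Python. This is only a sketch: the variable names are made up here, and reading the last figure as 755 minus Bob's 227 minus the 33 genes with three or more changes is an interpretation of the list, not something stated explicitly.

```python
# Counts copied from the list above; the key names are made up for this check.
counts = {
    "curated_models": 10747,
    "pseudogenes": 451,
    "skipped": 260,
    "deleted": 277,
    "annotated_rte_te": 517,
}

total_taken_care_of = sum(counts.values())
inspected_minus_deleted = total_taken_care_of - counts["deleted"]

not_dealt_with = 755
skipped_by_bob = 227
three_or_more_changes = 33  # subset of the 143 genes from the list with changes

# Assumption: "still necessary to inspect" excludes Bob's skipped genes
# and the genes with three or more changes.
still_to_inspect = not_dealt_with - skipped_by_bob - three_or_more_changes

print(total_taken_care_of, inspected_minus_deleted, still_to_inspect)
```

Under that reading the three derived figures match the reported 12252, 11975 and 495.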

Post release

  • Chr2 coordinate shift for gbrowse2: How much of a coordinate shift do we need to apply?
  • Seems ok as is and correctly aligned, just a bit 'over-expressed' so it often goes through introns, more than we ever saw on other chromosomes. But we go to T-Browse if in doubt [Petra]

Multigenome release 2-19

  • Sync with dicty release 2-19
    • Run all database patches to get the same infrastructure as of dictyBase.
  • Updating web application library
    • Gene page
    • Blast
    • ID resolver
  • Bug in BLAST reported by Petra

Stock center strains

The list of REMI strains from Christopher Quang Dung Dinh/Adam Kuspa contains links to chromosomal locations (e.g. [6]). Chromosome 2 coordinates are already incorrect due to the shift. External data is out of our control, but it is possible to store this data on our side by creating a new feature representing a single mutation point; this feature would have a location on the chromosome and would be linked to the strain. Plan and estimate:

  • Write middleware for handling new feature
  • Figure out data model (5 days)
  • Figure out software interface (5 days)
  • Write software adaptor working with two schemas (3 weeks)
  • Display (gbrowse?, strain page) (2 weeks)