MAR 10 2011

From DictyWiki

GO prep for Protein2GO

ISS

  1. Deleted ISS annotations to InterPro2GO (4704)
  2. Deleted ISS to CBS:SignalP, CBS:TargetP, CBS:TMHMM: (519)
  3. Deleted ISS to EC annotations: (33 / 28 genes)
  4. Deleted ISS to KEGG annotations: (11 / 10 genes)
  5. total ISS deleted: 5330
  6. Deleted IC annotations that are dependent on ISS that got deleted (132)
  7. ISS to orthologs [solved]: GO_REF:0000024 [1] will be widened so these can stay as they are.

IEA

  1. Deleted 10610 from test
  2. Got 47828 on test
  3. We should filter IEAs that are multiple and redundant from the gene page. Should we also filter those that are redundant with experimental annotations? And can we also hide manual 'ND' annotations if there is an IEA or any other annotation?
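
One possible reading of the display rules raised above, as a minimal sketch only. It assumes annotations arrive as (GO ID, evidence code) pairs for a single gene and a single aspect, and that the listed evidence codes count as experimental. The function name filter_for_display is hypothetical and does not refer to any existing dictyBase code.

```python
# Evidence codes treated as experimental in this sketch (an assumption).
EXPERIMENTAL = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}


def filter_for_display(annotations):
    """Filter (go_id, evidence_code) pairs for one gene and one aspect.

    Rules sketched here:
      * drop an IEA if the same term also has an experimental annotation,
      * drop repeated IEAs to the same term,
      * drop 'ND' annotations if any other annotation remains.
    """
    exp_terms = {go for go, ev in annotations if ev in EXPERIMENTAL}
    kept = []
    seen_iea = set()
    for go, ev in annotations:
        if ev == "IEA":
            if go in exp_terms or go in seen_iea:
                continue  # redundant IEA, do not display
            seen_iea.add(go)
        kept.append((go, ev))
    # Hide 'ND' only when something else is left to show.
    if any(ev != "ND" for _, ev in kept):
        kept = [(go, ev) for go, ev in kept if ev != "ND"]
    return kept
```

A gene whose only annotation is 'ND' keeps it under these rules, so the page would not end up empty.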

Analysis

  1. Automatic, simple, IEA/ISS exact matching
    1. 102 match
    2. 5228 do not match
  2. Manual Analysis
    1. 34 genes analyzed, with 118 annotations compared: a representative mixture from InterPro, Pfam, EC, KEGG, and TMHMM
    2. In total, a gain of 7 annotations. When we lost specificity for InterPro2GO/Pfam, it was always backed up by actual InterPro annotations and/or domain associations.
    3. Summary: All InterPro2GO annotations have been replaced with IEAs that are either more accurate or more up to date. Many of these annotations on prod are now invalid, because the InterPro entry lost the GO annotation, the InterPro domain got deleted, or the Dicty protein now hits a more general domain, as they also changed 'stringency'. More often than not, however, we lost specificity when deleting EC annotations. Still, the total of 33 annotations is negligible, and since some were replaced perfectly, this may also be a sensible update.
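
The automatic exact-matching step can be pictured as a set comparison. The sketch below is illustrative only: it assumes each annotation is reduced to a (gene ID, GO ID) pair, and the function name exact_match_report is hypothetical. It is not the script that produced the 102 / 5228 split.

```python
def exact_match_report(deleted_iss, current_iea):
    """Split deleted ISS annotations by whether an identical IEA exists.

    Both arguments are iterables of (gene_id, go_id) pairs.
    Returns (matched, unmatched) as sorted lists.
    """
    iea = set(current_iea)
    deleted = set(deleted_iss)
    matched = sorted(deleted & iea)      # same gene, same GO term
    unmatched = sorted(deleted - iea)    # no identical IEA found
    return matched, unmatched
```

Exact matching counts a pair as unmatched even when the new IEA points to a parent or child of the old term, which is why a manual analysis of a sample is still needed.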

Conclusions

  1. Delete all ISS on production, as was done on test.
  2. Redundancy of parent/child relationships: either create two tabs/sections for experimental and electronic annotations, or do not show parent terms if there is a child term, i.e. go to the finest granularity. For two tabs, the default should be the EXP view before IEA if there is EXP. The open question is how to display a gene that has EXP for some aspect(s) but not all.
  3. Redundancy of identical terms: Collapse identical terms in the gene summary page
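
A minimal sketch of the "finest granularity" option together with collapsing identical terms. It assumes the ontology is available as a mapping from each term to its direct parents. The term IDs in the usage note are placeholders, and the names ancestors and most_specific are hypothetical.

```python
def ancestors(term, parents):
    """Return all ancestors of a term, given a {term: [direct parents]} map."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def most_specific(terms, parents):
    """Collapse identical terms, then hide any term that is an ancestor
    of another term in the same list."""
    unique = list(dict.fromkeys(terms))  # de-duplicate, keep input order
    covered = set()
    for term in unique:
        covered |= ancestors(term, parents)
    return [term for term in unique if term not in covered]
```

With the placeholder hierarchy {"TERM:3": ["TERM:2"], "TERM:2": ["TERM:1"]}, calling most_specific(["TERM:1", "TERM:3", "TERM:3", "TERM:9"], ...) keeps only TERM:3 and TERM:9: the duplicate is collapsed and TERM:1 is hidden because TERM:3 is more specific.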

Related Actions

  1. Map EC to genes through ortholog data


General issues

  1. Need to ask Emily about Chr2 repeat genes and annotations in GOA
  2. Need to clean up the bad 'with' annotations as sent from Emily
  3. Non-protein-coding genes (RNAs): We have at least 35 that have manual, experimental annotation(s), and we need to keep those somehow. We need to tag them onto the file, and if we need to annotate more, we will annotate them in an xls spreadsheet in GAF format and submit it to Sidd.
  4. Evidence code expansion: We will link to the GO evidence code page [2]
  5. References: Since GOA will use GO references when there is no PMID (where we have so far used internal refs), the question is whether we translate them back to our internal refs when we import the file, or use the GO refs and link out. Sidd says: conversion first, then maybe link-out later.

Gene Curation Update

Update as of 10-MAR-2011, 6:00 PM CST

  • Curated models: 10887
  • Pseudogenes: 504
  • Skipped: 294
  • Deleted: 318 (this never includes those deleted in mergers, so it is definitely an underestimate, but we can query at some point)
  • Annotated RTE/TE: 517
  • Total (taken care of): 12520
  • Inspected/Curated genes (minus deleted): 12202
  • Not dealt with: 817
    • Never inspected: 125
    • Skipped by Bob but not 'officially' 'difficult genes': 254
    • From list with changes: 438
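
The tallies above can be re-derived with a few lines of arithmetic, which is handy when the counts are updated. The dictionary keys are just labels chosen for this sketch; the numbers are the ones reported in this update.

```python
# Counts as reported in the 10-MAR-2011 update.
taken_care_of = {
    "curated_models": 10887,
    "pseudogenes": 504,
    "skipped": 294,
    "deleted": 318,
    "annotated_rte_te": 517,
}
not_dealt_with = {
    "never_inspected": 125,
    "skipped_not_officially_difficult": 254,
    "from_list_with_changes": 438,
}

total = sum(taken_care_of.values())
inspected_minus_deleted = total - taken_care_of["deleted"]
remaining = sum(not_dealt_with.values())

print(total, inspected_minus_deleted, remaining)
```

The three printed values correspond to "Total (taken care of)", "Inspected/Curated genes (minus deleted)", and "Not dealt with".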

Post release

  • Chr2 coordinate shift for gbrowse2: How much coordinate shift do we need to do?
  • Seems ok as is and correctly aligned, just a bit 'over-expressed' so it often goes through introns, more than we ever saw on other chromosomes. But we go to T-Browse if in doubt [Petra]

Multigenome release 2-19

  • Sync with dicty release 2-19
    • Run all database patches to get the same infrastructure as dictyBase.
  • Updating web application library
    • Gene page
    • Blast
    • ID resolver
  • Bug in BLAST reported by Petra

Stock center strains

The list of REMI strains from Christopher Quang Dung Dinh/Adam Kuspa contains links to chromosomal locations (e.g. [3]). Chromosome 2 coordinates are already incorrect due to the shift. External data is out of our control, but it is possible to store this data on our side by creating a new feature representing a single mutation point; this feature would have a location on the chromosome and would be linked with the strain. Plan and estimate:

Write middleware for handling new feature
Figure out data model (5 days)
Figure out software interface (5 days)
Write software adaptor working with two schemas (3 weeks)
Display (gbrowse?, strain page) (2 weeks)


Quick GMOD meeting recap
