APR 14 2011

From DictyWiki

GO prep for Protein2GO

What's done

  1. Deleted all ISS annotations whose 'with' field contains anything other than an ortholog.
  2. Imported all IEAs from QuickGO/Dicty.
  3. This created a lot of redundancy

Progress

  1. Fixed 46 problematic 'with' annotations (typos/wrong identifiers)
  2. Filtering: GOA can filter IEAs for us. The filtering steps are as follows:
    1. Where 2 or more InterPro2GO mappings have applied the same GO term to the same protein, we filter to display just one of the InterPro2GO annotations.
    2. Where annotations to a protein have been made by multiple IEA methods, we filter out those IEA annotations that have applied a less granular GO term.
    3. Finally, where multiple IEA methods have applied the same GO term to the same protein, we rank the methods and prefer some above others (see the list below, highest preference first). EC annotations are therefore kept in preference to annotations made by all other methods, and InterPro2GO annotations are always removed if another method supplies the same GO term to the same protein:
  • EC2GO
  • UniProtKB-SubCell2GO
  • HAMAP2GO
  • Compara
  • UniProtKB-SPKW2GO
  • InterPro2GO

Filtering was tested on one gene, mkcA. It works perfectly and no further action is needed, i.e. at present we don't need to separate IEAs from experimental annotations, and do not need to filter at dictyBase!
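
The three filtering steps above can be sketched roughly as follows. This is only an illustration of the logic as described in these notes, not GOA's actual implementation: the `Annotation` record, the `filter_ieas` function, and the hand-written `ancestors` table (GO term to its less granular parent terms) are all assumptions, and the identifiers in the usage note are made up. Step 2 is read here as "drop a term if a more granular term is annotated to the same protein", which is one possible interpretation.

```python
from collections import namedtuple

# Hypothetical record: one IEA annotation on one protein.
Annotation = namedtuple("Annotation", "protein go_term method source")

# Method ranking from the list above, most preferred first.
METHOD_RANK = ["EC2GO", "UniProtKB-SubCell2GO", "HAMAP2GO",
               "Compara", "UniProtKB-SPKW2GO", "InterPro2GO"]
RANK = {method: i for i, method in enumerate(METHOD_RANK)}


def filter_ieas(annotations, ancestors):
    """Apply the three filtering steps; `ancestors` maps a GO term to the
    set of its less granular (ancestor) terms. Assumed input, not a GOA API."""
    # Step 1: keep only one InterPro2GO annotation per (protein, GO term).
    seen = set()
    step1 = []
    for ann in annotations:
        key = (ann.protein, ann.go_term)
        if ann.method == "InterPro2GO":
            if key in seen:
                continue
            seen.add(key)
        step1.append(ann)

    # Step 2: drop annotations whose term is an ancestor of another term
    # annotated to the same protein.
    terms_by_protein = {}
    for ann in step1:
        terms_by_protein.setdefault(ann.protein, set()).add(ann.go_term)
    step2 = [
        ann for ann in step1
        if not any(ann.go_term in ancestors.get(other, set())
                   for other in terms_by_protein[ann.protein])
    ]

    # Step 3: for the same (protein, GO term), keep the best-ranked method.
    best = {}
    for ann in step2:
        key = (ann.protein, ann.go_term)
        if key not in best or RANK[ann.method] < RANK[best[key].method]:
            best[key] = ann
    return list(best.values())
```

For example, given two InterPro2GO hits and one EC2GO hit for a specific term plus a HAMAP2GO hit for its parent term on the same protein, only the EC2GO annotation survives.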

Related Actions

  1. Map EC to genes through ortholog data. Yulia, can you please update?

Action: Add display of ortholog EC numbers to the gene page (in next release)

General issues

  1. Need to ask Emily about Chr2 repeat genes and annotations in GOA -- Reply from Emily: "If dictyBase's gp2protein state that the 2 gene identifiers should be mapped to the same UniProtKB accession, shouldn't dictyBase be able to use this mapping to supply the 2 gene identifiers with annotations from the UniProtKB accession?"
  2. Non-protein-coding genes (RNAs): We have at least 35 that have manual, experimental annotation(s). We need to tag them onto the file, and if we need to annotate more, we will annotate in an xls spreadsheet in GAF format and submit to Sidd.
  3. Evidence code expansion: We will link to the GO evidence code page [1]
  4. References: GOA will use GO references when there is no PMID (where we have so far used internal refs). The question is whether we translate these back to our internal refs when we import the file, or use the GO refs and link out. Sidd says: conversion first, then maybe link out later.
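
Emily's suggestion in item 1 amounts to inverting the gp2protein mapping so that every gene identifier pointing at a UniProtKB accession receives that accession's annotations. A minimal sketch, assuming a two-column tab-separated gp2protein file (gene identifier, then UniProtKB accession) with `!` comment lines; the function names are hypothetical and this is not dictyBase's import code:

```python
from collections import defaultdict


def parse_gp2protein(lines):
    """Map each UniProtKB accession to the gene identifiers that point at it."""
    genes_by_accession = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("!"):  # skip blanks and comment lines
            continue
        gene_id, accession = line.split("\t")[:2]
        genes_by_accession[accession].append(gene_id)
    return genes_by_accession


def propagate(annotations_by_accession, genes_by_accession):
    """Give every gene mapped to an accession that accession's GO terms."""
    annotations_by_gene = defaultdict(list)
    for accession, go_terms in annotations_by_accession.items():
        for gene_id in genes_by_accession.get(accession, []):
            annotations_by_gene[gene_id].extend(go_terms)
    return dict(annotations_by_gene)
```

With two Chr2 repeat genes mapped to one accession, both genes end up with the same list of GO terms.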


TextPresso GO Component Annotations

  1. Call with Kimberly on March 23.
  2. There is a new Textpresso version, and we need to update our link and 'dictyize' the header [2]. It now allows searching specific sections of a paper and contains more categories, which we can modify or add to.
  3. Curators will use the WormBase Textpresso curation tool to approve annotations suggested by the pipeline. Kimberly said the tool can write a GAF file that we can add on to our GOA GAF.
  4. Problem: Loading PDFs into Textpresso is labor-intensive and hard to do systematically. We (Bob) currently load them one by one when new papers are associated with genes. However, when a formatted PDF is not yet available, the paper is not loaded until it has been manually curated, which is too late for semi-automatic GO component annotation. Sidd thinks we could automate import of those papers that are in PubMed Central. Needs to be explored.
  5. Another Skype call with Kimberly and Arun is planned for later this month.

New Grant Funding

  1. Cut 'more genomes' planned for 16 w in 2014 (Petra)
  2. (Sidd)
  • Cut loading of any more extra genomes (Aim 2a). Instead, enhance the existing dpur data and shift focus to loading resequenced genomes and strain variations.
  • Cut loading of protein-protein interaction data (Aim 2d).

Gene Curation Update

Update as of 14-APR-2011, 10:00 AM CST

  • Curated models: 11068
  • Pseudogenes: 566
  • Skipped: 352
  • Deleted: 429 (this never includes those that are deleted in mergers, definitely an underestimate, but we can query at some point)
  • Annotated RTE/TE: 517
  • Total (taken care of): 12932
  • Inspected/Curated genes (minus deleted): 12501
  • Not dealt with: 470
    • Skipped by Bob but not 'officially' 'difficult genes': 121
    • From list with changes: 349

Multigenome release 2-19

  • Sync with dicty release 2-19 (Done)
    • Run all database patches to get the same infrastructure as dictyBase.
  • Updating web application library
    • Gene page (Working)
    • Blast
    • Bug in BLAST reported by Petra


Stock center strains

The list of REMI strains from Christopher Quang Dung Dinh/Adam Kuspa contains links to chromosomal locations (e.g. [3]). Chromosome 2 coordinates are already incorrect due to the shift. External data is out of our control, but we can store this data on our side by creating a new feature representing a single mutation point; this feature would have a location on the chromosome and be linked to the strain. Plan and estimate:

  • Write middleware for handling new feature
  • Figure out data model (5 days)
  • Figure out software interface (5 days)
  • Write software adaptor working with two schemas (3 weeks)
  • Display (gbrowse?, strain page) (2 weeks)
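
As a starting point for the data model discussion, the proposed feature could look roughly like the sketch below: one record per insertion point, carrying its chromosome location and a link to the strain. The class, field, and function names are hypothetical, and this is not tied to the actual database schema. Storing the coordinate locally means a coordinate shift such as the Chromosome 2 one can be applied on our side.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class InsertionSite:
    """One REMI insertion point, stored locally and linked to a strain."""
    strain_id: str   # stock center strain identifier (hypothetical format)
    chromosome: str
    position: int    # coordinate of the single mutation point


def shift_sites(sites, chromosome, start, offset):
    """Return sites with positions at or after `start` on `chromosome`
    moved by `offset`; all other sites are returned unchanged."""
    return [
        replace(site, position=site.position + offset)
        if site.chromosome == chromosome and site.position >= start
        else site
        for site in sites
    ]


def sites_for_strain(sites, strain_id):
    """All insertion sites linked to one strain (e.g. for the strain page)."""
    return [site for site in sites if site.strain_id == strain_id]
```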