Gene Ontology (GO) Curation

From DictyWiki


General Info

General Guidelines

Always annotate to all three ontologies
In other words, annotate to biological_process, molecular_function, and cellular_component. If the term is not known, use the root term:
  • biological_process ; GO:0008150
  • molecular_function ; GO:0003674
  • cellular_component ; GO:0005575
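
The root-term rule above can be sketched as a small check. This is only an illustration, not a dictyBase tool: the function name `missing_aspects` and the (aspect, GO ID) pair layout are assumptions made for the example; the three root IDs are the ones listed above.

```python
# Root term for each GO aspect, as listed in the guideline above.
ROOT_TERMS = {
    "biological_process": "GO:0008150",
    "molecular_function": "GO:0003674",
    "cellular_component": "GO:0005575",
}


def missing_aspects(annotations):
    """Return {aspect: root GO ID} for every aspect with no annotation.

    `annotations` is a hypothetical list of (aspect, go_id) pairs for
    one gene product.
    """
    covered = {aspect for aspect, _ in annotations}
    return {a: go for a, go in ROOT_TERMS.items() if a not in covered}


# A gene product with only a function annotation still needs the
# process and component root terms.
todo = missing_aspects([("molecular_function", "GO:0003677")])
```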

Multiple annotations
  • In general, dictyBase practices multiple annotations to a term with different references.
  • When a paper shows the same thing twice using two very different assays resulting in the same annotation with two different evidence codes, annotate to that term twice. If the assays result in terms with different granularity, annotate to parent and child term.

The With/From column

Entering information in the with/from column requires certain formatting
  • When entering multiple database objects in the with field, do not enter the database for the first db object (it is selected in the drop-down), but do enter the database for each subsequent db object. For example:
DDB:(selected db in drop-down)     DDB0185021|DDB:DDB0232349|DDB:DDB0215364
  • When using the IC (Inferred by Curator) evidence code, you must enter a GO ID in the with/from field. When doing so, it is imperative that you enter all seven digits of the GO ID. For example, enter GO:0003700 rather than GO:3700. Failure to do this will result in an improper display.
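
The two formatting rules above can be sketched in code. This is a minimal illustration under stated assumptions: the helper names `format_with_field` and `normalize_go_id` are hypothetical and not part of the curation tool, and left-padding with zeros is assumed from the GO:3700 / GO:0003700 example above.

```python
import re


def format_with_field(ids, db="DDB"):
    """Build the with-field text for several IDs from one database.

    The first ID is written bare because its database is chosen in the
    drop-down; every later ID carries an explicit 'DB:' prefix.
    """
    if not ids:
        return ""
    return "|".join([ids[0]] + [f"{db}:{i}" for i in ids[1:]])


def normalize_go_id(go_id):
    """Left-pad the numeric part of a GO ID to seven digits."""
    match = re.fullmatch(r"GO:(\d{1,7})", go_id.strip())
    if match is None:
        raise ValueError(f"not a GO ID: {go_id!r}")
    return f"GO:{int(match.group(1)):07d}"


with_text = format_with_field(["DDB0185021", "DDB0232349", "DDB0215364"])
padded = normalize_go_id("GO:3700")
```

With the IDs from the example above, `with_text` reproduces the string shown there, and `padded` is the full seven-digit form of the GO ID.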

Entering database objects with the IPI evidence code
  • Even if there is only one possible gene product that can be entered in the with column (because the term is a specific protein binding term, e.g., Rho GTPase binding, profilin binding, or filamin binding, as opposed to GO:0005515 protein binding), use IPI.
  • If there are multiple possible gene products that can be entered in the with column, use IPI with the specific gene product.

dictyBase Unpublished References


Use of Evidence Codes

IC (Inferred by Curator)
  • TAS versus IC: A more liberal use of the IC evidence code is a recurring theme from the Stanford GO Annotation Camp. IC is newer than TAS and so some people are just not accustomed to using it since TAS was in use for so long. TAS is now mostly used for statements in reviews.
  • When an author states that a gene product has a particular function, process, or location (based on orthology or other evidence), use IC rather than TAS (or when ISS is inappropriate). Example genes: nox genes, fpaA/B.

ISS (Inferred from Sequence or Structural Similarity)
  • Change dictyBase unpublished (reference_no=10155) to a published reference once the finding is published. If the dictyBase unpublished ISS annotation is more granular than the published ISS annotation, it is okay to keep the dictyBase unpublished annotation.
  • If the first paper for a gene has an ISS annotation and then a second paper shows experimental evidence for that same term, annotate with both the original ISS annotation and the new experimental evidence code.
  • If a single reference has ISS plus experimental evidence for a GO term, use only the experimental evidence code. However, if the ISS annotation is more granular than the experimental annotation, use both evidence codes.

TAS (Traceable Author Statement)
  • Do not make multiple TAS annotations for a gene to the same term.
  • Once an experimental evidence code (IDA, IMP, IPI, IEP, IGI) has been entered for an annotation (or more granular annotations), annotations using the TAS evidence code can be deleted.
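
The TAS clean-up rule above could be checked along the following lines. This is only a sketch: the function name `removable_tas` and the (GO ID, evidence code) pair layout are assumptions, and it compares exact terms only. Finding "more granular" annotations would need the ontology graph, which is left out here.

```python
# Experimental evidence codes named in the guideline above.
EXPERIMENTAL = {"IDA", "IMP", "IPI", "IEP", "IGI"}


def removable_tas(annotations):
    """Return the TAS annotations that may be deleted.

    `annotations` is a hypothetical list of (go_id, evidence_code)
    pairs for one gene product. A TAS annotation is returned when the
    same term also carries an experimental evidence code.
    """
    supported = {term for term, code in annotations if code in EXPERIMENTAL}
    return [(t, c) for t, c in annotations if c == "TAS" and t in supported]


candidates = removable_tas([
    ("GO:0005516", "TAS"),
    ("GO:0005516", "IDA"),
    ("GO:0005634", "TAS"),
])
```

Here only the first TAS annotation is a deletion candidate, because the second term has no experimental support yet.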

ND (No Biological Data Available)
  • When a function/process/component is 'unknown,' use dictyBase ND (reference_no=9851).
  • When an author explicitly states that a function/process/component is unknown, annotate the gene product with ND using the reference_no of the publication (this is already part of the GO documentation).

TAS versus ISS

When authors state that a gene is involved in a certain process, but this has never been explicitly shown in Dictyostelium and probably never will be, the evidence code to use is ISS, not TAS. See Curator Issue 0068.

The NOT Qualifier

The NOT qualifier is only for truly unexpected results. For example, mlkA is classified by similarity and function as a CaM kinase, but it has been shown not to be regulated by Ca2+/calmodulin; hence the NOT annotations for GO:0005516 calmodulin binding and GO:0004685 calcium- and calmodulin-dependent protein kinase activity. Negative results from general tests should not be annotated.

Data Not Shown

"Data not shown" is acceptable for annotations with experimental evidence codes. Since publications are all peer-reviewed, these statements are presumably reliable.

Confusing GO Terms

  • cytoplasm vs. cytosol:
  • cell-matrix adhesion vs. cell-substrate adhesion:
  • cell motility vs. cell migration vs. chemotaxis:
  • development vs. fruiting body formation (or a more specific development term):
  • actin cytoskeleton vs. actin filament: If a gene product colocalizes with actin, annotate to 'actin cytoskeleton' rather than 'actin filament.' Based on the definition, the only gene product that should use 'actin filament' is actin.
  • microtubule cytoskeleton vs. microtubule: If a gene product colocalizes with microtubules, annotate to 'microtubule cytoskeleton' rather than 'microtubule.' Based on the definition, the only gene products that should use 'microtubule' are alpha and beta tubulin.

Similarity-Based GO Annotation

  • Using top hits from GOst search and BLAST vs. UniProt, nr, InterPro, and Pfam, look at GO annotations of these sequences; ISS with that database record and use reference "dictyBase 'Inferred from Sequence or structural Similarity' Unpublished" (reference_no=10155).
  • If no non-IEA/ISS/NAS annotations exist for these top hits, you may use the sequence record in the with column, but in this case try to find a reference that provides evidence for the process/function/component (need to import PMID first, then make this annotation).
  • If you have a good annotation for a function and you can logically and confidently infer that the gene product participates in a process or localizes to a cellular component based on other annotations, use the IC evidence code (for example, a protein annotated with function "DNA binding" can be annotated with IC component "nucleus").
  • Alternatively, if you have good hits with InterProScan or ProSite, you may ISS with those records that have GO annotations. (See also InterPro2go and EC2go mappings.)
  • ISS may be done with molecular_function; however, biological_process and cellular_component terms must be used carefully. Very general process terms may be used, and component terms should be discussed.
  • As with curation of previously identified genes, annotate all second generation genes to all three ontologies (function, process, component). If there is nothing to ISS in one or more ontologies, and no IEAs exist, annotate with "unknown."
  • biological_process unknown ; GO:0000004
  • molecular_function unknown ; GO:0005554
  • cellular_component unknown ; GO:0008372
  • ISS only with non-IEAs and non-ISS.
  • Don’t ISS with ISS; this dilutes the meaning of annotations (not to mention the level of confidence for ISS annotations from other databases).
  • You can "mix and match" -- all GO annotations don’t need to be from the same reference (i.e., the process can come from one and function from another).
  • Not all GO annotations from the "with" record need to be ISSed to the Dicty gene; some annotations contain extraneous information that is not relevant.
  • See also the GO list e-mail exchanges (April 5, 2004 and November 29, 2002).
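
The "ISS only with non-IEAs and non-ISS" rule above can be sketched as a filter over the annotations of a "with" record. The function name `transferable` and the (GO ID, evidence code) pair layout are assumptions made for this example. The default exclusion set follows that rule; NAS can be added through the `excluded` parameter, in line with the "non-IEA/ISS/NAS" wording earlier in this section.

```python
def transferable(annotations, excluded=("IEA", "ISS")):
    """Keep only annotations that may serve as the basis for an ISS.

    `annotations` is a hypothetical list of (go_id, evidence_code)
    pairs taken from the "with" record.
    """
    blocked = set(excluded)
    return [(go, code) for go, code in annotations if code not in blocked]


hits = [
    ("GO:0003677", "IDA"),
    ("GO:0005634", "IEA"),
    ("GO:0006355", "ISS"),
    ("GO:0005515", "IPI"),
]
usable = transferable(hits)
```

The surviving annotations still need a curator's judgement, since not every annotation on the "with" record is relevant to the Dicty gene.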

dictyBase internal GO references

  • ND: dictyBase 'No biological Data' Unpublished (reference_no=9851)
  • ISS: dictyBase 'Inferred from Sequence or structural Similarity' Unpublished (reference_no=10155)
  • IC: dictyBase 'Inferred by Curator' Unpublished (reference_no=11067)
  • NAS: (unpublished information from authors) dictyBase (2005) 'Personal communication to dictyBase' Unpublished (reference_no=11050; note this reference changes each calendar year)

GO issues in need of further discussion

  • Use of 'cytosol' vs. 'cytoplasm.' Cytosol is part of cytoplasm.
  • GO:0005737 cytoplasm: All of the contents of a cell excluding the plasma membrane and nucleus, but including other subcellular structures.
  • GO:0005829 cytosol: That part of the cytoplasm that does not contain membranous or particulate subcellular components.
  • Annotation of controls/markers in experiments (PMID: 15800059). If a result for the investigated gene(s) relies on an experiment, also shown, with a known gene product, the known gene product serves as a control for interpreting the results for the gene product(s) in question. These ‘control’ genes should not be annotated. As an exception, when the known gene does not have any experimental annotation for that term, the ‘control’ experiment can be used to add that annotation (in order to add high quality annotations to more genes efficiently).
  • Use of IGI: what constitutes a "genetic interaction?" Obviously a double mutant is a genetic interaction, but what about overexpression of one protein in a mutant background of another protein? This is also an issue with the literature topics: Genetic Interactions vs. Mutant/Phenotypes.
  • 'Colocalizes_with' is still a tricky issue. Our general consensus is that if something is shown truly transiently, we should use that qualifier. If something is present in a particular location throughout the course of the experiments in a paper, do not use the qualifier. What about "fuzzy" annotations?
  • Use of two different evidence codes for the same term in one reference is a common practice. What about parent/child terms using the same reference and the same evidence code?
  • IMP vs. IDA (issue #0054).
  • ND and IEA in same GO aspect (issue #0063). Similar issue: unknown and IEA (issue #0055).

Gene Ontology Resources
