General Curation Guidelines
From DictyWiki
Gene Name/Synonym
Published Sequencing Center names:
If a paper refers to a gene by its Sequencing Center name (e.g., JC3V2_0_01611, BC5V2_0_01947), add the Sequencing Center name as a synonym, but ONLY if it is the only identifier given. If the dictyBaseID is also provided, do not add the Sequencing Center name as a synonym. This is not ideal, but it occurs infrequently.
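The synonym rule above can be sketched as a small decision function. This is only an illustration: the function name `synonyms_to_add` and the placeholder ID in the usage note are hypothetical and not part of any dictyBase tool.

```python
from typing import List, Optional


def synonyms_to_add(sequencing_center_name: str,
                    dictybase_id: Optional[str]) -> List[str]:
    """Return the synonyms to record for a gene cited in a paper.

    The Sequencing Center name becomes a synonym only when the paper
    gives no dictyBaseID for the gene.
    """
    if dictybase_id:
        # A dictyBaseID is available, so the Sequencing Center name is not added.
        return []
    return [sequencing_center_name]
```

For example, `synonyms_to_add("JC3V2_0_01611", None)` yields a one-item list, while passing any dictyBaseID (here a made-up placeholder such as `"DDB_G0000000"`) yields an empty list.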
Gene Product
See ideas for changing annotations of Gene Products.
Unknown:
When the function/process of a gene product is unknown, and it does not contain any functional protein domains, use the gene product 'unknown.' Gene products such as 'hypothetical' and 'protein of unknown function' are unacceptable in our GenBank submission.
Conserved domains
Gene products are often taken from InterPro domain names, with the addition of '-domain-containing protein'. Enzyme names are also often used. Do not use parentheses in gene products, as in 'zinc-containing alcohol dehydrogenase (ADH)'. Add 'ADH' as a protein synonym instead of including it in the gene product; the correct gene product is then 'zinc-containing alcohol dehydrogenase'.
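A minimal sketch of how the gene product rules in this section could be checked automatically. The helper names `split_parenthetical` and `check_gene_product` are hypothetical, and the handling of parentheses (moving the bracketed text to a synonym list) is an assumption based on the ADH example.

```python
import re
from typing import List, Tuple

# Gene products named in the guideline as unacceptable for GenBank submission.
UNACCEPTABLE = {"hypothetical", "protein of unknown function"}


def split_parenthetical(gene_product: str) -> Tuple[str, List[str]]:
    """Strip parenthesised text from a gene product and return it as synonyms."""
    synonyms = [s.strip() for s in re.findall(r"\(([^()]*)\)", gene_product)]
    cleaned = re.sub(r"\s*\([^()]*\)", "", gene_product).strip()
    return cleaned, [s for s in synonyms if s]


def check_gene_product(gene_product: str) -> List[str]:
    """Return a list of problems found in a proposed gene product."""
    problems = []
    if gene_product.strip().lower() in UNACCEPTABLE:
        problems.append("use 'unknown' instead")
    if "(" in gene_product or ")" in gene_product:
        problems.append("contains parentheses")
    return problems
```

Calling `split_parenthetical("zinc-containing alcohol dehydrogenase (ADH)")` separates the gene product from the synonym 'ADH', matching the example above.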
Domain of unknown function:
Use of Pfam DUF (domain of unknown function) or UPF (uncharacterized protein family) is allowed in the gene product field if no other descriptive gene product exists. The gene product can also be "DUFXX domain-containing protein"; see the InterPro record for family/domain information. Examples:
- DUF1325 family protein
- DUF185 family protein
- UPF0102 family protein
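The naming shapes listed above can be recognised with two regular expressions. This is a sketch only: the patterns are inferred from the examples in this section, and the function name `is_duf_upf_gene_product` is hypothetical.

```python
import re

# Patterns inferred from the examples above; real entries may vary.
FAMILY_RE = re.compile(r"^(DUF|UPF)\d+ family protein$")
DOMAIN_RE = re.compile(r"^DUF\d+ domain-containing protein$")


def is_duf_upf_gene_product(gene_product: str) -> bool:
    """True if the gene product follows one of the DUF/UPF naming shapes."""
    return bool(FAMILY_RE.match(gene_product) or DOMAIN_RE.match(gene_product))
```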
Conserved protein of unknown function:
In cases where a protein is a clear ortholog of, or highly similar to, proteins from other species but its function/process is undetermined, use the gene product 'unknown.' Use the description field to explain the sequence similarity, e.g., 'conserved hypothetical protein' or 'conserved hypothetical Dictyostelium protein.'
Description
(empty)
Name Description
Use of the Name Description field:
Inclusion of the name description is mandatory, even if it is redundant with the gene product. The rationale behind this is that name descriptions are often difficult to find, and we would like to provide this information whenever possible.
Letters and numbers in the Name Description:
When the last letter/number of a gene name is important, include it in the name description. When the last letter/number is arbitrary, do not include it. Examples:
- sad = Substrate ADhesion
- vasP = VAsodilator Stimulated Phosphoprotein
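As a rough consistency check, the emphasised letters of a name description can be compared with the gene name. The sketch below assumes the emphasised letters are written as capitals (e.g. "Substrate ADhesion"); the function name `initials_match` and the `ignore_last` flag are hypothetical.

```python
def initials_match(gene_name: str, name_description: str,
                   ignore_last: bool = False) -> bool:
    """Check that the capital letters of a name description spell the gene name.

    Set ignore_last=True when the final letter/number of the gene name is
    arbitrary and therefore left out of the name description.
    """
    target = gene_name[:-1] if ignore_last else gene_name
    capitals = "".join(ch for ch in name_description if ch.isupper())
    return capitals.lower() == target.lower()
```

With the two examples above, `initials_match("sad", "Substrate ADhesion")` and `initials_match("vasP", "VAsodilator Stimulated Phosphoprotein")` both hold, because the last letter is meaningful in each case.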
Changing the Wiki page
After changing a gene name and/or making a Curated Model, the Wiki page must be updated:
- Move the page to the new gene name.
- Edit the page to link to the new dictyBaseID and/or name.
- Genes on the Chr 2 duplication: redirect the wiki pages of genes on the second repeat to the pages of the corresponding genes on the first repeat. For an example, see rnrB_1/rnrB_2: the wiki rnrB_2 page is a redirect.
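The redirect step for duplicated genes can be sketched as follows. The `#REDIRECT [[...]]` line is standard MediaWiki redirect syntax; the helper names, and the assumptions that page titles equal gene names and that every duplicated pair follows the `_1`/`_2` pattern of rnrB_1/rnrB_2, are illustrative only.

```python
def first_repeat_name(second_repeat_gene: str) -> str:
    """Map a second-repeat gene name (ending in '_2') to its first-repeat name."""
    if not second_repeat_gene.endswith("_2"):
        raise ValueError("expected a gene name ending in '_2'")
    return second_repeat_gene[:-2] + "_1"


def redirect_wikitext(target_page: str) -> str:
    """Build the wikitext that turns a page into a redirect to target_page."""
    return f"#REDIRECT [[{target_page}]]"
```

The text returned by `redirect_wikitext(first_repeat_name("rnrB_2"))` would be saved as the entire content of the rnrB_2 page.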
Annotation issues in need of further discussion
- Curation of papers using inhibitors (issue #0041).
- Descriptions containing "similar to" (issue #0042).
- Abbreviations in the gene product and description (issue #0043).
- "Disease gene-related" literature topic (issue #0044).
- Pseudogene annotation (issue #0056).
- "Homolog" in gene product (issue #0059).
- "Unpublished" sequences for Curated Model annotation (issue #0060).
Useful Links for Curation
- Biocurator.org
- Dictyostelium Anatomy Ontology (DDANAT)
- Gene Ontology (GO) Consortium
- Generic Model Organism Database (GMOD)
- GMOD SOPs
- HUGO Gene Nomenclature Committee (HGNC)
- MetaCyc
- National Center for Biomedical Ontology
- Open Biomedical Ontologies
- PATO
