Gene and gene product nomenclature

From DictyWiki

Gene Name/Synonym

The dictyBase nomenclature guidelines are at: http://dictybase.org/Dicty_Info/nomenclature_guidelines.html

In general, curators perform the following steps to determine the identity of gene products:
  1. BlastP vs. SwissProt and/or nr
  2. Use the AmiGO BLAST search with protein sequence
  3. InterProScan and Pfam search for protein families and domains
  4. FingerPRINTScan for motif fingerprints in families
  5. PSORT and TargetP for cellular localization
  6. TMHMM for transmembrane domains
  7. BLAST the ISS protein against dictyBase and make sure the gene being annotated is the top hit; take other hits into account (and annotate them if appropriate)
  • Dictyostelium gene names are lower case
  • If there are other Dictyostelium genes of that family, use that name, followed by the next letter or number (see Procedure for naming genes).
  • Check nomenclature for other species, in particular the HUGO Gene Nomenclature Committee (HGNC) [1]. Do not use the human gene name if it refers to a mammalian-specific process or a disease. If the origin of the gene name is known, include this information in the Name Description field.
  • If it is clear that it is the SINGLE ortholog (used loosely, not in a strict evolutionary sense) of a family member, then give the gene that letter/number. If it is unclear which letter/number family member it corresponds to, give the gene a letter, starting with A. (Creation of novel gene names should be discussed with the other curators and/or researchers.)
  • When a gene name is not completely clear, keep the DDB_G name and refrain from naming the gene, or consult with fellow curators.
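
The family-letter rule above can be sketched in a few lines of Python. This is only an illustration: the function name `next_family_letter` and the root `fam` are hypothetical, and the sketch assumes that family members are written as a lowercase root plus a single uppercase letter. Numeric suffixes are not handled.

```python
import string


def next_family_letter(root, existing_names):
    """Return root plus the first unused letter suffix, starting with A.

    Assumption: family members are named <root><single letter>.
    Names with other suffixes (for example numbers) are ignored.
    """
    used = {
        name[len(root):].upper()
        for name in existing_names
        if name.lower().startswith(root.lower()) and len(name) == len(root) + 1
    }
    for letter in string.ascii_uppercase:
        if letter not in used:
            return root + letter
    raise ValueError(f"all single-letter suffixes are taken for {root!r}")


# Hypothetical family root "fam" with two members already named.
print(next_family_letter("fam", ["famA", "famB"]))
```

A helper like this only proposes a candidate; novel names should still be discussed with the other curators and/or researchers.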

Gene Product

See ideas for changing annotations of Gene Products.

  • Look at the UniProtKB page to find what name was given to that gene product. Even if the protein has been named by UniProtKB curators, look for other potential protein names.
  • Keeping the rules about Gene Names in mind, use the expanded gene name if there is evidence pointing to a likely homolog/ortholog/paralog, provided that the gene name relates to the gene's function and not to a mutant phenotype.
  • Use lowercase letters for gene products, except in cases where it is standard to use upper case, such as acronyms.
  • Use the "Search Gene Product" tool to reduce redundancy of the gene products.
  • When in doubt, always use the broadest classification possible (e.g., instead of 'putative CAM kinase' use 'putative serine/threonine protein kinase').
  • Include both general and specific information if available (e.g., every kinase must have "protein kinase" somewhere in its gene product, plus information on subfamily membership, if applicable; multiple gene products are encouraged if they keep the gene searchable).
  • When creating new gene products, keep the user in mind: what are they likely to search for?


gene product X

Example: ribonucleotide reductase, small subunit

  • Highest hit Dicty vs. UniProt/nr/GOst
  • Highest hit other spp. protein vs. dictyBase (reciprocal hit), all other hits insignificant
  • High level of identity in pairwise BLAST (over length of protein). Start with >35% identity over >80% of the length of the protein. If there are examples of genes we would like to annotate that fall below that, we should discuss them. Also look for conserved patterns of protein domains.
  • Genes we expect to be essential and that are present as a single copy can be annotated "gene product X" even if the 35% identity over 80% length rule is not met; for example, if there is only one RNA polymerase.
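
As a minimal sketch, the starting threshold can be written as a small check. The function and parameter names are hypothetical, and the coverage is assumed to be the aligned length divided by the full protein length. The exception for essential single-copy genes is a curator judgment and is not modelled.

```python
def meets_identity_rule(identity_pct, aligned_length, protein_length,
                        min_identity=35.0, min_coverage=80.0):
    """Check the '>35% identity over >80% of the protein length' starting rule.

    identity_pct: percent identity reported for the pairwise alignment.
    aligned_length: number of residues of the protein covered by the alignment.
    protein_length: full length of the protein.
    """
    if protein_length <= 0:
        raise ValueError("protein_length must be positive")
    coverage_pct = 100.0 * aligned_length / protein_length
    # Both comparisons are strict, matching the '>' in the guideline.
    return identity_pct > min_identity and coverage_pct > min_coverage


# Made-up numbers: 42% identity over 310 of 340 residues.
print(meets_identity_rule(42.0, 310, 340))
```

Hits that fall below the threshold are not rejected automatically; they are the cases to discuss.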

putative gene product X

Example: putative AGC protein kinase

  • High level of confidence that this protein is a member of a particular group/family/subfamily but lower level of overall identity and/or best reciprocal hit test is inconclusive.

X DOMAIN-CONTAINING PROTEIN

Example: BZIP domain-containing protein

  • Similarity in conserved functional domains only (no similarity over length of protein).

UNKNOWN

  • When the function/process of a gene product is unknown, and it does not contain any functional protein domains, use the gene product 'unknown.' Gene products such as 'hypothetical' and 'protein of unknown function' are unacceptable in our GenBank submission.
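
The four tiers above (gene product X, putative gene product X, X domain-containing protein, unknown) can be read as a decision ladder. The sketch below is one possible reading, not a dictyBase tool: the function name and its flags are hypothetical, and each flag stands for a judgment the curator has already made from the BLAST and domain searches.

```python
def choose_gene_product(candidate_name=None, strong_full_length_hit=False,
                        reciprocal_best_hit=False, family_confident=False,
                        domain_name=None):
    """Pick a gene product string following the four tiers in order."""
    # Tier 1: high identity over the protein length plus a reciprocal best hit.
    if candidate_name and strong_full_length_hit and reciprocal_best_hit:
        return candidate_name
    # Tier 2: confident family membership, but weaker or inconclusive evidence.
    if candidate_name and family_confident:
        return "putative " + candidate_name
    # Tier 3: similarity restricted to a conserved functional domain.
    if domain_name:
        return domain_name + " domain-containing protein"
    # Tier 4: no known function and no functional domain.
    return "unknown"


print(choose_gene_product("AGC protein kinase", family_confident=True))
print(choose_gene_product(domain_name="BZIP"))
```

The order of the checks matters: a stronger tier always wins over a weaker one when both would apply.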

DOMAIN-CONTAINING PROTEINS

  • Gene products are often taken from InterPro domain names, with the addition of 'domain-containing protein'. Enzyme names are also often used. Do not use parentheses in gene products, as in 'zinc-containing alcohol dehydrogenase (ADH)'. Add ADH as a protein synonym instead; the correct gene product is then 'zinc-containing alcohol dehydrogenase'.
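
For illustration, a trailing parenthetical can be split off mechanically so that it can be entered as a synonym. The helper name `split_parenthetical` is hypothetical, and the sketch only handles a single parenthetical at the end of the string.

```python
import re

# A product name followed by one trailing "(...)" group.
_TRAILING_PARENS = re.compile(r"^(?P<product>.*?)\s*\((?P<synonym>[^()]+)\)\s*$")


def split_parenthetical(gene_product):
    """Return (gene_product_without_parentheses, synonym_or_None)."""
    match = _TRAILING_PARENS.match(gene_product)
    if match is None:
        return gene_product.strip(), None
    return match.group("product").strip(), match.group("synonym").strip()


print(split_parenthetical("zinc-containing alcohol dehydrogenase (ADH)"))
```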

DOMAIN OF UNKNOWN FUNCTION

  • Use of Pfam DUF (domain of unknown function) or UPF (uncharacterized protein family) is allowed in the gene product field if no other descriptive gene product exists. Gene product can also be "DUFXX domain-containing protein"; see InterPro record for family/domain information. Examples:
    • DUF1325 family protein
    • DUF185 family protein
    • UPF0102 family protein
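
Both accepted forms can be generated from the family identifier. In this small sketch the function name is hypothetical, and the identifier is assumed to be `DUF` or `UPF` followed by digits, as in the examples above.

```python
import re

# Assumed identifier shape: DUF or UPF followed by one or more digits.
_FAMILY_ID = re.compile(r"^(DUF|UPF)\d+$")


def unknown_family_product(family_id, as_domain=False):
    """Build '<ID> family protein' or '<ID> domain-containing protein'."""
    if not _FAMILY_ID.match(family_id):
        raise ValueError(f"not a DUF/UPF identifier: {family_id!r}")
    suffix = "domain-containing protein" if as_domain else "family protein"
    return f"{family_id} {suffix}"


print(unknown_family_product("DUF1325"))
print(unknown_family_product("DUF185", as_domain=True))
```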

CONSERVED PROTEIN OF UNKNOWN FUNCTION

  • In cases where a protein is a clear ortholog/highly similar to proteins from other species but its function/process is undetermined, use the gene product 'unknown.' Use the description field to explain the sequence similarity, e.g., 'conserved hypothetical protein' or 'conserved hypothetical Dictyostelium protein.'

FAMs and TMEMs

  • FAM and TMEM names are accepted if the protein is a good ortholog of the human protein.



return to SOPs Index

Description

Descriptions can be derived from any of these sources, plus general information about the gene product’s process/function/component:
  • EC reaction descriptions
  • UniProt descriptions/functions
  • General functions from data from other organisms
  • Example: similar to human LYST and mouse Beige proteins (lvs genes)

Name Description

Use of the Name Description field:

Inclusion of the name description is mandatory, even if it is redundant with the gene product. The rationale behind this is that name descriptions are often difficult to find, and we would like to provide this information whenever possible.

Letters and numbers in the Name Description:

When the last letter/number of a gene name is important, include it in the name description. When the last letter/number is arbitrary, do not include it. Examples:

sad = Substrate ADhesion
vasP = VAsodilator Stimulated Phosphoprotein
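
In both examples the capitalised letters of the name description spell out the part of the gene name that is being explained. A rough consistency check, assuming that capitalisation convention is followed, could look like this. The function name is hypothetical, and `sadA` is used only as an assumed example of a name whose last letter is arbitrary.

```python
def name_description_covers(gene_name, name_description):
    """Report how much of the gene name the capital letters explain."""
    capitals = "".join(ch for ch in name_description if ch.isupper()).lower()
    name = gene_name.lower()
    if capitals == name:
        return "full name"   # the last letter/number is part of the description
    if capitals == name[:-1]:
        return "root only"   # the last letter/number is treated as arbitrary
    return "no match"


print(name_description_covers("vasP", "VAsodilator Stimulated Phosphoprotein"))
print(name_description_covers("sadA", "Substrate ADhesion"))
```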


Useful Links for Curation

Links to other nomenclature guidelines: http://www.uniprot.org/docs/nomlist


