Bob's notes for gene and gene product nomenclature
The dictyBase nomenclature guidelines are at: http://dictybase.org/Dicty_Info/nomenclature_guidelines.html
Should the Nomenclature Guidelines, the Procedures for Naming Genes, and this page be merged? Why is the information spread over three pages?
- Bob's preliminary additions
- Evidence gathering for assigning gene product and gene name
- The BLASTP search of the protein sequence at NCBI and/or UniProt, run to support the gene model, also provides evidence for the gene product. If NCBI finds conserved domains, they are displayed graphically at the top of the results page. Next, review the individual pairwise alignments to get a general sense of the significance of the matches. Questions to ask yourself at this point are:
- Are the matches full-length and of significant percent identity?
- Are the gene product names assigned to the matches reasonably uniform?
- Which species do the matches come from? The taxonomic range of the matches is useful for the description.
- Also run a BLASTP of the protein against all Dicty proteins to identify paralogs, which you may want to curate at the same time.
- Review the InParanoid link to identify orthologs. It is often useful to open the UniProt entries for selected orthologs (e.g., H. sapiens, S. cerevisiae) if they are available from that page.
- Also open the UniProt link to review UniProt's version of the Dicty entry, and check its gene name and gene product name.
- Review the graphical display of InterProScan results on the Protein Information tab. Output from PANTHER, Pfam, PRINTS (FingerPRINTScan), and other applications links back to the source application for easy review. Note that these matches may represent entire protein superfamilies or families, specific protein domains, or shorter protein motifs. A lot of information is compiled here; review all of it to get a sense of its significance. Pfam and TIGRFAMs HMMs are particularly useful for assigning gene product names.
- Search for subcellular localization signals and transmembrane domains using the CBS prediction services and PSORT II.
- It is sometimes useful to do a specific pairwise alignment of the Dicty protein with orthologs or close matches, especially if the match is a reviewed UniProt entry with curated sequence annotation, e.g., enzyme active sites. Comparing the Dicty protein with a reviewed protein helps identify amino acid residues that may be crucial for the protein's function, particularly for enzymes.
- BLAST the putative ortholog back against Dicty proteins to confirm that the query protein is its best match in dictyBase (a reciprocal best hit).
Need some instructions here on how best to identify and define an ortholog, as opposed to a homolog, etc.
- Search COILS or other tools for coiled-coil prediction.
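The match-quality and back-BLAST checks above can be sketched in Python. This is a minimal sketch, not a dictyBase tool: it assumes hits were exported from BLAST+ as tab-separated text with `-outfmt '6 qseqid sseqid pident qstart qend qlen evalue bitscore'`, and the 80% coverage and 30% identity cutoffs in the hypothetical `is_strong_match` are placeholders, not curation policy. A reciprocal best hit is a common heuristic for orthology, not proof of it.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    query: str
    subject: str
    pident: float    # percent identity over the aligned region (HSP)
    qstart: int
    qend: int
    qlen: int        # full length of the query protein
    evalue: float
    bitscore: float


def parse_tabular(lines):
    """Parse BLAST+ rows from -outfmt '6 qseqid sseqid pident qstart qend qlen evalue bitscore'."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        f = line.rstrip("\n").split("\t")
        hits.append(Hit(f[0], f[1], float(f[2]), int(f[3]), int(f[4]),
                        int(f[5]), float(f[6]), float(f[7])))
    return hits


def query_coverage(hit):
    """Fraction of the query protein covered by this single HSP."""
    return (hit.qend - hit.qstart + 1) / hit.qlen


def is_strong_match(hit, min_cov=0.8, min_pident=30.0):
    """Hypothetical cutoffs for 'full-length and significant percent identity'."""
    return query_coverage(hit) >= min_cov and hit.pident >= min_pident


def best_hit(hits, query):
    """Subject ID with the highest bit score for this query, or None if no hits."""
    candidates = [h for h in hits if h.query == query]
    if not candidates:
        return None
    return max(candidates, key=lambda h: h.bitscore).subject


def is_reciprocal_best_hit(forward, reverse, dicty_id, ortholog_id):
    """True if each protein is the other's top-scoring hit (forward: Dicty vs other
    species; reverse: putative ortholog BLASTed back against Dicty proteins)."""
    return (best_hit(forward, dicty_id) == ortholog_id
            and best_hit(reverse, ortholog_id) == dicty_id)


# Placeholder IDs and values, for illustration only.
forward = parse_tabular([
    "dicty_query\thuman_ortholog\t45.2\t1\t480\t500\t1e-120\t410.0",
    "dicty_query\tyeast_hit\t30.1\t50\t200\t500\t1e-10\t60.0",
])
reverse = parse_tabular([
    "human_ortholog\tdicty_query\t45.2\t5\t490\t510\t1e-120\t410.0",
    "human_ortholog\tdicty_paralog\t35.0\t5\t490\t510\t1e-60\t200.0",
])
print(is_reciprocal_best_hit(forward, reverse, "dicty_query", "human_ortholog"))  # prints True
```

Coverage here is computed per HSP, so a match split across several HSPs will look shorter than it really is; check those alignments by eye.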
- Gene name guidelines
- Dictyostelium gene names should be lowercase (with the exception noted below).
- Gene names should contain at least 3 characters and preferably no more than 7.
- Letters that designate a subunit should be uppercase, e.g., eif3D, ncapD2. However, Greek letters should be lowercase, as should letters standing for a word that describes a subunit, such as "regulatory" or "catalytic".
- If there are other Dictyostelium genes in the same family, use the same prefix followed by subsequent letters or numbers (see Procedures for Naming Genes).
Maybe incorporate text from above on this page?
- Check the nomenclature used for other species, in particular by the HUGO Gene Nomenclature Committee (HGNC). Do not use the human gene name if it refers to a mammal-specific process or a disease. If the origin of the gene name is known, include this information in the Name Description field.
- If the Dicty gene is clearly the SINGLE ortholog (used loosely, not in a strict evolutionary sense) of one family member, give the gene that member's letter/number. If it is unclear which family member it corresponds to, give the gene a letter, starting with A. (Creation of novel gene names should be discussed with the other curators and/or researchers.)
The above is better described in the Gene Nomenclature Guidelines, so maybe link to them here.
- When a gene name is not completely clear, keep the DDB_G identifier rather than assigning a name, or consult fellow curators.
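The mechanical parts of the naming rules above (minimum and preferred length, lowercase start, uppercase only for subunit letters, next free family letter starting with A) could be checked with a small script before a name is proposed. This is a hypothetical Python sketch: `check_gene_name` and `next_family_name` are illustrative names, the letters-and-digits-only restriction is an assumption, and the script cannot judge whether an uppercase letter really designates a subunit or whether a name is appropriate, so it only flags names for the curator to review.

```python
import re
import string


def check_gene_name(name):
    """Return (errors, reminders) for a proposed Dictyostelium gene name.

    Hypothetical checks based on the guidelines above; an empty error list
    does not mean the name is appropriate, only that it passes these checks.
    """
    errors, reminders = [], []
    if len(name) < 3:
        errors.append("fewer than 3 characters")
    elif len(name) > 7:
        reminders.append("more than the preferred 7 characters")
    # Assumption: names start lowercase and use only ASCII letters and digits.
    if not re.fullmatch(r"[a-z][A-Za-z0-9]*", name):
        errors.append("must start with a lowercase letter and use only letters/digits")
    # Uppercase is allowed only for subunit letters, which a script cannot verify.
    if any(c.isupper() for c in name):
        reminders.append("uppercase letter(s): confirm each designates a subunit")
    return errors, reminders


def next_family_name(prefix, existing):
    """Hypothetical helper: first unused prefix + letter, starting with A."""
    taken = set(existing)
    for letter in string.ascii_uppercase:
        candidate = prefix + letter
        if candidate not in taken:
            return candidate
    return None  # A to Z all used: discuss with other curators


print(check_gene_name("ncapD2"))
print(next_family_name("abc", ["abcA", "abcB"]))
```

For example, `check_gene_name("ncapD2")` returns no errors and one reminder about the uppercase subunit letter, and `next_family_name("abc", ["abcA", "abcB"])` returns `"abcC"`.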
Review original text and add suggestions