Pseudogene Annotation Guidelines
From DictyWiki
Curators will conform to these guidelines by October 1, 2006. See the List of pseudogenes for genes to fix.
Identifying Pseudogenes
- Description
- The dictyBase curators have established guidelines for the identification of pseudogenes in the Dictyostelium genome. In general, pseudogenes are genes that appear to be truncated when judged by sequence similarity or GC content. A pseudogene must also be very similar to another gene in the genome, though not necessarily identical, since it is no longer subject to selective pressure.
- If any of the following criteria is met, the gene should be annotated as a pseudogene:
- An apparent frameshift in the sequence causes a stop in what appears to be coding sequence.
- A premature stop codon occurs in what appears to be coding sequence.
- A start codon is absent from what appears to be coding sequence. (These cases might be harder to detect, as 5' introns are easier to miss.)
- In addition, the gene MUST be similar to another gene or a family of genes in the Dictyostelium genome.
- Other notes
- The above list is intended to serve as general guidelines for pseudogene identification. The decision to designate a gene as a 'pseudogene' is a difficult one and is ultimately up to curator discretion.
- Be conservative with the pseudogene designation. Pseudogenes may be excluded from certain analyses.
- POINT TO DISCUSS: should we say we need at least 2 mutations, or one mutation AND sequence verified?
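The identification criteria above can be roughly sketched in code. This is a minimal illustration, not dictyBase tooling: the function names are hypothetical, the input is assumed to be the spliced putative coding sequence as a DNA string, and the similarity requirement is passed in as a flag because it would normally come from a separate search (for example BLAST) against the Dictyostelium genome. A sequence length that is not a multiple of three is only a crude hint of a frameshift; confirming one requires alignment to the similar gene, and the final call remains with the curator.

```python
# Hypothetical helpers illustrating the criteria; not part of dictyBase.
STOP_CODONS = {"TAA", "TAG", "TGA"}

def coding_sequence_flags(cds):
    """Return the disruption criteria that a putative coding sequence trips."""
    seq = cds.upper()
    flags = []
    if not seq.startswith("ATG"):
        flags.append("no start codon")
    # Split into in-frame codons, ignoring any trailing partial codon.
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    # A stop codon anywhere before the final codon is premature.
    if any(codon in STOP_CODONS for codon in codons[:-1]):
        flags.append("premature stop codon")
    if len(seq) % 3 != 0:
        flags.append("length not a multiple of 3 (possible frameshift)")
    return flags

def is_pseudogene_candidate(cds, similar_to_other_gene):
    """A candidate needs at least one disruption AND a similar Dictyostelium gene."""
    return bool(coding_sequence_flags(cds)) and similar_to_other_gene
```

For example, `is_pseudogene_candidate("ATGTAAAAATAA", True)` flags the in-frame TAA at the second codon, while the same sequence with `similar_to_other_gene=False` is not a candidate.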
Annotation of Pseudogenes
The dictyBase curators have established the following guidelines for the annotation of pseudogenes in the Dictyostelium genome.
Gene name
- All pseudogenes are named with the suffix "_ps"
- The prefix should be the first three letters for the gene family plus the next letter or number in the series.
- If a pseudogene has one definitive "sister gene" in the genome, the pseudogene may have the same name as the sister gene followed by "_ps" (this should be discussed on a case-by-case basis)
- If there is more than one pseudogene per "sister gene", number the pseudogenes (see sslA example below).
- If a pseudogene belongs to an unnamed family, add "_ps" to its geneDDB0xxxxxxx name.
- Published pseudogene names are stored as synonyms (RIGHT???).
Examples:
- FNIP repeat-containing protein kinase family: fnkA, fnkB, fnkC, fnkD, fnkE are genes; fnkF_ps and fnkG_ps are pseudogenes
- Actin: act1 through act29 are genes; act30_ps is a pseudogene
- Two pseudogenes of one "sister gene": sslA_ps1 and sslA_ps2
- Uncharacterized gene family: geneDDB0229955 is a gene; geneDDB0230119_ps is a pseudogene
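The naming rules and examples above can be expressed as a small helper. This is a sketch under stated assumptions: `pseudogene_name` and `is_pseudogene_name` are hypothetical names, and the caller is assumed to have already chosen the base name (the next name in the family series, the sister gene name, or the geneDDB0xxxxxxx identifier).

```python
import re

def pseudogene_name(base, index=None):
    """Build a pseudogene name from a base gene name.

    base  -- family series name (e.g. "fnkF"), sister gene name (e.g. "sslA"),
             or a geneDDB0xxxxxxx identifier for an unnamed family.
    index -- optional 1-based number, used when one sister gene has
             more than one pseudogene.
    """
    if base.endswith("_ps"):
        raise ValueError("base name already carries the _ps suffix")
    name = base + "_ps"
    if index is not None:
        if index < 1:
            raise ValueError("index must be 1 or greater")
        name += str(index)
    return name

def is_pseudogene_name(name):
    """True if the name ends in "_ps", optionally followed by a number."""
    return re.fullmatch(r".+_ps\d*", name) is not None
```

With these definitions, `pseudogene_name("fnkF")` gives `fnkF_ps` and `pseudogene_name("sslA", 2)` gives `sslA_ps2`, matching the examples listed above.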
Synonym
Synonyms are allowed for pseudogenes; however, DO NOT enter them in the Protein Synonym field.
Gene product
Gene product = pseudogene
Description
Write "putative pseudogene" at the beginning of the description to serve as a warning to the user, followed by gene family information.
Name Description
If the gene name corresponds to that of a gene family (such as act*), add the name description for that family; include "pseudogene".
Curated Model
- Create a Curated Pseudogene for the gene. The feature page of the gene prediction has a special button for creating a pseudogene.
- The button opens a page where you can add support as deemed appropriate (EST, sequence similarity, personal communication, etc.); also check off 'Incomplete support.'
- NOTE that there should be some sequence similarity with at least one other Dicty gene.
- Enter the coordinates of the pseudogene. The pseudogene has just a start and a stop (no introns/exons), so the boundaries of the gene have to be determined first.
- Click 'commit' and you are done. In GBrowse the pseudogene appears as a light grey continuous bar.
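Because a curated pseudogene is stored as a single start and stop, one possible starting point for the boundaries is the outermost extent of the exons in the existing gene prediction. The sketch below assumes exon coordinates are available as (start, end) pairs; the function name is hypothetical, and the resulting span may still need adjusting by hand after comparison with the similar gene.

```python
def collapse_to_span(exons):
    """Collapse exon (start, end) pairs into one (start, end) span.

    Each pair may be given in either order, so reverse-strand
    features listed as (higher, lower) are handled too.
    """
    if not exons:
        raise ValueError("at least one exon is required")
    start = min(min(a, b) for a, b in exons)
    end = max(max(a, b) for a, b in exons)
    return start, end
```

For a prediction with exons at (100, 250), (300, 420) and (500, 610), this returns the span (100, 610).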
Gene Ontology
Do not annotate pseudogenes with Gene Ontology terms. If IEAs exist, delete them.
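The field-level rules in this section lend themselves to a simple consistency check. The sketch below is illustrative only: the record is assumed to be a plain dictionary, and the keys `name`, `gene_product`, `description`, `protein_synonyms` and `go_terms` are hypothetical stand-ins, not actual dictyBase field names.

```python
def check_pseudogene_annotation(record):
    """Return a list of guideline violations for one annotation record."""
    problems = []
    # Names end in "_ps", optionally followed by a number (e.g. sslA_ps1).
    name = record.get("name", "")
    if not name.rstrip("0123456789").endswith("_ps"):
        problems.append("gene name must carry the _ps suffix")
    if record.get("gene_product") != "pseudogene":
        problems.append("gene product must be 'pseudogene'")
    if not record.get("description", "").lower().startswith("putative pseudogene"):
        problems.append("description must begin with 'putative pseudogene'")
    if record.get("protein_synonyms"):
        problems.append("synonyms must not be in the Protein Synonym field")
    if record.get("go_terms"):
        problems.append("GO annotations (including IEAs) must be removed")
    return problems
```

An empty list means the record passes these checks; it does not replace curator judgment on whether the gene is a pseudogene in the first place.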
Pseudogene Resources
- Pseudogene.org
- Pseudogene definition at Wikipedia
- Junk DNA definition at Wikipedia
- Hirotsune et al. (2003)
- Sequence Ontology (SO) definition of pseudogene (SO:0000336):
- A sequence that closely resembles a known functional gene, at another locus within a genome, that is non-functional as a consequence of (usually several) mutations that prevent either its transcription or translation (or both). In general, pseudogenes result from either reverse transcription of a transcript of their "normal" paralog (SO:0000043) (in which case the pseudogene typically lacks introns and includes a poly(A) tail) or from recombination (SO:0000044) (in which case the pseudogene is typically a tandem duplication of its "normal" paralog).
