Pseudogene Annotation Guidelines

From DictyWiki

Curators will conform to these guidelines by October 1, 2006. See the List of pseudogenes for genes to fix.

Identifying Pseudogenes

Description
The dictyBase curators have established guidelines for the identification of pseudogenes in the Dictyostelium genome. In general, pseudogenes are genes that appear to be truncated, as judged by sequence similarity or GC content. In addition, a pseudogene must always be very similar to another gene in the genome, though not necessarily identical, because it is no longer subject to selective pressure.


Annotate your gene as a pseudogene if it meets at least one of the first three criteria below and also the final similarity requirement:
  • An apparent frameshift exists in the sequence, causing a stop in what appears to be coding sequence.
  • An early stop codon exists in the sequence, causing a stop in what appears to be coding sequence.
  • A start codon is absent from what appears to be coding sequence. (These cases might be harder to detect, as 5' introns are easier to miss.)
  • In addition, the gene MUST be similar to another gene or a family of genes in the Dictyostelium genome.
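The start-codon and early-stop criteria can be partly pre-screened from the sequence itself. The following is a minimal sketch, assuming the predicted coding sequence is available as a plain DNA string read in frame 0 and that the standard stop codons apply. The function name `screen_cds` and the keys it returns are hypothetical and are not part of any dictyBase tool. It does not detect frameshifts, which require comparison with a similar gene.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def screen_cds(cds):
    """Flag a missing start codon and the first premature in-frame stop.

    `cds` is assumed to be the predicted coding sequence in frame 0.
    """
    seq = cds.upper()
    # Split into complete codons; trailing 1-2 leftover bases are ignored.
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    has_start = bool(codons) and codons[0] == "ATG"

    premature_stop = None
    # Skip the final codon: a stop there is the normal end of the CDS.
    for index, codon in enumerate(codons[:-1]):
        if codon in STOP_CODONS:
            premature_stop = index * 3  # 0-based nucleotide offset
            break

    return {
        "has_start": has_start,
        "premature_stop": premature_stop,
        "length_multiple_of_3": len(seq) % 3 == 0,
    }
```

A hit from a screen like this is only a prompt for manual inspection; the similarity requirement and the curator's judgment still apply.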
Other notes
  • The above list is intended to serve as general guidelines for pseudogene identification. The decision to designate a gene as a 'pseudogene' is a difficult one and is ultimately up to curator discretion.
  • Be conservative with the pseudogene designation. Pseudogenes may be excluded from certain analyses.
  • If the sequence cannot be verified, require at least 2 mutations before creating a pseudogene (see lip3 example below).
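The criteria and notes above can be sketched as a single decision rule. This is an illustration only, assuming that each frameshift, each premature stop, and a missing start codon count as one mutation; the `Evidence` class and `is_pseudogene_candidate` function are hypothetical names. The result is a candidate flag, since the final designation remains at curator discretion.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    frameshifts: int = 0
    premature_stops: int = 0
    missing_start: bool = False
    similar_to_dicty_gene: bool = False  # similarity to a gene or gene family
    sequence_verified: bool = False      # sequence support exists for the mutations


def is_pseudogene_candidate(ev):
    """Return True if the evidence satisfies the guideline criteria."""
    # Assumption: every disabling event counts as one mutation.
    mutations = ev.frameshifts + ev.premature_stops + int(ev.missing_start)
    if not ev.similar_to_dicty_gene:
        return False  # similarity to another Dicty gene is mandatory
    if mutations == 0:
        return False  # no disabling mutation observed
    if not ev.sequence_verified and mutations < 2:
        return False  # unverified sequence needs at least 2 mutations
    return True
```

Under these assumptions, a gene with one premature stop and no sequence support, as described for lip3 below, is not flagged.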

Annotation of Pseudogenes

The dictyBase curators have established the following guidelines for the annotation of pseudogenes in the Dictyostelium genome.

Gene name

  • All pseudogenes are named with the suffix "_ps"
  • The prefix should be the first three letters of the gene family name.
  • If a pseudogene has one definitive "sister gene" in the genome, the pseudogene may have the same name as the sister gene followed by "_ps" (this should be discussed on a case-by-case basis)
  • If there is more than one pseudogene per "sister gene", number the pseudogenes (see sslA example below).
  • If a pseudogene belongs to an unnamed family, add "_ps" to its gene ID (DDB_Gxxxxxxx).
  • Published pseudogene names that do not conform to this nomenclature are stored as synonyms.

Examples:

  • Discoidin family: pseudogenes dsc_ps1 and dsc_ps2 have close sequence similarity to all discoidin genes: dscA, dscC, dscD, and dscE
  • FNIP repeat-containing protein kinase family: fnkA, fnkB, fnkC, fnkD, fnkE are genes; fnkF_ps and fnkG_ps are pseudogenes
  • Actin: act1 through act29 are genes; act30_ps is a pseudogene
  • Two pseudogenes of one "sister gene" (sslA): sslA_ps1 and sslA_ps2
  • Uncharacterized gene family: DDB_G0274181 is a gene; the neighboring DDB_G0274179_ps is a pseudogene (named by adding _ps to its sequence ID)
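The naming rules and examples above can be sketched as a small helper. This is a minimal illustration, assuming the curator has already chosen the base (family prefix, sister gene name, or gene ID) and whether a number is needed; `pseudogene_name` and `is_pseudogene_name` are hypothetical names, not dictyBase functions.

```python
import re


def pseudogene_name(base, number=None):
    """Build a pseudogene name from a base and an optional number.

    `base` is a family prefix (e.g. "dsc"), a sister gene name
    (e.g. "sslA"), or a gene ID (e.g. "DDB_G0274179").
    """
    name = f"{base}_ps"
    if number is not None:
        if number < 1:
            raise ValueError("pseudogene numbers start at 1")
        name += str(number)
    return name


def is_pseudogene_name(name):
    """Check that a name ends in "_ps", optionally followed by a number."""
    return re.fullmatch(r".+_ps\d*", name) is not None
```

For example, `pseudogene_name("dsc", 1)` gives `dsc_ps1` and `pseudogene_name("DDB_G0274179")` gives `DDB_G0274179_ps`, matching the examples in the list.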

Examples for NOT creating a pseudogene:

  • DDB_G0268964 (lip3) has one premature stop but no sequence support. An artificial gap was introduced and the gene should be re-inspected when sequence support becomes available.

Synonym

Synonyms are allowed for pseudogenes; however, DO NOT enter them in the Protein Synonym field.


Gene product

Gene product = pseudogene


Description

Write "putative pseudogene" at the beginning of the description to serve as a warning to the user, followed by gene family information.


Name Description

If the gene name corresponds to that of a gene family (such as act*), add the name description for that family and include "pseudogene". Note from Petra: I have noticed that I never add a name description for pseudogenes, since everything is explained in the description. What do other curators do?

Curated Model

  • A Curated Pseudogene is created for the gene. When you go to the feature page of the gene prediction there is a special button to create a pseudogene.
  • You come to a page where you can add support as deemed appropriate (EST, sequence similarity, personal communication, etc.). Also check off 'Incomplete support.'
  • NOTE that there should be some sequence similarity with at least one other Dicty gene.
  • Enter the coordinates of the pseudogene. The pseudogene has just a start and a stop (no introns/exons), so the boundaries of the gene have to be determined first.
  • Click 'commit' and you are done. In GBrowse the pseudogene appears as a light grey continuous bar.
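Because the curated pseudogene is entered as a single span, one way to get a first estimate of its coordinates is to collapse the exons of the existing gene prediction. The following is a minimal sketch, assuming the exons are available as (start, end) pairs on one chromosome; the function name `pseudogene_span` is hypothetical. The curator still needs to confirm the boundaries, for example by comparison with the similar gene.

```python
def pseudogene_span(exons):
    """Collapse exon coordinates into one (start, end) span.

    Each exon is a (start, end) pair; pairs given in reverse order
    (end before start) are tolerated.
    """
    if not exons:
        raise ValueError("at least one exon is required")
    lows = [min(start, end) for start, end in exons]
    highs = [max(start, end) for start, end in exons]
    return min(lows), max(highs)
```

For instance, exons at (100, 250), (300, 420) and (500, 610) collapse to the span (100, 610).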

Gene Ontology

Do not annotate pseudogenes with Gene Ontology terms. If IEAs exist, delete them.

Curation Status

Pseudogenes are marked as 'comprehensively annotated'. In the summary paragraph section, enter: <curation_status>Gene has been comprehensively annotated, 10-NOV-2007 PF</curation_status>
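The curation status line can be generated rather than typed by hand, which avoids date-format slips. This is a minimal sketch, assuming the date format is DD-MON-YYYY with an uppercase English month abbreviation and a zero-padded day (inferred from the single example above) and that the trailing letters are the curator's initials; `curation_status` is a hypothetical helper.

```python
from datetime import date

# Explicit month table so the output does not depend on the system locale.
MONTHS = ("JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC")


def curation_status(initials, when):
    """Format the curation status tag for the summary paragraph."""
    # Assumption: days are zero-padded to two digits.
    stamp = f"{when.day:02d}-{MONTHS[when.month - 1]}-{when.year}"
    return (
        "<curation_status>Gene has been comprehensively annotated, "
        f"{stamp} {initials}</curation_status>"
    )
```

Calling `curation_status("PF", date(2007, 11, 10))` reproduces the example line given above.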

Pseudogene Resources

A sequence that closely resembles a known functional gene, at another locus within a genome, that is non-functional as a consequence of (usually several) mutations that prevent either its transcription or translation (or both). In general, pseudogenes result from either reverse transcription of a transcript of their "normal" paralog (SO:0000043) (in which case the pseudogene typically lacks introns and includes a poly(A) tail) or from recombination (SO:0000044) (in which case the pseudogene is typically a tandem duplication of its "normal" paralog).


