Bob's notes for Gene Model curation
Determining the correct gene model
- Curated Model = manually curated gene model
- To determine the correct gene model
- Bob's preliminary notes on the procedure for gene model curation.
- I did not perform steps 1 and 2 in Gene Model Curation. Check whether they should have been done.
- Begin with a BLASTP search of the protein sequence at NCBI and/or UniProt and review the matches. Full-length matches between the query and subject sequences with few or no gaps provide sequence-similarity support for the gene model. Look for sequence support for the beginning and end of the protein, and examine gaps closely: gaps that occur consistently in one region of the alignments for a multi-exon protein may be the first clue that the gene model has incorrect exon/intron boundaries. Individual pairwise alignments between the Dicty protein and its matches, e.g. with an EMBOSS pairwise alignment tool such as needle or water, may also help. Because low-complexity sequence is common in Dicty genes, it may also help to adjust the BLAST parameters, specifically turning the low-complexity filter on or off, to obtain the best protein alignment.
- When there are few matches at NCBI or UniProt, run a BLASTP of the Dicty protein against the D. purpureum database to check for support. It may be necessary to search both the GeneID and JGI datasets.
- Next, open the GBrowse window to look for EST and Solexa support. First, check whether the gene models from the original sequencing center and the GeneID reprediction agree. Then compare the EST and Solexa support with the gene model diagram, checking whether they support the 5' and 3' ends and the exon/intron boundaries of the gene.
- It may be necessary to BLASTN the Dicty CDS against the Dicty ESTs, or vice versa, to verify EST support. This is an important step because ESTs can align non-specifically in GBrowse.
- It is useful to zoom out so the diagram includes all of the intergenic space between the gene of interest and its nearest flanking genes. Then, if needed, zoom in to view just the gene model and the intergenic space upstream and downstream.
- Under Reports and Analysis, select "Download the Decorated Fasta" and click the GO button. Note that it will be necessary to configure the FASTA file and, for Crick-strand genes, to select the "Flip" option. Review the decorated FASTA file, checking the start site, splice sites, and stop codon. Check the splice donors [consensus for Dicty: (C/A)AG | GT(A/G)AGT], the splice acceptors [consensus for Dicty: (T/C)NN(C/T)AG | (A/G)], where "|" marks the exon/intron boundary, and the start site [ATG; positions -3, -6, and -9 are typically A, and the upstream region is AT-rich with CG islands]. Also note that a Dicty intron should not be shorter than 60 bp.
- It may also be useful to submit the decorated FASTA sequence to the GeneID server to repredict the gene as additional confirmation of the gene model.
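The decorated FASTA checks above can be scripted as a quick first pass before reviewing the sequence by eye. This is a minimal sketch, not a dictyBase tool: it assumes the sequence has already been flipped so the gene reads 5' to 3', and that exon coordinates are supplied as hypothetical 0-based, half-open (start, end) pairs with the ATG at the start of the first exon and the stop codon at the end of the last. The name check_model is illustrative only. Deviations from consensus are reported as warnings rather than errors, since real sites do not always match the consensus.

```python
import re

# Dicty consensus sites from the notes ("|" marks the exon/intron boundary):
#   donor:    (C/A)AG | GT(A/G)AGT
#   acceptor: (T/C)NN(C/T)AG | (A/G)
DONOR_EXON_END = re.compile(r"[CA]AG$")             # last 3 nt of upstream exon
DONOR_INTRON_START = re.compile(r"^GT[AG]AGT")      # first 6 nt of intron
ACCEPTOR_INTRON_END = re.compile(r"[TC]..[CT]AG$")  # last 6 nt of intron
ACCEPTOR_EXON_START = re.compile(r"^[AG]")          # first nt of downstream exon
STOP_CODONS = {"TAA", "TAG", "TGA"}
MIN_INTRON_LEN = 60  # bp, per the notes


def check_model(genomic, exons):
    """Return a list of warnings for a gene model.

    genomic: sequence already flipped so the gene reads 5' to 3'.
    exons:   ordered (start, end) pairs, 0-based and half-open (assumed).
    """
    seq = genomic.upper()
    warnings = []

    start = exons[0][0]
    if seq[start:start + 3] != "ATG":
        warnings.append("start codon is not ATG")
    # -3, -6 and -9 are counted upstream of the A of the ATG.
    for offset in (3, 6, 9):
        pos = start - offset
        if pos >= 0 and seq[pos] != "A":
            warnings.append(f"position -{offset} is {seq[pos]}, not A")

    # Each consecutive pair of exons brackets one intron.
    for n, ((_, donor), (acceptor, _)) in enumerate(zip(exons, exons[1:]), start=1):
        intron = seq[donor:acceptor]
        if len(intron) < MIN_INTRON_LEN:
            warnings.append(
                f"intron {n} is {len(intron)} bp, shorter than {MIN_INTRON_LEN} bp")
        if not (DONOR_EXON_END.search(seq[:donor])
                and DONOR_INTRON_START.match(intron)):
            warnings.append(f"intron {n} donor deviates from consensus")
        if not (ACCEPTOR_INTRON_END.search(intron)
                and ACCEPTOR_EXON_START.match(seq[acceptor:])):
            warnings.append(f"intron {n} acceptor deviates from consensus")

    end = exons[-1][1]
    if seq[end - 3:end] not in STOP_CODONS:
        warnings.append("last exon does not end in a stop codon")
    return warnings


# Synthetic two-exon example that matches every consensus.
model = "A" * 10 + "ATGCCCAAG" + "GTAAGT" + "T" * 60 + "TTTCAG" + "GCCTAA" + "TTTT"
print(check_model(model, [(10, 19), (91, 97)]))  # prints []
```

An empty list only means the model passes these simple pattern checks; it does not replace the BLAST, EST, and Solexa evidence described above.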
Creating a Curated Model
- To create a Curated Model
- Bob's preliminary notes
- Go to Curate Gene in dictyBase Curator Central and enter the gene name.
- Scroll down to the Features section and click 'Edit' for the Gene Prediction (Source = Sequencing Center; Deleted? = N). A new window will open.
- Click 'Create dictyBase Curated Gene.'
- A new feature is created that is identical to the Gene Prediction in gene sequence and structure, and it automatically becomes the primary feature. Record the feature numbers of both the old and new features (features sometimes get lost, so it is a good idea to keep these numbers just in case).
- Click 'Curate New Feature' to add information to the Curated Model (see Feature Curation Tool for details).
- If the Sequencing Center Gene Prediction is the correct gene model, you may skip ahead to Step 8 (returning to the Gene Curation Page after the Curated Model has been created).
- If the Sequencing Center Gene Prediction is NOT the correct gene model, load the Curated Model in Apollo and make changes accordingly. (A link here to an SOP for changing a gene model in Apollo might be useful.)
- Once a satisfactory Curated Model has been created, return to the Gene Curation Page and refresh it; there should now be at least two features (Sources = Sequencing Center and dictyBase Curator).
- Select "Edit Paragraph" to describe the curation status; a new window or tab will open. For basic annotations, add the note "Basic annotations have been added to this gene, DAY-MON-YEAR Curator's initials"; for comprehensive annotations, add the note "Gene has been comprehensively annotated, DAY-MON-YEAR Curator's initials". Click the Submit button. (Question: Is there an SOP describing the difference between basic and comprehensive annotations? If so, a link should be provided here; if not, a description should be written.)
- Return to the Gene Curation Page and refresh it; the curation status note should now appear in the Paragraph section.
- Write a private Curator Note beginning with "DAY-MON-YEAR Curator's Initials//", followed by any additional information and observations beyond those captured in the standard annotation fields.
- Write public Curator Notes when applicable.
- Add the gene product name, description, gene name, name description, protein synonyms, etc., as appropriate for the basic or comprehensive annotation level (see Feature Curation Tool for details).
- Refresh the Gene Page for the gene. The gene should now have the Curated Model as its primary feature.
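To keep the curation status note and the private Curator Note prefix consistent between curators, the note text can be generated rather than typed. This is a minimal sketch that assumes DAY-MON-YEAR is rendered as a two-digit day, an uppercase three-letter English month, and a four-digit year (e.g. 05-MAR-2010); the notes do not pin down this rendering, so check it against existing notes before relying on it. The function names and the initials "RD" are hypothetical.

```python
from datetime import date

# Month abbreviations are listed explicitly because strftime("%b") is
# locale-dependent.
MONTHS = ("JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC")


def date_stamp(day: date) -> str:
    """Render a date as DD-MON-YYYY (assumed format), e.g. 05-MAR-2010."""
    return f"{day.day:02d}-{MONTHS[day.month - 1]}-{day.year}"


def status_note(level: str, day: date, initials: str) -> str:
    """Build the curation status note for the Paragraph section."""
    if level == "basic":
        text = "Basic annotations have been added to this gene"
    elif level == "comprehensive":
        text = "Gene has been comprehensively annotated"
    else:
        raise ValueError("level must be 'basic' or 'comprehensive'")
    return f"{text}, {date_stamp(day)} {initials}"


def private_note_prefix(day: date, initials: str) -> str:
    """Build the opening of a private Curator Note: DAY-MON-YEAR Initials//."""
    return f"{date_stamp(day)} {initials}//"


print(status_note("basic", date(2010, 3, 5), "RD"))
print(private_note_prefix(date(2010, 3, 5), "RD"))
```

Listing only the two levels named in the notes means a typo in the level fails loudly instead of producing a malformed note.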