Genes with Sequence Discrepancies

From DictyWiki


The following is a list of genes in which curators have found putative sequence problems, either by comparison with ESTs or because of the presence/absence of particular sequence features.

Gene name || dictyBaseID || Chromosome || Start Position || Stop Position || Strand || Notes || Curator



Sequence discrepancies with supporting evidence

Supporting evidence includes independent sequence (ESTs, GenBank records, unpublished DNA) and sequence similarity to genes in other species.


  • vps26 || DDB0205116 || 3 || 5326311 || 5328458 || -1 || ESTs show a different 3' sequence || KP

atcaattattGGTAGGTGTTacaagcgaaaaattagg

  • DDB0238065

Based on similarity to DDB0238064, the second intron should not exist; however, removing it would shift the ORF out of frame (possibly an indel near the end of the second intron?).
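The frame argument in the note above is arithmetic modulo 3: deleting an intron from a gene model makes its whole length coding, so the downstream exons stay in frame only if that length is a multiple of 3. A minimal Python sketch, assuming 1-based inclusive coordinates; the function names and the example intron lengths are hypothetical, not taken from DDB0238065.

```python
# Hypothetical helpers: does treating an intron as coding (deleting it from
# the gene model) keep the downstream exons in the original reading frame?

def frame_shift(added_length: int) -> int:
    """Frame offset (0, 1 or 2) caused by adding `added_length` bases to a CDS."""
    return added_length % 3

def intron_removal_keeps_frame(intron_start: int, intron_end: int) -> bool:
    """Coordinates are 1-based and inclusive. Removing the intron makes its
    whole length coding, so the frame survives only if that length is a
    multiple of 3."""
    return frame_shift(intron_end - intron_start + 1) == 0

if __name__ == "__main__":
    print(intron_removal_keeps_frame(1, 99))   # 99-nt intron: True
    print(intron_removal_keeps_frame(1, 100))  # 100-nt intron: False
    # A 1-nt deletion inside the 100-nt intron would leave 99 nt, which is
    # frame-neutral: one way a genomic indel could explain such a conflict.
    print(frame_shift(100 - 1))                # 0
```

Under this reasoning, an intron whose length leaves a remainder of 1 or 2 is exactly the case where a single-base genomic error is a plausible explanation.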

  • DDB0237971 || EST ddc9k17 (DDB0110048) has a 1 nt insertion relative to the genomic sequence; an artificial 2 nt intron has been created in this gene model to compensate.
  • sslA_1/sslA_2 || two good ESTs (ddc4i05, dda5d10) show there is a 1 nt insertion in the genomic sequence || PF
  • DDB0233185 || two decent ESTs show the start should be 11 nt upstream || PG
  • ucr (DDB0238608) || several ESTs show many mismatches in the 5' UTR and N-terminus of this gene, also affecting the start codon; this gene also belongs to the next category (missing start codon) || PF
  • trxD || there is only one EST, and it has several mismatches; there is also a non-consensus splice site, which could result from errors in the genomic sequence || PF
  • DDB0238630 || two genes were merged using two arbitrary, incorrect introns (introns 3 and 4) to stay in frame. This is a conserved gene, and the sequence around these introns needs to be checked || PF
  • DDB0238663 || has good ESTs that extend into the predicted intron, but only the second ATG of what is now a one-exon gene could be used. Check the sequence of the 5' end of this gene || PF
  • cpiC (DDB0252666) || GenBank mRNA (AB189920) has 5 T's, not 6 (ttatttttaa, not ttattttttaa), and PMID 16328887 shows that the phenylalanine at position 94 is the C-terminal residue || PF
  • DDB0266832 || gene merger caused by the insertion of one T in the genomic sequence (agtgaaaatataatgTttaactgaaaaag), which created a premature stop codon. The one EST and D. purpureum both confirm the merger, so the T is certainly wrong || PF
  • cdc73 (DDB0267069) || 1 nt (C) insertion in the first exon (ACTTTAAATcATGGTGCTT), supported by one EST (DDB0084207) || PF
  • DDB_G0293832 || the gene lies at the end of a contig, and about 300 nt at its 5' end are missing; the gap between contigs is longer than represented in dictyBase. Supported by ESTs SSE559 || PF
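Several entries above mark a single extra genomic base by writing it in the opposite case (DDB0266832: agtgaaaatataatgTttaactgaaaaag; cdc73: ACTTTAAATcATGGTGCTT). The Python sketch below removes that marked base and translates the snippet in all three forward frames before and after the correction, so the effect on stop codons can be inspected. It assumes the standard genetic code and that exactly one base is case-marked; because the entries do not say which frame of each snippet is the gene's frame, all three are shown. Function names are hypothetical.

```python
# Hypothetical sketch: strip a case-marked extra base and compare translations
# in the three forward frames before and after the correction.

BASES = "tcag"
# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [x + y + z for x in BASES for y in BASES for z in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))

def remove_marked_base(annotated: str) -> str:
    """Drop the single base written in the minority case and return the
    corrected sequence in lower case."""
    upper = [i for i, ch in enumerate(annotated) if ch.isupper()]
    lower = [i for i, ch in enumerate(annotated) if ch.islower()]
    marked = upper if len(upper) < len(lower) else lower
    if len(marked) != 1:
        raise ValueError("expected exactly one case-marked base")
    i = marked[0]
    return (annotated[:i] + annotated[i + 1:]).lower()

def translate(seq: str, frame: int) -> str:
    """Translate `seq` from offset `frame` (0, 1 or 2); '*' marks a stop,
    'X' any codon containing a non-ACGT character."""
    seq = seq.lower()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")
        for i in range(frame, len(seq) - 2, 3)
    )

def frames_with_stops(seq: str) -> list[int]:
    """Return the forward frames whose translation contains a stop codon."""
    return [f for f in range(3) if "*" in translate(seq, f)]

if __name__ == "__main__":
    genomic = "agtgaaaatataatgTttaactgaaaaag"  # DDB0266832 snippet from the table
    corrected = remove_marked_base(genomic)
    for frame in range(3):
        print(frame, translate(genomic, frame), translate(corrected, frame))
```

Scanning all three frames is a deliberate choice here: it avoids guessing the snippet's phase, at the cost of also reporting stops in frames the gene never uses.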

Sequence discrepancies with missing features

Features include start codon, stop codon, splice donor/acceptor.


  • JC2V2_0_00112: This is most likely a pseudogene, as it is *very* similar to the upstream gene (DDB0232137). Translating the region just upstream of exon 3 yields a protein that aligns almost perfectly with the N-terminus of DDB0232137; however, there is an in-frame TAA: KIKFINY*LLFFIIFLN. 09-15-05 kp

- This gene has ESTs, but none in the contentious region.

  • DDB0238623_ps: To make a gene model, the second intron must splice on AA rather than AG. This is either a pseudogene or has a genome sequence error. PG
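The features listed for this category can be checked mechanically against a gene model. The Python sketch below assumes a model given as 0-based, half-open exon intervals on the forward strand, the standard set of stop codons, and canonical GT...AG introns (the DDB0238623_ps entry treats AG as the expected acceptor). The function name and the toy sequences are hypothetical illustrations, not dictyBase data.

```python
# Hypothetical sketch: flag missing gene-model features (start codon, terminal
# stop, canonical GT...AG splice sites, in-frame stops) for a forward-strand
# model whose exons are 0-based, half-open intervals into `genomic`.
from typing import List, Tuple

STOPS = {"taa", "tag", "tga"}

def check_gene_model(genomic: str, exons: List[Tuple[int, int]]) -> List[str]:
    seq = genomic.lower()
    problems = []
    # Each intron (gap between consecutive exons) should start with GT
    # (donor) and end with AG (acceptor).
    for n, ((_, end), (next_start, _)) in enumerate(zip(exons, exons[1:]), start=1):
        intron = seq[end:next_start]
        if not intron.startswith("gt"):
            problems.append(f"intron {n}: donor is {intron[:2]!r}, not 'gt'")
        if not intron.endswith("ag"):
            problems.append(f"intron {n}: acceptor is {intron[-2:]!r}, not 'ag'")
    cds = "".join(seq[s:e] for s, e in exons)
    if len(cds) % 3:
        problems.append(f"CDS length {len(cds)} is not a multiple of 3")
    if not cds.startswith("atg"):
        problems.append(f"start codon is {cds[:3]!r}, not 'atg'")
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    if not codons or codons[-1] not in STOPS:
        problems.append("no terminal stop codon")
    # Any stop before the last codon truncates the protein.
    for k, codon in enumerate(codons[:-1], start=1):
        if codon in STOPS:
            problems.append(f"in-frame stop {codon!r} at codon {k}")
    return problems

if __name__ == "__main__":
    two_exons = [(0, 6), (14, 20)]
    print(check_gene_model("atgaaagtttttagccctaa", two_exons))  # canonical model
    print(check_gene_model("atgaaagtttttaaccctaa", two_exons))  # AA acceptor
    print(check_gene_model("atgtaaccctaa", [(0, 12)]))          # in-frame TAA
```

An empty list means only that these particular features are present; a flagged model can still be either a pseudogene or a genome sequence error, which is the distinction the curators have to make.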

Sequence discrepancies with no evidence

This group includes sequences that have been flagged as problematic based on curator inference.

  • DDB0204606: the first intron appears to be coding. Moreover, this gene has no obvious 'sister gene', so it is unlikely to be a pseudogene. PG
  • DDB0238423: 1 nt deletion at or around nt 36; curated as a pseudogene. PG
  • DDB0238737: has no ESTs, but sequence similarity suggests the 3' end is missing PG 11-9-2007
  • DDB0238738: ESTs show many differences, but the ESTs appear to be of poor quality PG 11-9-2007

