OCT 11 2010

From DictyWiki



Comments on MAKER

  • Quest would be suitable for this job [Talk to Dong about it]
    • However, we need to do a test run on avery (also suggested by Dong)
  • FYI: the cross_match software and the RepeatMasker database have to be obtained after e-mail registration (which takes at least a week)
  • Overall: including setup, an estimated 7 days for the first batch of runs

Next release 2.19 (Dec. 17th)

Migrating reference tables and GO to Chado
  • Migrate the batch reference update script. [Done]

[Working on]

  • Migrate the dicty reference page to new code and test it.
  • Migrate the reference search.
Interpro domains for all dicty genes (Yulia)
We store both domains that are integrated into InterPro and those that are unintegrated (i.e. TMHMM and SignalP domains are not integrated). Some signatures reported by InterProScan are marked as unintegrated but still carry valuable information that should be stored and made searchable.
Problem encountered: signature names seem to change frequently. Waiting for a response from InterPro with an explanation (the expected behavior is stable data between InterPro releases).
Automate curation stats -Excel (Yulia)
Our links from the genome browser don't work anymore (at least for me - Petra), e.g. [1]. It would also be nice to link to EnsemblProtists from our gene pages. Example link: [2]

Working link to EnsemblProtists for GBrowse: http://protists.ensembl.org/Dictyostelium_discoideum/Location/View?db=core;r=1:1000-30000
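
A minimal sketch of how such a location link could be built from GBrowse coordinates. The function name and the validation are illustrative assumptions, not existing dictyBase code; only the URL pattern is taken from the working link above.

```python
# Base URL copied from the working EnsemblProtists link in these notes.
ENSEMBL_PROTISTS_BASE = (
    "http://protists.ensembl.org/Dictyostelium_discoideum/Location/View"
)


def ensembl_location_url(chromosome, start, end):
    """Build an EnsemblProtists location URL (hypothetical helper).

    chromosome: chromosome name as used by Ensembl, e.g. 1
    start, end: 1-based coordinates with start <= end
    """
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    # The region parameter has the form <chromosome>:<start>-<end>.
    return f"{ENSEMBL_PROTISTS_BASE}?db=core;r={chromosome}:{start}-{end}"


print(ensembl_location_url(1, 1000, 30000))
```

Called with chromosome 1 and the range 1000-30000, this reproduces the working link quoted above.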

Fix Pseudogene - Gene association bug (Yulia) [Done]
Pseudogene features cannot be associated with another gene, which needs fixing. See issue [3] [Petra]
Link outs to GenBank (Yulia) [Done]

The GenBank links on the gene pages (see, for example, the sadA gene) point to our own version of the sequence (for example sadA AY178767). This is misleading and unnecessary. We should link to GenBank directly and remove the tab for the GenBank sequence.

  • Do we keep GenBank sequences in BLAST options?
Stock Center User requests (Yulia) [Done]
  • Add shopping cart to each strain and plasmid page.
    • SC search pages
    • strain catalog
    • strain page
  • Make shopping cart 'sticky' so it remembers what's in it. Action: extended cookie lifespan to 3 months (was set to expire at session end)
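
For illustration only, a sketch of the difference between a session cookie and one with a 3-month lifespan, using Python's standard library. The cookie name `cart_id`, the 90-day approximation of 3 months, and the helper itself are assumptions; the actual Stock Center code is not shown in these notes.

```python
from http.cookies import SimpleCookie

# Assumption: "3 months" approximated as 90 days, expressed in seconds.
THREE_MONTHS_SECONDS = 90 * 24 * 60 * 60


def cart_cookie_header(cart_id, max_age=THREE_MONTHS_SECONDS):
    """Return a Set-Cookie value for a persistent shopping cart cookie.

    Without Max-Age (or Expires) the browser drops the cookie when the
    session ends, which was the old behavior described above.
    """
    cookie = SimpleCookie()
    cookie["cart_id"] = cart_id            # hypothetical cookie name
    cookie["cart_id"]["max-age"] = max_age
    cookie["cart_id"]["path"] = "/"
    return cookie["cart_id"].OutputString()


print(cart_cookie_header("abc123"))
```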

Gene Model curation update

We currently (Monday, 10:45 AM) have 9716 curated models. When we got Gareth's data, Sidd created lists of genes with changes (1060) and genes without changes (about 2600). We took those big lists and are focusing on the genes without changes, regardless of former categorization, because categorizing them by support would again take Sidd's time. However, there are many very complicated genes and gene families, so I started an 'uncuratable' list (it currently has 7 genes, of which two are curated but clearly wrong, and I'm considering deleting them). While dealing with these complicated genes I also discover obviously wrong curated models [Petra].
Yulia created a private BLAST database of Gareth's sequences on test. It should be put on prod.

  • Petra prelim. stats: 49 inspected: 29 curated, of which 6 were changed; 2 moved to list with changes; 8 skipped (some after short inspection only); 6 put on uncuratable list; 4 deleted.
  • Bob prelim. stats: 146 inspected: 70 curated of which 12 were changed; 76 skipped. Of the 76 skipped, approx. 30 may be genes that should be deleted; 5-10 are genes that have support but no ORF, so may belong on uncuratable list.
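
As a quick consistency check, the outcome counts above can be summed against the number of genes inspected. This is a small sketch; the dictionary keys are shorthand labels chosen here, and the numbers are the ones reported in the two bullets.

```python
def tally(inspected, outcomes):
    """Return (total, matches): the summed outcomes and whether they
    account for every inspected gene."""
    total = sum(outcomes.values())
    return total, total == inspected


# Counts taken from the preliminary stats above.
petra = {
    "curated": 29,
    "moved to list with changes": 2,
    "skipped": 8,
    "uncuratable": 6,
    "deleted": 4,
}
bob = {"curated": 70, "skipped": 76}

for name, inspected, outcomes in (("Petra", 49, petra), ("Bob", 146, bob)):
    total, ok = tally(inspected, outcomes)
    share = outcomes["curated"] / inspected
    status = "ok" if ok else "mismatch"
    print(f"{name}: {total}/{inspected} accounted for ({status}), "
          f"{share:.0%} curated")
```

Both tallies add up to the inspected totals (49 and 146), so no gene is unaccounted for in either list.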

Genes/Gene families that do have support but no ORF

  • First gene family (on uncuratable list now):
    • DDB_G0269068 - short intron wrong; very similar to DDB_G0275481, somewhat similar to DDB_G0290811, DDB_G0268984
    • DDB_G0268984 - looks like fusion partner of DDB_G0269080, lots of good ESTs but no ORF
    • DDB_G0269080 - fusion partner of DDB_G0268984, lots of good ESTs but no ORF
    • DDB_G0290811 - similar to DDB_G0268984
    • DDB_G0275481 - from C2 and skipped by Pascale, very similar to DDB_G0269068, DDB_G0290811, DDB_G0268984
    • DDB_G0275905 - from C2, very bad model, very similar to DDB_G0269080
    • DDB_G0295759 - this is a very badly curated gene - needs fix - similar to DDB_G0269080, DDB_G0275905
  • Second gene family:
    • DDB_G0282331 - very short, seems definitely pseudogene fragment
    • DDB_G0282329 - has intron that doesn't look like intron but checked with Gareth Blast and sequence (stop) confirmed.
    • DDB_G0282333 - already curated, but wrong, there is a downstream geneID prediction that is part of this gene. But stop confirmed by Gareth seq.

Deleted Genes (or that should be deleted)

In general, these are small genes (<70 aa) that do not get repredicted, or at least not with the same model, and are not similar to anything. Also, the gene structure is generally weird (no good start/stop and/or splice sites and intron sequence); or, when bits are next to RTEs, they often belong to those, which we don't curate. Recently deleted [Petra] (there will be many more):

  • DDB0190017 |Protein| on chromosome: 1 position 2062288 to 2062491
  • DDB0190018 |Protein| on chromosome: 1 position 2063026 to 2063160
  • DDB0187248|DDB_G0287045|Protein| gene: DDB_G0287045 on chromosome: 4 position 5226996 to 5227253
  • DDB0184170|DDB_G0292026|Protein| gene: on chromosome: 6 position 1043977 to 1044119
  • DDB0206119|DDB_G0280681|Protein| gene: on chromosome: 3 position 3626196 to 3626422
  • DDB0215678|DDB_G0283491|Protein| gene: on chromosome: 4 position 732168 to 732343

  • Curators should curate well-supported genes from the 'no changes' list as quickly as possible. Genes that need edits or mergers should be curated only if they can be fixed quickly and easily.
  • When curators inspect gene models that they do not curate, they should make notes describing the observed problems.
  • Yulia will add a button to the curation tool that we can select when we do not make a curated model. The note will go into a public note below the GBrowse image and will read: "This gene has been inspected by a curator, but there is inadequate support to make a curated model. 11-OCT-2010 PF" [Date/Initials]
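
A sketch of how the public note text could be generated from a date and curator initials. The function and constant names are hypothetical and not part of the curation tool; the wording and the DD-MON-YYYY stamp follow the example in the last bullet.

```python
from datetime import date

# Month abbreviations spelled out so the stamp does not depend on locale.
MONTHS = ("JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC")

NOTE_TEXT = ("This gene has been inspected by a curator, but there is "
             "inadequate support to make a curated model.")


def inspection_note(initials, when):
    """Return the public note followed by a DD-MON-YYYY stamp and initials."""
    stamp = f"{when.day:02d}-{MONTHS[when.month - 1]}-{when.year}"
    return f"{NOTE_TEXT} {stamp} {initials}"


print(inspection_note("PF", date(2010, 10, 11)))
```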

Separating floatigs

Do we still have plans to do that? Maybe independent of that, I found a short contig in the 2F concatemer (DDB0215024) that is for sure a duplicate of an integrated chr 2 region. There were 4 genes on the contig, of which two were tiny; I deleted those: DDB_G0294182 (7 aa) and DDB_G0294184 (21 aa - conserved piece occurring in 7 genes on chr 2, 3 and 5). Then there are two genes that are predicted exactly as on chr 2: DDB_G0271418/DDB_G0294188 and DDB_G0271416/DDB_G0294186. I blasted the genomic sequence and there are 7 mismatches, mostly at the ends of the small 2F contig, but I definitely think these are redundant. [Petra]


IntAct - UniProt export files

On 9/28/10 4:50 PM, Samuel Kerrien wrote:

Brilliant, do get in touch again if we can help further.


On 28 September 2010 15:09, Pascale Gaudet <pgaudet@northwestern.edu> wrote:

Dear Sam,
I can't find a reply to this email - but I wanted to let you know we have discussed 
the UniProt export files and we can use them to integrate the IntAct data into dictyBase.
Best regards,


Recently DONE

  • Harry MacWilliams: custom FASTA containing coding and pseudogene sequences with flanking regions [Yulia]
  • RNASeq internal blast database [Yulia]
  • AX4 sequence variations from Gareth - Generate list of not yet curated genes having sequence variations [Sidd] [Done]
  • Attribution of annotations
  • User_Annotation link is now on the front page, and people who commented were very positive. The idea is that people add themselves to the wiki. [Petra - DONE]

Action: Warren is going to look at the submission form (not sure what that was from?)

  • Stock Center: Distributing HL5 and charging for strains is off the book for now.