June 9 2011

From DictyWiki

Dicty Meeting 14-18 Aug

  1. Early Bird registration deadline: June 14 - Who is going?
  2. Abstract deadline: June 15. The abstract is ready; we are just waiting until the deadline because the protein-coding gene count could still change slightly.
  3. RNAseq view in GBrowse 2 is still private (needs VPN for access). We should at least make this freely viewable, so that if people ask after we present the curation tool at the meeting, we can give them the URL.
  4. SAB meeting: it is almost time to get started.
  5. Drink coasters as gifts, because we got the grant? [1] 1200, round, midweight: $320.00 plus shipping. 2500: $378.00, 5000: $495.00

GO prep for Protein2GO

  1. Discussed outstanding issues with Emily and Tony at GO meeting, and got login for curation tool.
  2. Question of annotations to papers not in PubMed: the GOA tool allows annotations with DOI identifiers. Sidd is looking into extending our table to accept DOIs. I will then fix these annotations by updating our internal references to DOIs; this solution would also work for any future annotations in this category.
  3. Release planned for the end of June, so that dictyBase can accept the file from GOA and the appending of tRNA, trxA/B and some other annotations goes smoothly.

Textpresso GO annotations

  1. PubMed Central does not seem to be a good option, as the percentage of available papers is too low (40%).
  2. Updated way to load papers: we need to load them manually into our folder, and Arun gets access so that he can regularly upload what's new from there.
  3. Petra needs to look at the categories and maybe update them with Dicty assays etc., but this will be easier when doing paper curation again soon.

Gene Curation Update

  • Fewer than 20 complicated genes with changes left to go for Bob. Almost there!
  • Abstract for meeting, thinking about slides, gene examples and numbers needed, thinking about paper to write...

Release 2-20

  • Working on obo update script.
  • TODO:
    • Polish the GAF loading script to include GAF references and DOIs for papers without a PubMed ID.
    • Modify the GAF dumping script to skip ncRNAs.
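
The two TODO items above could be sketched roughly as follows. This assumes the tab-delimited GAF 2.0 layout, in which column 6 holds the DB:Reference and column 12 the DB Object Type. The function names and the default set of skipped types are hypothetical; the actual dictyBase loading and dumping scripts are not shown in these notes.

```python
# Hypothetical sketch, not the actual dictyBase scripts.
# Assumes GAF 2.0: tab-delimited, column 6 = DB:Reference, column 12 = DB Object Type.


def pick_reference(ref_field):
    """Return the PMID from a '|'-separated reference field, else a DOI, else None."""
    refs = [r.strip() for r in ref_field.split("|") if r.strip()]
    for prefix in ("PMID:", "DOI:"):
        for ref in refs:
            if ref.upper().startswith(prefix):
                return ref
    return None


def dump_without_ncrna(lines, skip_types=frozenset({"ncRNA"})):
    """Yield GAF lines, dropping rows whose DB Object Type is in skip_types.

    Which type labels count as ncRNA is an assumption; pass a different set if needed.
    """
    for line in lines:
        if line.startswith("!"):  # header and comment lines pass through unchanged
            yield line
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) >= 12 and cols[11] in skip_types:
            continue
        yield line
```

Preferring the PMID and falling back to the DOI keeps existing PubMed-based references unchanged while still giving non-PubMed papers a usable identifier.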

Importing D.fasciculatum, P.pallidum

genbank records
  • D.fasciculatum
  • P.pallidum, P.pallidum mitochondrion, P.pallidum ribosomal
loaded scaffolds (supercontigs)
  • named after genbank record
  • added genbank dbxref
  • added description from genbank (e.g. "Polysphondylium pallidum PN500 unplaced genomic scaffold PPL_scaffold2, whole genome shotgun sequence.")
  • TODO: add reference
  • NOTE: not searchable by dbxref/name
loaded contigs (fake for mitochondrial and ribosomal)
  • named after genbank record
  • added genbank dbxref
  • TODO: add reference
  • NOTE: not searchable by dbxref/name. The mitochondrial genome does not have contigs, so an artificial one has to be created to display it in GBrowse.
loaded genes
  • added gene product (excl. "hypothetical protein")
  • TODO: add reference
loaded mRNA & polypeptide features
  • added SACGB dbxref
  • added genbank dbxref
  • added EC dbxref (mitochondrial genes)
  • added 'codon start/translation_start' prop
  • TODO: add reference
loaded tRNA, rRNA features
  • TODO: add reference
imported ESTs (ppal) [2]
  • aligned to genome 90% (4034 of 4452)
  • TODO: add reference (Gray,M.W. TBestDB [3] Polysphondylium pallidum)
created blast databases (ppal, dfas)

To discuss

Organism
  • abbreviations (3 letters for DDB/DDB_G) - DFA/DFA_G, PPA/PPA_G
  • where to store strain names (Polysphondylium pallidum PN500, Dictyostelium fasciculatum SH3), if we store them at all. Common practice is to use the species column. That column is currently used (along with the common name) to identify the organism, e.g. in URLs, which would change in one of these ways:
genomes.dictybase.org/pallidum/gene => genomes.dictybase.org/pallidum PN500/gene
genomes.dictybase.org/pallidum/gene => genomes.dictybase.org/pallidum/PN500/gene

- We don't have the D. discoideum strain anywhere! And discoideum is probably the species for which we are most likely to get another strain. [Petra]

  • the mitochondrial genome for pallidum belongs to a different strain (CK8 vs PN500 for the genome), in the same way that the dicty mitochondrial genome [4] belongs to the AX3 strain
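
To make the two URL layouts listed above for the strain name concrete, here is a small sketch. The function name and the `style` argument are hypothetical; only the host and the path shapes come from the examples in these notes. One practical difference it shows: a strain joined to the species with a space has to be percent-encoded, whereas a separate path segment does not.

```python
from urllib.parse import quote

BASE = "genomes.dictybase.org"  # host taken from the examples in the notes


def gene_url(species, strain=None, style="segment"):
    """Build a gene URL; `style` picks one of the two discussed layouts (hypothetical)."""
    if strain is None:
        return f"{BASE}/{quote(species)}/gene"
    if style == "joined":  # species and strain share one path segment
        return f"{BASE}/{quote(species + ' ' + strain)}/gene"
    if style == "segment":  # strain gets its own path segment
        return f"{BASE}/{quote(species)}/{quote(strain)}/gene"
    raise ValueError(f"unknown style: {style}")
```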
Contigs
  • not imported for D.purpureum; data available for all organisms
Genes
  • linkouts: to GenBank (via protein id); to SACGB [5] via locus tag (results in a search) [6] or via SACGB internal id (can be derived from fasta)?
D.purpureum
  • submitted to GenBank [7]
  • has assembly information (we do not have/show gaps/contigs for dpur)
Search
  • existing search is hardcoded to search discoideum or purpureum data:
    • Gene Names/Synonyms - discoideum only
    • Gene IDs - any
    • ESTs - any
    • dictyBaseDPIDs - any but name comes from SITE_NAME env variable
    • external ids - dicty or all, depending on SITE_NAME env variable
    • Gene Product - not activated on multigenome, searches dicty only.
  • search results display is not suited for cross-species search
Existing search
  • can be rewritten to use a species parameter, making search species-specific
  • would take less time but limit functionality
  • would require both sites to use the same search
New search
  • can be written to use both databases in order to make cross-species search possible
  • would require a complete rethinking of search strategies
  • would allow the main site to use the old search for the transition period
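
The species-parameter idea for search could look roughly like the sketch below. It uses a hypothetical in-memory list in place of the real database tables, and the `Gene` record and `search_genes` function are invented for illustration. Leaving the species unset gives the cross-species behaviour, which is what the current hardcoded search cannot do.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    """Hypothetical minimal gene record."""
    gene_id: str
    name: str
    species: str


def search_genes(genes, term, species=None):
    """Case-insensitive substring search on gene names.

    species=None searches all species; otherwise only the given one.
    """
    term = term.lower()
    return [
        g for g in genes
        if term in g.name.lower() and (species is None or g.species == species)
    ]
```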

Stock center strains

The list of REMI strains from Christopher Quang Dung Dinh/Adam Kuspa contains links to chromosomal locations (e.g. [8]). Chromosome 2 coordinates are already incorrect due to the shift. External data is out of our control, but we could store this data on our side by creating a new feature representing a single mutation point; this feature would have a location on the chromosome and be linked to the strain. Plan and estimate:

Write middleware for handling new feature
Figure out data model (5 days)
Figure out software interface (5 days)
Write software adaptor working with two schemas (3 weeks)
Display (gbrowse?, strain page) (2 weeks)
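
As a starting point for the data model discussion, a mutation-point feature could be sketched as below. Everything here is hypothetical: the class and field names are invented, and the shift is modelled, purely for illustration, as a fixed offset applied to every coordinate at or after some position on one chromosome. The point is only that coordinates stored on our side can be corrected when a chromosome changes, unlike external links.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class MutationPoint:
    """Hypothetical feature: one mutation point linked to a strain."""
    strain_id: str
    chromosome: str
    position: int  # coordinate on the chromosome


def apply_shift(points, chromosome, start, offset):
    """Return new points with positions >= start on `chromosome` moved by `offset`."""
    return [
        replace(p, position=p.position + offset)
        if p.chromosome == chromosome and p.position >= start
        else p
        for p in points
    ]
```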