The Illinois Mangrove Project | v 3 7/2/12 |
Field samples were processed immediately in a swamp-mobile facility under RNase-free conditions.
The core of this mangrove transcriptome database is a set of 230K sequences from Rhizophora mangle and 305K from Heritiera littoralis, obtained by 454 sequencing. The average read length was 208 bp. The database contains 25.5K contigs from R. mangle and 32K from H. littoralis. For full details of this sequencing project, see Dassanayake M, Haas JS, Bohnert HJ, Cheeseman JM (2009) Shedding light on an extremophile lifestyle through transcriptomics. New Phytologist 183: 764-775 (doi: 10.1111/j.1469-8137.2009.02913.x)
But briefly… this is what we did
Plant tissues were harvested from both the greenhouse and the field and stored in liquid nitrogen or RNAlater. R. mangle field samples were collected at Twin Cays, Belize, including leaves, roots, hypocotyl peels, young and mature propagules, and flower buds of stunted and tall individuals and P-fertilized stunted plants. The greenhouse plants originated from this field site. Greenhouse samples included young leaf buds and shoot meristems, mature buds, stipules, young leaves, mature leaves, senescing leaves, young stem, fine roots, old thickened roots, mature stem bark, and prop root tips. Greenhouse salinities ranged from 2% to 100% of full seawater. Overall, 68 different tissue types, growth conditions, and developmental stages were extracted for R. mangle. H. littoralis tissue samples were taken from young and mature leaves, roots, buds, and young stems of three-year-old saplings in the greenhouse; these originated from an estuary on the southwest coast of Sri Lanka.
RNA was isolated using the Agilent Plant RNA Isolation Mini Kit, treated with recombinant DNase I and further processed with Norgen RNA clean-up and concentration kits. Equal amounts of mRNA from different tissue types were pooled for each species. mRNA was selected using Oligotex mRNA mini Kits, and converted to cDNA using the SMART cDNA synthesis protocol.
To improve coverage and sequencing of rare transcripts, cDNAs were normalized, then nebulized and size-selected for an average size of 400-500 bp. FLX-specific adapters were ligated to the cDNA. Sequencing was done with a 454/Roche GS FLX Genome Sequencer.
After sequencing, contigs were assembled using the Phrap assembly program.
Sequence annotation was based on sequential BLAST searches. Each sequence was first searched against the NCBI non-redundant (nr) protein database limited to Arabidopsis thaliana. Sequences with no hits were then searched against the nr protein database limited to all plants, and those still without hits were searched using BLASTn, first against A. thaliana and then against all plants. In practice, the e-values for more than 90% of the annotated sequences were below 1e-10.
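The sequential search described above amounts to a cascade: try the most specific search first and fall through to the next only when nothing acceptable is found. The following is a minimal sketch of that control flow, not the pipeline actually used for MTDB. The tier labels, the `annotate` function, the toy lookup tables, and the `max_evalue` cutoff are all assumptions made for illustration; the acceptance criterion used for the database is not stated here.

```python
from typing import Callable, Iterable, Optional, Tuple

# A search function takes a sequence and returns (description, e-value),
# or None when the search finds nothing. Real BLAST calls would go here;
# this sketch only models the fall-through logic.
Hit = Tuple[str, float]
SearchFn = Callable[[str], Optional[Hit]]


def annotate(seq: str,
             tiers: Iterable[Tuple[str, SearchFn]],
             max_evalue: float = 1e-5) -> Optional[Tuple[str, str, float]]:
    """Return (tier label, description, e-value) from the first tier
    that yields a hit at or below max_evalue, or None if no tier does.

    max_evalue is a placeholder cutoff, not a value taken from the source.
    """
    for label, search in tiers:
        hit = search(seq)
        if hit is not None and hit[1] <= max_evalue:
            return (label, hit[0], hit[1])
    return None


def make_lookup(table):
    """Build a stand-in search function backed by a dictionary (toy data)."""
    return lambda seq: table.get(seq)


# Tier order follows the text: protein database restricted to A. thaliana,
# protein database restricted to plants, then BLASTn in the same two scopes.
tiers = [
    ("protein_arabidopsis", make_lookup({"ATGC": ("kinase", 1e-40)})),
    ("protein_plants", make_lookup({"GGCC": ("transporter", 1e-12)})),
    ("blastn_arabidopsis", make_lookup({})),
    ("blastn_plants", make_lookup({"TTAA": ("rRNA", 1e-3)})),
]

result = annotate("GGCC", tiers)
```

Passing the tiers as an ordered list keeps the search order explicit and makes it easy to record which tier produced each annotation.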
Gene Ontology (GO) annotations were assigned based on the similarity to A. thaliana sequences; KEGG pathway annotations were assigned based on appropriately annotated plant reference genomes in NCBI.
And for other species…
MTDB was further expanded to include 24,061 cDNA sequences from 26 additional mangrove species, downloaded from the NCBI nucleotide, protein, and EST databases. These were all the sequences available as of October 2008. To make MTDB usable for functional genomics, entries containing promoter, intron, and intergenic spacer regions, used primarily in microsatellite studies, were excluded. For species with fewer than 1000 ESTs, we manually excluded duplicate records and predetermined pseudogenes.
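The exclusion step can be illustrated with a small filter. This is only a sketch under stated assumptions: the keyword pattern, the `(id, description, sequence)` record layout, and the `filter_records` name are hypothetical, and the automatic removal of exact duplicates is a simplification of the manual curation described above.

```python
import re

# Assumed keywords marking non-coding records; the exact criteria used
# for MTDB are not given in this level of detail.
NONCODING = re.compile(r"\b(promoter|intron|intergenic spacer)\b", re.IGNORECASE)


def filter_records(records):
    """Drop records whose description mentions a non-coding region,
    then drop records whose sequence exactly duplicates an earlier one."""
    seen = set()
    kept = []
    for rec_id, description, seq in records:
        if NONCODING.search(description):
            continue  # promoter / intron / intergenic spacer entry
        key = seq.upper()
        if key in seen:
            continue  # exact duplicate of a sequence already kept
        seen.add(key)
        kept.append((rec_id, description, seq))
    return kept


# Toy records, invented for illustration only.
records = [
    ("a", "example mRNA, complete cds", "ATG"),
    ("b", "intergenic spacer region", "CCC"),
    ("c", "example mRNA, duplicate submission", "atg"),
    ("d", "gene promoter region", "GGG"),
]
kept_ids = [rec_id for rec_id, _, _ in filter_records(records)]
```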
Because many of these sequences were originally annotated with functionally uninformative names, e.g. clone numbers, or were assigned functions based on the project objective alone, e.g. "salt tolerant protein," we re-annotated all sequences as described above.
A final note: nearly fifty percent of all sequences shared no homology, within our annotation criteria, with any other sequence in GenBank and could therefore be novel mangrove sequences. This highlights how much of the sequence space in mangroves remains unexplored.
Database structure
The data model is built around five tables: sequences, queries, proteins, genes, and gene accessions. The basic structure, based on 7 indices, was designed to allow interlinked searches between tables. Entries in the "sequences" table that have a hit in GenBank databases are included in the "queries" table. Every sequence in the queries table is associated with an entry in the "proteins" table. Sequences in the queries table that match an entry in NCBI reference genomes are included in the "genes" table. Genes table entries contain GO and KEGG functional assignments whenever possible.
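To make the relationships between the tables concrete, here is a rough sketch using Python's built-in sqlite3 module. It is not the actual MTDB schema: every column name, key, and index below is an assumption, only two of the seven indices are guessed at, the role given to the gene accessions table is a guess, and the inserted rows are placeholder data.

```python
import sqlite3

# Hypothetical schema: table names follow the text, everything else is assumed.
SCHEMA = """
CREATE TABLE sequences (seq_id INTEGER PRIMARY KEY, species TEXT, residues TEXT);
CREATE TABLE proteins  (protein_id INTEGER PRIMARY KEY, description TEXT);
CREATE TABLE genes     (gene_id INTEGER PRIMARY KEY, go_terms TEXT, kegg_pathway TEXT);
CREATE TABLE queries (
    query_id   INTEGER PRIMARY KEY,
    seq_id     INTEGER NOT NULL REFERENCES sequences(seq_id),
    protein_id INTEGER NOT NULL REFERENCES proteins(protein_id),
    gene_id    INTEGER REFERENCES genes(gene_id)  -- NULL: no reference-genome match
);
CREATE TABLE gene_accessions (
    accession TEXT PRIMARY KEY,
    gene_id   INTEGER NOT NULL REFERENCES genes(gene_id)
);
CREATE INDEX idx_queries_seq  ON queries(seq_id);
CREATE INDEX idx_queries_gene ON queries(gene_id);
"""


def build_demo():
    """Create an in-memory database with a few placeholder rows."""
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    con.execute("INSERT INTO sequences VALUES (1, 'R. mangle', 'ATGC')")
    con.execute("INSERT INTO sequences VALUES (2, 'R. mangle', 'GGCC')")  # no hit
    con.execute("INSERT INTO proteins VALUES (10, 'example protein')")
    con.execute("INSERT INTO genes VALUES (100, 'GO:0006950', 'example pathway')")
    con.execute("INSERT INTO queries VALUES (1000, 1, 10, 100)")
    con.execute("INSERT INTO gene_accessions VALUES ('ACC0001', 100)")
    return con


def go_terms_for(con, seq_id):
    """Follow sequences -> queries -> genes and return the GO terms, if any."""
    row = con.execute(
        "SELECT g.go_terms FROM sequences s "
        "JOIN queries q ON q.seq_id = s.seq_id "
        "JOIN genes g ON g.gene_id = q.gene_id "
        "WHERE s.seq_id = ?",
        (seq_id,),
    ).fetchone()
    return row[0] if row else None


con = build_demo()
annotated = go_terms_for(con, 1)
unannotated = go_terms_for(con, 2)
```

In this sketch a sequence without a GenBank hit simply has no row in queries, so the join returns nothing for it, which mirrors the rule that only sequences with hits enter the queries table.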