Introduction
SmashCell is a software framework that automates the major steps in the analysis
of microbial genome sequences: assembly, gene prediction and functional annotation. It
is designed to facilitate parameter and algorithm exploration at each of these steps and
generates graphs to compare the results of each combination. SmashCell was developed
to analyse single-cell amplified genomes, but is also suitable for ordinary microbial genomes
and low-complexity metagenomes. For more complex metagenomes you should check out
SmashCommunity. You can get a better idea of what SmashCell is by reading the
introduction, and of what it does by reading the tutorial.
Documentation
The Manual is divided into the following sections:
- Introduction
- An overview of SmashCell, describing the data model and some of the conventions used in SmashCell.
It also contains detailed installation instructions. You should read this first.
- Tutorial
- A series of worked examples that demonstrate the features of SmashCell.
- Framework Components
- Documentation on some of the code used in SmashCell, including automatically generated schema graphs for the databases used by SmashCell.
The documentation is designed to be viewed in a browser, but you can also get a
PDF version from the downloads page. This documentation is for version 0.1.0 of
SmashCell; you can view the development version here.
Getting SmashCell
SmashCell can be downloaded for local installation and is also
available as a virtual machine. Please see the installation section for details.
Using SmashCell
From the command line
The following command calculates the 4-mer frequencies in the contigs of an assembled genome, carries
out PCA on the resulting matrix and saves the results as a graph:
(SmashCell)[user@localhost:~/smash_tutorial]> ass_composition_analysis.py \
--plot_components 1:2 \
-R calculate_nmer_data \
-R calculate_nmer_pca \
--smashdb_url "sqlite:///./smash_pipeline/test.smashdb" \
--label_observations_a 10 \
--marker_size 5 \
--marker_linewidth 0.3 \
--ass_accessions mgB_tiny_nblr \
--overwrite_data \
--nmer_len 4 \
--formats pdf \
--formats png \
--plot_variables
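To make the underlying computation concrete, here is a minimal standalone sketch of the same idea: count 4-mer frequencies per contig, then run PCA on the resulting matrix. It is not SmashCell's implementation and does not use the SmashCell API. The function names (kmer_frequencies, pca) and the toy contig sequences are hypothetical, and PCA is done here with a plain NumPy SVD on the mean-centred matrix.

```python
from itertools import product

import numpy as np


def kmer_frequencies(seq, k=4):
    """Return the relative frequency of every DNA k-mer in seq, in a fixed order."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = np.zeros(len(kmers))
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        # windows containing characters other than A, C, G, T are skipped
        j = index.get(seq[i:i + k])
        if j is not None:
            counts[j] += 1
    total = counts.sum()
    return counts / total if total else counts


def pca(matrix, n_components=2):
    """Project the rows of matrix onto its first principal components via SVD."""
    centered = matrix - matrix.mean(axis=0)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = (s ** 2) / (s ** 2).sum()
    return scores, explained[:n_components]


# Hypothetical toy contigs for illustration only (not SmashCell data)
contigs = {
    "contig_1": "ATATATATATATATATATATATAT",
    "contig_2": "ATATATTTATATATAAATATATAT",
    "contig_3": "GCGCGCGCGGCGCGCCGCGCGCGC",
    "contig_4": "GCGGCGCGCGCGCCCGCGCGGCGC",
}
names = sorted(contigs)

# one row per contig, one column per 4-mer (4**4 = 256 columns)
freqs = np.array([kmer_frequencies(contigs[n], k=4) for n in names])
scores, explained = pca(freqs, n_components=2)

for name, (pc1, pc2) in zip(names, scores):
    print("%s  PC1=%+.3f  PC2=%+.3f" % (name, pc1, pc2))
```

In this toy data the AT-rich and GC-rich contigs fall on opposite sides of the first component, which is the kind of compositional separation the plot of components 1 and 2 is meant to reveal.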
Using the python library
import numpy
from Smash.Databases.SmashDB.DB import SmashDB

# connect to the sqlite3 pipeline database
smashdb = SmashDB(db_url='smash_pipeline/test.smashdb')

# NOTE: `mc` is the MetaGenomeCollection being analysed; it must be
# retrieved from the database before this loop (that step is not shown here)

# iterate over the GenePredictions for this MetaGenomeCollection
# and calculate the mean gene length
for mg_acc, mg in mc.metagenomes.items():
    for ass_acc, ass in mg.assemblies.items():
        for gp_acc, gp in ass.gene_predictions.items():
            # connect to the GenomeDB associated with this GenePrediction
            gdb = gp.genomedb
            gene_lengths = []
            for gene in gdb.gene_iter():
                gene_lengths.append(len(gene))
            num_genes = len(gene_lengths)
            mean_length = numpy.array(gene_lengths).mean()
            # a single parenthesised argument works as a print statement
            # (Python 2) and as a print function call (Python 3)
            print('Mean gene length: %s -> %s -> %s : %.2f nt (n=%d)' %
                  (mg_acc.ljust(12), ass_acc.ljust(17), gp_acc.ljust(25),
                   mean_length, num_genes))