SmashCell

Introduction

SmashCell is a software framework that automates the major steps in the analysis of microbial genome sequences: assembly, gene prediction and functional annotation. It is designed to facilitate parameter and algorithm exploration at each of these steps and generates graphs to compare the results of each combination. SmashCell was developed to analyse single-cell amplified genomes, but it is also suitable for ordinary microbial genomes and low-complexity metagenomes. For more complex metagenomes, see SmashCommunity. The introduction gives a better idea of what SmashCell is, and the tutorial shows what it does.

Documentation

The Manual is divided into the following sections:
Introduction
An overview of SmashCell, describing the data model and some of the conventions used in SmashCell. It also contains detailed installation instructions. You should read this first.
Tutorial
A series of worked examples that demonstrate the features of SmashCell.
Framework Components
Documentation on some of the code used in SmashCell, including automatically generated schema graphs for the databases used by SmashCell.
The documentation is designed to be viewed in a browser, but a PDF version is also available from the downloads page. This documentation is for version 0.1.0 of SmashCell; you can view the development version here.

Getting SmashCell

SmashCell can be downloaded for local installation and is also available as a virtual machine. Please see the installation section for details.

Using SmashCell

From the command line

The following command calculates the 4-mer frequencies in the contigs of an assembled genome, performs principal component analysis (PCA) on the resulting matrix and saves the results as a graph:
(SmashCell)[user@localhost:~/smash_tutorial]> ass_composition_analysis.py \
	--plot_components 1:2 \
	-R calculate_nmer_data  \
	-R calculate_nmer_pca \
	--smashdb_url "sqlite:///./smash_pipeline/test.smashdb" \
	--label_observations_a 10 \
	--marker_size 5 \
	--marker_linewidth 0.3 \
	--ass_accessions mgB_tiny_nblr \
	--overwrite_data  \
	--nmer_len 4 \
	--formats pdf  \
	--formats png \
	--plot_variables 
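
To make the two steps named above concrete (building a 4-mer frequency matrix, then running PCA on it), here is a minimal standalone sketch using only numpy. It is not SmashCell's own implementation and does not use the SmashCell API: the function names kmer_frequencies and pca, the toy contig sequences, and the choice of SVD for the PCA are all assumptions made for illustration.

```python
import itertools

import numpy as np


def kmer_frequencies(seq, k=4):
    """Return the relative frequency of every k-mer over ACGT in seq."""
    kmers = [''.join(p) for p in itertools.product('ACGT', repeat=k)]
    index = dict((kmer, i) for i, kmer in enumerate(kmers))
    counts = np.zeros(len(kmers))
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:  # skip windows containing N or other ambiguity codes
            counts[j] += 1
    total = counts.sum()
    return counts / total if total > 0 else counts


def pca(matrix, n_components=2):
    """Project the rows of matrix onto its first principal components."""
    centered = matrix - matrix.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return np.dot(centered, vt[:n_components].T)


# Hypothetical toy contigs; real input would come from an assembly.
contigs = {
    'contig_1': 'ACGTACGTACGTACGTAAAA',
    'contig_2': 'GGGGCCCCGGGGCCCCGGCC',
    'contig_3': 'ATATATATATATATATATAT',
}

# One row per contig, one column per 4-mer (4**4 = 256 columns).
matrix = np.array([kmer_frequencies(s) for s in contigs.values()])
scores = pca(matrix, n_components=2)

for name, (pc1, pc2) in zip(contigs, scores):
    print('%s: PC1=%.3f PC2=%.3f' % (name, pc1, pc2))
```

Plotting the first two columns of scores against each other corresponds to the kind of components 1:2 plot requested in the command, although the plotting and labelling options of the real script are not reproduced here.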

Using the python library

import numpy 
from Smash.Databases.SmashDB.DB import SmashDB 
 
# connect to the sqlite3 pipeline database  
smashdb = SmashDB(db_url='smash_pipeline/test.smashdb')

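# NOTE: `mc` below is a MetaGenomeCollection; the call that loads it
# from `smashdb` is not shown in this snippet.
# iterate over the GenePredictions for this MetaGenomeCollection
# and calculate the mean gene length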
for mg_acc,mg in mc.metagenomes.items(): 
    for ass_acc,ass in mg.assemblies.items(): 
        for gp_acc,gp in ass.gene_predictions.items(): 
            #connect to the GenomeDB associated with this GenePrediction 
            gdb = gp.genomedb 
            gene_lengths = [] 
            for gene in gdb.gene_iter(): 
                gene_lengths.append(len(gene)) 
            num_genes = len(gene_lengths) 
            mean_length = numpy.array(gene_lengths).mean() 
            # parenthesised form runs under both Python 2 and Python 3
            print('Mean gene length: %s -> %s -> %s : %.2f nt (n=%d)' %
                  (mg_acc.ljust(12), ass_acc.ljust(17), gp_acc.ljust(25),
                   mean_length, num_genes))