MicroTOOLs Software Pipeline Demonstration

Jonathan Magasin (email)
15 December 2017


Introduction

These instructions explain how to run the MicroTOOLs software pipeline, using microarray data from the BioLINCS cruise as an example. You can adapt them to your own microarray data by substituting your own configuration file and several text tables that describe your samples.

MicroTOOLs pipeline and R package

The goal of the pipeline is to make it easy for a researcher with little or no programming experience to quickly analyze their MicroTOOLs microarrays. You set up configuration parameters with this web page, create a text table that points to the array data files, and then run a short R script that performs each stage of the pipeline:

  0. Initialization: Load the configuration file and data.
  1. Quality control checks: Analyze probe intensity distributions.
  2. Normalization of probe intensities across arrays.
  3. Conversion from probe intensities to gene intensities.
  4. ERCC-based normalization of gene intensities using ERCC spike-ins. Not done in BioLINCS.
  5. Gene detection: Identify genes above background noise.
  6. Differentially expressed gene identification and clustering. Includes annotated heatmaps that show genes and organisms of interest.
  7. Gene module detection of genes with highly correlated expression using WGCNA. Includes network diagrams that link organisms (vertices) to WGCNA modules (vertices) by edges that are colored by gene type (element or pathway).

The main output of the pipeline is a web page report. For example, our analysis of the BioLINCS microarrays began with a report like this. Some of the figures in the report appeared in our paper, and data saved at each stage of the pipeline (as R objects) was used to answer questions about specific genes and organisms underlying expression patterns.

If you have some experience with R, note that the MicroTOOLs R package is primarily a library of functions for interacting with and analyzing the data. You can explore these functions from within the R interpreter in the usual way, for example with R's built-in help system (help() or ?).

You are free to customize the example pipeline. You can add steps between the stages, or replace the stages in the script with your own. For this I suggest modifying, or starting from, the source code of the provided stages (e.g. PipelineStage5.DetectGenes()), which shows how to add text to the report.

BioLINCS arrays

This demonstration uses BioLINCS microarray data from six in situ samples and ten mixing experiment samples. Please see our publication for details on the samples and the experiment design. Briefly:


Instructions to run the demo

You must first get and install the MicroTOOLs R package as described here. Then the following steps will help you get the data from NCBI GEO and run the pipeline.

  1. Download this file. In a BASH shell, extract the contents:
            % tar xvfz MicroTOOLs_BioLINCS_Demo.tgz
        
    You now have the following: [Note that in your downloaded README_DEMO.html, links to output files from the pipeline will only work after you've run the pipeline.]

    When you run MicroTOOLs on your own data, you will have to get or create these files yourself. This is what you need to do:

  2. Get the raw microarray data files for BioLINCS at GEO (accession GSE109218) and save them within the BioLINCSArrays subdirectory. These files contain the probe intensities. (At GEO, do not get the "series" data, which has only the normalized gene intensities.)
  3. Run the script from a BASH shell:
            % run_MicroTOOLs_on_BioLINCS.R &> log.txt &
        
    The pipeline should finish in under an hour. Stages 0 through 6 (DE genes) complete in about 10 minutes or less, and stage 7 (WGCNA) takes most of the time. Here are two runs on different machines:
          Stages 0..6     Stage 7     Machine
          ===========     =======     =======
           5 min          51 min      MacBook Pro [macOS] (2.2 GHz, 4 cores; 16 GB RAM)
          10 min          31 min      iMac        [Linux] (2.4 GHz, 4 cores; 32 GB RAM)
        
  4. Look through the log file log.txt for errors or warnings. An example log file from a complete run of the pipeline is here.
  5. View the report with your browser (the open command below is for macOS; on Linux, xdg-open is the usual equivalent):
            % open report.html
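
Two of the steps above are manual checks that are easy to script: confirming that the raw array files are in place before the run (step 2) and looking through log.txt afterwards (step 4). The following is a minimal BASH sketch of both checks, not part of MicroTOOLs. The function names count_array_files and scan_log are hypothetical, and the sketch assumes that problems show up in the log as lines containing "error" or "warning" (R normally prints messages such as "Error in ..." and "Warning message:").

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the demo steps; these are NOT part of MicroTOOLs.

# Step 2 check: print the number of regular files in the array directory
# (default: BioLINCSArrays). Returns non-zero if the directory is missing
# or contains no files.
count_array_files() {
    local dir="${1:-BioLINCSArrays}"
    if [ ! -d "$dir" ]; then
        echo "Directory not found: $dir" >&2
        return 1
    fi
    local n
    n=$(find "$dir" -maxdepth 1 -type f | wc -l | tr -d ' ')
    echo "$n"
    [ "$n" -gt 0 ]
}

# Step 4 check: print every log line (with its line number) that mentions
# "error" or "warning", ignoring case. Returns 0 if any such line was found,
# 1 if none was found, and 2 if the log file does not exist.
scan_log() {
    local log="${1:-log.txt}"
    if [ ! -f "$log" ]; then
        echo "Log file not found: $log" >&2
        return 2
    fi
    grep -inE 'error|warning' "$log"
}
```

For example, after sourcing these functions you could run count_array_files BioLINCSArrays before step 3 and scan_log log.txt once the run has finished. A clean scan does not prove the run succeeded, so it is still worth comparing your log against the example log file.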