Having trouble analyzing your single-cell RNAseq data? File formats got you bogged down? Get away from the format wars: SeqGeq can import just about any gene expression file. CSV, TSV, TXT, TAB, MTX, and H5 (AKA HDF5 from 10X Genomics) are just some of the growing number of file formats that SeqGeq currently reads.
Single-cell RNAseq and its derivative methodologies are rapidly being adopted by labs that historically have not been sequencing oriented (e.g., those in the broad fields of immunology, cell biology, and development). As sequencing technology expands, the available methods diversify, giving bench scientists an unprecedented level of biological resolution. However, the data types and formats are also diverse, which makes unifying the analyses a particularly troublesome and mundane chore.
Tackling the formatting problem is more difficult than one would assume. After all, it’s just a bunch of rows and columns in a spreadsheet, right? Even the simplest single-cell experiment can generate an expression matrix consisting of thousands of rows or columns. Editing such files to add metadata, or to merge samples for side-by-side comparisons, typically cannot be done in familiar programs like Excel. It is at best a tedious task for experts in R and Vim, and at worst an intimidating venture into the nuanced world of computer syntax for the novice.
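To give a sense of what "merging samples" means when done by hand, here is a minimal sketch in Python using only the standard library (Python is used here purely for illustration). It assumes two small genes-by-cells CSV files, treats a gene missing from one sample as a count of 0, and records sample membership as an extra "sample" row. The sample names, barcodes, and function names are all hypothetical, and real pipelines may handle missing genes and metadata differently.

```python
import csv
import io


def read_matrix(text, delimiter=","):
    """Parse a genes-by-cells table into (cell order, {gene: {cell: count}})."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    cells = rows[0][1:]  # first header field is the gene-column label
    counts = {}
    for row in rows[1:]:
        counts[row[0]] = {cell: int(v) for cell, v in zip(cells, row[1:])}
    return cells, counts


def merge_samples(samples):
    """Merge {sample_name: csv_text} into one table with a 'sample' metadata row."""
    merged_cells, merged_counts, sample_of = [], {}, {}
    for name, text in samples.items():
        cells, counts = read_matrix(text)
        for cell in cells:
            label = f"{name}_{cell}"  # prefix avoids barcode collisions across samples
            merged_cells.append(label)
            sample_of[label] = name
        for gene, per_cell in counts.items():
            row = merged_counts.setdefault(gene, {})
            for cell, value in per_cell.items():
                row[f"{name}_{cell}"] = value
    header = ["gene"] + merged_cells
    rows = [["sample"] + [sample_of[c] for c in merged_cells]]
    for gene in sorted(merged_counts):
        # Assumption: a gene absent from a sample is treated as a count of 0.
        rows.append([gene] + [merged_counts[gene].get(c, 0) for c in merged_cells])
    return header, rows


# Hypothetical toy data: two samples sharing one barcode and one gene.
control = "gene,AAAC,AAAG\nCD3E,5,0\nCD19,0,7\n"
treated = "gene,AAAC,TTTG\nCD3E,2,1\nMS4A1,4,0\n"
header, rows = merge_samples({"control": control, "treated": treated})
```

Even this toy version has to deal with colliding barcodes and mismatched gene lists, which is part of why the task gets tedious with thousands of rows or columns.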
If editing a single file were the only thing you had to do to get your sc-RNAseq data analyzed, then I’d be obliged to say it might be a great learning experience for a first-year graduate student. That, too, is rarely the case. In the DIY world, you may want to manipulate data outputs at every stage in the pipeline. For example, consider the following steps in a single-cell sequencing analysis pipeline: 1) debarcoding, 2) sequence alignment, 3) read counting, 4) removal of dead/dying cells and other artifacts, 5) adding metadata/merging samples, 6) normalization, 7) dimensionality reduction, 8) clustering, 9) differential gene expression, and 10) data visualization. Each step generates a new file, possibly in a different format from the one it was given. If the data coming from the previous step are not compatible with the input formats of the following step, you have to use a data formatting package to get it right. To make matters worse, every new file requires a separate R-script to be written to perform the specified action. In a perfect world, each step would output a format that was usable by the next step, and thus in our example only 10 files and 10 R-scripts would be needed. In the real world you would likely have 20 R-scripts (10 to reformat the output files and 10 to run each R package) and 10 files to manage. What if you want to increase the number of principal components, or use a different clustering algorithm? You must go back to step 7, change the R-script, and repeat the data import and reformatting for all the remaining steps! Sound tedious? It is!
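The bookkeeping in the example above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, assuming one script per step plus one reformatting script per output in the worst case. The function names are made up for this sketch and do not belong to any real package.

```python
# The ten pipeline steps from the example, in order.
STEPS = [
    "Debarcoding",
    "Sequence alignment",
    "Read counting",
    "Removal of dead/dying cells and other artifacts",
    "Add metadata / merge samples",
    "Normalization",
    "Dimensionality reduction",
    "Clustering",
    "Differential gene expression",
    "Data visualization",
]


def scripts_needed(n_steps, reformat_each_output=True):
    """One script per step, plus one reformatting script per output if formats mismatch."""
    return n_steps * 2 if reformat_each_output else n_steps


def steps_to_rerun(changed_step):
    """Changing a step invalidates its own output and everything downstream of it."""
    return STEPS[STEPS.index(changed_step):]


perfect_world = scripts_needed(len(STEPS), reformat_each_output=False)  # 10
real_world = scripts_needed(len(STEPS))                                 # 20
redo = steps_to_rerun("Dimensionality reduction")                       # steps 7 through 10
```

Changing the number of principal components at step 7 means four steps, and potentially four reformatting passes, have to be repeated.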
The remedy for this file-output-file-formatting battle is SeqGeq. SeqGeq is indifferent to which machine you used to sequence your cells, the orientation of the matrix, the presence of metadata, and the sample composition. You can have metadata in the file or add it within the application. You can merge samples together within the application to make side-by-side comparisons between samples or groups. Need to change the normalization? No problem, simply point and click. Furthermore, you can run, re-run, and re-re-run PCA, tSNE, and clustering algorithms as many times as you like, with as few or as many genes as you like. No file formatting, editing, or script writing required. Drag it. Drop it. Get on with your analyses.
Try SeqGeq on your own data with a free trial!