At the top of the preprocessing section, two links, Cell Observations and Gene Observations (1), provide metrics on the number of genes and counts per cell. During preprocessing you can filter your dataset according to four parameters. (2) Min Genes: filters out low-quality cells with fewer than the indicated number of genes. (3) Min Cells: filters out genes expressed in fewer than the indicated number of cells, to ensure the inclusion of biologically relevant genes. (4) Mitochondrial (MT) Threshold (%): filters out cells whose mitochondrial count is above the indicated percentage. (5) Doublet Detection: enables the detection and filtering of doublets generated during library construction. For this tutorial we use the default parameters: 200 for Min Genes, 3 for Min Cells, 5 % for MT Threshold, and yes for Doublet Detection. After setting the parameters, click on Run (6).
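The effect of the three threshold parameters can be illustrated with a minimal sketch in plain Python. It is not scExplorer's implementation: the toy count matrix, the function name `filter_counts`, the `MT-` gene-name prefix, and the order of the steps (cells, then genes, then mitochondrial percentage) are all assumptions made for illustration. Doublet detection is left out because it needs a dedicated method.

```python
# Toy count matrix (hypothetical data): rows are cells, columns are genes.
GENES = ["MT-CO1", "ACTB", "GAPDH", "CD3E"]
COUNTS = [
    [1, 10, 9, 0],  # cell 0: low mitochondrial fraction
    [8, 1, 1, 0],   # cell 1: high mitochondrial fraction
    [0, 5, 0, 0],   # cell 2: only one gene detected
    [0, 7, 6, 2],   # cell 3: no mitochondrial counts
]


def filter_counts(counts, genes, min_genes, min_cells, mt_threshold_pct,
                  mt_prefix="MT-"):
    """Return (kept cell indices, kept gene names) after three filters.

    The MT- prefix and the order of the steps are assumptions of this sketch.
    """
    # Min Genes: keep cells with at least `min_genes` detected genes.
    cells = [i for i, row in enumerate(counts)
             if sum(c > 0 for c in row) >= min_genes]

    # Min Cells: keep genes detected in at least `min_cells` remaining cells.
    gene_idx = [j for j in range(len(genes))
                if sum(counts[i][j] > 0 for i in cells) >= min_cells]

    # MT Threshold: drop cells whose mitochondrial percentage is above it.
    mt_idx = [j for j in gene_idx if genes[j].upper().startswith(mt_prefix)]
    kept_cells = []
    for i in cells:
        total = sum(counts[i][j] for j in gene_idx)
        mt = sum(counts[i][j] for j in mt_idx)
        pct_mt = 100.0 * mt / total if total else 0.0
        if pct_mt <= mt_threshold_pct:
            kept_cells.append(i)

    return kept_cells, [genes[j] for j in gene_idx]


# Small thresholds are used here only because the toy matrix is tiny.
kept_cells, kept_genes = filter_counts(
    COUNTS, GENES, min_genes=2, min_cells=2, mt_threshold_pct=20
)
print(kept_cells, kept_genes)
```

On real data the same logic would use the tutorial values (200, 3, and 5 %); the toy thresholds above were chosen only so that each filter removes something.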
Preprocessing of single-cell data is a crucial step for accurate identification and characterization of cell types, states, and biological mechanisms. Thus, proper data preprocessing will improve downstream analysis. A good starting point for exploring the preprocessing parameters is the set of values recommended by the Seurat and Scanpy developers: 200 for Min Genes, 3 for Min Cells, and 5 % for Mitochondrial Threshold. However, we encourage users to explore multiple parameter values, since the best choice will vary across datasets, species, platforms, etc. For instance, a systematic analysis determined that a mitochondrial threshold of 5 % distinguishes high- and low-quality cells in mouse samples, whereas in human samples it failed to discriminate between them in one third of the datasets [1]. Droplet-based single-cell methods generate doublets, which are two cells captured in one droplet. Thus, elimination of doublets is a key step during preprocessing, since single-cell methods assume that each droplet contains only one cell.
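One simple way to explore the mitochondrial threshold is to count how many cells would be retained at several candidate values before committing to one. The sketch below assumes the per-cell mitochondrial percentages are already available as a list; the values shown and the function name `cells_retained` are hypothetical.

```python
def cells_retained(mt_percentages, thresholds):
    """Map each candidate threshold to the number of cells at or below it."""
    return {t: sum(p <= t for p in mt_percentages) for t in thresholds}


# Hypothetical per-cell mitochondrial percentages.
mt_percentages = [1.2, 3.4, 4.9, 6.0, 9.5, 12.0, 25.0]

retained = cells_retained(mt_percentages, thresholds=[5, 10, 20])
for threshold, n_cells in retained.items():
    print(f"MT threshold {threshold} %: {n_cells} of {len(mt_percentages)} cells kept")
```

A sharp drop in retained cells between two neighbouring thresholds suggests that the stricter value may be discarding cells that are still usable in that dataset.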
After running preprocessing, scExplorer shows six quality control plots that help determine whether the dataset is suitable for downstream analysis. The three box/violin plots at the top show the number of Genes by Cells, Total Counts, and % of Mitochondrial Genes (1). The three scatter plots at the bottom show the number of Genes by Counts and the % of Mitochondrial Counts, each plotted against Total Counts, and the gene variance (2). It is important to note that for datasets with multiple samples, preprocessing should be done for each sample separately. Once you are satisfied with the quality of the data, click on Continue to Embedding.
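The per-cell quantities behind these plots, and the idea of handling each sample separately, can be sketched in plain Python. This is only an illustration under stated assumptions: the toy matrix, the sample labels, the helper names `qc_metrics` and `qc_by_sample`, and the `MT-` gene-name prefix are hypothetical, and the metric names follow the convention used by Scanpy rather than anything confirmed for scExplorer.

```python
from collections import defaultdict


def qc_metrics(row, genes, mt_prefix="MT-"):
    """Compute three per-cell quality control metrics from one row of counts."""
    total = sum(row)
    mt = sum(c for c, g in zip(row, genes) if g.upper().startswith(mt_prefix))
    return {
        "n_genes_by_counts": sum(c > 0 for c in row),  # genes detected
        "total_counts": total,                         # total counts in the cell
        "pct_counts_mt": 100.0 * mt / total if total else 0.0,
    }


def qc_by_sample(counts, genes, sample_labels):
    """Group cells by sample so each sample can be inspected on its own."""
    per_sample = defaultdict(list)
    for row, label in zip(counts, sample_labels):
        per_sample[label].append(qc_metrics(row, genes))
    return dict(per_sample)


# Hypothetical toy data: three cells from two samples.
genes = ["MT-CO1", "ACTB", "GAPDH"]
counts = [[1, 10, 9], [8, 1, 1], [0, 7, 6]]
sample_labels = ["s1", "s1", "s2"]

metrics = qc_by_sample(counts, genes, sample_labels)
for sample, cells in metrics.items():
    print(sample, cells)
```

Keeping the metrics grouped by sample makes it possible to choose thresholds for each sample individually instead of applying one cutoff to the pooled cells.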