An introduction to RNA-velocity using alevin-fry

Recently, RNA-velocity estimation has become an increasingly popular tool in single-cell RNA-seq analysis. In this post, we will discuss an additional advantage brought by the Unspliced-Spliced-Ambiguous (USA) mode introduced in alevin-fry 0.3.0 and later. That is, the solution presented in that approach for controlling the spurious mapping to spliced transcripts of sequenced fragments arising from introns (in the absence of a full decoy) basically gives us the preprocessing results we need to perform an RNA-velocity analysis “for free”.

Here we provide an end-to-end tutorial describing how to perform an RNA-velocity analysis for a 10x Chromium dataset. In this tutorial, we will show the whole analysis pipeline, starting from the raw FASTQ files and ending with the gorgeous velocity plots (generated by scVelo) that you may like to include in your next analysis or paper.

Preliminaries and overview of the pipeline

For this pipeline, you will need salmon (>= v1.5.0) and alevin-fry (>= v0.3.0). As always, it’s best to use versions of these tools that are as recent as possible. To prepare the reference spliced+intron (splici) transcriptome, you will need a recent version of R and the roe package. Alternatively, if you are a Python lover, you can use pyroe to prepare the splici reference. Having bedtools installed will make pyroe run faster. The gffread CLI is needed for converting genes’ Entrez ID to the official gene name. To infer the RNA-velocity, you will need the scVelo Python package and its dependencies installed. If you use conda or anaconda, you can configure the environment from your terminal.

The usa_mode flag says that our data was processed in Unspliced-Spliced-Ambiguous mode. This is because the transcript-to-gene mapping we provided, t2g_3col.tsv, had 3 columns instead of 2, to allow specifying the splicing state of each transcript. Within the output directory, there is a subdirectory named alevin with 3 files: quants_mat.mtx, quants_mat_cols.txt, and quants_mat_rows.txt, which correspond, respectively, to the count matrix, the gene names for each column of this matrix, and the corrected, filtered cell barcodes for each row of this matrix.

Importantly, the dimensions of this matrix will be num_cells x (3 x num_genes). The factor of 3 arises because this quantification mode separately attributes UMIs to spliced gene sequence, to unspliced (in this case, intronic) gene sequence, or to an ambiguous category (not able to be confidently assigned to either the spliced or the unspliced category). All of these counts are packed into the same output matrix. The first num_genes columns correspond to the spliced counts, the next num_genes columns correspond to the unspliced counts, and the final num_genes columns correspond to the ambiguous counts. Sound complicated? Don’t worry, the loadFry() function in the fishpond R package and the load_fry() function used below take care of unpacking these counts.

RNA velocity estimation

Subsequent velocity estimation is done using scVelo. We will also show that, when using the same set of genes and cells as in the scVelo tutorial, the velocity graph generated from the alevin-fry count matrix (below, we call it the fry dataset) is very similar to the scVelo tutorial output, even though these two datasets are generated by completely different toolchains.

First, we’ll load our favorite Python environment; you can use IPython or a Jupyter notebook. We load the packages we will use in the following procedure.

    frydir = "pancreas_quant_res"
    e2n_path = "data/mm10-2.1.0_geneid_to_name.txt"
    adata = load_fry(frydir, output_format="velocity")
    e2n = dict()

When using the “velocity” output format, we use S+A as the spliced counts and U as the unspliced counts. This strategy is widely adopted by other tools and makes the assumption that ambiguous UMIs are most likely to derive from spliced transcripts. In this data, ambiguous counts make up only 4% of the total deduplicated UMIs. We tested many strategies for assigning the ambiguous counts, and we found that as long as the assignment rule relates to the proportion of S to U, the velocity graph undergoes only small changes.

With the AnnData object with the spliced and unspliced layers in hand, we can run scVelo to generate the velocity graph.

    adata.var_names_make_unique()
    # get embeddings
    umap(adata, n_components=2)
    # housekeeping
    set_figure_params('scvelo')
    # get the proportion of spliced and unspliced count

scVelo provides multiple modes when calculating velocity; here we use the dynamical mode as an example.
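To make the USA-mode matrix layout described earlier concrete, here is a minimal sketch of how the packed columns could be split into spliced (S), unspliced (U), and ambiguous (A) blocks and then combined as S+A and U. It uses plain Python lists and a made-up toy matrix for clarity. The function names `split_usa` and `velocity_layers` are hypothetical and are not part of pyroe or fishpond, whose loaders work on the real sparse matrix and may be implemented quite differently.

```python
def split_usa(counts, num_genes):
    """Split each row of a USA-mode matrix into S, U and A blocks.

    Assumes each row has 3 * num_genes entries laid out as
    [spliced | unspliced | ambiguous], as described in the text.
    """
    spliced, unspliced, ambiguous = [], [], []
    for row in counts:
        if len(row) != 3 * num_genes:
            raise ValueError("each row must have 3 * num_genes entries")
        spliced.append(row[:num_genes])
        unspliced.append(row[num_genes:2 * num_genes])
        ambiguous.append(row[2 * num_genes:])
    return spliced, unspliced, ambiguous


def velocity_layers(counts, num_genes):
    """Return (spliced, unspliced) where spliced = S + A and unspliced = U."""
    s, u, a = split_usa(counts, num_genes)
    spliced = [
        [s_val + a_val for s_val, a_val in zip(s_row, a_row)]
        for s_row, a_row in zip(s, a)
    ]
    return spliced, u


# Toy data (invented for illustration): 2 cells x (3 * 2 genes)
toy = [
    [5, 0, 1, 2, 1, 0],
    [3, 4, 0, 1, 0, 2],
]
spliced, unspliced = velocity_layers(toy, num_genes=2)
```

In this toy case the ambiguous block is simply added element-wise to the spliced block, which mirrors the S+A rule described in the text; other assignment rules would only change the body of `velocity_layers`.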