Understanding Single Cell technologies: Barcodes and UMIs

Updated: June 14, 2023 21:53

Molecular Barcodes are short nucleotide tags added to sequences of interest during sample preparation. They record which cell a sequence came from and/or other features of the sample. Unique Molecular Identifiers (UMIs) are used for quality control: they can help identify rare variants, detect differential amplification, and screen out probable sequencing errors. Geneious Biologics can extract and sort Barcoded data and collapse UMIs according to user-specified settings, using the Collapse UMI Duplicates and Separate Barcodes pipeline before Single Clone Antibody Analysis.

Single Cell Overview

Single Cell technologies such as 10x Genomics and BD Rhapsody partition individual cells into wells or droplets so that the mRNA of each cell can be sequenced separately. Because of this, the sequencing output is often referred to as an Expression Library. The technology relies on capturing the poly-A tail of each mRNA with a poly(dT) sequence attached to a bead.

The sequence used to capture mRNAs carries two other features: a Barcode and a UMI. These are crucial to understanding your sequence library and evaluating the quality of your reads. In general, Single Cell technologies generate paired-end reads in which a Barcode followed by a UMI sits at either the 3' or 5' end of the parent sequence.
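
As a minimal sketch of this read layout, the following Python function splits a Barcode and a UMI off the 5' end of a read. The 16 bp Barcode length, the 10 bp UMI length, and the name `split_read` are assumptions for illustration only; real lengths and positions depend on the kit and chemistry, and this is not a description of how Geneious Biologics performs the step.

```python
from typing import NamedTuple

BARCODE_LEN = 16  # assumption: approximate Barcode length
UMI_LEN = 10      # assumption: UMI length varies by kit


class TaggedRead(NamedTuple):
    barcode: str
    umi: str
    insert: str


def split_read(seq: str, barcode_len: int = BARCODE_LEN,
               umi_len: int = UMI_LEN) -> TaggedRead:
    """Split a read laid out as Barcode + UMI + insert, starting at the 5' end."""
    tag_len = barcode_len + umi_len
    if len(seq) < tag_len:
        raise ValueError("read is shorter than the Barcode + UMI prefix")
    return TaggedRead(seq[:barcode_len], seq[barcode_len:tag_len], seq[tag_len:])


# Hypothetical read: 16 bp Barcode, 10 bp UMI, then the sequence of interest.
read = "ACGTACGTACGTACGT" + "TTTGGGCCCA" + "GAGGTGCAGCTGGTG"
tagged = split_read(read)
```

For a kit that places the Barcode and UMI at the 3' end, the same slicing would be applied from the other end of the read.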

What are Barcodes?

There are two broad types of Barcodes: Feature Barcodes and Cell Barcodes. Both are short nucleotide tags (~16 bp) used to "label" sequences. The labels either group sequences that originated from the same cell or denote some other feature of the cell, such as the presence of a particular cell surface protein.

Cell Barcode: Sequences with the same cell barcode can be grouped together as coming from the same cell source.

Feature Barcode: Used as an additional tag to indicate the presence of a cell surface protein. As seen below, the Feature Barcode is conjugated to an antibody against the cell surface target. The Feature Barcode is then associated with a Cell Barcode when it is captured on a bead carrying that Cell Barcode.

[Figure: a Feature Barcode attached to an antibody against a cell surface target, captured on a bead carrying the Cell Barcode]

Note: Geneious Biologics does not currently support Feature Barcode analysis. Please contact us if you would like to see this capability added.
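
Grouping sequences by Cell Barcode, as described above, can be sketched in a few lines of Python. The function name `group_by_cell` and the example Barcodes and sequences are hypothetical, and the sketch uses exact Barcode matching only; a production pipeline may also correct Barcodes that contain sequencing errors.

```python
from collections import defaultdict


def group_by_cell(reads):
    """Group (cell_barcode, sequence) pairs by exact Cell Barcode match."""
    cells = defaultdict(list)
    for barcode, sequence in reads:
        cells[barcode].append(sequence)
    return dict(cells)


# Hypothetical reads: two share a Cell Barcode, so they come from the same cell.
reads = [
    ("AAACCCGGGTTTAAAC", "GAGGTGCAGCTG"),
    ("TTTGGGCCCAAATTTG", "CAGGTCCAGCTT"),
    ("AAACCCGGGTTTAAAC", "GACATCCAGATG"),
]
cells = group_by_cell(reads)
```

Here `cells` has two entries, one per Cell Barcode, and the first holds both sequences from that cell.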

What are UMIs?

UMIs (Unique Molecular Identifiers) are used for quality control and also appear in sequencing kits other than Single Cell kits. Like Barcodes, UMIs are short nucleotide sequences. Unlike Barcodes, however, UMI sequences are random, with a very low likelihood of the same UMI occurring twice on a single bead, so each captured mRNA molecule is effectively tagged with its own UMI.
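
To give a feel for why duplicate UMIs are unlikely, the sketch below computes the chance that at least two molecules share a UMI. It assumes UMIs are drawn uniformly at random from all 4^L sequences of length L, which is an idealization; the function name and the example values (a 10 bp UMI and 100 molecules) are assumptions, not figures from any particular kit.

```python
def umi_collision_probability(umi_len: int, n_molecules: int) -> float:
    """Probability that at least two of n_molecules share a UMI,
    assuming UMIs are drawn uniformly at random from 4**umi_len sequences."""
    n_umis = 4 ** umi_len
    if n_molecules > n_umis:
        return 1.0  # more molecules than possible UMIs: a duplicate is certain
    p_all_distinct = 1.0
    for i in range(n_molecules):
        # The (i + 1)-th molecule must avoid the i UMIs already used.
        p_all_distinct *= 1 - i / n_umis
    return 1 - p_all_distinct


p = umi_collision_probability(umi_len=10, n_molecules=100)
```

Under these assumptions `p` comes out at a little under 0.5%, and it grows as the number of molecules approaches the number of possible UMIs.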

Because the starting material is present in very small amounts and is of poor quality, PCR amplification is needed to generate enough material for High Throughput Sequencing technologies like Illumina. During PCR amplification, base incorporation errors and biased amplification can occur. When these stochastic effects arise in the first rounds of PCR, they can propagate through to the final sequenced library. UMIs are used to screen out these errors.

Quantitative Analysis

The figure below shows how UMIs can be used to determine the starting mRNA content, as each UMI represents one initial mRNA molecule.

[Figure: using UMIs to determine the starting mRNA content]
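
The counting idea can be sketched as follows: reads that share a Cell Barcode, a UMI, and a gene assignment are treated as PCR copies of one starting molecule, so the number of distinct UMIs estimates the number of starting molecules. The function name, record layout, and gene labels are hypothetical, and UMIs are compared by exact match only.

```python
from collections import defaultdict


def count_molecules(reads):
    """Count distinct UMIs per (cell_barcode, gene).

    reads: iterable of (cell_barcode, umi, gene) records, one per read.
    """
    umis = defaultdict(set)
    for cell, umi, gene in reads:
        umis[(cell, gene)].add(umi)
    return {key: len(values) for key, values in umis.items()}


# Hypothetical reads from one cell: five reads, but only three starting molecules.
reads = [
    ("cell1", "AACCGGTTAA", "geneA"),
    ("cell1", "AACCGGTTAA", "geneA"),  # PCR duplicate of the read above
    ("cell1", "AACCGGTTAA", "geneA"),  # PCR duplicate
    ("cell1", "GGTTAACCGG", "geneA"),
    ("cell1", "TTGGCCAATT", "geneB"),
]
counts = count_molecules(reads)
```

Counting reads alone would suggest a 4:1 ratio between the two genes, whereas counting UMIs gives 2:1, which removes the effect of uneven amplification.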

Variant Detection

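The figure below shows how UMIs can help to distinguish between true and false variants. Reads that share a UMI are copies of the same starting molecule, so a true variant is expected in essentially all of them, whereas an error introduced during amplification or sequencing typically appears in only some.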

[Figure: using UMIs to distinguish true variants from false variants]
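
One simple way to apply this idea is a per-position majority vote within each group of reads sharing a UMI. The sketch below is an illustration under stated assumptions, not the algorithm used by Geneious Biologics: the function name and the `min_fraction` threshold are hypothetical, and reads in a group are assumed to be aligned and of equal length.

```python
from collections import Counter, defaultdict


def umi_consensus(reads, min_fraction=0.5):
    """Build a per-position majority consensus for each UMI group.

    reads: iterable of (umi, sequence) pairs. Sequences sharing a UMI are
    assumed to be aligned and of equal length. A position is reported as "N"
    unless one base is present in more than min_fraction of the reads.
    """
    families = defaultdict(list)
    for umi, seq in reads:
        families[umi].append(seq)

    consensus = {}
    for umi, seqs in families.items():
        bases = []
        for column in zip(*seqs):
            base, count = Counter(column).most_common(1)[0]
            bases.append(base if count / len(seqs) > min_fraction else "N")
        consensus[umi] = "".join(bases)
    return consensus


# Hypothetical reads: in group AAAA one read carries a G>T error at position 3;
# in group CCCC every read carries T at position 2, consistent with a true variant.
reads = [
    ("AAAA", "ACGT"), ("AAAA", "ACGT"), ("AAAA", "ACTT"),
    ("CCCC", "ATGT"), ("CCCC", "ATGT"), ("CCCC", "ATGT"),
]
consensus = umi_consensus(reads)
```

The isolated error in the first group is outvoted, while the change present in every read of the second group survives into its consensus.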

Final Read Format

The image below provides an overview of a generic initial amplification process. On reaching the 5' end of the RNA template during first-strand synthesis, the reverse transcriptase appends additional non-templated nucleotides, mostly deoxycytidine residues, to the cDNA. This non-templated sequence serves as a binding site for a template switching oligonucleotide (TSO). The reverse transcriptase then switches from the RNA template to the TSO and continues synthesis, so the complement of the TSO sequence is added to the end of the cDNA strand.

[Figure: overview of a generic initial amplification process with template switching]

The full-length V(D)J region is often enriched from the amplified cDNA (which includes the constant region) via PCR amplification with primers specific to the Ig constant regions. Enzymatic fragmentation may also be used to "clip off" the constant region. Size selection is then used to pull out fragments that span the V(D)J segment prior to library construction. Following this, a sequencing primer (e.g., Illumina Read 2) is added along with a sample index via End Repair, A-tailing, Adaptor Ligation, and Sample Index PCR.

References

Single Cell Gene Expression—Official 10x Genomics Support. (n.d.). 10x Genomics. Retrieved August 2, 2022, from https://www.10xgenomics.com/support/single-cell-gene-expression

What are UMIs and why are they used in high-throughput sequencing? | DNA Technologies Core. (n.d.). Retrieved August 2, 2022, from https://dnatech.genomecenter.ucdavis.edu/faqs/what-are-umis-and-why-are-they-used-in-high-throughput-sequencing/