Processing single- and dual-indexed libraries

Updated March 05, 2024 00:11

Some sequencing providers use single- or dual-indexing techniques to process multiple samples in one sequencing run. With Illumina sequencing technologies, this involves adding short nucleotide barcodes to the i7 index read, the i5 index read, or both. If your sequencing provider has not demultiplexed your samples, Biologics can be used to process these kinds of datasets using the steps outlined in this guide.

Introduction to various indexed read layouts

The index reads of an Illumina sequencing run are referred to as the i5 and i7 reads, and are found upstream (i5) and downstream (i7) from the R1 and R2 reads that contain your sequence of interest. These index reads may contain short nucleotide sequences called barcodes that allow the corresponding R1/R2 reads to be indexed or "tagged". To learn more about barcodes, see Understanding Barcodes and UMIs.

Screenshot_2023-04-26_at_11.51.04_PM.png

Indexed Sequencing Overview Guide (15057455). (n.d.). Retrieved April 27, 2023, from https://support.illumina.com/documentation.html

Adding unique barcodes to the index reads allows different samples to be sequenced concurrently. Following this, the unique index reads are used to separate out the original samples.

Libraries can be either single- or dual-indexed. In a single-indexed library, the barcode is found only on the i7 index read, which limits the number of unique barcodes available. Dual-indexed libraries contain barcodes on both the i7 and i5 index reads, allowing many more unique combinations and therefore greater sample multiplexing.
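As a rough illustration of why dual indexing increases multiplexing capacity, the sketch below counts the samples that could be told apart with a hypothetical kit. The barcode counts (12 i7 and 8 i5 barcodes) are assumed for illustration only, and the calculation assumes every i5/i7 combination is used.

```python
from itertools import product

# Hypothetical barcode sets; real kits define their own sequences and counts.
i7_barcodes = [f"i7_{n:02d}" for n in range(1, 13)]  # 12 i7 barcodes
i5_barcodes = [f"i5_{n:02d}" for n in range(1, 9)]   # 8 i5 barcodes

# Single indexing: at most one sample per i7 barcode.
single_index_samples = len(i7_barcodes)

# Dual indexing: at most one sample per (i5, i7) combination.
dual_index_samples = len(list(product(i5_barcodes, i7_barcodes)))

print(single_index_samples)  # 12
print(dual_index_samples)    # 96
```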

Processing single-indexed libraries in Biologics

Merging the R1 and R2 reads

First merge the R1 and R2 reads together by selecting both the R1 and R2 files and going to Pre-Processing > Set & Merge Paired Reads. For more info, see Merging Paired Reads.

Screenshot_2023-04-27_at_2.25.08_PM.png

Screenshot_2023-04-27_at_12.14.28_AM.png

This will output two documents, one containing the pairs that could be merged and the other containing the pairs that could not be merged. The successfully merged reads will be used in the next step.

Pairing the Merged reads and the i7 read file

Assuming your i7 read file contains your indexing barcodes, select this sequence list and the merged reads sequence list from the above step. Select Pre-Processing > Set & Merge Paired Reads again, but this time choose Set paired reads only.

Screenshot_2023-04-27_at_2.21.45_PM.png

Screenshot_2023-04-27_at_12.16.27_AM.png

Depending on the names of the two files (the merged R1+R2 reads and the i7 read), one will be designated as the forward (or 5' end) of the associated sequences, while the other will be designated as the reverse (or 3' end).

This is determined by the alphabetical order of the two names:

The sequence list document name that would come first when sorting the two is designated as the forward (5') read.
The sequence list document name that would come last when sorting the two is designated as the reverse (3') read.

This can easily be determined by sorting your sequences by name (ascending):

Screenshot_2023-04-27_at_2.19.00_PM.png

In this case the index reads have been designated as the reverse (3') read. This will be important for the next step.
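The naming rule can also be checked outside the application before pairing. The sketch below is a minimal illustration that assumes an ordinary ascending string sort matches the name sort used in Biologics, which has not been verified here; the document names are hypothetical.

```python
def designate_read_ends(name_a: str, name_b: str) -> dict:
    """Return which document name would be forward (5') and which reverse (3')."""
    # Assumption: a plain ascending string sort mirrors the in-app sort by name.
    first, last = sorted([name_a, name_b])
    return {"forward_5_prime": first, "reverse_3_prime": last}

# Hypothetical document names for the merged reads and the index read.
ends = designate_read_ends("sample_i7", "merged_R1_R2")
print(ends["forward_5_prime"])  # merged_R1_R2
print(ends["reverse_3_prime"])  # sample_i7
```

If the index document ends up on the wrong side for your workflow, renaming one of the documents before pairing changes the designation.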

Running Collapse UMI Duplicates & Separate Barcodes

Now that you have paired your i7 index read with your assembled (merged) sequences, you can run Pre-Processing > Collapse UMI Duplicates & Separate Barcodes.

For this example, we will use the following as the general format of the i7 read:

CAAGCAGAAGACGGCATACGAGATNNNNNNNN
- Where CAAGCAGAAGACGGCATACGAGAT is the adapter, with a length of 24 nt
- Where NNNNNNNN is the barcode, with a length of 8 nt. We will use two example barcodes:
  - CGTACTAG
  - TCCTGAGC
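To make the read layout concrete, the following sketch splits an i7 read into its adapter and barcode and looks the barcode up in a table. It only illustrates the layout described above and is not how Biologics performs the separation internally; the sample names are hypothetical.

```python
ADAPTER = "CAAGCAGAAGACGGCATACGAGAT"  # 24 nt adapter from the example above
BARCODE_LENGTH = 8

# Example barcodes from this guide; the sample names are hypothetical.
BARCODES = {"CGTACTAG": "sample_A", "TCCTGAGC": "sample_B"}

def split_i7_read(read: str):
    """Return (adapter, barcode) if the read starts with the adapter, else None."""
    if not read.startswith(ADAPTER):
        return None
    start = len(ADAPTER)
    barcode = read[start:start + BARCODE_LENGTH]
    if len(barcode) < BARCODE_LENGTH:
        return None  # read is too short to contain a full barcode
    return ADAPTER, barcode

read = ADAPTER + "CGTACTAG"
adapter, barcode = split_i7_read(read)
print(len(adapter), barcode, BARCODES.get(barcode, "unknown"))
# 24 CGTACTAG sample_A
```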

Plug these values into the options for Collapse UMI Duplicates & Separate Barcodes below:

Screenshot_2023-04-27_at_12.49.57_AM.png

The Discard barcodes containing fewer than option should be adjusted to reflect the sequencing depth of your dataset.

Note that you can give your barcodes different names, such as the sample ID. To learn more, see How to Specify Barcodes.

Next select the following options:

Screenshot_2023-04-27_at_4.56.33_PM.png

Adapter/Barcode/UMI are present at

This option will depend on whether the name of the index read sequence document came first alphabetically. See the above section for further clarification.

If the index document name was first alphabetically, select the 5' end
If the index document name was last alphabetically, select the 3' end
- This is the case in the example documents from the above section.

Allow single mismatch in UMI, barcode, and TSO

We recommend turning this off for index reads, as the barcodes are short and could differ by 1 bp between distinct barcodes.
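Whether single-mismatch matching is risky depends on how far apart your barcodes are. The sketch below computes the Hamming distance (number of differing positions) between each pair of barcodes. The first two barcodes are the examples used in this guide; the third is a hypothetical barcode added to show a 1 bp neighbour.

```python
from itertools import combinations

def hamming(a: str, b: str) -> int:
    """Count the positions at which two equal-length barcodes differ."""
    if len(a) != len(b):
        raise ValueError("Barcodes must have the same length")
    return sum(x != y for x, y in zip(a, b))

# The last barcode is hypothetical and differs from the first by 1 bp.
barcodes = ["CGTACTAG", "TCCTGAGC", "CGTACTAC"]

for a, b in combinations(barcodes, 2):
    distance = hamming(a, b)
    flag = "  <- indistinguishable if one mismatch is allowed" if distance <= 1 else ""
    print(a, b, distance, flag)
```

Pairs at a distance of 2 can also be ambiguous, because a read with a single error may then sit one mismatch away from both barcodes.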

All other options

These can be changed according to preference; see our main article Collapse UMI Duplicates and Separate Barcodes for more details. Note that if UMIs are not present (UMI box left unchecked), steps mentioning UMIs will not be performed.

Next Steps

After running the job, the resulting document name will provide more info on the number of barcodes found. For example: 500K reads from vdj_v1_hs_cd19_b_S1_L001_R_001 (330 barcodes with between 100 and 348 sequences).
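If you want to record these numbers for several runs, they can be pulled out of the document name with a regular expression. The pattern below is inferred from the single example name above, so treat it as an assumption about the naming format rather than a documented one.

```python
import re

name = ("500K reads from vdj_v1_hs_cd19_b_S1_L001_R_001 "
        "(330 barcodes with between 100 and 348 sequences)")

# Pattern inferred from the example name only; other names may differ.
pattern = re.compile(r"\((\d+) barcodes with between (\d+) and (\d+) sequences\)")

match = pattern.search(name)
if match:
    n_barcodes, min_seqs, max_seqs = map(int, match.groups())
    print(n_barcodes, min_seqs, max_seqs)  # 330 100 348
```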

If these results are not as expected, changing the Adapter/Barcode/UMI are present at option from 5' to 3' (or vice versa) may resolve the problem. Please reach out to us if you encounter any issues.

The output document can then be analysed using our Single Clone Antibody Analysis pipeline. See the NGS Antibody Analysis article for instructions.

Processing dual-indexed read libraries in Biologics

Concatenating the i5 and i7 reads

This step requires separate sequence editing software, such as Geneious Prime. In the image below, two index reads have been selected. Select Tools > Concatenate Sequences or Alignments to bring up the options.

Screenshot_2023-04-27_at_1.54.24_PM.png

Make note of the order and click OK to concatenate the sequences (make sure the option index in sequence list is selected).

Screenshot_2023-04-27_at_1.57.36_PM.png

This should output one file in which each i5 read and its matching i7 read are joined into a single sequence. Upload this concatenated read file into Geneious Biologics. See Uploading Files if you are unsure of how to do this.
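If the concatenation needs to be scripted instead, the idea can be sketched in plain Python as shown below. This is not the Geneious Prime function. It assumes uncompressed FASTQ input with exactly four lines per record and both files listing reads in the same order; the read names and sequences are made up for the example.

```python
from itertools import islice
from typing import Iterable, Iterator, Tuple

def read_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (header, sequence, quality) from four-line FASTQ records."""
    it = iter(lines)
    while True:
        record = [line.rstrip("\n") for line in islice(it, 4)]
        if len(record) < 4:
            return  # end of input (or a truncated final record)
        header, seq, _plus, qual = record
        yield header, seq, qual

def concatenate_index_reads(i5_lines, i7_lines) -> Iterator[str]:
    """Yield FASTQ records with each i5 read followed by its i7 read."""
    # Note: zip stops at the shorter input, so check read counts separately.
    for (h5, s5, q5), (h7, s7, q7) in zip(read_fastq(i5_lines), read_fastq(i7_lines)):
        # Compare read IDs (text before the first space) to catch misordered files.
        if h5.split()[0] != h7.split()[0]:
            raise ValueError(f"Read ID mismatch: {h5} vs {h7}")
        yield f"{h5}\n{s5}{s7}\n+\n{q5}{q7}\n"

# Tiny in-memory example; with real files, pass open file handles instead.
i5 = ["@read1 2:N:0\n", "AAAACCCC\n", "+\n", "IIIIIIII\n"]
i7 = ["@read1 1:N:0\n", "GGGGTTTT\n", "+\n", "FFFFFFFF\n"]
print("".join(concatenate_index_reads(i5, i7)), end="")
```

The i5 read is placed first so that the result matches the i5 then i7 order assumed in the rest of this guide.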

Merging the R1 and R2 reads

Using Geneious Biologics, merge the R1 and R2 reads together using Pre-Processing > Set & Merge Paired Reads. For more info, see Merging Paired Reads.

Screenshot_2023-04-27_at_2.25.08_PM.png

Screenshot_2023-04-27_at_12.14.28_AM.png

This will output two documents, one containing the pairs that could be merged and the other containing the pairs that could not be merged. The successfully merged reads will be used in the next step.

Pairing the merged reads and concatenated index read files

Select the concatenated i5-i7 sequence list file and the merged reads sequence list file from the above step. Select Pre-Processing > Set & Merge Paired Reads again, but this time choose Set paired reads only.

Screenshot_2023-04-27_at_2.16.01_PM.png

The following settings are used to pair reads only:

Screenshot_2023-04-27_at_12.16.27_AM.png

Depending on the names of the two files (the merged R1+R2 reads and the concatenated i5 - i7 read), one will be designated as the forward (or 5' end) of the associated sequences, while the other will be designated as the reverse (or 3' end).

This is determined by the alphabetical order of the two names:

The sequence list document name that would come first when sorting the two is designated as the forward (5') read.
The sequence list document name that would come last when sorting the two is designated as the reverse (3') read.

This can easily be determined by sorting your sequences by name (ascending):

Screenshot_2023-04-27_at_4.39.19_PM.png

In this case the index reads (i5.fastq - i7.fastq) have been designated as 3'. This will be important for the next step.

Running Collapse UMI Duplicates & Separate Barcodes

Now that you have paired your i5-i7 index read with your assembled (merged) sequences, you can run Pre-Processing > Collapse UMI Duplicates & Separate Barcodes.

For this example, we will use the following as the general format of the concatenated i5+i7 read:

AATGATACGGCGACCACCGAGATCTACACNNNNNNNNCAAGCAGAAGACGGCATACGAGATNNNNNNNN
- Where AATGATACGGCGACCACCGAGATCTACAC is the adapter of the i5 read and has a length of 29 nt.
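- Where NNNNNNNNCAAGCAGAAGACGGCATACGAGATNNNNNNNN is the "barcode" with a length of 40 nt (8 nt i5 barcode + 24 nt i7 adapter + 8 nt i7 barcode).
  - The first NNNNNNNN (before the i7 adapter) is the true i5 barcode
  - The final NNNNNNNN (after the i7 adapter) is the true i7 barcode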
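The layout of the concatenated read can be checked with a short sketch that slices it by position. The adapter sequences come from the layout line above, and the 8 nt barcode length is read off that same line. The two barcodes used to build the test read are placeholders, not values from a real dual-indexed kit.

```python
I5_ADAPTER = "AATGATACGGCGACCACCGAGATCTACAC"  # 29 nt
I7_ADAPTER = "CAAGCAGAAGACGGCATACGAGAT"       # 24 nt
BARCODE_LEN = 8                               # each N-run in the layout line

def split_concatenated_read(read: str) -> dict:
    """Slice a concatenated i5+i7 read into the combined and true barcodes."""
    start = len(I5_ADAPTER)
    combined_len = BARCODE_LEN + len(I7_ADAPTER) + BARCODE_LEN
    combined = read[start:start + combined_len]
    if len(combined) != combined_len:
        raise ValueError("Read is shorter than the expected layout")
    return {
        "combined_barcode": combined,          # what the length setting captures
        "i5_barcode": combined[:BARCODE_LEN],  # before the i7 adapter
        "i7_barcode": combined[-BARCODE_LEN:], # after the i7 adapter
    }

# Placeholder barcodes used only to build an example read.
read = I5_ADAPTER + "CGTACTAG" + I7_ADAPTER + "TCCTGAGC"
parts = split_concatenated_read(read)
print(parts["i5_barcode"], parts["i7_barcode"], len(parts["combined_barcode"]))
# CGTACTAG TCCTGAGC 40
```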

Note: In this example, we will use the length setting for identifying barcodes. However, if you would like to specify barcodes by name (as explained in the single-indexed library section above), you can trim the adapter off both the i5 and i7 reads before concatenating them. This will result in a final concatenated sequence of NNNNNNNNNNNNNNNN (the i5 and i7 barcodes). You can then name the barcodes according to sample ID, for example, as outlined in How to Specify Barcodes.

Plug these values into the options for Collapse UMI Duplicates & Separate Barcodes below:

Screenshot_2023-04-27_at_4.53.43_PM.png

The Discard barcodes containing fewer than option should be adjusted to reflect the sequencing depth of your dataset.

Next select the following options:

Screenshot_2023-04-27_at_4.56.33_PM.png

Adapter/Barcode/UMI are present at

This option will depend on whether the name of the index read sequence document came first alphabetically. See the above section for further clarification.

If the index document name was first alphabetically, select the 5' end
If the index document name was last alphabetically, select the 3' end
- This is the case in the example documents from the above section.

Allow single mismatch in UMI, barcode, and TSO

We recommend turning this off for index reads, as the barcodes are short and could differ by 1 bp between distinct barcodes.

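All other options

These can be changed according to preference; see our main article Collapse UMI Duplicates and Separate Barcodes for more details. Note that if UMIs are not present (UMI box left unchecked), steps mentioning UMIs will not be performed.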

Next Steps

If the results of the job are not as expected, changing the Adapter/Barcode/UMI are present at option from 5' to 3' (or vice versa) may resolve the problem. Please reach out to us if you encounter any issues.

The output document can then be analysed using our Single Clone Antibody Analysis pipeline. See the NGS Antibody Analysis article for instructions.