Some sequencing providers use single- or dual-indexing techniques to process multiple samples in one sequencing run. With Illumina sequencing technologies, this involves adding short nucleotide barcodes to the i5 index read, the i7 index read, or both. If your sequencing provider has not demultiplexed your samples, Biologics can process these datasets using the steps outlined in this guide.
Jump to:
- Introduction to various indexed read layouts
- Processing single-indexed libraries in Biologics
- Processing dual-indexed read libraries in Biologics
Introduction to various indexed read layouts
The index reads of an Illumina sequencing run are referred to as the i5 and i7 reads, and are found upstream (i5) and downstream (i7) of the R1 and R2 reads that contain your sequence of interest. These index reads may contain short nucleotide sequences called barcodes that allow the corresponding R1/R2 reads to be indexed or "tagged". To learn more about barcodes and related techniques, see Understanding Barcodes and UMIs.
Indexed Sequencing Overview Guide (15057455). (n.d.). Retrieved April 27, 2023, from https://support.illumina.com/documentation.html
Adding unique barcodes to the index reads allows different samples to be sequenced concurrently. After sequencing, the barcodes in the index reads are used to separate the reads back into their original samples.
Libraries can be either single-indexed or dual-indexed. In a single-indexed library the barcode is found only on the i7 index read, which limits the number of unique barcodes available. Dual-indexed libraries contain barcodes on both the i7 and i5 index reads, allowing many more unique combinations and therefore greater sample multiplexing.
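To make the difference in multiplexing capacity concrete, the sketch below counts how many samples can be told apart with each layout. The barcode set sizes (12 i7 barcodes and 8 i5 barcodes) are illustrative assumptions, not values taken from any particular kit.

```python
def max_samples(n_i7: int, n_i5: int = 0) -> int:
    """Return how many samples can be distinguished with the given barcode sets.

    With single indexing only the i7 barcode varies. With dual indexing,
    every i5/i7 pairing is a distinct label.
    """
    if n_i5 == 0:
        return n_i7
    return n_i7 * n_i5


# Assumed set sizes, for illustration only.
print("single-indexed:", max_samples(12))
print("dual-indexed:", max_samples(12, 8))
```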
Processing single-indexed libraries in Biologics
Merging the R1 and R2 reads
First, merge the R1 and R2 reads by selecting both the R1 and R2 files and going to Pre-Processing > Set & Merge Paired Reads. For more info, see Merging Paired Reads.
This will output two documents, one containing the pairs that could be merged and the other containing the pairs that could not be merged. The successfully merged reads will be used in the next step.
Pairing the Merged reads and the i7 read file
Assuming your i7 read file contains your indexing barcodes, select this sequence list and the merged reads sequence list from the above step. Select Pre-Processing > Set & Merge Paired Reads again, but this time choose Set paired reads only.
Depending on the names of the two files (the merged R1+R2 reads and the i7 read), one will be designated as the forward (or 5' end) of the associated sequences, while the other will be designated as the reverse (or 3' end).
This is determined by the alphabetical order of the two names:
- The sequence list document name that would come first when sorting the two is designated as the forward (5') read.
- The sequence list document name that would come last when sorting the two is designated as the reverse (3') read.
This can easily be determined by sorting your sequences by name (ascending):
In this case the index reads have been designated as the 3' end. This will be important for the next step.
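The naming rule above can be sketched in a few lines. This is only an illustration: the document names are hypothetical, and the case-insensitive comparison is an assumption about how names are ordered, so always confirm the designation by sorting the documents in Biologics itself.

```python
def designate_ends(name_a: str, name_b: str) -> dict:
    """Guess which document becomes the forward (5') and reverse (3') read.

    Assumption: names are compared case-insensitively, character by character.
    """
    first, last = sorted([name_a, name_b], key=str.casefold)
    return {"forward_5prime": first, "reverse_3prime": last}


# Hypothetical document names; here the index read sorts last, so it is the 3' end.
roles = designate_ends("sample_I1.fastq", "merged_R1_R2.fastq")
print(roles)
```

If the index document would end up on the wrong side for your preferred settings, renaming one of the documents before pairing is one way to change the outcome.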
Running Collapse UMI Duplicates & Separate Barcodes
Now that you have paired your i7 index read with your assembled (merged) sequences, you can run Pre-Processing > Collapse UMI Duplicates & Separate Barcodes.
For this example, we will use the following as the general format of the i7 read:
- CAAGCAGAAGACGGCATACGAGATNNNNNNNN
  - Where CAAGCAGAAGACGGCATACGAGAT is the adapter, with a length of 24 nt
  - Where NNNNNNNN is the barcode. We will use two example barcodes:
    - CGTACTAG
    - TCCTGAGC
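As a conceptual illustration of this layout (not a description of how Biologics implements the operation), the sketch below pulls the 8 nt barcode out of an i7 read that starts with the example adapter. The function name and the choice to return None for unmatched reads are assumptions made for the example.

```python
ADAPTER = "CAAGCAGAAGACGGCATACGAGAT"  # 24 nt adapter from the example layout
BARCODE_LENGTH = 8
KNOWN_BARCODES = {"CGTACTAG", "TCCTGAGC"}  # the two example barcodes


def extract_barcode(index_read: str):
    """Return the barcode that follows the adapter, or None if nothing matches."""
    if not index_read.startswith(ADAPTER):
        return None
    start = len(ADAPTER)
    barcode = index_read[start:start + BARCODE_LENGTH]
    return barcode if barcode in KNOWN_BARCODES else None


print(extract_barcode(ADAPTER + "CGTACTAG"))  # a read carrying the first barcode
print(extract_barcode(ADAPTER + "AAAAAAAA"))  # an unknown barcode
```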
Plug these values into the options for Collapse UMI Duplicates & Separate Barcodes below:
The options for Discard barcodes containing fewer than: should be adjusted to reflect the sequencing depth of your dataset.
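The effect of a minimum-count threshold can be pictured with the short sketch below. It assumes the threshold is a simple read count per barcode; the function name and the toy read counts are hypothetical.

```python
from collections import Counter


def filter_barcodes(observed_barcodes, min_reads):
    """Keep only barcodes seen at least `min_reads` times."""
    counts = Counter(observed_barcodes)
    return {barcode: n for barcode, n in counts.items() if n >= min_reads}


# Toy data: two real barcodes plus one rare barcode that is likely a sequencing error.
observed = ["CGTACTAG"] * 5 + ["TCCTGAGC"] * 3 + ["CGTACTAA"]
print(filter_barcodes(observed, min_reads=3))
```

A deeper sequencing run generally produces more reads per genuine barcode, so a higher threshold can be used without losing real samples.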
Note that you can designate your barcodes to have different names, like the sample ID. To learn more see How to Specify Barcodes.
Next select the following options:
Adapter/Barcode/UMI are present at
This option will depend on whether the name of the index read sequence document came first alphabetically. See the above section for further clarification.
- If the index document name was first alphabetically, select the 5' end
- If the index document name was last alphabetically, select the 3' end
- This is the case in the example documents from the above section.
Allow single mismatch in UMI, barcode, and TSO
We recommend turning this off for index reads, as the barcodes are short and two distinct barcodes may differ by only 1 bp.
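One way to check whether mismatch tolerance is risky for your own barcode set is to compute the smallest Hamming distance between any two barcodes. The sketch below does this for the two example barcodes plus a hypothetical third barcode (CGTACTAA) added to show a close pair.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Count the positions at which two equal-length barcodes differ."""
    if len(a) != len(b):
        raise ValueError("barcodes must have the same length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(barcodes) -> int:
    """Return the smallest Hamming distance between any two barcodes."""
    return min(hamming(a, b) for a, b in combinations(barcodes, 2))


print(min_pairwise_distance(["CGTACTAG", "TCCTGAGC"]))
print(min_pairwise_distance(["CGTACTAG", "TCCTGAGC", "CGTACTAA"]))
```

If the minimum distance is 1 or 2, a read with a single mismatch can match more than one barcode equally well, which is why leaving the option off is the safer default.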
All other options
These can be changed according to preference. See our main article, Collapse UMI Duplicates and Separate Barcodes, for more details. Note that if UMIs are not present (UMI box left unchecked), steps mentioning UMIs will not be performed.
Next Steps
After running the job, the resulting document name will provide more info on the number of barcodes found. For example: 500K reads from vdj_v1_hs_cd19_b_S1_L001_R_001 (330 barcodes with between 100 and 348 sequences).
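If you want to record these numbers outside Biologics, the summary in the document name can be parsed with a regular expression. This sketch assumes every name follows exactly the pattern of the example above, which may not hold for all outputs.

```python
import re

# Assumed pattern, based only on the single example name shown above.
SUMMARY_PATTERN = re.compile(
    r"\((\d+) barcodes with between (\d+) and (\d+) sequences\)"
)


def parse_barcode_summary(document_name: str):
    """Extract barcode counts from a result document name, or return None."""
    match = SUMMARY_PATTERN.search(document_name)
    if match is None:
        return None
    n_barcodes, smallest, largest = (int(group) for group in match.groups())
    return {
        "barcodes": n_barcodes,
        "min_sequences": smallest,
        "max_sequences": largest,
    }


name = ("500K reads from vdj_v1_hs_cd19_b_S1_L001_R_001 "
        "(330 barcodes with between 100 and 348 sequences)")
print(parse_barcode_summary(name))
```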
If these results are not what you expected, changing the Adapter/Barcode/UMI are present at option from 5' to 3' (or vice versa) may resolve the issue. Please reach out to us if you encounter any issues.
The output document can then be analysed using our Single Clone Antibody Analysis pipeline. See the article NGS Antibody Analysis for instructions.
Processing dual-indexed read libraries in Biologics
Concatenating the i5 and i7 reads
To concatenate the index reads you will need to use other sequence-editing software, for example Geneious Prime. In the image below, two index reads have been selected. Select Tools > Concatenate Sequences or Alignments to bring up the options.
Make note of the order and make sure the option index in sequence list is selected, then click OK to concatenate the sequences.
This should output one file in which each sequence contains the i5 and i7 reads joined end to end. Upload this concatenated read file into Geneious Biologics. See Uploading Files if you are unsure of how to do this.
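If you prefer to script this step rather than use a desktop tool, the idea can be sketched in plain Python. This is an alternative to the Geneious Prime workflow, not part of it. It assumes uncompressed four-line FASTQ records, that both files list reads in the same order, and that the i5 read should come first; the function names are hypothetical.

```python
import io
from itertools import zip_longest


def read_fastq(handle):
    """Yield (header, sequence, quality) from a four-line-per-record FASTQ."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        quality = handle.readline().rstrip("\n")
        yield header, sequence, quality


def concatenate_index_reads(i5_handle, i7_handle, out_handle):
    """Write one record per read, with the i5 read followed by the i7 read."""
    pairs = zip_longest(read_fastq(i5_handle), read_fastq(i7_handle))
    for i5, i7 in pairs:
        if i5 is None or i7 is None:
            raise ValueError("i5 and i7 files contain different numbers of reads")
        # Keep the i5 header; join sequences and qualities in the same order.
        out_handle.write(f"{i5[0]}\n{i5[1] + i7[1]}\n+\n{i5[2] + i7[2]}\n")


# In-memory demonstration with toy records; real use would open files instead.
i5_file = io.StringIO("@read1\nAAAA\n+\nIIII\n")
i7_file = io.StringIO("@read1\nCCCC\n+\nFFFF\n")
out = io.StringIO()
concatenate_index_reads(i5_file, i7_file, out)
print(out.getvalue())
```

The sketch does not check that read identifiers match between the two files, so verify the order of your inputs before relying on the output.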
Merging the R1 and R2 reads
Using Geneious Biologics, merge the R1 and R2 reads together using Pre-Processing > Set & Merge Paired Reads. For more info, see Merging Paired Reads.
This will output two documents, one containing the pairs that could be merged and the other containing the pairs that could not be merged. The successfully merged reads will be used in the next step.
Pairing the merged reads and concatenated index read files
Select the concatenated i5-i7 sequence list file and the merged reads sequence list file from the above step. Select Pre-Processing > Set & Merge Paired Reads again, but this time choose Set paired reads only.
The following settings are used to pair reads only:
Depending on the names of the two files (the merged R1+R2 reads and the concatenated i5-i7 read), one will be designated as the forward (or 5' end) of the associated sequences, while the other will be designated as the reverse (or 3' end).
This is determined by the alphabetical order of the two names:
- The sequence list document name that would come first when sorting the two is designated as the forward (5') read.
- The sequence list document name that would come last when sorting the two is designated as the reverse (3') read.
This can easily be determined by sorting your sequences by name (ascending):
In this case the index reads (i5.fastq - i7.fastq) have been designated as the 3' end. This will be important for the next step.
Running Collapse UMI Duplicates & Separate Barcodes
Now that you have paired your i5-i7 index read with your assembled (merged) sequences, you can run Pre-Processing > Collapse UMI Duplicates & Separate Barcodes.
For this example, we will use the following as the general format of the concatenated i5+i7 read:
- AATGATACGGCGACCACCGAGATCTACACNNNNNNNNCAAGCAGAAGACGGCATACGAGATNNNNNNNN
  - Where AATGATACGGCGACCACCGAGATCTACAC is the adapter of the i5 read, with a length of 29 nt.
  - Where NNNNNNNNCAAGCAGAAGACGGCATACGAGATNNNNNNNN is the "barcode", with a length of 40 nt (8 + 24 + 8).
    - The first NNNNNNNN is the true i5 barcode
    - The last NNNNNNNN is the true i7 barcode
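To show how this layout breaks down, the sketch below slices a concatenated i5+i7 read into the combined "barcode" region and the two true barcodes it contains. It is a conceptual illustration only: the function name is hypothetical, and the i5 barcode TAGATCGC in the demonstration is an assumed example value.

```python
I5_ADAPTER = "AATGATACGGCGACCACCGAGATCTACAC"  # i5 adapter from the example layout
I7_ADAPTER = "CAAGCAGAAGACGGCATACGAGAT"       # i7 adapter from the example layout
BARCODE_LENGTH = 8
# The combined region spans: i5 barcode + i7 adapter + i7 barcode.
REGION_LENGTH = BARCODE_LENGTH + len(I7_ADAPTER) + BARCODE_LENGTH


def split_concatenated_read(read: str):
    """Return (combined_region, i5_barcode, i7_barcode) for a concatenated read."""
    start = len(I5_ADAPTER)
    region = read[start:start + REGION_LENGTH]
    if len(region) < REGION_LENGTH:
        raise ValueError("read is shorter than the expected layout")
    i5_barcode = region[:BARCODE_LENGTH]
    i7_barcode = region[BARCODE_LENGTH + len(I7_ADAPTER):]
    return region, i5_barcode, i7_barcode


# Assumed i5 barcode (TAGATCGC) paired with the example i7 barcode CGTACTAG.
example = I5_ADAPTER + "TAGATCGC" + I7_ADAPTER + "CGTACTAG"
region, i5_barcode, i7_barcode = split_concatenated_read(example)
print(len(I5_ADAPTER), len(region), i5_barcode, i7_barcode)
```

Because the i7 adapter sits in the middle of the combined region, it is identical across samples and does not affect which group a read falls into.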
Note: In this example, we will use the length setting for identifying barcodes. However, if you would like to specify barcodes by name (as explained in the single-indexed section above), you can trim the adapter off both the i5 and i7 reads before concatenating the two index reads. This will result in a final concatenated sequence of NNNNNNNNNNNNNNNN (the i5 and i7 barcodes). You can then name the barcodes, for example by sample ID, as outlined here: How to Specify Barcodes.
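The trimming approach described in this note can be sketched as follows. The sample sheet, sample names, and i5 barcodes are hypothetical values chosen for illustration, and the helper functions are not part of Biologics or Geneious Prime.

```python
I5_ADAPTER = "AATGATACGGCGACCACCGAGATCTACAC"
I7_ADAPTER = "CAAGCAGAAGACGGCATACGAGAT"

# Hypothetical sample sheet: (i5 barcode, i7 barcode) -> sample ID.
SAMPLE_SHEET = {
    ("TAGATCGC", "CGTACTAG"): "sample_A",
    ("CTCTCTAT", "TCCTGAGC"): "sample_B",
}

# Combined 16 nt barcodes, as they appear after trimming and concatenation.
BARCODE_NAMES = {i5 + i7: name for (i5, i7), name in SAMPLE_SHEET.items()}


def trim_adapter(read: str, adapter: str) -> str:
    """Remove a leading adapter, raising if the read does not start with it."""
    if not read.startswith(adapter):
        raise ValueError("read does not start with the expected adapter")
    return read[len(adapter):]


def sample_for(i5_read: str, i7_read: str):
    """Look up the sample ID for a pair of untrimmed index reads."""
    combined = trim_adapter(i5_read, I5_ADAPTER) + trim_adapter(i7_read, I7_ADAPTER)
    return BARCODE_NAMES.get(combined)


print(sample_for(I5_ADAPTER + "TAGATCGC", I7_ADAPTER + "CGTACTAG"))
```

Pairs that are not listed in the sample sheet return None, which can help flag unexpected i5/i7 combinations.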
Plug these values into the options for Collapse UMI Duplicates & Separate Barcodes below:
The options for Discard barcodes containing fewer than: should be adjusted to reflect the sequencing depth of your dataset.
Next select the following options:
Adapter/Barcode/UMI are present at
This option will depend on whether the name of the index read sequence document came first alphabetically. See the above section for further clarification.
- If the index document name was first alphabetically, select the 5' end
- If the index document name was last alphabetically, select the 3' end
- This is the case in the example documents from the above section.
Allow single mismatch in UMI, barcode, and TSO
We recommend turning this off for index reads, as the barcodes are short and two distinct barcodes may differ by only 1 bp.
All other options
These can be changed according to preference. See our main article, Collapse UMI Duplicates and Separate Barcodes, for more details. Note that if UMIs are not present (UMI box left unchecked), steps mentioning UMIs will not be performed.
Next Steps
After running the job, the resulting document name will provide more info on the number of barcodes found. For example: 500K reads from vdj_v1_hs_cd19_b_S1_L001_R_001 (330 barcodes with between 100 and 348 sequences).
If these results are not what you expected, changing the Adapter/Barcode/UMI are present at option from 5' to 3' (or vice versa) may resolve the issue. Please reach out to us if you encounter any issues.
The output document can then be analysed using our Single Clone Antibody Analysis pipeline. See the article NGS Antibody Analysis for instructions.