This article explains how to set paired reads without merging them, which is useful when merge rates with BBMerge are low. Merging paired reads is closely related and is discussed here.
The video below gives a general introduction to pre-processing in Biologics. The first few videos in our Getting Started series may also be helpful, linked here.
Why use paired reads and not merged reads?
If your dataset has very low merge rates with BBMerge, you can instead pair the reads and feed these paired reads straight into the Biologics Annotation Pipelines. All of the Annotation pipelines can assemble paired reads together using the underlying logic of the reference database, leading to a better merge rate for antibody sequences.
- Note: this does not apply to the Antibody Annotator setting for Both chains in associated sequences. In that case, the pairing of reads is used to represent paired heavy-light chains, rather than reads covering a single chain which can be merged/assembled.
Using Set & Merge Paired Reads
Read files from paired-end sequencing must be paired before assembly. This can be done using the Set & merge paired reads operation.
To set paired reads, select one or more sequence list documents and click the Set & merge paired reads option under the Pre-processing menu. Below are the pairing options available for sequence list documents:
- Pairs of documents (most common) - The sequence lists are paired together and each read in one list is paired with the read at the same position in the other list. You will need to select 2 or more sequence lists to pair the sequences.
- Note: The selected lists are sorted alphabetically and then assigned to "forward" or "reverse" in the order those directions appear in the chosen Read Orientation option. For example, with the Illumina mate ends - outward pointing option, the list that comes first alphabetically is treated as the list containing the reverse reads.
- Interlaced sequences within each document - Use this when both reads in a pair are in the same file, one after the other, i.e. the first read is paired with the second, the third is paired with the fourth, etc. The selected sequence list must contain an even number of reads.
- Split each sequence in half (Not recommended) - This method was primarily used by the Polonator, an older sequencing technology. Use this when both reads in a pair have been concatenated together into a single read in the fastq file. The read will be split into two equal halves which are treated as pairs. All reads must be of the same length and have an even number of nucleotides to use this option.
- Match sequence names (Not recommended) (for standalone sequences, or within each list) - Pairs are identified by sorting the reads by name and then pairing adjacent reads whose names differ by a single character, as long as the two differing characters are in this list of possible pairs. Use this when the pairs may be in a random order or some reads may have their mate pair missing.
Note that if you selected standalone sequence documents, the only pairing options available are Split each sequence in half and Match sequence names (for standalone sequences, or within each list).
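The four pairing options above can be illustrated with a minimal Python sketch. This is not how Biologics implements them; it only mirrors the rules described in this article. The function names, the `(name, sequence)` tuple representation, the `/1` and `/2` suffixes, and the `ALLOWED_SUFFIX_PAIRS` set are all hypothetical and chosen for illustration.

```python
from typing import List, Tuple

Read = Tuple[str, str]   # (name, sequence); an illustrative representation
Pair = Tuple[Read, Read]

# Hypothetical set of character pairs that may distinguish two mates.
# The actual list used by Biologics is not reproduced here.
ALLOWED_SUFFIX_PAIRS = {frozenset("12"), frozenset("FR")}


def pair_documents(list_a: List[Read], list_b: List[Read]) -> List[Pair]:
    """Pairs of documents: read i of one list pairs with read i of the other."""
    if len(list_a) != len(list_b):
        raise ValueError("both lists must contain the same number of reads")
    return list(zip(list_a, list_b))


def pair_interlaced(reads: List[Read]) -> List[Pair]:
    """Interlaced: 1st pairs with 2nd, 3rd with 4th, and so on."""
    if len(reads) % 2:
        raise ValueError("an even number of reads is required")
    return [(reads[i], reads[i + 1]) for i in range(0, len(reads), 2)]


def split_in_half(reads: List[Read]) -> List[Pair]:
    """Split each concatenated read into two equal halves."""
    if len({len(seq) for _, seq in reads}) > 1:
        raise ValueError("all reads must be the same length")
    pairs = []
    for name, seq in reads:
        if len(seq) % 2:
            raise ValueError("reads must have an even number of nucleotides")
        half = len(seq) // 2
        pairs.append(((name + "/1", seq[:half]), (name + "/2", seq[half:])))
    return pairs


def _names_match(a: str, b: str) -> bool:
    """True if the names differ at exactly one position by an allowed pair."""
    if len(a) != len(b):
        return False
    diffs = [(x, y) for x, y in zip(a, b) if x != y]
    return len(diffs) == 1 and frozenset(diffs[0]) in ALLOWED_SUFFIX_PAIRS


def pair_by_name(reads: List[Read]) -> List[Pair]:
    """Sort by name, then pair adjacent reads whose names match."""
    ordered = sorted(reads, key=lambda r: r[0])
    pairs, i = [], 0
    while i < len(ordered) - 1:
        if _names_match(ordered[i][0], ordered[i + 1][0]):
            pairs.append((ordered[i], ordered[i + 1]))
            i += 2
        else:
            i += 1  # read has no mate next to it; leave it unpaired
    return pairs
```

In this sketch, only name matching tolerates reads in random order or with a missing mate. The other three options depend entirely on position, which is why they place stricter requirements on the input.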
As different sequencing technologies use different methods for generating reads, you can choose the appropriate settings for your data.
- Read Orientation - Different sequencing technologies orientate their paired reads differently.
- Expected length of sequenced region - Only applies to paired reads. This is the estimated distance between the outer ends of the two reads in a pair. All paired read data will have a known expected distance between each pair given the experimental design. It is important that you set this to the correct value to achieve good results when assembling with the Annotation pipelines.
- Read technology - This includes the sequencing platform and the types of reads generated.
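As a rough guide to choosing a value for Expected length of sequenced region, the distance between the outer ends of a pair can be estimated from the two read lengths and the gap or overlap between them. The sketch below is simple arithmetic under the assumption of inward-pointing forward and reverse reads. The function name and parameters are hypothetical and are not part of Biologics.

```python
def expected_region_length(read1_len: int, read2_len: int, inner_distance: int) -> int:
    """Estimate the distance between the outer ends of a read pair.

    inner_distance is the number of unsequenced bases between the two reads.
    Use a negative value when the reads overlap by that many bases.
    """
    length = read1_len + read2_len + inner_distance
    if length < max(read1_len, read2_len):
        raise ValueError("overlap cannot exceed the length of the shorter read")
    return length


# Two 300 bp reads overlapping by 150 bp span a 450 bp region.
overlapping = expected_region_length(300, 300, -150)

# Two 150 bp reads separated by a 200 bp gap span a 500 bp region.
gapped = expected_region_length(150, 150, 200)
```

The value that matters is the one implied by your library design (for example, the amplicon length), so treat this calculation as a sanity check rather than a substitute for that.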
Finally, select Set paired reads only as the Merge Rate and click Run. This will generate a new paired reads document containing the paired reads.