Setting Paired Reads

Updated: December 12, 2022 05:47

This article discusses setting paired reads without merging. Merging paired reads is closely related and is discussed here.

The video below gives a general introduction to pre-processing in Biologics. The first few videos in our Getting Started series, linked here, may also be helpful.

Using Set & Merge Paired Reads

Read files from paired-end sequencing must be paired before assembly. This can be done using the Set & merge paired reads operation.

To set paired reads, select one or more sequences, or one or more sequence list documents, and click the Set & merge paired reads option under the Pre-processing menu. Depending on your sequencing data, you will need to specify how the reads should be paired. Below are the pairing options available for sequence list documents:

Interlaced sequences within each document - Both reads in a pair are in the same file, one after the other, i.e. the first sequence is paired with the second, the third with the fourth, etc. You need to select a sequence list containing an even number of sequences to use this option.
Pairs of documents (most common) - The sequence lists are paired together and each sequence in one list is paired with the sequence at the same position in the other list. You will need to select 2 or more sequence lists to pair the sequences.
- Note: Lists are assigned to "forward" or "reverse" according to the alphabetical order of the selected lists, following the order in which those terms appear in the chosen Relative Orientation option. For example, with the Illumina mate pairs - outward pointing option, the list that comes first alphabetically is treated as the list containing the reverse reads.
Split each sequence in half - Use this when both reads in a pair have been concatenated into a single sequence in the FASTQ file. Each sequence is split into two equal halves, which are treated as a pair. All sequences must be the same length and have an even number of nucleotides to use this option.
Match sequence names (for standalone sequences, or within each list) - Pairs are identified by sorting the sequences by name and then pairing adjacent sequences whose names differ by a single character, provided the two differing characters are in the list of possible pairs. Use this when the pairs may be in a random order or some reads may be missing their mate.
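
To make the first three pairing options more concrete, here is a minimal Python sketch of the logic each one describes. It is illustrative only and is not how the Set & merge paired reads operation is implemented; the function names are hypothetical, sequences are represented as plain strings, and the requirement that two paired lists have equal length is an assumption not stated in this article.

```python
def pair_interlaced(seqs):
    """Pair 1st with 2nd, 3rd with 4th, etc. within a single list."""
    if len(seqs) % 2 != 0:
        raise ValueError("interlaced pairing needs an even number of sequences")
    return [(seqs[i], seqs[i + 1]) for i in range(0, len(seqs), 2)]


def pair_documents(list_a, list_b):
    """Pair each sequence with the one at the same position in the other list."""
    # Assumption for this sketch: both lists hold the same number of reads.
    if len(list_a) != len(list_b):
        raise ValueError("both lists must contain the same number of sequences")
    return list(zip(list_a, list_b))


def split_in_half(seqs):
    """Split each concatenated sequence into two equal halves."""
    if len({len(s) for s in seqs}) > 1:
        raise ValueError("all sequences must be the same length")
    if any(len(s) % 2 != 0 for s in seqs):
        raise ValueError("sequences must have an even number of nucleotides")
    return [(s[: len(s) // 2], s[len(s) // 2:]) for s in seqs]
```

For example, `split_in_half(["ACGTTT"])` yields one pair made of the halves `ACG` and `TTT`, while a list with an odd number of sequences is rejected by `pair_interlaced`.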

Note: If you selected standalone sequence documents, the only pairing options available are Split each sequence in half and Match sequence names (for standalone sequences, or within each list).
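
The name-matching rule described for the Match sequence names option can be sketched in Python as follows. This is a rough illustration under stated assumptions, not the actual implementation: the `ALLOWED_PAIRS` set is hypothetical (the real list of possible pairs is the one shown in the operation's options), and names of different lengths are simply treated as non-matching.

```python
# Hypothetical character pairs that may distinguish mates, e.g. "/1" vs "/2".
ALLOWED_PAIRS = {("1", "2"), ("F", "R")}


def single_difference(name_a, name_b):
    """Return the differing characters if the names differ at exactly one position."""
    if len(name_a) != len(name_b):
        return None
    diffs = [(a, b) for a, b in zip(name_a, name_b) if a != b]
    return diffs[0] if len(diffs) == 1 else None


def pair_by_name(names, allowed=ALLOWED_PAIRS):
    """Sort names, then pair adjacent names that differ by one allowed character."""
    ordered = sorted(names)
    pairs, unpaired = [], []
    i = 0
    while i < len(ordered):
        if i + 1 < len(ordered):
            diff = single_difference(ordered[i], ordered[i + 1])
            if diff is not None and (diff in allowed or diff[::-1] in allowed):
                pairs.append((ordered[i], ordered[i + 1]))
                i += 2  # both names are consumed by this pair
                continue
        unpaired.append(ordered[i])  # no mate found for this read
        i += 1
    return pairs, unpaired
```

Because the names are sorted first, the input order does not matter, and reads whose mate is missing end up in the unpaired list rather than being paired incorrectly.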

As different sequencing technologies use different methods of generating reads, you should choose the appropriate settings for your data:

Relative Orientation - Different sequencing technologies orientate their paired reads differently.
Expected distance/Insert size - This is the estimated distance between the outer ends of the reads. All paired read data will have a known expected distance between each pair given the experimental design. It is important you set this to the correct value to achieve good results when assembling.
Read technology - This includes the sequencing platform and the types of reads generated.
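
As a rough guide to the Expected distance/Insert size value, the distance between the outer ends of a pair is the sum of the two read lengths and the gap between their inner ends. The short Python sketch below shows that arithmetic; the function and parameter names are hypothetical, and the numbers are made-up examples rather than recommended values for any platform.

```python
def expected_distance(read_len_1, read_len_2, inner_gap):
    """Distance between the outer ends of a read pair.

    inner_gap is the distance between the inner ends of the two reads;
    it is negative when the reads overlap.
    """
    distance = read_len_1 + inner_gap + read_len_2
    if distance <= 0:
        raise ValueError("read lengths and gap give a non-positive distance")
    return distance


# Two 150 bp reads separated by a 200 bp gap span 500 bp end to end.
spaced = expected_distance(150, 150, 200)

# Two 250 bp reads overlapping by 100 bp span 400 bp end to end.
overlapping = expected_distance(250, 250, -100)
```

In practice the value usually comes from the library preparation protocol, so this calculation is mainly useful as a sanity check on the number you enter.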

Finally, select Set paired reads only as the Output format and click Run. This will generate a new paired reads document containing the paired reads.