This article describes the Repair Sequences operation. Sequences can be repaired following annotation, if you have observed that some of them could not be annotated correctly or contain low quality/ambiguous bases. Repairing consists of replacing low quality bases or entire regions in your sequences with the corresponding germline reference database sequences used to annotate them. Repair Sequences is currently only supported for results produced by the Antibody Annotator pipeline.
The purpose of repair is to enable you to salvage sequences that are otherwise difficult to compare against your high quality sequences due to sequencing errors or missing data.
Repair Sequences is an alpha feature and is in active development, so it may not be as mature as our other analysis pipelines. Please contact us if you have any suggestions or encounter any issues.
How to Get Started
To repair your sequences, select a Biologics Annotator Result document in the documents viewer, then select all sequences or a subset that you wish to check for repair.
In the dropdown labelled Post-processing above the sequence selection viewer, click on Repair Sequences... to bring up the operation options.
The Repair Precision section lets you choose how repairs proceed. The top option, Replace entire region for all selected repair regions, replaces the selected regions in every sequence, regardless of whether low quality bases are present. Replace entire region when poor quality bases are present replaces a selected region in a sequence only if one or more poor quality bases are identified within that region. Finally, Replace only poor quality bases (where possible) scans all bases in each region selected for repair and replaces only those that are classified as low quality.
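The three Repair Precision options can be read as a small decision rule. The sketch below is only an illustration of that rule as described above, not the product's implementation; the enum members, the function name `plan_region` and the returned action strings are all hypothetical.

```python
from enum import Enum


class RepairPrecision(Enum):
    # Hypothetical names standing in for the three radio options.
    REPLACE_ALL_REGIONS = 1      # Replace entire region for all selected repair regions
    REPLACE_REGION_IF_POOR = 2   # Replace entire region when poor quality bases are present
    REPLACE_POOR_BASES = 3       # Replace only poor quality bases (where possible)


def plan_region(mode: RepairPrecision, has_poor_bases: bool) -> str:
    """Return the action for one selected region in one sequence."""
    if mode is RepairPrecision.REPLACE_ALL_REGIONS:
        # Replaced regardless of base quality.
        return "replace_region"
    if not has_poor_bases:
        # The other two modes do nothing when no poor quality base was found.
        return "skip"
    if mode is RepairPrecision.REPLACE_REGION_IF_POOR:
        return "replace_region"
    return "replace_bases"
```

For example, `plan_region(RepairPrecision.REPLACE_POOR_BASES, True)` yields `"replace_bases"`, while the same mode with no poor quality bases yields `"skip"`. This sketch deliberately ignores truncated or partially identified regions.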
Regions to repair lets you choose one or more light and heavy chain regions to repair in your sequences. The available regions are FR1, CDR1, FR2, CDR2, FR3 and FR4; CDR3 is intentionally excluded due to its high variability. To select more than one region, hold Control/Command and click on multiple options. The most common case is sequences with truncated ends, for which you could select the FR1 or FR4 regions for repair.
The Base quality threshold input field lets you specify the minimum base quality required for a base to be qualified as high quality. If a base's quality is below this value, it will be classed as low quality and flagged for repair. Set this value to 0 to refrain from using base qualities when determining low quality regions. If your sequences do not have associated qualities, this option will be ignored.
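As a rough illustration of how the threshold classifies bases, the sketch below flags every base whose quality is below the threshold, treats a threshold of 0 as "do not use qualities", and ignores sequences without qualities. The function name and the representation of qualities as a list of integers are assumptions made for the example.

```python
from typing import List, Optional, Sequence


def low_quality_positions(qualities: Optional[Sequence[int]], threshold: int) -> List[int]:
    """Return the 0-based positions of bases classed as low quality."""
    # No qualities attached, or threshold set to 0: quality is not used.
    if qualities is None or threshold <= 0:
        return []
    # A base is low quality when its score is strictly below the threshold.
    return [i for i, q in enumerate(qualities) if q < threshold]
```

With qualities `[30, 12, 40, 19]` and a threshold of 20, positions 1 and 3 are flagged; a base with quality exactly 20 would not be.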
Conditions for Repair
The Repair Sequences operation runs on each selected sequence and determines whether the region(s) you selected require repair. A region requires repair if at least one of the following conditions is met:
- One or more ambiguous bases are present in the sequence of the region.
- One or more bases with quality scores below the Base quality threshold are present.
- The region was truncated relative to the reference database sequence.
- Only part of the region could be identified.
For the last two conditions, the whole region will be repaired even if single base repair is selected in the Repair Precision section. The reason is that in these cases there is no reference sequence alignment spanning the entire region to enable single base repair.
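The four conditions, and the rule that the last two always lead to whole region repair, can be sketched as follows. This is a simplified model under stated assumptions: the `Region` class and its fields are hypothetical, the sequence is assumed to be ungapped DNA, and any character other than A, C, G or T is treated as ambiguous.

```python
from dataclasses import dataclass
from typing import List, Optional

UNAMBIGUOUS_BASES = set("ACGT")  # assumption: DNA; anything else counts as ambiguous


@dataclass
class Region:
    # Hypothetical container for one annotated region of one sequence.
    sequence: str
    qualities: Optional[List[int]] = None
    truncated: bool = False   # truncated relative to the reference sequence
    partial: bool = False     # only part of the region could be identified


def repair_reasons(region: Region, threshold: int) -> List[str]:
    """List which of the four repair conditions the region meets."""
    reasons = []
    if any(base.upper() not in UNAMBIGUOUS_BASES for base in region.sequence):
        reasons.append("ambiguous")
    if (region.qualities is not None and threshold > 0
            and any(q < threshold for q in region.qualities)):
        reasons.append("low_quality")
    if region.truncated:
        reasons.append("truncated")
    if region.partial:
        reasons.append("partial")
    return reasons


def forces_whole_region(reasons: List[str]) -> bool:
    """Truncated or partial regions cannot be repaired base by base."""
    return "truncated" in reasons or "partial" in reasons
```

A region needs repair when `repair_reasons` returns a non-empty list. For instance, `Region("ACGNT")` yields `["ambiguous"]`, which still allows single base repair, whereas `Region("ACGT", truncated=True)` yields `["truncated"]` and forces whole region repair.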
Regions/bases will not be repaired if:
- There is no gene annotation matching to the target region (i.e. same chain and region).
- The region is completely missing from the target sequence.
- The overlapping gene does not contain the annotation to repair.
How is Repair Conducted?
Whole Region Repair
If a region has been marked for repair, its sequence is entirely replaced by the sequence of the same region from the closest gene match.
For example, if you have selected to repair FR1 and one of your sequences has an ambiguous nucleotide inside its FR1 annotation, it will be marked for repair. If FR1 was annotated as matching the gene IGKV1-9*01, the FR1 sequence for this gene will be taken from the reference database used to originally annotate your sequences. This reference sequence for FR1 will entirely replace the existing FR1 sequence with the ambiguous nucleotide. When a region is replaced, an annotation is added on the entirety of the replaced region(s) and includes the original sequence information for future reference.
Note that the rest of the sequence to either side of the repaired region will remain unaltered.
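A minimal sketch of whole region repair is shown below, assuming the region is given as 0-based, end-exclusive coordinates on the sequence and the reference region sequence has already been looked up. The function name and the dictionary used to stand in for the repair annotation are hypothetical.

```python
from typing import Dict, Tuple


def replace_region(sequence: str, start: int, end: int,
                   reference_region: str) -> Tuple[str, Dict[str, object]]:
    """Replace sequence[start:end] with the reference region sequence."""
    original = sequence[start:end]
    # Flanking sequence on either side is left unaltered.
    repaired = sequence[:start] + reference_region + sequence[end:]
    # Stand-in for the repair annotation: it spans the replaced region
    # and keeps the original sequence for future reference.
    annotation = {
        "start": start,
        "end": start + len(reference_region),
        "original": original,
    }
    return repaired, annotation
```

For example, replacing the region `ACNGT` at positions 3 to 8 of `TTTACNGTGGG` with the reference `ACAGT` gives `TTTACAGTGGG`, and the annotation records `ACNGT` as the original. Because the reference may be longer than a truncated region, the repaired sequence can differ in length from the input.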
Single Base Repair
If a region has been marked for repair, Repair Sequences will step through each base in the region and identify whether it requires repair. Where a base does, it is replaced by the corresponding reference base from the alignment of the repair region against the closest gene match.
Note that insertion bases identified as low quality will not be removed as this may lead to a frameshift.
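Single base repair can be pictured as a walk along the pairwise alignment of the region against its reference. The sketch below assumes the alignment is supplied as two gapped strings of equal length using `-` for gaps, and that flagged positions are 0-based indices into the ungapped query; these representations and the function name are assumptions for illustration only.

```python
from typing import Set


def repair_bases(aligned_query: str, aligned_reference: str,
                 flagged: Set[int]) -> str:
    """Replace flagged query bases with the aligned reference base."""
    if len(aligned_query) != len(aligned_reference):
        raise ValueError("aligned strings must have the same length")
    repaired = []
    position = 0  # index into the ungapped query
    for query_base, reference_base in zip(aligned_query, aligned_reference):
        if query_base == "-":
            # Column with no query base: nothing to emit in this sketch.
            continue
        if position in flagged and reference_base != "-":
            repaired.append(reference_base)
        else:
            # Unflagged bases, and flagged insertions (reference gap),
            # are kept; removing an insertion could shift the frame.
            repaired.append(query_base)
        position += 1
    return "".join(repaired)
```

With query `ACNGTA` aligned to reference `ACAG-A` and positions 2 and 4 flagged, the `N` becomes `A`, while the flagged `T` is an insertion relative to the reference and is kept, giving `ACAGTA`.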
What Happens Next?
Your selected sequences are then re-annotated with Antibody Annotator, using the same options that were used to create the original document. Both the repaired sequences and any selected sequences that did not require repair are re-annotated. The new result should show fewer errors across your sequences.
All repairs are annotated on the sequences so you can see what the Repair Sequences operation has changed. These annotations include the original DNA sequence and translation for region repair, and the original base and base quality for base repair. Mouse over the repair annotations to see this information.