Title: | Functions for working with variant string format |
---|---|
Description: | Contains a series of functions for working with genetic information encoded in variant string format. Includes methods for comparing and manipulating strings. |
Authors: | Bob Verity [aut, cre] |
Maintainer: | Bob Verity <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.7.0 |
Built: | 2025-01-16 15:21:15 UTC |
Source: | https://github.com/mrc-ide/variantstring |
Returns a data.frame of allowed single-letter amino acid codes, as defined by IUPAC (International Union of Pure and Applied Chemistry).
allowed_amino_acids()
Checks that an input string (or a vector of strings) matches the required format for a position string. This is equivalent to a full variant string but with the amino acid information removed, so just giving the gene name(s) and position(s).
check_position_string(x)
x | a character string or vector of character strings. |
Checks that an input string (or a vector of strings) matches the required format for a variant string.
check_variant_string(x)
x | a character string or vector of character strings. |
Compares a target position string against a vector of comparison strings. A match is found if every codon position in every gene of the target is also found within the comparison (irrespective of the observed amino acids).
compare_position_string(target_string, comparison_strings)
target_string | a single position string that we want to compare. |
comparison_strings | a vector of variant strings against which the target is compared. |
Compares a target variant string against a vector of comparison strings. A match is found if every amino acid at every codon position in every gene of the target is also found within the comparison. Note that ambiguous matches may occur if there are multiple heterozygous loci in the comparison. In this case, the target may or may not be within this sample. A match is recorded but a second output also flags this as an ambiguous match.
compare_variant_string(target_string, comparison_strings)
target_string | a single variant string that we want to compare. Cannot contain any heterozygous calls. |
comparison_strings | a vector of variant strings against which the target is compared. |
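The matching rule and the ambiguity flag described for compare_variant_string can be illustrated with a minimal base R sketch. This is not the package's implementation: compare_sketch is a hypothetical helper, it handles only a single gene without read counts, and it assumes the gene:positions:amino_acids layout seen in the crt:72_73:C_N/V example. It also assumes a match is ambiguous when more than one of the target's positions is heterozygous in the comparison.

```r
# Hypothetical helper, not part of variantstring: compare one single-gene
# target against one single-gene comparison string.
compare_sketch <- function(target, comparison) {
  parse <- function(s) {
    p <- strsplit(s, ":", fixed = TRUE)[[1]]            # gene, positions, amino acids
    pos <- strsplit(p[2], "_", fixed = TRUE)[[1]]
    aa <- strsplit(strsplit(p[3], "_", fixed = TRUE)[[1]], "/", fixed = TRUE)
    names(aa) <- pos                                    # alleles keyed by codon position
    list(gene = p[1], aa = aa)
  }
  tg <- parse(target)
  cmp <- parse(comparison)
  # Match: same gene, every target position present, every target allele observed
  found <- tg$gene == cmp$gene &&
    all(names(tg$aa) %in% names(cmp$aa)) &&
    all(mapply(function(pos, a) all(a %in% cmp$aa[[pos]]), names(tg$aa), tg$aa))
  # Assumed ambiguity rule: more than one heterozygous locus among target positions
  n_het <- sum(lengths(cmp$aa[names(tg$aa)]) > 1)
  list(match = found, ambiguous = found && n_het > 1)
}

compare_sketch("crt:72_73:C_N", "crt:72_73:C_N/V")    # match, not ambiguous
compare_sketch("crt:72_73:C_N", "crt:72_73:C/S_N/V")  # match, flagged ambiguous
```

In the second call the sample could contain C_N, or it could contain only C_V and S_N, which is why the match is flagged rather than rejected.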
Count the number of heterozygous loci in each variant string and return as a vector.
count_het_loci(x)
x | a variant string or vector of variant strings. |
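As a rough illustration of what counting heterozygous loci means, the base R sketch below counts the loci whose amino acid field contains a "/" separator. count_het_sketch is a hypothetical helper, not the package function, and it assumes single-gene strings without read counts in the gene:positions:amino_acids layout.

```r
# Hypothetical helper, not part of variantstring: count loci with more than
# one amino acid call in each single-gene variant string.
count_het_sketch <- function(x) {
  vapply(x, function(s) {
    aa_field <- strsplit(s, ":", fixed = TRUE)[[1]][3]   # third field holds amino acids
    loci <- strsplit(aa_field, "_", fixed = TRUE)[[1]]   # one entry per codon position
    sum(grepl("/", loci, fixed = TRUE))                  # "/" marks a heterozygous call
  }, integer(1), USE.NAMES = FALSE)
}

count_het_sketch(c("crt:72_73:C_N/V", "crt:72:C"))  # 1 0
```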
Takes a vector of variant strings and strips any information on read counts.
drop_read_counts(x)
x | a variant string or vector of variant strings. |
Takes a vector of variant strings, potentially with information at multiple codon positions or genes, and returns variant strings corresponding to all unique single-locus variants within the input. For example, crt:72_73:C_N/V can be extracted to crt:72:C, crt:73:N, and crt:73:V.
extract_single_locus_variants(x)
x | a vector of variant strings. |
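The crt:72_73:C_N/V example can be reproduced with a short base R sketch. split_single_locus is a hypothetical helper written for illustration, not the package function. It assumes a single gene, no read counts, and the gene:positions:amino_acids layout with "_" between loci and "/" between alleles.

```r
# Hypothetical helper, not part of variantstring: split one single-gene
# variant string into its unique single-locus variants.
split_single_locus <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)[[1]]       # gene, positions, amino acids
  gene <- parts[1]
  pos <- strsplit(parts[2], "_", fixed = TRUE)[[1]]
  aa <- strsplit(parts[3], "_", fixed = TRUE)[[1]]
  stopifnot(length(pos) == length(aa))               # one amino acid field per position
  out <- unlist(lapply(seq_along(pos), function(i) {
    alleles <- strsplit(aa[i], "/", fixed = TRUE)[[1]]
    paste(gene, pos[i], alleles, sep = ":")          # one string per observed allele
  }))
  unique(out)
}

split_single_locus("crt:72_73:C_N/V")
# "crt:72:C" "crt:73:N" "crt:73:V"
```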
For a variant string with at most one heterozygous locus we can unambiguously define the genotypes that are present in this mixture. This function returns all such component genotypes.
get_consistent_variants(x)
x | a vector of variant strings. |
Takes a list of data.frames in long form and converts each to position string format.
long_to_position(x)
x | a list of data.frames. |
Takes a list of data.frames in long form and converts each to variant string format.
long_to_variant(x)
x | a list of data.frames. |
Reorders a position string in alphabetical order of genes. This can be useful when checking for duplicated strings as the same information may be presented in a different order.
order_position_string(x)
x | a position string or vector of position strings. |
Reorders a variant string in alphabetical order of genes, and then alphabetical order of amino acids at each heterozygous locus. This can be useful when checking for duplicated strings as the same information may be presented in a different order.
order_variant_string(x)
x | a variant string or vector of variant strings. |
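To show why a canonical ordering helps with duplicate detection, the base R sketch below sorts the amino acids within each heterozygous locus. order_het_sketch is a hypothetical helper, not the package function. It covers only the within-locus ordering for single-gene strings without read counts, and leaves out the ordering of genes.

```r
# Hypothetical helper, not part of variantstring: sort the alleles at each
# heterozygous locus of a single-gene variant string.
order_het_sketch <- function(x) {
  vapply(x, function(s) {
    parts <- strsplit(s, ":", fixed = TRUE)[[1]]        # gene, positions, amino acids
    loci <- strsplit(parts[3], "_", fixed = TRUE)[[1]]
    loci <- vapply(loci, function(l) {
      alleles <- strsplit(l, "/", fixed = TRUE)[[1]]
      # radix sort is locale-independent, so the order is reproducible
      paste(sort(alleles, method = "radix"), collapse = "/")
    }, character(1), USE.NAMES = FALSE)
    paste(parts[1], parts[2], paste(loci, collapse = "_"), sep = ":")
  }, character(1), USE.NAMES = FALSE)
}

x <- c("crt:72_73:C_V/N", "crt:72_73:C_N/V")
duplicated(x)                    # FALSE FALSE: the raw strings differ
duplicated(order_het_sketch(x))  # FALSE TRUE: same information once ordered
```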
Extract a position string from a variant string by stripping the amino acids.
position_from_variant_string(x)
x | a character string or vector of character strings. |
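The idea of stripping amino acids can be sketched in base R by keeping only the gene and position fields. position_sketch is a hypothetical helper, not the package function. It assumes single-gene input in the gene:positions:amino_acids layout, and that a position string is simply gene:positions.

```r
# Hypothetical helper, not part of variantstring: keep the gene and position
# fields of each single-gene variant string and drop everything after them.
position_sketch <- function(x) {
  vapply(strsplit(x, ":", fixed = TRUE),
         function(p) paste(p[1:2], collapse = ":"),
         character(1))
}

position_sketch(c("crt:72_73:C_N/V", "crt:72:C"))
# "crt:72_73" "crt:72"
```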
Takes a vector of position strings and expands into a list of data.frames containing the same information in long form.
position_to_long(x)
x | a vector of position strings. |
Given a vector of variant strings and a single position string, subsets all variant strings to only the genes and codons in the position string. Retains read counts at these positions if present.
subset_position(position_string, variant_strings)
position_string | a single position string. |
variant_strings | a variant string or vector of variant strings. |
Takes a vector of variant strings and expands into a list of data.frames containing the same information in long form.
variant_to_long(x)
x | a vector of variant strings. |