Title: | Estimation of Deme Inbreeding Spatial Coefficients with Gradient Descent |
---|---|
Description: | In the early 1970s, Malécot described the relationship between genetic relatedness and physical distance, forming the framework of isolation by distance or, put simply, the principle that pairs that are far apart are less likely to mate. Building on this framework with measures of identity by descent (IBD), we produce a deme inbreeding spatial coefficient (DISC) using "vanilla" gradient descent. For the mathematical formulation of DISC, see: <TODO>. Briefly, we assume that the relatedness between two locations (demes) in space is given by the average pairwise IBD between the two locations, conditional on the distance that separates them. Further, we assume that geographic distance is scaled by a migration rate, which is a global parameter shared by all spatial locations. |
Authors: | Nick Brazeau [aut, cre], Bob Verity [aut] |
Maintainer: | Nick Brazeau <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.5.0 |
Built: | 2024-11-10 06:16:29 UTC |
Source: | https://github.com/nickbrazeau/discent |
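
To make the isolation-by-distance idea above concrete, the R sketch below computes an expected pairwise relatedness between two demes. It assumes (an illustrative assumption, not the documented DISC formula) that the expectation is the mean of the two demes' inbreeding coefficients, decaying exponentially with geographic distance scaled by the global migration parameter m. The helper name `expected_ibd` is hypothetical and not part of discent.

```r
# Hypothetical helper, not part of discent.
# ASSUMPTION: E[IBD_ij] = ((f_i + f_j) / 2) * exp(-d_ij / m)
expected_ibd <- function(fi, fj, geodist, m) {
  ((fi + fj) / 2) * exp(-geodist / m)
}

# Two demes with f = 0.2 and f = 0.4, five distance units apart, m = 10
expected_ibd(0.2, 0.4, geodist = 5, m = 10)   # 0.3 * exp(-0.5), about 0.182
# Under this form, a larger m slows the decay of relatedness with distance
expected_ibd(0.2, 0.4, geodist = 5, m = 100)
```

At zero distance the expectation reduces to the mean of the two coefficients, which is why the exponential term acts purely as a distance discount.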
Particle Swarm Optimization (PSO) is a meta-optimization (meta-heuristic) approach that attempts to find optimal start parameters on the user's behalf, avoiding the grid search that would otherwise be best practice for fine-tuning the gradient descent.
deme_inbreeding_spcoef_pso( discdat, m_lowerbound = 1e-10, m_upperbound = Inf, fi_lowerinit = 0.001, fi_upperinit = 0.3, learn_lowerinit = 1e-10, learn_upperinit = 0.01, lambda_lowerinit = 1e-08, lambda_upperinit = 10, c1 = 2, c2 = 2, w = 0.73, b1 = 0.9, b2 = 0.999, e = 1e-08, finalsteps = 1000, particlesteps = 100, swarmmoves = 50, swarmsize = 25, thin = 1, normalize_geodist = TRUE, report_sd_progress = TRUE, report_fd_progress = TRUE, return_verbose = FALSE )
discdat |
dataframe; The genetic-geographic data by deme (K) |
m_lowerbound |
double; lower limit value for the global "m" parameter; any "m" value encountered that is less than the lower bound will be replaced by the lower bound |
m_upperbound |
double; upper limit value for the global "m" parameter; any "m" value encountered that is greater than the upper bound will be replaced by the upper bound |
fi_lowerinit |
double; the lower bound for the initial deme-inbreeding parameters used to parameterize each swarm particle |
fi_upperinit |
double; as above, the upper bound for the initial deme-inbreeding parameters used to parameterize each swarm particle |
learn_lowerinit |
double; the lower bound for the initial deme-inbreeding learning rate used to parameterize each swarm particle's convergence |
learn_upperinit |
double; as above, the upper bound for the initial deme-inbreeding learning rate used to parameterize each swarm particle |
lambda_lowerinit |
double; similarly, the lower bound for the initial lambda value used to parameterize each swarm particle's convergence |
lambda_upperinit |
double; as above, the upper bound for the initial lambda value used to parameterize each swarm particle |
c1 |
double; the "cognitive" coefficient from the PSO algorithm. Essentially, it dictates how strongly a particle's own best prior position is weighted against the swarm's best historical position in determining the next step of exploration. |
c2 |
double; the "social" coefficient from the PSO algorithm. Essentially, it determines how much influence the other particles in the swarm exert on the current particle in determining the next step of exploration. |
w |
double; the "inertia" coefficient from the PSO algorithm. Essentially, it sets how strongly the current velocity (i.e. the direction the particle is headed) is weighted relative to the particle's and the swarm's prior positions (i.e. prior directions). |
b1 |
double; exponential decay rate for the first moment estimate in the Adam optimization algorithm |
b2 |
double; exponential decay rate for the second moment estimate in the Adam optimization algorithm |
e |
double; epsilon, a small constant added for numerical stability in the Adam optimization algorithm |
finalsteps |
integer; the number of steps taken in the "final run" of the gradient descent |
particlesteps |
integer; the number of steps a particle takes in the vanilla gradient descent algorithm, starting from its newly initialized parameters, to calculate a cost for the new position. Essentially, we run the vanilla gradient descent model at the current position for a small number of iterations to estimate the "favorability of the position's current footing," or "traction," of the position being considered. |
swarmmoves |
integer; the number of iterations, or moves, that the particles within the swarm can explore before the final particle is selected for the "final" run (note: "moves" is reserved for the swarm's actions, while "steps" describes iterations of the gradient descent algorithm). |
swarmsize |
integer; the number of particles in the swarm |
thin |
integer; the thinning interval for the stored output (e.g. if the user specifies 10, every 10th iteration will be kept) |
normalize_geodist |
boolean; whether geographic distances between demes should be normalized (i.e. min-max feature scaling: $X' = \frac{X - \min(X)}{\max(X) - \min(X)}$) |
report_sd_progress |
boolean; whether progress should be reported for the search chain (the swarm search) |
report_fd_progress |
boolean; whether progress should be reported for the final chain (the final gradient descent run) |
return_verbose |
boolean; whether the inbreeding coefficients and migration rate should be returned for every iteration or only for the final iteration. Users typically will not want to store every iteration, which can be memory-intensive. |
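The normalize_geodist and thin arguments above can be sketched in base R as follows. The min-max scaling follows the formula named in the normalize_geodist description; the thinning rule (keeping iterations whose index is a multiple of thin) is an assumption about how "every 10th iteration" is counted. Both helper names are hypothetical.

```r
# Hypothetical helpers illustrating two arguments; not discent internals.

# Min-max feature scaling: maps distances onto [0, 1].
# Note: returns NaN if all distances are equal (max == min).
minmax_scale <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}

# Thinning: keep every `thin`-th stored iteration.
# ASSUMPTION: iterations whose index is a multiple of `thin` are kept.
thin_chain <- function(chain, thin = 1) {
  chain[seq_along(chain) %% thin == 0]
}

minmax_scale(c(10, 20, 30))  # 0.0 0.5 1.0
thin_chain(1:25, thin = 10)  # 10 20
```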
Default values are based on:
Clerc, M., and J. Kennedy. "The Particle Swarm - Explosion, Stability, and Convergence in a Multidimensional Complex Space." IEEE Transactions on Evolutionary Computation 6, no. 1 (February 2002): 58–73.
Shi, Y. H., and R. C. Eberhart. "A Modified Particle Swarm Optimizer." In Proceedings of the IEEE International Conference on Evolutionary Computation, 69–73. Anchorage, Alaska, USA, May 1998.
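For intuition about how c1, c2, and w interact, here is a minimal particle swarm sketch in the inertia-weight form of Shi and Eberhart, minimizing a toy two-parameter cost. In the package, a particle's cost comes from running particlesteps of gradient descent from its position; here a simple quadratic stands in for that. The sketch assumes the inertia-weight update; this page does not say whether discent applies w that way or as a constriction factor in the style of Clerc and Kennedy. pso_minimize and toy_cost are hypothetical names.

```r
# Minimal particle swarm optimizer (inertia-weight form), for illustration only.
pso_minimize <- function(cost, lower, upper, swarmsize = 25, swarmmoves = 50,
                         w = 0.73, c1 = 2, c2 = 2) {
  d <- length(lower)
  # Start each particle uniformly at random within [lower, upper]
  x <- matrix(runif(swarmsize * d, lower, upper), nrow = swarmsize, byrow = TRUE)
  v <- matrix(0, nrow = swarmsize, ncol = d)
  pbest <- x                                # each particle's best position so far
  pbest_cost <- apply(x, 1, cost)
  gbest <- pbest[which.min(pbest_cost), ]   # the swarm's best position so far
  history <- min(pbest_cost)
  for (move in seq_len(swarmmoves)) {
    r1 <- matrix(runif(swarmsize * d), nrow = swarmsize)
    r2 <- matrix(runif(swarmsize * d), nrow = swarmsize)
    gmat <- matrix(gbest, nrow = swarmsize, ncol = d, byrow = TRUE)
    # inertia + cognitive (own best) + social (swarm best) terms
    v <- w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gmat - x)
    x <- x + v
    curr_cost <- apply(x, 1, cost)
    improved <- !is.na(curr_cost) & curr_cost < pbest_cost
    pbest[improved, ] <- x[improved, ]
    pbest_cost[improved] <- curr_cost[improved]
    gbest <- pbest[which.min(pbest_cost), ]
    history <- c(history, min(pbest_cost))
  }
  list(par = gbest, value = min(pbest_cost), history = history)
}

set.seed(42)
toy_cost <- function(p) sum((p - c(1, -2))^2)
res <- pso_minimize(toy_cost, lower = c(-5, -5), upper = c(5, 5))
res$par
```

Because personal bests are only ever replaced by lower-cost positions, the swarm's best cost in `history` never increases, even if individual particles wander.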
The purpose of this statistic is to identify an inbreeding coefficient, or degree of relatedness, for a given location in discrete space. We assume that locations in space can be represented as "demes," such that multiple individuals live in the same deme (i.e. samples are sourced from the same location). The expected pairwise relationship between two individuals, or samples, depends on the inbreeding coefficients of their respective demes and on the geographic distance between those demes. The program assumes a symmetric distance matrix.
deme_inbreeding_spcoef_vanilla( discdat, start_params = c(), lambda = 0.1, learningrate = 0.001, m_lowerbound = 0, m_upperbound = Inf, b1 = 0.9, b2 = 0.999, e = 1e-08, steps = 1000, thin = 1, normalize_geodist = TRUE, report_progress = TRUE, return_verbose = FALSE )
discdat |
dataframe; The genetic-geographic data by deme (K) |
start_params |
named double vector; vector of start parameters. |
lambda |
double; a quadratic (L2) explicit regularization, or penalty, parameter on the "m" parameter. Note, lambda is a scalar such that the penalty added to the cost is $\lambda m^2$. |
learningrate |
double; the learning rate (alpha), which scales how much each "step" is weighted in the gradient descent |
m_lowerbound |
double; lower limit value for the global "m" parameter; any "m" value encountered that is less than the lower bound will be replaced by the lower bound |
m_upperbound |
double; upper limit value for the global "m" parameter; any "m" value encountered that is greater than the upper bound will be replaced by the upper bound |
b1 |
double; exponential decay rate for the first moment estimate in the Adam optimization algorithm |
b2 |
double; exponential decay rate for the second moment estimate in the Adam optimization algorithm |
e |
double; epsilon, a small constant added for numerical stability in the Adam optimization algorithm |
steps |
integer; the number of steps taken as we move down the gradient |
thin |
integer; the thinning interval for the stored output (e.g. if the user specifies 10, every 10th iteration will be kept) |
normalize_geodist |
boolean; whether geographic distances between demes should be normalized (i.e. min-max feature scaling: $X' = \frac{X - \min(X)}{\max(X) - \min(X)}$) |
report_progress |
boolean; whether or not a progress bar should be shown as you iterate through steps |
return_verbose |
boolean; whether the inbreeding coefficients and migration rate should be returned for every iteration or only for the final iteration. Users typically will not want to store every iteration, which can be memory-intensive. |
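The exact cost being minimized is not spelled out on this page. As a hedged sketch, assuming an exponential isolation-by-distance model, a sum-of-squared-errors loss, and an L2 penalty of lambda * m^2 on the migration parameter, the pieces governed by lambda, m_lowerbound, and m_upperbound might look like this. disc_cost and clamp_m are hypothetical helpers, not discent functions.

```r
# Hypothetical sketch; the model form and loss are ASSUMPTIONS.
disc_cost <- function(f, m, dat, lambda = 0.1) {
  # look up each pair's deme inbreeding coefficients by deme name
  fi <- f[as.character(dat$deme1)]
  fj <- f[as.character(dat$deme2)]
  pred <- ((fi + fj) / 2) * exp(-dat$geodist / m)
  sum((dat$gendist - pred)^2) + lambda * m^2   # squared error + L2 penalty on m
}

# Replace out-of-range m values by the nearest bound
clamp_m <- function(m, m_lowerbound = 0, m_upperbound = Inf) {
  min(max(m, m_lowerbound), m_upperbound)
}

dat <- data.frame(deme1 = c("A", "A", "B"), deme2 = c("B", "C", "C"),
                  gendist = c(0.10, 0.05, 0.08), geodist = c(1, 2, 1),
                  stringsAsFactors = FALSE)
f <- c(A = 0.2, B = 0.3, C = 0.1)
disc_cost(f, m = 2, dat = dat, lambda = 0.1)
clamp_m(-0.5)                  # 0
clamp_m(5, m_upperbound = 3)   # 3
```

The penalty grows with m, so a larger lambda pulls the migration estimate toward smaller values.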
The input data frame (discdat) must have the following named columns: "smpl1"; "smpl2"; "deme1"; "deme2"; "gendist"; "geodist", which correspond to: Sample 1 Name; Sample 2 Name; Sample 1 Location; Sample 2 Location; Pairwise Genetic Distance; Pairwise Geographic Distance. Note that the order of the columns does not matter, but the column names must match.
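A quick way to check an input against this naming requirement before fitting is sketched below. check_discdat is a hypothetical helper; the package performs its own input checks, which may differ.

```r
required_cols <- c("smpl1", "smpl2", "deme1", "deme2", "gendist", "geodist")

# Hypothetical helper: stop with an informative message if names are missing
check_discdat <- function(df) {
  absent <- setdiff(required_cols, names(df))
  if (length(absent) > 0) {
    stop("missing column(s): ", paste(absent, collapse = ", "))
  }
  invisible(TRUE)
}

# Column order is irrelevant; only the names matter
toy <- data.frame(geodist = c(1.5, 3.0), gendist = c(0.12, 0.03),
                  smpl1 = c("s1", "s1"), smpl2 = c("s2", "s3"),
                  deme1 = c("A", "A"), deme2 = c("B", "C"),
                  stringsAsFactors = FALSE)
check_discdat(toy)   # passes silently
```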
The start_params vector names must match the cluster (deme) names (i.e. each cluster must have a name that can be matched to its starting relatedness parameter). In addition, you must provide a start parameter for "m".
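One way to build a start_params vector that satisfies this naming rule is sketched below. The start values (0.1 for each deme, 1 for "m") are arbitrary placeholders, not recommended defaults, and the small data frame is hypothetical.

```r
# Hypothetical deme labels taken from a small pairwise data frame
dat <- data.frame(deme1 = c("A", "A", "B"), deme2 = c("B", "C", "C"),
                  stringsAsFactors = FALSE)
demes <- sort(unique(c(dat$deme1, dat$deme2)))

# One named start value per deme, plus the required global "m"
start_params <- c(setNames(rep(0.1, length(demes)), demes), m = 1)
start_params

# Sanity check: every deme has a start value and "m" is present
stopifnot(all(demes %in% names(start_params)), "m" %in% names(start_params))
```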
Note: the "f" inbreeding coefficients are not allowed to be negative; this is enforced internally with a logit transformation, which keeps each "f" strictly between 0 and 1.
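The logit reparameterization can be illustrated with base R's qlogis() (logit) and plogis() (inverse logit): an optimizer can move freely on the logit scale while the back-transformed f stays strictly between 0 and 1. This shows the general technique only; exactly where discent applies the transform internally is not shown here.

```r
f <- c(A = 0.05, B = 0.3)
theta <- qlogis(f)          # logit: log(f / (1 - f)), unconstrained real values
theta_new <- theta - 10     # even a large step on the logit scale...
plogis(theta_new)           # ...maps back to values strictly between 0 and 1
```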
Gradient descent is performed using the Adam (adaptive moment estimation) optimization approach. Default values for the moment decay rates, epsilon, and the learning rate are taken from Kingma and Ba (2014).
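A single Adam update, following Kingma and Ba (2014) with the same default b1, b2, and e as above, can be sketched as below and applied to a toy one-parameter problem. This is a generic illustration, not discent's internal code; adam_step is a hypothetical name, and the moment estimates are called mom1 and mom2 to avoid confusion with the migration parameter "m".

```r
# Sketch of one Adam update (Kingma and Ba, 2014); not discent's internal code.
adam_step <- function(theta, grad, mom1, mom2, step, alpha = 0.001,
                      b1 = 0.9, b2 = 0.999, e = 1e-8) {
  mom1 <- b1 * mom1 + (1 - b1) * grad      # first moment: running mean of gradients
  mom2 <- b2 * mom2 + (1 - b2) * grad^2    # second moment: running mean of squared gradients
  mom1_hat <- mom1 / (1 - b1^step)         # bias-corrected moments
  mom2_hat <- mom2 / (1 - b2^step)
  theta <- theta - alpha * mom1_hat / (sqrt(mom2_hat) + e)
  list(theta = theta, mom1 = mom1, mom2 = mom2)
}

# Toy problem: minimize (theta - 3)^2, whose gradient is 2 * (theta - 3)
state <- list(theta = 0, mom1 = 0, mom2 = 0)
for (step in 1:5000) {
  g <- 2 * (state$theta - 3)
  state <- adam_step(state$theta, g, state$mom1, state$mom2, step, alpha = 0.01)
}
state$theta   # approximately 3
```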
The "vanilla" method does not attempt to optimize start parameters.
Simulated Identity by Descent from Isolation by Distance
IBD_simulation_data
A data frame with 45 rows and 6 columns:
smpl1, smpl2: placeholder sample names
deme1, deme2: placeholder discrete demes
gendist: simulated genetic distances based on identity by descent
geodist: simulated geographic distances
A toy dataset generated by a basic simulation assuming an exponential relationship between relatedness and geographic distance. The data are not representative or generalizable; they are simply meant to be used as input for various tests and function examples.
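For readers who want a comparable toy input, the sketch below shows one way such data could be generated: 10 samples give choose(10, 2) = 45 pairs. This is NOT the script used to build IBD_simulation_data; the three demes, their coordinates, the decay scale (25), and the noise level are all made-up assumptions.

```r
# Illustrative only: NOT the script used to build IBD_simulation_data.
set.seed(1)
# Three hypothetical demes with made-up coordinates
demes <- data.frame(deme = c("A", "B", "C"), x = c(0, 30, 80), y = c(0, 40, 10),
                    stringsAsFactors = FALSE)
n_smpl <- 10
smpl_deme <- rep(demes$deme, length.out = n_smpl)   # A, B, C, A, B, ...
xy <- demes[match(smpl_deme, demes$deme), c("x", "y")]

pr <- t(combn(n_smpl, 2))   # all choose(10, 2) = 45 unique sample pairs
i <- pr[, 1]
j <- pr[, 2]
geodist <- sqrt((xy$x[i] - xy$x[j])^2 + (xy$y[i] - xy$y[j])^2)
# ASSUMPTION: relatedness decays exponentially with distance, plus small noise
gendist <- 0.3 * exp(-geodist / 25) + rnorm(length(geodist), sd = 0.01)
gendist <- pmin(pmax(gendist, 0), 1)   # keep IBD-like values in [0, 1]

sim <- data.frame(smpl1 = paste0("smpl", i), smpl2 = paste0("smpl", j),
                  deme1 = smpl_deme[i], deme2 = smpl_deme[j],
                  gendist = gendist, geodist = geodist,
                  stringsAsFactors = FALSE)
dim(sim)  # 45 6
```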
Overload is: function for determining whether an object is of class vanillaDISCresult
is.vanillaDISCresult(x)
x |
DISC result from deme_inbreeding_spcoef function |
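A class check of this kind is commonly written with base R's inherits(). The sketch below mirrors the documented purpose only; the package's own definition may differ, and the mock object is hypothetical.

```r
# Sketch of a class check via inherits(); mirrors the documented purpose only.
is.vanillaDISCresult <- function(x) inherits(x, "vanillaDISCresult")

mock <- structure(list(), class = "vanillaDISCresult")
is.vanillaDISCresult(mock)          # TRUE
is.vanillaDISCresult(list(m = 1))   # FALSE
```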
Overload the print() function to print the summary only
## S3 method for class 'vanillaDISCresult' print(x, ...)
x |
DISC result from deme_inbreeding_spcoef function |
... |
further arguments passed to or from other methods. |
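As a sketch of how an S3 print() overload of this kind works, the method below prints a short summary for a mock object. The fields f_final and m_final are hypothetical; the real contents of a vanillaDISCresult are not documented on this page.

```r
# Mock result with HYPOTHETICAL fields
mock <- structure(list(f_final = c(A = 0.21, B = 0.35), m_final = 12.4),
                  class = "vanillaDISCresult")

print.vanillaDISCresult <- function(x, ...) {
  # print a short summary instead of the full (possibly long) object
  cat("DISC result\n")
  cat("Number of demes:", length(x$f_final), "\n")
  cat("Final m:", x$m_final, "\n")
  invisible(x)  # print methods conventionally return x invisibly
}

print(mock)
```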
Overload the summary() function.
## S3 method for class 'vanillaDISCresult' summary(object, ...)
object |
DISCresult Simulation |
... |
further arguments passed to or from other methods. |
Method assignment
tidyout(x)
x |
DISC result from deme_inbreeding_spcoef function |
Function for taking the output of a DISC result and lifting it over into a tidy format
## S3 method for class 'vanillaDISCresult' tidyout(x)
x |
DISC result from deme_inbreeding_spcoef function |
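The "Method assignment" entry for tidyout corresponds to the usual S3 pattern of a generic that dispatches with UseMethod() to a class-specific method. A minimal sketch on a mock object with hypothetical fields f_final and m_final might look like this; the real tidyout output format is not documented here, and defining these functions in a session would mask the package's versions.

```r
# Sketch of the S3 generic/method pattern; mock object with HYPOTHETICAL fields.
tidyout <- function(x) UseMethod("tidyout")

tidyout.vanillaDISCresult <- function(x) {
  # one row per deme, with the global m repeated on each row
  data.frame(deme = names(x$f_final), f = unname(x$f_final),
             m = x$m_final, stringsAsFactors = FALSE)
}

mock <- structure(list(f_final = c(A = 0.21, B = 0.35), m_final = 12.4),
                  class = "vanillaDISCresult")
tidyout(mock)
```

Calling tidyout() on an object without a matching class (and no default method) raises an error, which is the standard S3 behavior.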