PlasmoSim returns two types of output:
This tutorial gives example of both outputs, and shows basic visualisation and interpretation functions.
Load necessary packages:
library(PlasmoSim)
library(tidyverse)
#> ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
#> ✔ dplyr 1.1.4 ✔ readr 2.1.5
#> ✔ forcats 1.0.0 ✔ stringr 1.5.1
#> ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
#> ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
#> ✔ purrr 1.0.2
#> ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
#> ✖ dplyr::filter() masks stats::filter()
#> ✖ dplyr::group_rows() masks kableExtra::group_rows()
#> ✖ dplyr::lag() masks stats::lag()
#> ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
First, we need to define our desired sampling strategy using a data.frame. This specifies the demes (partially isolated sub-populations) that we will sample from, at what time points, and how many hosts will be drawn at random from the population:
# define individual-level sampling via a data.frame
sample_df <- data.frame(deme = 1,
time = 365,
n = 100)
Now we run the main simulation function:
# run simulation
sim1 <- sim_falciparum(H = 1000, # human population size
M = 5000, # adult female mosquito population size
seed_infections = 100, # number of infected hosts at time 0
L = 24, # number of loci
sample_dataframe = sample_df # sampling data.frame
)
#> Running simulation
#>
#> simulation completed in 0.0398071 seconds
#> processing output
Daily output is stored in long format, which makes it easy to produce plots:
# basic plot of prevalence in "I" state (infected)
sim1$daily_values |>
ggplot() + theme_bw() +
geom_line(aes(x = time, y = 100 * I / 1000)) +
ylab("Prevalence (%)")
We can see that prevalence jumped to 10% (100 infected hosts in the population of 1000) on day 13. This is because we set the simulation going with 100 seed infections, which here means new liver-stage infections. The default time from liver-stage to blood-stage (the intrinsic incubation period) is set to by default, hence these all emerging together on day 13. After this point we can see dynamic and stochastic changes in the number infected.
Individual-level output is also returned in long format. We will use
the kable
package to make this slightly easier to read:
# take a peek at basic individual-level output, without the haplotypes column
sim1$indlevel |>
kable() |>
kable_styling(bootstrap_options = c("striped", "hover", "condensed")) |>
scroll_box(width = "1000px", height = "400px")
time | deme | sample_ID | positive | haplotypes | haplo_ID |
---|---|---|---|---|---|
365 | 1 | 7 | FALSE | NULL | NULL |
365 | 1 | 15 | FALSE | NULL | NULL |
365 | 1 | 31 | FALSE | NULL | NULL |
365 | 1 | 35 | FALSE | NULL | NULL |
365 | 1 | 54 | FALSE | NULL | NULL |
365 | 1 | 71 | FALSE | NULL | NULL |
365 | 1 | 81 | FALSE | NULL | NULL |
365 | 1 | 82 | FALSE | NULL | NULL |
365 | 1 | 89 | FALSE | NULL | NULL |
365 | 1 | 93 | TRUE | 81, 81, 81, 81, 81, 81, 81, 81, 81, 81, 81, 81, 81, 81, 81, 81, 81, 81, 81, 81, 81, 81, 81, 81 | 7bb9cce5c9911a60f4e7f1e827615768 |
365 | 1 | 95 | FALSE | NULL | NULL |
365 | 1 | 101 | FALSE | NULL | NULL |
365 | 1 | 106 | FALSE | NULL | NULL |
365 | 1 | 113 | FALSE | NULL | NULL |
365 | 1 | 120 | TRUE | 39, 39, 39, 39, 39, 39, 39, 39, 39, 39, 39, 39, 39, 39, 39, 39, 39, 39, 39, 39, 39, 39, 39, 39 | 468815117ba482205c8aa5d5eca5d558 |
365 | 1 | 136 | FALSE | NULL | NULL |
365 | 1 | 142 | TRUE | 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14 | 5a04f475aaafb7d8d57fb74d15afdf98 |
365 | 1 | 159 | FALSE | NULL | NULL |
365 | 1 | 174 | FALSE | NULL | NULL |
365 | 1 | 176 | TRUE | 53, 53, 53, 53, 53, 53, 53, 53, 53, 53, 53, 53, 53, 53, 53, 53, 53, 53, 53, 53, 53, 53, 53, 53 | 0d05caa56fd36e9a1b89ec813cef6de7 |
365 | 1 | 183 | FALSE | NULL | NULL |
365 | 1 | 194 | TRUE | 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17 | a7ec324ac08f16ea346bc7d8a3c6d4bc |
365 | 1 | 196 | FALSE | NULL | NULL |
365 | 1 | 204 | FALSE | NULL | NULL |
365 | 1 | 227 | FALSE | NULL | NULL |
365 | 1 | 240 | FALSE | NULL | NULL |
365 | 1 | 263 | FALSE | NULL | NULL |
365 | 1 | 275 | FALSE | NULL | NULL |
365 | 1 | 284 | FALSE | NULL | NULL |
365 | 1 | 287 | TRUE | 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21 | 5eba0a09b0d61a2da0d5bd4fffc6516f |
365 | 1 | 288 | FALSE | NULL | NULL |
365 | 1 | 291 | FALSE | NULL | NULL |
365 | 1 | 350 | FALSE | NULL | NULL |
365 | 1 | 354 | FALSE | NULL | NULL |
365 | 1 | 356 | TRUE | 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47 | 1938d4f28086e6687ede6b64163e9e3d |
365 | 1 | 363 | FALSE | NULL | NULL |
365 | 1 | 374 | FALSE | NULL | NULL |
365 | 1 | 381 | FALSE | NULL | NULL |
365 | 1 | 390 | TRUE | 47, 89, 47, 89, 47, 89, 47, 89, 47, 89, 47, 89, 47, 89, 47, 89, 47, 89, 47, 89, 47, 89, 47, 89, 47, 89, 47, 89, 47, 89, 47, 89, 47, 89, 47, 89, 47, 89, 47, 89, 47, 89, 47, 89, 47, 89, 47, 89 | 1938d4f28086e6687ede6b64163e9e3d, 4c364f6f36b13c5e685e9a946b2c38c1 |
365 | 1 | 394 | TRUE | 68, 42, 42, 68, 68, 68, 42, 68, 68, 42, 42, 68, 68, 42, 42, 68, 68, 42, 42, 68, 68, 42, 42, 68, 42, 42, 42, 68, 42, 42, 42, 68, 42, 42, 42, 68, 68, 42, 42, 68, 68, 42, 42, 68, 68, 42, 42, 68, 68, 42, 42, 68, 68, 42, 42, 68, 68, 68, 42, 68, 42, 68, 42, 68, 42, 68, 68, 68, 68, 68, 68, 68, 68, 68, 68, 68, 68, 68, 68, 68, 68, 68, 68, 68, 68, 42, 68, 68, 68, 42, 68, 68, 68, 42, 68, 68 | 3a41ce4c33d3dd5808b4c225bfb16708, 2caecb59b554231a5158876dcfdda9c5, d0187d454d53dd08b30854f451e17134, 186af5c5ac9bf632066ea1c6433dac5d |
365 | 1 | 397 | FALSE | NULL | NULL |
365 | 1 | 400 | FALSE | NULL | NULL |
365 | 1 | 401 | FALSE | NULL | NULL |
365 | 1 | 412 | TRUE | 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14 | 5a04f475aaafb7d8d57fb74d15afdf98 |
365 | 1 | 414 | FALSE | NULL | NULL |
365 | 1 | 415 | FALSE | NULL | NULL |
365 | 1 | 417 | FALSE | NULL | NULL |
365 | 1 | 420 | FALSE | NULL | NULL |
365 | 1 | 424 | TRUE | 47, 47, 21, 47, 47, 21, 21, 47, 47, 21, 21, 47, 21, 21, 21, 47, 21, 21, 47, 47, 21, 21, 21, 47, 47, 21, 21, 47, 47, 21, 21, 47, 47, 21, 21, 47, 47, 21, 21, 47, 47, 21, 21, 47, 47, 21, 21, 21, 47, 21, 47, 21, 47, 21, 47, 21, 47, 21, 47, 47, 21, 21, 21, 47, 21, 21, 21, 47, 21, 21, 21, 47, 21, 47, 21, 21, 21, 47, 21, 21, 21, 47, 21, 21, 47, 47, 47, 21, 47, 47, 47, 21, 47, 47, 47, 47 | 67623df4fcabc563d1c6f770084b8404, cd8b80d9780e7baa568ae43a9e427fed, e1a6e7b41ade666e1208f821ff536cb3, e629f1bebe0b02832ecc3fe63f457242 |
365 | 1 | 430 | FALSE | NULL | NULL |
365 | 1 | 436 | FALSE | NULL | NULL |
365 | 1 | 438 | TRUE | 42, 68, 68, 42, 68, 68, 42, 68, 68, 42, 42, 42, 42, 42, 68, 68, 42, 68, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 68, 42, 42, 42, 42, 42, 42, 42, 42, 42, 68, 42, 42, 68, 42, 68, 68, 42, 68, 68, 68, 68, 68, 68, 68, 68, 68, 68, 68, 68, 68, 68, 68, 68, 68, 68, 42, 68, 68, 42, 68, 68, 42, 68 | d50ccfeb495e6304eb2f50069f03decb, 946e804ff63228416dd65529939c9464, 3db0498dafecf7738dd73ec80d75089e |
365 | 1 | 451 | TRUE | 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86 | a3010796957ecae73d6f2f663164dc88 |
365 | 1 | 456 | FALSE | NULL | NULL |
365 | 1 | 457 | FALSE | NULL | NULL |
365 | 1 | 463 | FALSE | NULL | NULL |
365 | 1 | 473 | FALSE | NULL | NULL |
365 | 1 | 479 | FALSE | NULL | NULL |
365 | 1 | 480 | FALSE | NULL | NULL |
365 | 1 | 498 | FALSE | NULL | NULL |
365 | 1 | 517 | FALSE | NULL | NULL |
365 | 1 | 524 | FALSE | NULL | NULL |
365 | 1 | 538 | FALSE | NULL | NULL |
365 | 1 | 546 | TRUE | 89, 89, 89, 89, 89, 89, 89, 89, 89, 89, 89, 89, 89, 89, 89, 89, 89, 89, 89, 89, 89, 89, 89, 89 | 4c364f6f36b13c5e685e9a946b2c38c1 |
365 | 1 | 547 | FALSE | NULL | NULL |
365 | 1 | 549 | FALSE | NULL | NULL |
365 | 1 | 551 | TRUE | 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14 | 5a04f475aaafb7d8d57fb74d15afdf98 |
365 | 1 | 573 | FALSE | NULL | NULL |
365 | 1 | 587 | TRUE | 53, 53, 53, 53, 53, 53, 53, 53, 53, 53, 53, 53, 53, 53, 53, 53, 53, 53, 53, 53, 53, 53, 53, 53 | 0d05caa56fd36e9a1b89ec813cef6de7 |
365 | 1 | 590 | FALSE | NULL | NULL |
365 | 1 | 596 | FALSE | NULL | NULL |
365 | 1 | 641 | FALSE | NULL | NULL |
365 | 1 | 649 | FALSE | NULL | NULL |
365 | 1 | 652 | TRUE | 81, 81, 81, 81, 81, 81, 81, 81, 81, 81, 81, 81, 81, 81, 81, 81, 81, 81, 81, 81, 81, 81, 81, 81 | 7bb9cce5c9911a60f4e7f1e827615768 |
365 | 1 | 681 | TRUE | 87, 87, 87, 87, 87, 87, 87, 87, 87, 87, 87, 87, 87, 87, 87, 87, 87, 87, 87, 87, 87, 87, 87, 87 | 8ff6b39ffbb37ad530e1e02242acd424 |
365 | 1 | 688 | FALSE | NULL | NULL |
365 | 1 | 691 | TRUE | 55, 55, 55, 55, 55, 55, 55, 55, 55, 55, 55, 55, 55, 55, 55, 55, 55, 55, 55, 55, 55, 55, 55, 55 | 83cacee044af7e1edd2c042d6d837a1f |
365 | 1 | 703 | FALSE | NULL | NULL |
365 | 1 | 708 | FALSE | NULL | NULL |
365 | 1 | 726 | FALSE | NULL | NULL |
365 | 1 | 745 | FALSE | NULL | NULL |
365 | 1 | 754 | TRUE | 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42 | e582bccedae1a8c866136c0c59119502 |
365 | 1 | 781 | FALSE | NULL | NULL |
365 | 1 | 785 | FALSE | NULL | NULL |
365 | 1 | 811 | FALSE | NULL | NULL |
365 | 1 | 818 | FALSE | NULL | NULL |
365 | 1 | 830 | FALSE | NULL | NULL |
365 | 1 | 844 | TRUE | 88, 88, 88, 88, 88, 88, 88, 88, 88, 88, 88, 88, 88, 88, 88, 88, 88, 88, 88, 88, 88, 88, 88, 88, 88, 88, 88, 88, 88, 88, 88, 88, 88, 88, 88, 64, 88, 64, 88, 64, 88, 64, 88, 64, 88, 88, 88, 88 | 195c30a32df1dd776b6aa4a20bc2f924, 0ce52b07bc461926e17172e8f37ab893 |
365 | 1 | 866 | FALSE | NULL | NULL |
365 | 1 | 872 | TRUE | 85, 85, 85, 85, 85, 85, 85, 85, 85, 85, 85, 85, 85, 85, 85, 85, 85, 85, 85, 85, 85, 85, 85, 85 | d9fc74b6c8e22d6169af05082e17798d |
365 | 1 | 886 | FALSE | NULL | NULL |
365 | 1 | 897 | TRUE | 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64 | 14e7f028287c07ea9e875126193d6aca |
365 | 1 | 911 | FALSE | NULL | NULL |
365 | 1 | 931 | FALSE | NULL | NULL |
365 | 1 | 947 | FALSE | NULL | NULL |
365 | 1 | 959 | FALSE | NULL | NULL |
365 | 1 | 961 | FALSE | NULL | NULL |
365 | 1 | 963 | FALSE | NULL | NULL |
365 | 1 | 973 | FALSE | NULL | NULL |
365 | 1 | 986 | FALSE | NULL | NULL |
The first few columns tell us when and where (i.e. which deme)
sampling occurred, the ID of the host, and whether they were positive
for malaria parasites. The next two columns give genetic data (positive
samples only). The haplotypes
column gives the raw
information at all 24 loci. Note that each host can be infected with
multiple strains, meaning this element is actually a matrix with one row
for each strain and one column for each locus. We can see this by
printing out the full element for a malaria-positive host:
# print raw genetic data
sim1$indlevel |>
filter(sample_ID == 394) |>
pull(haplotypes)
#> [[1]]
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
#> [1,] 68 68 68 68 68 68 42 42 42 68 68 68 68 68
#> [2,] 42 68 42 42 42 42 42 42 42 42 42 42 42 42
#> [3,] 42 42 42 42 42 42 42 42 42 42 42 42 42 42
#> [4,] 68 68 68 68 68 68 68 68 68 68 68 68 68 68
#> [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24]
#> [1,] 68 42 42 68 68 68 68 68 68 68
#> [2,] 68 68 68 68 68 68 68 42 42 42
#> [3,] 42 42 68 68 68 68 68 68 68 68
#> [4,] 68 68 68 68 68 68 68 68 68 68
Returning to the table, the final column gives the
haplo_ID
. This is a hash of the information contained in
each row of the haplotypes
matrix, meaning each unique
combination of values over all loci will be given a unique name. This
can be very useful when we only care about unique genotypes and not the
locus-by-locus information contained in those genotypes. For example,
for the same individual as before:
# print haplo_ID
sim1$indlevel |>
filter(sample_ID == 394) |>
pull(haplo_ID)
#> [[1]]
#> [1] "3a41ce4c33d3dd5808b4c225bfb16708" "2caecb59b554231a5158876dcfdda9c5"
#> [3] "d0187d454d53dd08b30854f451e17134" "186af5c5ac9bf632066ea1c6433dac5d"
But what do the values in the haplotypes
matrix actually
mean? Although we have described them as haplotypes, they do not (yet)
represent genetic information. Instead, each value specifies the
ancestor that the information is descended from at the start of the
simulation. For example, if we see a value 68 then we know that, at this
locus, the information eventually traces back to the 68th seed infection
at the start of the simulation. There are two reasons for encoding
information like this:
Note that two samples having the value 68 does not imply that they are both direct descendants of the 68th seed infection. Rather, it implies that these samples have a common ancestor some time between the start of the simulation and the present day, and this common ancestor is descended from the 68th seed infection. It is much more likely that the common ancestor is much more recent than going all the way back to the start of the simulation.
Look again at the haplotypes
matrix for this
poly-clonally infected host. Notice that each row contains blocks of the
values 68 and 42:
# print raw genetic data
sim1$indlevel |>
filter(sample_ID == 394) |>
pull(haplotypes)
#> [[1]]
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
#> [1,] 68 68 68 68 68 68 42 42 42 68 68 68 68 68
#> [2,] 42 68 42 42 42 42 42 42 42 42 42 42 42 42
#> [3,] 42 42 42 42 42 42 42 42 42 42 42 42 42 42
#> [4,] 68 68 68 68 68 68 68 68 68 68 68 68 68 68
#> [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24]
#> [1,] 68 42 42 68 68 68 68 68 68 68
#> [2,] 68 68 68 68 68 68 68 42 42 42
#> [3,] 42 42 68 68 68 68 68 68 68 68
#> [4,] 68 68 68 68 68 68 68 68 68 68
This pattern is due to recombination. We could use this matrix to identify blocks of IBD within a sample, or compare this matrix to another to explore IBD between samples.