Package 'dcifer' reference manual

Title:	Genetic Relatedness Between Polyclonal Infections
Description:	An implementation of Dcifer (Distance for complex infections: fast estimation of relatedness), an identity by descent (IBD) based method to calculate genetic relatedness between polyclonal infections from biallelic and multiallelic data. The package includes functions that format and preprocess the data, implement the method, and visualize the results. Gerlovina et al. (2022) <doi:10.1093/genetics/iyac126>.
Authors:	Inna Gerlovina [aut, cre] (ORCID: <https://orcid.org/0000-0002-7772-7473>)
Maintainer:	Inna Gerlovina <[email protected]>
License:	MIT + file LICENSE
Version:	1.5.2
Built:	2026-06-05 06:06:36 UTC
Source:	https://github.com/eppicenter/dcifer

Calculate Allele Frequencies

Description

Calculates population allele frequencies from data, adjusting for COI.

Usage

calcAfreq(dsmp, coi, tol = 1e-04, qstart = 0.5)
calcAfreq(dsmp, coi, tol = 1e-04, qstart = 0.5)

Arguments

dsmp

a list with each element corresponding to one sample.

coi

a vector containing complexity of infection for each sample.

tol

convergence tolerance for frequency estimates.

qstart

a starting value for frequencies.

Value

A list of allele frequencies, where each element is a numeric vector containing frequencies for a single locus.

Examples

coi   <- getCOI(dsmp, lrank = 2)           # estimate COI first
afreq <- calcAfreq(dsmp, coi, tol = 1e-5)
coi   <- getCOI(dsmp, lrank = 2)           # estimate COI first
afreq <- calcAfreq(dsmp, coi, tol = 1e-5)

Dcifer results

Description

Results of relatedness estimation.

Usage

dres
dres

Format

A three-dimensional array with 52 columns, 52 rows, and 4 matrices. Dimension names correspond to sample ID's (rows and columns) and types of results c("estimate", "p_value", "CI_lower", "CI_upper") (matrices).

Sample data

Description

Microhaplotype data from two clinics in Mozambique. Samples are sorted by location.

Usage

dsmp
dsmp

Format

A list of 52 elements (samples), each element is a list of 87 elements (loci), which are integer vectors (alleles).

Read and Reformat Data

Description

readDat and readAfreq read data and population allele frequencies from csv files and reformat them for further processing. formatDat and formatAfreq reformat corresponding data frames. Original data are assumed to be in a long format, with one row per allele.

Usage

readDat(sfile, svar, lvar, avar, ...)

formatDat(dlong, svar, lvar, avar)

readAfreq(afile, lvar, avar, fvar, ...)

formatAfreq(aflong, lvar, avar, fvar)
readDat(sfile, svar, lvar, avar, ...)

formatDat(dlong, svar, lvar, avar)

readAfreq(afile, lvar, avar, fvar, ...)

formatAfreq(aflong, lvar, avar, fvar)

Arguments

sfile

the name of the file containing sample data.

svar

the name of the variable for sample ID.

lvar

the name of the variable for locus/marker.

avar

the name of the variable for allele/haplotype.

...

additional arguments for read.csv().

dlong

a data frame containing sample data.

afile

the name of the file containing population allele frequencies.

fvar

the name of the variable for population allele frequiency.

aflong

a data frame containing population allele frequencies.

Value

For readDat and formatDat, a list with elements corresponding to samples. Each element of the list is itself a list of binary vectors, one vector for each locus. For readAfreq and formatAfreq, a list with elements corresponding to loci. The frequencies at each locus are normalized and sum to 1. Samples, loci, and alleles are ordered by their IDs/names.

Examples

sfile <- system.file("extdata", "MozParagon.csv", package = "dcifer")
dsmp  <- readDat(sfile, svar = "sampleID", lvar = "locus", avar = "allele")

# OR, if the dataset is provided as an R data frame, e.g.
dlong <- read.csv(sfile)
# reformat only:
dsmp <- formatDat(dlong, svar = "sampleID", lvar = "locus", avar = "allele")

afile <- system.file("extdata", "MozAfreq.csv", package = "dcifer")
afreq <- readAfreq(afile, lvar = "locus", avar = "allele", fvar = "freq")

# OR, if allele frequencies are provided as an R data frame, e.g.
aflong <- read.csv(afile)
# reformat only:
afreq <- formatAfreq(aflong, lvar = "locus", avar = "allele", fvar = "freq")

da_upd <- matchAfreq(dsmp, afreq)
dsmp2  <- da_upd$dsmp
afreq2 <- da_upd$afreq

sfile <- system.file("extdata", "MozParagon.csv", package = "dcifer")
dsmp  <- readDat(sfile, svar = "sampleID", lvar = "locus", avar = "allele")

# OR, if the dataset is provided as an R data frame, e.g.
dlong <- read.csv(sfile)
# reformat only:
dsmp <- formatDat(dlong, svar = "sampleID", lvar = "locus", avar = "allele")

afile <- system.file("extdata", "MozAfreq.csv", package = "dcifer")
afreq <- readAfreq(afile, lvar = "locus", avar = "allele", fvar = "freq")

# OR, if allele frequencies are provided as an R data frame, e.g.
aflong <- read.csv(afile)
# reformat only:
afreq <- formatAfreq(aflong, lvar = "locus", avar = "allele", fvar = "freq")

da_upd <- matchAfreq(dsmp, afreq)
dsmp2  <- da_upd$dsmp
afreq2 <- da_upd$afreq

Grid of Parameter Values

Description

Generates a grid of parameter values to evaluate over.

Usage

generateReval(M, rval = NULL, nr = NULL)
generateReval(M, rval = NULL, nr = NULL)

Arguments

M

an integer.

rval

a numeric vector with parameter values for the grid. If not provided, it will be generated by dividing [0, 1] into nr equal intervals.

nr

an integer. Ignored if rval is provided.

Value

A a matrix with M rows and nr + 1 or length(rval) columns.

Examples

reval <- generateReval(M = 2, nr = 1e2)
dim(reval)

reval <- generateReval(M = 2, nr = 1e2)
dim(reval)

Calculate COI

Description

Calculates complexity of infection for a list of samples, using the number of detected alleles.

Usage

getCOI(dsmp, lrank = 2)
getCOI(dsmp, lrank = 2)

Arguments

dsmp

a list with each element corresponding to one sample.

lrank

the rank of the locus that will determine a sample's COI (loci are ranked by the number of detected alleles).

Value

A vector with estimated COI for each sample.

Examples

coi <- getCOI(dsmp, lrank = 2)
coi <- getCOI(dsmp, lrank = 2)

Pairwise Relatedness

Description

Provides pairwise relatedness estimates within a dataset or between two datasets along with optional p-values and confidence intervals (CI).

Usage

ibdDat(
  dsmp,
  coi,
  afreq,
  dsmp2 = NULL,
  coi2 = NULL,
  pval = TRUE,
  confint = FALSE,
  rnull = 0,
  side = c("right", "left", "two-sided"),
  alpha = 0.05,
  nr = 1000,
  reval = NULL
)
ibdDat(
  dsmp,
  coi,
  afreq,
  dsmp2 = NULL,
  coi2 = NULL,
  pval = TRUE,
  confint = FALSE,
  rnull = 0,
  side = c("right", "left", "two-sided"),
  alpha = 0.05,
  nr = 1000,
  reval = NULL
)

Arguments

dsmp

a list with each element corresponding to one sample.

coi

a vector containing complexity of infection for each sample.

afreq

a list of allele frequencies. Each element of the list corresponds to a locus.

dsmp2

a list representing a second dataset.

coi2

a vector with complexities of infection for a second dataset.

pval

a logical value specifying if p-values should be returned.

confint

a logical value specifying if confidence intervals should be returned.

rnull

a null value of relatedness parameter for hypothesis testing (needed if pval = TRUE).

side

a character string specifying if a one-sided ("right" or "left") or a two-sided ("two-sided") hypothesis test should be performed (needed if pval = TRUE). Set to "right" if rnul = 0 and to "left" if rnull = 1.

alpha

significance level for a 1 - α confidence region.

nr

an integer specifying precision of the estimate: resolution of a grid of parameter values ([0, 1] divided into nr equal intervals), over which the likelihood will be calculated. Ignored if non-null reval is provided.

reval

a vector or a single-row matrix. A grid of parameter values, over which the likelihood will be calculated.

Details

For this function, M is set to 1. If confint = FALSE, Newton's method is used to find the estimates, otherwise the likelihood is calculated for a grid of parameter values.

Value

A matrix if pval and confint are FALSE and 3-dimensional arrays otherwise. The matrices are lower triangular if distances are calculated within a dataset. For a 3-dimensional array, stacked matrices contain relatedness estimates, p-values, and endpoints of confidence intervals (if requested).

Examples

coi   <- getCOI(dsmp, lrank = 2)           # estimate COI
afreq <- calcAfreq(dsmp, coi, tol = 1e-5)  # estimate allele frequencies

# subset of samples for faster processing
i1 <- 1:15     # from Maputo
i2 <- 31:40    # from Inhambane
isub <- c(i1, i2)

# matrix is returned
dres1 <- ibdDat(dsmp[isub], coi[isub], afreq, pval = FALSE)
dim(dres1)

# test a null hypothesis H0: r = 0, change precision
dres2 <- ibdDat(dsmp[isub], coi[isub], afreq, pval = TRUE, rnull = 0,
                nr = 1e2)
dim(dres2)

# test H0: r <= 0.2, include 99% confidence intervals
dres3 <- ibdDat(dsmp[isub], coi[isub], afreq, pval = TRUE, confint = TRUE,
                rnull = 0.2, side = "right", alpha = 0.01)
dres3[2, 1, ]

# pairwise relatedness between two datasets, H0: r = 0
drbetween <- ibdDat(dsmp[i1], coi[i1], afreq,
                    dsmp2 = dsmp[i2], coi2 = coi[i2])
dim(drbetween)
drbetween[1, 2, ]
sum(is.na(drbetween[, , 1]))

coi   <- getCOI(dsmp, lrank = 2)           # estimate COI
afreq <- calcAfreq(dsmp, coi, tol = 1e-5)  # estimate allele frequencies

# subset of samples for faster processing
i1 <- 1:15     # from Maputo
i2 <- 31:40    # from Inhambane
isub <- c(i1, i2)

# matrix is returned
dres1 <- ibdDat(dsmp[isub], coi[isub], afreq, pval = FALSE)
dim(dres1)

# test a null hypothesis H0: r = 0, change precision
dres2 <- ibdDat(dsmp[isub], coi[isub], afreq, pval = TRUE, rnull = 0,
                nr = 1e2)
dim(dres2)

# test H0: r <= 0.2, include 99% confidence intervals
dres3 <- ibdDat(dsmp[isub], coi[isub], afreq, pval = TRUE, confint = TRUE,
                rnull = 0.2, side = "right", alpha = 0.01)
dres3[2, 1, ]

# pairwise relatedness between two datasets, H0: r = 0
drbetween <- ibdDat(dsmp[i1], coi[i1], afreq,
                    dsmp2 = dsmp[i2], coi2 = coi[i2])
dim(drbetween)
drbetween[1, 2, ]
sum(is.na(drbetween[, , 1]))

Estimate Relatedness and a Number of Related Strains

Description

Estimates the number of related pairs of strains between two infections along with corresponding relatedness estimates and optional inference.

Usage

ibdEstM(
  pair,
  coi,
  afreq,
  Mmax = 6,
  pval = FALSE,
  confreg = FALSE,
  llik = FALSE,
  rnull = 0,
  side = c("right", "left", "two-sided"),
  alpha = 0.05,
  equalr = FALSE,
  freqlog = FALSE,
  nrs = c(1000, 100, 32, 16, 12, 10),
  revals = NULL,
  tol0 = 1e-09,
  logrs = NULL,
  nevals = NULL,
  nloc = NULL
)
ibdEstM(
  pair,
  coi,
  afreq,
  Mmax = 6,
  pval = FALSE,
  confreg = FALSE,
  llik = FALSE,
  rnull = 0,
  side = c("right", "left", "two-sided"),
  alpha = 0.05,
  equalr = FALSE,
  freqlog = FALSE,
  nrs = c(1000, 100, 32, 16, 12, 10),
  revals = NULL,
  tol0 = 1e-09,
  logrs = NULL,
  nevals = NULL,
  nloc = NULL
)

Arguments

pair

a list of length two containing data for a pair of samples.

coi

a vector containing complexity of infection for each sample.

afreq

a list of allele frequencies. Each element of the list corresponds to a locus.

Mmax

a maximum number of related pairs of strains to evaluate over. If greater than min(coi), will be set to min(coi).

pval, confreg, llik

logical values specifying if p-value, confidence region, and log-likelihood for a range of $r$ values should be returned.

rnull

a null value of relatedness parameter for hypothesis testing (needed if pval = TRUE).

side

alpha

significance level for a 1 - α confidence region.

equalr

a logical value. If TRUE, the same level of relatedness is assumed for M pairs of strains (r₁ = ... = r_M).

freqlog

a logical value indicating if afreq is on the log scale.

nrs

an integer vector where i'th element correspons to M = i and indicates precision of the estimate (resolution of a grid of parameter values). Ignored if non-null revals is provided.

revals

a list where i'th element corresponds to M = i and is a matrix representing a grid of parameter values (a matrix where each column represents a single (r₁, ..., r_M) combination).

tol0

a tolerance value for an estimate to be considered zero.

logrs

a list where i'th element corresponds to M = i and is a list as returned by logReval.

nevals

a vector where i'th element corresponds to M = i and provides the number of relatedness values/combinations to evaluate over.

nloc

the number of loci.

Value

A named list if multiple output logical values are TRUE - or a vector if only rhat = TRUE. The output includes:

a relatedness estimate (numeric vector of length corresponding to the estimated number of related pairs);
a p-value if pval = TRUE;
parameter values from the grid in revals that are within the confidence region if confreg = TRUE;
log-likelihood values for the parameter grid in revals if llik = TRUE.

Examples

coi   <- getCOI(dsmp, lrank = 2)           # estimate COI
afreq <- calcAfreq(dsmp, coi, tol = 1e-5)  # estimate allele frequencies

# two samples
ipair <- c(21, 17)
# for higher COI: c(33, 5): COI = 5-6; c(37, 20): 4-3, c(41, 50): 5-4

Mmax  <- min(coi[ipair])
# choose resolution of the grid for different M
nrs   <- c(1e3, 1e2, 32, 16, 12, 10)[1:Mmax]
revals <- mapply(generateReval, 1:Mmax, nr = nrs)

(res1 <- ibdEstM(dsmp[ipair], coi[ipair], afreq, Mmax = Mmax, equalr = FALSE,
                 reval = revals))
(res2 <- ibdEstM(dsmp[ipair], coi[ipair], afreq, Mmax = Mmax, equalr = TRUE))
# number of related pairs of strains (M')
sum(res1 > 0)
sum(res2 > 0)  # can be 0's

coi   <- getCOI(dsmp, lrank = 2)           # estimate COI
afreq <- calcAfreq(dsmp, coi, tol = 1e-5)  # estimate allele frequencies

# two samples
ipair <- c(21, 17)
# for higher COI: c(33, 5): COI = 5-6; c(37, 20): 4-3, c(41, 50): 5-4

Mmax  <- min(coi[ipair])
# choose resolution of the grid for different M
nrs   <- c(1e3, 1e2, 32, 16, 12, 10)[1:Mmax]
revals <- mapply(generateReval, 1:Mmax, nr = nrs)

(res1 <- ibdEstM(dsmp[ipair], coi[ipair], afreq, Mmax = Mmax, equalr = FALSE,
                 reval = revals))
(res2 <- ibdEstM(dsmp[ipair], coi[ipair], afreq, Mmax = Mmax, equalr = TRUE))
# number of related pairs of strains (M')
sum(res1 > 0)
sum(res2 > 0)  # can be 0's

Relatedness Between Two Samples

Description

Provides estimates of relatedness between a pair of samples along with an optional support curve and inference.

Usage

ibdPair(
  pair,
  coi,
  afreq,
  M,
  rhat = TRUE,
  pval = FALSE,
  confreg = FALSE,
  llik = FALSE,
  maxllik = FALSE,
  rnull = 0,
  side = c("right", "left", "two-sided"),
  alpha = 0.05,
  equalr = FALSE,
  mnewton = NULL,
  freqlog = FALSE,
  nr = 1000,
  reval = NULL,
  tol = NULL,
  logr = NULL,
  neval = NULL,
  inull = NULL,
  nloc = NULL
)
ibdPair(
  pair,
  coi,
  afreq,
  M,
  rhat = TRUE,
  pval = FALSE,
  confreg = FALSE,
  llik = FALSE,
  maxllik = FALSE,
  rnull = 0,
  side = c("right", "left", "two-sided"),
  alpha = 0.05,
  equalr = FALSE,
  mnewton = NULL,
  freqlog = FALSE,
  nr = 1000,
  reval = NULL,
  tol = NULL,
  logr = NULL,
  neval = NULL,
  inull = NULL,
  nloc = NULL
)

Arguments

pair

a list of length two containing data for a pair of samples.

coi

a vector containing complexity of infection for each sample.

afreq

a list of allele frequencies. Each element of the list corresponds to a locus.

M

the number of related pairs of strains.

rhat, pval, confreg, llik, maxllik

logical values specifying if relatedness estimate, p-value, confidence region, log-likelihood for a range of $r$ values, and maximum log-likelihood should be returned.

rnull

a null value of relatedness parameter for hypothesis testing (needed if pval = TRUE).

side

alpha

significance level for a 1 - α confidence region.

equalr

a logical value. If TRUE, the same level of relatedness is assumed for M pairs of strains (r₁ = ... = r_M).

mnewton

a logical value. If TRUE, Newton's method, adapted for a bounded parameter space, will be used to find MLE. Ignored (set to FALSE) if M > 1, confreg = TRUE, or llik = TRUE.

freqlog

a logical value indicating if afreq is on the log scale.

nr

reval

a matrix representing a grid of (r₁, ..., r_M) combinations, over which the likelihood will be calculated. Each column is a single combination.

tol

tolerance for calculating an estimate if mnewton = TRUE. Set to 1/nr if not provided.

logr

a list as returned by logReval with logs of reval and other quantities.

neval

the number of relatedness values/combinations to evaluate over.

inull

an index of the value/column of reval that is closest to rnull.

nloc

the number of loci.

Details

Handling of irregular cases:

Allele with population frequency of 0 is present: locus is skipped (does not contribute any information).
Number of unique alleles at a locus is greater than COI: COI will be increased for that locus only.

Value

A named list if multiple output logical values are TRUE - or a vector if only rhat = TRUE or llik = TRUE. Depending on these logical values, the following quantities are included:

If rhat = TRUE, a relatedness estimate (a vector of length 1 if equalr = TRUE or of length M if equalr = FALSE);
If pval = TRUE, a p-value;
If confreg = TRUE, relatedness parameter values from the grid reval that are within 1 - α confidence region;
If llik = TRUE, log-likelihood values for relatedness parameter grid (provided in reval or determined by nr);
If maxllik = TRUE, maximum log-likelihood.

Examples

coi   <- getCOI(dsmp, lrank = 2)
afreq <- calcAfreq(dsmp, coi, tol = 1e-5)

# two samples
ipair <- c(21, 17)
pair <- dsmp[ipair]
coip <-  coi[ipair]
M    <- 2

res1 <- ibdPair(pair, coip, afreq, M = M, confreg = TRUE, alpha = 0.05,
                equalr = FALSE, reval = revals[[M]])
res2 <- ibdPair(pair, coip, afreq, M = M, llik = TRUE,
                equalr = TRUE, reval = revals[[1]])
res1$rhat
rep(res2$rhat, M)

# plot confidence region
creg <- cbind(res1$confreg, res1$confreg[2:1, ])
plot(creg[1, ], creg[2, ], xlim = c(0, 1), ylim = c(0, 1), pch = 15,
     cex = 0.6, col = "cadetblue3", xlab = expression(hat(r)[1]),
     ylab = expression(hat(r)[2]))
points(res1$rhat, rev(res1$rhat), pch = 16)

# plot log-likelihood
plot(revals[[1]], res2$llik, type = "l", xlab = "r", ylab = "log-likelihood")

ipair <- c(41, 50)
pair <- dsmp[ipair]
coip <-  coi[ipair]

# rtotal at different values of M with and without equality constraint
Mmax <- min(coip)
for (M in 1:Mmax) {
  print(paste0("M = ", M))
  print(c(sum(ibdPair(pair, coip, afreq, M = M, pval = FALSE,
                      equalr = FALSE, reval = revals[[M]])),
          ibdPair(pair, coip, afreq, M = M, pval = FALSE, equalr = TRUE)*M))
  cat("\n")
}

# M = 1
# log-likelihood for specific r values
ibdPair(pair, coip, afreq, M = 1, rhat = FALSE, pval = FALSE, llik = TRUE,
        reval = c(0, 0.15, 0.38, 1))
# grid vs Newton's method
system.time(
  ibdPair(pair, coip, afreq, M = 1, mnewton = TRUE,  tol = 1e-5))
system.time(
  ibdPair(pair, coip, afreq, M = 1, mnewton = FALSE, nr  = 1e5))

coi   <- getCOI(dsmp, lrank = 2)
afreq <- calcAfreq(dsmp, coi, tol = 1e-5)

# two samples
ipair <- c(21, 17)
pair <- dsmp[ipair]
coip <-  coi[ipair]
M    <- 2

res1 <- ibdPair(pair, coip, afreq, M = M, confreg = TRUE, alpha = 0.05,
                equalr = FALSE, reval = revals[[M]])
res2 <- ibdPair(pair, coip, afreq, M = M, llik = TRUE,
                equalr = TRUE, reval = revals[[1]])
res1$rhat
rep(res2$rhat, M)

# plot confidence region
creg <- cbind(res1$confreg, res1$confreg[2:1, ])
plot(creg[1, ], creg[2, ], xlim = c(0, 1), ylim = c(0, 1), pch = 15,
     cex = 0.6, col = "cadetblue3", xlab = expression(hat(r)[1]),
     ylab = expression(hat(r)[2]))
points(res1$rhat, rev(res1$rhat), pch = 16)

# plot log-likelihood
plot(revals[[1]], res2$llik, type = "l", xlab = "r", ylab = "log-likelihood")

ipair <- c(41, 50)
pair <- dsmp[ipair]
coip <-  coi[ipair]

# rtotal at different values of M with and without equality constraint
Mmax <- min(coip)
for (M in 1:Mmax) {
  print(paste0("M = ", M))
  print(c(sum(ibdPair(pair, coip, afreq, M = M, pval = FALSE,
                      equalr = FALSE, reval = revals[[M]])),
          ibdPair(pair, coip, afreq, M = M, pval = FALSE, equalr = TRUE)*M))
  cat("\n")
}

# M = 1
# log-likelihood for specific r values
ibdPair(pair, coip, afreq, M = 1, rhat = FALSE, pval = FALSE, llik = TRUE,
        reval = c(0, 0.15, 0.38, 1))
# grid vs Newton's method
system.time(
  ibdPair(pair, coip, afreq, M = 1, mnewton = TRUE,  tol = 1e-5))
system.time(
  ibdPair(pair, coip, afreq, M = 1, mnewton = FALSE, nr  = 1e5))

Logarithms of `reval`

Description

Calculates logarithms of reval and 1 - reval, as well as other associated quantities.

Usage

logReval(reval, M = NULL, neval = NULL, equalr = FALSE)
logReval(reval, M = NULL, neval = NULL, equalr = FALSE)

Arguments

reval

a matrix representing a grid of (r₁, ..., r_M) combinations, over which the likelihood will be calculated. Each column is a single combination.

M

the number of related pairs of strains.

neval

the number of relatedness values/combinations to evaluate over.

equalr

a logical value. If TRUE, the same level of relatedness is assumed for M pairs of strains (r₁ = ... = r_M).

Details

For equalr = TRUE relatedness estimation, reval should be a 1 x neval matrix.

Value

A list of length 5 that contains log(reval), log(1 - reval), the number of reval = 1 for each column, the number of 0 < reval < 1 for each column, and sum(log(1 - reval[reval < 1])) for each column.

Examples

reval <- generateReval(M = 2, nr = 1e2)
logr  <- logReval(reval, M = 2, equalr = FALSE)

reval <- generateReval(M = 1, nr = 1e3)
logr3  <- logReval(reval, M = 3, equalr = TRUE)
logr1  <- logReval(reval, M = 1)
all(logr3$sum1r == logr1$sum1r*3)

reval <- generateReval(M = 2, nr = 1e2)
logr  <- logReval(reval, M = 2, equalr = FALSE)

reval <- generateReval(M = 1, nr = 1e3)
logr3  <- logReval(reval, M = 3, equalr = TRUE)
logr1  <- logReval(reval, M = 1)
all(logr3$sum1r == logr1$sum1r*3)

Match Samples and Allele Frequencies

Description

Checks if the list containing sample data is conformable to provided population allele frequencies and reformats it if needed.

Usage

matchAfreq(dsmp, afreq, minfreq = 1e-04)
matchAfreq(dsmp, afreq, minfreq = 1e-04)

Arguments

dsmp

a list with each element corresponding to one sample.

afreq

a list of allele frequencies. Each element of the list corresponds to a locus.

minfreq

an allele frequency to assign to alleles that are present in dsmp but not in afreq.

Details

The function reorders loci and alleles in dsmp to match those in afreq and inserts alleles into dsmp if they are present in afreq and not in dsmp; doesn't handle cases when alleles are present in dsmp but not in afreq. Allele names are required for this procedure.

Value

A named list of length 2 containing updated and matching dsmp and afreq.

Examples

afile  <- system.file("extdata", "MozAfreq.csv", package = "dcifer")
afreq  <- readAfreq(afile, lvar = "locus", avar = "allele", fvar = "freq")
da_upd <- matchAfreq(dsmp, afreq, minfreq = 1e-3)
dsmp2  <- da_upd$dsmp
afreq2 <- da_upd$afreq

afile  <- system.file("extdata", "MozAfreq.csv", package = "dcifer")
afreq  <- readAfreq(afile, lvar = "locus", avar = "allele", fvar = "freq")
da_upd <- matchAfreq(dsmp, afreq, minfreq = 1e-3)
dsmp2  <- da_upd$dsmp
afreq2 <- da_upd$afreq

Combine two matrices

Description

Creates a single matrix from two lower or upper triangular matrices.

Usage

mixMat(mat1, mat2, lower = TRUE)
mixMat(mat1, mat2, lower = TRUE)

Arguments

mat1, mat2

square matrices of the same dimensions.

lower

a logical value indicating if the resulting matrix should be comprised of lower triangular matrices.

Value

A square matrix.

Examples

x <- matrix(1:16, 4)
y <- matrix(101:116, 4)
x[upper.tri(x, diag = TRUE)] <- NA
mixMat(x, y)

x <- matrix(1:16, 4)
y <- matrix(101:116, 4)
x[upper.tri(x, diag = TRUE)] <- NA
mixMat(x, y)

Colorbar

Description

Creates a colorbar for a plot.

Usage

plotColorbar(
  rlim = c(0, 1),
  by = 0.1,
  at = NULL,
  horiz = FALSE,
  col = grDevices::hcl.colors(301, "YlGnBu", rev = TRUE),
  ...
)
plotColorbar(
  rlim = c(0, 1),
  by = 0.1,
  at = NULL,
  horiz = FALSE,
  col = grDevices::hcl.colors(301, "YlGnBu", rev = TRUE),
  ...
)

Arguments

rlim

the range of values to be represented by colors.

by

increment size for tickmark locations. Ignored if at is provided.

at

a vector of tickmark locations.

horiz

a logical value specifying if the colorbar should be drawn horizontally.

col

the colors for the colorbar.

...

other graphical parameters.

Details

The colorbar will fill the whole plotting region, which needs to be specified outside of this function to control proportions and location of the colorbar (see examples). To match the colors in the main plot, rlim values should be the same for plotRel and plotColorbar; if rlim = NULL or rlim = NA in plotRel, provide the actual range of relatedness estimates for plotColorbar (see examples).

Value

NULL; called for plotting.

Examples

parstart <- par(no.readonly = TRUE)   # save starting graphical parameters

# colorbar on the side of the main plot
layout(matrix(1:2, 1), width = c(7, 1))
par(mar = c(2, 0, 2, 0) + 0.1)
# make symmetric matrix
dmat <- dres[, , "estimate"]
dmat[upper.tri(dmat)] <- t(dmat)[upper.tri(t(dmat))]
isig <- which(dres[, , "p_value"] <= 0.05, arr.ind = TRUE)
plotRel(dmat, draw_diag = TRUE, isig = rbind(isig, isig[, 2:1]))
abline(v = 26, h = 26, col = "gray45", lty = 5)
par(mar = c(2, 1, 2, 2) + 0.1)
plotColorbar()

# shorter colorbar, tick mark locations provided
par(mar = c(2, 0, 2, 0) + 0.1)
plotRel(dmat, draw_diag = TRUE, isig = rbind(isig, isig[, 2:1]))
par(mar = c(5, 0.5, 5, 2.5) + 0.1)
plotColorbar(at = c(0.0625, 0.125, 0.25, 0.5, 0.78))
par(parstart)

# triangular matrix, inset horizontal colorbar
par(mar = c(1, 1, 1, 1))
plotRel(dres, rlim = NULL, draw_diag = TRUE, border_diag = 1, alpha = 0.05)
par(fig = c(0.3, 1.0, 0.73, 0.83), new = TRUE)
rlim <- range(dres[, , 1], na.rm = TRUE)
plotColorbar(rlim = rlim, at = c(0.2, 0.4, 0.6, 0.8), horiz = TRUE)
par(parstart)

parstart <- par(no.readonly = TRUE)   # save starting graphical parameters

# colorbar on the side of the main plot
layout(matrix(1:2, 1), width = c(7, 1))
par(mar = c(2, 0, 2, 0) + 0.1)
# make symmetric matrix
dmat <- dres[, , "estimate"]
dmat[upper.tri(dmat)] <- t(dmat)[upper.tri(t(dmat))]
isig <- which(dres[, , "p_value"] <= 0.05, arr.ind = TRUE)
plotRel(dmat, draw_diag = TRUE, isig = rbind(isig, isig[, 2:1]))
abline(v = 26, h = 26, col = "gray45", lty = 5)
par(mar = c(2, 1, 2, 2) + 0.1)
plotColorbar()

# shorter colorbar, tick mark locations provided
par(mar = c(2, 0, 2, 0) + 0.1)
plotRel(dmat, draw_diag = TRUE, isig = rbind(isig, isig[, 2:1]))
par(mar = c(5, 0.5, 5, 2.5) + 0.1)
plotColorbar(at = c(0.0625, 0.125, 0.25, 0.5, 0.78))
par(parstart)

# triangular matrix, inset horizontal colorbar
par(mar = c(1, 1, 1, 1))
plotRel(dres, rlim = NULL, draw_diag = TRUE, border_diag = 1, alpha = 0.05)
par(fig = c(0.3, 1.0, 0.73, 0.83), new = TRUE)
rlim <- range(dres[, , 1], na.rm = TRUE)
plotColorbar(rlim = rlim, at = c(0.2, 0.4, 0.6, 0.8), horiz = TRUE)
par(parstart)

Plot Relatedness Estimates

Description

Represents a matrix of pairwise relatedness estimates with colors corresponding to the levels of relatedness. Optionally, also outlines results of a hypothesis testing. The plot follows a matrix layout.

Usage

plotRel(
  r,
  rlim = c(0, 1),
  sig = NULL,
  alpha = NULL,
  col = grDevices::hcl.colors(101, "YlGnBu", rev = TRUE),
  draw_diag = FALSE,
  col_diag = "gray",
  border_diag = NA,
  lwd_diag = 0.5,
  border_sig = "orangered2",
  lwd_sig = 1.5,
  xlab = "",
  ylab = "",
  add = FALSE,
  idlab = FALSE,
  side_id = c(1, 2),
  col_id = 1,
  cex_id = 0.5,
  srt_id = NULL,
  ...
)
plotRel(
  r,
  rlim = c(0, 1),
  sig = NULL,
  alpha = NULL,
  col = grDevices::hcl.colors(101, "YlGnBu", rev = TRUE),
  draw_diag = FALSE,
  col_diag = "gray",
  border_diag = NA,
  lwd_diag = 0.5,
  border_sig = "orangered2",
  lwd_sig = 1.5,
  xlab = "",
  ylab = "",
  add = FALSE,
  idlab = FALSE,
  side_id = c(1, 2),
  col_id = 1,
  cex_id = 0.5,
  srt_id = NULL,
  ...
)

Arguments

r

a matrix or a 3-dimensional array as returned by ibdDat.

rlim

the range of values for colors. If NULL or NA, will be calculated from r.

sig

a logical matrix specifying which entries of the relatedness matrix should be outlined or "amplified" (made larger). sig takes precedence over alpha.

alpha

significance level for hypothesis testing; determines relatedness matrix entries to be outlined. Ignored if sig is not NULL.

col

the colors for the range of relatedness values.

draw_diag

a logical value specifying if diagonal cells should be distinguished from others by a separate color.

col_diag, border_diag, lwd_diag

the color for the fill, the color for the border, and the line width for the border of diagonal entries. Ignored if draw_diag = FALSE.

border_sig, lwd_sig

the color and the line width for outlining entries specified by sig or alpha. If border_sig is NA, these entries will be "amplified", their size controlled by lwd_sig.

xlab, ylab

axis labels.

add

a logical value specifying if the graphics should be added to the existing plot (useful for triangular matrices).

idlab

a logical value specifying if sample ID's should be displayed.

side_id

an integer vector specifying plot sides for sample ID labels.

col_id, cex_id

numeric vectors for the color and the size of sample ID labels.

srt_id

a vector of the same length as side_id specifying rotation angles for sample ID labels. If NULL, the labels will be perpendicular to the axes.

...

other graphical parameters.

Value

NULL; called for plotting.

Examples

parstart <- par(no.readonly = TRUE)   # save starting graphical parameters

par(mar = c(0.5, 0.5, 0.5, 0.5))
plotRel(dres, alpha = 0.05, draw_diag = TRUE)

# draw log of p-values in the upper triangle
pmat <- t(log(dres[, , "p_value"]))
pmat[pmat == -Inf] <- min(pmat[is.finite(pmat)])
plotRel(pmat, rlim = NULL, draw_diag = TRUE, col = hcl.colors(101, "PuRd"),
        add = TRUE, col_diag = "slategray2", border_diag = 1)

# symmetric matrix, outline significant in upper triangle, display sample ID
par(mar = c(3, 3, 0.5, 0.5))
dmat  <- dres[, , "estimate"]
dmats <- mixMat(dmat, dmat)
sig <- dres[, , "p_value"] <= 0.05
col_id <- rep(c("orchid4", "cadetblue4"), each = 26)
plotRel(dmats, sig = t(sig), border_sig = "magenta2", draw_diag = TRUE,
        idlab = TRUE, col_id = col_id)
abline(v = 26, h = 26, col = "gray45", lty = 5)

# rotate sample ID labels on all sides, increase size for significant pairs
par(mar = c(3, 3, 3, 3))
sig <- dres[, , "p_value"] <= 0.01
plotRel(dmats, sig = mixMat(sig, sig), border_sig = NA, lwd_sig = 5,
        draw_diag = TRUE, idlab = TRUE, side_id = 1:4, col_id = col_id,
        srt_id = c(-55, 25, 65, -35))
par(parstart)

parstart <- par(no.readonly = TRUE)   # save starting graphical parameters

par(mar = c(0.5, 0.5, 0.5, 0.5))
plotRel(dres, alpha = 0.05, draw_diag = TRUE)

# draw log of p-values in the upper triangle
pmat <- t(log(dres[, , "p_value"]))
pmat[pmat == -Inf] <- min(pmat[is.finite(pmat)])
plotRel(pmat, rlim = NULL, draw_diag = TRUE, col = hcl.colors(101, "PuRd"),
        add = TRUE, col_diag = "slategray2", border_diag = 1)

# symmetric matrix, outline significant in upper triangle, display sample ID
par(mar = c(3, 3, 0.5, 0.5))
dmat  <- dres[, , "estimate"]
dmats <- mixMat(dmat, dmat)
sig <- dres[, , "p_value"] <= 0.05
col_id <- rep(c("orchid4", "cadetblue4"), each = 26)
plotRel(dmats, sig = t(sig), border_sig = "magenta2", draw_diag = TRUE,
        idlab = TRUE, col_id = col_id)
abline(v = 26, h = 26, col = "gray45", lty = 5)

# rotate sample ID labels on all sides, increase size for significant pairs
par(mar = c(3, 3, 3, 3))
sig <- dres[, , "p_value"] <= 0.01
plotRel(dmats, sig = mixMat(sig, sig), border_sig = NA, lwd_sig = 5,
        draw_diag = TRUE, idlab = TRUE, side_id = 1:4, col_id = col_id,
        srt_id = c(-55, 25, 65, -35))
par(parstart)

Likelihood for U_x, U_y

Description

Calculates log-likelihood for a pair of samples at a single locus.

Usage

probUxUy(
  Ux,
  Uy,
  nx,
  ny,
  probs,
  M,
  logj,
  factj,
  equalr = FALSE,
  mnewton = TRUE,
  reval = NULL,
  logr = NULL,
  neval = NULL
)
probUxUy(
  Ux,
  Uy,
  nx,
  ny,
  probs,
  M,
  logj,
  factj,
  equalr = FALSE,
  mnewton = TRUE,
  reval = NULL,
  logr = NULL,
  neval = NULL
)

Arguments

Ux, Uy

sets of unique alleles for two samples at a given locus. Vectors of indices corresponding to ordered probabilities in probs.

nx, ny

complexity of infection for two samples. Vectors of length 1.

probs

a vector of population allele frequencies (on a log scale) at a given locus. It is not checked if frequencies on a regular scale sum to 1.

M

the number of related pairs of strains.

logj, factj

numeric vectors containing precalculated logarithms and factorials.

equalr

a logical value. If TRUE, the same level of relatedness is assumed for M pairs of strains (r₁ = ... = r_M).

mnewton

a logical value. If TRUE, the coefficients for using Newton's method will be calculated.

reval

a matrix representing a grid of (r₁, ..., r_M) combinations, over which the likelihood will be calculated. Each column is a single combination.

logr

a list of length 5 as returned by logReval.

neval

the number of relatedness values/combinations to evaluate over.

Value

If mnewton = TRUE, a vector of length 2 containing coefficients for fast likelihood calculation;
If mnewton = FALSE, a vector of length neval containing log-likelihoods for a range of parameter values.

Examples

Ux <- c(1, 3, 7)                       # detected alleles at locus t
Uy <- c(2, 7)
coi <- c(5, 6)
aft <- runif(7)                        # allele frequencies for locus t
aft <- log(aft/sum(aft))

logj  <- log(1:max(coi))
factj <- lgamma(0:max(coi) + 1)

# M = 2, equalr = FALSE
M <- 2
reval <- generateReval(M, nr = 1e2)
logr  <- logReval(reval, M = M)
llikt <- probUxUy(Ux, Uy, coi[1], coi[2], aft, M, logj, factj,
                  equalr = FALSE, logr = logr, neval = ncol(reval))
length(llikt)

# M = 2, equalr = TRUE
reval <- matrix(seq(0, 1, 0.001), 1)
logr  <- logReval(reval, M = M, equalr = TRUE)
llikt <- probUxUy(Ux, Uy, coi[1], coi[2], aft, M, logj, factj,
                  equalr = TRUE, logr = logr, neval = ncol(reval))

# M = 1, mnewton = FALSE
M <- 1
reval <- matrix(seq(0, 1, 0.001), 1)
logr  <- logReval(reval, M = M)
llikt <- probUxUy(Ux, Uy, coi[1], coi[2], aft, M, logj, factj,
                  mnewton = FALSE, reval = reval, logr = logr,
                  neval = ncol(reval))

# M = 1, mnewton = TRUE
probUxUy(Ux, Uy, coi[1], coi[2], aft, M, logj, factj, mnewton = TRUE)

Ux <- c(1, 3, 7)                       # detected alleles at locus t
Uy <- c(2, 7)
coi <- c(5, 6)
aft <- runif(7)                        # allele frequencies for locus t
aft <- log(aft/sum(aft))

logj  <- log(1:max(coi))
factj <- lgamma(0:max(coi) + 1)

# M = 2, equalr = FALSE
M <- 2
reval <- generateReval(M, nr = 1e2)
logr  <- logReval(reval, M = M)
llikt <- probUxUy(Ux, Uy, coi[1], coi[2], aft, M, logj, factj,
                  equalr = FALSE, logr = logr, neval = ncol(reval))
length(llikt)

# M = 2, equalr = TRUE
reval <- matrix(seq(0, 1, 0.001), 1)
logr  <- logReval(reval, M = M, equalr = TRUE)
llikt <- probUxUy(Ux, Uy, coi[1], coi[2], aft, M, logj, factj,
                  equalr = TRUE, logr = logr, neval = ncol(reval))

# M = 1, mnewton = FALSE
M <- 1
reval <- matrix(seq(0, 1, 0.001), 1)
logr  <- logReval(reval, M = M)
llikt <- probUxUy(Ux, Uy, coi[1], coi[2], aft, M, logj, factj,
                  mnewton = FALSE, reval = reval, logr = logr,
                  neval = ncol(reval))

# M = 1, mnewton = TRUE
probUxUy(Ux, Uy, coi[1], coi[2], aft, M, logj, factj, mnewton = TRUE)

Parameter grid

Description

Precalculated parameter grids for a range of values of M (from 1 to 5).

Usage

revals
revals

Format

A list of length 5, where each element corresponds to a single value of M and is a matrix with M rows. Each column of a matrix is a r₁ = ... = r_M) combination.

Package 'dcifer'

Help Index

Calculate Allele Frequencies

Description

Usage

Arguments

Value

Examples

Dcifer results

Description

Usage

Format

Sample data

Description

Usage

Format

Read and Reformat Data

Description

Usage

Arguments

Value

See Also

Examples

Grid of Parameter Values

Description

Usage

Arguments

Value

Examples

Calculate COI

Description

Usage

Arguments

Value

Examples

Pairwise Relatedness

Description

Usage

Arguments

Details

Value

See Also

Examples

Estimate Relatedness and a Number of Related Strains

Description

Usage

Arguments

Value

See Also

Examples

Relatedness Between Two Samples

Description

Usage

Arguments

Details

Value

See Also

Examples

Logarithms of reval

Description

Usage

Arguments

Details

Value

Examples

Match Samples and Allele Frequencies

Description

Usage

Arguments

Details

Value

See Also

Examples

Combine two matrices

Description

Usage

Arguments

Value

See Also

Examples

Logarithms of `reval`

Likelihood for U_x, U_y