Title: | Reconstruction of Clones from Integration Site Readouts and Visualization |
---|---|
Description: | Tools necessary to reconstruct clonal affiliations from temporally and/or spatially separated measurements of viral integration sites. To this end, the package uses correlations present in the relative readouts of the integration sites. It also provides facilities for filtering the data and for visualizing different steps in the pipeline. |
Authors: | Sebastian Wagner [cre, aut] |
Maintainer: | Sebastian Wagner <[email protected]> |
License: | LGPL |
Version: | 0.6.2 |
Built: | 2025-02-14 04:57:07 UTC |
Source: | https://github.com/cran/MultIS |
Create a stacked area plot that represents the abundance of integration sites over time.
bushmanplot( readouts, aes = NULL, col = NULL, only = NULL, rec = NULL, time = NULL, facet = NULL )
readouts |
The readouts of the integration sites over time. |
aes |
An additional 'ggplot2::aes' object that will be used as the plot's main aesthetic. Note that the 'geom_area' object overwrites some of these aesthetics. Useful if you later want to add further elements to the plot. |
col |
A color palette for integration sites that should be colored. Any integration site not in this named vector will be colored 'gray50'. This takes precedence over 'only' and 'rec'. |
only |
A list of integration sites that should be colored with the default ggplot2 color palette. Any other integration site is colored 'gray50'. Takes precedence over 'rec'. |
rec |
A matrix containing the columns "IS" and "Clone". Integration sites will be colored by the clone they belong to. The colors for the clones are the default ggplot2 ones. |
time |
A function that extracts the time component from the measurement (i.e. column) names. Will be applied to the measurements. |
facet |
A function that extracts a value from the measurement names and splits the plot into different facets by that value. Useful, for example, if you have measurements that are sorted by cell type and you want to split these into facets. |
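The 'time' and 'facet' arguments are plain functions applied to the column names. A minimal sketch of what such extractors might look like, assuming a hypothetical naming scheme of the form "<days>_<celltype>" (the column names and both helper functions below are illustrative and not part of the package):

```r
# Hypothetical measurement names of the form "<days>_<celltype>"
measurement_names <- c("30_Tcell", "30_Bcell", "90_Tcell", "90_Bcell")

# Extract the numeric time component (everything before the first "_")
extract_time <- function(x) as.numeric(sub("_.*$", "", x))

# Extract the cell type (everything after the first "_") for faceting
extract_celltype <- function(x) sub("^[^_]*_", "", x)

times <- extract_time(measurement_names)
celltypes <- extract_celltype(measurement_names)
```

Functions like these could then be supplied as 'time = extract_time' and 'facet = extract_celltype'.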
Calculate the bw index
bw(distance, clusters, bw_balance = 1, ind_cluster = FALSE)
distance |
Distance or dissimilarity matrix. |
clusters |
The clustering to evaluate. |
bw_balance |
The balance [0, 1] between inner cluster similarity (Compactness) and the similarity between clusters (Separation). A balance value < 1 increases the importance of Compactness, whereas a value > 1 increases the importance of Separation. |
ind_cluster |
If true, the bw value for all individual clusters is returned. |
A score that describes how well the clustering fits the data.
Converts a matrix to relative abundances
convert_columnwise_relative(data)
data |
A matrix of readouts that should be converted to relative abundances |
The matrix with all columns in percent
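As an illustration of the idea (not the package's own implementation), a column-wise conversion to percent can be sketched in base R. The toy matrix and its names are made up for this example:

```r
# Toy readout matrix: rows are integration sites, columns are measurements
readouts <- matrix(c(10, 30, 60,
                      5,  5, 10),
                   nrow = 3,
                   dimnames = list(c("IS1", "IS2", "IS3"), c("t1", "t2")))

# Divide every column by its column sum and scale to percent
relative <- sweep(readouts, 2, colSums(readouts), "/") * 100
```

After the conversion every column sums to 100, so measurements with different sequencing depth become comparable.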
Evaluate a clustering using the given method
evaluate_clustering(readouts, clustering, sim, method, custom_eval = NULL, ...)
readouts |
The readouts the clustering and similarity matrix are based on. |
clustering |
The clustering to evaluate. |
sim |
The similarity matrix this clustering is based on. |
method |
The method to evaluate the given clustering. This might be one of "silhouette", "sdindex", "ptbiserial", "dunn", "bw", or "custom". |
custom_eval |
A custom function to be run for evaluating a clustering. Only used with method "custom". |
... |
Further arguments that are passed to a custom function. |
A score that describes how well the clustering fits the data.
Evaluate a clustering using the bw index
evaluate_clustering_bw(readouts, clustering, sim, ...)
readouts |
The readouts the clustering and similarity matrix are based on. |
clustering |
The clustering to evaluate. |
sim |
The similarity matrix this clustering is based on. |
... |
Further arguments that are passed to the bw function. |
A score that describes how well the clustering fits the data.
Evaluate a clustering using a custom evaluation function
evaluate_clustering_custom(readouts, clustering, sim, custom_eval, ...)
readouts |
The readouts the clustering and similarity matrix are based on. |
clustering |
The clustering to evaluate. |
sim |
The similarity matrix this clustering is based on. |
custom_eval |
The custom function to be run for evaluating a clustering. |
... |
Further arguments that are passed to the custom function. |
A score that describes how well the clustering fits the data.
Evaluate a clustering using the dunn index
evaluate_clustering_dunn(readouts, clustering, sim)
readouts |
The readouts the clustering and similarity matrix are based on. |
clustering |
The clustering to evaluate. |
sim |
The similarity matrix this clustering is based on. |
A score that describes how well the clustering fits the data.
Evaluate a clustering using the point-biserial index
evaluate_clustering_ptbiserial(readouts, clustering, sim)
readouts |
The readouts the clustering and similarity matrix are based on. |
clustering |
The clustering to evaluate. |
sim |
The similarity matrix this clustering is based on. |
A score that describes how well the clustering fits the data.
Evaluate a clustering using the SD-index
evaluate_clustering_sdindex(readouts, clustering, sim)
readouts |
The readouts the clustering and similarity matrix are based on. |
clustering |
The clustering to evaluate. |
sim |
The similarity matrix this clustering is based on. |
A score that describes how well the clustering fits the data.
Evaluate a clustering using the silhouette index
evaluate_clustering_silhouette(readouts, clustering, sim)
readouts |
The readouts the clustering and similarity matrix are based on. |
clustering |
The clustering to evaluate. |
sim |
The similarity matrix this clustering is based on. |
A score that describes how well the clustering fits the data.
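For orientation, the silhouette index itself can be sketched in a few lines of base R. This is a generic textbook version, not the package's code; converting the similarity matrix to a distance as 1 - sim is an assumption made for this example, and the helper name 'silhouette_widths' is hypothetical:

```r
# Silhouette width per item, given a distance matrix and cluster labels.
# Assumes at least two clusters.
silhouette_widths <- function(d, clusters) {
  d <- as.matrix(d)
  vapply(seq_len(nrow(d)), function(i) {
    own <- clusters == clusters[i]
    if (sum(own) == 1) return(0)            # singletons get width 0 by convention
    a <- sum(d[i, own]) / (sum(own) - 1)    # mean distance within the own cluster
    others <- setdiff(unique(clusters), clusters[i])
    # mean distance to the closest other cluster
    b <- min(vapply(others, function(k) mean(d[i, clusters == k]), numeric(1)))
    (b - a) / max(a, b)
  }, numeric(1))
}

# Toy similarity matrix with two well-separated groups
sim <- matrix(0.1, 4, 4)
sim[1:2, 1:2] <- 0.9
sim[3:4, 3:4] <- 0.9
diag(sim) <- 1

widths <- silhouette_widths(1 - sim, clusters = c(1, 1, 2, 2))
score <- mean(widths)
```

Values close to 1 indicate compact, well-separated clusters; values near 0 or below indicate overlapping or misassigned items.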
Filters a matrix of readouts for the n biggest IS at a certain measurement
filter_at_tp_biggest_n(data, at = "168", n = 50)
data |
The readout matrix to filter. |
at |
A filter for the columns/measurements. Only matching columns/measurements are considered, though all will be returned. |
n |
The number of biggest IS to return. If 'at' matches multiple columns/measurements, the 'rowSums()' over the columns/measurements will be used. For ties, more than 'n' IS may be returned. |
A matrix with only the n biggest IS at the selected measurements.
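The selection logic described above can be sketched in base R as follows. This is an illustration under assumptions, not the package's implementation: the toy data are made up, and treating 'at' as a pattern for 'grepl()' is assumed here.

```r
# Toy readouts: rows are integration sites, columns are measurements
readouts <- matrix(c(5, 1, 9, 9, 2,
                     4, 8, 3, 1, 3),
                   nrow = 5,
                   dimnames = list(paste0("IS", 1:5), c("24", "168")))
at <- "168"
n <- 3

# Sum each row over the columns whose names match `at`
size <- rowSums(readouts[, grepl(at, colnames(readouts)), drop = FALSE])

# Keep every row whose size reaches the n-th largest value (ties are kept)
threshold <- unname(sort(size, decreasing = TRUE)[n])
biggest <- readouts[size >= threshold, , drop = FALSE]
```

Because IS3 and IS5 tie for third place in this example, four rows are returned for n = 3, and all columns, including the non-matching one, are retained.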
Filters a matrix of readouts for IS that have a minimum occurrence in some measurement
filter_at_tp_min(data, at = "168", min = 0.02)
data |
The readout matrix to filter. |
at |
A filter for the columns/measurements. Only matching columns/measurements are considered, though all will be returned. This is a regular expression, so multiple columns/measurements may match it. |
min |
The minimum with which an IS has to occur. This could be either absolute or relative reads. If 'at' matches multiple columns/measurements, the 'rowSums()' over the columns will be used. |
A matrix with only the IS that occur with a minimum at the selected measurements.
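A minimal base R sketch of this kind of threshold filter, using made-up relative readouts. Whether the package compares with ">=" or ">" is not stated here, so the ">=" below is an assumption:

```r
# Toy relative readouts: rows are integration sites, columns are measurements
readouts <- matrix(c(0.50, 0.03, 0.47,
                     0.30, 0.01, 0.69),
                   nrow = 3,
                   dimnames = list(c("IS1", "IS2", "IS3"), c("24", "168")))
at <- "168"
min_value <- 0.02

# Columns whose names match the regular expression `at`
matching <- grepl(at, colnames(readouts))

# Keep rows whose summed readout over the matching columns reaches the minimum
keep <- rowSums(readouts[, matching, drop = FALSE]) >= min_value
filtered <- readouts[keep, , drop = FALSE]
```

IS2 is dropped because its readout at "168" is below the threshold, even though it exceeds the threshold at "24".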
Combines columns that have the same name. The columns are joined additively.
filter_combine_measurements(dat, pre_norm = TRUE, post_norm = TRUE)
dat |
The readout matrix to filter. |
pre_norm |
Whether to normalize columns before joining them. |
post_norm |
Whether to normalize columns after they are joined. |
A matrix in which columns that had the same name are added and (possibly) normalized.
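The additive join of equally named columns can be sketched with base R's 'rowsum()'. This is one possible way to express the idea, with toy data, and is not taken from the package source:

```r
# Toy readouts with two columns that share the name "d30"
readouts <- matrix(c(1, 3,
                     2, 2,
                     4, 0),
                   nrow = 2,
                   dimnames = list(c("IS1", "IS2"), c("d30", "d30", "d60")))

# rowsum() adds up rows by group, so transpose, group by column name,
# and transpose back
combined <- t(rowsum(t(readouts), group = colnames(readouts)))

# Optional normalization after joining: every column sums to 1
normalized <- sweep(combined, 2, colSums(combined), "/")
```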
Shortens the rownames of a readout matrix to the shortest distinct prefix
filter_is_names(dat, by = "[_():]|[^_():]*")
dat |
The readout matrix for which the names should be filtered. |
by |
The regexp used to split the names. |
A matrix with the names filtered to the shortest unique prefix.
See also: filter.names
Filters for columns containing a certain substring.
filter_match(dat, match = "E2P11")
dat |
The readout matrix to filter. |
match |
The substring that columns must match. |
A readout matrix that only contains the columns whose names contain the substring.
Splits a vector of strings by a given regexp, selects and rearranges the parts and joins them again
filter_measurement_names(dat, elems = c(1, 3), by = "_")
dat |
The readout matrix to filter. |
elems |
The elements to select. They are rearranged in the order that is given via this argument. |
by |
The string used for splitting the names of the columns. |
A matrix where the names of the columns are split by the given string, rearranged and again joined by the string.
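The split, select, and rejoin step can be illustrated in base R. The measurement names below are hypothetical, and splitting on a fixed string rather than a regular expression is an assumption of this sketch:

```r
# Hypothetical column names of the form "<patient>_<tissue>_<day>"
measurement_names <- c("P1_blood_30", "P1_marrow_90")
elems <- c(3, 1)
by <- "_"

# Split every name, pick the requested parts in the requested order, rejoin
parts <- strsplit(measurement_names, by, fixed = TRUE)
renamed <- vapply(parts,
                  function(p) paste(p[elems], collapse = by),
                  character(1))
```

With 'elems = c(3, 1)' the day is moved to the front and the tissue part is dropped.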
Filters a vector of names and shortens them to the shortest prefix for which all names remain unique.
filter_names(names, by = "[_():]|[^_():]*")
names |
The vector of names to filter. |
by |
A regexp that splits the string. The default filters by special characters. A split by character can be achieved by using "." as the regexp. |
The names shortened to the shortest prefix (in chunks defined by the regexp) where all names are unique.
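A simplified sketch of the shortest-unique-prefix idea, working character by character instead of in regexp-defined chunks. The helper 'shortest_unique_prefix' is hypothetical and only approximates the behaviour described above:

```r
# Grow the prefix one character at a time until all prefixes are distinct
shortest_unique_prefix <- function(x) {
  for (k in seq_len(max(nchar(x)))) {
    prefix <- substr(x, 1, k)
    if (!anyDuplicated(prefix)) return(prefix)
  }
  x  # fall back to the full names if no shorter prefix is unique
}

shortened <- shortest_unique_prefix(c("chr1_1000", "chr1_2000", "chr2_1000"))
```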
Filters for a minimum number of time points/measurements
filter_nr_tp_min(dat, min = 6)
dat |
The readout matrix to filter. |
min |
The minimum number of measurements where an IS needs to have a value that is not 0 or NA. |
A matrix with only ISs that have more than 'min' columns that are not 0 or NA.
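Counting the informative measurements per integration site can be sketched in base R as follows. The toy data are made up, and the use of ">=" for the comparison is an assumption of this sketch:

```r
# Toy readouts containing zeros and a missing value
readouts <- matrix(c(1, 0, NA,
                     2, 0, 3,
                     0, 0, 4),
                   nrow = 3,
                   dimnames = list(c("IS1", "IS2", "IS3"), c("t1", "t2", "t3")))
min_measurements <- 2

# TRUE where a value is neither NA nor 0
present <- !is.na(readouts) & readouts != 0

# Keep rows with enough informative measurements
filtered <- readouts[rowSums(present) >= min_measurements, , drop = FALSE]
```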
Removes columns that only contain 0 or NA.
filter_zero_columns(dat)
dat |
The readout matrix to filter. |
A matrix where columns that were only 0 or NA are filtered out.
Removes rows that only contain 0 or NA.
filter_zero_rows(dat)
dat |
The readout matrix to filter. |
A matrix where rows that were only 0 or NA are filtered out.
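Both the row and the column variant of this filter follow the same pattern. A base R sketch with toy data (an illustration, not the package's code):

```r
# Toy readouts with one empty row (IS2) and one empty column (t2)
readouts <- matrix(c(1, 0, NA,
                     0, 0, 0,
                     2, 0, 3),
                   nrow = 3,
                   dimnames = list(c("IS1", "IS2", "IS3"), c("t1", "t2", "t3")))

# TRUE where a value carries no information (NA or 0)
empty <- is.na(readouts) | readouts == 0

# Drop columns, respectively rows, that consist only of empty values
no_zero_cols <- readouts[, colSums(!empty) > 0, drop = FALSE]
no_zero_rows <- readouts[rowSums(!empty) > 0, , drop = FALSE]
```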
Finds the best number of clusters according to silhouette
find_best_nr_cluster( data, sim, method_reconstruction = "kmedoids", method_evaluation = "silhouette", report = FALSE, parallel = FALSE, best = max, return_all = FALSE, ... )
data |
The barcode data in a matrix. |
sim |
A similarity matrix. |
method_reconstruction |
The clustering method to use. |
method_evaluation |
The evaluation method to use. |
report |
Whether the current progress should be reported. Note that this will not work if parallel is set to TRUE. |
parallel |
Whether the clustering should be performed in parallel. |
best |
The method to use to determine the best clustering. |
return_all |
Whether to return the silhouette score for all clusterings. |
... |
Further parameters that are passed on to the clustering evaluation. |
The R^2 value for rows is1 and is2 in matrix dat
Generate a similarity matrix
get_similarity_matrix( readouts, self = NULL, upper = TRUE, method = "rsquared", strategy = "atLeastOne", min_measures = 3L, post_norm = TRUE, parallel = FALSE )
readouts |
The readouts that are used to generate the similarity matrix |
self |
Values to set on the diagonal of the matrix. If NULL, the values that are returned by the method are used. |
upper |
Only used with "rsquared". If TRUE, generates the upper triangle. |
method |
The method to use as a string. Possible values for the string are "rsquared" and any method that is accepted by stats::dist. In case of stats::dist we are using the change in the values over time / compartments (columns). |
strategy |
Defines how 0 / NA values are treated. For a pair of integration sites (two rows), **atLeastOne** ignores all columns where both are 0. **all** takes all measures into account, regardless of whether they are 0 or not. |
min_measures |
Minimum number of measures to compare two integration sites (rows). If there are fewer measures, the similarity entry is NA. |
post_norm |
Normalize the similarity matrix to [0,1] scale. |
parallel |
Whether parallelism should be used. Number of cores is set by option mc.cores. If unset, parallel::detectCores is used. |
A similarity matrix.
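To make the "rsquared" idea concrete, the squared Pearson correlation between rows can be sketched in base R. This is an illustration under assumptions, not the package's implementation. The helper 'rsquared_pair' and its handling of shared zeros only mimic the behaviour described for 'strategy' and 'min_measures'.

```r
# Toy readouts: rows are integration sites, columns are measurements
readouts <- matrix(c(1, 2, 5,
                     2, 4, 4,
                     3, 6, 1),
                   nrow = 3,
                   dimnames = list(c("IS1", "IS2", "IS3"), c("t1", "t2", "t3")))

# R^2 for all pairs of rows at once (all measures taken into account)
sim <- cor(t(readouts))^2

# Pairwise variant that drops columns where both rows are 0 and returns NA
# if too few measures remain (hypothetical helper)
rsquared_pair <- function(a, b, min_measures = 3L) {
  keep <- !(a == 0 & b == 0)
  if (sum(keep) < min_measures) return(NA_real_)
  cor(a[keep], b[keep])^2
}

enough  <- rsquared_pair(c(0, 1, 2, 3), c(0, 2, 4, 6))
too_few <- rsquared_pair(c(0, 0, 1, 2), c(0, 0, 2, 4))
```

Integration sites whose readouts rise and fall together (IS1 and IS2 here) receive a similarity of 1, which is the signal used to group them into one clone.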
This is an adapted version of https://stackoverflow.com/a/8197703
ggplot_colors(n = 6, h = c(0, 360) + 15, l = c(65, 65))
n |
The number of colors in the color palette. If 'n' is a vector, get a color palette that has 'length(n)' different base colors. For each item in n, the actual colors are equally spaced in the luminance range 'l' between the upper and lower value. |
h |
The hue range. |
l |
A vector of length 2 that describes the luminance range |
A vector of 'sum(n)' color strings
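The underlying idea from the linked answer is to take equally spaced hues on the HCL color wheel. A minimal sketch for a scalar 'n' only; the helper name 'default_hues' and the 'chroma' parameter are assumptions of this example, and the vector form of 'n' described above is not covered:

```r
# Equally spaced hues at fixed chroma and luminance
default_hues <- function(n, h = c(0, 360) + 15, l = 65, chroma = 100) {
  # n + 1 points so that the first and last hue do not coincide on the wheel
  hues <- seq(h[1], h[2], length.out = n + 1)[seq_len(n)]
  grDevices::hcl(h = hues, c = chroma, l = l)
}

palette6 <- default_hues(6)
```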
Show line plots of all integration sites over time, split into facets by their respective clone.
lineplot_split_clone( bd, rec, order = NULL, mapping = NULL, sim = NULL, silhouette_values = !is.null(sim), singletons = TRUE, zero_values = TRUE )
bd |
The readouts of the integration sites over time. |
rec |
A matrix with columns "IS" and "Clone", that describes for each integration site, which clone it belongs to. |
order |
Integration site names will be converted to a factor. This argument sets the order of the factor levels, which influences the order in which the lines are drawn. |
mapping |
A ggplot2 aesthetics mapping that will be merged with the aesthetics used by this plot. |
sim |
A similarity matrix giving the similarities for each pair of integration sites. Used if 'silhouette_values' is 'TRUE' to calculate the silhouette score. |
silhouette_values |
A boolean value that determines whether the silhouette values for each clone should be calculated and added to the facet labels. Requires 'sim' to be present. |
singletons |
Whether to show clones that only have a single integration site. |
zero_values |
How to handle values that are zero. If 'TRUE', they remain zero and the line drops to zero at that measurement. If 'FALSE', the values are removed and a gap in the line is shown. |
Each integration site is replaced by its clone. The size of the clone is adjusted to be the mean size of the integration sites within it. For integration sites that are not mentioned in 'rec', we adjust by the average number of integration sites per clone.
normalize_timecourse(readouts, rec, rec_first = FALSE, reduce_clones = TRUE)
readouts |
The integration site readouts to adjust. |
rec |
A matrix with columns "IS" and "Clone" that assigns each integration site to a clone. |
rec_first |
Whether the clones should be put in the first rows of the resulting time course. |
reduce_clones |
Whether to represent the integration sites by their respective clone. |
The adjusted time course.
Plots R^2 of two integration sites
plot_rsquare(dat, is1, is2)
dat |
The matrix that holds the values |
is1 |
The name of the first row |
is2 |
The name of the second row |
A ggplot object, which can be used to further individualize or to plot directly.
Plots the clustering based on a clustering object
## S3 method for class 'clusterObj'
plot(x, ...)
x |
The clustering object. |
... |
Further arguments are ignored. |
A ggplot object, which can be used to further individualize or to plot directly.
Plots the similarity of integration sites
## S3 method for class 'ISSimilarity'
plot(x, na.rm = TRUE, ...)
x |
The matrix that holds the similarity values |
na.rm |
Whether NA values should be removed beforehand. |
... |
Further arguments are ignored. |
A ggplot object, which can be used to further individualize or to plot directly.
Plots time series data, which consists of multiple measurements over time / place (cols) of different clones / integration sites (rows).
## S3 method for class 'timeseries'
plot(x, ...)
x |
The data to plot. |
... |
Further arguments are ignored. |
A ggplot object, which can be used to further individualize or to plot directly.
Apply a clustering algorithm to a given time course.
reconstruct( readouts, target_communities, method = "kmedoids", sim = MultIS::get_similarity_matrix(readouts = readouts, upper = TRUE), cluster_obj = FALSE )
readouts |
The time course for which to find clusters. |
target_communities |
The number of clusters to cluster for. |
method |
Either "kmedoids", "kmeans" or any string permitted as a method for stats::hclust. |
sim |
A similarity matrix used with all methods except "kmeans". |
cluster_obj |
If TRUE, a clusterObject with the readouts, similarity and clustering is returned. |
A matrix with two columns: "Clone" and "IS" or if cluster_obj = TRUE a cluster object, which can be used to plot the clustering.
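To show the general shape of such a reconstruction, here is a base R sketch of the hierarchical variant using 'stats::hclust'. It is not the package's implementation: turning the similarity into a distance as 1 - sim, the "average" linkage, and the toy similarity matrix are all assumptions of this example.

```r
# Toy similarity matrix with two groups of integration sites
is_names <- paste0("IS", 1:4)
sim <- matrix(0.1, 4, 4, dimnames = list(is_names, is_names))
sim[1:2, 1:2] <- 0.9
sim[3:4, 3:4] <- 0.9
diag(sim) <- 1

# Cluster on the dissimilarity and cut the tree into the target number of clusters
hc <- stats::hclust(stats::as.dist(1 - sim), method = "average")
clone <- stats::cutree(hc, k = 2)

# Assemble a two-column matrix in the "Clone" / "IS" layout described above
rec <- cbind(Clone = unname(clone), IS = names(clone))
```

A matrix in this layout has the columns that the 'rec' and 'mapping' arguments of the plotting functions are documented to expect.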
Calculate the k-medoids clustering for a given time course.
reconstruct_kmedoid( readouts, target_communities, sim = MultIS::get_similarity_matrix(readouts = readouts, self = 0, upper = TRUE) )
readouts |
The time course for which to find clusters. |
target_communities |
The number of clusters to cluster for. |
sim |
A similarity matrix for the time course. |
A matrix with two columns: "Clone" and "IS".
Apply a clustering algorithm recursively to a given time course.
reconstruct_recursive( readouts, method = "kmedoids", sim = MultIS::get_similarity_matrix(readouts = readouts, upper = TRUE), split_similarity = 0.7, combine_similarity = 0.9, use_silhouette = TRUE, cluster_obj = FALSE )
readouts |
The time course for which to find clusters. |
method |
Either "kmedoids", "kmeans" or any string permitted as a method for stats::hclust. |
sim |
A similarity matrix used with all methods except "kmeans". |
split_similarity |
Similarity threshold. If the similarity of any two elements within a cluster is below this threshold, another split is initiated. |
combine_similarity |
After splitting, a combination phase is activated. If any two elements from two different clusters have a similarity higher than this threshold, the clusters are combined. |
use_silhouette |
If TRUE, silhouette is used to define the number of clusters during splitting; otherwise clusters are always split into two new clusters. |
cluster_obj |
If TRUE, a clusterObject with the readouts, similarity and clustering is returned. |
A matrix with two columns: "Clone" and "IS" or if cluster_obj = TRUE a cluster object, which can be used to plot the clustering.
Integration sites will be represented as nodes in the graph, while their mutual similarity is indicated by the width and opacity of the lines between them.
weighted_spring_model( readouts, mapping, gt, sim = get_similarity_matrix(readouts, self = NA, upper = FALSE, parallel = FALSE), rec_pal = NULL, clone_pal = NULL, line_color = "#009900FF", seed = 4711L )
readouts |
The integration site readouts that this spring model is based on. |
mapping |
The reconstructed mapping from clones to integration sites. This is represented as a matrix with two columns "IS" and "Clone". |
gt |
The ground truth mapping from clones to integration sites, if available. Same structure as 'mapping'. |
sim |
The similarity matrix holding the similarities between all integration sites. |
rec_pal |
A named vector color palette holding colors for each integration site. Will be used as the fill color for the nodes. |
clone_pal |
A named vector color palette holding colors for each integration site. Will be used as the line color for the nodes. |
line_color |
The line color to use for the edges of the graph. |
seed |
A seed that will be set using 'set.seed()' to ensure consistent behaviour with the layout that is provided by 'igraph'. |
A ggplot object that contains the generated graph.