Title: | Compute Global Sensitivity Analysis Indices Using Optimal Transport |
---|---|
Description: | Computing Global Sensitivity Indices from given data using Optimal Transport, as defined in Borgonovo et al (2024) <doi:10.1287/mnsc.2023.01796>. You provide an input sample, an output sample, decide the algorithm, and compute the indices. |
Authors: | Leonardo Chiani [aut, cre, cph] , Emanuele Borgonovo [rev], Elmar Plischke [rev], Massimo Tavoni [rev] |
Maintainer: | Leonardo Chiani <[email protected]> |
License: | GPL (>= 3) |
Version: | 0.1.0 |
Built: | 2025-01-11 09:32:49 UTC |
Source: | https://github.com/pietrocipolla/gsaot |
Calculate lower bounds for Optimal Transport sensitivity indices
lower_bound( y, M, bound = "dummy", dummy_optns = NULL, cost = "L2", discrete_out = FALSE, solver = "sinkhorn", solver_optns = NULL, scaling = TRUE )
lower_bound( y, M, bound = "dummy", dummy_optns = NULL, cost = "L2", discrete_out = FALSE, solver = "sinkhorn", solver_optns = NULL, scaling = TRUE )
y |
An array or a matrix containing the output values. |
M |
A scalar representing the number of partitions for continuous inputs. |
bound |
(default |
dummy_optns |
(default |
cost |
(default |
discrete_out |
(default |
solver |
Solver for the Optimal Transport problem. Currently supported options are:
|
solver_optns |
(optional) A list containing the options for the Optimal Transport solver. See details for allowed options and default ones. |
scaling |
(default |
The function allows the computation of two different lower bounds.
With bound="dummy"
, the function samples from a distribution defined in
dummy_optns
(by default a standard normal), independent from the output
y
and then computes the indices using the algorithm specified in
solver
. Under the hood, lower_bound
calls the other available functions
in the package:
ot_indices_1d()
(for solver="1d
)
ot_indices_wb()
(for solver="wasserstein-bures"
)
ot_indices()
(for solver %in% c("sinkhorn", "sinkhorn_log", "wasserstein")
)
The user can choose the distribution of the dummy variable using the
argument dummy_optns
. dummy_optns
should be a named list with at least
a term called "distr"
defining the sampling function. The other terms in
the list are used as arguments to the sampling function. With
bound="entropic"
, the function computes the lower bound of the entropic
indices. In this case, solver
should be either "sinkhorn"
or
"sinkhorn_log"
An object of class gsaot_indices
for bound="dummy"
or a scalar
for bound="entropic"
.
N <- 1000 mx <- c(1, 1, 1) Sigmax <- matrix(data = c(1, 0.5, 0.5, 0.5, 1, 0.5, 0.5, 0.5, 1), nrow = 3) x1 <- rnorm(N) x2 <- rnorm(N) x3 <- rnorm(N) x <- cbind(x1, x2, x3) x <- mx + x %*% chol(Sigmax) A <- matrix(data = c(4, -2, 1, 2, 5, -1), nrow = 2, byrow = TRUE) y <- t(A %*% t(x)) M <- 25 sink_lb <- lower_bound(y, M, bound = "entropic") dummy_lb <- lower_bound(y, M, bound = "dummy") # Custom sampling funtion and network simplex solver dummy_optns <- list(distr = "rgamma", shape = 3) dummy_lb_cust <- lower_bound(y, M, bound = "dummy", dummy_optns = dummy_optns, solver = "transport")
N <- 1000 mx <- c(1, 1, 1) Sigmax <- matrix(data = c(1, 0.5, 0.5, 0.5, 1, 0.5, 0.5, 0.5, 1), nrow = 3) x1 <- rnorm(N) x2 <- rnorm(N) x3 <- rnorm(N) x <- cbind(x1, x2, x3) x <- mx + x %*% chol(Sigmax) A <- matrix(data = c(4, -2, 1, 2, 5, -1), nrow = 2, byrow = TRUE) y <- t(A %*% t(x)) M <- 25 sink_lb <- lower_bound(y, M, bound = "entropic") dummy_lb <- lower_bound(y, M, bound = "dummy") # Custom sampling funtion and network simplex solver dummy_optns <- list(distr = "rgamma", shape = 3) dummy_lb_cust <- lower_bound(y, M, bound = "dummy", dummy_optns = dummy_optns, solver = "transport")
ot_indices
calculates sensitivity indices using Optimal
Transport (OT) for a multivariate output sample y
with respect to input
data x
. Sensitivity indices measure the influence of inputs on outputs,
with values ranging between 0 and 1.
ot_indices( x, y, M, cost = "L2", discrete_out = FALSE, solver = "sinkhorn", solver_optns = NULL, scaling = TRUE, boot = FALSE, stratified_boot = FALSE, R = NULL, parallel = "no", ncpus = 1, conf = 0.95, type = "norm" )
ot_indices( x, y, M, cost = "L2", discrete_out = FALSE, solver = "sinkhorn", solver_optns = NULL, scaling = TRUE, boot = FALSE, stratified_boot = FALSE, R = NULL, parallel = "no", ncpus = 1, conf = 0.95, type = "norm" )
x |
A matrix or data.frame containing the input(s) values. The values
can be numeric, factors, or strings. The type of data changes the
partitioning. If the values are continuous (double), the function
partitions the data into |
y |
A matrix containing the output values. Each column represents a different output variable, and each row represents a different observation. Only numeric values are allowed. |
M |
A scalar representing the number of partitions for continuous inputs. |
cost |
(default |
discrete_out |
(default |
solver |
Solver for the Optimal Transport problem. Currently supported options are:
|
solver_optns |
(optional) A list containing the options for the Optimal Transport solver. See details for allowed options and default ones. |
scaling |
(default |
boot |
(default |
stratified_boot |
(default |
R |
(default |
parallel |
(default |
ncpus |
(default |
conf |
(default |
type |
(default |
OT is a widely studied topic in Operational Research and Calculus. The
reference for the OT solvers in this package is
Peyré et al. (2019). The default solver is
"sinkhorn"
, the Sinkhorn's solver introduced in
Cuturi (2013). It solves the
entropic-regularized version of the OT problem. The "sinkhorn_log"
solves
the same OT problem but in log scale. It is more stable for low values of
the regularization parameter but slower to converge. The option
"transport"
is used to choose a solver for the non-regularized OT
problem. Under the hood, the function calls transport::transport()
from
package transport
. This option does not define the solver per se, but the
solver should be defined with the argument solver_optns
. See the next
section for more information.
The argument solver_optns
should be empty (for default options) or a list
with all or some of the required solver parameters. All the parameters not
included in the list will be set to default values. The solvers
"sinkhorn"
and "sinkhorn_log"
have the same options:
numIterations
(default 1e3
): a positive integer defining the maximum number
of Sinkhorn's iterations allowed. If the solver does not converge in the
number of iterations set, the solver will throw an error.
epsilon
(default 0.01
): a positive real number defining the regularization
coefficient. If the value is too low, the solver may return NA
.
maxErr
(default 1e-9
): a positive real number defining the
approximation error threshold between the marginal histogram of the
partition and the one computed by the solver. The solver may fail to
converge in numIterations
if this value is too low.
The solver "transport"
has the parameters:
method
(default "networkflow
): string defining the solver of the OT
problem.
control
: a named list of parameters for the chosen method or the result
of a call to transport::trcontrol()
.
threads
(default 1
): an Integer specifying the number of threads used
in parallel computing.
For details regarding this solver, check the transport::transport()
help
page.
A gsaot_indices
object containing:
method
: a string that identifies the type of indices computed.
indices
: a names array containing the sensitivity indices between 0 and 1
for each column in x, indicating the influence of each input variable on
the output variables.
bound
: a double representing the upper bound of the separation measure or
an array representing the mean of the separation for each input according
to the bootstrap replicas.
x
, y
: input and output data provided as arguments of the function.
inner_statistic
: a list of matrices containing the values of the inner
statistics for the partitions defined by partitions
. If method = wasserstein-bures
, each matrix has three rows containing the
Wasserstein-Bures indices, the Advective, and the Diffusive components.
partitions
: a matrix containing the partitions built to calculate the
sensitivity indices. Each column contains the partition associated to the
same column in x
. If boot = TRUE
, the object contains also:
indices_ci
: a data.frame
with first column the input, second and third
columns the lower and upper bound of the confidence interval.
inner_statistic_ci
: a list of matrices. Each element of the list contains
the lower and upper confidence bounds for the partition defined by the row.
bound_ci
: a list containing the lower and upper bounds of the confidence
intervals of the separation measure bound.
type
, conf
: type of confidence interval and confidence level, provided
as arguments.
Cuturi M (2013).
“Sinkhorn distances: Lightspeed computation of optimal transport.”
Advances in neural information processing systems, 26.
Peyré G, Cuturi M, others (2019).
“Computational optimal transport: With applications to data science.”
Foundations and Trends® in Machine Learning, 11(5-6), 355–607.
ot_indices_1d()
, ot_indices_wb()
N <- 1000 mx <- c(1, 1, 1) Sigmax <- matrix(data = c(1, 0.5, 0.5, 0.5, 1, 0.5, 0.5, 0.5, 1), nrow = 3) x1 <- rnorm(N) x2 <- rnorm(N) x3 <- rnorm(N) x <- cbind(x1, x2, x3) x <- mx + x %*% chol(Sigmax) A <- matrix(data = c(4, -2, 1, 2, 5, -1), nrow = 2, byrow = TRUE) y <- t(A %*% t(x)) x <- data.frame(x) M <- 25 # Calculate sensitivity indices sensitivity_indices <- ot_indices(x, y, M) sensitivity_indices
N <- 1000 mx <- c(1, 1, 1) Sigmax <- matrix(data = c(1, 0.5, 0.5, 0.5, 1, 0.5, 0.5, 0.5, 1), nrow = 3) x1 <- rnorm(N) x2 <- rnorm(N) x3 <- rnorm(N) x <- cbind(x1, x2, x3) x <- mx + x %*% chol(Sigmax) A <- matrix(data = c(4, -2, 1, 2, 5, -1), nrow = 2, byrow = TRUE) y <- t(A %*% t(x)) x <- data.frame(x) M <- 25 # Calculate sensitivity indices sensitivity_indices <- ot_indices(x, y, M) sensitivity_indices
Evaluate Optimal Transport indices on one dimensional outputs
ot_indices_1d( x, y, M, boot = FALSE, R = NULL, parallel = "no", ncpus = 1, conf = 0.95, type = "norm" )
ot_indices_1d( x, y, M, boot = FALSE, R = NULL, parallel = "no", ncpus = 1, conf = 0.95, type = "norm" )
x |
A matrix or data.frame containing the input(s) values. The values
can be numeric, factors, or strings. The type of data changes the
partitioning. If the values are continuous (double), the function
partitions the data into |
y |
An array containing the output values. |
M |
A scalar representing the number of partitions for continuous inputs. |
boot |
(default |
R |
(default |
parallel |
(default |
ncpus |
(default |
conf |
(default |
type |
(default |
A gsaot_indices
object containing:
method
: a string that identifies the type of indices computed.
indices
: a names array containing the sensitivity indices between 0 and 1
for each column in x, indicating the influence of each input variable on
the output variables.
bound
: a double representing the upper bound of the separation measure or
an array representing the mean of the separation for each input according
to the bootstrap replicas.
x
, y
: input and output data provided as arguments of the function.
inner_statistic
: a list of matrices containing the values of the inner
statistics for the partitions defined by partitions
. If method = wasserstein-bures
, each matrix has three rows containing the
Wasserstein-Bures indices, the Advective, and the Diffusive components.
partitions
: a matrix containing the partitions built to calculate the
sensitivity indices. Each column contains the partition associated to the
same column in x
. If boot = TRUE
, the object contains also:
indices_ci
: a data.frame
with first column the input, second and third
columns the lower and upper bound of the confidence interval.
inner_statistic_ci
: a list of matrices. Each element of the list contains
the lower and upper confidence bounds for the partition defined by the row.
bound_ci
: a list containing the lower and upper bounds of the confidence
intervals of the separation measure bound.
type
, conf
: type of confidence interval and confidence level, provided
as arguments.
x <- rnorm(1000) y <- 10 * x ot_indices_1d(data.frame(x), y, 30)
x <- rnorm(1000) y <- 10 * x ot_indices_1d(data.frame(x), y, 30)
Evaluate sensitivity maps using Optimal Transport indices
ot_indices_smap(x, y, M)
ot_indices_smap(x, y, M)
x |
A matrix or data.frame containing the input(s) values. The values
can be numeric, factors, or strings. The type of data changes the
partitioning. If the values are continuous (double), the function
partitions the data into |
y |
A matrix containing the output values. Each column is interpreted as a different output. |
M |
A scalar representing the number of partitions for continuous inputs. |
A matrix where each column represents an input and each row
represents an output. The values are indices between 0 and 1 computed using
ot_indices_1d()
.
N <- 1000 x1 <- rnorm(N) x2 <- rnorm(N) x <- cbind(x1, x2) y1 <- 10 * x1 y2 <- x1 + x2 y <- cbind(y1, y2) ot_indices_smap(data.frame(x), y, 30)
N <- 1000 x1 <- rnorm(N) x2 <- rnorm(N) x <- cbind(x1, x2) y1 <- 10 * x1 y2 <- x1 + x2 y <- cbind(y1, y2) ot_indices_smap(data.frame(x), y, 30)
Evaluate Wasserstein-Bures approximation of the Optimal Transport solution
ot_indices_wb( x, y, M, boot = FALSE, R = NULL, parallel = "no", ncpus = 1, conf = 0.95, type = "norm" )
ot_indices_wb( x, y, M, boot = FALSE, R = NULL, parallel = "no", ncpus = 1, conf = 0.95, type = "norm" )
x |
A matrix or data.frame containing the input(s) values. The values
can be numeric, factors, or strings. The type of data changes the
partitioning. If the values are continuous (double), the function
partitions the data into |
y |
A matrix containing the output values. Each column represents a different output variable, and each row represents a different observation. Only numeric values are allowed. |
M |
A scalar representing the number of partitions for continuous inputs. |
boot |
(default |
R |
(default |
parallel |
(default |
ncpus |
(default |
conf |
(default |
type |
(default |
A gsaot_indices
object containing:
method
: a string that identifies the type of indices computed.
indices
: a names array containing the sensitivity indices between 0 and 1
for each column in x, indicating the influence of each input variable on
the output variables.
bound
: a double representing the upper bound of the separation measure or
an array representing the mean of the separation for each input according
to the bootstrap replicas.
x
, y
: input and output data provided as arguments of the function.
inner_statistic
: a list of matrices containing the values of the inner
statistics for the partitions defined by partitions
. If method = wasserstein-bures
, each matrix has three rows containing the
Wasserstein-Bures indices, the Advective, and the Diffusive components.
partitions
: a matrix containing the partitions built to calculate the
sensitivity indices. Each column contains the partition associated to the
same column in x
. If boot = TRUE
, the object contains also:
indices_ci
: a data.frame
with first column the input, second and third
columns the lower and upper bound of the confidence interval.
inner_statistic_ci
: a list of matrices. Each element of the list contains
the lower and upper confidence bounds for the partition defined by the row.
bound_ci
: a list containing the lower and upper bounds of the confidence
intervals of the separation measure bound.
type
, conf
: type of confidence interval and confidence level, provided
as arguments.
N <- 1000 mx <- c(1, 1, 1) Sigmax <- matrix(data = c(1, 0.5, 0.5, 0.5, 1, 0.5, 0.5, 0.5, 1), nrow = 3) x1 <- rnorm(N) x2 <- rnorm(N) x3 <- rnorm(N) x <- cbind(x1, x2, x3) x <- mx + x %*% chol(Sigmax) A <- matrix(data = c(4, -2, 1, 2, 5, -1), nrow = 2, byrow = TRUE) y <- t(A %*% t(x)) x <- data.frame(x) y <- y ot_indices_wb(x, y, 100)
N <- 1000 mx <- c(1, 1, 1) Sigmax <- matrix(data = c(1, 0.5, 0.5, 0.5, 1, 0.5, 0.5, 0.5, 1), nrow = 3) x1 <- rnorm(N) x2 <- rnorm(N) x3 <- rnorm(N) x <- cbind(x1, x2, x3) x <- mx + x %*% chol(Sigmax) A <- matrix(data = c(4, -2, 1, 2, 5, -1), nrow = 2, byrow = TRUE) y <- t(A %*% t(x)) x <- data.frame(x) y <- y ot_indices_wb(x, y, 100)
Plot Optimal Transport based inner statistics for each partition using
ggplot2
package. If provided, it plots also the uncertainty estimates.
plot_inner_stats(x, ranking = NULL, wb_all = FALSE, ...)
plot_inner_stats(x, ranking = NULL, wb_all = FALSE, ...)
x |
An object generated by |
ranking |
An integer with absolute value less or equal than the number
of inputs. If positive, select the first |
wb_all |
(default |
... |
Further arguments passed to or from other methods. |
A patchwork
object that, if called, will print.
N <- 1000 mx <- c(1, 1, 1) Sigmax <- matrix(data = c(1, 0.5, 0.5, 0.5, 1, 0.5, 0.5, 0.5, 1), nrow = 3) x1 <- rnorm(N) x2 <- rnorm(N) x3 <- rnorm(N) x <- cbind(x1, x2, x3) x <- mx + x %*% chol(Sigmax) A <- matrix(data = c(4, -2, 1, 2, 5, -1), nrow = 2, byrow = TRUE) y <- t(A %*% t(x)) x <- data.frame(x) M <- 25 # Get sensitivity indices sensitivity_indices <- ot_indices(x, y, M) plot_inner_stats(sensitivity_indices)
N <- 1000 mx <- c(1, 1, 1) Sigmax <- matrix(data = c(1, 0.5, 0.5, 0.5, 1, 0.5, 0.5, 0.5, 1), nrow = 3) x1 <- rnorm(N) x2 <- rnorm(N) x3 <- rnorm(N) x <- cbind(x1, x2, x3) x <- mx + x %*% chol(Sigmax) A <- matrix(data = c(4, -2, 1, 2, 5, -1), nrow = 2, byrow = TRUE) y <- t(A %*% t(x)) x <- data.frame(x) M <- 25 # Get sensitivity indices sensitivity_indices <- ot_indices(x, y, M) plot_inner_stats(sensitivity_indices)
Plot Optimal Transport based sensitivity indices using ggplot2
package.
## S3 method for class 'gsaot_indices' plot(x, ranking = NULL, wb_all = FALSE, dummy = NULL, ...)
## S3 method for class 'gsaot_indices' plot(x, ranking = NULL, wb_all = FALSE, dummy = NULL, ...)
x |
An object generated by |
ranking |
An integer with absolute value less or equal than the number
of inputs. If positive, select the first |
wb_all |
(default |
dummy |
(default |
... |
Further arguments passed to or from other methods. |
A ggplot
object that, if called, will print
N <- 1000 mx <- c(1, 1, 1) Sigmax <- matrix(data = c(1, 0.5, 0.5, 0.5, 1, 0.5, 0.5, 0.5, 1), nrow = 3) x1 <- rnorm(N) x2 <- rnorm(N) x3 <- rnorm(N) x <- cbind(x1, x2, x3) x <- mx + x %*% chol(Sigmax) A <- matrix(data = c(4, -2, 1, 2, 5, -1), nrow = 2, byrow = TRUE) y <- t(A %*% t(x)) x <- data.frame(x) M <- 25 # Calculate sensitivity indices sensitivity_indices <- ot_indices_wb(x, y, M) sensitivity_indices plot(sensitivity_indices)
N <- 1000 mx <- c(1, 1, 1) Sigmax <- matrix(data = c(1, 0.5, 0.5, 0.5, 1, 0.5, 0.5, 0.5, 1), nrow = 3) x1 <- rnorm(N) x2 <- rnorm(N) x3 <- rnorm(N) x <- cbind(x1, x2, x3) x <- mx + x %*% chol(Sigmax) A <- matrix(data = c(4, -2, 1, 2, 5, -1), nrow = 2, byrow = TRUE) y <- t(A %*% t(x)) x <- data.frame(x) M <- 25 # Calculate sensitivity indices sensitivity_indices <- ot_indices_wb(x, y, M) sensitivity_indices plot(sensitivity_indices)
Print Optimal Transport Sensitivity indices information
## S3 method for class 'gsaot_indices' print(x, data = FALSE, ...)
## S3 method for class 'gsaot_indices' print(x, data = FALSE, ...)
x |
An object generated by |
data |
Logical, indicating whether or not the input and output data should be printed. |
... |
Further arguments passed to or from other methods. |
The information contained in argument x
N <- 1000 mx <- c(1, 1, 1) Sigmax <- matrix(data = c(1, 0.5, 0.5, 0.5, 1, 0.5, 0.5, 0.5, 1), nrow = 3) x1 <- rnorm(N) x2 <- rnorm(N) x3 <- rnorm(N) x <- cbind(x1, x2, x3) x <- mx + x %*% chol(Sigmax) A <- matrix(data = c(4, -2, 1, 2, 5, -1), nrow = 2, byrow = TRUE) y <- t(A %*% t(x)) x <- data.frame(x) M <- 25 # Calculate sensitivity indices sensitivity_indices <- ot_indices(x, y, M) print(sensitivity_indices)
N <- 1000 mx <- c(1, 1, 1) Sigmax <- matrix(data = c(1, 0.5, 0.5, 0.5, 1, 0.5, 0.5, 0.5, 1), nrow = 3) x1 <- rnorm(N) x2 <- rnorm(N) x3 <- rnorm(N) x <- cbind(x1, x2, x3) x <- mx + x %*% chol(Sigmax) A <- matrix(data = c(4, -2, 1, 2, 5, -1), nrow = 2, byrow = TRUE) y <- t(A %*% t(x)) x <- data.frame(x) M <- 25 # Calculate sensitivity indices sensitivity_indices <- ot_indices(x, y, M) print(sensitivity_indices)