DataDrivenEnzymeRateEqs
API for DataDrivenEnzymeRateEqs.
DataDrivenEnzymeRateEqs.data_driven_rate_equation_selection
DataDrivenEnzymeRateEqs.display_rate_equation
DataDrivenEnzymeRateEqs.fit_rate_equation
DataDrivenEnzymeRateEqs.@derive_general_mwc_rate_eq
DataDrivenEnzymeRateEqs.@derive_general_qssa_rate_eq
DataDrivenEnzymeRateEqs.data_driven_rate_equation_selection
— Method
data_driven_rate_equation_selection(
general_rate_equation::Function,
data::DataFrame,
metab_names::Tuple{Symbol,Vararg{Symbol}},
param_names::Tuple{Symbol,Vararg{Symbol}};
range_number_params::Union{Nothing, Tuple{Int,Int}} = nothing,
forward_model_selection::Bool = true,
max_zero_alpha::Int = 1 + ceil(Int, length(metab_names) / 2),
n_reps_opt::Int = 20,
maxiter_opt::Int = 50_000,
model_selection_method::String = "current_subsets_filtering",
p_val_threshold::Float64 = 0.4,
save_train_results::Bool = false,
enzyme_name::String = "Enzyme",
subsets_min_limit::Int = 1,
subsets_max_limit::Union{Int, Nothing}=nothing,
subsets_filter_threshold::Float64=0.1,
)
Perform data-driven rate equation selection using a general rate equation and data.
There are three model selection methods (chosen with model_selection_method):
- current_subsets_filtering:
This method iteratively fits models that are subsets of the best models from the previous iteration (by default, those with training loss within 10% of the minimum; see subsets_filter_threshold), saving the best model for each number of parameters based on training loss. The optimal number of parameters is selected using the Wilcoxon test on test scores from leave-one-out cross-validation (LOOCV), and the best equation is the best model with this optimal number of parameters.
- cv_subsets_filtering:
This method runs current_subsets_filtering separately for each figure, leaving one figure out as a test set while training on the remaining data. For each number of parameters, it saves the test loss of the best subset for that figure. It uses the Wilcoxon test across all figures' results to select the optimal number of parameters. Then, for the chosen number, it trains all subsets with this number of parameters on the entire dataset and selects the best rate equation based on minimal training loss.
- cv_all_subsets:
This method fits all subsets for each figure, using the other figures as training data and the left-out figure as the test set. It selects the best model for each number of parameters and figure based on training error and computes LOOCV test scores. The optimal number of parameters is determined by the Wilcoxon test across all figures' test scores. The best equation is the subset with minimal training loss for this optimal number of parameters when trained on the entire dataset.
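The leave-one-figure-out splitting that the cross-validated methods rely on can be sketched in plain Julia. This is only an illustration under assumptions: `leave_one_figure_out` and the example `sources` vector are hypothetical names, and the package's internal implementation may differ.

```julia
# Hypothetical measurements: each entry is labelled with the figure it came from.
sources = ["Figure 1", "Figure 1", "Figure 2", "Figure 3", "Figure 3"]

# Build one (train, test) split of row indices per figure:
# the left-out figure is the test set, all other rows are the training set.
function leave_one_figure_out(sources)
    figures = unique(sources)
    return [
        (figure = fig,
         train = findall(!=(fig), sources),
         test = findall(==(fig), sources))
        for fig in figures
    ]
end

splits = leave_one_figure_out(sources)
for s in splits
    println(s.figure, ": train rows ", s.train, ", test rows ", s.test)
end
```

Each split would then be used to fit on the `train` rows and score on the `test` rows, giving one test loss per figure for the Wilcoxon comparison described above.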
Arguments
- general_rate_equation::Function: Function that takes a NamedTuple of metabolite concentrations (with metab_names keys) and parameters (with param_names keys) and returns an enzyme rate.
- data::DataFrame: DataFrame containing the data, with a column Rate and a column for each name in metab_names, where each row is one measurement. It also needs a column source containing a string that identifies the source of the data. This is used to calculate the weights for each figure in the publication.
- metab_names::Tuple: Tuple of metabolite names that correspond to the metabolites of general_rate_equation and to column names in data.
- param_names::Tuple: Tuple of parameter names that correspond to the parameters of general_rate_equation.
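As a rough illustration of the calling convention described above, the sketch below defines a rate function that reads metabolite concentrations and parameters from NamedTuples. It assumes the two-argument form `(metabs, params)` used in the `fit_rate_equation` example; the function name, the metabolites `S` and `R`, and the kinetic form are hypothetical and chosen only for illustration.

```julia
# Hypothetical one-substrate, one-regulator rate equation following the
# (metabs, params) NamedTuple convention.
function example_rate_equation(metabs, params)
    # Substrate saturation term
    saturation = (metabs.S / params.K_S) / (1 + metabs.S / params.K_S)
    # Simple inhibition by regulator R (illustrative form only)
    inhibition = 1 / (1 + metabs.R / params.K_R)
    return params.Vmax * saturation * inhibition
end

metab_names = (:S, :R)
param_names = (:Vmax, :K_S, :K_R)

# Build NamedTuples whose keys match metab_names and param_names
metabs = NamedTuple{metab_names}((2.0, 0.0))
params = NamedTuple{param_names}((10.0, 2.0, 1.0))

rate = example_rate_equation(metabs, params)  # 10 * 0.5 * 1 = 5.0
```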
Keyword Arguments
- save_train_results::Bool: Whether to save the training results for each number of parameters as a CSV file.
- enzyme_name::String: Enzyme name used to name the saved CSV files.
- range_number_params::Tuple{Int,Int}: Range of the number of parameters of general_rate_equation to search over.
- forward_model_selection::Bool: Whether to use forward model selection (true) or reverse model selection (false).
- max_zero_alpha::Int: Maximum number of alpha parameters that can be set to 0.
- n_reps_opt::Int: Number of repetitions of the optimization.
- maxiter_opt::Int: Maximum number of iterations of the optimization algorithm.
- model_selection_method::String: Which model selection method to use to find the best rate equation (default is "current_subsets_filtering").
- p_val_threshold::Float64: p-value threshold for the Wilcoxon test.
- subsets_min_limit::Int: The minimum number of filtered subsets (those with training loss within 10% of the minimum by default; see subsets_filter_threshold) that must be kept for each number of parameters. These subsets are used to generate the subsets for the next iteration (only subsets of these are considered). Relevant to the model selection methods current_subsets_filtering and cv_subsets_filtering.
- subsets_max_limit::Union{Int, Nothing}: The maximum number of filtered subsets (those with training loss within 10% of the minimum by default; see subsets_filter_threshold) that may be kept for each number of parameters. These subsets are used to generate the subsets for the next iteration (only subsets of these are considered). Relevant to the model selection methods current_subsets_filtering and cv_subsets_filtering.
- subsets_filter_threshold::Float64: The fractional limit for filtering subsets in each iteration (the default 0.1 corresponds to 10%). Only subsets with a training loss within this fraction of the best are kept. Relevant to the model selection methods current_subsets_filtering and cv_subsets_filtering.
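How the three subsets_* keywords could interact is sketched below. This is a minimal illustration, not the package's internal code: `filter_subsets` and its keyword names are hypothetical, and the sketch assumes positive losses and that "within the threshold" means a loss of at most `(1 + threshold)` times the best loss.

```julia
# Keep subsets whose training loss is within `threshold` (as a fraction) of the
# best loss, while respecting minimum and maximum counts.
# Assumes all losses are positive. All names here are hypothetical.
function filter_subsets(losses::Vector{Float64};
                        threshold::Float64 = 0.1,
                        min_limit::Int = 1,
                        max_limit::Union{Int,Nothing} = nothing)
    order = sortperm(losses)                      # indices from best to worst
    cutoff = losses[order[1]] * (1 + threshold)   # loss ceiling for "close to best"
    n_keep = count(<=(cutoff), losses)
    n_keep = max(n_keep, min_limit)               # keep at least min_limit
    if max_limit !== nothing
        n_keep = min(n_keep, max_limit)           # keep at most max_limit
    end
    n_keep = min(n_keep, length(losses))          # cannot keep more than exist
    return order[1:n_keep]
end

losses = [0.105, 0.100, 0.250, 0.109, 0.111]
kept = filter_subsets(losses)                        # losses within 10% of 0.100
kept_capped = filter_subsets(losses; max_limit = 2)  # at most two subsets
kept_floor = filter_subsets([1.0, 2.0, 3.0]; min_limit = 2)  # at least two subsets
```

The returned indices are ordered from best to worst training loss, so a cap set through `max_limit` always drops the worst of the filtered subsets first.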
Returns
A NamedTuple with the following fields:
- results: DataFrame with train and test results
- best_n_params: optimal number of parameters
- best_subset_row: row of the best rate equation selected, including fitted parameters
DataDrivenEnzymeRateEqs.display_rate_equation
— Method
display_rate_equation(
rate_equation::Function,
metab_names::Tuple{Symbol,Vararg{Symbol}},
param_names::Tuple{Symbol,Vararg{Symbol}};
nt_param_removal_code = nothing
)
Return the symbolic rate equation for the given rate_equation function.
Arguments
- rate_equation::Function: The rate equation function.
- metab_names::Tuple{Symbol,Vararg{Symbol}}: The names of the metabolites.
- param_names::Tuple{Symbol,Vararg{Symbol}}: The names of the parameters.
- nt_param_removal_code::NamedTuple: The named tuple of the parameters to remove from the rate equation (defaults to nothing).
DataDrivenEnzymeRateEqs.fit_rate_equation
— Method
fit_rate_equation(
rate_equation::Function,
data::DataFrame,
metab_names::Tuple{Symbol, Vararg{Symbol}},
param_names::Tuple{Symbol, Vararg{Symbol}};
n_iter = 20
)
Fit rate_equation to data and return the loss and the best-fit parameters.
Arguments
- rate_equation::Function: Function that takes a NamedTuple of metabolite concentrations (with metab_names keys) and parameters (with param_names keys) and returns an enzyme rate.
- data::DataFrame: DataFrame containing the data, with a column Rate and a column for each name in metab_names, where each row is one measurement. It also needs a column source containing a string that identifies the source of the data. This is used to calculate the weights for each figure in the publication.
- metab_names::Tuple{Symbol, Vararg{Symbol}}: Tuple of metabolite names that correspond to the metabolites of rate_equation and to column names in data.
- param_names::Tuple{Symbol, Vararg{Symbol}}: Tuple of parameter names that correspond to the parameters of rate_equation.
- n_iter::Int: Number of iterations to run the fitting process.
Returns
- loss::Float64: Loss of the best fit.
- params::NamedTuple: Best-fit parameters with param_names keys.
Example
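using DataFrames
using DataDrivenEnzymeRateEqs
# Three measurements of rate versus substrate concentration S, taken from two figures
data = DataFrame(
    Rate = [1.0, 2.0, 3.0],
    S = [1.0, 2.0, 3.0],
    source = ["Figure 1", "Figure 1", "Figure 2"]
)
# Michaelis-Menten rate equation; metabs and params are NamedTuples
rate_equation(metabs, params) = params.Vmax * (metabs.S / params.K_S) / (1 + metabs.S / params.K_S)
fit_rate_equation(rate_equation, data, (:S,), (:Vmax, :K_S))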
DataDrivenEnzymeRateEqs.@derive_general_mwc_rate_eq
— Macro
@derive_general_mwc_rate_eq(metabs_and_regulators_kwargs...)
Derive a function that calculates the rate of a reaction using the general MWC rate equation, given the list of substrates, products, and regulators that bind to specific catalytic or regulatory sites.
The general MWC rate equation is given by:
\[Rate = \frac{\left(V_{max}^a \prod_{i=1}^{n} \left(\frac{S_i}{K_{a, i}}\right) - V_{max, rev}^a \prod_{i=1}^{n} \left(\frac{P_i}{K_{a, i}}\right)\right) \cdot Z_{a, cat}^{n-1} \cdot Z_{a, reg}^n + L \left(V_{max}^i \prod_{i=1}^{n} \left(\frac{S_i}{K_{i, i}}\right) - V_{max, rev}^i \prod_{i=1}^{n} \left(\frac{P_i}{K_{i, i}}\right)\right) \cdot Z_{i, cat}^{n-1} \cdot Z_{i, reg}^n}{Z_{a, cat}^n \cdot Z_{a, reg}^n + L \cdot Z_{i, cat}^n \cdot Z_{i, reg}^n}\]
where:
- $V_{max}^a$ is the maximum rate of the forward reaction in the active MWC state
- $V_{max, rev}^a$ is the maximum rate of the reverse reaction in the active MWC state
- $V_{max}^i$ is the maximum rate of the forward reaction in the inactive MWC state
- $V_{max, rev}^i$ is the maximum rate of the reverse reaction in the inactive MWC state
- $S_i$ is the concentration of the $i^{th}$ substrate
- $P_i$ is the concentration of the $i^{th}$ product
- $I_i$ is the concentration of the $i^{th}$ catalytic site inhibitor
- $R_i$ is the concentration of the $i^{th}$ allosteric regulator
- $K_{a, X}$ is the binding constant of the $X$ metabolite for active MWC state
- $K_{i, X}$ is the binding constant of the $X$ metabolite for inactive MWC state
- $Z_{a, cat}$ is the allosteric factor for the catalytic site in the active MWC state
- $Z_{i, cat}$ is the allosteric factor for the catalytic site in the inactive MWC state
- $Z_{a, reg}$ is the allosteric factor for the regulatory site in the active MWC state
- $Z_{i, reg}$ is the allosteric factor for the regulatory site in the inactive MWC state
- $L$ is the ratio of inactive to active enzyme conformations in the absence of ligands
- $n$ is the oligomeric state of the enzyme
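To make the structure of the equation concrete, here is a hand-written special case in plain Julia: one substrate, no products, and no regulators (so the regulatory-site factors equal 1). It is a sketch under assumptions, not output of the macro: the function name is hypothetical, and the catalytic-site factor is assumed to take the simple form Z = 1 + S/K in each state.

```julia
# Simplified MWC rate for one substrate S, irreversible, no regulators.
# Assumes the catalytic-site factor has the form Z = 1 + S/K in each state.
function mwc_single_substrate(S; Vmax_a, Vmax_i, K_a, K_i, L, n)
    Z_a = 1 + S / K_a   # active-state catalytic-site factor
    Z_i = 1 + S / K_i   # inactive-state catalytic-site factor
    numerator = Vmax_a * (S / K_a) * Z_a^(n - 1) +
                L * Vmax_i * (S / K_i) * Z_i^(n - 1)
    denominator = Z_a^n + L * Z_i^n
    return numerator / denominator
end

# With L = 0 only the active state is populated and the rate reduces to
# Michaelis-Menten kinetics: Vmax_a * (S/K_a) / (1 + S/K_a).
rate_active_only = mwc_single_substrate(2.0; Vmax_a = 10.0, Vmax_i = 0.0,
                                        K_a = 2.0, K_i = 20.0, L = 0.0, n = 4)

# With L > 0 and a catalytically dead inactive state, the rate at the same S is lower.
rate_with_inactive = mwc_single_substrate(2.0; Vmax_a = 10.0, Vmax_i = 0.0,
                                          K_a = 2.0, K_i = 20.0, L = 100.0, n = 4)
```

Setting L = 0 is a convenient sanity check, since the oligomeric state n then cancels out of the expression.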
Arguments
- metabs_and_regulators_kwargs...: keyword arguments that specify the substrates, products, catalytic sites, regulatory sites, and other parameters of the reaction.
Returns
- A function that calculates the rate of the reaction using the general MWC rate equation
- A tuple of the names of the metabolites and parameters used in the rate equation
DataDrivenEnzymeRateEqs.@derive_general_qssa_rate_eq
— Macro
@derive_general_qssa_rate_eq(metabs_and_regulators_kwargs...)
Derive a function that calculates the rate of a reaction using the Quasi Steady State Approximation (QSSA) given the list of substrates, products, and regulators.
The general QSSA rate equation is given by:
\[Rate = \frac{V_{max} \left(\frac{\prod_{i=1}^{n}S_i}{(K_{S1...Sn})^n}\right) - V_{max, rev} \left(\frac{\prod_{i=1}^{n}P_i}{(K_{P1...Pn})^n}\right)}{Z}\]
where:
- $V_{max}$ is the maximum rate of the forward reaction
- $V_{max, rev}$ is the maximum rate of the reverse reaction
- $S_i$, $P_i$, $R_i$ are the concentrations of the $i^{th}$ substrate (S), product (P), or regulator (R)
- $K_{X_1...X_n}$ is the kinetic constant for the metabolites $X_1, ..., X_n$
- $Z$ is the sum of all terms containing products of [S], [P], and [R], each divided by the corresponding kinetic constant
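For intuition, the simplest instance of this form can be written by hand: one substrate and one product, with no regulators. The sketch below is an assumption-laden illustration rather than what the macro generates: the function name is hypothetical, and the denominator is assumed to be Z = 1 + S/K_S + P/K_P.

```julia
# Reversible one-substrate, one-product rate with an assumed denominator
# Z = 1 + S/K_S + P/K_P (hypothetical special case, not the macro's output).
function qssa_uni_uni(S, P; Vmax, Vmax_rev, K_S, K_P)
    Z = 1 + S / K_S + P / K_P
    return (Vmax * S / K_S - Vmax_rev * P / K_P) / Z
end

# No product present: the rate is purely forward, (10 * 1) / (1 + 1) = 5.0
forward_rate = qssa_uni_uni(1.0, 0.0; Vmax = 10.0, Vmax_rev = 5.0, K_S = 1.0, K_P = 2.0)

# Forward and reverse terms cancel: 10 * 1/1 - 5 * 4/2 = 0
zero_net_rate = qssa_uni_uni(1.0, 4.0; Vmax = 10.0, Vmax_rev = 5.0, K_S = 1.0, K_P = 2.0)
```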
Arguments
- metabs_and_regulators_kwargs...: keyword arguments that specify the substrates, products, catalytic sites, regulatory sites, and other parameters of the reaction.
Returns
- A function that calculates the rate of the reaction using the general QSSA rate equation
- A tuple of the names of the metabolites and parameters used in the rate equation