Constrained Minimization Problems
We consider constrained minimization problems (CMPs) of the form:

\[
\min_{\vx} \,\, f(\vx) \quad \text{s.t.} \quad \vg(\vx) \le \mathbf{0}, \quad \vh(\vx) = \mathbf{0}
\]
See here for a brief introduction to constrained optimization. In this section, we will discuss how to represent CMPs using Cooper. To do this, consider the following objects:
- Constraint: represents a group of either equality or inequality constraints.
- ConstrainedMinimizationProblem (hereafter CMP for short): represents the constrained minimization problem itself. It must include a method CMP.compute_cmp_state that computes the loss and constraint violations at a given point.
Moreover, in order to package the values of the loss and constraints, we will define the following objects:
- ConstraintState: contains the value of the constraint violation at the given iterate.
- CMPState: contains the values of the loss and ConstraintState objects for (some or all of) its associated constraints.
Example
The example below illustrates the required steps for defining a CMP class for your problem. For simplicity, we illustrate the case of a single (possibly multi-dimensional) inequality constraint.
- [Line 4] Define a custom class which inherits from CMP.
- [Line 7] Instantiate a multiplier object for the constraint.
- [Lines 9-11] Define the constraint object, specifying the constraint type and (optionally) the formulation type. Note that constraints must be "registered" as attributes of the CMP.
- [Line 13] Implement the CMP.compute_cmp_state method that evaluates the loss and constraints.
- [Line 18] Return the information about the loss and constraints packaged into a CMPState.
- [Line 20] (Optional) Modularize the code to allow for evaluating the constraints only. This is useful for optimization algorithms that sometimes need to evaluate the constraints without computing the loss.
 1 import torch
 2 import cooper
 3
 4 class MyCMP(cooper.ConstrainedMinimizationProblem):
 5     def __init__(self):
 6         super().__init__()
 7         multiplier = cooper.multipliers.DenseMultiplier(num_constraints=..., device=...)
 8         # By default, constraints are built using `formulation_type=cooper.formulations.Lagrangian`
 9         self.my_constraint = cooper.Constraint(
10             multiplier=multiplier, constraint_type=cooper.ConstraintType.INEQUALITY
11         )
12
13     def compute_cmp_state(self, model, inputs, targets):
14         loss = ...
15         cmp_state = self.compute_violations(model, inputs, targets)
16         cmp_state.loss = loss
17
18         return cmp_state
19
20     def compute_violations(self, model, inputs, targets):
21         # This method is optional. It allows for evaluating the constraints without
22         # computing the loss.
23         violation = ...  # ensure that the constraint follows the convention "g <= 0"
24         constraint_state = cooper.ConstraintState(violation=...)
25         observed_constraints = {self.my_constraint: constraint_state}
26
27         return cooper.CMPState(loss=None, observed_constraints=observed_constraints)
Loss
The CMPState dataclass includes the loss, which must be a scalar torch.Tensor representing \(f(\vx)\), and is used to update the primal variables in the optimization. For feasibility problems, where only constraint satisfaction matters, the loss should be set to None.
Constraints
Constraint objects allow grouping similar constraints together. Constraints are classified as either equality or inequality constraints.
It is possible to have multiple constraints represented by the same Constraint object. However, all constraints under a Constraint must share the same constraint_type (all equality or all inequality constraints) and must be handled using the same formulation_type (a subclass of a Formulation). For problems with different types of constraints or formulations, you should instantiate separate Constraint objects.
- enum cooper.ConstraintType(value)[source]
A constraint type is one of EQUALITY or INEQUALITY. Valid values are as follows:
- EQUALITY = <ConstraintType.EQUALITY: 1>
- INEQUALITY = <ConstraintType.INEQUALITY: 2>
Grouping constraints
Cooper allows arbitrary groupings of constraints into Constraint objects.
However, for computational or logging purposes it is sometimes desirable to group constraints according to problem-dependent structures.
For example, consider a problem with \(m + n\) constraints.
One could create a single Constraint object along with a single Multiplier.
Alternatively, one could create two (Constraint, Multiplier) pairs for handling the groups of \(m\) and \(n\) constraints separately.
Linking constraints and multipliers
Constraint objects must have an associated Multiplier if the problem formulation requires it (see the expects_multiplier attribute of a Formulation). To achieve this, a Multiplier object should be provided to the Constraint constructor.
constraint = cooper.Constraint(
multiplier=multiplier,
constraint_type=cooper.ConstraintType.INEQUALITY,
formulation_type=cooper.formulations.Lagrangian,
)
Registering constraints in a CMP
When initializing the CMP, Constraints should be defined as attributes.
We refer to this process as “registering” the constraints with the CMP. Utilities such as CMP.constraints and CMP.named_constraints enable iteration over the registered constraints.
Moreover, the utility method CMP.dual_parameters returns the torch.nn.parameter.Parameters of the multipliers for the constraints registered with the CMP. This facilitates the instantiation of the dual_optimizer required by the ConstrainedOptimizer. See the Optim module for further details.
class MyCMP(cooper.ConstrainedMinimizationProblem):
def __init__(self):
super().__init__()
self.constraint_1 = cooper.Constraint(...)
self.constraint_2 = cooper.Constraint(...)
cmp = MyCMP()
# Constraints can also be registered "on the fly"
cmp.constraint_3 = cooper.Constraint(...)
# Inspect constraints known to the CMP
print([constraint_name for constraint_name, constraint in cmp.named_constraints()])
# Output:
# ["constraint_1", "constraint_2", "constraint_3"]
- class cooper.constraints.Constraint(constraint_type, formulation_type=<class 'cooper.formulations.formulations.Lagrangian'>, multiplier=None, penalty_coefficient=None)[source]
A constraint in a constrained optimization problem.
- Parameters:
  - constraint_type (ConstraintType) – One of cooper.ConstraintType.EQUALITY or cooper.ConstraintType.INEQUALITY.
  - formulation_type (type[Formulation]) – The formulation type for computing the constraint's contribution to the Lagrangian. Defaults to Lagrangian.
  - multiplier (Optional[Multiplier]) – The Lagrange multiplier associated with the constraint. This is only used for formulations with Formulation.expects_multiplier=True, such as the Lagrangian.
  - penalty_coefficient (Optional[PenaltyCoefficient]) – The penalty coefficient used to penalize the constraint violation. This is only used for formulations with Formulation.expects_penalty_coefficient=True, such as the AugmentedLagrangian.
ConstraintStates
In their simplest form, ConstraintState objects contain the value of the constraint violation. However, they can be extended to enable extra functionality, such as constraint sampling, the use of implicit parameterizations for the Lagrange multipliers [NCZ+20], and proxy constraints [CJG+19].
Constraint Sampling
Cooper can handle cases where only a subset of Constraint violations are observed at each step.
For instance, in problems with a large number of constraints, where assessing feasibility is costly (e.g., requiring evaluation across an entire dataset), it can be more efficient to evaluate only a subset of constraints per iteration.
To do this, specify the subset of observed constraint violations and ensure that an IndexedMultiplier is used as the multiplier associated with the constraint.
Consider the case of an \(n\)-dimensional constraint, where only \(m \leq n\) entries are observed. Cooper expects the following arguments when instantiating a ConstraintState:
- violation should be an \(m\)-dimensional tensor containing only the observed violations.
- constraint_features should contain the indices corresponding to the observed entries in violation. Note that permutations are allowed, as long as the ordering of violation and constraint_features is consistent.
- If provided, strict_violation and strict_constraint_features should follow the same pattern.
This setup allows Cooper to selectively account for the specified constraint entries when calculating contributions to the Lagrangian, ignoring those that are not observed. More details on indexed multipliers can be found in the section on Multipliers.
Implicit Parameterization of Lagrange Multipliers [NCZ+20]
Cooper enables implicit parameterization of the Lagrange multipliers, for example, by using a neural network. This allows Cooper to compute multipliers based on specified input features and to learn a parametric model for the multipliers, rather than defining them individually for each constraint. This approach helps scale the Lagrangian approach to problems with numerous (or even infinite) constraints.
To achieve this, follow these steps:
- Implement a Lagrange multiplier model by inheriting from ImplicitMultiplier. Given input features for a constraint, the model maps the input features to an estimate of the Lagrange multiplier for that constraint.
- Pass these input features as the constraint_features in the ConstraintState object.
For additional details, refer to Multipliers.
Proxy Constraints [CJG+19]
It is often the case that constrained optimization problems involve non-differentiable constraints. This non-differentiability precludes the use of gradient-based primal-dual optimization methods. The proxy constraints technique introduced by Cotter et al. [CJG+19] leverages a differentiable surrogate of the constraint when performing the primal updates, while preserving the original (non-differentiable) constraint for updating the dual variables (see here for details).
To use proxy constraints in Cooper:
- Specify violation for updating the primal variables. Ensure that this violation is differentiable with respect to the primal variables.
- Provide the (potentially non-differentiable) strict_violation for updating the dual variables.
Proxy constraints can be combined with sampled constraints and implicit multipliers. To do so, include both constraint_features and strict_constraint_features as needed.
- class cooper.constraints.ConstraintState(violation, constraint_features=None, strict_violation=None, strict_constraint_features=None, contributes_to_primal_update=True, contributes_to_dual_update=True)[source]
State of a constraint, including the current constraint violation.
- Parameters:
  - violation (Tensor) – The measurement of the constraint violation at some value of the primal parameters. This is expected to be differentiable with respect to the primal parameters.
  - constraint_features (Optional[Tensor]) – The features of the observed (differentiable) constraint violations, used to evaluate the associated Lagrange multiplier. For example:
    - An IndexedMultiplier expects the indices of the constraints whose Lagrange multipliers are to be retrieved.
    - An ImplicitMultiplier expects tensor-valued features for the constraints.
    This field can also be used with an IndexedMultiplier to measure the violation of only a subset of the constraints within a Constraint.
  - strict_violation (Optional[Tensor]) – The measurement of the constraint violation used to update the dual variables. If not provided, the violation is used to update the dual variables instead.
  - strict_constraint_features (Optional[Tensor]) – The features of the (possibly non-differentiable) constraint. For more details, see constraint_features. If not provided, the constraint_features are used instead. A strict_violation is expected whenever strict_constraint_features are provided.
  - contributes_to_primal_update (bool) – If False, the current observed constraint violation does not contribute to the primal Lagrangian but still contributes to the dual Lagrangian. This means the violation affects the update for the dual variables but not the primal variables.
  - contributes_to_dual_update (bool) – If False, the current observed constraint violation does not contribute to the dual Lagrangian but still contributes to the primal Lagrangian. This allows for less frequent updates to the dual variables (e.g., after several primal steps), affecting the update for the primal variables but not the dual variables. When True, Cooper updates the dual variables using strict_violation if provided, or violation otherwise.
Constrained Minimization Problems (CMPs)
ConstrainedMinimizationProblem sub-classes must be implemented and instantiated by the user, as shown in the example above.
CMPs represent constrained optimization problems and provide methods to compute the problem’s state, CMPState, which includes the loss and constraints at a given point. The methods to be implemented are CMP.compute_cmp_state and, optionally,
CMP.compute_violations.
Additionally, CMPs serve as an interface between the user and Cooper, enabling access to CMP.constraints, CMP.multipliers, and CMP.penalty_coefficients.
- class cooper.ConstrainedMinimizationProblem[source]
Template for constrained minimization problems, where subclasses represent specific constrained optimization problems.
Subclasses must override the CMP.compute_cmp_state method. This method should return a CMPState instance that encapsulates the current state of the optimization problem, including the evaluated loss and the values of the constraint violations.
- named_constraints()[source]
Return an iterator over the registered constraints of the CMP, yielding tuples of the form (constraint_name, constraint).
- multipliers()[source]
Returns an iterator over the multipliers associated with the registered constraints of the CMP.
- named_multipliers()[source]
Returns an iterator over the multipliers associated with the registered constraints of the CMP, yielding tuples of the form (constraint_name, multiplier).
- penalty_coefficients()[source]
Returns an iterator over the penalty coefficients associated with the registered constraints of the CMP. Constraints without penalty coefficients are skipped.
- named_penalty_coefficients()[source]
Returns an iterator over the penalty coefficients associated with the registered constraints of the CMP, yielding tuples of the form (constraint_name, penalty_coefficient). Constraints without penalty coefficients are skipped.
- dual_parameters()[source]
Return an iterator over the parameters of the multipliers associated with the registered constraints of the CMP. This method is useful for instantiating the dual optimizers. If a multiplier is shared by several constraints, we only return its parameters once.
- to(*args, **kwargs)[source]
Move the CMP to a new device and/or change the dtype of the multipliers and penalty coefficients.
- Return type:
Self
- state_dict()[source]
Returns the state of the CMP, including the state of the multipliers and penalty coefficients.
- load_state_dict(state_dict)[source]
Loads the state of the CMP. This includes the state of the multipliers and penalty coefficients.
- abstract compute_cmp_state(*args, **kwargs)[source]
Computes the state of the CMP based on the current value of the primal parameters.
The signature of this function may be adjusted to accommodate situations that require a model, (mini-batched) inputs/targets, or other arguments to be passed.
- Return type: CMPState

Note
When it is prohibitively expensive to compute the loss or constraints exactly, the CMPState may contain stochastic estimates. This is often the case when mini-batches are used to approximate the loss and constraints. Just as in the unconstrained case, these approximations can compromise the stability of the optimization process.
- static sanity_check_cmp_state(cmp_state)[source]
Performs sanity checks on the CMP state. This helper method is useful for ensuring that the CMP state is well-formed.
- Raises:
ValueError – If the loss tensor does not have a valid gradient.
ValueError – If the violation tensor of any constraint does not have a valid gradient.
ValueError – If the strict violation tensor of any constraint has a gradient.
ValueError – If a constraint contributes to the dual update but the associated formulation does not expect a multiplier.
- compute_violations(*args, **kwargs)[source]
Computes the violation of the CMP constraints based on the current value of the primal parameters. This function returns a CMPState instance containing the observed constraint values. Note that the returned CMPState may have loss=None, as the loss value is not necessarily computed when only evaluating the constraints.
The function signature may be adjusted to accommodate situations that require a model, (mini-batched) inputs/targets, or other arguments.
In some cases, the computation of constraints may be independent of loss evaluation. In such situations, CMP.compute_violations can be called as part of the execution of CMP.compute_cmp_state.
- Return type: CMPState
CMPStates
A CMPState is a dataclass containing the information about the loss and constraint violations measured at a specific point. The constraints included in the CMPState must be passed as a dictionary, where the keys are the Constraint objects and the values are the associated ConstraintState objects.
Note
To ensure that your CMPState is correctly constructed, with loss and constraint tensors that carry gradients, you can use the CMP.sanity_check_cmp_state method.
However, avoid calling this method in every iteration of your optimization loop, as it can introduce unnecessary overhead.
- class cooper.CMPState(loss=None, observed_constraints=<factory>, misc=None)[source]
Represents the state of a ConstrainedMinimizationProblem in terms of the value of its loss and constraint violations at a point \(\vx_t\).
- Parameters:
  - loss (Optional[Tensor]) – Value of the loss or main objective to be minimized, \(f(\vx_t)\).
  - observed_constraints (dict[Constraint, ConstraintState]) – Dictionary with Constraint instances as keys and ConstraintState instances as values (containing the tensors \(\vg(\vx_t)\) and \(\vh(\vx_t)\)).
  - misc (Optional[dict]) – Optional storage space for additional information relevant to the state of the CMP. This dictionary enables persisting the results of certain computations for post-processing. For example, one may want to retain the value of the predictions/logits computed over a given minibatch during the call to compute_cmp_state() to measure or log training statistics.
- compute_primal_lagrangian()[source]
Computes and accumulates the primal-differentiable Lagrangian based on the loss and the contribution of the observed constraints.
- compute_dual_lagrangian()[source]
Computes and accumulates the dual-differentiable Lagrangian based on the contribution of the observed constraints.
The dual Lagrangian contained in LagrangianStore.lagrangian ignores the contribution of the loss, since the objective function does not depend on the dual variables. Therefore, LagrangianStore.lagrangian = 0 regardless of the value of self.loss.
- named_observed_strict_violations()[source]
Returns an iterator over the observed strict constraint violations.
Checkpointing
You can checkpoint a CMP by saving and loading its state with the methods CMP.state_dict and CMP.load_state_dict. These methods capture the current values of the multipliers and penalty coefficients for all problem constraints, allowing you to resume the optimization process from that exact state.
# Save the state of the CMP
cmp_state = cmp.state_dict()
torch.save(cmp_state, "cmp_state.pth")
# Later, restore the state of the CMP
cmp = MyCMP(...) # A new CMP with default multiplier and coefficient values
loaded_state = torch.load("cmp_state.pth")
cmp.load_state_dict(loaded_state) # Load checkpointed multipliers and coefficients
For a full working example, see this tutorial.