Constrained Minimization Problems
We consider constrained minimization problems (CMPs) of the form:

\[
\min_{\vx} \,\, f(\vx) \quad \text{s.t.} \quad \vg(\vx) \le \mathbf{0}, \quad \vh(\vx) = \mathbf{0}
\]
See here for a brief introduction to constrained optimization. In this section, we will discuss how to represent CMPs using Cooper. To do this, consider the following objects:
- Constraint: represents a group of either equality or inequality constraints.
- ConstrainedMinimizationProblem (hereafter CMP for short): represents the constrained minimization problem itself. It must include a method CMP.compute_cmp_state that computes the loss and constraint violations at a given point.
Moreover, in order to package the values of the loss and constraints, we will define the following objects:
- ConstraintState: contains the value of the constraint violation at the given iterate.
- CMPState: contains the values of the loss and ConstraintState objects for (some or all of) its associated constraints.
Example
The example below illustrates the required steps for defining a CMP class for your problem. For simplicity, we illustrate the case of a single (possibly multi-dimensional) inequality constraint.
- [Line 4] Define a custom class which inherits from CMP.
- [Line 7] Instantiate a multiplier object for the constraint.
- [Lines 9-11] Define the constraint object, specifying the constraint type and (optionally) the formulation type. Note that constraints must be "registered" as attributes of the CMP.
- [Line 13] Implement the CMP.compute_cmp_state method that evaluates the loss and constraints.
- [Line 18] Return the information about the loss and constraints packaged into a CMPState.
- [Line 20] (Optional) Modularize the code to allow for evaluating the constraints only. This is useful for optimization algorithms that sometimes need to evaluate the constraints without computing the loss.
 1 import torch
 2 import cooper
 3
 4 class MyCMP(cooper.ConstrainedMinimizationProblem):
 5     def __init__(self):
 6         super().__init__()
 7         multiplier = cooper.multipliers.DenseMultiplier(num_constraints=..., device=...)
 8         # By default, constraints are built using `formulation_type=cooper.formulations.Lagrangian`
 9         self.my_constraint = cooper.Constraint(
10             multiplier=multiplier, constraint_type=cooper.ConstraintType.INEQUALITY
11         )
12
13     def compute_cmp_state(self, model, inputs, targets):
14         loss = ...
15         cmp_state = self.compute_violations(model, inputs, targets)
16         cmp_state.loss = loss
17
18         return cmp_state
19
20     def compute_violations(self, model, inputs, targets):
21         # This method is optional. It allows for evaluating the constraints without
22         # computing the loss.
23         violation = ...  # ensure that the constraint follows the convention "g <= 0"
24         constraint_state = cooper.ConstraintState(violation=...)
25         observed_constraints = {self.my_constraint: constraint_state}
26
27         return cooper.CMPState(loss=None, observed_constraints=observed_constraints)
Loss
The CMPState dataclass includes the loss, which must be a scalar torch.Tensor representing \(f(\vx)\), and is used to update the primal variables in the optimization. For feasibility problems, where only constraint satisfaction matters, the loss should be set to None.
Constraints
Constraint objects allow grouping similar constraints together. Constraints are classified as either equality or inequality constraints.
It is possible to have multiple constraints represented by the same Constraint object. However, all constraints under a Constraint must share the same constraint_type (all equality or all inequality constraints) and must be handled using the same formulation_type (a subclass of a Formulation). For problems with different types of constraints or formulations, you should instantiate separate Constraint objects.
- enum cooper.ConstraintType(value)[source]
A constraint type is one of EQUALITY or INEQUALITY. Valid values are as follows:
- EQUALITY = <ConstraintType.EQUALITY: 1>
- INEQUALITY = <ConstraintType.INEQUALITY: 2>
Grouping constraints
Cooper allows arbitrary groupings of constraints into Constraint objects.
However, for computational or logging purposes it is sometimes desirable to group constraints according to problem-dependent structures.
For example, consider a problem with \(m + n\) constraints.
One could create a single Constraint object along with a single Multiplier.
Alternatively, one could create two (Constraint, Multiplier) pairs for handling the groups of \(m\) and \(n\) constraints separately.
Linking constraints and multipliers
Constraint objects must have an associated Multiplier if the problem formulation requires it (see the expects_multiplier attribute of a Formulation). To achieve this, a Multiplier object should be provided to the Constraint constructor.
constraint = cooper.Constraint(
multiplier=multiplier,
constraint_type=cooper.ConstraintType.INEQUALITY,
formulation_type=cooper.formulations.Lagrangian,
)
Registering constraints in a CMP
When initializing the CMP, Constraints should be defined as attributes.
We refer to this process as “registering” the constraints with the CMP. Utilities such as CMP.constraints and CMP.named_constraints enable iteration over the registered constraints.
Moreover, the utility method CMP.dual_parameters returns the torch.nn.parameter.Parameters of the multipliers for the constraints registered with the CMP. This facilitates the instantiation of the dual_optimizer required by the ConstrainedOptimizer. See the Optim module for further details.
class MyCMP(cooper.ConstrainedMinimizationProblem):
def __init__(self):
super().__init__()
self.constraint_1 = cooper.Constraint(...)
self.constraint_2 = cooper.Constraint(...)
cmp = MyCMP()
# Constraints can also be registered "on the fly"
cmp.constraint_3 = cooper.Constraint(...)
# Inspect constraints known to the CMP
print([constraint_name for constraint_name, constraint in cmp.named_constraints()])
# Output:
# ["constraint_1", "constraint_2", "constraint_3"]
- class cooper.constraints.Constraint(constraint_type, formulation_type=<class 'cooper.formulations.formulations.Lagrangian'>, multiplier=None, penalty_coefficient=None)[source]
A constraint in a constrained optimization problem.
- Parameters:
  - constraint_type (ConstraintType) – One of cooper.ConstraintType.EQUALITY or cooper.ConstraintType.INEQUALITY.
  - formulation_type (type[Formulation]) – The formulation type for computing the constraint's contribution to the Lagrangian. Defaults to Lagrangian.
  - multiplier (Optional[Multiplier]) – The Lagrange multiplier associated with the constraint. This is only used for formulations with Formulation.expects_multiplier=True, such as the Lagrangian.
  - penalty_coefficient (Optional[PenaltyCoefficient]) – The penalty coefficient used to penalize the constraint violation. This is only used for formulations with Formulation.expects_penalty_coefficient=True, such as the AugmentedLagrangian.
ConstraintStates
In their simplest form, ConstraintState objects contain the value of the constraint violation. However, they can be extended to enable extra functionality, such as constraint sampling, the use of implicit parameterizations for the Lagrange multipliers [NCZ+20], and proxy constraints [CJG+19].
Constraint Sampling
Cooper can handle cases where only a subset of Constraint violations are observed at each step.
For instance, in problems with a large number of constraints, where assessing feasibility is costly (e.g., requiring evaluation across an entire dataset), it can be more efficient to evaluate only a subset of constraints per iteration.
To do this, specify the subset of observed constraint violations and ensure that an IndexedMultiplier is used as the multiplier associated with the constraint.
Consider the case of an \(n\)-dimensional constraint, where only \(m \leq n\) entries are observed. Cooper expects the following arguments when instantiating a ConstraintState:
- violation should be an \(m\)-dimensional tensor containing only the observed violations.
- constraint_features should contain the indices corresponding to the observed entries in violation. Note that permutations are allowed, as long as the ordering of violation and constraint_features is consistent.
- If provided, strict_violation and strict_constraint_features should follow the same pattern.
This setup allows Cooper to selectively account for the specified constraint entries when calculating contributions to the Lagrangian, ignoring those that are not observed. More details on indexed multipliers can be found in the section on Multipliers.
Implicit Parameterization of Lagrange Multipliers [NCZ+20]
Cooper enables implicit parameterization of the Lagrange multipliers, for example, by using a neural network. This allows Cooper to compute multipliers based on specified input features and to learn a parametric model for the multipliers, rather than defining them individually for each constraint. This approach helps scale the Lagrangian approach to problems with numerous (or even infinite) constraints.
To achieve this, follow these steps:
- Implement a Lagrange multiplier model by inheriting from ImplicitMultiplier. Given input features for a constraint, the model maps the input features to an estimate of the Lagrange multiplier for that constraint.
- Pass these input features as the constraint_features in the ConstraintState object.
For additional details, refer to Multipliers.
Proxy Constraints [CJG+19]
It is often the case that constrained optimization problems involve non-differentiable constraints. This non-differentiability precludes the use of gradient-based primal-dual optimization methods. The proxy constraints technique introduced by Cotter et al. [CJG+19] leverages a differentiable surrogate of the constraint when performing the primal updates, while preserving the original (non-differentiable) constraint for updating the dual variables (see here for details).
To use proxy constraints in Cooper:
- Specify violation for updating the primal variables. Ensure that this violation is differentiable with respect to the primal variables.
- Provide the (potentially non-differentiable) strict_violation for updating the dual variables.
Proxy constraints can be combined with sampled constraints and implicit multipliers. To do so, include both constraint_features and strict_constraint_features as needed.
- class cooper.constraints.ConstraintState(violation, constraint_features=None, strict_violation=None, strict_constraint_features=None, contributes_to_primal_update=True, contributes_to_dual_update=True)[source]
State of a constraint, including the current constraint violation.
- Parameters:
  - violation (Tensor) – The measurement of the constraint violation at some value of the primal parameters. This is expected to be differentiable with respect to the primal parameters.
  - constraint_features (Optional[Tensor]) – The features of the observed (differentiable) constraint violations, used to evaluate the associated Lagrange multiplier. For example:
    - An IndexedMultiplier expects the indices of the constraints whose Lagrange multipliers are to be retrieved.
    - An ImplicitMultiplier expects tensor-valued features for the constraints.
    This field can also be used with an IndexedMultiplier to measure the violation of only a subset of the constraints within a Constraint.
  - strict_violation (Optional[Tensor]) – The measurement of the constraint violation used to update the dual variables. If not provided, the violation is used to update the dual variables instead.
  - strict_constraint_features (Optional[Tensor]) – The features of the (possibly non-differentiable) constraint. For more details, see constraint_features. If not provided, the constraint_features are used instead. A strict_violation is expected whenever strict_constraint_features are provided.
  - contributes_to_primal_update (bool) – If False, the current observed constraint violation does not contribute to the primal Lagrangian but still contributes to the dual Lagrangian. This means the violation affects the update for the dual variables but not the primal variables.
  - contributes_to_dual_update (bool) – If False, the current observed constraint violation does not contribute to the dual Lagrangian but still contributes to the primal Lagrangian. This allows for less frequent updates to the dual variables (e.g., after several primal steps), affecting the update for the primal variables but not the dual variables. When True, Cooper updates the dual variables using strict_violation if provided, or violation otherwise.
Constrained Minimization Problems (CMPs)
ConstrainedMinimizationProblem sub-classes must be implemented and instantiated by the user, as shown in the example above.
CMPs represent constrained optimization problems and provide methods to compute the problem’s state, CMPState, which includes the loss and constraints at a given point. The methods to be implemented are CMP.compute_cmp_state and, optionally,
CMP.compute_violations.
Additionally, CMPs serve as an interface between the user and Cooper, enabling access to CMP.constraints, CMP.multipliers, and CMP.penalty_coefficients.
- class cooper.ConstrainedMinimizationProblem[source]
Template for constrained minimization problems, where subclasses represent specific constrained optimization problems.
Subclasses must override the CMP.compute_cmp_state method. This method should return a CMPState instance that encapsulates the current state of the optimization problem, including the evaluated loss and the values of the constraint violations.
- named_constraints()[source]
Return an iterator over the registered constraints of the CMP, yielding tuples of the form (constraint_name, constraint).
- multipliers()[source]
Returns an iterator over the multipliers associated with the registered constraints of the CMP.
- named_multipliers()[source]
Returns an iterator over the multipliers associated with the registered constraints of the CMP, yielding tuples of the form (constraint_name, multiplier).
- penalty_coefficients()[source]
Returns an iterator over the penalty coefficients associated with the registered constraints of the CMP. Constraints without penalty coefficients are skipped.
- named_penalty_coefficients()[source]
Returns an iterator over the penalty coefficients associated with the registered constraints of the CMP, yielding tuples of the form (constraint_name, penalty_coefficient). Constraints without penalty coefficients are skipped.
- dual_parameters()[source]
Return an iterator over the parameters of the multipliers associated with the registered constraints of the CMP. This method is useful for instantiating the dual optimizers. If a multiplier is shared by several constraints, we only return its parameters once.
- to(*args, **kwargs)[source]
Move the CMP to a new device and/or change the dtype of the multipliers and penalty coefficients.
- Return type:
Self
- state_dict()[source]
Returns the state of the CMP, including the state of the multipliers and penalty coefficients.
- load_state_dict(state_dict)[source]
Loads the state of the CMP. This includes the state of the multipliers and penalty coefficients.
- abstract compute_cmp_state(*args, **kwargs)[source]
Computes the state of the CMP based on the current value of the primal parameters.
The signature of this function may be adjusted to accommodate situations that require a model, (mini-batched) inputs/targets, or other arguments to be passed.
- Return type: CMPState

Note
When it is prohibitively expensive to compute the loss or constraints exactly, the CMPState may contain stochastic estimates. This is often the case when mini-batches are used to approximate the loss and constraints. Just as in the unconstrained case, these approximations can compromise the stability of the optimization process.
- static sanity_check_cmp_state(cmp_state)[source]
Performs sanity checks on the CMP state. This helper method is useful for ensuring that the CMP state is well-formed.
- Raises:
ValueError – If the loss tensor does not have a valid gradient.
ValueError – If the violation tensor of any constraint does not have a valid gradient.
ValueError – If the strict violation tensor of any constraint has a gradient.
ValueError – If a constraint contributes to the dual update but the associated formulation does not expect a multiplier.
- compute_violations(*args, **kwargs)[source]
Computes the violation of the CMP constraints based on the current value of the primal parameters. This function returns a CMPState instance containing the observed constraint values. Note that the returned CMPState may have loss=None, as the loss value is not necessarily computed when only evaluating the constraints.
The function signature may be adjusted to accommodate situations that require a model, (mini-batched) inputs/targets, or other arguments.
In some cases, the computation of constraints may be independent of loss evaluation. In such situations, CMP.compute_violations can be called as part of the execution of CMP.compute_cmp_state.
- Return type: CMPState
CMPStates
A CMPState is a dataclass containing the information about the loss and constraint violations measured at a specific point. The constraints included in the CMPState must be passed as a dictionary, where the keys are the Constraint objects and the values are the associated ConstraintState objects.
Note
To ensure that your CMPState is correctly constructed, with loss and constraint tensors that carry gradients, you can use the CMP.sanity_check_cmp_state method.
However, avoid calling this method in every iteration of your optimization loop, as it can introduce unnecessary overhead.
- class cooper.CMPState(loss=None, observed_constraints=<factory>, misc=None)[source]
Represents the state of a ConstrainedMinimizationProblem in terms of the value of its loss and constraint violations at a point \(\vx_t\).
- Parameters:
  - loss (Optional[Tensor]) – Value of the loss or main objective to be minimized, \(f(\vx_t)\).
  - observed_constraints (dict[Constraint, ConstraintState]) – Dictionary with Constraint instances as keys and ConstraintState instances as values (containing the tensors \(\vg(\vx_t)\) and \(\vh(\vx_t)\)).
  - misc (Optional[dict]) – Optional storage space for additional information relevant to the state of the CMP. This dictionary enables persisting the results of certain computations for post-processing. For example, one may want to retain the value of the predictions/logits computed over a given minibatch during the call to compute_cmp_state() to measure or log training statistics.
- compute_primal_lagrangian()[source]
Computes and accumulates the primal-differentiable Lagrangian based on the loss and the contribution of the observed constraints.
- compute_dual_lagrangian()[source]
Computes and accumulates the dual-differentiable Lagrangian based on the contribution of the observed constraints.
The dual Lagrangian contained in LagrangianStore.lagrangian ignores the contribution of the loss, since the objective function does not depend on the dual variables. Therefore, LagrangianStore.lagrangian = 0 regardless of the value of self.loss.
- named_observed_strict_violations()[source]
Returns an iterator over the observed strict constraint violations.
Checkpointing
You can checkpoint a CMP by saving and loading its state with the methods CMP.state_dict and CMP.load_state_dict. These methods capture the current values of the multipliers and penalty coefficients for all problem constraints, allowing you to resume the optimization process from that exact state.
# Save the state of the CMP
cmp_state = cmp.state_dict()
torch.save(cmp_state, "cmp_state.pth")
# Later, restore the state of the CMP
cmp = MyCMP(...) # A new CMP with default multiplier and coefficient values
loaded_state = torch.load("cmp_state.pth")
cmp.load_state_dict(loaded_state) # Load checkpointed multipliers and coefficients
For a full working example, see this tutorial.