
(ec2): default Vpc structure results in broken networking during create/delete #21348

@laurelmay

Description

Describe the bug

When creating a VPC with the default configuration, in some situations the NAT Gateway(s) may be torn down before the private subnets are. This can cause issues for resources that rely on network egress in order to complete successfully (such as a Lambda-backed CloudFormation Custom Resource). Additionally, the private subnets (and resources that depend on them) may be created before the NAT Gateways, which can break initialization logic.

Expected Behavior

It should not be possible for the NAT Gateway to be deleted before the private subnets. The private subnets should depend on the NAT Gateway.

Current Behavior

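During stack deletion, the NAT Gateway resource, the public subnets, and the internet gateway can be deleted while the private subnets (and the resources in them) still exist, so those resources lose network egress.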

Reproduction Steps

import * as cdk from "aws-cdk-lib";
import * as ec2 from "aws-cdk-lib/aws-ec2";
import * as lambda from "aws-cdk-lib/aws-lambda";

const app = new cdk.App();
const stack = new cdk.Stack(app, "TestStack");
const vpc = new ec2.Vpc(stack, "TestVpc");

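const fn = new lambda.Function(stack, "TestFn", {
  vpc,
  // PRIVATE_WITH_NAT selects the private subnets that route egress through a NAT Gateway
  vpcSubnets: { subnetType: ec2.SubnetType.PRIVATE_WITH_NAT },
  runtime: lambda.Runtime.NODEJS_16_X,
  handler: "index.handler",
  // Placeholder code; the real function needs internet access to do its work
  code: lambda.Code.fromInline("exports.handler = async () => {};"),
});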

Building out the infrastructure needed to demonstrate this with an actual custom resource is somewhat non-trivial and doesn't reduce to a snippet that could easily be copied and pasted.

Possible Solution

Whether directly or indirectly, the private subnet should depend on the NAT Gateway; depending on the default route would achieve this indirectly. This looks like it should already be handled in configureSubnet:

public configureSubnet(subnet: PrivateSubnet) {
  const az = subnet.availabilityZone;
  const gatewayId = this.gateways.pick(az);
  subnet.addRoute('DefaultRoute', {
    routerType: RouterType.NAT_GATEWAY,
    routerId: gatewayId,
    enablesInternetConnectivity: true,
  });
}

However, the Route Table Association references the Subnet and the Route Table, and the Route references the Route Table and its target (the NAT Gateway). There is no reference from the Subnet to any of those resources, so CloudFormation sees no implicit dependency between the Subnet and the NAT Gateway. As a result, resources in the private subnets can lose network egress while the stack is being deleted.
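To make the ordering argument concrete, here is a minimal sketch that models the references described above as a graph and checks for a transitive dependency. The resource names (`PrivateSubnet`, `NatGateway`, `Fn`, `CustomResource`, and so on) are hypothetical labels for a single AZ, not the logical IDs CDK generates, and the graph is deliberately simplified (for example, it omits the public subnet's own route table). CloudFormation only orders resources that are connected by a dependency, deleting dependents first; resources with no path between them may be deleted in either order or in parallel.

```typescript
// Simplified, hypothetical model of one AZ of a default Vpc plus a
// VPC-attached Lambda function backing a custom resource. Each key lists the
// resources it references (Ref / Fn::GetAtt); CloudFormation creates the
// referenced resources first and deletes them last.
type Graph = Record<string, string[]>;

const refs: Graph = {
  Vpc: [],
  InternetGateway: [],
  GatewayAttachment: ["Vpc", "InternetGateway"],
  PublicSubnet: ["Vpc"],
  Eip: [],
  NatGateway: ["PublicSubnet", "Eip"],
  PrivateSubnet: ["Vpc"],
  PrivateRouteTable: ["Vpc"],
  PrivateRouteTableAssociation: ["PrivateSubnet", "PrivateRouteTable"],
  PrivateDefaultRoute: ["PrivateRouteTable", "NatGateway"],
  Fn: ["PrivateSubnet"],
  CustomResource: ["Fn"],
};

// Returns true if `from` transitively references `to`, i.e. CloudFormation
// must create `to` before `from` and delete `from` before `to`.
function dependsOn(graph: Graph, from: string, to: string): boolean {
  const seen = new Set<string>();
  const pending = [...(graph[from] ?? [])];
  while (pending.length > 0) {
    const node = pending.pop()!;
    if (node === to) return true;
    if (seen.has(node)) continue;
    seen.add(node);
    pending.push(...(graph[node] ?? []));
  }
  return false;
}

// No path: the NAT Gateway may be deleted while the function still exists.
console.log(dependsOn(refs, "PrivateSubnet", "NatGateway")); // false
console.log(dependsOn(refs, "CustomResource", "NatGateway")); // false

// Only the route carries the ordering constraint.
console.log(dependsOn(refs, "PrivateDefaultRoute", "NatGateway")); // true

// Hypothetical fix: make the function also depend on the default route.
const withRouteDependency: Graph = { ...refs, Fn: [...refs.Fn, "PrivateDefaultRoute"] };
console.log(dependsOn(withRouteDependency, "CustomResource", "NatGateway")); // true
```

Adding the default route to the function's references gives the model a path to the NAT Gateway, which is the effect an explicit dependency on the route would have in the real template.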

Maybe this issue is limited to Lambda Functions instead of being a larger problem with EC2? Do more resources need to add a dependency on Vpc.internetConnectivityEstablished?
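As a possible workaround (not a confirmed fix), a consumer can add the dependency explicitly. This sketch assumes aws-cdk-lib v2 and uses the selected subnets' `internetConnectivityEstablished` rather than the VPC-wide one, on the assumption that the per-subnet dependable is the one that covers the private subnets' NAT routes. `SubnetType.PRIVATE_WITH_NAT` is the name available in 2.33.0 (later releases deprecate it in favor of `PRIVATE_WITH_EGRESS`); the `TestResource` custom resource and the placeholder handler are illustrative only.

```typescript
import * as cdk from "aws-cdk-lib";
import * as ec2 from "aws-cdk-lib/aws-ec2";
import * as lambda from "aws-cdk-lib/aws-lambda";

const app = new cdk.App();
const stack = new cdk.Stack(app, "TestStack");
const vpc = new ec2.Vpc(stack, "TestVpc");

// Subnets whose default route points at a NAT Gateway.
const selection: ec2.SubnetSelection = { subnetType: ec2.SubnetType.PRIVATE_WITH_NAT };

const fn = new lambda.Function(stack, "TestFn", {
  vpc,
  vpcSubnets: selection,
  runtime: lambda.Runtime.NODEJS_16_X,
  handler: "index.handler",
  // Placeholder; a real handler would call out to the internet and send
  // the custom resource response back to CloudFormation.
  code: lambda.Code.fromInline("exports.handler = async () => {};"),
});

// Workaround sketch: depend on the NAT routes of the selected subnets, so
// CloudFormation creates those routes (and the NAT Gateways they reference)
// before the function, and deletes the function before them.
fn.node.addDependency(vpc.selectSubnets(selection).internetConnectivityEstablished);

// The custom resource references the function ARN, so it inherits the ordering.
new cdk.CustomResource(stack, "TestResource", { serviceToken: fn.functionArn });
```

Because CloudFormation deletes dependents before their dependencies, the custom resource and function should now be removed before the private subnets' default routes and the NAT Gateways. Whether `lambda.Function` already adds part of this dependency on its own may vary by CDK version; a duplicate `DependsOn` entry is harmless.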

Additional Information/Context

No response

CDK CLI Version

2.33.0

Framework Version

No response

Node.js Version

16

OS

Linux

Language

TypeScript

Language Version

No response

Other information

No response

Metadata

Assignees

No one assigned

Labels

@aws-cdk/aws-ec2 (Related to Amazon Elastic Compute Cloud), bug (This issue is a bug.), effort/medium (Medium work item: several days of effort), p2
