Skip to content

accesslog: Rename HTTP "header" field to "headers"#1748

Merged
tgraf merged 1 commit intomasterfrom
l7-access-log
Oct 14, 2017
Merged

accesslog: Rename HTTP "header" field to "headers"#1748
tgraf merged 1 commit intomasterfrom
l7-access-log

Conversation

@tgraf
Copy link
Copy Markdown
Contributor

@tgraf tgraf commented Oct 11, 2017

This has been confusing to users as the field holds all HTTP headers

Fixes: #1747

This has been confusing to users as the field holds all HTTP headers

Signed-off-by: Thomas Graf <thomas@cilium.io>
@tgraf tgraf requested a review from rlenglet October 12, 2017 07:52
@tgraf tgraf merged commit 9e7ffc9 into master Oct 14, 2017
@tgraf tgraf deleted the l7-access-log branch October 14, 2017 06:57
christarazi added a commit to christarazi/cilium that referenced this pull request Jul 17, 2024
As the previous commit mentions, this does not occur on main, v1.16, and
v1.15 due to cilium#29036.

However, in v1.14 and v1.13, we need to take special care because the
NameManager and SelectorCache lock can be taken while the Endpoint lock
is held during Endpoint deletion.

Here are the relevant stacktraces regarding the deadlock:

```
1: sync.Mutex.Lock [75 minutes] [Created by http.(*Server).Serve in goroutine 1699 @ server.go:3086]
    sync            sema.go:77                   runtime_SemacquireMutex(*uint32(0x5), false, 43690)
    sync            mutex.go:171                 (*Mutex).lockSlow(*Mutex(cilium#1733))
    sync            mutex.go:90                  (*Mutex).Lock(...)
    sync            rwmutex.go:147               (*RWMutex).Lock(*RWMutex(0xb0))
    fqdn            name_manager.go:70           (*NameManager).Lock(0xffffffffffffffff)
    policy          selectorcache.go:964         (*SelectorCache).RemoveSelectors(cilium#1088, {cilium#28569, 0xb, 1}, {cilium#643, cilium#32582})
    policy          l4.go:810                    (*L4Filter).removeSelectors(cilium#32582, cilium#29992)
    policy          l4.go:817                    (*L4Filter).detach(cilium#719, cilium#29993)
    policy          l4.go:988                    L4PolicyMap.Detach(...)
    policy          l4.go:1179                   (*L4Policy).Detach(cilium#20318, cilium#1383)
    policy          resolve.go:103               (*selectorPolicy).Detach(...)
    policy          distillery.go:81             (*PolicyCache).delete(cilium#1354, cilium#19354)
    policy          distillery.go:138            (*PolicyCache).LocalEndpointIdentityRemoved(cilium#523, cilium#1163)
    identitymanager manager.go:167               (*IdentityManager).remove(cilium#706, cilium#19354)
    identitymanager manager.go:147               (*IdentityManager).Remove(cilium#706, cilium#19354)
    identitymanager manager.go:52                Remove(...)
    endpoint        endpoint.go:1146             (*Endpoint).leaveLocked(cilium#1883, cilium#12221, {0x30, 0})
    endpoint        endpoint.go:2192             (*Endpoint).Delete(cilium#1883, {0x80, 0xaa})
    endpointmanager manager.go:380               (*EndpointManager).removeEndpoint(cilium#1161, 0, {0xff, 0xff})
    endpointmanager manager.go:394               (*EndpointManager).RemoveEndpoint(...)
    cmd             endpoint.go:684              (*Daemon).deleteEndpointQuiet(...)
    cmd             endpoint.go:666              (*Daemon).deleteEndpoint(cilium#1155, cilium#1883)
    cmd             endpoint.go:713              (*Daemon).DeleteEndpoint(cilium#1155, {cilium#27399, cilium#8108})
    cmd             endpoint.go:770              (*deleteEndpointID).Handle(cilium#21193, {cilium#2452, {cilium#27399, 0x4d}})
    endpoint        delete_endpoint_id.go:66     (*DeleteEndpointID).ServeHTTP(cilium#5934, {cilium#666, cilium#2242}, cilium#2452)
    middleware      operation.go:28              (*Context).RoutesHandler.NewOperationExecutor.func1({cilium#666, cilium#2242}, cilium#2452)
    http            server.go:2136               HandlerFunc.ServeHTTP(ReadCloser(cilium#130), func{cilium#2242, 0x3})
    middleware      router.go:78                 NewRouter.func1({cilium#666, cilium#2242}, cilium#2451)
    http            server.go:2136               HandlerFunc.ServeHTTP(ReadCloser(cilium#718), func{cilium#2242, #59})
    middleware      redoc.go:72                  Redoc.func1({cilium#666, cilium#2242}, cilium#1251)
    http            server.go:2136               HandlerFunc.ServeHTTP(ReadCloser(cilium#4920), func{cilium#2242, #45})
    middleware      spec.go:46                   Spec.func1({cilium#666, cilium#2242}, cilium#4921)
    http            server.go:2136               HandlerFunc.ServeHTTP(ReadCloser(cilium#10532), func{cilium#2242, cilium#23015})
    metrics         middleware.go:64             (*APIEventTSHelper).ServeHTTP(cilium#1459, {cilium#668, cilium#10533}, cilium#2451)
    api             apipanic.go:42               (*APIPanicHandler).ServeHTTP(cilium#722, {cilium#668, cilium#10533}, cilium#4922)
    http            server.go:2938               serverHandler.ServeHTTP(*Server(cilium#8105), cilium#668, cilium#10533, 0x6)
    http            server.go:2009               (*conn).serve(*conn(cilium#16005), Context{cilium#673, cilium#1554})
8: sync.Mutex.Lock [74 minutes] [Created by http.(*Server).Serve in goroutine 1699 @ server.go:3086]
    sync            sema.go:77                   runtime_SemacquireMutex(*, 0x47, cilium#1154)
    sync            mutex.go:171                 (*Mutex).lockSlow(cilium#706)
    sync            mutex.go:90                  (*Mutex).Lock(...)
    sync            rwmutex.go:147               (*RWMutex).Lock(*)
    identitymanager manager.go:99                (*IdentityManager).RemoveOldAddNew(cilium#706, 0, cilium#1154)
    identitymanager manager.go:123               RemoveOldAddNew(...)
    endpoint        policy.go:852                (*Endpoint).SetIdentity(*, cilium#1154, 0)
    endpoint        endpoint.go:1932             (*Endpoint).identityLabelsChanged(*, {cilium#674, *}, 1)
    endpoint        endpoint.go:1780             (*Endpoint).runIdentityResolver(*, {cilium#674, *}, 1, 1)
    endpoint        endpoint.go:1720             (*Endpoint).UpdateLabels(*, {cilium#674, *}, *, *, 8)
    cmd             endpoint.go:477              (*Daemon).createEndpoint(cilium#1155, {cilium#673, *}, {cilium#683, cilium#1155}, *)
    cmd             endpoint.go:542              (*putEndpointID).Handle(cilium#21192, {*, *, {*, 0xe}})
    endpoint        put_endpoint_id.go:58        (*PutEndpointID).ServeHTTP(cilium#3961, {cilium#666, *}, *)
    middleware      operation.go:28              (*Context).RoutesHandler.NewOperationExecutor.func1({cilium#666, *}, *)
    http            server.go:2136               HandlerFunc.ServeHTTP(*, {cilium#666, *}, *)
    middleware      router.go:78                 NewRouter.func1({cilium#666, *}, *)
    http            server.go:2136               HandlerFunc.ServeHTTP(*, {cilium#666, *}, *)
    middleware      redoc.go:72                  Redoc.func1({cilium#666, *}, cilium#1251)
    http            server.go:2136               HandlerFunc.ServeHTTP(*, {cilium#666, *}, #45)
    middleware      spec.go:46                   Spec.func1({cilium#666, *}, *)
    http            server.go:2136               HandlerFunc.ServeHTTP(*, {cilium#666, *}, *)
    metrics         middleware.go:64             (*APIEventTSHelper).ServeHTTP(cilium#1459, {cilium#668, *}, *)
    api             apipanic.go:42               (*APIPanicHandler).ServeHTTP(#49, {cilium#668, *}, *)
    http            server.go:2938               serverHandler.ServeHTTP({cilium#653}, {cilium#668, *}, 6)
    http            server.go:2009               (*conn).serve(*, {cilium#673, cilium#1554})
5: sync.Mutex.Lock [75 minutes] [Created by eventqueue.(*EventQueue).Run in goroutine 1482 @ eventqueue.go:229]
    sync            sema.go:77                   runtime_SemacquireMutex(cilium#142, 0xe8, *)
    sync            mutex.go:171                 (*Mutex).lockSlow(cilium#1733)
    sync            mutex.go:90                  (*Mutex).Lock(...)
    sync            rwmutex.go:147               (*RWMutex).Lock(0x68)
    fqdn            name_manager.go:70           (*NameManager).Lock(*)
    policy          selectorcache.go:798         (*SelectorCache).AddFQDNSelector(cilium#1088, {cilium#643, *}, {{*, 0x4d}, {0, 0}})
    policy          l4.go:628                    (*L4Filter).cacheFQDNSelector(...)
    policy          l4.go:623                    (*L4Filter).cacheFQDNSelectors(*, {*, 4, cilium#193}, cilium#536)
    policy          l4.go:725                    createL4Filter({cilium#680, *}, {*, 1, 1}, 0, {cilium#660, *}, {{*, 4}, ...}, ...)
    policy          l4.go:879                    createL4EgressFilter(...)
    policy          rule.go:717                  mergeEgressPortProto({cilium#680, *}, #44, {*, 0xa, 0}, *, {cilium#660, *}, {{*, ...}, ...}, ...)
    policy          rule.go:672                  mergeEgress.func1({cilium#660, *})
    api             l4.go:284                    PortRules.Iterate({*, 1, cilium#546}, *)
    policy          rule.go:624                  mergeEgress({cilium#680, *}, *, {*, 1, 1}, 0, {cilium#661, *}, {cilium#662, ...}, ...)
    policy          rule.go:753                  (*rule).resolveEgressPolicy(*, {cilium#680, *}, *, *, *, {0, 0, 0}, {0, ...})
    policy          rules.go:103                 ruleSlice.resolveL4EgressPolicy({*, *, *}, {cilium#680, *}, *)
    policy          repository.go:718            (*Repository).resolvePolicyLocked(cilium#1089, *)
    policy          distillery.go:119            (*PolicyCache).updateSelectorPolicy(cilium#1354, *)
    policy          distillery.go:153            (*PolicyCache).UpdatePolicy(...)
    endpoint        policy.go:262                (*Endpoint).regeneratePolicy(*)
    endpoint        bpf.go:744                   (*Endpoint).runPreCompilationSteps(*, *, *)
    endpoint        bpf.go:589                   (*Endpoint).regenerateBPF(*, *)
    endpoint        policy.go:457                (*Endpoint).regenerate(*, *)
    endpoint        events.go:53                 (*EndpointRegenerationEvent).Handle(*, *)
    eventqueue      eventqueue.go:245            (*EventQueue).run.func1()
    sync            once.go:74                   (*Once).doSlow(*, *)
    sync            once.go:65                   (*Once).Do(...)
    eventqueue      eventqueue.go:233            (*EventQueue).run(*)
1: select [75 minutes] [Created by eventqueue.(*EventQueue).Run in goroutine 1482 @ eventqueue.go:229]
    semaphore       semaphore.go:60              (*Weighted).Acquire(cilium#1092, {cilium#671, cilium#722}, cilium#766)
    lock            semaphored_mutex.go:30       (*SemaphoredMutex).Lock(...)
    ipcache         ipcache.go:140               (*IPCache).Lock(...)
    ipcache         cidr.go:56                   (*IPCache).AllocateCIDRs(cilium#1316, {0, 0, cilium#2037}, {0, 0, 0}, 0)
    ipcache         cidr.go:103                  (*IPCache).AllocateCIDRsForIPs(0, {0, cilium#697, 0}, 0xffffffffffffffff)
    cmd             identity.go:114              cachingIdentityAllocator.AllocateCIDRsForIPs(...)
    policy          selectorcache.go:509         (*SelectorCache).allocateIdentityMappings(cilium#1088, {{0, 0}, {cilium#5036, 0x2b}}, #45)
    policy          selectorcache.go:843         (*SelectorCache).AddFQDNSelector(cilium#1088, {cilium#643, cilium#27811}, {{0, 0}, {cilium#5036, 0x2b}})
    policy          l4.go:628                    (*L4Filter).cacheFQDNSelector(...)
    policy          l4.go:623                    (*L4Filter).cacheFQDNSelectors(cilium#27811, {cilium#42936, 0x287, cilium#193}, cilium#536)
    policy          l4.go:725                    createL4Filter({cilium#680, cilium#22826}, {cilium#22828, 1, 1}, 0, {cilium#660, cilium#24510}, {{cilium#4487, 3}, ...}, ...)
    policy          l4.go:879                    createL4EgressFilter(...)
    policy          rule.go:717                  mergeEgressPortProto({cilium#680, cilium#22826}, #44, {cilium#22828, 0xa, #78536}, #79633, {cilium#660, cilium#24510}, {{cilium#4487, ...}, ...}, ...)
    policy          rule.go:672                  mergeEgress.func1({cilium#660, cilium#24510})
    api             l4.go:284                    PortRules.Iterate({cilium#24510, 1, cilium#546}, cilium#11741)
    policy          rule.go:624                  mergeEgress({cilium#680, cilium#22826}, cilium#18687, {cilium#22828, 1, 1}, 0, {cilium#661, cilium#5624}, {cilium#662, ...}, ...)
    policy          rule.go:753                  (*rule).resolveEgressPolicy(cilium#24575, {cilium#680, cilium#22826}, cilium#18687, cilium#29345, cilium#4782, {0, 0, 0}, {0, ...})
    policy          rules.go:103                 ruleSlice.resolveL4EgressPolicy({cilium#10690, 0xb, 5}, {cilium#680, cilium#22826}, cilium#18687)
    policy          repository.go:718            (*Repository).resolvePolicyLocked(cilium#1089, cilium#18461)
    policy          distillery.go:119            (*PolicyCache).updateSelectorPolicy(cilium#1354, cilium#18461)
    policy          distillery.go:153            (*PolicyCache).UpdatePolicy(...)
    endpoint        policy.go:262                (*Endpoint).regeneratePolicy(cilium#1748)
    endpoint        bpf.go:744                   (*Endpoint).runPreCompilationSteps(cilium#1748, cilium#27542, cilium#4781)
    endpoint        bpf.go:589                   (*Endpoint).regenerateBPF(cilium#1748, cilium#27542)
    endpoint        policy.go:457                (*Endpoint).regenerate(cilium#1748, cilium#27542)
    endpoint        events.go:53                 (*EndpointRegenerationEvent).Handle(cilium#18609, cilium#703)
    eventqueue      eventqueue.go:245            (*EventQueue).run.func1()
    sync            once.go:74                   (*Once).doSlow(*Once(cilium#45414), func(cilium#69))
    sync            once.go:65                   (*Once).Do(...)
    eventqueue      eventqueue.go:233            (*EventQueue).run(cilium#6023)
```

Generated from pp tool: https://github.com/maruel/panicparse

Signed-off-by: Chris Tarazi <chris@isovalent.com>
christarazi added a commit to christarazi/cilium that referenced this pull request Aug 7, 2024
As the previous commit mentions, this does not occur on main, v1.16, and
v1.15 due to cilium#29036.

However, in v1.14 and v1.13, we need to take special care because the
NameManager and SelectorCache lock can be taken while the Endpoint lock
is held during Endpoint deletion.

Here are the relevant stacktraces regarding the deadlock:

```
1: sync.Mutex.Lock [75 minutes] [Created by http.(*Server).Serve in goroutine 1699 @ server.go:3086]
    sync            sema.go:77                   runtime_SemacquireMutex(*uint32(0x5), false, 43690)
    sync            mutex.go:171                 (*Mutex).lockSlow(*Mutex(cilium#1733))
    sync            mutex.go:90                  (*Mutex).Lock(...)
    sync            rwmutex.go:147               (*RWMutex).Lock(*RWMutex(0xb0))
    fqdn            name_manager.go:70           (*NameManager).Lock(0xffffffffffffffff)
    policy          selectorcache.go:964         (*SelectorCache).RemoveSelectors(cilium#1088, {cilium#28569, 0xb, 1}, {cilium#643, cilium#32582})
    policy          l4.go:810                    (*L4Filter).removeSelectors(cilium#32582, cilium#29992)
    policy          l4.go:817                    (*L4Filter).detach(cilium#719, cilium#29993)
    policy          l4.go:988                    L4PolicyMap.Detach(...)
    policy          l4.go:1179                   (*L4Policy).Detach(cilium#20318, cilium#1383)
    policy          resolve.go:103               (*selectorPolicy).Detach(...)
    policy          distillery.go:81             (*PolicyCache).delete(cilium#1354, cilium#19354)
    policy          distillery.go:138            (*PolicyCache).LocalEndpointIdentityRemoved(cilium#523, cilium#1163)
    identitymanager manager.go:167               (*IdentityManager).remove(cilium#706, cilium#19354)
    identitymanager manager.go:147               (*IdentityManager).Remove(cilium#706, cilium#19354)
    identitymanager manager.go:52                Remove(...)
    endpoint        endpoint.go:1146             (*Endpoint).leaveLocked(cilium#1883, cilium#12221, {0x30, 0})
    endpoint        endpoint.go:2192             (*Endpoint).Delete(cilium#1883, {0x80, 0xaa})
    endpointmanager manager.go:380               (*EndpointManager).removeEndpoint(cilium#1161, 0, {0xff, 0xff})
    endpointmanager manager.go:394               (*EndpointManager).RemoveEndpoint(...)
    cmd             endpoint.go:684              (*Daemon).deleteEndpointQuiet(...)
    cmd             endpoint.go:666              (*Daemon).deleteEndpoint(cilium#1155, cilium#1883)
    cmd             endpoint.go:713              (*Daemon).DeleteEndpoint(cilium#1155, {cilium#27399, cilium#8108})
    cmd             endpoint.go:770              (*deleteEndpointID).Handle(cilium#21193, {cilium#2452, {cilium#27399, 0x4d}})
    endpoint        delete_endpoint_id.go:66     (*DeleteEndpointID).ServeHTTP(cilium#5934, {cilium#666, cilium#2242}, cilium#2452)
    middleware      operation.go:28              (*Context).RoutesHandler.NewOperationExecutor.func1({cilium#666, cilium#2242}, cilium#2452)
    http            server.go:2136               HandlerFunc.ServeHTTP(ReadCloser(cilium#130), func{cilium#2242, 0x3})
    middleware      router.go:78                 NewRouter.func1({cilium#666, cilium#2242}, cilium#2451)
    http            server.go:2136               HandlerFunc.ServeHTTP(ReadCloser(cilium#718), func{cilium#2242, #59})
    middleware      redoc.go:72                  Redoc.func1({cilium#666, cilium#2242}, cilium#1251)
    http            server.go:2136               HandlerFunc.ServeHTTP(ReadCloser(cilium#4920), func{cilium#2242, #45})
    middleware      spec.go:46                   Spec.func1({cilium#666, cilium#2242}, cilium#4921)
    http            server.go:2136               HandlerFunc.ServeHTTP(ReadCloser(cilium#10532), func{cilium#2242, cilium#23015})
    metrics         middleware.go:64             (*APIEventTSHelper).ServeHTTP(cilium#1459, {cilium#668, cilium#10533}, cilium#2451)
    api             apipanic.go:42               (*APIPanicHandler).ServeHTTP(cilium#722, {cilium#668, cilium#10533}, cilium#4922)
    http            server.go:2938               serverHandler.ServeHTTP(*Server(cilium#8105), cilium#668, cilium#10533, 0x6)
    http            server.go:2009               (*conn).serve(*conn(cilium#16005), Context{cilium#673, cilium#1554})
8: sync.Mutex.Lock [74 minutes] [Created by http.(*Server).Serve in goroutine 1699 @ server.go:3086]
    sync            sema.go:77                   runtime_SemacquireMutex(*, 0x47, cilium#1154)
    sync            mutex.go:171                 (*Mutex).lockSlow(cilium#706)
    sync            mutex.go:90                  (*Mutex).Lock(...)
    sync            rwmutex.go:147               (*RWMutex).Lock(*)
    identitymanager manager.go:99                (*IdentityManager).RemoveOldAddNew(cilium#706, 0, cilium#1154)
    identitymanager manager.go:123               RemoveOldAddNew(...)
    endpoint        policy.go:852                (*Endpoint).SetIdentity(*, cilium#1154, 0)
    endpoint        endpoint.go:1932             (*Endpoint).identityLabelsChanged(*, {cilium#674, *}, 1)
    endpoint        endpoint.go:1780             (*Endpoint).runIdentityResolver(*, {cilium#674, *}, 1, 1)
    endpoint        endpoint.go:1720             (*Endpoint).UpdateLabels(*, {cilium#674, *}, *, *, 8)
    cmd             endpoint.go:477              (*Daemon).createEndpoint(cilium#1155, {cilium#673, *}, {cilium#683, cilium#1155}, *)
    cmd             endpoint.go:542              (*putEndpointID).Handle(cilium#21192, {*, *, {*, 0xe}})
    endpoint        put_endpoint_id.go:58        (*PutEndpointID).ServeHTTP(cilium#3961, {cilium#666, *}, *)
    middleware      operation.go:28              (*Context).RoutesHandler.NewOperationExecutor.func1({cilium#666, *}, *)
    http            server.go:2136               HandlerFunc.ServeHTTP(*, {cilium#666, *}, *)
    middleware      router.go:78                 NewRouter.func1({cilium#666, *}, *)
    http            server.go:2136               HandlerFunc.ServeHTTP(*, {cilium#666, *}, *)
    middleware      redoc.go:72                  Redoc.func1({cilium#666, *}, cilium#1251)
    http            server.go:2136               HandlerFunc.ServeHTTP(*, {cilium#666, *}, #45)
    middleware      spec.go:46                   Spec.func1({cilium#666, *}, *)
    http            server.go:2136               HandlerFunc.ServeHTTP(*, {cilium#666, *}, *)
    metrics         middleware.go:64             (*APIEventTSHelper).ServeHTTP(cilium#1459, {cilium#668, *}, *)
    api             apipanic.go:42               (*APIPanicHandler).ServeHTTP(#49, {cilium#668, *}, *)
    http            server.go:2938               serverHandler.ServeHTTP({cilium#653}, {cilium#668, *}, 6)
    http            server.go:2009               (*conn).serve(*, {cilium#673, cilium#1554})
5: sync.Mutex.Lock [75 minutes] [Created by eventqueue.(*EventQueue).Run in goroutine 1482 @ eventqueue.go:229]
    sync            sema.go:77                   runtime_SemacquireMutex(cilium#142, 0xe8, *)
    sync            mutex.go:171                 (*Mutex).lockSlow(cilium#1733)
    sync            mutex.go:90                  (*Mutex).Lock(...)
    sync            rwmutex.go:147               (*RWMutex).Lock(0x68)
    fqdn            name_manager.go:70           (*NameManager).Lock(*)
    policy          selectorcache.go:798         (*SelectorCache).AddFQDNSelector(cilium#1088, {cilium#643, *}, {{*, 0x4d}, {0, 0}})
    policy          l4.go:628                    (*L4Filter).cacheFQDNSelector(...)
    policy          l4.go:623                    (*L4Filter).cacheFQDNSelectors(*, {*, 4, cilium#193}, cilium#536)
    policy          l4.go:725                    createL4Filter({cilium#680, *}, {*, 1, 1}, 0, {cilium#660, *}, {{*, 4}, ...}, ...)
    policy          l4.go:879                    createL4EgressFilter(...)
    policy          rule.go:717                  mergeEgressPortProto({cilium#680, *}, #44, {*, 0xa, 0}, *, {cilium#660, *}, {{*, ...}, ...}, ...)
    policy          rule.go:672                  mergeEgress.func1({cilium#660, *})
    api             l4.go:284                    PortRules.Iterate({*, 1, cilium#546}, *)
    policy          rule.go:624                  mergeEgress({cilium#680, *}, *, {*, 1, 1}, 0, {cilium#661, *}, {cilium#662, ...}, ...)
    policy          rule.go:753                  (*rule).resolveEgressPolicy(*, {cilium#680, *}, *, *, *, {0, 0, 0}, {0, ...})
    policy          rules.go:103                 ruleSlice.resolveL4EgressPolicy({*, *, *}, {cilium#680, *}, *)
    policy          repository.go:718            (*Repository).resolvePolicyLocked(cilium#1089, *)
    policy          distillery.go:119            (*PolicyCache).updateSelectorPolicy(cilium#1354, *)
    policy          distillery.go:153            (*PolicyCache).UpdatePolicy(...)
    endpoint        policy.go:262                (*Endpoint).regeneratePolicy(*)
    endpoint        bpf.go:744                   (*Endpoint).runPreCompilationSteps(*, *, *)
    endpoint        bpf.go:589                   (*Endpoint).regenerateBPF(*, *)
    endpoint        policy.go:457                (*Endpoint).regenerate(*, *)
    endpoint        events.go:53                 (*EndpointRegenerationEvent).Handle(*, *)
    eventqueue      eventqueue.go:245            (*EventQueue).run.func1()
    sync            once.go:74                   (*Once).doSlow(*, *)
    sync            once.go:65                   (*Once).Do(...)
    eventqueue      eventqueue.go:233            (*EventQueue).run(*)
1: select [75 minutes] [Created by eventqueue.(*EventQueue).Run in goroutine 1482 @ eventqueue.go:229]
    semaphore       semaphore.go:60              (*Weighted).Acquire(cilium#1092, {cilium#671, cilium#722}, cilium#766)
    lock            semaphored_mutex.go:30       (*SemaphoredMutex).Lock(...)
    ipcache         ipcache.go:140               (*IPCache).Lock(...)
    ipcache         cidr.go:56                   (*IPCache).AllocateCIDRs(cilium#1316, {0, 0, cilium#2037}, {0, 0, 0}, 0)
    ipcache         cidr.go:103                  (*IPCache).AllocateCIDRsForIPs(0, {0, cilium#697, 0}, 0xffffffffffffffff)
    cmd             identity.go:114              cachingIdentityAllocator.AllocateCIDRsForIPs(...)
    policy          selectorcache.go:509         (*SelectorCache).allocateIdentityMappings(cilium#1088, {{0, 0}, {cilium#5036, 0x2b}}, #45)
    policy          selectorcache.go:843         (*SelectorCache).AddFQDNSelector(cilium#1088, {cilium#643, cilium#27811}, {{0, 0}, {cilium#5036, 0x2b}})
    policy          l4.go:628                    (*L4Filter).cacheFQDNSelector(...)
    policy          l4.go:623                    (*L4Filter).cacheFQDNSelectors(cilium#27811, {cilium#42936, 0x287, cilium#193}, cilium#536)
    policy          l4.go:725                    createL4Filter({cilium#680, cilium#22826}, {cilium#22828, 1, 1}, 0, {cilium#660, cilium#24510}, {{cilium#4487, 3}, ...}, ...)
    policy          l4.go:879                    createL4EgressFilter(...)
    policy          rule.go:717                  mergeEgressPortProto({cilium#680, cilium#22826}, #44, {cilium#22828, 0xa, #78536}, #79633, {cilium#660, cilium#24510}, {{cilium#4487, ...}, ...}, ...)
    policy          rule.go:672                  mergeEgress.func1({cilium#660, cilium#24510})
    api             l4.go:284                    PortRules.Iterate({cilium#24510, 1, cilium#546}, cilium#11741)
    policy          rule.go:624                  mergeEgress({cilium#680, cilium#22826}, cilium#18687, {cilium#22828, 1, 1}, 0, {cilium#661, cilium#5624}, {cilium#662, ...}, ...)
    policy          rule.go:753                  (*rule).resolveEgressPolicy(cilium#24575, {cilium#680, cilium#22826}, cilium#18687, cilium#29345, cilium#4782, {0, 0, 0}, {0, ...})
    policy          rules.go:103                 ruleSlice.resolveL4EgressPolicy({cilium#10690, 0xb, 5}, {cilium#680, cilium#22826}, cilium#18687)
    policy          repository.go:718            (*Repository).resolvePolicyLocked(cilium#1089, cilium#18461)
    policy          distillery.go:119            (*PolicyCache).updateSelectorPolicy(cilium#1354, cilium#18461)
    policy          distillery.go:153            (*PolicyCache).UpdatePolicy(...)
    endpoint        policy.go:262                (*Endpoint).regeneratePolicy(cilium#1748)
    endpoint        bpf.go:744                   (*Endpoint).runPreCompilationSteps(cilium#1748, cilium#27542, cilium#4781)
    endpoint        bpf.go:589                   (*Endpoint).regenerateBPF(cilium#1748, cilium#27542)
    endpoint        policy.go:457                (*Endpoint).regenerate(cilium#1748, cilium#27542)
    endpoint        events.go:53                 (*EndpointRegenerationEvent).Handle(cilium#18609, cilium#703)
    eventqueue      eventqueue.go:245            (*EventQueue).run.func1()
    sync            once.go:74                   (*Once).doSlow(*Once(cilium#45414), func(cilium#69))
    sync            once.go:65                   (*Once).Do(...)
    eventqueue      eventqueue.go:233            (*EventQueue).run(cilium#6023)
```

Generated from pp tool: https://github.com/maruel/panicparse

Signed-off-by: Chris Tarazi <chris@isovalent.com>
christarazi added a commit to christarazi/cilium that referenced this pull request Aug 13, 2024
As the previous commit mentions, this does not occur on main, v1.16, and
v1.15 due to cilium#29036.

However, in v1.14 and v1.13, we need to take special care because the
NameManager and SelectorCache lock can be taken while the Endpoint lock
is held during Endpoint deletion.

Here are the relevant stacktraces regarding the deadlock:

```
1: sync.Mutex.Lock [75 minutes] [Created by http.(*Server).Serve in goroutine 1699 @ server.go:3086]
    sync            sema.go:77                   runtime_SemacquireMutex(*uint32(0x5), false, 43690)
    sync            mutex.go:171                 (*Mutex).lockSlow(*Mutex(cilium#1733))
    sync            mutex.go:90                  (*Mutex).Lock(...)
    sync            rwmutex.go:147               (*RWMutex).Lock(*RWMutex(0xb0))
    fqdn            name_manager.go:70           (*NameManager).Lock(0xffffffffffffffff)
    policy          selectorcache.go:964         (*SelectorCache).RemoveSelectors(cilium#1088, {cilium#28569, 0xb, 1}, {cilium#643, cilium#32582})
    policy          l4.go:810                    (*L4Filter).removeSelectors(cilium#32582, cilium#29992)
    policy          l4.go:817                    (*L4Filter).detach(cilium#719, cilium#29993)
    policy          l4.go:988                    L4PolicyMap.Detach(...)
    policy          l4.go:1179                   (*L4Policy).Detach(cilium#20318, cilium#1383)
    policy          resolve.go:103               (*selectorPolicy).Detach(...)
    policy          distillery.go:81             (*PolicyCache).delete(cilium#1354, cilium#19354)
    policy          distillery.go:138            (*PolicyCache).LocalEndpointIdentityRemoved(cilium#523, cilium#1163)
    identitymanager manager.go:167               (*IdentityManager).remove(cilium#706, cilium#19354)
    identitymanager manager.go:147               (*IdentityManager).Remove(cilium#706, cilium#19354)
    identitymanager manager.go:52                Remove(...)
    endpoint        endpoint.go:1146             (*Endpoint).leaveLocked(cilium#1883, cilium#12221, {0x30, 0})
    endpoint        endpoint.go:2192             (*Endpoint).Delete(cilium#1883, {0x80, 0xaa})
    endpointmanager manager.go:380               (*EndpointManager).removeEndpoint(cilium#1161, 0, {0xff, 0xff})
    endpointmanager manager.go:394               (*EndpointManager).RemoveEndpoint(...)
    cmd             endpoint.go:684              (*Daemon).deleteEndpointQuiet(...)
    cmd             endpoint.go:666              (*Daemon).deleteEndpoint(cilium#1155, cilium#1883)
    cmd             endpoint.go:713              (*Daemon).DeleteEndpoint(cilium#1155, {cilium#27399, cilium#8108})
    cmd             endpoint.go:770              (*deleteEndpointID).Handle(cilium#21193, {cilium#2452, {cilium#27399, 0x4d}})
    endpoint        delete_endpoint_id.go:66     (*DeleteEndpointID).ServeHTTP(cilium#5934, {cilium#666, cilium#2242}, cilium#2452)
    middleware      operation.go:28              (*Context).RoutesHandler.NewOperationExecutor.func1({cilium#666, cilium#2242}, cilium#2452)
    http            server.go:2136               HandlerFunc.ServeHTTP(ReadCloser(cilium#130), func{cilium#2242, 0x3})
    middleware      router.go:78                 NewRouter.func1({cilium#666, cilium#2242}, cilium#2451)
    http            server.go:2136               HandlerFunc.ServeHTTP(ReadCloser(cilium#718), func{cilium#2242, #59})
    middleware      redoc.go:72                  Redoc.func1({cilium#666, cilium#2242}, cilium#1251)
    http            server.go:2136               HandlerFunc.ServeHTTP(ReadCloser(cilium#4920), func{cilium#2242, #45})
    middleware      spec.go:46                   Spec.func1({cilium#666, cilium#2242}, cilium#4921)
    http            server.go:2136               HandlerFunc.ServeHTTP(ReadCloser(cilium#10532), func{cilium#2242, cilium#23015})
    metrics         middleware.go:64             (*APIEventTSHelper).ServeHTTP(cilium#1459, {cilium#668, cilium#10533}, cilium#2451)
    api             apipanic.go:42               (*APIPanicHandler).ServeHTTP(cilium#722, {cilium#668, cilium#10533}, cilium#4922)
    http            server.go:2938               serverHandler.ServeHTTP(*Server(cilium#8105), cilium#668, cilium#10533, 0x6)
    http            server.go:2009               (*conn).serve(*conn(cilium#16005), Context{cilium#673, cilium#1554})
8: sync.Mutex.Lock [74 minutes] [Created by http.(*Server).Serve in goroutine 1699 @ server.go:3086]
    sync            sema.go:77                   runtime_SemacquireMutex(*, 0x47, cilium#1154)
    sync            mutex.go:171                 (*Mutex).lockSlow(cilium#706)
    sync            mutex.go:90                  (*Mutex).Lock(...)
    sync            rwmutex.go:147               (*RWMutex).Lock(*)
    identitymanager manager.go:99                (*IdentityManager).RemoveOldAddNew(cilium#706, 0, cilium#1154)
    identitymanager manager.go:123               RemoveOldAddNew(...)
    endpoint        policy.go:852                (*Endpoint).SetIdentity(*, cilium#1154, 0)
    endpoint        endpoint.go:1932             (*Endpoint).identityLabelsChanged(*, {cilium#674, *}, 1)
    endpoint        endpoint.go:1780             (*Endpoint).runIdentityResolver(*, {cilium#674, *}, 1, 1)
    endpoint        endpoint.go:1720             (*Endpoint).UpdateLabels(*, {cilium#674, *}, *, *, 8)
    cmd             endpoint.go:477              (*Daemon).createEndpoint(cilium#1155, {cilium#673, *}, {cilium#683, cilium#1155}, *)
    cmd             endpoint.go:542              (*putEndpointID).Handle(cilium#21192, {*, *, {*, 0xe}})
    endpoint        put_endpoint_id.go:58        (*PutEndpointID).ServeHTTP(cilium#3961, {cilium#666, *}, *)
    middleware      operation.go:28              (*Context).RoutesHandler.NewOperationExecutor.func1({cilium#666, *}, *)
    http            server.go:2136               HandlerFunc.ServeHTTP(*, {cilium#666, *}, *)
    middleware      router.go:78                 NewRouter.func1({cilium#666, *}, *)
    http            server.go:2136               HandlerFunc.ServeHTTP(*, {cilium#666, *}, *)
    middleware      redoc.go:72                  Redoc.func1({cilium#666, *}, cilium#1251)
    http            server.go:2136               HandlerFunc.ServeHTTP(*, {cilium#666, *}, #45)
    middleware      spec.go:46                   Spec.func1({cilium#666, *}, *)
    http            server.go:2136               HandlerFunc.ServeHTTP(*, {cilium#666, *}, *)
    metrics         middleware.go:64             (*APIEventTSHelper).ServeHTTP(cilium#1459, {cilium#668, *}, *)
    api             apipanic.go:42               (*APIPanicHandler).ServeHTTP(#49, {cilium#668, *}, *)
    http            server.go:2938               serverHandler.ServeHTTP({cilium#653}, {cilium#668, *}, 6)
    http            server.go:2009               (*conn).serve(*, {cilium#673, cilium#1554})
5: sync.Mutex.Lock [75 minutes] [Created by eventqueue.(*EventQueue).Run in goroutine 1482 @ eventqueue.go:229]
    sync            sema.go:77                   runtime_SemacquireMutex(cilium#142, 0xe8, *)
    sync            mutex.go:171                 (*Mutex).lockSlow(cilium#1733)
    sync            mutex.go:90                  (*Mutex).Lock(...)
    sync            rwmutex.go:147               (*RWMutex).Lock(0x68)
    fqdn            name_manager.go:70           (*NameManager).Lock(*)
    policy          selectorcache.go:798         (*SelectorCache).AddFQDNSelector(cilium#1088, {cilium#643, *}, {{*, 0x4d}, {0, 0}})
    policy          l4.go:628                    (*L4Filter).cacheFQDNSelector(...)
    policy          l4.go:623                    (*L4Filter).cacheFQDNSelectors(*, {*, 4, cilium#193}, cilium#536)
    policy          l4.go:725                    createL4Filter({cilium#680, *}, {*, 1, 1}, 0, {cilium#660, *}, {{*, 4}, ...}, ...)
    policy          l4.go:879                    createL4EgressFilter(...)
    policy          rule.go:717                  mergeEgressPortProto({cilium#680, *}, #44, {*, 0xa, 0}, *, {cilium#660, *}, {{*, ...}, ...}, ...)
    policy          rule.go:672                  mergeEgress.func1({cilium#660, *})
    api             l4.go:284                    PortRules.Iterate({*, 1, cilium#546}, *)
    policy          rule.go:624                  mergeEgress({cilium#680, *}, *, {*, 1, 1}, 0, {cilium#661, *}, {cilium#662, ...}, ...)
    policy          rule.go:753                  (*rule).resolveEgressPolicy(*, {cilium#680, *}, *, *, *, {0, 0, 0}, {0, ...})
    policy          rules.go:103                 ruleSlice.resolveL4EgressPolicy({*, *, *}, {cilium#680, *}, *)
    policy          repository.go:718            (*Repository).resolvePolicyLocked(cilium#1089, *)
    policy          distillery.go:119            (*PolicyCache).updateSelectorPolicy(cilium#1354, *)
    policy          distillery.go:153            (*PolicyCache).UpdatePolicy(...)
    endpoint        policy.go:262                (*Endpoint).regeneratePolicy(*)
    endpoint        bpf.go:744                   (*Endpoint).runPreCompilationSteps(*, *, *)
    endpoint        bpf.go:589                   (*Endpoint).regenerateBPF(*, *)
    endpoint        policy.go:457                (*Endpoint).regenerate(*, *)
    endpoint        events.go:53                 (*EndpointRegenerationEvent).Handle(*, *)
    eventqueue      eventqueue.go:245            (*EventQueue).run.func1()
    sync            once.go:74                   (*Once).doSlow(*, *)
    sync            once.go:65                   (*Once).Do(...)
    eventqueue      eventqueue.go:233            (*EventQueue).run(*)
1: select [75 minutes] [Created by eventqueue.(*EventQueue).Run in goroutine 1482 @ eventqueue.go:229]
    semaphore       semaphore.go:60              (*Weighted).Acquire(cilium#1092, {cilium#671, cilium#722}, cilium#766)
    lock            semaphored_mutex.go:30       (*SemaphoredMutex).Lock(...)
    ipcache         ipcache.go:140               (*IPCache).Lock(...)
    ipcache         cidr.go:56                   (*IPCache).AllocateCIDRs(cilium#1316, {0, 0, cilium#2037}, {0, 0, 0}, 0)
    ipcache         cidr.go:103                  (*IPCache).AllocateCIDRsForIPs(0, {0, cilium#697, 0}, 0xffffffffffffffff)
    cmd             identity.go:114              cachingIdentityAllocator.AllocateCIDRsForIPs(...)
    policy          selectorcache.go:509         (*SelectorCache).allocateIdentityMappings(cilium#1088, {{0, 0}, {cilium#5036, 0x2b}}, #45)
    policy          selectorcache.go:843         (*SelectorCache).AddFQDNSelector(cilium#1088, {cilium#643, cilium#27811}, {{0, 0}, {cilium#5036, 0x2b}})
    policy          l4.go:628                    (*L4Filter).cacheFQDNSelector(...)
    policy          l4.go:623                    (*L4Filter).cacheFQDNSelectors(cilium#27811, {cilium#42936, 0x287, cilium#193}, cilium#536)
    policy          l4.go:725                    createL4Filter({cilium#680, cilium#22826}, {cilium#22828, 1, 1}, 0, {cilium#660, cilium#24510}, {{cilium#4487, 3}, ...}, ...)
    policy          l4.go:879                    createL4EgressFilter(...)
    policy          rule.go:717                  mergeEgressPortProto({cilium#680, cilium#22826}, #44, {cilium#22828, 0xa, #78536}, #79633, {cilium#660, cilium#24510}, {{cilium#4487, ...}, ...}, ...)
    policy          rule.go:672                  mergeEgress.func1({cilium#660, cilium#24510})
    api             l4.go:284                    PortRules.Iterate({cilium#24510, 1, cilium#546}, cilium#11741)
    policy          rule.go:624                  mergeEgress({cilium#680, cilium#22826}, cilium#18687, {cilium#22828, 1, 1}, 0, {cilium#661, cilium#5624}, {cilium#662, ...}, ...)
    policy          rule.go:753                  (*rule).resolveEgressPolicy(cilium#24575, {cilium#680, cilium#22826}, cilium#18687, cilium#29345, cilium#4782, {0, 0, 0}, {0, ...})
    policy          rules.go:103                 ruleSlice.resolveL4EgressPolicy({cilium#10690, 0xb, 5}, {cilium#680, cilium#22826}, cilium#18687)
    policy          repository.go:718            (*Repository).resolvePolicyLocked(cilium#1089, cilium#18461)
    policy          distillery.go:119            (*PolicyCache).updateSelectorPolicy(cilium#1354, cilium#18461)
    policy          distillery.go:153            (*PolicyCache).UpdatePolicy(...)
    endpoint        policy.go:262                (*Endpoint).regeneratePolicy(cilium#1748)
    endpoint        bpf.go:744                   (*Endpoint).runPreCompilationSteps(cilium#1748, cilium#27542, cilium#4781)
    endpoint        bpf.go:589                   (*Endpoint).regenerateBPF(cilium#1748, cilium#27542)
    endpoint        policy.go:457                (*Endpoint).regenerate(cilium#1748, cilium#27542)
    endpoint        events.go:53                 (*EndpointRegenerationEvent).Handle(cilium#18609, cilium#703)
    eventqueue      eventqueue.go:245            (*EventQueue).run.func1()
    sync            once.go:74                   (*Once).doSlow(*Once(cilium#45414), func(cilium#69))
    sync            once.go:65                   (*Once).Do(...)
    eventqueue      eventqueue.go:233            (*EventQueue).run(cilium#6023)
```

Generated from pp tool: https://github.com/maruel/panicparse

Signed-off-by: Chris Tarazi <chris@isovalent.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants