Which Cloudflare product(s) does this pertain to?
Wrangler
What versions are you using?
3.86.0 [Wrangler]
What operating system and version are you using?
Mac Sonoma 14.7.1
Please provide a link to a minimal reproduction
No response
Describe the Bug
We were trying to update wrangler for quite a while and here are some very weird problems we've experienced.
- first of all - everything works (almost) fine on local and in the Cloudflare. The only problem is Playwright running tests in GitHubActions (via
mcr.microsoft.com/playwright:v1.48.1-focal)
- locally problem can be reproduced by changing worker code - eventually after another reload it will stop processing requests, but will accept connection (so socket will hang). Any other change will restore the functionality
- everything works fine on
3.78.10
- any version bump will require
--x-dev-env=false, or it would not work
- as this flag was removed at
3.86.1 any newer version dont work.
- unless you use
--log-level=debug which somehow resolve the problem (tests are passing 9 out of 10)
We run 7 workers in parallel, with the last one - "gateway" starting after all. Here are logs from the area of interest
v3.78.10
[WebServer] [1] [GATEWAY] ⛅️ wrangler 3.78.10 (update available 3.95.0)
[WebServer] [1] [GATEWAY] --------------------------------------------------------
[WebServer] [1] [GATEWAY] Using vars defined in .dev.vars
[WebServer] [1] [GATEWAY] Your worker has access to the following bindings:
[WebServer] [1] [GATEWAY] - Services:
[WebServer] [1] [GATEWAY] - IMPRESSIONS: thanks-impressions
[WebServer] [1] [GATEWAY] - IMPRESSIONS-V2: thanks-impressions-v2
[WebServer] [1] [GATEWAY] - PROMOTIONS: thanks-promotion-service
[WebServer] [1] [GATEWAY] - SUBSCRIPTIONS: thanks-subscription-service
[WebServer] [1] [GATEWAY] - NODE_SERVICES: thanks-node-services
[WebServer] [1] [GATEWAY] - TEMPLATE: thanks-template-service
[WebServer] [1] [GATEWAY] [wrangler:inf] Ready on http://localhost:8081
[WebServer] [1] [GATEWAY] ⎔ Starting local server...
[WebServer] [1] [GATEWAY] @service:gateway@ : [get]/ : dev,US,IAD
[WebServer] [1] [GATEWAY] [wrangler:inf] GET / 400 Bad Request (598ms)
/// ⬆️ Playwright probing URL and ⬇️ running tests
Running 18 tests using 1 worker
[WebServer] [1] [GATEWAY] [wrangler:inf] GET /v1/widget.js 200 OK (28ms)
/// ⬆️ working
v3.86.0 and --x-dev-env=false
[WebServer] [1] [GATEWAY] ⛅️ wrangler 3.86.0 (update available 3.99.0)
[WebServer] [1] [GATEWAY] -------------------------------------------------------
[WebServer] [1] [GATEWAY] Using vars defined in .dev.vars
[WebServer] [1] [GATEWAY] Your worker has access to the following bindings:
[WebServer] [1] [GATEWAY] - Services:
[WebServer] [1] [GATEWAY] - IMPRESSIONS: thanks-impressions [not connected]
[WebServer] [1] [GATEWAY] - IMPRESSIONS-V2: thanks-impressions-v2 [not connected]
[WebServer] [1] [GATEWAY] - PROMOTIONS: thanks-promotion-service [not connected]
[WebServer] [1] [GATEWAY] - SUBSCRIPTIONS: thanks-subscription-service [not connected]
[WebServer] [1] [GATEWAY] - NODE_SERVICES: thanks-node-services [not connected]
[WebServer] [1] [GATEWAY] - TEMPLATE: thanks-template-service [not connected]
[WebServer] [1] [GATEWAY]
[WebServer] [1] [GATEWAY] Service bindings & durable object bindings connect to other `wrangler dev` processes running locally, with their connection status indicated by [connected] or [not connected]. For more details, refer to https://developers.cloudflare.com/workers/runtime-apis/bindings/service-bindings/#local-development
[WebServer] [1] [GATEWAY] [wrangler:inf] Ready on http://localhost:8081
[WebServer] [1] [WebServer] [GATEWAY] ⎔ Starting local server...
[WebServer] [1] [GATEWAY] [wrangler:inf] GET / 400 Bad Request (497ms)
/// ⬆️ Playwright probing URL and ⬇️ running tests
Running 27 tests using 1 worker
[WebServer] [1] [GATEWAY] ⎔ Reloading local server...
[WebServer] [1] [GATEWAY] [wrangler:inf] GET /v1/widget.js 200 OK (24ms)
/// ⬆️ working
3.86.1 (without -x-dev-env)
[WebServer] [1] [GATEWAY] ⛅️ wrangler 3.86.1 (update available 3.99.0)
[WebServer] [1] [GATEWAY] -------------------------------------------------------
[WebServer] [1] [GATEWAY] Your worker has access to the following bindings:
[WebServer] [1] [GATEWAY] - Services:
[WebServer] [1] [GATEWAY] - IMPRESSIONS: thanks-impressions [not connected]
[WebServer] [1] [GATEWAY] - IMPRESSIONS-V2: thanks-impressions-v2 [not connected]
[WebServer] [1] [GATEWAY] - PROMOTIONS: thanks-promotion-service [not connected]
[WebServer] [1] [GATEWAY] - SUBSCRIPTIONS: thanks-subscription-service [not connected]
[WebServer] [1] [GATEWAY] - NODE_SERVICES: thanks-node-services [not connected]
[WebServer] [1] [GATEWAY] - TEMPLATE: thanks-template-service [not connected]
[WebServer] [1] [GATEWAY]
[WebServer] [1] [GATEWAY] Service bindings & durable object bindings connect to other `wrangler dev` processes running locally, with their connection status indicated by [connected] or [not connected]. For more details, refer to https://developers.cloudflare.com/workers/runtime-apis/bindings/service-bindings/#local-development
[WebServer] [1] [GATEWAY]
[WebServer] [1] [GATEWAY] [wrangler:inf] Ready on http://localhost:8081
[WebServer] [1] [GATEWAY] Using vars defined in .dev.vars
[WebServer] [1] [GATEWAY] - Services:
[WebServer] [1] [GATEWAY] - IMPRESSIONS: thanks-impressions [connected]
[WebServer] [1] [GATEWAY] - IMPRESSIONS-V2: thanks-impressions-v2 [connected]
[WebServer] [1] [GATEWAY] - PROMOTIONS: thanks-promotion-service [connected]
[WebServer] [1] [GATEWAY] - SUBSCRIPTIONS: thanks-subscription-service [connected]
[WebServer] [1] [GATEWAY] - NODE_SERVICES: thanks-node-services [connected]
[WebServer] [1] [GATEWAY] - TEMPLATE: thanks-template-service [connected]
[WebServer] [1] [GATEWAY] ⎔ Starting local server...
[WebServer] [1] [GATEWAY] [wrangler:inf] GET / 400 Bad Request (502ms)
/// ⬆️ Playwright probing URL and ⬇️ running tests
Running 27 tests using 1 worker
/// 😵 hangs
It does not matter which URL to try. It will handle the first request, but none other.
"hangs" mean that Playwright can establish a connection to get a resource (widget.js) but will never get anything.
v3.99.0
v3.99.0 and --log-level=debug
output is too verbose, but I found an interesting difference between adding debug for everybody or just gateway
[WebServer] [1] [GATEWAY] [InspectorProxyWorker] RUNTIME INCOMING MESSAGE { id: 100000003, method: 'Network.enable', result: {} }
[WebServer] [1] [SERVICES] [@theway/template-service] [InspectorProxyWorker] SEND TO RUNTIME {"method":"Runtime.getIsolateId","id":100000011}
[WebServer] [1] [SERVICES] [@theway/template-service] [InspectorProxyWorker] RUNTIME INCOMING MESSAGE { id: 100000011, result: { id: '7296e37d6767447e' } }
[WebServer] [1] [SERVICES] [@theway/impressions-service-v2] [InspectorProxyWorker] SEND TO RUNTIME {"method":"Runtime.getIsolateId","id":100000011}
[WebServer] [1] [SERVICES] [@theway/impressions-service-v2] [InspectorProxyWorker] RUNTIME INCOMING MESSAGE { id: 100000011, result: { id: '66815b1649719b4f' } }
/// ⬆️ Playwright probing URL and ⬇️ running tests
Running 27 tests using 1 worker
[WebServer] [1] [SERVICES] [@theway/node-service] [InspectorProxyWorker] SEND TO RUNTIME {"method":"Runtime.getIsolateId","id":100000011}
[WebServer] [1] [SERVICES] [@theway/node-service] [InspectorProxyWorker] RUNTIME INCOMING MESSAGE { id: 100000011, result: { id: '85b82413505896e2' } }
[WebServer] [1] [GATEWAY] @service:gateway@ : [get]/v1/widget.js : dev,US,IAD
⬆️⬆️ its working! ⬆️⬆️
[WebServer] [1] [GATEWAY] [InspectorProxyWorker] RUNTIME INCOMING MESSAGE { id: 100000003, method: 'Network.enable', result: {} }
/// ⬆️ Playwright probing URL and ⬇️ running tests
Running 27 tests using 1 worker
[WebServer] [1] [GATEWAY] [wrangler-ProxyWorker:inf] GET /cdn-cgi/ProxyWorker/pause 204 No Content (1ms)
⬆️ What is that? ⬆️
[WebServer] [1] [GATEWAY] [InspectorProxyWorker] SEND TO RUNTIME {"method":"Runtime.getIsolateId","id":100000004}
[WebServer] [1] [GATEWAY] [InspectorProxyWorker] RUNTIME INCOMING MESSAGE { id: 100000004, result: { id: 'd413491f66c9cc25' } }
⬆️ "no reaction" ⬆️
[WebServer] [1] [GATEWAY] [InspectorProxyWorker] SEND TO RUNTIME {"method":"Runtime.getIsolateId","id":100000005}
⬆️ that is already another request ⬆️, there would be "no reaction" as well
Adding logs makes test just a little slower, so we are ok to processed with this solution, but I dont really understand what can be a problem here
My main consern here is a call to /cdn-cgi/ProxyWorker/pause
In a working scenario the last message related to ProxyWorker is play
[WebServer] [1] [GATEWAY] [wrangler-ProxyWorker:inf] GET /cdn-cgi/ProxyWorker/play 204 No Content (14ms)
We had a long history with wrangler flakes in CI, so it runs quite "reliable"
- first run a "dummy" worker, waits for healthcheck
- later starts all leaf services, waits for healthcheck
- then starts the gateway, waits for healthcheck
Every service will eventually produce the /cdn-cgi/ProxyWorker/pause message, and it's somehow a matter of luck of the last message is /cdn-cgi/ProxyWorker/play
In the "good scenario" this is what I can see in the logs right after gateway start
[WebServer] [1] [GATEWAY] [wrangler-ProxyWorker:inf] Ready on http://localhost:8081
[WebServer] [1] [GATEWAY] [wrangler-ProxyWorker:inf] GET /cdn-cgi/ProxyWorker/pause 204 No Content (6ms)
[WebServer] [1] [GATEWAY] [wrangler-ProxyWorker:inf] GET /cdn-cgi/ProxyWorker/pause 204 No Content (1ms)
[WebServer] [1] [WebServer] [GATEWAY] [wrangler-ProxyWorker:inf] GET /cdn-cgi/ProxyWorker/pause 204 No Content (1ms)
[WebServer] [1] [GATEWAY] [wrangler-ProxyWorker:inf] GET /cdn-cgi/ProxyWorker/pause 204 No Content (2ms)
[WebServer] [1] [SERVICES] [@theway/promotion-service] [InspectorProxyWorker] SEND TO RUNTIME {"method":"Runtime.getIsolateId","id":100000011}
[WebServer] [1] [SERVICES] [@theway/promotion-service] [InspectorProxyWorker] RUNTIME INCOMING MESSAGE { id: 100000011, result: { id: '1107c602f204561f' } }
[WebServer] [1] [GATEWAY] RemoteRuntimeController teardown beginning...
[WebServer] [1] [GATEWAY] RemoteRuntimeController teardown complete
[WebServer] [1] [GATEWAY] ⎔ Starting local server...
[WebServer] [1] [GATEWAY] [wrangler-ProxyWorker:inf] GET /cdn-cgi/ProxyWorker/pause 204 No Content (1ms)
[WebServer] [1] [GATEWAY] [InspectorProxyWorker] handleProxyControllerIncomingMessage {"type":"reloadStart"}
[WebServer] [1] [GATEWAY] ✘ [ERROR] workerd/server/server.c++:4017: info: Inspector is listening
[WebServer] [1] [GATEWAY]
[WebServer] [1] [GATEWAY]
[WebServer] [1] [GATEWAY] [wrangler-ProxyWorker:inf] GET /cdn-cgi/ProxyWorker/pause 204 No Content (0ms)
[WebServer] [1] [GATEWAY] [wrangler-ProxyWorker:inf] GET /cdn-cgi/ProxyWorker/play 204 No Content (14ms)
A log of pause requests, but the very last one is play.
Please provide any relevant error logs
No response
Which Cloudflare product(s) does this pertain to?
Wrangler
What versions are you using?
3.86.0 [Wrangler]
What operating system and version are you using?
Mac Sonoma 14.7.1
Please provide a link to a minimal reproduction
No response
Describe the Bug
We were trying to update
wranglerfor quite a while and here are some very weird problems we've experienced.mcr.microsoft.com/playwright:v1.48.1-focal)3.78.10--x-dev-env=false, or it would not work3.86.1any newer version dont work.--log-level=debugwhich somehow resolve the problem (tests are passing 9 out of 10)We run 7 workers in parallel, with the last one - "gateway" starting after all. Here are logs from the area of interest
v3.78.10
v3.86.0 and
--x-dev-env=false3.86.1 (without
-x-dev-env)It does not matter which URL to try. It will handle the first request, but none other.
"hangs" mean that Playwright can establish a connection to get a resource (
widget.js) but will never get anything.v3.99.0
v3.99.0 and
--log-level=debugoutput is too verbose, but I found an interesting difference between adding
debugfor everybody or just gatewayAdding logs makes test just a little slower, so we are ok to processed with this solution, but I dont really understand what can be a problem here
My main consern here is a call to
/cdn-cgi/ProxyWorker/pauseIn a working scenario the last message related to
ProxyWorkerisplayWe had a long history with wrangler flakes in CI, so it runs quite "reliable"
Every service will eventually produce the
/cdn-cgi/ProxyWorker/pausemessage, and it's somehow a matter of luck of the last message is/cdn-cgi/ProxyWorker/playIn the "good scenario" this is what I can see in the logs right after gateway start
A log of
pauserequests, but the very last one isplay.Please provide any relevant error logs
No response