Skip to content

#12093 Include client port in WSGI request environment#12096

Merged
adiroiban merged 3 commits intotwisted:trunkfrom
thinkst-daniel:12093-include-src-port-wsgi-response
Mar 20, 2024
Merged

#12093 Include client port in WSGI request environment#12096
adiroiban merged 3 commits intotwisted:trunkfrom
thinkst-daniel:12093-include-src-port-wsgi-response

Conversation

@thinkst-daniel
Copy link
Contributor

Scope and purpose

Fixes #12093

Contributor Checklist:

This process applies to all pull requests - no matter how small.
Have a look at our developer documentation before submitting your Pull Request.

Below is a non-exhaustive list (as a reminder):

  • The title of the PR should describe the changes and starts with the associated issue number, like “WSGI applications run inside Twisted Web don’t get access to the client port #12093 Include client port in WSGI response”.
  • A release notes news fragment file was create in src/twisted/newsfragments/ (see: Release notes fragments docs.)
  • The automated tests were updated.
  • Once all checks are green, request a review by leaving a comment that contains exactly the string please review.
    Our bot will trigger the review process, by applying the pending review label
    and requesting a review from the Twisted dev team.

@thinkst-daniel thinkst-daniel force-pushed the 12093-include-src-port-wsgi-response branch from 22f0d6c to 4876659 Compare January 28, 2024 20:43
@adiroiban
Copy link
Member

Thanks Daniel for the PR.

Is this ready for review. No hurry. I just wanted to check the status for this.

Cheers

@thinkst-daniel
Copy link
Contributor Author

Thanks for checking @adiroiban, yes it is.

Copy link
Member

@adiroiban adiroiban left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix.

In order to merge this PR, it needs updated tests

There is this exiting test for REMOTE_ADDR

def test_remoteAddr(self):
"""
The C{'REMOTE_ADDR'} key of the C{environ} C{dict} passed to the
application contains the address of the client making the request.
"""
d = self.render("GET", "1.1", [], [""])
d.addCallback(self.environKeyEqual("REMOTE_ADDR", "192.168.1.1"))

Can you use it as the foundation for writing a new test for the change from this PR?

Let me know if you need help with the test.

Regards

@thinkst-daniel thinkst-daniel force-pushed the 12093-include-src-port-wsgi-response branch from 4876659 to fcb2ebb Compare February 1, 2024 18:37
@thinkst-daniel
Copy link
Contributor Author

Thanks @adiroiban, I've added a new test for the remote port case (that matches the remote addr case). Let me know if the test is what you had in mind?

Copy link
Member

@adiroiban adiroiban left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the changes. Only minor comments.
This is close the being ready to merge.

application contains the port of the client making the request.
"""
d = self.render("GET", "1.1", [], [""])
d.addCallback(self.environKeyEqual("REMOTE_PORT", "12344"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I bit hard to read and understand how we endedup with 12344.

I prefer an explicit test arrangement like in the IPv6 test.

but I think it's ok. it's not worse than what we have in trunk.

this is a minor comment

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @adiroiban, I noticed that the DummyChannel class in requesthelper.py set the port to 12344 by default, but I think being more explicit as you've suggested is always better. I've updated it.

Should I update test_remoteAddr() to do the same as well?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I think it would help to also update test_remoteAddr.

Let me know once this PR is ready for re-review.

Thanks again for the update

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @adiroiban, I've also applied the same update to test_remoteAddr

self.environ = {
"REQUEST_METHOD": _wsgiString(request.method),
"REMOTE_ADDR": _wsgiString(request.getClientAddress().host),
"REMOTE_PORT": _wsgiString(str(request.getClientAddress().port)),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comment. have something like remotePeer = request.getClientAddress() so that we call it once.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @adiroiban, are you suggesting we add that in addition? i.e. still keep the existing REMOTE_PORT and REMOTE_ADDR and then also add:

"REMOTE_PEER": _wsgiString(request.getClientAddress()),

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My suggestion was something like this


peer = request.getClientAddress()
self.environ = {
    "REQUEST_METHOD": _wsgiString(request.method),
    "REMOTE_ADDR": _wsgiString(peer.host),
    "REMOTE_PORT": _wsgiString(str(peer.port)),
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense, thanks for clarifying! I've made the update.

@@ -0,0 +1 @@
Include client port in WSGI response No newline at end of file
Copy link
Member

@adiroiban adiroiban Feb 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a request or a response ?

Suggested change
Include client port in WSGI response
twisted.web.wsgi request environment now contains the peer port number.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo in "containes" and "remote peer" is redundant. 😉

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated :) thanks for the check.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are correct, thanks!

@exarkun
Copy link
Member

exarkun commented Feb 1, 2024

What spec does this change conform to? A link to the relevant section might be nice somewhere in here.

@adiroiban
Copy link
Member

As far as I can tell, this is not a standard environment variable https://github.com/joewalnes/websocketd/wiki/Environment-variables#remote_port

There is REMOTE_HOST in the original CGI RFC https://datatracker.ietf.org/doc/html/rfc3875#section-4.1.9

But I found this variable in the IIS implementatin https://learn.microsoft.com/en-us/iis/web-dev-reference/server-variables

The port is not available in Apache HTTPD https://httpd.apache.org/docs/current/mod/mod_cgi.html

But in modern days, I think that it helps to also have the port number ... at least for debug/audit/log

@glyph
Copy link
Member

glyph commented Feb 1, 2024

There is REMOTE_HOST in the original CGI RFC https://datatracker.ietf.org/doc/html/rfc3875#section-4.1.9

PEP 3333 says

The environ parameter is a dictionary object, containing CGI-style environment variables.

therefore it inherits the CGI spec'd version of this variable.

@thinkst-daniel thinkst-daniel force-pushed the 12093-include-src-port-wsgi-response branch from fcb2ebb to 4d566c5 Compare February 7, 2024 11:43
@exarkun
Copy link
Member

exarkun commented Feb 7, 2024

The environ parameter is a dictionary object, containing CGI-style environment variables.

therefore it inherits the CGI spec'd version of this variable.

Just checking - there is no CGI spec'd version of this variable, are we on the same page?

@adiroiban
Copy link
Member

Just checking - there is no CGI spec'd version of this variable, are we on the same page?

That is corect.
I can't find the specification for this variable.

The RFC is only for the REMOTE_HOST .

The REMOTE_PORT is not documented.

I can see the draft for the CGI started on 1998 and was finalized in 2004.
I guess that the source IP is not really that important for the majority of use cases.

Since we already expose the peer's host, I think it's safe to also expose the port.
And it can be important information to help with logging / audit / security checks.

yes... it will make the implementation non-portable.


The following variables are standard / part of the RFC for the client-side part of the socket/connection:

  • REMOTE_ADDR
  • REMOTE_HOST
  • REMOTE_IDENT
  • REMOTE_USER

For the server-side socket we have:

  • SERVER_NAME
  • SERVER_PORT
  • SERVER_PROTOCOL

@glyph
Copy link
Member

glyph commented Feb 7, 2024

Just checking - there is no CGI spec'd version of this variable, are we on the same page?

Sorry, I got confused between REMOTE_HOST and REMOTE_PEER.

@glyph glyph changed the title #12093 Include client port in WSGI response #12093 Include client port in WSGI request environment Feb 7, 2024
@glyph
Copy link
Member

glyph commented Feb 7, 2024

OK so here are some references I found for REMOTE_PORT.

This book on CGI: https://www.cgi101.com/book/ch3/text.html

"Abyss Web Server For Windows" https://aprelium.com/data/doc/2/abyssws-win-doc-html/cgivars.html

This experimentally-determined environment dump from an Apache 2.4 httpd: https://stackoverflow.com/a/25521262/13564

Which is documented officially as a "variable" you can use in an "expression" https://httpd.apache.org/docs/2.4/expr.html#vars even if this isn't technically CGI

IBM's web server: https://www.ibm.com/docs/en/i/7.2?topic=information-environment-variables

the US Naval Academy's published course on CGI: https://www.usna.edu/Users/cs/adina/teaching/it350/fall2020/lectures/set06-cgi.html

Cloudflare explicitly not supporting it: https://community.cloudflare.com/t/no-port-via-server-remote-port/37682/4

Apparently nginx's set of CGI or FastCGI variables are just specified in your own config, however you like? https://www.nginx.com/resources/wiki/start/topics/examples/phpfcgi/ but fastcgi_param REMOTE_PORT $remote_port; is frequently recommended.

the ASGI specification references it as a part of WSGI compatibility https://asgi.readthedocs.io/en/latest/specs/www.html#wsgi-compatibility

So this seems like while it is not technically standardized anywhere, it is ad-hoc copy/pasted around frequently, and everyone seems to assume that it's been inherited from CGI, because some large majority of CGI environments do in fact set this variable, although probably because of a config file that gets distributed with default installs of popular web servers or tutorials for the same.

@adiroiban
Copy link
Member

Does it mean that we can implement it as defacto variable, without having to add any warning extra when someone will use it ? Warning in the sense, that they are using a non-standard variable and that their application might break if they try to migrate to Cloudfare... for example.

I am not very familiar with CGI/WSGI. I know what they are expected to do, but I have never created a CGI application... other that the homeworks for CS classes (a very long time ago) :)

@glyph
Copy link
Member

glyph commented Feb 8, 2024

Does it mean that we can implement it as defacto variable, without having to add any warning extra when someone will use it ? Warning in the sense, that they are using a non-standard variable and that their application might break if they try to migrate to Cloudfare... for example.

Yeah, I think we can just document it. People seem to mostly expect it, and they can deal with Cloudflare when they use Cloudflare (although don't use Cloudflare, use Fastly…)

@thinkst-daniel thinkst-daniel force-pushed the 12093-include-src-port-wsgi-response branch from b04eed3 to 7db900d Compare February 14, 2024 15:29
@thinkst-daniel
Copy link
Contributor Author

thinkst-daniel commented Feb 14, 2024

Does it mean that we can implement it as defacto variable, without having to add any warning extra when someone will use it ? Warning in the sense, that they are using a non-standard variable and that their application might break if they try to migrate to Cloudfare... for example.

Yeah, I think we can just document it. People seem to mostly expect it, and they can deal with Cloudflare when they use Cloudflare (although don't use Cloudflare, use Fastly…)

Thanks for all extra info and references @glyph. As you've pointed out, it is possible get REMOTE_PORT by specifying it in an NGINX config too. So we would just like to do the same from the Twisted side.

REMOTE_PORT is helpful in cases where requests are coming from behind a PAT (Port Address Translation).

As for the documentation changes, is that something I can do during this PR?

@thinkst-daniel
Copy link
Contributor Author

Hey @adiroiban, I just want to double-check if there is any other changes or updates I should do here? Thanks!

Copy link
Member

@adiroiban adiroiban left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look good.

The missing part requested by Glyph

Yeah, I think we can just document it.

As this is a non-standard variable, I think it would help to have it documented in the docstring.

Maybe here ?

class WSGIResource:
"""
An L{IResource} implementation which delegates responsibility for all
resources hierarchically inferior to it to a WSGI application.
@ivar _reactor: An L{IReactorThreads} provider which will be passed on to
L{_WSGIResponse} to schedule calls in the I/O thread.
@ivar _threadpool: A L{ThreadPool} which will be passed on to
L{_WSGIResponse} to run the WSGI application object.
@ivar _application: The WSGI application object.
"""

Other than that, I think that we can merge this.

Thanks!

@@ -0,0 +1 @@
twisted.web.wsgi request environment now contains the peer port number. No newline at end of file
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since there is no RFC for REMOTE_PORT , I think it's important to specify the name

Suggested change
twisted.web.wsgi request environment now contains the peer port number.
twisted.web.wsgi request environment now contains the peer port number as `REMOTE_PORT`.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@thinkst-daniel thinkst-daniel force-pushed the 12093-include-src-port-wsgi-response branch from b9f1f06 to cdb2cc8 Compare March 15, 2024 11:51
@thinkst-daniel
Copy link
Contributor Author

Changes look good.

The missing part requested by Glyph

Yeah, I think we can just document it.

As this is a non-standard variable, I think it would help to have it documented in the docstring.

Maybe here ?

class WSGIResource:
"""
An L{IResource} implementation which delegates responsibility for all
resources hierarchically inferior to it to a WSGI application.
@ivar _reactor: An L{IReactorThreads} provider which will be passed on to
L{_WSGIResponse} to schedule calls in the I/O thread.
@ivar _threadpool: A L{ThreadPool} which will be passed on to
L{_WSGIResponse} to run the WSGI application object.
@ivar _application: The WSGI application object.
"""

Other than that, I think that we can merge this.

Thanks!

Thanks @adiroiban let me know if that doc was what you had in mind.

Comment on lines +540 to +542
Note that the WSGI response environ includes a non-standard variable
REMOTE_PORT.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a bit hard for me to review this as I have never worked with WSGI app

I am reading some docs from https://docs.twisted.org/en/stable/web/howto/web-in-60/wsgi.html

Based on that, maybe we can have it like this.

I think that Note that is kind of redendand for API docs.
All the API doc content is a note that thing :)

Suggested change
Note that the WSGI response environ includes a non-standard variable
REMOTE_PORT.
The C{environ} argument passed to the application, includes the C{REMOTE_PORT} key to complement the C{REMOTE_ADDR} key.

@adiroiban
Copy link
Member

Thanks for the update. I left a comment, but I think that also your version is ok.

Please review and let me know what you think.

We can merge this, even with your version.

I think that we all agree that the variable should be available to the applications.

If documentation is not right, this can be updated in a follow up PR.

Thanks again and sorry for the delay.


A final rant :)

WSGI is not my jam and I try to avoind using threads.
I am not sure why we have thread based web app development in Twisted.
I am using twisted so that I don't have to use threads.

@chevah-robot chevah-robot requested a review from a team March 16, 2024 19:06
@thinkst-daniel thinkst-daniel force-pushed the 12093-include-src-port-wsgi-response branch from 9538ce2 to bd615d6 Compare March 20, 2024 07:11
@thinkst-daniel
Copy link
Contributor Author

Thanks for the review, @adiroiban. I updated the documentation in any case as suggested, with your wording :) Thank you!

@adiroiban adiroiban merged commit c709d3f into twisted:trunk Mar 20, 2024
@adiroiban
Copy link
Member

Thanks for your help with this patch. Happy to see this merged.

@thinkst-daniel. Are you working for Thinkst? we are trying to (re)start a sponsorship program for Twisted.
The financial host for the Twisted project was recently switched from SFC (which was a disaster) to Python Foundation.
I know that the Thinkst company used to be a sponsor of Twisted.
After the switch , it looks like the sponsorship contact from Thinks was lost.
Can you check with the internal team at Thinks to see if you are still a sponsor for Twisted?

Thanks

@thinkst-daniel
Copy link
Contributor Author

Thanks for your help with this patch. Happy to see this merged.

@thinkst-daniel. Are you working for Thinkst? we are trying to (re)start a sponsorship program for Twisted. The financial host for the Twisted project was recently switched from SFC (which was a disaster) to Python Foundation. I know that the Thinkst company used to be a sponsor of Twisted. After the switch , it looks like the sponsorship contact from Thinks was lost. Can you check with the internal team at Thinks to see if you are still a sponsor for Twisted?

Thanks

Thanks for the help @adiroiban!

Yes, we’re still keen. Who should we reach out to?

@adiroiban
Copy link
Member

Hi

The general Python Software Foundation donation email is psf-donations@python.org our contact at PSF is Phyllis Dobbs.

You can also CC me adiroiban@gmail.com and I can followup up.
I hope PSF will respond and they will be able to help with the sponsorhip.

Thanks again!

@thinkst-daniel
Copy link
Contributor Author

Thanks @adiroiban, we'll reach out!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

WSGI applications run inside Twisted Web don’t get access to the client port

5 participants