Skip to content
This repository was archived by the owner on Apr 26, 2024. It is now read-only.

Ability to blacklist ip ranges for federation traffic#5043

Merged
richvdh merged 30 commits intodevelopfrom
anoa/blacklist_ip_ranges
May 13, 2019
Merged

Ability to blacklist ip ranges for federation traffic#5043
richvdh merged 30 commits intodevelopfrom
anoa/blacklist_ip_ranges

Conversation

@anoadragon453
Copy link
Member

@anoadragon453 anoadragon453 commented Apr 10, 2019

Closes #3953

This PR adds a new config option that functions similar to url_preview_ip_range_blacklist, but provides a blacklist for federation traffic. It does this by making use of our IPBlacklistResolver and BlacklistingAgentWrapper classes.

Sytest PR: matrix-org/sytest#607

@codecov
Copy link

codecov bot commented Apr 10, 2019

Codecov Report

Merging #5043 into develop will increase coverage by 0.01%.
The diff coverage is 86.36%.

@@             Coverage Diff             @@
##           develop    #5043      +/-   ##
===========================================
+ Coverage    61.75%   61.76%   +0.01%     
===========================================
  Files          336      336              
  Lines        34639    34658      +19     
  Branches      5692     5694       +2     
===========================================
+ Hits         21390    21407      +17     
- Misses       11721    11723       +2     
  Partials      1528     1528

@anoadragon453 anoadragon453 marked this pull request as ready for review April 10, 2019 11:31
@anoadragon453 anoadragon453 requested a review from a team April 10, 2019 11:32
Copy link
Member

@richvdh richvdh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

generally looks sane. A few suggestions

# - syd.example.com

# Prevent federation requests from being sent to the following
# blacklist IP address CIDR ranges.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please could you explain what the default is (even if it's that nothing is blacklisted)?

# Prevent federation requests from being sent to the following
# blacklist IP address CIDR ranges.
#
#federation_ip_range_blacklist:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you think we should uncomment this so that we blacklist the RFC1918 addresses by default for new deployments?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seeing as it would be a security benefit for the vast majority of use cases, I would argue yes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so would I. Please can you make it so?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done so, but requires having sytest override it so that tests don't fail.

"Dropped %s from DNS resolution to %s" % (ip_address, hostname)
)
raise SynapseError(403, "IP address blocked by IP blacklist entry")
# Only raise a 403 if this request originated from a
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need to do this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm struggling to remember now. I remember I worked with @erikjohnston on this.

Something will passing tests perhaps?

Copy link
Member Author

@anoadragon453 anoadragon453 May 2, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

T'was actually due to special-casing for unit tests. We've now removed this, as the existing implementation didn't make sense anyways.

self.federation_domain_whitelist[domain] = True

self.federation_ip_range_blacklist = config.get(
"federation_ip_range_blacklist", None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we make the default [] rather than None we can save ourselves a bunch of special-casing?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose so since this is a blacklist and an empty one is the same as not having one at all.

self.assertNoResult(test_d)

# Make sure treq is trying to connect
# Make sure the req is trying to connect
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

treq is correct

def test_client_ip_range_blacklist(self):
"""Ensure that Synapse does not try to connect to blacklisted IPs"""
# Set up the ip_range blacklist
from netaddr import IPSet
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

import this at the top?

anoadragon453 and others added 5 commits April 30, 2019 15:01
* develop: (34 commits)
  Add a default .m.rule.tombstone push rule (#4867)
  Fix infinite loop in presence handler
  changelog
  more logging improvements
  remove extraneous exception logging
  Clarify logging when PDU signature checking fails
  Changelog
  Add --no-pep-517 to README instructions
  set PIP_USE_PEP517 = False for tests
  Fix handling of SYNAPSE_NO_TLS in docker image (#5005)
  Config option for verifying federation certificates (MSC 1711) (#4967)
  Remove log error for .well-known/matrix/client (#4972)
  Prevent "producer not unregistered" message (#5009)
  add gpg key fingerprint
  Don't crash on lack of expiry templates
  Update debian install docs for new key and repo (#5074)
  Add management endpoints for account validity
  Send out emails with links to extend an account's validity period
  Make sure we're not registering the same 3pid twice
  Newsfile
  ...
d = do_request()
self.pump()

# Nothing has happened yet
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please remember to test that these deferreds eventually get resolved one way or the other, otherwise we'll end up with requests blocking forever.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still not 100% sure on how to do this. yield'ing results in the function timing out (though maybe that's what we want to test?).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yield'ing results in the function timing out

yes. The thing to note here is that the test function itself is running synchronously: it starts at the beginning, and doesn't return until it is finished. Note that is different to a Deferred-returning function, which only does part of its work synchronously, and leaves the rest to be done later (at which point, the Deferred will resolve).

Now, the way that we test the asynchronous code from a synchronous function is something like:

  1. run a function that returns a deferred
  2. satisfy the condition that the function is waiting for (eg: pretend that some time has passed; connect a TCP socket; spoof some results from the database).
  • Once we satisfy the awaited condition, the second half of the function should complete, and hence resolve its deferred.
  1. Check that the deferred has now resolved with the expected result.

Often, all that is needed is to "pretend some time has passed". We do this by using a fake Reactor where we can manipulate its idea of the current time. "Pretending time has passed" is then called "advancing the reactor", and is what self.pump does.

[It may be helpful to note that this means that there are actually two reactors in play, since the test runner (trial) is a Twisted application which runs on a regular reactor: so we end up with the main reactor which is running trial and the test reactor which is running the code under test. But in practice, as long as your test functions are synchronous, you can forget about the trial reactor.]

So, long story short: how do we test that the deferreds eventually get resolved? well, somewhere in the test we ought to be able to call self.successResultOf(d) or self.failureResultOf(d, cls) [iirc]. If that assertion fails, it means that the deferred isn't completing, and that is a bad thing!

@anoadragon453 anoadragon453 requested a review from richvdh May 4, 2019 01:01
richvdh
richvdh previously requested changes May 8, 2019
Copy link
Member

@richvdh richvdh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is taking shape!

One thing I would say is that I think this PR is starting to sprawl a bit, and I would suggest splitting out the changes to the existing blacklisting code so that we can land them independently of the federation blacklist.

):
logger.info(
"Blocking access to %s because of blacklist" % (ip_address,)
"Blocking access to %s because of blacklist. Returning 0 results" %
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it true to say we are returning 0 results here? It certainly doesn't match my way of thinking about it.

(ip_address,)
)
e = SynapseError(403, "IP address blocked by IP blacklist entry")
e = SynapseError(404, "No results found")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm guessing this is changing to keep it in sync with https://github.com/matrix-org/synapse/pull/5043/files#diff-3c2efbd259756c1c1dc5dbdf467b5570R341, for which see my comments there, however:

On the one hand I see the value of keeping the behaviour consistent between literal IPs, and hostnames. On the other, since we have the ability to give a better response like a 403 here, I'm inclined to do so.

logger.warn("Error downloading %s: %r", url, e)

if isinstance(e, DNSLookupError):
# DNS lookup returned no results
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there are other reasons that we could get a DNSLookupError

# FIXME: pass through 404s and other error messages nicely
logger.warn("Error downloading %s: %r", url, e)

if isinstance(e, DNSLookupError):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Think this would be better handled with a separate except DNSLookupError as e clause.

# Note: This will also be the case if the found IP address
# is blacklisted
raise SynapseError(
404, "No results found", Codes.UNKNOWN
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not obvious to me that this is the correct thing to return. "No results found" is a pretty useless error, and why a 404?

https://matrix.org/docs/spec/client_server/unstable.html#get-matrix-media-r0-preview-url is annoying unhelpful about how to handle this situation, which would be a nice thing to rectify, but I think a 502 is the normal way for a proxy-like thing (which we are) to say "I couldn't connect to the the thing I'm supposed to be proxying".

tls_client_options_factory,
)

# Prevent direct connections to blacklisted IP addresses
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't find "direct connections" very clear here. How about:

"Use a BlacklistingAgentWrapper to prevent circumventing the IP blacklist via IP literals in server names"

d = do_request()
self.pump()

# Nothing has happened yet
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yield'ing results in the function timing out

yes. The thing to note here is that the test function itself is running synchronously: it starts at the beginning, and doesn't return until it is finished. Note that is different to a Deferred-returning function, which only does part of its work synchronously, and leaves the rest to be done later (at which point, the Deferred will resolve).

Now, the way that we test the asynchronous code from a synchronous function is something like:

  1. run a function that returns a deferred
  2. satisfy the condition that the function is waiting for (eg: pretend that some time has passed; connect a TCP socket; spoof some results from the database).
  • Once we satisfy the awaited condition, the second half of the function should complete, and hence resolve its deferred.
  1. Check that the deferred has now resolved with the expected result.

Often, all that is needed is to "pretend some time has passed". We do this by using a fake Reactor where we can manipulate its idea of the current time. "Pretending time has passed" is then called "advancing the reactor", and is what self.pump does.

[It may be helpful to note that this means that there are actually two reactors in play, since the test runner (trial) is a Twisted application which runs on a regular reactor: so we end up with the main reactor which is running trial and the test reactor which is running the code under test. But in practice, as long as your test functions are synchronous, you can forget about the trial reactor.]

So, long story short: how do we test that the deferreds eventually get resolved? well, somewhere in the test we ought to be able to call self.successResultOf(d) or self.failureResultOf(d, cls) [iirc]. If that assertion fails, it means that the deferred isn't completing, and that is a bad thing!

richvdh
richvdh previously requested changes May 9, 2019
for domain in federation_domain_whitelist:
self.federation_domain_whitelist[domain] = True

self.federation_ip_range_blacklist = config.get(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we ought to follow the example of #5134 and include 0.0.0.0 and ::, whether they were explicitly listed or not.

def resolveHostName(self, recv, hostname, portNumber=0):

r = recv()
d = defer.Deferred()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you merge #5155 so that we can get this stuff out of the diff?

# Nothing happened yet
self.assertNoResult(d)

self.pump(120)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hrm, having to wait 120 seconds for this doesn't sound right. Why does it take so long?

Copy link
Member Author

@anoadragon453 anoadragon453 May 10, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, it doesn't. I've just selected that value arbitrarily. Have decreased.

* develop: (45 commits)
  URL preview blacklisting fixes (#5155)
  Revert 085ae34
  Add a DUMMY stage to captcha-only registration flow
  Make Prometheus snippet less confusing on the metrics collection doc (#4288)
  Set syslog identifiers in systemd units (#5023)
  Run Black on the tests again (#5170)
  Add AllowEncodedSlashes to apache (#5068)
  remove instructions for jessie installation (#5164)
  Run `black` on per_destination_queue
  Limit the number of EDUs in transactions to 100 as expected by receiver (#5138)
  Fix bogus imports in tests (#5154)
  add options to require an access_token to GET /profile and /publicRooms on CS API (#5083)
  Do checks on aliases for incoming m.room.aliases events (#5128)
  Remove the requirement to authenticate for /admin/server_version. (#5122)
  Fix spelling in server notices admin API docs (#5142)
  Fix sample config
  0.99.3.2
  include disco in deb build target list
  changelog
  Debian: we now need libpq-dev.
  ...
@anoadragon453 anoadragon453 requested a review from richvdh May 10, 2019 18:30
Copy link
Member

@richvdh richvdh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this could really have done with being outside the try/catch, but nm

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants