API for monthly_active_users table#3633
Conversation
erikjohnston
left a comment
There was a problem hiding this comment.
Some nits, but I think this needs to be hooked up so that we write to it, otherwise its hard to see whether this is the right way of doing things.
But broadly looks good so far!
| * limitations under the License. | ||
| */ | ||
|
|
||
| -- a table of users who have requested that their details be erased |
| sql = """ | ||
| SELECT COALESCE(count(*), 0) FROM ( | ||
| SELECT user_id FROM monthly_active_users | ||
| ) u |
There was a problem hiding this comment.
Can't this just be SELECT COALESCE(count(*), 0) FROM monthly_active_users?
| ) | ||
|
|
||
| def clean_out_monthly_active_users(self): | ||
| pass |
|
|
||
| defer.returnValue( | ||
| result | ||
| ) |
There was a problem hiding this comment.
Who's jeff?
I'd probably write it/reformat to simply be:
defer.returnValue(bool(user_present))| "timestamp": int(self._clock.time_msec()), | ||
| }, | ||
| lock=False, | ||
| ) |
There was a problem hiding this comment.
Upserts without locks are dangerous as they're racey. Upserts with locks are bad and should be avoided.
I'd quite like to see how this is actually hooked up
There was a problem hiding this comment.
Have reworked nits, and left upserts until we figure out a better way to solve this
This reverts commit ec716a3.
…o neilj/mau_tracker
synapse/storage/client_ips.py
Outdated
| @defer.inlineCallbacks | ||
| def _populate_monthly_active_users(self, user_id): | ||
| store = self.hs.get_datastore() | ||
| print "entering _populate_monthly_active_users" |
There was a problem hiding this comment.
No prints. Please use logger if you want to have that
synapse/storage/client_ips.py
Outdated
| else: | ||
| count = yield store.get_monthly_active_count() | ||
| print "count is %d" % count | ||
| if count < self.hs.config.max_mau_value: |
There was a problem hiding this comment.
You probably want to return somehow if this condition is false.
|
monthly_active_users now fully backing log in and register tests. Especially keen to get comment on adding @defer.inlineCallbacks to insert_client_ip |
synapse/storage/client_ips.py
Outdated
| user_id(str): the user_id to query | ||
| """ | ||
|
|
||
| store = self.hs.get_datastore() |
There was a problem hiding this comment.
store is the same as self here
| Cleans out monthly active user table to ensure that no stale | ||
| entries exist. | ||
| Return: | ||
| Defered() |
There was a problem hiding this comment.
Deferred with two r's. Also, Returns with an s, and a newline before it. (This is so that IDEs can read it to get type hints)
| sql = """ | ||
| DELETE FROM monthly_active_users | ||
| ORDER BY timestamp desc | ||
| LIMIT -1 OFFSET ? |
There was a problem hiding this comment.
You don't need to include a LIMIT
There was a problem hiding this comment.
I don't need it for Postgres, but best I can tell I do need it for sqlite https://sqlite.org/lang_select.html
There was a problem hiding this comment.
Ugh, that's true. But it doesn't look like postgres supports negative limits. You can instead probably structure the query as:
DELETE FROM monthly_active_users
WHERE user_id NOT IN (
SELECT user_id FROM monthly_active_users
ORDER BY timestamp DESC
LIMIT ?
)There was a problem hiding this comment.
groans, I guess the table never gets too large
| # Have resolved to invalidate the whole cache for now and do | ||
| # something about it if and when the perf becomes significant | ||
| self.is_user_monthly_active.invalidate_all() | ||
| self.get_monthly_active_count.invalidate_all() |
There was a problem hiding this comment.
These invalidations should only happen after the transaction has committed, which can be done by yielding on the self.runInteraction
synapse/api/auth.py
Outdated
| such as monthly active user limiting or global disable flag | ||
| Args: | ||
| error (Error): The error that should be raised if user is to be | ||
| blocked |
There was a problem hiding this comment.
This seems odd. Either this should raise an AuthError and callers can catch AuthErrors and re-raise as a more appropriate exception, or it should just return a boolean.
| Generates current count of monthly active users.abs | ||
| Return: | ||
| Defered(int): Number of current monthly active users | ||
| """ |
There was a problem hiding this comment.
I know its a bit nitty, but consistency is quite important for readability:
"""Generates current count of monthly active users.abs
Returns:
Defered[int]: Number of current monthly active users
""""(Indentation, returns and square brackets for the type Deferred returns)
synapse/storage/client_ips.py
Outdated
| self._batch_row_update[key] = (user_agent, device_id, now) | ||
|
|
||
| @defer.inlineCallbacks | ||
| def _populate_monthly_active_users(self, user_id): |
There was a problem hiding this comment.
This feels like it should be in monthly_active_users.py
synapse/storage/client_ips.py
Outdated
| if self.hs.config.limit_usage_by_mau: | ||
| is_user_monthly_active = yield store.is_user_monthly_active(user_id) | ||
| if is_user_monthly_active: | ||
| yield store.upsert_monthly_active_user(user_id) |
There was a problem hiding this comment.
I would avoid having to do an update on every request. One way of avoiding this is to have the is_user_monthly_active return the timestamp and use that to only insert if the difference is greater than, say, a couple of hours. Since that DB function is cached we avoid doing queries most of the time.
| lock=False, | ||
| ) | ||
| self.is_user_monthly_active.invalidate((user_id,)) | ||
| self.get_monthly_active_count.invalidate(()) |
There was a problem hiding this comment.
We should only invalidate if we actually added a new user to the table (rather than just updating), helpfully _simple_upsert returns True on insert and False on update.
I think that's fine, so long as we don't actually perform a query in the common case. |
| ) | ||
| timestamp = None | ||
| if len(result) > 0: | ||
| timestamp = result[0] |
There was a problem hiding this comment.
There's a _simple_select_one_onecol ftr
| if count < self.hs.config.max_mau_value: | ||
| yield self.upsert_monthly_active_user(user_id) | ||
| elif now - last_seen_timestamp > LAST_SEEN_GRANULARITY: | ||
| yield self.upsert_monthly_active_user(user_id) |
There was a problem hiding this comment.
Can you quickly add a comment to say why we're not always upserting?
| -- a table of monthly active users, for use where blocking based on mau limits | ||
| CREATE TABLE monthly_active_users ( | ||
| user_id TEXT NOT NULL, | ||
| timestamp BIGINT NOT NULL |
There was a problem hiding this comment.
Can you add a quick comment saying what timestamp is? In particular, its probably worth calling out that it isn't necessarily accurate.
| # If MAU user count still exceeds the MAU threshold, then delete on | ||
| # a least recently active basis. | ||
| # Note it is not possible to write this query using OFFSET due to | ||
| # incompatibilities in how sqlite an postgres support the feature. |
|
|
||
| @cached(num_args=0) | ||
| def get_monthly_active_count(self): | ||
| """Generates current count of monthly active users.abs |
There was a problem hiding this comment.
ah, editor autocomplete
| user_id TEXT NOT NULL, | ||
| -- Last time we saw the user. Not guaranteed to be accurate due to rate limiting | ||
| -- on updates, Granularity of updates governed by | ||
| -- syanpse.storage.monthly_active_users.LAST_SEEN_GRANULARITY |
erikjohnston
left a comment
There was a problem hiding this comment.
Modulo spelling mistakes, thanks @krombel
Features -------- - Add support for the SNI extension to federation TLS connections. Thanks to @vojeroen! ([\#3439](#3439)) - Add /_media/r0/config ([\#3184](#3184)) - speed up /members API and add `at` and `membership` params as per MSC1227 ([\#3568](#3568)) - implement `summary` block in /sync response as per MSC688 ([\#3574](#3574)) - Add lazy-loading support to /messages as per MSC1227 ([\#3589](#3589)) - Add ability to limit number of monthly active users on the server ([\#3633](#3633)) - Support more federation endpoints on workers ([\#3653](#3653)) - Basic support for room versioning ([\#3654](#3654)) - Ability to disable client/server Synapse via conf toggle ([\#3655](#3655)) - Ability to whitelist specific threepids against monthly active user limiting ([\#3662](#3662)) - Add some metrics for the appservice and federation event sending loops ([\#3664](#3664)) - Where server is disabled, block ability for locked out users to read new messages ([\#3670](#3670)) - set admin uri via config, to be used in error messages where the user should contact the administrator ([\#3687](#3687)) - Synapse's presence functionality can now be disabled with the "use_presence" configuration option. ([\#3694](#3694)) - For resource limit blocked users, prevent writing into rooms ([\#3708](#3708)) Bugfixes -------- - Fix occasional glitches in the synapse_event_persisted_position metric ([\#3658](#3658)) - Fix bug on deleting 3pid when using identity servers that don't support unbind API ([\#3661](#3661)) - Make the tests pass on Twisted < 18.7.0 ([\#3676](#3676)) - Don’t ship recaptcha_ajax.js, use it directly from Google ([\#3677](#3677)) - Fixes test_reap_monthly_active_users so it passes under postgres ([\#3681](#3681)) - Fix mau blocking calulation bug on login ([\#3689](#3689)) - Fix missing yield in synapse.storage.monthly_active_users.initialise_reserved_users ([\#3692](#3692)) - Improve HTTP request logging to include all requests ([\#3700](#3700)) - Avoid timing out requests while we are streaming back the response ([\#3701](#3701)) - Support more federation endpoints on workers ([\#3705](#3705), [\#3713](#3713)) - Fix "Starting db txn 'get_all_updated_receipts' from sentinel context" warning ([\#3710](#3710)) - Fix bug where `state_cache` cache factor ignored environment variables ([\#3719](#3719)) - Fix bug in v0.33.3rc1 which caused infinite loops and OOMs ([\#3723](#3723)) - Fix bug introduced in v0.33.3rc1 which made the ToS give a 500 error ([\#3732](#3732)) Deprecations and Removals ------------------------- - The Shared-Secret registration method of the legacy v1/register REST endpoint has been removed. For a replacement, please see [the admin/register API documentation](https://github.com/matrix-org/synapse/blob/master/docs/admin_api/register_api.rst). ([\#3703](#3703)) Internal Changes ---------------- - The test suite now can run under PostgreSQL. ([\#3423](#3423)) - Refactor HTTP replication endpoints to reduce code duplication ([\#3632](#3632)) - Tests now correctly execute on Python 3. ([\#3647](#3647)) - Sytests can now be run inside a Docker container. ([\#3660](#3660)) - Port over enough to Python 3 to allow the sytests to start. ([\#3668](#3668)) - Update docker base image from alpine 3.7 to 3.8. ([\#3669](#3669)) - Rename synapse.util.async to synapse.util.async_helpers to mitigate async becoming a keyword on Python 3.7. ([\#3678](#3678)) - Synapse's tests are now formatted with the black autoformatter. ([\#3679](#3679)) - Implemented a new testing base class to reduce test boilerplate. ([\#3684](#3684)) - Rename MAU prometheus metrics ([\#3690](#3690)) - add new error type ResourceLimit ([\#3707](#3707)) - Logcontexts for replication command handlers ([\#3709](#3709)) - Update admin register API documentation to reference a real user ID. ([\#3712](#3712))
| -- on updates, Granularity of updates governed by | ||
| -- synapse.storage.monthly_active_users.LAST_SEEN_GRANULARITY | ||
| -- Measured in ms since epoch. | ||
| timestamp BIGINT NOT NULL |
There was a problem hiding this comment.
I'm really surprised this is called timestamp, which is a reserved keyword in postgres and most SQL (it's a data type), as you can from the syntax highlighting. It also doesn't describe what the column is used for - surely it should be last_active or similar?
There was a problem hiding this comment.
(it's also a problem because you can't rename columns in sqlite, so we really want to get it right the first time).
| ); | ||
|
|
||
| CREATE UNIQUE INDEX monthly_active_users_users ON monthly_active_users(user_id); | ||
| CREATE INDEX monthly_active_users_time_stamp ON monthly_active_users(timestamp); |
There was a problem hiding this comment.
this would be monthly_active_users_timestamp to be consistent, as timestamp is one word
|
since we're in the business of finding problems in PRs after they have landed:
|
|
We usually create a new delta file after a release? Anyway, do we really re run schema files even if they've been applied before? But agreed should try and remember |
A new delta file, sure. A new directory? Well, I didn't get the memo, and I question the value of doing so vs the problems it presents for anyone trying to roll back.
we do not, but the update code does get upset if either (a) the database schema number is higher than it knows about or (b) it finds records of delta files from version X already having been applied when trying to upgrade from X-1 to X.
gahhhh |
Doesn't actually do anything other than ensure that table purges old values.
Will form basis for extending #3630