synapse

mirror of https://github.com/element-hq/synapse.git synced 2025-12-17 00:01:18 -05:00

Author	SHA1	Message	Date
Erik Johnston	1bddd25a85	Port `Clock` functions to use `Duration` class (#19229 ) This changes the arguments in clock functions to be `Duration` and converts call sites and constants into `Duration`. There are still some more functions around that should be converted (e.g. `timeout_deferred`), but we leave that to another PR. We also changes `.as_secs()` to return a float, as the rounding broke things subtly. The only reason to keep it (its the same as `timedelta.total_seconds()`) is for symmetry with `as_millis()`. Follows on from https://github.com/element-hq/synapse/pull/19223	2025-12-01 13:55:06 +00:00
Devon Hudson	396de6544a	Cleanly shutdown SynapseHomeServer object (#18828 ) This PR aims to allow for a clean shutdown of the `SynapseHomeServer` object so that it can be fully deleted and cleaned up by garbage collection without shutting down the entire python process. Fix https://github.com/element-hq/synapse-small-hosts/issues/50 ### Pull Request Checklist <!-- Please read https://element-hq.github.io/synapse/latest/development/contributing_guide.html before submitting your pull request --> * [x] Pull request is based on the develop branch * [x] Pull request includes a [changelog file](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#changelog). The entry should: - Be a short description of your change which makes sense to users. "Fixed a bug that prevented receiving messages from other servers." instead of "Moved X method from `EventStore` to `EventWorkerStore`.". - Use markdown where necessary, mostly for `code blocks`. - End with either a period (.) or an exclamation mark (!). - Start with a capital letter. - Feel free to credit yourself, by adding a sentence "Contributed by @github_username." or "Contributed by [Your Name]." to the end of the entry. * [x] [Code style](https://element-hq.github.io/synapse/latest/code_style.html) is correct (run the [linters](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#run-the-linters)) --------- Co-authored-by: Eric Eastwood <erice@element.io>	2025-10-01 02:42:09 +00:00
Eric Eastwood	5143f93dc9	Fix `server_name` in logging context for multiple Synapse instances in one process (#18868 ) ### Background As part of Element's plan to support a light form of vhosting (virtual host) (multiple instances of Synapse in the same Python process), we're currently diving into the details and implications of running multiple instances of Synapse in the same Python process. "Per-tenant logging" tracked internally by https://github.com/element-hq/synapse-small-hosts/issues/48 ### Prior art Previously, we exposed `server_name` by providing a static logging `MetadataFilter` that injected the values: `205d9e4fc4/synapse/config/logger.py (L216)` While this can work fine for the normal case of one Synapse instance per Python process, this configures things globally and isn't compatible when we try to start multiple Synapse instances because each subsequent tenant will overwrite the previous tenant. ### What does this PR do? We remove the `MetadataFilter` and replace it by tracking the `server_name` in the `LoggingContext` and expose it with our existing [`LoggingContextFilter`](`205d9e4fc4/synapse/logging/context.py (L584-L622)`) that we already use to expose information about the `request`. This means that the `server_name` value follows wherever we log as expected even when we have multiple Synapse instances running in the same process. ### A note on logcontext Anywhere, Synapse mistakenly uses the `sentinel` logcontext to log something, we won't know which server sent the log. We've been fixing up `sentinel` logcontext usage as tracked by https://github.com/element-hq/synapse/issues/18905 Any further `sentinel` logcontext usage we find in the future can be fixed piecemeal as normal. `d2a966f922/docs/log_contexts.md (L71-L81)` ### Testing strategy 1. Adjust your logging config to include `%(server_name)s` in the format ```yaml formatters: precise: format: '%(asctime)s - %(server_name)s - %(name)s - %(lineno)d - %(levelname)s - %(request)s - %(message)s' ``` 1. Start Synapse: `poetry run synapse_homeserver --config-path homeserver.yaml` 1. Make some requests (`curl http://localhost:8008/_matrix/client/versions`, etc) 1. Open the homeserver logs and notice the `server_name` in the logs as expected. `unknown_server_from_sentinel_context` is expected for the `sentinel` logcontext (things outside of Synapse).	2025-09-26 17:10:48 -05:00
Eric Eastwood	5266e423e2	Explain how Deferred callbacks interact with logcontexts (#18914 ) Spawning from https://github.com/matrix-org/synapse/pull/12588#discussion_r865843321 > It turns out `Deferred.cancel()` is a lot like `Deferred.callback()`/`errback()` in that it will trash the logging context: > it can resume a coroutine, which will restore its own logging context, then run: > > - until it blocks, setting the sentinel context > - or until it terminates, setting the context it was started with > > So we need to wrap it in `with PreserveLoggingContext():`, like we do with `.callback()`: > > ```python > with PreserveLoggingContext(): > self.render_deferred.cancel() > ``` > > -- @squahtx, https://github.com/matrix-org/synapse/pull/12588#discussion_r865843321	2025-09-24 16:20:42 -05:00
Eric Eastwood	0458f691b6	Fix `run_coroutine_in_background(...)` incorrectly handling logcontext (#18964 ) Regressed in https://github.com/element-hq/synapse/pull/18900#discussion_r2331554278 (see conversation there for more context) ### How is this a regression? > To give this an update with more hindsight; this logic was redundant with the early return and it is safe to remove this complexity ✅ > > It seems like this actually has to do with completed vs incomplete deferreds... > > To explain how things previously worked without the early-return shortcut: > > With the normal case of incomplete awaitable, we store the `calling_context` and the `f` function is called and runs until it yields to the reactor. Because `f` follows the logcontext rules, it sets the `sentinel` logcontext. Then in `run_in_background(...)`, we restore the `calling_context`, store the current `ctx` (which is `sentinel`) and return. When the deferred completes, we restore `ctx` (which is `sentinel`) before yielding to the reactor again (all good ✅) > > With the other case where we see a completed awaitable, we store the `calling_context` and the `f` function is called and runs to completion (no logcontext change). This is where the shortcut would kick in but I'm going to continue explaining as if we commented out the shortcut. -- Then in `run_in_background(...)`, we restore the `calling_context`, store the current `ctx` (which is same as the `calling_context`). Because the deferred is already completed, our extra callback is called immediately and we restore `ctx` (which is same as the `calling_context`). Since we never yield to the reactor, the `calling_context` is perfect as that's what we want again (all good ✅) > > --- > > But this also means that our early-return shortcut is no longer just an optimization and is necessary to act correctly in the completed awaitable case as we want to return with the `calling_context` and not reset to the `sentinel` context. I've updated the comment in https://github.com/element-hq/synapse/pull/18964 to explain the necessity as it's currently just described as an optimization. > > But because we made the same change to `run_coroutine_in_background(...)` which didn't have the same early-return shortcut, we regressed the correct behavior ❌ . This is being fixed in https://github.com/element-hq/synapse/pull/18964 > > > -- @MadLittleMods, https://github.com/element-hq/synapse/pull/18900#discussion_r2373582917 ### How did we find this problem? Spawning from @wrjlewis [seeing](https://matrix.to/#/!SGNQGPGUwtcPBUotTL:matrix.org/$h3TxxPVlqC6BTL07dbrsz6PmaUoZxLiXnSTEY-QYDtA?via=jki.re&via=matrix.org&via=element.io) `Starting metrics collection 'typing.get_new_events' from sentinel context: metrics will be lost` in the logs: <details> <summary>More logs</summary> ``` synapse.http.request_metrics - 222 - ERROR - sentinel - Trying to stop RequestMetrics in the sentinel context. 2025-09-23 14:43:19,712 - synapse.util.metrics - 212 - WARNING - sentinel - Starting metrics collection 'typing.get_new_events' from sentinel context: metrics will be lost 2025-09-23 14:43:19,713 - synapse.rest.client.sync - 851 - INFO - sentinel - Client has disconnected; not serializing response. 2025-09-23 14:43:19,713 - synapse.http.server - 825 - WARNING - sentinel - Not sending response to request <XForwardedForRequest at 0x7f23e8111ed0 method='POST' uri='/_matrix/client/unstable/org.matrix.simplified_msc3575/sync?pos=281963%2Fs929324_147053_10_2652457_147960_2013_25554_4709564_0_164_2&timeout=30000' clientproto='HTTP/1.1' site='8008'>, already dis connected. 2025-09-23 14:43:19,713 - synapse.access.http.8008 - 515 - INFO - sentinel - 92.40.194.87 - 8008 - {@me:wi11.co.uk} Processed request: 30.005sec/-8.041sec (0.001sec, 0.000sec) (0.000sec/0.002sec/2) 0B 200! "POST /_matrix/client/unstable/org.matrix.simplified_msc3575/ ``` </details> From the logs there, we can see things relating to `typing.get_new_events` and `/_matrix/client/unstable/org.matrix.simplified_msc3575/sync` which led me to trying out Sliding Sync with the typing extension enabled and allowed me to reproduce the problem locally. Sliding Sync is a unique scenario as it's the only place we use `gather_optional_coroutines(...)` -> `run_coroutine_in_background(...)` (introduced in https://github.com/element-hq/synapse/pull/17884) to exhibit this behavior. ### Testing strategy 1. Configure Synapse to enable [MSC4186](https://github.com/matrix-org/matrix-spec-proposals/pull/4186): Simplified Sliding Sync which is actually under [MSC3575](https://github.com/matrix-org/matrix-spec-proposals/pull/3575) ```yaml experimental_features: msc3575_enabled: true ``` 1. Start synapse: `poetry run synapse_homeserver --config-path homeserver.yaml` 1. Make a Sliding Sync request with one of the extensions enabled ```http POST http://localhost:8008/_matrix/client/unstable/org.matrix.simplified_msc3575/sync { "lists": {}, "room_subscriptions": { "!FlgJYGQKAIvAscfBhq:my.synapse.linux.server": { "required_state": [], "timeline_limit": 1 } }, "extensions": { "typing": { "enabled": true } } } ``` 1. Open your homeserver logs and notice warnings about `Starting ... from sentinel context: metrics will be lost`	2025-09-24 15:24:47 +00:00
Eric Eastwood	e7d98d3429	Remove `sentinel` logcontext in `Clock` utilities (`looping_call`, `looping_call_now`, `call_later`) (#18907 ) Part of https://github.com/element-hq/synapse/issues/18905 Lints for ensuring we use `Clock.call_later` instead of `reactor.callLater`, etc are coming in https://github.com/element-hq/synapse/pull/18944 ### Testing strategy 1. Configure Synapse to log at the `DEBUG` level 1. Start Synapse: `poetry run synapse_homeserver --config-path homeserver.yaml` 1. Wait 10 seconds for the [database profiling loop](`9cc4001778/synapse/storage/database.py (L711)`) to execute 1. Notice the logcontext being used for the `Total database time` log line Before (`sentinel`): ``` 2025-09-10 16:36:58,651 - synapse.storage.TIME - 707 - DEBUG - sentinel - Total database time: 0.646% {room_forgetter_stream_pos(2): 0.131%, reap_monthly_active_users(1): 0.083%, get_device_change_last_converted_pos(1): 0.078%} ``` After (`looping_call`): ``` 2025-09-10 16:36:58,651 - synapse.storage.TIME - 707 - DEBUG - looping_call - Total database time: 0.646% {room_forgetter_stream_pos(2): 0.131%, reap_monthly_active_users(1): 0.083%, get_device_change_last_converted_pos(1): 0.078%} ```	2025-09-22 14:51:13 -05:00
Eric Eastwood	5a9ca1e3d9	Introduce `Clock.call_when_running(...)` to include logcontext by default (#18944 ) Introduce `Clock.call_when_running(...)` to wrap startup code in a logcontext, ensuring we can identify which server generated the logs. Background: > Ideally, nothing from the Synapse homeserver would be logged against the `sentinel` > logcontext as we want to know which server the logs came from. In practice, this is not > always the case yet especially outside of request handling. > > Global things outside of Synapse (e.g. Twisted reactor code) should run in the > `sentinel` logcontext. It's only when it calls into application code that a logcontext > gets activated. This means the reactor should be started in the `sentinel` logcontext, > and any time an awaitable yields control back to the reactor, it should reset the > logcontext to be the `sentinel` logcontext. This is important to avoid leaking the > current logcontext to the reactor (which would then get picked up and associated with > the next thing the reactor does). > > *-- `docs/log_contexts.md` Also adds a lint to prefer `Clock.call_when_running(...)` over `reactor.callWhenRunning(...)` Part of https://github.com/element-hq/synapse/issues/18905	2025-09-22 10:27:59 -05:00
Erik Johnston	20615115fb	Make `.sleep(..)` return a coroutine (#18772 ) This helps ensure that mypy can catch places where we don't await on it, like in #18763. --------- Co-authored-by: Eric Eastwood <erice@element.io>	2025-08-05 09:30:52 +01:00
Erik Johnston	23740eaa3d	Correctly mention previous copyright (#16820 ) During the migration the automated script to update the copyright headers accidentally got rid of some of the existing copyright lines. Reinstate them.	2024-01-23 11:26:48 +00:00
Patrick Cloke	8e1e62c9e0	Update license headers	2023-11-21 15:29:58 -05:00
Patrick Cloke	acea4d7a2f	Add missing types to tests.util. (#14597 ) Removes files under tests.util from the ignored by list, then fully types all tests/util/*.py files.	2022-12-02 17:58:56 +00:00
Patrick Cloke	02d708568b	Replace assertEquals and friends with non-deprecated versions. (#12092 )	2022-02-28 07:12:29 -05:00
Sean Quah	0147b3de20	Add missing type hints to `synapse.logging.context` (#11556 )	2021-12-14 17:35:28 +00:00
Patrick Cloke	48d44ab142	Record more information into structured logs. (#9654 ) Records additional request information into the structured logs, e.g. the requester, IP address, etc.	2021-04-08 08:01:14 -04:00
Patrick Cloke	38e1fac886	Fix some spelling mistakes / typos. (#7811 )	2020-07-09 09:52:58 -04:00
Richard van der Hoff	39230d2171	Clean up some LoggingContext stuff (#7120 ) * Pull Sentinel out of LoggingContext ... and drop a few unnecessary references to it * Factor out LoggingContext.current_context move `current_context` and `set_context` out to top-level functions. Mostly this means that I can more easily trace what's actually referring to LoggingContext, but I think it's generally neater. * move copy-to-parent into `stop` this really just makes `start` and `stop` more symetric. It also means that it behaves correctly if you manually `set_log_context` rather than using the context manager. * Replace `LoggingContext.alive` with `finished` Turn `alive` into `finished` and make it a bit better defined.	2020-03-24 14:45:33 +00:00
Erik Johnston	9a2223d4c8	Fix make_deferred_yieldable to work with coroutines	2019-12-10 11:22:12 +00:00
Amber Brown	463b072b12	Move logging utilities out of the side drawer of util/ and into logging/ (#5606 )	2019-07-04 00:07:04 +10:00
Amber Brown	0ee9076ffe	Fix media repo breaking (#5593 )	2019-07-02 19:01:28 +01:00
Richard van der Hoff	4a15a3e4d5	Include eventid in log lines when processing incoming federation transactions (#3959 ) when processing incoming transactions, it can be hard to see what's going on, because we process a bunch of stuff in parallel, and because we may end up recursively working our way through a chain of three or four events. This commit creates a way to use logcontexts to add the relevant event ids to the log lines.	2018-09-27 11:25:34 +01:00
black	8b3d9b6b19	Run black.	2018-08-10 23:54:09 +10:00
Amber Brown	49af402019	run isort	2018-07-09 16:09:20 +10:00
Amber Brown	77ac14b960	Pass around the reactor explicitly (#3385 )	2018-06-22 09:37:10 +01:00
Richard van der Hoff	11607006d9	Remove spurious unittest.DEBUG	2018-05-02 15:48:47 +01:00
Richard van der Hoff	f22e7cda2c	Fix a class of logcontext leaks So, it turns out that if you have a first `Deferred` `D1`, you can add a callback which returns another `Deferred` `D2`, and `D2` must then complete before any further callbacks on `D1` will execute (and later callbacks on `D1` get the result of `D2` rather than `D2` itself). So, `D1` might have `called=True` (as in, it has started running its callbacks), but any new callbacks added to `D1` won't get run until `D2` completes - so if you `yield D1` in an `inlineCallbacks` function, your `yield` will 'block'. In conclusion: some of our assumptions in `logcontext` were invalid. We need to make sure that we don't optimise out the logcontext juggling when this situation happens. Fortunately, it is easy to detect by checking `D1.paused`.	2018-05-02 11:58:00 +01:00
Richard van der Hoff	44a498418c	Optimise LoggingContext creation and copying It turns out that the only thing we use the __dict__ of LoggingContext for is `request`, and given we create lots of LoggingContexts and then copy them every time we do a db transaction or log line, using the __dict__ seems a bit redundant. Let's try to optimise things by making the request attribute explicit.	2018-01-16 15:49:42 +00:00
Richard van der Hoff	a6ad8148b9	Fix name of test_logcontext The file under test is logcontext.py, not log_context.py	2017-10-17 10:53:34 +01:00

27 Commits