Headline
CVE-2022-39335: Faster joins: incoming federation requests during resync may be incorrectly rejected · Issue #13288 · matrix-org/synapse
Synapse is an open-source Matrix homeserver written and maintained by the Matrix.org Foundation. The Matrix Federation API allows remote homeservers to request the authorization events in a room. This is necessary so that a homeserver receiving some events can validate that those events are legitimate and permitted in their room. However, in versions of Synapse up to and including 1.68.0, a Synapse homeserver answering a query for authorization events does not sufficiently check that the requesting server should be able to access them. The issue was patched in Synapse 1.69.0. Homeserver administrators are advised to upgrade.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Closed
richvdh opened this issue
Jul 15, 2022
· 8 comments · Fixed by #13823
Assignees
Comments
If a remote server sends us a federation request (such as get_missing_events, or requesting a space summary) during a state resync, we may well reject that request due to not thinking the remote server is in the room.
Possibly we need to define some return code to say "sorry, can’t auth you right now".
Possibly we need to define some return code to say "sorry, can’t auth you right now".
(or just rely on the list of servers we get during the join dance)
Note that if we send out an event (as per #12997) we must respond to get_missing_events correctly or we risk having our event rejected (due to the remote server not being able to get the prev events)
This was referenced
Aug 26, 2022
Note that if we send out an event (as per #12997) we must respond to get_missing_events correctly or we risk having our event rejected (due to the remote server not being able to get the prev events)
This is a very valid point. (I also think this is an interoperability landmine; it would be good for the spec to be very clear about what exactly a homeserver needs to be able to answer when sending an event! — I don’t see anywhere where it says the sending homeserver has to be able to produce the prev_events of an event it is sending…).
We could track which events are prev_events of the events that we have sent out during the partial join and allow these to be returned to get_missing_events. This supposes that this is sufficient for getting the other server to accept our event, but it would be nice if this was rigorously defined in specification!
Possibly we need to define some return code to say "sorry, can’t auth you right now".
This sounds like it might be nasty, because we can’t retroactively create an error code that non-MSC2706-implementing homeservers will understand. Maybe this is OK anyway though, because other homeservers have to treat us as potentially malicious and they need to be equipped to handle strange error responses from us by falling back to other servers as appropriate.
(or just rely on the list of servers we get during the join dance)
Note that this can fall out of sync with reality. This is probably still OK
If a new server joins the room and we receive the event, we can handle that.
But we can’t handle a server leaving the room, because we may see a leave event for one of the server’s users, but we need to know whether it’s the last user from that server (which by implication means that we need to know all of that server’s users, at least in the current model).
Given that we can’t conclusively track the set of servers in the room (because we will always have to have this indeterminate state arising from a server’s user leaving the room and us not knowing if it was the last user for their server), it seems like it would be simplest to just not bother (in cases where that’s acceptable — answering queries for event sending auth purposes is a different matter and may need special casing so that we accept relevant queries from the servers that we send our events to, such as /get_missing_events).
This makes it tempting to indeed just define an error code that says 'can’t auth you right now’. Since our server must still be syncing with another server to be successful, then it follows that there must be another server available to answer the remote homeserver’s query. Either that, or the room is scuppered since we’ll never be able to complete the state sync.
If we want to, we can later enhance this by answering queries where we can and where the membership of the requesting remote homeserver is determinate. However, requesting homeservers need to be able to cope with uncooperative homeservers anyway, so they should already have logic to try against other homeservers. I suggest that we just let them re-use that, for simplicity.
#12997 will then unfortunately have to gain some special casing, probably along the lines of 'we will answer relevant /get_missing_events queries for servers that we send our events out to’.
Here’s a thought: can we modify partial-state joins so that we can keep track of which servers are in the room?
How about changing, in the MSC3706 spec,
servers_in_room: A new field of type [string], listing the servers active in the room (ie, those with joined members) before the join.
to
servers_in_room: A new field of type {string: integer}, listing the servers active in the room (ie, those with joined members) before the join and giving the number of joined members from that server.
Both of these are abstract interpretations of the set of users in the room — in the sense that they tell you something useful about the set of users in the room, but only the latter is updatable (or at least it seems that way to me):
Assuming that we, as a partial-stated homeserver, can correctly witness join→{leave, kick, ban} and {invite, kick, leave, ø}→join transitions, we can keep updated with the list of servers in the room, whilst withstanding membership changes, just by keeping a tally of the number of users per server. If the server drops down to zero, they’re out. New users joining from unknown servers start the count as zero.
Here’s a thought: can we modify partial-state joins so that we can keep track of which servers are in the room?
This was somewhat shot down: when we complete our join after doing a partial join, we may find that unrejected events should become rejected and some rejected events may become unrejected. In other words, we may accept some events which it turns out we should have not.
This erodes my confidence that we can track servers in the room correctly, so for now I would not be in favour of doing that (or at least, not using the results of that for authorisation) — instead keeping things simple, we’ll just require servers to have fallback behaviour (which they should already have anyway) to find fully-joined servers to answer their queries.
We’ll have to make an exception for the checks that other servers want to make on the events that we ourselves send into the room.
I wonder if it’s actually OK to send our own events into the current list of hypothesised joined servers — it seems to me that the most important aspect is that the user is aware, but will this actually be the case? If we’re basing the server memberships on a list received out-of-band from the room’s state, it doesn’t hold that the user will see joined members for all the servers in the room. (is this right?)
Do we need to narrow down event sending to public rooms where this isn’t a concern?
Note that if we send out an event (as per #12997) we must respond to get_missing_events correctly or we risk having our event rejected (due to the remote server not being able to get the prev events)
but… in that particular case, we know the list of servers we sent the event to, so can’t we just use the same list to decide whether they are allowed to get_missing_events?
Note that if we send out an event (as per #12997) we must respond to get_missing_events correctly or we risk having our event rejected (due to the remote server not being able to get the prev events)
but… in that particular case, we know the list of servers we sent the event to, so can’t we just use the same list to decide whether they are allowed to get_missing_events?
@reivilibre points out that this runs the risk of us divulging room content to a server that has since left the room, but also makes a good suggestion for how to avoid this: ensure that all events that we create during the room resync are maintained in a fork of the DAG starting at our join.
Related news
### Impact The Matrix Federation API allows remote homeservers to request the *authorisation events* of events in a room. This is necessary so that a homeserver receiving some events can validate that those events are legitimate and permitted in their room. However, in versions of Synapse up to and including 1.68.0, a Synapse homeserver answering a query for authorisation events does not sufficiently check that the requesting server should be able to access them. Authorisation events include power level events (the list of user IDs and their power levels at the time) and relevant membership events (including the display name of the sender of that event), as well as events like `m.room.create`, `m.room.third_party_invite` and `m.room.join_rules`. Non-authorisation events are unaffected, so it isn't possible to e.g. extract message contents this way. This issue is only exploitable when a malicious actor knows the ID of a target room and the ID of an event from that room. In most cases...