This way we avoid parsing messages with unexpected message IDs, which
might not even be possible if we don't have the keys anymore. However,
the next commit should avoid the latter and this way we avoid deriving
keys for retransmits or unexpected messages.
This also changes how retransmits for fragmented messages are triggered.
Previously, we waited for all fragments and reconstructed the message
before retransmitting the response. Now we only track the first
fragment and if we receive a retransmit of it respond immediately
without waiting for other fragments (which are now ignored). This is in
compliance with RFC 7383, section 2.6.1, and can avoid issues if there
are lots of fragments.
This gives us more flexibility with tasks that return NEED_MORE (currently
none of the colliding tasks do, but that will change with multi-KE
rekeyings). The active task has to check itself if the passive task is
done and should be removed from the task manager.
We don't actually check that SA out (i.e. it's not registered with the
manager). That was originally different but had to be changed with
86993d6b9037 to avoid that SAs created for rekeying don't block other
threads on the manager.
This allows more fine grained control over what's updated and does not
require multiple calls of the method. Plus we'll be able to use it in
the ike-mobike task.
The message_t object used for defragmentation was only cleared after
all fragments have been received and the message was delivered. So
if we received only some fragments of a retransmitted message, the
fragments of the next message were not processed (message_t returns
INVALID_ARG if the message ID does not match causing the message to
get ignored). This rendered the IKE_SA unusable as the client
obviously never retransmitted the fragments of that previous message
after it received our response.
If a MOBIKE task is deferred, the retransmission counter is reset to 0
when reinitiating. So if there were retransmits before, this alert would
not be triggered if a response is received now without retransmits.
Retransmission jobs for old requests for which we already received a
response previously left the impression that messages were sent more
recently than was actually the case.
task_manager_t always defined INVALID_STATE as possible return value if
no retransmit was sent, this just was never actually returned.
I guess we could further differentiate between actual invalid states
(e.g. if we already received the response) and when we don't send a
retransmit for other reasons e.g. because the IKE_SA became stale.
Due to the exponential backoff a high number of retransmits only
makes sense if retransmit_limit is set. However, even with that there
was a problem.
We first calculated the timeout for the next retransmit and only then
compared that to the configured limit. Depending on the configured
base and timeout the calculation overflowed the range of uint32_t after
a relatively low number of retransmits (with the default values after 23)
causing the timeout to first get lower (on a high level) before constantly
resulting in 0 (with the default settings after 60 retransmits).
Since that's obviously lower than any configured limit, all remaining
retransmits were then sent without any delay, causing a lot of concurrent
messages if the number of retransmits was high.
This change determines the maximum number of retransmits until an
overflow occurs based on the configuration and defaults to UINT32_MAX
if that value is exceeded. Note that since the timeout is in milliseconds
UINT32_MAX equals nearly 50 days.
The calculation in task_manager_total_retransmit_timeout() uses a double
variable and the result is in seconds so the maximum number would be higher
there (with the default settings 1205). However, we want its result to
be based on the actual IKE retransmission behavior.
RFC 7296, section 2.21.3:
If a peer parsing a request notices that it is badly formatted (after
it has passed the message authentication code checks and window
checks) and it returns an INVALID_SYNTAX notification, then this
error notification is considered fatal in both peers, meaning that
the IKE SA is deleted without needing an explicit Delete payload.
RFC 7296, section 2.21.3:
If a peer parsing a request notices that it is badly formatted (after
it has passed the message authentication code checks and window
checks) and it returns an INVALID_SYNTAX notification, then this
error notification is considered fatal in both peers, meaning that
the IKE SA is deleted without needing an explicit Delete payload.
This allows switching to probing mode if the client is on a public IP
and this is the active task and connectivity gets restored. We only add
NAT-D payloads if we are currently behind a NAT (to detect changed NAT
mappings), a MOBIKE update that might follow will add them in case we
move behind a NAT.
Since these are installed overlapping (like during a rekeying) we have to use
the same (unique) marks (and possibly reqid) that were used previously,
otherwise, the policy installation will fail.
Fixes#2610.
Instead of destroying the new task and keeping the existing one we
update any already queued task, so we don't loose any work (e.g. if a
DPD task is active and address update is queued and we'd actually like
to queue a roam task).
We do something similar in reestablish() for break-before-make reauth.
If we don't abort we'd be sending an IKE_AUTH without any TS payloads.
References #2430.
If this is the first message by the peer, i.e. we expect MID 0, the
message is not pre-processed in the task manager so we ignore it in the
task.
We also make sure to ignore such messages if the extension is disabled
and the peer already sent us one INFORMATIONAL, e.g. a DPD (we'd otherwise
consider the message with MID 0 as a retransmit).
If the responder never sent a message the expected MID is 0. While
the sent MID (M1) SHOULD be increased beyond the known value, it's
not necessarily the case.
Since M2 - 1 would then equal UINT_MAX setting that MID would get ignored
and while we'd return 0 in the notify we'd actually expect 1 afterwards.
If a responder is natted it will usually be a static NAT (unless it's a
mediated connection) in which case adding these notifies makes not much
sense (if the initiator's NAT mapping had changed the responder wouldn't
be able to reach it anyway). It's also problematic as some clients refuse
to respond to DPDs if they contain such notifies.
Fixes#2126.
Such a task is not initiated unless a certain time has passed. This
allows delaying certain tasks but avoids problems if we'd do this
via a scheduled job (e.g. if the IKE_SA is rekeyed in the meantime).
If the IKE_SA is rekeyed the delay of such tasks is reset when the
tasks are adopted i.e. they get executed immediately on the new IKE_SA.
This hasn't been implemented for IKEv1 yet.
This makes handling such IKE_SAs more specifically compared to keeping them
in state IKE_CONNECTING or IKE_ESTABLISHED (which we did when we lost a
collision - even triggering the ike_updown event), or using IKE_REKEYING for
them, which would also be ambiguous.
For instance, we can now reject anything but DELETES for such SAs.
We do these checks after the SA is fully established.
When establishing an SA the responder is always able to install the
CHILD_SA created with the IKE_SA before the initiator can do so.
During make-before-break reauthentication this could cause traffic sent
by the responder to get dropped if the installation of the SA on the
initiator is delayed e.g. by OCSP/CRL checks.
In particular, if the OCSP/CRL URIs are reachable via IPsec tunnel (e.g.
with rightsubnet=0.0.0.0/0) the initiator is unable to reach them during
make-before-break reauthentication as it wouldn't be able to decrypt the
response that the responder sends using the new CHILD_SA.
By delaying the revocation checks until the make-before-break
reauthentication is completed we avoid the problems described above.
Since this only affects reauthentication, not the original IKE_SA, and the
delay until the checks are performed is usually not that long this
doesn't impose much of a reduction in the overall security.