there were a lot of issues with bookkeeping this at the connection level; in the end, the timers infra was a much better proxy for all of this: set a timer after each write, and cancel it when there is data to parse.
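a minimal sketch of the idea, using a thread-based timer as a stand-in for the timers infra (names here are illustrative, not HTTPX's actual internals):

```ruby
require "io/wait"

# a thread-based stand-in for a timer interval from the timers infra
class Timer
  def initialize(seconds, &on_timeout)
    @thread = Thread.new do
      sleep(seconds)
      on_timeout.call
    end
  end

  def cancel
    @thread.kill
  end
end

# the pipe stands in for a socket: the written "request" loops straight
# back as readable data, playing the role of the response
reader, writer = IO.pipe

writer.write("request")                   # write the request...
timer = Timer.new(5) { warn "timed out" } # ...then set the timer

if reader.wait_readable(10)
  timer.cancel # there is data to parse: cancel the timer
  puts reader.readpartial(1024)
end
```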
pools are then used only to fetch new connections; selectors are discarded when not needed anymore; HTTPX.wrap is for now patched, but would ideally be done away with in the future
connection bookkeeping on the pool changes, whereby all connections are kept around, even the ones that close during the scope of a request; new requests may then find them, reset them, and reselect them. this is a major improvement, as objects get reused more, meaning less GC pressure and object movement. this also changes the way the pool terminates, as connections now follow a termination protocol, instead of just being closed (which they still can be while the scope is open)
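a rough sketch of what such a termination protocol can look like (hypothetical method and event names, not HTTPX's actual code):

```ruby
# idle connections shut down immediately; busy ones are asked to drain
# their in-flight requests first, instead of being closed outright
def terminate(connections)
  connections.each do |conn|
    if conn.idle?
      conn.terminate # e.g. send GOAWAY on HTTP/2, then close the socket
    else
      conn.on(:exhausted) { conn.terminate } # defer until requests drain
    end
  end
end
```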
the change to read/write cancellation-driven timeouts as the default
timeout strategy revealed a performance regression: because these were
built on Timers, which never got unsubscribed, they were kept beyond the
duration of the request they were created for, and needlessly got picked
up on the next timeout tick.
This was fixed by adding a callback on timer intervals which
unsubscribes them from the timer group when called; the callback is
invoked once the timeout is no longer needed (request sent /
response received), thereby removing the overhead on subsequent
requests.
An additional intervals array is also kept in the connection itself;
timeouts from timers are signalled via socket wait calls, but these
always resulted in timeout errors, even when they shouldn't have (ex: an
expect timeout which ends with the full response payload being sent),
and with the wrong exception class in some cases. By keeping the
intervals of its requests around, and monitoring whether any of them
actually triggered, the connection can either handle the timeout or bail
out (so that timers can fire the correct callback).
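continuing the sketch above, the connection can use its own copy of the request intervals to tell real timeouts from spurious ones (again, hypothetical names):

```ruby
class Connection
  def initialize
    @intervals = [] # intervals registered by this connection's requests
  end

  def register_interval(interval)
    @intervals << interval
  end

  # called when a socket wait call returns without data
  def handle_socket_timeout(error)
    now = Process.clock_gettime(Process::CLOCK_MONOTONIC)

    # bail out unless one of this connection's requests actually expired;
    # otherwise the timers fire the correct callback themselves
    return unless @intervals.any? { |ivl| ivl.fire_at <= now }

    handle_error(error)
  end

  def handle_error(error)
    # propagate the right exception class to the in-flight requests
    raise error
  end
end
```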
for instance, in multi-homed networks, `/etc/hosts` will have both
"127.0.0.1" and "::1" pointing to localhost; still, only one of
them may be reachable, if a server binds only to "127.0.0.1", for
example. In such cases, the early exit introduced in b0777c61e to
prevent the loop was stopping the dual-stack IP resolver from passing
on the second set of responses, thereby potentially leaving only the
unreachable IP accessible to the connection.
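the dual entries can be observed with ruby's own resolver:

```ruby
require "resolv"

# with both entries in /etc/hosts, both addresses come back, even if the
# server only binds to one of them (order may vary)
Resolv.getaddresses("localhost") # => ["127.0.0.1", "::1"]
```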
yet another compliance fix for the DNS protocol; while UDP is the
preferred transport, when a truncated response is received, the
resolver will switch to TCP and perform the DNS query again.
This introduces a new resolver option, `:socket_type`, which is `:udp`
by default.
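usage sketch, assuming it is passed through `resolver_options` like other resolver settings:

```ruby
require "httpx"

# query DNS over TCP from the start, instead of only falling back to it
# after a truncated UDP response
HTTPX.with(resolver_options: { socket_type: :tcp }).get("https://example.com")
```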
domain not found
since httpx supports candidate calculation for dns queries, candidates
were always traversed when no answers came back. However, the DNS
response message contains a code set by the server, which distinguishes
the domain existing **but** having no address from the domain not
existing at all; candidates should only be tried in the latter case.
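a sketch of the distinction, using ruby's `resolv` message decoder (`dns_payload` and the two handlers are placeholders):

```ruby
require "resolv"

message = Resolv::DNS::Message.decode(dns_payload)

case message.rcode
when Resolv::DNS::RCode::NXDomain
  # the domain does not exist: try the next candidate
  try_next_candidate
when Resolv::DNS::RCode::NoError
  # the domain exists; an empty answer section just means "no address"
  handle_no_address if message.answer.empty?
end
```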
Implementing the following fixes:
* connections are now marked by IP family of the IO;
* connection "mergeable" status dependent on matching origin when conn
is already open;
* on a connection error, in case it happened while connecting, an event
is emitted, rather than handling it directly; if there is another
connection for the same origin still doing the handshake, the error is
ignored; if not, the error is handled (see the sketch after this list);
* a new event, `:tcp_open`, is emitted when the tcp socket connection is
established; this allows concurrent handshakes to be promptly
terminated, instead of waiting on the TLS handshake;
* connection cloning now happens early, when the connection is set for
resolving; this way, 2-way callbacks are set as early as possible;
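a sketch of the emitted-error flow from the list above (hypothetical names, not HTTPX's actual code):

```ruby
def handle_connect_error(failed, connections, error)
  sibling = connections.find do |other|
    !other.equal?(failed) && other.origin == failed.origin && other.connecting?
  end

  if sibling
    failed.terminate # the other IP family may still complete its handshake
  else
    failed.handle_error(error) # no fallback left: surface the error
  end
end
```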
resolvers
All kinds of errors happening during the select loop are handled as
abrupt select loop errors, and terminate all connections; this also
includes timeout errors. This is not ideal, for a few reasons: first,
connection timeout errors happening on the loop close all connections,
although the timeout may only have been triggered for one (or a subset
of) connection(s); second, errors on the DNS channel propagate to
connections indirectly (the emission mentioned above) and wrongly
(connections for hostnames not yet queried will also fail with a
timeout), and won't clean the resolver state (so subsequent queries will
be retried for the same hostname which failed in the first place).
This fix is a first step towards solving this problem. It does not
totally address the first issue, but it does fix dealing with the errors
from the second use case.
for multi-backed resolvers, early resolution is attempted before sending
the name to the resolver. in this way, cached, local or IP-literal
resolves get propagated to the proper resolver by IP family, instead of
the previous mess.
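for example, IP literals can be short-circuited and routed by family (a sketch, not HTTPX's actual code):

```ruby
require "ipaddr"
require "socket"

# returns the family of an IP literal, or nil if a real DNS query is needed
def early_resolve_family(host)
  ip = IPAddr.new(host)
  ip.ipv6? ? Socket::AF_INET6 : Socket::AF_INET
rescue IPAddr::InvalidAddressError
  nil
end

early_resolve_family("::1")         # => Socket::AF_INET6
early_resolve_family("127.0.0.1")   # => Socket::AF_INET
early_resolve_family("example.com") # => nil
```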
the system resolver doesn't do these shenanigans (trust getaddrinfo)
the ruby `resolv` library does everything in ruby, and sequentially
(first ipv4, then ipv6 resolution). we already have the native resolver
for that, and getaddrinfo should be considered the ideal way to use DNS
(potentially, in the future, it becomes the default resolver).
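for comparison, getaddrinfo resolves both families in one native call, in the order the OS prefers:

```ruby
require "socket"

Addrinfo.getaddrinfo("localhost", 443, nil, :STREAM).map(&:ip_address)
# => ["::1", "127.0.0.1"] (order and entries depend on the system config)
```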
Two resolvers are kept (IPv6/IPv4) in the pool; all names are sent to
both, and answers are read from both in the same pool. IPv4 resolves are
subject to a 50ms delay (as per the Happy Eyeballs RFC) before they're
used for connecting. IPv6 addresses have preference, in that if they
arrive before the delay expires, they are used immediately. If they
arrive after the delay, they do not interrupt the ongoing connection
attempt, but they'll be next-in-line in case the connection handshake
fails.
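a sketch of the preference logic (hypothetical names; HTTPX's actual resolver differs):

```ruby
RESOLUTION_DELAY = 0.05 # 50ms

def on_dns_answer(family, addresses)
  if family == Socket::AF_INET6
    if connecting?
      # answer arrived after the delay: don't interrupt, keep as next-in-line
      @fallback_addresses = addresses
    else
      connect_to(addresses) # preferred family, used immediately
    end
  else
    # IPv4 answers wait out the resolution delay before connecting
    @timers.after(RESOLUTION_DELAY) { connect_to(addresses) unless connecting? }
  end
end
```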
Two resolvers are kept, but the underlying Connection is shared, thereby
sending name resolution requests to the same HTTP/2 connection in bulk.
The resolution delay logic from above also applies.
Currently handles resolving via the `resolv` lib. This happens
synchronously though, so we're not there yet.
during interest calculation
A quirk was found whereby a connection which failed while connecting
(such as in the badssl test) was properly unregistered from the pool,
but was kept in the selector's selectables pool: because the operation
happened during interest calculation, and the variable substitution was
performed right afterwards, the pool and selector were left out of sync,
causing all sorts of miscalculations around timers later on.