for instance, in multi-homed networks, `/etc/hosts` will have both
"127.0.0.1" and "::1" mapped to localhost; still, only one of
them may be reachable, if a server binds only to "127.0.0.1", for
example. In such cases, the early exit introduced in b0777c61e to
prevent the loop was keeping the dual-stack IP resolution from passing
on the second set of responses, thereby potentially leaving the
connection with only the unreachable IP.
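For context, this is what the relevant `/etc/hosts` entries typically look like, with the same name mapped to both loopback addresses:

```
127.0.0.1   localhost
::1         localhost
```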
As per the ruby IO reader protocol, which Response::Body was aimed at
supporting since the beginning, the call to #rewind was preventing it
from consuming the body buffer, delivering the same substring every
time instead.
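A minimal sketch of the consumption pattern that protocol implies, assuming `Response::Body#read` behaves like `IO#read` (chunk size arbitrary):

```ruby
require "httpx"

response = HTTPX.get("https://example.com")

# successive #read calls consume the buffer; nil signals EOF,
# instead of the same substring coming back on every call
File.open("page.html", "wb") do |file|
  while (chunk = response.body.read(16_384))
    file.write(chunk)
  end
end
```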
These internal registries were a bit magical to use, difficult to
debug, not thread-safe, and overall a nuisance when it came to type
checking. So long.
yet another compliance fix for the DNS protocol; while udp is the
preferred transport, in case a truncated response is received, the
resolver will switch to tcp and perform the DNS query again.
This introduces a new resolver option, `:socket_type`, which is `:udp`
by default.
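A hedged usage sketch, assuming the option is passed through `:resolver_options` like other resolver settings:

```ruby
require "httpx"

# force tcp from the start; the default remains :udp, with the
# automatic tcp retry on truncated responses described above
HTTPX.with(resolver_options: { socket_type: :tcp })
     .get("https://example.com")
```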
The reference for a request verb is now the string which is used
everywhere else, instead of the symbol corresponding to it. This was an
artifact from the import from httprb, and there is no advantage in it:
these strings are frozen in most use cases, and the symbol-to-string
transformations being performed everywhere are proof that keeping the
atom wasn't really bringing any benefit.
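At the call site, both forms are assumed to keep working; only the internal reference changed:

```ruby
require "httpx"

# both end up referencing the frozen "GET" string internally
HTTPX.request("GET", "https://example.com")
HTTPX.request(:get, "https://example.com")
```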
connections weren't being correctly initiated, as proxies were filtered
for the whole session based on URI.find_proxy for the first call. This
fixes it by:
* applying it to all used URIs (see the sketch below);
* falling back to proxy options instead;
* applying the no_proxy option in case it's used, via
`URI::Generic.use_proxy?`.
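For reference, the stdlib calls involved look roughly like this (values hypothetical):

```ruby
require "uri"

# per-URI proxy detection, instead of reusing the first call's result
URI("https://example.com").find_proxy  # honours http(s)_proxy/no_proxy env vars

# explicit no_proxy check: hostname, resolved address, port, no_proxy list
URI::Generic.use_proxy?("example.com", "93.184.216.34", 443, "localhost,127.0.0.1")
```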
domain not found
since httpx supports candidate calculations for dns queries, candidates
were always traversed when no answers came back. However, the DNS
message response contains a code set by the server, indicating whether
we should consider the domain as existing **but** having no address, or
as nonexistent; candidates should only be queried in the latter case.
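A sketch of the distinction using the stdlib `resolv` constants (the response bytes are hypothetical):

```ruby
require "resolv"

message = Resolv::DNS::Message.decode(dns_response_bytes) # bytes off the wire

if message.rcode == Resolv::DNS::RCode::NXDomain
  # the domain does not exist: move on to the next candidate
else
  # NoError with an empty answer section means the domain exists but has
  # no address of the queried type; further candidates shouldn't be tried
end
```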
Implementing the following fixes:
* connections are now marked by IP family of the IO;
* connection "mergeable" status is dependent on a matching origin when
the conn is already open;
* on a connection error, in case it happened while connecting, an event
is emitted, rather than handling it; if there is another connection
for the same origin still doing the handshake, the error is ignored;
if not, the error is handled;
* a new event, `:tcp_open`, is emitted when the tcp socket conn is
established; this allows for concurrent handshakes to be promptly
terminated, instead of being dependent on the TLS handshake;
* connection cloning now happens early, as the connection is set up for
resolving; this way, 2-way callbacks are set as early as possible;
This returns the filename advertised in the content-disposition header.
It reuses the same logic which existed for parsing multipart responses,
which itself was based on `rack`'s.
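Not the actual implementation (which goes through the multipart parsing logic), but extracting the parameter boils down to something like:

```ruby
header = 'attachment; filename="archive-2021.tar.gz"'

header[/filename="?([^";]+)"?/, 1]
# => "archive-2021.tar.gz"
```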
Until now, httpx was issuing concurrent DNS requests, but it would only
start connecting to the first resolved address, and then to the
following ones in order, sequentially.
With this change, httpx will now continue the process by connecting
concurrently to both IPv6 and IPv4, and close the other connection once
one is established. This means both TCP and TLS (when applicable) need
to succeed before the second connection is cancelled.
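Conceptually (and outside of httpx's non-blocking loop and error handling), the race looks like this:

```ruby
require "socket"

# start a non-blocking connect for each family...
sockets = ["::1", "127.0.0.1"].map do |ip|
  Socket.new(ip.include?(":") ? :INET6 : :INET, :STREAM).tap do |sock|
    sock.connect_nonblock(Socket.sockaddr_in(443, ip), exception: false)
  end
end

# ...keep whichever becomes writable first (a real implementation must also
# check for connect errors and, when applicable, finish the TLS handshake)
_, writable, = IO.select(nil, sockets)
winner = writable.first
(sockets - [winner]).each(&:close)
```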
These are deadline-oriented for the request and response, i.e. a write
timeout tracks the full time it takes to write the request, whereas the
read timeout does the same for receiving the response.
For back-compat, they're infinite by default. v1 may change that, and
will have to provide a safe fallback for endless "stream" requests and
responses.
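Assuming they hang off the existing `:timeout` options group, usage looks like:

```ruby
require "httpx"

HTTPX.with(timeout: { write_timeout: 15, read_timeout: 30 })
     .get("https://example.com")
```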
A certain behaviour was observed, when performing some tests using the
hackernews script, where after a failed request on a non-initiated
connection, a new DNS resolution would be emitted, although the
connection still had other IPs to try. This led to a cascading
behaviour where the DNS response would fill up the connection with the
same repeated IPs and trigger coalescing, which would loop indefinitely
after emitting the resolve event.
This was fixed by not allowing DNS resolutions of already-resolved names
to propagate to connections which already contain the advertised IPs.
This seems to address github issue 5, whose description matches the
observed behaviour.
resolvers
All kinds of errors happening during the select loop will be handled as
abrupt select loop errors, and terminate all connections; this also
includes timeout errors. This is not ideal, for a few reasons: first,
connection timeout errors happening on the loop close all connections,
although the timeout may only have been triggered for one (or a subset
of) connection; second, errors on the DNS channel propagate errors to
connections indirectly (the emission mentioned above) and wrongly
(connections for different hostnames not yet queried will also fail
with a timeout), and won't clean the resolver state (so subsequent
queries will be done for the same hostname which failed in the first
place).
This fix is a first step towards solving this problem. It does not
totally address the first issue, but it does fix how errors from the
second use case are handled.
A subtle bug slipped through the cracks, where if a resolve timeout
error happened, the connection would remain in the pool. Subsequent
requests to the same domain would activate it, although no requests
would go through it; the actual desired behaviour is to outright remove
it from the pool on such errors.
This was achieved by registering the "unregister_connection" callback
earlier. However, the connection accounting step would only trigger if
taking it out of the selector worked, meaning it had been registered
before.
In order to expose other auth schemes in proxy, the basic, digest and
ntlm modules were extracted from the plugins, with the latter being left
with the request management. So now, an extra parameter, `:scheme`, can
be passed (it'll be "basic" for http and "socks5" for socks5 by default;
it can also be "digest" or "ntlm", though those haven't been tested yet).
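Presumably passed along with the rest of the proxy options, e.g. (shape assumed from the proxy plugin's usual `with_proxy` call):

```ruby
require "httpx"

HTTPX.plugin(:proxy).with_proxy(
  uri: "http://proxy.example.com:3128",
  scheme: "digest",      # defaults to "basic" for http proxies
  username: "proxyuser",
  password: "proxypass"
).get("https://example.com")
```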
for multi-backed resolvers, resolving is attempted before sending it to
the resolver. in this way, cached, local or ip resolves get
propagated to the proper resolver by ip family, instead of the
previous mess.
the system resolver doesn't do these shenanigans (it trusts getaddrinfo).
the ruby `resolv` library does everything in ruby, and sequentially
(first ipv4, then ipv6 resolution); we already have the native resolver
for that, and getaddrinfo should be considered the ideal way to use DNS
(potentially, in the future, it becomes the default resolver).
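For reference, opting into the getaddrinfo-backed resolver is assumed to look like this (via the `:resolver_class` option):

```ruby
require "httpx"

# :system delegates name resolution to getaddrinfo
HTTPX.with(resolver_class: :system).get("https://example.com")
```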