Compare commits

...

8 Commits

Author SHA1 Message Date
HoneyryderChuck
da0ef24b09 bumped version to 0.24.4 2022-08-01 22:46:41 +01:00
HoneyryderChuck
8359d6b007 Merge branch 'issue-200' into 'master'
response_cache: fixes and improvements

Closes #200

See merge request honeyryderchuck/httpx!216
2022-08-01 17:50:21 +00:00
HoneyryderChuck
e691cbdf5e Merge branch 'github-issue-5' into 'master'
fix for loop on resolution and retry on new connection

See merge request honeyryderchuck/httpx!215
2022-08-01 17:49:45 +00:00
HoneyryderChuck
61c36c4ef9 response cache: caching several instances for the same URL
by relying on vary header, this should have the effect of not
overflowinng, and doing what the user wants.
2022-08-01 18:40:21 +01:00
HoneyryderChuck
f48f4e414a Fixes and improvements on the response_cache plugin
The following improvements were done:

* only cacheable status codes are allowed now (200, 203, 300, 301, 410)
* only responses considered fresh are cached; fresh response means:
  * no-store directive not present in cache-control
  * response hasn’t expired (when s-maxage, max-age or expires are
      present)
2022-08-01 17:40:05 +01:00
HoneyryderChuck
122b75a14c fixing hackernews script 2022-07-31 19:07:41 +01:00
HoneyryderChuck
b0777c61e5 fix for loop on resolution and retry on new connection
A certain behaviour was observed, when performing some tests using the
hackernews script, where after a failed request on a non-initiated
connection, a new DNS resolution would be emitted, although the
connection still had other IPs to try on. This led to a cascading
behaviour where the DNS response would fill up the connection with the
same repeated IPs and trigger coalescing, which would loop indefinitely
after emitting the resolve event.

This was fixed by not allowing DNS resolution on already resolved names,
to propagate to connections which already contain the advertised IPs.

This seems to address the github issue 5, which description matches the
observed behaviour.
2022-07-31 19:07:41 +01:00
HoneyryderChuck
32a81f2025 fix: response cache now also takes verb into account when caching
The previous strategy was working only with URLs. This strategy would
fall flat if the same url could be used with several HTTP verbs.
2022-07-31 17:03:30 +01:00
17 changed files with 318 additions and 71 deletions

View File

@ -116,6 +116,7 @@ group :assorted do
gem "pry-byebug", "~> 3.4.3" gem "pry-byebug", "~> 3.4.3"
else else
gem "pry-byebug" gem "pry-byebug"
gem "debug" if RUBY_VERSION >= "3.1.0"
end end
end end
end end

View File

@ -0,0 +1,17 @@
# 0.20.4
## Improvements
The `:response_cache` plugin is now more compliant with how the RFC 2616 defines which behaviour caches shall have:
* it caches only responses with one of the following status codes: 200, 203, 300, 301, 410.
* it discards cached responses which become stale.
* it supports "cache-control" header directives to decided when to cache, to store, what the response "age" is.
* it can cache more than one response for the same request, provided that the request presents different header values for the headers declared in the "vary" response header (previously, it was only caching the first response, and discarding the remainder).
## Bugfixes
* fixed DNS resolution bug which caused a loop when a failed connection attempt would cause a new DNS request to be triggered for the same domain, filling up and giving preference to the very IP which failed the attempt.
* response_cache: request verb is now taken into account, not causing HEAD/GET confusion for the same URL.

View File

@ -1,25 +1,36 @@
require "httpx" require "httpx"
require "oga" require "oga"
http = HTTPX.plugin(:compression).plugin(:persistent).with(timeout: { operation_timeut: 5, connect_timeout: 5})
PAGES = (ARGV.first || 10).to_i PAGES = (ARGV.first || 10).to_i
pages = PAGES.times.map do |page| pages = PAGES.times.map do |page|
"https://news.ycombinator.com/?p=#{page+1}" "https://news.ycombinator.com/?p=#{page+1}"
end end
links = [] links = Array.new(PAGES) { [] }
HTTPX.plugin(:compression).get(*pages).each_with_index.map do |response, i| Array(http.get(*pages)).each_with_index.map do |response, i|
if response.is_a?(HTTPX::ErrorResponse) if response.is_a?(HTTPX::ErrorResponse)
puts "error: #{response.error}" puts "error: #{response.error}"
next next
end end
html = Oga.parse_html(response.to_s) html = Oga.parse_html(response.to_s)
page_links = html.css('.itemlist a.storylink').map{|link| link.get('href') } # binding.irb
page_links = html.css('.itemlist a.titlelink').map{|link| link.get('href') }
puts "page(#{i+1}): #{page_links.size}" puts "page(#{i+1}): #{page_links.size}"
if page_links.size == 0 if page_links.size == 0
puts "error(#{response.status}) on page #{i+1}" puts "error(#{response.status}) on page #{i+1}"
end end
links << page_links # page_links.each do |link|
# puts "link: #{link}"
# links[i] << http.get(link)
# end
links[i].concat(http.get(*page_links))
end end
links = links.flatten links = links.each_with_index do |pages, i|
puts "Pages: #{PAGES}\t Links: #{links.size}" puts "Page: #{i+1}\t Links: #{pages.size}"
pages.each do |page|
puts "URL: #{page.uri} (#{page.status})"
end
end

View File

@ -83,22 +83,41 @@ module HTTPX
end end
module ArrayExtensions module ArrayExtensions
refine Array do module FilterMap
refine Array do
def filter_map def filter_map
return to_enum(:filter_map) unless block_given? return to_enum(:filter_map) unless block_given?
each_with_object([]) do |item, res| each_with_object([]) do |item, res|
processed = yield(item) processed = yield(item)
res << processed if processed res << processed if processed
end
end end
end unless Array.method_defined?(:filter_map) end unless Array.method_defined?(:filter_map)
end
def sum(accumulator = 0, &block) module Sum
values = block_given? ? map(&block) : self refine Array do
values.inject(accumulator, :+) def sum(accumulator = 0, &block)
values = block_given? ? map(&block) : self
values.inject(accumulator, :+)
end
end unless Array.method_defined?(:sum) end unless Array.method_defined?(:sum)
end end
module Intersect
refine Array do
def intersect?(arr)
if size < arr.size
smaller = self
else
smaller, arr = arr, self
end
(array & smaller).size > 0
end
end unless Array.method_defined?(:intersect?)
end
end end
module IOExtensions module IOExtensions

View File

@ -72,7 +72,7 @@ module HTTPX
end end
module ResponseBodyMethods module ResponseBodyMethods
using ArrayExtensions using ArrayExtensions::FilterMap
attr_reader :encodings attr_reader :encodings

View File

@ -9,7 +9,9 @@ module HTTPX
# #
module ResponseCache module ResponseCache
CACHEABLE_VERBS = %i[get head].freeze CACHEABLE_VERBS = %i[get head].freeze
CACHEABLE_STATUS_CODES = [200, 203, 206, 300, 301, 410].freeze
private_constant :CACHEABLE_VERBS private_constant :CACHEABLE_VERBS
private_constant :CACHEABLE_STATUS_CODES
class << self class << self
def load_dependencies(*) def load_dependencies(*)
@ -17,14 +19,28 @@ module HTTPX
end end
def cacheable_request?(request) def cacheable_request?(request)
CACHEABLE_VERBS.include?(request.verb) CACHEABLE_VERBS.include?(request.verb) &&
(
!request.headers.key?("cache-control") || !request.headers.get("cache-control").include?("no-store")
)
end end
def cacheable_response?(response) def cacheable_response?(response)
response.is_a?(Response) && response.is_a?(Response) &&
# partial responses shall not be cached, only full ones. (
response.cache_control.nil? ||
# TODO: !response.cache_control.include?("private") && is shared cache
!response.cache_control.include?("no-store")
) &&
CACHEABLE_STATUS_CODES.include?(response.status) &&
# RFC 2616 13.4 - A response received with a status code of 200, 203, 206, 300, 301 or
# 410 MAY be stored by a cache and used in reply to a subsequent
# request, subject to the expiration mechanism, unless a cache-control
# directive prohibits caching. However, a cache that does not support
# the Range and Content-Range headers MUST NOT cache 206 (Partial
# Content) responses.
response.status != 206 && ( response.status != 206 && (
response.headers.key?("etag") || response.headers.key?("last-modified-at") response.headers.key?("etag") || response.headers.key?("last-modified-at") || response.fresh?
) )
end end
@ -52,7 +68,7 @@ module HTTPX
def build_request(*) def build_request(*)
request = super request = super
return request unless ResponseCache.cacheable_request?(request) && @options.response_cache_store.cached?(request.uri) return request unless ResponseCache.cacheable_request?(request) && @options.response_cache_store.cached?(request)
@options.response_cache_store.prepare(request) @options.response_cache_store.prepare(request)
@ -62,25 +78,99 @@ module HTTPX
def fetch_response(request, *) def fetch_response(request, *)
response = super response = super
if response && ResponseCache.cached_response?(response) return unless response
if ResponseCache.cached_response?(response)
log { "returning cached response for #{request.uri}" } log { "returning cached response for #{request.uri}" }
cached_response = @options.response_cache_store.lookup(request.uri) cached_response = @options.response_cache_store.lookup(request)
response.copy_from_cached(cached_response) response.copy_from_cached(cached_response)
else
@options.response_cache_store.cache(request, response)
end end
@options.response_cache_store.cache(request.uri, response) if response && ResponseCache.cacheable_response?(response)
response response
end end
end end
module RequestMethods
def response_cache_key
@response_cache_key ||= Digest::SHA1.hexdigest("httpx-response-cache-#{@verb}#{@uri}")
end
end
module ResponseMethods module ResponseMethods
def copy_from_cached(other) def copy_from_cached(other)
@body = other.body @body = other.body
@body.__send__(:rewind) @body.__send__(:rewind)
end end
# A response is fresh if its age has not yet exceeded its freshness lifetime.
def fresh?
if cache_control
return false if cache_control.include?("no-cache")
# check age: max-age
max_age = cache_control.find { |directive| directive.start_with?("s-maxage") }
max_age ||= cache_control.find { |directive| directive.start_with?("max-age") }
max_age = max_age[/age=(\d+)/, 1] if max_age
max_age = max_age.to_i if max_age
return max_age > age if max_age
end
# check age: expires
if @headers.key?("expires")
begin
expires = Time.httpdate(@headers["expires"])
rescue ArgumentError
return true
end
return (expires - Time.now).to_i.positive?
end
true
end
def cache_control
return @cache_control if defined?(@cache_control)
@cache_control = begin
return unless @headers.key?("cache-control")
@headers["cache-control"].split(/ *, */)
end
end
def vary
return @vary if defined?(@vary)
@vary = begin
return unless @headers.key?("vary")
@headers["vary"].split(/ *, */)
end
end
private
def age
return @headers["age"].to_i if @headers.key?("age")
(Time.now - date).to_i
end
def date
@date ||= Time.httpdate(@headers["date"])
rescue NoMethodError, ArgumentError
Time.now.httpdate
end
end end
end end
register_plugin :response_cache, ResponseCache register_plugin :response_cache, ResponseCache

View File

@ -13,35 +13,38 @@ module HTTPX::Plugins
@store = {} @store = {}
end end
def lookup(uri) def lookup(request)
@store[uri] responses = @store[request.response_cache_key]
return unless responses
response = responses.find(&method(:match_by_vary?).curry(2)[request])
return unless response && response.fresh?
response
end end
def cached?(uri) def cached?(request)
@store.key?(uri) lookup(request)
end end
def cache(uri, response) def cache(request, response)
@store[uri] = response return unless ResponseCache.cacheable_request?(request) && ResponseCache.cacheable_response?(response)
responses = (@store[request.response_cache_key] ||= [])
responses.reject!(&method(:match_by_vary?).curry(2)[request])
responses << response
end end
def prepare(request) def prepare(request)
cached_response = @store[request.uri] cached_response = lookup(request)
return unless cached_response return unless cached_response
original_request = cached_response.instance_variable_get(:@request) return unless match_by_vary?(request, cached_response)
if (vary = cached_response.headers["vary"])
if vary == "*"
return unless request.headers.same_headers?(original_request.headers)
else
return unless vary.split(/ *, */).all? do |cache_field|
cache_field.downcase!
!original_request.headers.key?(cache_field) || request.headers[cache_field] == original_request.headers[cache_field]
end
end
end
if !request.headers.key?("if-modified-since") && (last_modified = cached_response.headers["last-modified"]) if !request.headers.key?("if-modified-since") && (last_modified = cached_response.headers["last-modified"])
request.headers.add("if-modified-since", last_modified) request.headers.add("if-modified-since", last_modified)
@ -51,6 +54,23 @@ module HTTPX::Plugins
request.headers.add("if-none-match", etag) request.headers.add("if-none-match", etag)
end end
end end
private
def match_by_vary?(request, response)
vary = response.vary
return true unless vary
original_request = response.instance_variable_get(:@request)
return request.headers.same_headers?(original_request.headers) if vary == %w[*]
vary.all? do |cache_field|
cache_field.downcase!
!original_request.headers.key?(cache_field) || request.headers[cache_field] == original_request.headers[cache_field]
end
end
end end
end end
end end

View File

@ -7,7 +7,7 @@ require "httpx/resolver"
module HTTPX module HTTPX
class Pool class Pool
using ArrayExtensions using ArrayExtensions::FilterMap
extend Forwardable extend Forwardable
def_delegator :@timers, :after def_delegator :@timers, :after

View File

@ -6,7 +6,7 @@ require "resolv"
module HTTPX module HTTPX
class Resolver::Multi class Resolver::Multi
include Callbacks include Callbacks
using ArrayExtensions using ArrayExtensions::FilterMap
attr_reader :resolvers attr_reader :resolvers

View File

@ -8,6 +8,8 @@ module HTTPX
include Callbacks include Callbacks
include Loggable include Loggable
using ArrayExtensions::Intersect
RECORD_TYPES = { RECORD_TYPES = {
Socket::AF_INET6 => Resolv::DNS::Resource::IN::AAAA, Socket::AF_INET6 => Resolv::DNS::Resource::IN::AAAA,
Socket::AF_INET => Resolv::DNS::Resource::IN::A, Socket::AF_INET => Resolv::DNS::Resource::IN::A,
@ -48,6 +50,10 @@ module HTTPX
addresses.map! do |address| addresses.map! do |address|
address.is_a?(IPAddr) ? address : IPAddr.new(address.to_s) address.is_a?(IPAddr) ? address : IPAddr.new(address.to_s)
end end
# double emission check
return if connection.addresses && !addresses.intersect?(connection.addresses)
log { "resolver: answer #{connection.origin.host}: #{addresses.inspect}" } log { "resolver: answer #{connection.origin.host}: #{addresses.inspect}" }
if @pool && # if triggered by early resolve, pool may not be here yet if @pool && # if triggered by early resolve, pool may not be here yet
!connection.io && !connection.io &&
@ -56,8 +62,12 @@ module HTTPX
addresses.first.to_s != connection.origin.host.to_s addresses.first.to_s != connection.origin.host.to_s
log { "resolver: A response, applying resolution delay..." } log { "resolver: A response, applying resolution delay..." }
@pool.after(0.05) do @pool.after(0.05) do
connection.addresses = addresses # double emission check
emit(:resolve, connection) unless connection.addresses && addresses.intersect?(connection.addresses)
connection.addresses = addresses
emit(:resolve, connection)
end
end end
else else
connection.addresses = addresses connection.addresses = addresses

View File

@ -310,9 +310,12 @@ module HTTPX
class ErrorResponse class ErrorResponse
include Loggable include Loggable
extend Forwardable
attr_reader :request, :error attr_reader :request, :error
def_delegator :@request, :uri
def initialize(request, error, options) def initialize(request, error, options)
@request = request @request = request
@error = error @error = error

View File

@ -9,7 +9,7 @@ module HTTPX::Transcoder
module_function module_function
class Encoder class Encoder
using HTTPX::ArrayExtensions using HTTPX::ArrayExtensions::Sum
extend Forwardable extend Forwardable
def_delegator :@raw, :to_s def_delegator :@raw, :to_s

View File

@ -1,5 +1,5 @@
# frozen_string_literal: true # frozen_string_literal: true
module HTTPX module HTTPX
VERSION = "0.20.3" VERSION = "0.20.4"
end end

View File

@ -8,15 +8,19 @@ module HTTPX
def self?.cached_response?: (response response) -> bool def self?.cached_response?: (response response) -> bool
class Store class Store
@store: Hash[URI::Generic, Response] @store: Hash[String, Array[Response]]
def lookup: (URI::Generic uri) -> Response? def lookup: (Request request) -> Response?
def cached?: (URI::Generic uri) -> bool def cached?: (Request request) -> boolish
def cache: (URI::Generic uri, Response response) -> void def cache: (Request request, Response response) -> void
def prepare: (Request request) -> void def prepare: (Request request) -> void
private
def match_by_vary?: (Request request, Response response) -> bool
end end
module InstanceMethods module InstanceMethods
@ -25,8 +29,24 @@ module HTTPX
def clear_response_cache: () -> void def clear_response_cache: () -> void
end end
module RequestMethods
def response_cache_key: () -> String
end
module ResponseMethods module ResponseMethods
def copy_from_cached: (Response other) -> void def copy_from_cached: (Response other) -> void
def fresh?: () -> bool
def cache_control: () -> Array[String]?
def vary: () -> Array[String]?
private
def age: () -> Integer
def date: () -> Time
end end
end end

View File

@ -96,6 +96,7 @@ module HTTPX
class ErrorResponse class ErrorResponse
include _Response include _Response
include Loggable include Loggable
extend Forwardable
@options: Options @options: Options
@error: Exception @error: Exception
@ -104,6 +105,8 @@ module HTTPX
def status: () -> (Integer | _ToS) def status: () -> (Integer | _ToS)
def uri: () -> URI::Generic
private private
def initialize: (Request, Exception, options) -> untyped def initialize: (Request, Exception, options) -> untyped

View File

@ -7,55 +7,105 @@ class ResponseCacheStoreTest < Minitest::Test
include HTTPX include HTTPX
def test_store_cache def test_store_cache
request = Request.new(:get, "http://example.com/") request = request_class.new(:get, "http://example.com/")
response = cached_response(request) response = cached_response(request)
assert store.lookup(request.uri) == response assert store.lookup(request) == response
assert store.cached?(request.uri) assert store.cached?(request)
request2 = Request.new(:get, "http://example.com/", headers: { "accept" => "text/plain" }) request2 = request_class.new(:get, "http://example.com/", headers: { "accept" => "text/plain" })
assert store.lookup(request2.uri) == response assert store.lookup(request2) == response
request3 = request_class.new(:post, "http://example.com/", headers: { "accept" => "text/plain" })
assert store.lookup(request3).nil?
end
def test_store_error_status
request = request_class.new(:get, "http://example.com/")
_response = cached_response(request, status: 404)
assert !store.cached?(request)
_response = cached_response(request, status: 410)
assert store.cached?(request)
end
def test_store_no_store
request = request_class.new(:get, "http://example.com/")
_response = cached_response(request, extra_headers: { "cache-control" => "private, no-store" })
assert !store.cached?(request)
end
def test_store_maxage
request = request_class.new(:get, "http://example.com/")
response = cached_response(request, extra_headers: { "cache-control" => "max-age=2" })
assert store.lookup(request) == response
sleep(3)
assert store.lookup(request).nil?
request2 = request_class.new(:get, "http://example2.com/")
_response2 = cached_response(request2, extra_headers: { "cache-control" => "no-cache, max-age=2" })
assert store.lookup(request2).nil?
end
def test_store_expires
request = request_class.new(:get, "http://example.com/")
response = cached_response(request, extra_headers: { "expires" => (Time.now + 2).httpdate })
assert store.lookup(request) == response
sleep(3)
assert store.lookup(request).nil?
request2 = request_class.new(:get, "http://example2.com/")
_response2 = cached_response(request2, extra_headers: { "cache-control" => "no-cache", "expires" => (Time.now + 2).httpdate })
assert store.lookup(request2).nil?
end end
def test_prepare_vary def test_prepare_vary
request = Request.new(:get, "http://example.com/", headers: { "accept" => "text/plain" }) request = request_class.new(:get, "http://example.com/", headers: { "accept" => "text/plain" })
cached_response(request, { "vary" => "Accept" }) cached_response(request, extra_headers: { "vary" => "Accept" })
request2 = Request.new(:get, "http://example.com/", headers: { "accept" => "text/html" }) request2 = request_class.new(:get, "http://example.com/", headers: { "accept" => "text/html" })
store.prepare(request2) store.prepare(request2)
assert !request2.headers.key?("if-none-match") assert !request2.headers.key?("if-none-match")
request3 = Request.new(:get, "http://example.com/", headers: { "accept" => "text/plain" }) request3 = request_class.new(:get, "http://example.com/", headers: { "accept" => "text/plain" })
store.prepare(request3) store.prepare(request3)
assert request3.headers.key?("if-none-match") assert request3.headers.key?("if-none-match")
request4 = Request.new(:get, "http://example.com/", headers: { "accept" => "text/plain", "user-agent" => "Linux Bowser" }) request4 = request_class.new(:get, "http://example.com/", headers: { "accept" => "text/plain", "user-agent" => "Linux Bowser" })
store.prepare(request4) store.prepare(request4)
assert request4.headers.key?("if-none-match") assert request4.headers.key?("if-none-match")
end end
def test_prepare_vary_asterisk def test_prepare_vary_asterisk
request = Request.new(:get, "http://example.com/", headers: { "accept" => "text/plain" }) request = request_class.new(:get, "http://example.com/", headers: { "accept" => "text/plain" })
cached_response(request, { "vary" => "*" }) cached_response(request, extra_headers: { "vary" => "*" })
request2 = Request.new(:get, "http://example.com/", headers: { "accept" => "text/html" }) request2 = request_class.new(:get, "http://example.com/", headers: { "accept" => "text/html" })
store.prepare(request2) store.prepare(request2)
assert !request2.headers.key?("if-none-match") assert !request2.headers.key?("if-none-match")
request3 = Request.new(:get, "http://example.com/", headers: { "accept" => "text/plain" }) request3 = request_class.new(:get, "http://example.com/", headers: { "accept" => "text/plain" })
store.prepare(request3) store.prepare(request3)
assert request3.headers.key?("if-none-match") assert request3.headers.key?("if-none-match")
request4 = Request.new(:get, "http://example.com/", headers: { "accept" => "text/plain", "user-agent" => "Linux Bowser" }) request4 = request_class.new(:get, "http://example.com/", headers: { "accept" => "text/plain", "user-agent" => "Linux Bowser" })
store.prepare(request4) store.prepare(request4)
assert !request4.headers.key?("if-none-match") assert !request4.headers.key?("if-none-match")
end end
private private
def request_class
@request_class ||= HTTPX.plugin(:response_cache).class.default_options.request_class
end
def response_class
@response_class ||= HTTPX.plugin(:response_cache).class.default_options.response_class
end
def store def store
@store ||= Plugins::ResponseCache::Store.new @store ||= Plugins::ResponseCache::Store.new
end end
def cached_response(request, extra_headers = {}) def cached_response(request, status: 200, extra_headers: {})
response = Response.new(request, 200, "2.0", { "etag" => "ETAG" }.merge(extra_headers)) response = response_class.new(request, status, "2.0", { "date" => Time.now.httpdate, "etag" => "ETAG" }.merge(extra_headers))
store.cache(request.uri, response) store.cache(request, response)
response response
end end
end end

View File

@ -1,4 +1,7 @@
- -
-
name: "0.20.4"
path: "0_20_4_md.html"
- -
name: "0.20.3" name: "0.20.3"
path: "0_20_3_md.html" path: "0_20_3_md.html"