Compare commits

...

8 Commits

Author SHA1 Message Date
HoneyryderChuck
da0ef24b09 bumped version to 0.24.4 2022-08-01 22:46:41 +01:00
HoneyryderChuck
8359d6b007 Merge branch 'issue-200' into 'master'
response_cache: fixes and improvements

Closes #200

See merge request honeyryderchuck/httpx!216
2022-08-01 17:50:21 +00:00
HoneyryderChuck
e691cbdf5e Merge branch 'github-issue-5' into 'master'
fix for loop on resolution and retry on new connection

See merge request honeyryderchuck/httpx!215
2022-08-01 17:49:45 +00:00
HoneyryderChuck
61c36c4ef9 response cache: caching several instances for the same URL
by relying on vary header, this should have the effect of not
overflowinng, and doing what the user wants.
2022-08-01 18:40:21 +01:00
HoneyryderChuck
f48f4e414a Fixes and improvements on the response_cache plugin
The following improvements were done:

* only cacheable status codes are allowed now (200, 203, 300, 301, 410)
* only responses considered fresh are cached; fresh response means:
  * no-store directive not present in cache-control
  * response hasn’t expired (when s-maxage, max-age or expires are
      present)
2022-08-01 17:40:05 +01:00
HoneyryderChuck
122b75a14c fixing hackernews script 2022-07-31 19:07:41 +01:00
HoneyryderChuck
b0777c61e5 fix for loop on resolution and retry on new connection
A certain behaviour was observed, when performing some tests using the
hackernews script, where after a failed request on a non-initiated
connection, a new DNS resolution would be emitted, although the
connection still had other IPs to try on. This led to a cascading
behaviour where the DNS response would fill up the connection with the
same repeated IPs and trigger coalescing, which would loop indefinitely
after emitting the resolve event.

This was fixed by not allowing DNS resolution on already resolved names,
to propagate to connections which already contain the advertised IPs.

This seems to address the github issue 5, which description matches the
observed behaviour.
2022-07-31 19:07:41 +01:00
HoneyryderChuck
32a81f2025 fix: response cache now also takes verb into account when caching
The previous strategy was working only with URLs. This strategy would
fall flat if the same url could be used with several HTTP verbs.
2022-07-31 17:03:30 +01:00
17 changed files with 318 additions and 71 deletions

View File

@ -116,6 +116,7 @@ group :assorted do
gem "pry-byebug", "~> 3.4.3"
else
gem "pry-byebug"
gem "debug" if RUBY_VERSION >= "3.1.0"
end
end
end

View File

@ -0,0 +1,17 @@
# 0.20.4
## Improvements
The `:response_cache` plugin is now more compliant with how the RFC 2616 defines which behaviour caches shall have:
* it caches only responses with one of the following status codes: 200, 203, 300, 301, 410.
* it discards cached responses which become stale.
* it supports "cache-control" header directives to decided when to cache, to store, what the response "age" is.
* it can cache more than one response for the same request, provided that the request presents different header values for the headers declared in the "vary" response header (previously, it was only caching the first response, and discarding the remainder).
## Bugfixes
* fixed DNS resolution bug which caused a loop when a failed connection attempt would cause a new DNS request to be triggered for the same domain, filling up and giving preference to the very IP which failed the attempt.
* response_cache: request verb is now taken into account, not causing HEAD/GET confusion for the same URL.

View File

@ -1,25 +1,36 @@
require "httpx"
require "oga"
http = HTTPX.plugin(:compression).plugin(:persistent).with(timeout: { operation_timeut: 5, connect_timeout: 5})
PAGES = (ARGV.first || 10).to_i
pages = PAGES.times.map do |page|
"https://news.ycombinator.com/?p=#{page+1}"
end
links = []
HTTPX.plugin(:compression).get(*pages).each_with_index.map do |response, i|
links = Array.new(PAGES) { [] }
Array(http.get(*pages)).each_with_index.map do |response, i|
if response.is_a?(HTTPX::ErrorResponse)
puts "error: #{response.error}"
next
end
html = Oga.parse_html(response.to_s)
page_links = html.css('.itemlist a.storylink').map{|link| link.get('href') }
# binding.irb
page_links = html.css('.itemlist a.titlelink').map{|link| link.get('href') }
puts "page(#{i+1}): #{page_links.size}"
if page_links.size == 0
puts "error(#{response.status}) on page #{i+1}"
end
links << page_links
# page_links.each do |link|
# puts "link: #{link}"
# links[i] << http.get(link)
# end
links[i].concat(http.get(*page_links))
end
links = links.flatten
puts "Pages: #{PAGES}\t Links: #{links.size}"
links = links.each_with_index do |pages, i|
puts "Page: #{i+1}\t Links: #{pages.size}"
pages.each do |page|
puts "URL: #{page.uri} (#{page.status})"
end
end

View File

@ -83,22 +83,41 @@ module HTTPX
end
module ArrayExtensions
refine Array do
module FilterMap
refine Array do
def filter_map
return to_enum(:filter_map) unless block_given?
def filter_map
return to_enum(:filter_map) unless block_given?
each_with_object([]) do |item, res|
processed = yield(item)
res << processed if processed
each_with_object([]) do |item, res|
processed = yield(item)
res << processed if processed
end
end
end unless Array.method_defined?(:filter_map)
end
def sum(accumulator = 0, &block)
values = block_given? ? map(&block) : self
values.inject(accumulator, :+)
module Sum
refine Array do
def sum(accumulator = 0, &block)
values = block_given? ? map(&block) : self
values.inject(accumulator, :+)
end
end unless Array.method_defined?(:sum)
end
module Intersect
refine Array do
def intersect?(arr)
if size < arr.size
smaller = self
else
smaller, arr = arr, self
end
(array & smaller).size > 0
end
end unless Array.method_defined?(:intersect?)
end
end
module IOExtensions

View File

@ -72,7 +72,7 @@ module HTTPX
end
module ResponseBodyMethods
using ArrayExtensions
using ArrayExtensions::FilterMap
attr_reader :encodings

View File

@ -9,7 +9,9 @@ module HTTPX
#
module ResponseCache
CACHEABLE_VERBS = %i[get head].freeze
CACHEABLE_STATUS_CODES = [200, 203, 206, 300, 301, 410].freeze
private_constant :CACHEABLE_VERBS
private_constant :CACHEABLE_STATUS_CODES
class << self
def load_dependencies(*)
@ -17,14 +19,28 @@ module HTTPX
end
def cacheable_request?(request)
CACHEABLE_VERBS.include?(request.verb)
CACHEABLE_VERBS.include?(request.verb) &&
(
!request.headers.key?("cache-control") || !request.headers.get("cache-control").include?("no-store")
)
end
def cacheable_response?(response)
response.is_a?(Response) &&
# partial responses shall not be cached, only full ones.
(
response.cache_control.nil? ||
# TODO: !response.cache_control.include?("private") && is shared cache
!response.cache_control.include?("no-store")
) &&
CACHEABLE_STATUS_CODES.include?(response.status) &&
# RFC 2616 13.4 - A response received with a status code of 200, 203, 206, 300, 301 or
# 410 MAY be stored by a cache and used in reply to a subsequent
# request, subject to the expiration mechanism, unless a cache-control
# directive prohibits caching. However, a cache that does not support
# the Range and Content-Range headers MUST NOT cache 206 (Partial
# Content) responses.
response.status != 206 && (
response.headers.key?("etag") || response.headers.key?("last-modified-at")
response.headers.key?("etag") || response.headers.key?("last-modified-at") || response.fresh?
)
end
@ -52,7 +68,7 @@ module HTTPX
def build_request(*)
request = super
return request unless ResponseCache.cacheable_request?(request) && @options.response_cache_store.cached?(request.uri)
return request unless ResponseCache.cacheable_request?(request) && @options.response_cache_store.cached?(request)
@options.response_cache_store.prepare(request)
@ -62,25 +78,99 @@ module HTTPX
def fetch_response(request, *)
response = super
if response && ResponseCache.cached_response?(response)
return unless response
if ResponseCache.cached_response?(response)
log { "returning cached response for #{request.uri}" }
cached_response = @options.response_cache_store.lookup(request.uri)
cached_response = @options.response_cache_store.lookup(request)
response.copy_from_cached(cached_response)
else
@options.response_cache_store.cache(request, response)
end
@options.response_cache_store.cache(request.uri, response) if response && ResponseCache.cacheable_response?(response)
response
end
end
module RequestMethods
def response_cache_key
@response_cache_key ||= Digest::SHA1.hexdigest("httpx-response-cache-#{@verb}#{@uri}")
end
end
module ResponseMethods
def copy_from_cached(other)
@body = other.body
@body.__send__(:rewind)
end
# A response is fresh if its age has not yet exceeded its freshness lifetime.
def fresh?
if cache_control
return false if cache_control.include?("no-cache")
# check age: max-age
max_age = cache_control.find { |directive| directive.start_with?("s-maxage") }
max_age ||= cache_control.find { |directive| directive.start_with?("max-age") }
max_age = max_age[/age=(\d+)/, 1] if max_age
max_age = max_age.to_i if max_age
return max_age > age if max_age
end
# check age: expires
if @headers.key?("expires")
begin
expires = Time.httpdate(@headers["expires"])
rescue ArgumentError
return true
end
return (expires - Time.now).to_i.positive?
end
true
end
def cache_control
return @cache_control if defined?(@cache_control)
@cache_control = begin
return unless @headers.key?("cache-control")
@headers["cache-control"].split(/ *, */)
end
end
def vary
return @vary if defined?(@vary)
@vary = begin
return unless @headers.key?("vary")
@headers["vary"].split(/ *, */)
end
end
private
def age
return @headers["age"].to_i if @headers.key?("age")
(Time.now - date).to_i
end
def date
@date ||= Time.httpdate(@headers["date"])
rescue NoMethodError, ArgumentError
Time.now.httpdate
end
end
end
register_plugin :response_cache, ResponseCache

View File

@ -13,35 +13,38 @@ module HTTPX::Plugins
@store = {}
end
def lookup(uri)
@store[uri]
def lookup(request)
responses = @store[request.response_cache_key]
return unless responses
response = responses.find(&method(:match_by_vary?).curry(2)[request])
return unless response && response.fresh?
response
end
def cached?(uri)
@store.key?(uri)
def cached?(request)
lookup(request)
end
def cache(uri, response)
@store[uri] = response
def cache(request, response)
return unless ResponseCache.cacheable_request?(request) && ResponseCache.cacheable_response?(response)
responses = (@store[request.response_cache_key] ||= [])
responses.reject!(&method(:match_by_vary?).curry(2)[request])
responses << response
end
def prepare(request)
cached_response = @store[request.uri]
cached_response = lookup(request)
return unless cached_response
original_request = cached_response.instance_variable_get(:@request)
if (vary = cached_response.headers["vary"])
if vary == "*"
return unless request.headers.same_headers?(original_request.headers)
else
return unless vary.split(/ *, */).all? do |cache_field|
cache_field.downcase!
!original_request.headers.key?(cache_field) || request.headers[cache_field] == original_request.headers[cache_field]
end
end
end
return unless match_by_vary?(request, cached_response)
if !request.headers.key?("if-modified-since") && (last_modified = cached_response.headers["last-modified"])
request.headers.add("if-modified-since", last_modified)
@ -51,6 +54,23 @@ module HTTPX::Plugins
request.headers.add("if-none-match", etag)
end
end
private
def match_by_vary?(request, response)
vary = response.vary
return true unless vary
original_request = response.instance_variable_get(:@request)
return request.headers.same_headers?(original_request.headers) if vary == %w[*]
vary.all? do |cache_field|
cache_field.downcase!
!original_request.headers.key?(cache_field) || request.headers[cache_field] == original_request.headers[cache_field]
end
end
end
end
end

View File

@ -7,7 +7,7 @@ require "httpx/resolver"
module HTTPX
class Pool
using ArrayExtensions
using ArrayExtensions::FilterMap
extend Forwardable
def_delegator :@timers, :after

View File

@ -6,7 +6,7 @@ require "resolv"
module HTTPX
class Resolver::Multi
include Callbacks
using ArrayExtensions
using ArrayExtensions::FilterMap
attr_reader :resolvers

View File

@ -8,6 +8,8 @@ module HTTPX
include Callbacks
include Loggable
using ArrayExtensions::Intersect
RECORD_TYPES = {
Socket::AF_INET6 => Resolv::DNS::Resource::IN::AAAA,
Socket::AF_INET => Resolv::DNS::Resource::IN::A,
@ -48,6 +50,10 @@ module HTTPX
addresses.map! do |address|
address.is_a?(IPAddr) ? address : IPAddr.new(address.to_s)
end
# double emission check
return if connection.addresses && !addresses.intersect?(connection.addresses)
log { "resolver: answer #{connection.origin.host}: #{addresses.inspect}" }
if @pool && # if triggered by early resolve, pool may not be here yet
!connection.io &&
@ -56,8 +62,12 @@ module HTTPX
addresses.first.to_s != connection.origin.host.to_s
log { "resolver: A response, applying resolution delay..." }
@pool.after(0.05) do
connection.addresses = addresses
emit(:resolve, connection)
# double emission check
unless connection.addresses && addresses.intersect?(connection.addresses)
connection.addresses = addresses
emit(:resolve, connection)
end
end
else
connection.addresses = addresses

View File

@ -310,9 +310,12 @@ module HTTPX
class ErrorResponse
include Loggable
extend Forwardable
attr_reader :request, :error
def_delegator :@request, :uri
def initialize(request, error, options)
@request = request
@error = error

View File

@ -9,7 +9,7 @@ module HTTPX::Transcoder
module_function
class Encoder
using HTTPX::ArrayExtensions
using HTTPX::ArrayExtensions::Sum
extend Forwardable
def_delegator :@raw, :to_s

View File

@ -1,5 +1,5 @@
# frozen_string_literal: true
module HTTPX
VERSION = "0.20.3"
VERSION = "0.20.4"
end

View File

@ -8,15 +8,19 @@ module HTTPX
def self?.cached_response?: (response response) -> bool
class Store
@store: Hash[URI::Generic, Response]
@store: Hash[String, Array[Response]]
def lookup: (URI::Generic uri) -> Response?
def lookup: (Request request) -> Response?
def cached?: (URI::Generic uri) -> bool
def cached?: (Request request) -> boolish
def cache: (URI::Generic uri, Response response) -> void
def cache: (Request request, Response response) -> void
def prepare: (Request request) -> void
private
def match_by_vary?: (Request request, Response response) -> bool
end
module InstanceMethods
@ -25,8 +29,24 @@ module HTTPX
def clear_response_cache: () -> void
end
module RequestMethods
def response_cache_key: () -> String
end
module ResponseMethods
def copy_from_cached: (Response other) -> void
def fresh?: () -> bool
def cache_control: () -> Array[String]?
def vary: () -> Array[String]?
private
def age: () -> Integer
def date: () -> Time
end
end

View File

@ -96,6 +96,7 @@ module HTTPX
class ErrorResponse
include _Response
include Loggable
extend Forwardable
@options: Options
@error: Exception
@ -104,6 +105,8 @@ module HTTPX
def status: () -> (Integer | _ToS)
def uri: () -> URI::Generic
private
def initialize: (Request, Exception, options) -> untyped

View File

@ -7,55 +7,105 @@ class ResponseCacheStoreTest < Minitest::Test
include HTTPX
def test_store_cache
request = Request.new(:get, "http://example.com/")
request = request_class.new(:get, "http://example.com/")
response = cached_response(request)
assert store.lookup(request.uri) == response
assert store.cached?(request.uri)
assert store.lookup(request) == response
assert store.cached?(request)
request2 = Request.new(:get, "http://example.com/", headers: { "accept" => "text/plain" })
assert store.lookup(request2.uri) == response
request2 = request_class.new(:get, "http://example.com/", headers: { "accept" => "text/plain" })
assert store.lookup(request2) == response
request3 = request_class.new(:post, "http://example.com/", headers: { "accept" => "text/plain" })
assert store.lookup(request3).nil?
end
def test_store_error_status
request = request_class.new(:get, "http://example.com/")
_response = cached_response(request, status: 404)
assert !store.cached?(request)
_response = cached_response(request, status: 410)
assert store.cached?(request)
end
def test_store_no_store
request = request_class.new(:get, "http://example.com/")
_response = cached_response(request, extra_headers: { "cache-control" => "private, no-store" })
assert !store.cached?(request)
end
def test_store_maxage
request = request_class.new(:get, "http://example.com/")
response = cached_response(request, extra_headers: { "cache-control" => "max-age=2" })
assert store.lookup(request) == response
sleep(3)
assert store.lookup(request).nil?
request2 = request_class.new(:get, "http://example2.com/")
_response2 = cached_response(request2, extra_headers: { "cache-control" => "no-cache, max-age=2" })
assert store.lookup(request2).nil?
end
def test_store_expires
request = request_class.new(:get, "http://example.com/")
response = cached_response(request, extra_headers: { "expires" => (Time.now + 2).httpdate })
assert store.lookup(request) == response
sleep(3)
assert store.lookup(request).nil?
request2 = request_class.new(:get, "http://example2.com/")
_response2 = cached_response(request2, extra_headers: { "cache-control" => "no-cache", "expires" => (Time.now + 2).httpdate })
assert store.lookup(request2).nil?
end
def test_prepare_vary
request = Request.new(:get, "http://example.com/", headers: { "accept" => "text/plain" })
cached_response(request, { "vary" => "Accept" })
request = request_class.new(:get, "http://example.com/", headers: { "accept" => "text/plain" })
cached_response(request, extra_headers: { "vary" => "Accept" })
request2 = Request.new(:get, "http://example.com/", headers: { "accept" => "text/html" })
request2 = request_class.new(:get, "http://example.com/", headers: { "accept" => "text/html" })
store.prepare(request2)
assert !request2.headers.key?("if-none-match")
request3 = Request.new(:get, "http://example.com/", headers: { "accept" => "text/plain" })
request3 = request_class.new(:get, "http://example.com/", headers: { "accept" => "text/plain" })
store.prepare(request3)
assert request3.headers.key?("if-none-match")
request4 = Request.new(:get, "http://example.com/", headers: { "accept" => "text/plain", "user-agent" => "Linux Bowser" })
request4 = request_class.new(:get, "http://example.com/", headers: { "accept" => "text/plain", "user-agent" => "Linux Bowser" })
store.prepare(request4)
assert request4.headers.key?("if-none-match")
end
def test_prepare_vary_asterisk
request = Request.new(:get, "http://example.com/", headers: { "accept" => "text/plain" })
cached_response(request, { "vary" => "*" })
request = request_class.new(:get, "http://example.com/", headers: { "accept" => "text/plain" })
cached_response(request, extra_headers: { "vary" => "*" })
request2 = Request.new(:get, "http://example.com/", headers: { "accept" => "text/html" })
request2 = request_class.new(:get, "http://example.com/", headers: { "accept" => "text/html" })
store.prepare(request2)
assert !request2.headers.key?("if-none-match")
request3 = Request.new(:get, "http://example.com/", headers: { "accept" => "text/plain" })
request3 = request_class.new(:get, "http://example.com/", headers: { "accept" => "text/plain" })
store.prepare(request3)
assert request3.headers.key?("if-none-match")
request4 = Request.new(:get, "http://example.com/", headers: { "accept" => "text/plain", "user-agent" => "Linux Bowser" })
request4 = request_class.new(:get, "http://example.com/", headers: { "accept" => "text/plain", "user-agent" => "Linux Bowser" })
store.prepare(request4)
assert !request4.headers.key?("if-none-match")
end
private
def request_class
@request_class ||= HTTPX.plugin(:response_cache).class.default_options.request_class
end
def response_class
@response_class ||= HTTPX.plugin(:response_cache).class.default_options.response_class
end
def store
@store ||= Plugins::ResponseCache::Store.new
end
def cached_response(request, extra_headers = {})
response = Response.new(request, 200, "2.0", { "etag" => "ETAG" }.merge(extra_headers))
store.cache(request.uri, response)
def cached_response(request, status: 200, extra_headers: {})
response = response_class.new(request, status, "2.0", { "date" => Time.now.httpdate, "etag" => "ETAG" }.merge(extra_headers))
store.cache(request, response)
response
end
end

View File

@ -1,4 +1,7 @@
-
-
name: "0.20.4"
path: "0_20_4_md.html"
-
name: "0.20.3"
path: "0_20_3_md.html"