Mirror of https://github.com/HoneyryderChuck/httpx.git, synced 2025-10-05 00:02:38 -04:00

Merge branch 'blog' into 'master'

post

See merge request honeyryderchuck/httpx!160

Commit f18da78832
@@ -20,8 +20,6 @@ variables:
  stage: test
  services:
    - docker:dind
  except:
    - blog
  artifacts:
    paths:
      - coverage/
@@ -115,8 +113,6 @@ coverage:
  stage: prepare
  variables:
    BUNDLE_WITHOUT: test:website:assorted
  except:
    - blog

  image: "ruby:3.0-alpine"
  script:
@@ -10,6 +10,13 @@ links:
  gitlab: https://gitlab.com/honeyryderchuck/httpx
  rubygems: https://rubygems.org

keywords:
  - httpx
  - ruby
  - documentation
  - wiki
  - HTTP/2

pagination:
  enabled: true
  debug: true
@@ -1,7 +1,8 @@
<head>
  <meta charset="UTF-8">
  <meta http-equiv="content-type" content="text/html; charset=utf-8">
  <meta http-equiv="X-UA-Compatible" content="IE=edge">
  <link href="http://gmpg.org/xfn/11" rel="profile">

  <!-- Enable responsiveness on mobile devices-->
  <meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1">
@@ -13,6 +14,9 @@
    {{ page.title }} · {{ site.title }}
  {% endif %}
  </title>
  <link href="{{ site.baseurl }}" rel="canonical">
  <link href="{{ site.description }}" rel="description">
  <meta name="keywords" content="{{ site.keywords | join: ', ' | append: ', ' | append: page.keywords }}" />

  <!-- CSS -->
  <link rel="stylesheet" href="{{ '/styles/poole.css' | prepend: site.baseurl }}">
@@ -1,6 +1,7 @@
---
layout: post
title: Welcome to HTTPX the blog
keywords: introduction, first post
---

First of all, welcome. This is the first post about HTTPX, the ruby http client library for the future.

@@ -11,4 +12,3 @@ but a milestone has been reached: httpx is now part of the [awesome-ruby resourc

So long folks. Be excellent to each other.
@@ -1,6 +1,7 @@
---
layout: post
title: Fallacies about HTTP
keywords: HTTP, protocol, mistakes, advanced, streaming, upgrade
---

When I first started working on `httpx`, I wanted to support as many HTTP features and corner cases as possible. Although I wasn't exhaustively devouring the RFCs looking for things to implement, I was rather hoping that my experience with and knowledge of different HTTP tools (cURL, postman, different HTTP libraries in different languages) could help me narrow them down.
@@ -16,11 +17,11 @@ Some of these packages were probably created by the same developers mentioned ab

One of the most widespread axioms about HTTP is that it is a "request-response" protocol. And in a way, this might have been how it was designed in the beginning: send a request, receive a response. However, things started getting more complicated.

First, redirects came along. A request would be thrown, a response would come back, but "oh crap!", it has status code 302 or 301, the new "Location" is there, so let me send a new request to the new location. It could take quite a few "hops" (see how high-level protocols tend to re-use concepts from lower-level protocols) until we would get to our resource with a status code 2XX. What is the response in this case?

But this is the simplest bending of "request-response". Then HTTP started being used to upload files. It is quite good at it, actually, but people started noticing that waiting for the whole request to be sent, only to then fail on an authentication error, was not a great use of the resources at our disposal. Along came: 100 Continue. In this variant, a request would send the headers frame with the "Expect: 100-continue" header and wait for a response from the server; if this had status code 100, the body frame would then be sent, and then we would get our final response. So, I count two responses for that interaction. Never mind that a lot of servers don't implement it (cURL, for instance, sends the body frame anyway if the server doesn't send a response after a few seconds, to circumvent this).
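For a sense of how a client can surface this flow, `httpx` exposes it through its `:expect` plugin. A minimal sketch (the endpoint is made up):

```ruby
require "httpx"

# the :expect plugin sends "Expect: 100-continue" and delays the body
# frame until the interim response arrives; the endpoint is hypothetical
http = HTTPX.plugin(:expect)
response = http.post("https://example.com/upload", body: File.open("video.mp4", "rb"))
puts response.status
```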
Or take HTTP CONNECT tunnels: in order to send our desired request, we first have to send an HTTP CONNECT request and receive a successful response (tunnel established), then send our request and get our response.

But one could argue that, for all of the examples above, usually there is a final desired response for a request. So what?
@@ -42,7 +43,7 @@ And along came HTTP/2, and TCP-to-HTTP mapping was never the same. Multiple requ

Many of these improvements have benefitted browsers first and foremost, and things have evolved to minimize the number of network interactions necessary to render an HTML page. HTTP/2 having decreased the number of TCP connections necessary, HTTP/3 will aim at decreasing the number of round-trips necessary. All of this without breaking request and response semantics.

Most of these things aren't as relevant when all you want is to send a notification request to a third party. Therefore, most client implementations choose not to implement most of these semantics, and most are fine implementing "open socket, write request, read response, close socket".

Ruby's `net-http` by default closes the TCP socket after receiving the response (even sending the `Connection: close` header). It does implement keep-alive, but this requires a bit more set-up.
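For reference, the extra set-up amounts to the block form of `Net::HTTP.start`, which keeps the connection open for the duration of the block (a sketch against a placeholder host):

```ruby
require "net/http"

# both requests below reuse the same TCP socket (keep-alive);
# the connection is only closed when the block exits
Net::HTTP.start("example.com", 443, use_ssl: true) do |http|
  puts http.get("/page1").code
  puts http.get("/page2").code
end
```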
@@ -116,7 +117,7 @@ GIF87a.............,...........D..;

Later, an addition to HTTP was made: trailer headers. These are defined as headers which are sent by the peer **after** the data has been transmitted. Their main benefits are beyond the scope of this mention, but they fundamentally changed the expectation of what an HTTP message looks like: after all, headers can be transmitted before and after the data.

A lot of client implementations re-use an already existing HTTP parser. Others write their own. I've seen very few supporting trailer headers. I don't know of any, other than `httpx`, that does (and `httpx` only reliably supports it since ditching `http_parser.rb`, the ruby bindings for an outdated version of the node HTTP parser). I also don't know of any in python. Go's `net/http` client supports it.

5. HTTP Bytes are readable
@@ -146,12 +147,12 @@ This is not to say that you should not react on data frames sent, but usually a

Besides, if you're using HTTP/2, there is no other choice: unless you can guarantee that there's only one socket for one HTTP/2 connection, you can't just read chunks from it. And even if you can, reading a data chunk involves so much ceremony (flow control, other streams, etc.) that you might end up regretting using it in the first place.

Client implementations that map a 1-to-1 relationship between socket and HTTP connection are able to provide such an API, but it won't save you from trouble. If connections hang from the server, time out, or you get blocked from accessing an origin, consider switching.
7. Using HTTP as a transport "dumb pipe"

According to the OSI model, HTTP belongs to layer 7, the so-called application protocols. These are perceived as the higher-level interfaces which programs use to communicate with each other over the network. HTTP is actually a very feature-rich protocol, supporting features like content negotiation, caching, virtual hosting, cross-origin resource sharing, tunneling, load balancing; the list goes on.

However, most clients use HTTP as a dumb pipe through which data is sent and received, as if it were a plain TCP stream.
@@ -165,7 +166,7 @@ Even cURL is partially to blame: it is probably the most widely used and deploye

You're a) not negotiating payload compression; b) not checking if a cached version of the resource is still up-to-date. Can you do it with cURL? Yes. Do you have to be verbose to do it? Pretty much.

Most 3rd-party JSON API SDKs suffer from this issue, because the underlying library is not doing these things. The only reason we're sending JSON over HTTP is that proxies have to be bypassed, but it is done in an inefficient way.
@@ -1,6 +1,7 @@
---
layout: post
title: Enumerable IO Streams
keywords: IO, enumerable, streaming, API
---

I've been recently working on CSV generation with ruby in my day job, in order to solve a bottleneck we found because of a DB table whose number of rows grew too large for the infrastructure to handle our poorly optimized code. This led me on a journey of discovery on how to use and play with raw ruby APIs to solve a complex problem.
@@ -107,7 +108,7 @@ An IO-like object must implement a few methods to be usable by certain functions

You know some of ruby's classes which implement a few (some, all) of these APIs: `File`, `TCPSocket`, and the aforementioned `StringIO`.

A few ruby APIs expect arguments which implement the IO interface, but aren't necessarily instances of IO:

* `IO.select` can be passed IO wrappers
* `IO.copy_stream(src, dst)` takes an IO reader and an IO writer as arguments (see the sketch after this list)
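To make the duck-typing concrete, here's a minimal sketch (the class name and all details are mine, not from the post) of an Enumerable-backed reader implementing just enough of the IO interface (`readpartial`) for `IO.copy_stream` to consume it:

```ruby
require "csv"

# Wraps any Enumerable of string chunks in an object that quacks enough
# like an IO for IO.copy_stream to read from it.
class EnumerableIO
  def initialize(chunks)
    @chunks = chunks.each
    @buffer = "".b
  end

  # IO.copy_stream calls readpartial(maxlen, outbuf) in a loop until EOFError
  def readpartial(maxlen, outbuf = "".b)
    @buffer << @chunks.next while @buffer.bytesize < maxlen
    slice_buffer(maxlen, outbuf)
  rescue StopIteration
    raise EOFError, "chunks exhausted" if @buffer.empty?
    slice_buffer(@buffer.bytesize, outbuf)
  end

  private

  def slice_buffer(len, outbuf)
    outbuf.replace(@buffer.byteslice(0, len))
    @buffer = @buffer.byteslice(len..) || "".b
    outbuf
  end
end

# generate CSV rows lazily and stream them out without materializing
# the whole payload in memory
rows = Enumerator.new do |yielder|
  3.times { |i| yielder << CSV.generate_line([i, "row-#{i}"]) }
end

IO.copy_stream(EnumerableIO.new(rows), $stdout)
```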
@@ -1,6 +1,7 @@
---
layout: post
title: 10 things a library needs
keywords: top 10, components, library, maintenance, testing, CI, dependencies
---

When I first started working on `httpx`, the mission was clear: make an easy-to-use open-source HTTP client that could support both legacy and future versions of the protocol, and all, or at least most, of its quirks.
@@ -143,7 +144,7 @@ I started receiving bug reports which didn't seem to come from `httpx` itself. A

So I decided to fork `http-2` and release it as `http-2-next` (no, I'm not writing an HTTP/2 parser from scratch, ahahahah). The result was a parser that passes all specs of the [`h2spec` suite](https://github.com/summerwind/h2spec). It's probably not gonna stop there, but it's pretty good for now.

So now I own the runtime dependencies of `httpx` I started with, and a lot of worries I used to have about external dependencies (whether the API breaks, bugs aren't fixed, or the project is abandoned) are not concerns anymore.

## 6. Forwards compatibility
@@ -181,7 +182,7 @@ A user of your library works in a completely different setup from you. Not only

Github tried to solve this with templates. And most projects took them to such a level of detail that they've become a separate form no one has the time or desire to fill in. What version? What CPU? What browser? Templates can become just another filter limiting the pool of users who want to reach out to you.

Asking for a stacktrace from the get-go can be invaluable. Some users struggle, but most of them know how to get one. Asking for a reproducible script might help, but sometimes the error lies so deep in the logic of the user's application that asking them to take it out of its context is not only an awful lot of work, it might even mask the error.

Finding an error that happened to a remote user boils down to knowing 1) when the problem happened, and 2) what the state of the world was at the time. Stacktraces help with the former, but not with the latter.
@@ -237,4 +238,4 @@ Now pause for a second. Stripe. AWS. Both serve companies from all sizes. Both s

## Conclusion

These 10 practices are not to be taken as commandments (I did mention when I couldn't follow them), but they help me maintain a fairly wide and complex set of features with no budget. And that's the key aspect of this: Open Source projects are not just about writing code; in order to survive long term, they must excel at communication, collaboration, and education. And that's the hardest task.
@@ -1,6 +1,7 @@
---
layout: post
title: Ramblings about initial design decisions, internals, and devise
keywords: authentication, plugins, devise, rodauth, extensibility, features, OTP, modern
---

Yesterday I was reading [this twitter thread](https://twitter.com/jankomarohnic/status/1286640026588151808), where [Janko Marohnić](https://twitter.com/jankomarohnic), the maintainer of [Shrine](https://github.com/shrinerb/shrine), who has recently [integrated rodauth in Rails](https://github.com/janko/rodauth-rails) and is preparing a series of articles about it, describes the internals of [devise](https://github.com/heartcombo/devise), the most popular authentication gem for Ruby on Rails, as "making every mistake in the book", claiming that [rodauth](https://github.com/jeremyevans/rodauth), the most advanced authentication framework for ruby, is much better because its internals are "easier to understand". This sparked some controversy and replies, with some people taking issue with these claims, and also with his approach of criticizing another gem because of "look how awful its internals are".
@@ -57,7 +58,7 @@ So why are people making a case against it? Why go with `rodauth` instead?

The vision for `devise` was fully accomplished by 2010: a no-friction email/password authentication add-on for Ruby on Rails.

In hindsight, I don't think that anyone in 2009 could anticipate today's practices: microservices, SPAs, mobile applications, cross-device platforms... and authentication also evolved: phone numbers instead of email accounts, multi-factor authentication, SMS tokens, JWTs, OTPs, OpenID, SAML, Yubikeys, Webauthn... and the stakes are higher, especially since [Edward Snowden and PRISM proved that theoretically breaking into accounts isn't so theoretical after all](https://en.wikipedia.org/wiki/PRISM_%28surveillance_program%29).
@@ -89,7 +90,7 @@ Looking at the [CHANGELOG](https://github.com/heartcombo/devise/blob/master/CHAN

It does seem that the main concern has been stability rather than new features. Which I can relate to; breaking other people's integration does suck. But is this by design? Is `devise` feature-complete? Did it achieve all its intended initial goals, such that nothing is left beyond maintaining it for the community? Is a refactoring of its internals necessary to build new features? Would less logic in models and fewer AR callbacks help develop new features? I guess only the core maintainership can answer that.

But it does feel that `devise` is legacy software.

## To infinity... and beyond!
@@ -1,6 +1,7 @@
---
layout: post
title: Ruby 2 features, and why I avoid keyword arguments
keywords: ruby 2, keyword arguments, inconsistencies
---

Some politician one day said "may you live in interesting times". Recently, it was announced that the next ruby release will be the long-awaited [next major ruby version, ruby 3](https://github.com/ruby/ruby/commit/21c62fb670b1646c5051a46d29081523cd782f11). That's pretty interesting, if you ask me.
@@ -172,5 +173,3 @@ Do keyword argument-only method signatures get more maintainable? I guess you'll

Reality check time: keyword arguments aren't going anywhere any time soon. They're a language feature now. Some people like them, inconsistencies notwithstanding. And let's face it, where's the value in removing keyword arguments from your projects? Code is running. Don't change it! I know I didn't. [ruby-netsnmp, a gem I maintain, still uses keyword arguments](https://github.com/swisscom/ruby-netsnmp/blob/master/lib/netsnmp/client.rb#L26), and that ain't gonna change, not by my hand at least.

But if you're authoring a new gem in 2020, or writing new code in the application you work on daily, do consider the advice: avoid keyword arguments. Your future upgrading self will thank you.
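To make the upgrade hazard concrete, here's a small sketch (mine, not from the post) of the behavior change that bit so many codebases between ruby 2 and ruby 3:

```ruby
def log(message, level: "info")
  "#{level}: #{message}"
end

opts = { level: "warn" }

log("disk full", opts)
# ruby <= 2.6: "warn: disk full" (the trailing hash is implicitly
#              converted into keyword arguments)
# ruby 2.7:    same result, but prints a deprecation warning
# ruby >= 3.0: ArgumentError (wrong number of arguments, given 2, expected 1)

log("disk full", **opts) # "warn: disk full" on all of them
```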
@@ -1,6 +1,7 @@
---
layout: post
title: Ruby 2 features, and why I really like refinements
keywords: ruby 2, refinements, extensions
---

This is my second entry in a series of thought-pieces around features and enhancements in the ruby 2 series, initiated at Christmas 2013 and now scheduled for termination at Christmas 2020, when ruby 3 is expected to be released. It's supposed to be about the good, the bad and the ugly. It's not to be read as "preachy", although you're definitely entitled not to like what I write, to which I just say "that's just my opinion" and "I totally respect that you have a different one".
@@ -47,7 +48,7 @@ module Refined
  using Plus
  1 + 2 #=> "1+2"
end
1 + 2 #=> 3
```
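For context, a `Plus` refinement producing the output above could be defined like this (my reconstruction, not the post's exact code):

```ruby
# a refinement redefining Integer#+ to return the textual expression;
# it only takes effect in scopes that opt in with `using Plus`
module Plus
  refine Integer do
    def +(other)
      "#{self}+#{other}"
    end
  end
end
```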
@@ -117,4 +118,3 @@ This is refinements at its finest.

Refinements are a great way to express your individuality and perception of the world without shoveling that perception onto your users; a safe way for you to experiment; and a great way to keep backwards compatibility, and by extension, your end users happy.

Unfortunately they will never accomplish their main goal, which was to "fix" Active Support. But maybe Active Support was never meant to be fixed, and that's all right. Refinements have to keep moving forward, and so do we. Hopefully away from Active Support.
@@ -1,6 +1,7 @@
---
layout: post
title: Gitlab CI suite optimizations, and how I decreased the carbon footprint of my CI suite
keywords: CI, Gitlab CI, caching, optimizations, fork-join
---
@@ -304,5 +305,3 @@ And that concludes my "jesus complex" post of the month.

<iframe src="https://www.youtube.com/embed/6wbaGf4fU9w" frameborder="0" allowfullscreen="true"> </iframe>
</figure>
<!-- blank line -->
@@ -1,6 +1,7 @@
---
layout: post
title: HTTPX multipart plugin, lessons in the wild
keywords: multipart, dependencies, enumerable readers
---

Some say that open source maintainers should "dogfood" their own software, in order to "feel the pain" of its users, and apply their lessons learned "from the trenches" in order to improve it. Given that no one else is better positioned to make improvements big and small, it's hard to dispute this.
@@ -23,7 +24,7 @@ Where I work, we handle a lot of media files, such as photos and videos. Users u

Some tasks often involve uploading data in bulk. And although uploading photos is not a big deal, uploading videos is, as some information about what happens in the video must also be transferred, as extra metadata, in a widely known encoding format.

And for that, the `multipart/form-data` media type is used.

## multipart/form-data
@@ -66,7 +67,7 @@ POST /upload-picture HTTP/1.1
Content-Type: multipart/form-data; boundary=--abc123--
....

--abc123--
Content-Disposition: form-data; name="file"; filename="in-the-shower.jpg"
Content-Type: image/jpeg
@@ -134,7 +135,7 @@ session = HTTPX.plugin(:multipart)

opts = {
  metadata1: HTTP::Part.new(
    JSON.generate(foo: "bar"),
    content_type: "application/json"
  ),
  metadata2: HTTP::Part.new(
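Pieced together, a complete call in this shape would look something like the following sketch (the endpoint and part names are made up, and constructor details may differ from the plugin's actual API):

```ruby
require "httpx"
require "json"

session = HTTPX.plugin(:multipart)

# hypothetical bulk upload: a JSON metadata part plus a file part
response = session.post(
  "https://example.com/upload",
  form: {
    metadata1: HTTP::Part.new(
      JSON.generate(foo: "bar"),
      content_type: "application/json"
    ),
    file: HTTP::Part.new(
      File.open("in-the-shower.jpg", "rb"),
      content_type: "image/jpeg"
    )
  }
)
puts response.status
```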
@@ -208,4 +209,3 @@ So I decided to do something about it. Firstly, I updated the wiki from the `mul

Second, I'll probably be replacing `http-form_data` at some point in the future. As a dependency should, it served me quite well, and allowed me to iterate quickly and concentrate on other parts of the project. I've even contributed to it. But at this point, owning the multipart encoding is something I'd like to keep closer to the project, and I think I know enough about multipart requests by now to warrant "writing my own" and killing yet another dependency. It'll not happen immediately, though.

Lastly, I'll try to do a better job of talking about my work, so that hopefully my co-workers one day ask me if I know `httpx`. And it starts with this post.
@@ -1,6 +1,7 @@
---
layout: post
title: RBS, duck-typing, meta-programming, and typing at httpx
keywords: RBS, type checking, duck typing, type syntax, runtime type checking
---

Ruby 3 is just around the corner, and with the recent release candidate, there's been some experimentation in the ruby community, along with the usual posts, comments and unavoidable rants.
@@ -16,7 +17,7 @@ Ruby 3 has 3 major features:

From the point of view of `httpx`, JIT is implicit, and Ractors won't do much for it (although I have to make sure whether calls can be made across ractors). The autofibers feature seems interesting, and will be experimented with at some point.

But typing is where, IMO, a library like `httpx` can immediately get the most benefit.

Typing is a very controversial topic in the Ruby community. Most of us started our journey as Ruby developers by running away from statically-typed languages (mostly Java), and fell in love with the quick feedback loop and fast prototyping that Ruby, and its lack of typing, enabled. Over time, we've crossed the ["Peak of Inflated Expectations" all the way into the "Trough of Disillusionment"](https://en.wikipedia.org/wiki/Hype_cycle), where the monolithic codebases most of us find ourselves working in fail in the most unpredictable ways due to runtime errors outside the happy path (`NoMethodError`s everywhere), and the act of simply updating external dependencies, let alone big refactorings, introduces so much risk that most businesses prefer to halt upgrades indefinitely, until it's 2020 and you're still running Rails 2 in production.
@@ -1,6 +1,7 @@
---
layout: post
title: 2020, a year in review
keywords: year in review, happy new year
---

2020 has been a year of incredible change for humanity. Thrown head-on into a pandemic no one was prepared to deal with, we were forced to clumsily speed up the transition into the digital age, becoming more and more dependent on digital identity, online shopping, remote work, and more gifs and memes than we could ever imagine existed in the ether. Commuting fatigue was replaced by "notifications" fatigue. For all of its faults, the internet backbone managed to assimilate and withstand way more activity than naysayers ever thought it was prepared for, and I can tell for a fact that I experienced way fewer interruptions in video calls in 2020 than I used to in 2018. We were unfortunately forced to keep in touch with our loved ones at a "safety distance", in most cases over video chat. I just hope that, whenever we're done with this state of affairs, we can retain the good habits while swiftly eliminating the bad.
@@ -51,7 +52,7 @@ I've also planned to prepare for its release, and looked at `rbs` in order to im

The runtime type checking layer, which runs alongside the tests, helped fix some critical issues as well.

Since v0.10.0, `httpx` ships with `rbs` type signatures.

The tests also run in "GC auto compact" mode.
@@ -81,7 +82,7 @@ Support for the `ORIGIN` frame was added. It was a bittersweet endeavour though,

### Improvements

Support for more recent rubies, including preparation for ruby 3 and RBS signatures, has been added. Overall, this library tries to use more performant ruby APIs than its parent project, although, to be fair, it'll never compare to a C parser such as `nghttpx`.

### Going forward
@@ -169,7 +170,7 @@ And then I wanted to merge coverage results. Except, you can't. So, you have thi

So while I did manage to take a page from all those PRs and migrate the tests, I still don't have a worthy coverage report to show for it.

[I asked for help in a community forum, since Github makes it so hard to ask for help or questions](https://github.community/t/workflow-run-not-triggering-for-matrix-job/150204). Still waiting for a reply though.

All in all, Github Actions seems to fit application flows (test-build-deploy) better than libraries. So yeah, stick to Gitlab if you don't want to deal so much with the "side-stuff".
@@ -182,17 +183,3 @@ A thing I've been working on is [a MIB parser](https://github.com/swisscom/ruby

That's it folks. Stay healthy!
@@ -1,6 +1,7 @@
---
layout: post
title: HTTPX AWS Sigv4 plugin - Use cases
keywords: aws sdk, sigv4, implementation
---
@@ -90,5 +91,3 @@ Also, data migrations. When moving data from AWS to GCP, and from GCP to Rackspa

The AWS Sigv4 plugins are just another layer in the `httpx` "swiss army knife". Hope it'll be of use to someone, as it'll be for me (I'll be sure to integrate it in some of the S3 integrations I maintain).

Hack on.
@@ -1,6 +1,7 @@
---
layout: post
title: How to build an OIDC provider using rodauth-oauth on Rails
keywords: rodauth-oauth, rodauth, rails, rodauth-rails, OAuth2, OIDC, OIDC Connect, tutorial
---

One of my most recent ruby open-source projects is [rodauth-oauth](https://honeyryderchuck.gitlab.io/rodauth-oauth), a rack-based toolkit to help easily build OAuth and OpenID Connect providers, built on top of [rodauth](http://rodauth.jeremyevans.net/) (the most advanced authentication provider library for ruby). I summarized my [initial motivation for "rolling my own" in the project Wiki](https://honeyryderchuck.gitlab.io/rodauth-oauth/wiki/FAQ), namely the lack of a decent framework-agnostic alternative (I didn't want to have to use Rails), and what I perceived as the limitations of the "de-facto" OAuth provider Rails extension, "doorkeeper".
@@ -366,7 +367,7 @@ before_authorize do
  require_two_factor_authenticated
end
```

So now I set up TOTP in my test account:
@@ -1,6 +1,7 @@
---
layout: post
title: Introducing idnx
keywords: new gem, idnx, IDNA, IDNA 2008, punycode, libidn2, windows, winAPI, mac OS, linux
---
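Its elevator pitch fits in a couple of lines (a sketch based on the README, as I recall it):

```ruby
require "idnx"

# converts an internationalized domain name into its punycode
# representation, delegating to libidn2 (or the WinAPI on windows)
Idnx.to_punycode("bücher.de") #=> "xn--bcher-kva.de"
```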
@@ -1,6 +1,7 @@
---
layout: post
title: HTTPX responses can be pattern matched
keywords: pattern matching, HTTP responses, status codes, headers, body
---
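As a taste of what the title announces, a sketch (assuming the response deconstructs into `status`, `headers` and `body` keys):

```ruby
require "httpx"

response = HTTPX.get("https://nghttp2.org/httpbin/get")

# ruby 2.7+ pattern matching against the response object
case response
in { status: 200..299 => status }
  puts "success: #{status}"
in { status: 300..399 }
  puts "redirect"
in { status: }
  puts "failed with #{status}"
end
```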
www/_posts/2021-08-26-tensorflow-serving-with-ruby.md (new file, 201 lines)
@@ -0,0 +1,201 @@
---
layout: post
title: Tensorflow Serving with Ruby
keywords: grpc, tensorflow, machine learning
---

The [Tensorflow framework](https://www.tensorflow.org/) is the most used framework when it comes to developing, training and deploying Machine Learning models. It ships with first-class API support for `python` and `C++`, the former being a favourite of most data scientists, which explains the pervasiveness of `python` in virtually all of the companies relying on ML for their products.
When it comes to deploying ML-based web services, there are two options. The first one is to develop a `python` web service, using something like `flask` or `django`, add `tensorflow` as a dependency, and run the model from within it. This approach is straightforward, but it comes with its own set of problems: rolling out model upgrades has to be done for each application using it, and even ensuring that the same `tensorflow` library version is used everywhere tends to be difficult, it being a pretty heavy dependency which often conflicts with other libraries in the python ecosystem, and is frequently the subject of CVEs. All of this introduces risk in the long run.

The other approach is to deploy the models using [Tensorflow Serving](https://www.tensorflow.org/tfx/guide/serving) ([pytorch has something similar, torchserve](https://pytorch.org/serve/inference_api.html)). In short, it exposes the execution of the ML models over the network "as a service". It supports model versioning, and can be interfaced with via gRPC or a REST API, which solves the main integration issues of the previously described approach. It thus compartmentalizes the risks of the other approach, while also opening up the possibility of throwing dedicated hardware at it.

It also allows you to ditch `python` when building applications.
### Research and Development

Now, I'm not a `python` hater. It's an accessible programming language. It shares a lot of benefits and drawbacks with `ruby`. But by the time a company decides to invest in ML to improve their product, the tech team might already be heavily familiar with a different tech stack. Maybe it's `ruby`, maybe `java`, maybe `go`. It's unreasonable to replace all of them with `python` experts. It's possible to ask them to use a bit of `python`, but that comes at the cost of learning a new stack (thereby decreasing quality of delivery) and alienating the employees (thereby increasing turnover).

It's also unreasonable to ask the new data science team not to use their preferred `python` tech stack. It's the ML *lingua franca*, and there are way more years of investment and resources poured into libraries like [numpy](https://numpy.org/) or [scikit](https://scikit-learn.org/stable/index.html). And although there's definitely value in improving the state of ML in your preferred languages (shout out to the [SciRuby](http://sciruby.com/) folks) and diminishing the overall industry dependency on `python`, that should not come at the cost of decreasing the quality of your product.

Therefore, `tensorflow-serving` allows the tech team to focus on developing and shipping the best possible product, and the research team to focus on developing the best possible models. Everyone's productive and happy.
### Tensorflow Serving with JSON

As stated above, `tensorflow serving` services are exposed using `gRPC` and REST APIs. If you haven't used `gRPC` before, you'll probably privilege the latter; you've written HTTP JSON clients for other APIs before, so how hard can it be to create an HTTP client for it?

While certainly possible, going this route will come at a cost; besides ensuring that the HTTP layer works reliably (persistent connections, timeouts, etc.), there's the cost of JSON.

`tensorflow` (and other ML frameworks in general) makes heavy use of "tensors", multi-dimensional same-type arrays (vectors, matrices...) describing, for example, the coordinates of a face recognized in an image. These tensors are represented in memory as contiguous array objects, and can therefore be easily serialized into a byte stream. Libraries like `numpy` (or `numo` in ruby) take advantage of this memory layout to provide high-performance mathematical and logical operations.

JSON is UTF-8, and can't encode byte streams; in order to send and receive byte streams using the REST API interface, you'll have to convert to and from base64 notation. This means that, besides the CPU usage overhead of these operations, you should expect a ~33% increase in the transmitted payload (base64 encodes every 3 bytes of input as 4 ASCII characters).

The `tensorflow-serving` REST API proxies to the `gRPC` layer, so there's also this extra level of indirection to account for.

`gRPC` doesn't suffer from these drawbacks; on top of `HTTP/2`, it not only improves connectivity, it also solves multiplexing and streaming; and using `protobufs`, it has a typed message serialization protocol which supports byte streams.

How can it be used in `ruby` then?
### Tensorflow Serving with Protobufs

Tensorflow Serving calls are performed using a standardized set of common protobufs, whose `.proto` definitions can be found both in the [tensorflow](https://github.com/tensorflow/tensorflow) repo and in the [tensorflow-serving](https://github.com/tensorflow/serving) repo. The most important ones for our case are declared under [prediction_service.proto](https://github.com/tensorflow/serving/blob/master/tensorflow_serving/apis/prediction_service.proto), which defines request and response protobufs declaring which model version to run, and how input and output tensors are laid out.

Both libraries above already package the `python` protobufs. To use them in `ruby`, you have to compile them yourself using the [protobuf](https://github.com/ruby-protobuf/protobuf) gem. For this particular case, compiling can be a pretty involved process, which looks like this:
```bash
# gem install grpc-tools

TF_VERSION="2.5.0"
TF_SERVING_VERSION="2.5.1"
PROTO_PATH=path/to/protos
set -o pipefail

curl -L -o tensorflow.zip https://github.com/tensorflow/tensorflow/archive/v$TF_VERSION.zip
unzip tensorflow.zip && rm tensorflow.zip
mv tensorflow-$TF_VERSION ${PROTO_PATH}/tensorflow

curl -L -o tf-serving.zip https://github.com/tensorflow/serving/archive/$TF_SERVING_VERSION.zip
unzip tf-serving.zip && rm tf-serving.zip
mv serving-$TF_SERVING_VERSION/tensorflow_serving ${PROTO_PATH}/tensorflow

TF_SERVING_PROTO=${PROTO_PATH}/ruby
mkdir ${TF_SERVING_PROTO}

# proto files are passed as positional arguments; --proto_path points at
# the import root both repos were moved under
grpc_tools_ruby_protoc \
  ${PROTO_PATH}/tensorflow/tensorflow/core/framework/*.proto \
  --ruby_out=${TF_SERVING_PROTO} \
  --grpc_out=${TF_SERVING_PROTO} \
  --proto_path=${PROTO_PATH}/tensorflow

grpc_tools_ruby_protoc \
  ${PROTO_PATH}/tensorflow/tensorflow/core/example/*.proto \
  --ruby_out=${TF_SERVING_PROTO} \
  --grpc_out=${TF_SERVING_PROTO} \
  --proto_path=${PROTO_PATH}/tensorflow

grpc_tools_ruby_protoc \
  ${PROTO_PATH}/tensorflow/tensorflow/core/protobuf/*.proto \
  --ruby_out=${TF_SERVING_PROTO} \
  --grpc_out=${TF_SERVING_PROTO} \
  --proto_path=${PROTO_PATH}/tensorflow

grpc_tools_ruby_protoc \
  ${PROTO_PATH}/tensorflow/tensorflow_serving/apis/*.proto \
  --ruby_out=${TF_SERVING_PROTO} \
  --grpc_out=${TF_SERVING_PROTO} \
  --proto_path=${PROTO_PATH}/tensorflow

ls $TF_SERVING_PROTO
```
**NOTE**: There's also the [tensorflow-serving-client](https://github.com/nubbel/tensorflow_serving_client-ruby), which already ships with the necessary `ruby` protobufs; however, there haven't been any updates in more than 5 years, so I can't attest to its state of maintenance. If you want to use this in production, make sure you generate ruby stubs from the latest version of the definitions.

Once the protobufs are available, creating a `PredictRequest` is simple. Here's how you'd encode a request to a model called `mnist`, taking a 784-wide float array as input:
```ruby
require "path/to/protos/ruby/tensorflow_serving/apis/prediction_service_pb"

tensor = [0.0] * 784

request = Tensorflow::Serving::PredictRequest.new
request.model_spec = Tensorflow::Serving::ModelSpec.new name: 'mnist'
request.inputs['images'] = Tensorflow::TensorProto.new(
  float_val: tensor,
  tensor_shape: Tensorflow::TensorShapeProto.new(
    dim: [
      Tensorflow::TensorShapeProto::Dim.new(size: 1),
      Tensorflow::TensorShapeProto::Dim.new(size: 784)
    ]
  ),
  dtype: Tensorflow::DataType::DT_FLOAT
)
```
**NOTE**: the `tensorflow` python API ships with a very useful function called [make_tensor_proto](https://www.tensorflow.org/api_docs/python/tf/make_tensor_proto), which could do the above as a "one-liner". While it's certainly possible to code a similar function in `ruby`, it's a pretty involved process which is beyond the scope of this post.

As an example, this one is easy to grasp. However, we'll have to deal with much larger tensors in production, which are going to get heavy and slow to handle using plain `ruby` arrays.
### Tensorflow Serving with Numo and GRPC

In `python`, the standard for n-dimensional arrays is [numpy](https://numpy.org/). `ruby` has a similar library called [numo](https://github.com/ruby-numo/numo).

It aims at providing the same APIs as `numpy`, which is mostly an aspirational goal, as keeping up with `numpy` is hard (progress can be tracked [here](https://github.com/ruby-numo/numo-narray/wiki/Numo-vs-numpy)).

A lot can be done already though, such as [image processing](https://github.com/yoshoku/magro). If our model requires an image, this is how it can be done in `python`:
```python
# using numpy
import grpc
import numpy as np
from PIL import Image
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

img = Image.open('test-image.png')
tensor = np.asarray(img)
tensor.shape  # (512, 512, 3)

request = predict_pb2.PredictRequest()
request.model_spec.name = "mnist"
request.inputs['images'].CopyFrom(tf.make_tensor_proto(tensor))

stub = prediction_service_pb2_grpc.PredictionServiceStub(grpc.insecure_channel("localhost:9000"))
response = stub.Predict(request)
print(response.outputs)
```
And this is the equivalent `ruby` code:

```ruby
require "grpc"
require "path/to/protos/ruby/tensorflow_serving/apis/prediction_service_pb"

# magro reads images into numo arrays
require "magro"

def build_predict_request(tensor)
  request = Tensorflow::Serving::PredictRequest.new
  request.model_spec = Tensorflow::Serving::ModelSpec.new name: 'mnist'
  request.inputs['images'] = Tensorflow::TensorProto.new(
    # tensor_content carries the raw bytes of the numo array
    tensor_content: tensor.to_binary,
    tensor_shape: Tensorflow::TensorShapeProto.new(
      dim: tensor.shape.map { |size| Tensorflow::TensorShapeProto::Dim.new(size: size) }
    ),
    dtype: Tensorflow::DataType::DT_UINT8
  )
  request
end

tensor = Magro::IO.imread("test-image.png")
tensor.shape #=> [512, 512, 3]

# using the tensorflow-serving-client example
stub = Tensorflow::Serving::PredictionService::Stub.new('localhost:9000', :this_channel_is_insecure)
res = stub.predict(build_predict_request(tensor))
puts res.outputs # returns PredictResponses
```
That's it!

### GRPC over HTTPX

[httpx ships with a grpc plugin](https://honeyryderchuck.gitlab.io/httpx/wiki/GRPC). This being a blog mostly about `httpx`, it's only fitting that I show how to do the above using it :) .
```ruby
require "httpx"
require "magro"
require "path/to/protos/ruby/tensorflow_serving/apis/prediction_service_pb"

# ... same as above ...

stub = HTTPX.plugin(:grpc).build_stub("localhost:9000", service: Tensorflow::Serving::PredictionService)
res = stub.predict(build_predict_request(tensor))
puts res.outputs # returns PredictResponses
```
### Conclusion

Hopefully you've gained enough interest in the `ruby` ML toolchain to investigate further. Who knows, maybe you can teach your researcher friends about it. The ML industry won't move away from `python` any time soon, but at least you now know a bit more about how you can still use `ruby` to build your services, while interfacing remotely with ML models running on dedicated hardware over the gRPC protocol.