eio -- effects based parallel IO for OCaml
This library implements an effects-based direct-style IO stack for multicore OCaml.
The library is very much a work-in-progress, so this is an unreleased repository.
Contents
- Motivation
- Structure of the code
- Getting started
- Testing with mocks
- Fibres
- Tracing
- Switches, errors and cancellation
- Design note: results vs exceptions
- Performance
- Networking
- Design note: object capabilities
- Further reading
Motivation
The Unix library provided with OCaml uses blocking IO operations, and is not well suited to concurrent programs such as network services or interactive applications.
For many years, the solution to this has been libraries such as Lwt and Async, which provide a monadic interface.
These libraries allow writing code as if there were multiple threads of execution, each with their own stack, but the stacks are simulated using the heap.
The multicore version of OCaml adds support for "effects", removing the need for monadic code here. Using effects brings several advantages:
- It is faster, because no heap allocations are needed to simulate a stack.
- Concurrent code can be written in the same style as plain non-concurrent code.
- Because a real stack is used, backtraces from exceptions work as expected.
- Other features of the language (such as try ... with ...) can be used in concurrent code.
In addition, modern operating systems provide high-performance alternatives to the old Unix select call.
For example, Linux's io-uring system has applications write the operations they want to perform to a ring buffer,
which Linux handles asynchronously.
Due to this, we anticipate many OCaml users will want to rewrite their IO code at some point, once effects have been merged into the official version of OCaml. It would be very beneficial if we could use this opportunity to standardise on a single concurrency API for OCaml.
This project is therefore exploring what this new API should look like by building an effects-based IO library and then using it to create or port larger applications.
The API is expected to change a great deal over the next year or so. If you are looking for a stable library for your application, you should continue using Lwt or Async for now. However, if you'd like to help with these experiments, please get in touch!
At present, Linux with io-uring is the only backend available. It is able to run a web-server with good performance, but most features are still missing.
Structure of the code
- eio provides concurrency primitives (promises, etc), and a high-level, cross-platform OS API.
- eunix provides a Linux io-uring backend for these APIs, plus a low-level API that can be used directly (in non-portable code).
- eio_main selects an appropriate backend (e.g. eunix), depending on your platform.
- ctf provides tracing support.
Getting started
You will need a version of the OCaml compiler with effects. You can get one like this:
opam switch create 4.12.0+domains+effects --packages=ocaml-variants.4.12.0+domains+effects --repositories=multicore=git+https://github.com/ocaml-multicore/multicore-opam.git,default
Then you'll need to install this library (and utop if you want to try it interactively):
git clone --recursive https://github.com/ocaml-multicore/eio.git
cd eio
opam pin -yn ./ocaml-uring
opam pin -yn .
opam depext -i eio_main utop
To try out the examples interactively, run utop and require the eio_main library.
It is also convenient to open the Eio.Std module:
# #require "eio_main";;
# open Eio.Std;;
This function writes a greeting to stdout:
let main ~stdout =
  Eio.Flow.copy_string "Hello, world!\n" stdout
To run it, we use Eio_main.run to run the event loop and call it from there:
# Eio_main.run @@ fun env ->
  main ~stdout:(Eio.Stdenv.stdout env);;
Hello, world!
- : unit = ()
Note that:
- The env argument represents the standard environment of a Unix process, allowing it to interact with the outside world. A program will typically start by extracting from env whatever things the program will need and then calling main with them.
- The type of the main function here tells us that this program only interacts via stdout.
- Eio_main.run automatically calls the appropriate run function for your platform. For example, on Linux this will call Eunix.run. For non-portable code you can use the platform-specific library directly.
Testing with mocks
Because external resources are provided to main as arguments, we can easily replace them with mocks for testing. For example:
# Eio_main.run @@ fun _env ->
  let buffer = Buffer.create 20 in
  main ~stdout:(Eio.Flow.buffer_sink buffer);
  traceln "Main would print %S" (Buffer.contents buffer);;
Main would print "Hello, world!\n"
- : unit = ()
traceln provides convenient printf-style debugging, without requiring you to plumb stderr through your code.
It's actually using the Format module, so you can use the extended formatting directives here too.
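The extended directives can be tried out with the standard library alone. The following is a minimal sketch that uses Format.asprintf as a stand-in for traceln (it takes the same kind of format string, but returns a string); pp_pair is a hypothetical printer introduced only for illustration.

```ocaml
(* A custom printer, usable with the %a directive.
   [pp_pair] is a hypothetical helper, not part of Eio. *)
let pp_pair ppf (name, value) =
  Format.fprintf ppf "%s=%d" name value

(* Format.asprintf stands in for traceln here. *)
let message = Format.asprintf "state: %a" pp_pair ("x", 3)

(* Printers compose: print a list of ints separated by ", ". *)
let pp_comma ppf () = Format.pp_print_string ppf ", "
let numbers =
  Format.asprintf "%a" (Format.pp_print_list ~pp_sep:pp_comma Format.pp_print_int) [1; 2; 3]

let () =
  print_endline message;   (* prints "state: x=3" *)
  print_endline numbers    (* prints "1, 2, 3" *)
```

Assuming traceln accepts the same format strings, a call such as traceln "state: %a" pp_pair ("x", 3) would be the equivalent.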
The MDX documentation system this README uses doesn't handle exceptions very well, so let's make a little wrapper to simplify future examples:
let run fn =
  Eio_main.run @@ fun env ->
  try fn env
  with Failure msg -> traceln "Error: %s" msg
Fibres
Here's an example running two threads of execution (fibres) concurrently:
let main _env =
  Switch.top @@ fun sw ->
  Fibre.both ~sw
    (fun () -> for x = 1 to 3 do traceln "x = %d" x; Fibre.yield ~sw () done)
    (fun () -> for y = 1 to 3 do traceln "y = %d" y; Fibre.yield ~sw () done)
# run main;;
x = 1
y = 1
x = 2
y = 2
x = 3
y = 3
- : unit = ()
Notes:
- The two fibres run on a single core, so only one can be running at a time. Calling an operation that performs an effect (such as yield) can switch to a different thread.
- The sw argument is used to handle exceptions (described later).
Tracing
The library can write traces in CTF format, showing when threads (fibres) are created, when they run, and how they interact.
We can run the previous code with tracing enabled (writing to a new trace.ctf file) like this:
# let () =
    let buffer = Ctf.Unix.mmap_buffer ~size:0x100000 "trace.ctf" in
    let trace_config = Ctf.Control.make buffer in
    Ctf.Control.start trace_config;
    run main;
    Ctf.Control.stop trace_config;;
x = 1
y = 1
x = 2
y = 2
x = 3
y = 3
The trace can be viewed using mirage-trace-viewer. This should work even while the program is still running. The file is a ring buffer, so when it gets full old events will start to be overwritten with new ones.
This shows the two counting threads, as well as the lifetime of the sw switch.
Note that the output from traceln appears in the trace as well as on the console.
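To illustrate the ring-buffer behaviour mentioned above, here is a toy sketch in plain OCaml. It only models the overwrite-when-full idea; the ring type and its functions are hypothetical and are not the CTF format or the ctf library's implementation.

```ocaml
(* A toy fixed-size ring buffer of events (illustration only). *)
type ring = {
  slots : string array;   (* fixed storage *)
  mutable next : int;     (* total number of events written so far *)
}

let create size = { slots = Array.make size ""; next = 0 }

(* Writing wraps around, so the oldest event is overwritten when full. *)
let add ring event =
  ring.slots.(ring.next mod Array.length ring.slots) <- event;
  ring.next <- ring.next + 1

(* Return the surviving events, oldest first. *)
let contents ring =
  let size = Array.length ring.slots in
  let count = min ring.next size in
  List.init count (fun i -> ring.slots.((ring.next - count + i) mod size))

let () =
  let ring = create 3 in
  List.iter (add ring) ["x = 1"; "y = 1"; "x = 2"; "y = 2"];
  (* Only three events fit, so "x = 1" has been overwritten. *)
  List.iter print_endline (contents ring)
```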
Switches, errors and cancellation
A switch is used to group fibres together so that they can be cancelled or waited on together. This is a form of structured concurrency.
Here's what happens if one of the two threads above fails:
# run @@ fun _env ->
  Switch.top @@ fun sw ->
  Fibre.both ~sw
    (fun () -> for x = 1 to 3 do traceln "x = %d" x; Fibre.yield ~sw () done)
    (fun () -> failwith "Simulated error");;
x = 1
Error: Simulated error
- : unit = ()
What happened here was:
- The first fibre ran, printed x = 1 and yielded.
- The second fibre raised an exception.
- Fibre.both caught the exception and turned off the switch.
- The first thread's yield saw the switch was off and raised the exception there too.
- Once both threads had finished, Fibre.both re-raised the exception.
Note that turning off a switch only asks the other thread(s) to cancel. A thread is free to ignore the switch and continue (perhaps to clean up some resources).
Any operation that can be cancelled should take a ~sw argument.
Switches can also be used to wait for threads even when there isn't an error. e.g.
# run @@ fun _env ->
  Switch.top (fun sw ->
    Fibre.fork_ignore ~sw (fun () -> for i = 1 to 3 do traceln "i = %d" i; Fibre.yield ~sw () done);
    traceln "First thread forked";
    Fibre.fork_ignore ~sw (fun () -> for j = 1 to 3 do traceln "j = %d" j; Fibre.yield ~sw () done);
    traceln "Second thread forked; top-level code is finished"
  );
  traceln "Switch is finished";;
i = 1
First thread forked
j = 1
i = 2
Second thread forked; top-level code is finished
j = 2
i = 3
j = 3
Switch is finished
- : unit = ()
Switch.top is used for top-level switches. You can also use Fibre.fork_sub_ignore to create a child sub-switch.
Turning off the parent switch will also turn off the child switch, but turning off the child doesn't disable the parent.
For example, a web-server might use one switch for the whole server and then create one sub-switch for each incoming connection. This allows you to end all fibres handling a single connection by turning off that connection's switch, or to exit the whole application using the top-level switch.
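The parent/child relationship can be sketched with a small model in plain OCaml. This is not Eio's Switch module: the switch type and the top, sub and turn_off functions below are hypothetical, and the model only tracks the on/off state, not fibres or resources.

```ocaml
(* A toy model of parent and child switches (illustration only). *)
type switch = {
  name : string;
  mutable on : bool;
  mutable children : switch list;
}

let top name = { name; on = true; children = [] }

(* Create a child switch that is turned off whenever its parent is. *)
let sub parent name =
  let child = { name; on = parent.on; children = [] } in
  parent.children <- child :: parent.children;
  child

(* Turning off a switch also turns off everything below it,
   but never touches its parent. *)
let rec turn_off sw =
  sw.on <- false;
  List.iter turn_off sw.children

let status sw =
  Printf.sprintf "%s is %s" sw.name (if sw.on then "on" else "off")

let () =
  let server = top "server" in
  let conn1 = sub server "connection 1" in
  let conn2 = sub server "connection 2" in
  turn_off conn1;   (* end a single connection *)
  List.iter (fun sw -> print_endline (status sw)) [server; conn1; conn2];
  turn_off server;  (* shut down everything *)
  List.iter (fun sw -> print_endline (status sw)) [server; conn1; conn2]
```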
Design note: results vs exceptions
The OCaml standard library uses exceptions to report errors in most cases.
Many libraries instead use the result type, which has the advantage of tracking the possible errors in the type system.
However, using result is slower, as it requires more allocations, and explicit code to propagate errors.
As part of the effects work, OCaml is expected to gain a typed effects extension to the type system,
allowing it to track both effects and exceptions statically.
In anticipation of this, the Eio library prefers to use exceptions in most cases,
reserving the use of result for cases where the caller is likely to want to handle the problem immediately
rather than propagate it.
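The trade-off can be seen in a small standard-library example. The parse_port functions below are hypothetical and not part of Eio; they only contrast the two styles.

```ocaml
(* Exception style: errors propagate up the stack automatically. *)
let parse_port_exn s =
  match int_of_string_opt s with
  | Some p when p > 0 && p < 65536 -> p
  | _ -> failwith ("Invalid port: " ^ s)

(* Result style: the possible error is visible in the type, and the
   caller must handle or forward it explicitly. *)
let parse_port s =
  match int_of_string_opt s with
  | Some p when p > 0 && p < 65536 -> Ok p
  | _ -> Error (`Invalid_port s)

(* With exceptions, intermediate code needs no error-handling logic. *)
let ports_exn a b = (parse_port_exn a, parse_port_exn b)

(* With results, every step has to be matched and the error re-wrapped. *)
let ports a b =
  match parse_port a with
  | Error e -> Error e
  | Ok pa ->
    match parse_port b with
    | Error e -> Error e
    | Ok pb -> Ok (pa, pb)
```

Here ports_exn "80" "x" raises Failure, while ports "80" "x" returns an Error value that the caller has to inspect.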
Performance
As mentioned above, Eio allows you to supply your own implementations of its abstract interfaces.
This is in contrast to OCaml's standard library, for example, which only operates on OS file descriptors.
You might wonder what the performance impact of this is.
Here's a simple implementation of cat using the standard OCaml functions:
# let () =
    let buf = Bytes.create 4096 in
    let rec copy () =
      match input stdin buf 0 4096 with
      | 0 -> ()
      | got ->
        output stdout buf 0 got;
        copy ()
    in
    copy ()
And here is the equivalent using Eio:
# let () =
    Eio_main.run @@ fun env ->
    Eio.Flow.copy
      (Eio.Stdenv.stdin env)
      (Eio.Stdenv.stdout env)
Testing on a fresh 10G file with pv on my machine gives:
$ truncate -s 10G dummy
$ cat_ocaml_unix.exe < dummy | pv >/dev/null
10.0GiB 0:00:04 [2.33GiB/s]
$ cat < dummy | pv >/dev/null
10.0GiB 0:00:04 [2.42GiB/s]
$ cat_ocaml_eio.exe < dummy | pv >/dev/null
10.0GiB 0:00:03 [3.01GiB/s]
Eio.Flow.copy src dst asks dst to copy from src.
As dst here is a Unix file descriptor,
it first calls the probe method on the src object to check whether it is too.
Discovering that src is also a file descriptor, it switches to a faster code path optimised for that case.
On my machine, this code path uses the Linux-specific splice system call for maximum performance.
Note that not all cases are well optimised yet, but the idea is for each backend to choose the most efficient way to implement the operation.
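The probing idea can be sketched with ordinary OCaml objects. This is a toy model under stated assumptions: the source class, the feature type and the copy function below are hypothetical and much simpler than Eio's real interfaces, and no system calls are made.

```ocaml
(* An optional feature that a source may support. *)
type feature = Fd of int   (* a pretend file descriptor *)

class virtual source = object
  method virtual read : string option    (* next chunk, or None at the end *)
  method probe : feature option = None   (* no special features by default *)
end

(* A generic in-memory source: it does not override [probe]. *)
let string_source chunks = object
  inherit source
  val mutable rest = chunks
  method read =
    match rest with
    | [] -> None
    | x :: xs -> rest <- xs; Some x
end

(* A source backed by a (pretend) file descriptor. *)
let fd_source fd = object
  inherit source
  method read = None
  method! probe = Some (Fd fd)
end

(* The sink picks a code path based on what the source supports. *)
let copy (src : source) =
  match src#probe with
  | Some (Fd fd) -> Printf.sprintf "fast path: splice from fd %d" fd
  | None ->
    let rec loop n =
      match src#read with
      | None -> n
      | Some chunk -> loop (n + String.length chunk)
    in
    Printf.sprintf "generic path: copied %d bytes" (loop 0)

let () =
  print_endline (copy (string_source ["Hello, "; "world!"]));
  print_endline (copy (fd_source 0))
```

A mock source written for tests simply never answers the probe, so it takes the generic path without any extra code.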
Networking
Eio provides a simple high-level API for networking.
Here is a client that connects to address addr using network and sends a message:
let run_client ~sw ~network ~addr =
  traceln "Connecting to server...";
  let flow = Eio.Network.connect ~sw network addr in
  Eio.Flow.copy_string "Hello from client" flow;
  Eio.Flow.close flow
Note: The flow is attached to sw and will be closed automatically when it finishes.
The explicit close here just ensures it is closed promptly,
rather than waiting for the server to finish too.
Here is a server that listens on socket and handles a single connection by reading a message:
let run_server ~sw socket =
  Eio.Network.Listening_socket.accept_sub socket ~sw (fun ~sw flow _addr ->
    traceln "Server accepted connection from client";
    let b = Buffer.create 100 in
    Eio.Flow.copy flow (Eio.Flow.buffer_sink b);
    traceln "Server received: %S" (Buffer.contents b)
  ) ~on_error:(fun ex -> traceln "Error handling connection: %s" (Printexc.to_string ex));
  traceln "(normally we'd loop and accept more connections here)"
Notes:
- accept_sub handles the connection in a new fibre, with its own sub-switch.
- Normally, a server would call accept_sub in a loop to handle multiple connections.
- When the child switch created by accept_sub finishes, flow is closed automatically.
We can test them in a single process using Fibre.both:
let main ~network ~addr =
  Switch.top @@ fun sw ->
  let server = Eio.Network.bind network ~sw ~reuse_addr:true addr in
  Eio.Network.Listening_socket.listen server 5;
  traceln "Server ready...";
  Fibre.both ~sw
    (fun () -> run_server ~sw server)
    (fun () -> run_client ~sw ~network ~addr)
# run @@ fun env ->
  main
    ~network:(Eio.Stdenv.network env)
    ~addr:Unix.(ADDR_INET (inet_addr_loopback, 8080));;
Server ready...
Connecting to server...
Server accepted connection from client
(normally we'd loop and accept more connections here)
Server received: "Hello from client"
- : unit = ()
Design note: object capabilities
The Eio high-level API follows the principles of the Object-capability model (ocaps).
In this model, having a reference to an "object" (which could be a function or closure) grants permission to use it.
The only ways to get a reference are to create a new object, or to be passed an existing reference by another object.
For A to pass a reference B to another object C, A requires access (i.e. references) to both B and C.
In particular, for B to get a reference to C there must be a path in the reference graph between them
on which all objects allow it.
This is all just standard programming practice, really, except that it disallows patterns that break this model:
- Global variables are not permitted. Otherwise, B could store itself in a global variable and C could collect it.
- Modules that use C code or the OS to provide the effect of globals are also not permitted.
For example, OCaml's Unix module provides access to the network and filesystem to any code that wants it.
By contrast, an Eio module that wants such access must receive it explicitly.
Consider the network example in the previous section. Imagine this is a large program and we want to know:
- Does this program modify the filesystem?
- Does this program send telemetry data over the network?
In an ocap language, we don't have to read the entire code-base to find the answers:
- All authority starts at the (privileged) run function with the env parameter, so we must check this code.
- We see that only env's network access is used, so we know this program doesn't access the filesystem, answering question 1 immediately.
- To check whether telemetry is sent, we need to follow the network authority as it is passed to main.
- main uses network to open a listening socket on the loopback interface, which it passes to run_server. run_server does not get the full network access, so we probably don't need to read that code (we might want to check whether we granted other parties access to this port on our loopback network).
- run_client does get network, so we do need to read that. We could make that code easier to audit by passing it (fun () -> Eio.Network.connect network addr) instead of network. Then we could see that run_client could only connect to our loopback address.
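The last point, narrowing a capability by passing a closure, can be sketched in plain OCaml. The network type and connect function here are hypothetical stand-ins that only record which addresses were contacted; they are not Eio's API.

```ocaml
(* A toy network that records the addresses it was asked to contact. *)
type network = { mutable connections : string list }

let connect network addr =
  network.connections <- addr :: network.connections

(* This client receives the full network: to audit it we must read its
   code to learn which addresses it contacts. *)
let client_with_network network =
  connect network "127.0.0.1:8080"

(* This client receives only a function. Whatever its code does, it can
   reach no address other than the one chosen by its caller. *)
let client_with_connect connect_to_server =
  connect_to_server ()

let () =
  let network = { connections = [] } in
  client_with_network network;
  client_with_connect (fun () -> connect network "127.0.0.1:8080");
  List.iter print_endline network.connections
```

The type of client_with_connect, (unit -> 'a) -> 'a, makes no mention of the network at all, which is what makes it quick to audit.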
Since OCaml is not an ocap language, code can ignore Eio and use the non-ocap APIs directly. Therefore, this cannot be used as a security mechanism. However, it still makes non-malicious code easier to understand and test, and may allow for an ocap extension to the language in the future. See Emily for a previous attempt at this.
Further reading
Some background about the effects system can be found in:
- "Retrofitting Concurrency onto OCaml" (to appear, PLDI 2021)
- https://kcsrk.info/ocaml/multicore/2015/05/20/effects-multicore/
- Effects examples: https://github.com/ocaml-multicore/effects-examples/tree/master/aio
- Concurrent System Programming with Effect Handlers
- Asynchronous effect based IO using effect handlers