mirror of https://github.com/ocaml-multicore/eio.git synced 2025-12-07 00:01:52 -05:00

Thomas Leonard e189fc4004 Merge fibreslib into eio

It was a bit confusing having two different API libraries. Instead of
opening `Fibreslib`, it is now suggested to open `Eio.Std`, which
exports fewer things.

2021-06-09 09:50:44 +01:00

doc

Add switches for structured concurrency

2021-05-14 10:33:42 +01:00

lib_ctf

Add switches for structured concurrency

2021-05-14 10:33:42 +01:00

lib_eio

Merge fibreslib into eio

2021-06-09 09:50:44 +01:00

lib_eunix

Merge fibreslib into eio

2021-06-09 09:50:44 +01:00

lib_main

Add eio_main library to select backend

2021-05-25 11:27:36 +01:00

ocaml-uring @ b2cb631953

Allow cancelling accept, read and write operations

2021-06-08 15:03:14 +01:00

tests

Merge fibreslib into eio

2021-06-09 09:50:44 +01:00

.gitignore

initial import

2021-03-02 15:19:17 +01:00

.gitmodules

Fix error in tracing patch

2021-04-28 17:34:36 +01:00

ctf.opam

Rename repository to eio

2021-05-14 10:53:10 +01:00

dune

Add eio_main library to select backend

2021-05-25 11:27:36 +01:00

dune-project

Merge fibreslib into eio

2021-06-09 09:50:44 +01:00

eio_main.opam

Add eio_main library to select backend

2021-05-25 11:27:36 +01:00

eio.opam

Merge fibreslib into eio

2021-06-09 09:50:44 +01:00

eunix.opam

Merge fibreslib into eio

2021-06-09 09:50:44 +01:00

Makefile

Split promises into their own library

2021-04-14 09:21:06 +01:00

README.md

Merge fibreslib into eio

2021-06-09 09:50:44 +01:00

README.md

eio -- effects based parallel IO for OCaml

This library implements an effects-based direct-style IO stack for multicore OCaml.

The library is very much a work-in-progress, so this is an unreleased repository.

Motivation
Structure of the code
Getting started
Testing with mocks
Fibres
Tracing
Switches, errors and cancellation
Design note: results vs exceptions
Performance
Networking
Design note: object capabilities
Further reading

Motivation

The Unix library provided with OCaml uses blocking IO operations, and is not well suited to concurrent programs such as network services or interactive applications. For many years, the solution to this has been libraries such as Lwt and Async, which provide a monadic interface. These libraries allow writing code as if there were multiple threads of execution, each with their own stack, but the stacks are simulated using the heap.

The multicore version of OCaml adds support for "effects", removing the need for monadic code here. Using effects brings several advantages:

It is faster, because no heap allocations are needed to simulate a stack.
Concurrent code can be written in the same style as plain non-concurrent code.
Because a real stack is used, backtraces from exceptions work as expected.
Other features of the language (such as try ... with ...) can be used in concurrent code.

In addition, modern operating systems provide high-performance alternatives to the old Unix select call. For example, Linux's io-uring system has applications write the operations they want to perform to a ring buffer, which Linux handles asynchronously.

Because of this, we anticipate that many OCaml users will want to rewrite their IO code at some point, once effects have been merged into the official version of OCaml. It would be very beneficial if we could use this opportunity to standardise on a single concurrency API for OCaml.

This project is therefore exploring what this new API should look like by building an effects-based IO library and then using it to create or port larger applications.

The API is expected to change a great deal over the next year or so. If you are looking for a stable library for your application, you should continue using Lwt or Async for now. However, if you'd like to help with these experiments, please get in touch!

At present, Linux with io-uring is the only backend available. It is able to run a web-server with good performance, but most features are still missing.

Structure of the code

eio provides concurrency primitives (promises, etc), and a high-level, cross-platform OS API.
eunix provides a Linux io-uring backend for these APIs, plus a low-level API that can be used directly (in non-portable code).
eio_main selects an appropriate backend (e.g. eunix), depending on your platform.
ctf provides tracing support.

Getting started

You will need a version of the OCaml compiler with effects. You can get one like this:

opam switch create 4.12.0+domains+effects --packages=ocaml-variants.4.12.0+domains+effects --repositories=multicore=git+https://github.com/ocaml-multicore/multicore-opam.git,default

Then you'll need to install this library (and utop if you want to try it interactively):

git clone --recursive https://github.com/ocaml-multicore/eio.git
cd eio
opam pin -yn ./ocaml-uring
opam pin -yn .
opam depext -i eio_main utop

To try out the examples interactively, run utop and require the eio_main library. It is also convenient to open the Eio.Std module:

# #require "eio_main";;
# open Eio.Std;;

This function writes a greeting to stdout:

let main ~stdout =
  Eio.Flow.copy_string "Hello, world!\n" stdout

To run it, we use Eio_main.run to run the event loop and call it from there:

# Eio_main.run @@ fun env ->
  main ~stdout:(Eio.Stdenv.stdout env);;
Hello, world!
- : unit = ()

Note that:

The env argument represents the standard environment of a Unix process, allowing it to interact with the outside world. A program will typically start by extracting from env whatever things the program will need and then calling main with them.
The type of the main function here tells us that this program only interacts via stdout.
Eio_main.run automatically calls the appropriate run function for your platform. For example, on Linux this will call Eunix.run. For non-portable code you can use the platform-specific library directly.

Testing with mocks

Because external resources are provided to main as arguments, we can easily replace them with mocks for testing. For example:

# Eio_main.run @@ fun _env ->
  let buffer = Buffer.create 20 in
  main ~stdout:(Eio.Flow.buffer_sink buffer);
  traceln "Main would print %S" (Buffer.contents buffer);;
Main would print "Hello, world!\n"
- : unit = ()

traceln provides convenient printf-style debugging, without requiring you to plumb stderr through your code. It's actually using the Format module, so you can use the extended formatting directives here too.
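As a rough illustration of what "extended formatting directives" means, here is a small sketch that uses only the standard Format module (it does not call traceln itself). The names pp_point and describe are hypothetical helpers invented for this example. It shows a custom printer passed with %a inside a horizontal box:

```ocaml
(* Hypothetical helper: prints a pair of ints as "(x, y)". *)
let pp_point f (x, y) = Format.fprintf f "(%d, %d)" x y

(* Formats a list of points on one line. "@[<h>" opens a horizontal box,
   "@ " is a break hint (a plain space inside a horizontal box), and
   "%a" takes a printer followed by the value to print. *)
let describe points =
  Format.asprintf "@[<h>points:@ %a@]"
    (Format.pp_print_list
       ~pp_sep:(fun f () -> Format.fprintf f ";@ ")
       pp_point)
    points

let () = print_endline (describe [ (1, 2); (3, 4) ])
```

The same format string could presumably be given to traceln, since it is described above as using Format.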

The MDX documentation system this README uses doesn't handle exceptions very well, so let's make a little wrapper to simplify future examples:

let run fn =
  Eio_main.run @@ fun env ->
  try fn env
  with Failure msg -> traceln "Error: %s" msg

Fibres

Here's an example running two threads of execution (fibres) concurrently:

let main _env =
  Switch.top @@ fun sw ->
  Fibre.both ~sw
    (fun () -> for x = 1 to 3 do traceln "x = %d" x; Fibre.yield ~sw () done)
    (fun () -> for y = 1 to 3 do traceln "y = %d" y; Fibre.yield ~sw () done)

# run main;;
x = 1
y = 1
x = 2
y = 2
x = 3
y = 3
- : unit = ()

Notes:

The two fibres run on a single core, so only one can be running at a time. Calling an operation that performs an effect (such as yield) can switch to a different thread.
The sw argument is used to handle exceptions (described later).
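To make the single-core point concrete, here is a minimal sketch of a round-robin scheduler built on effects. It assumes OCaml 5's standard Effect module rather than the 4.12+domains+effects compiler used elsewhere in this README, and the names Yield, yield and run_all are hypothetical: they are not part of Eio, whose real scheduler also deals with IO, switches and tracing.

```ocaml
(* Assumes OCaml 5.0 or later (stdlib Effect module). Not Eio's API. *)
open Effect
open Effect.Deep

(* Hypothetical effect: performed by a fibre that wants to let others run. *)
type _ Effect.t += Yield : unit Effect.t

let yield () = perform Yield

(* Runs [fns] round-robin on one core; returns once all have finished. *)
let run_all (fns : (unit -> unit) list) : unit =
  let ready : (unit -> unit) Queue.t = Queue.create () in
  (* Resume the next runnable fibre, if there is one. *)
  let run_next () =
    match Queue.take_opt ready with
    | Some resume -> resume ()
    | None -> ()
  in
  let spawn fn =
    match_with fn ()
      { retc = (fun () -> run_next ());
        exnc = raise;
        effc = (fun (type a) (eff : a Effect.t) ->
          match eff with
          | Yield ->
            Some (fun (k : (a, unit) continuation) ->
              (* Park this fibre at the back of the queue and switch. *)
              Queue.add (fun () -> continue k ()) ready;
              run_next ())
          | _ -> None) }
  in
  List.iter (fun fn -> Queue.add (fun () -> spawn fn) ready) fns;
  run_next ()

let count name () =
  for i = 1 to 3 do
    Printf.printf "%s = %d\n" name i;
    yield ()
  done

let () = run_all [ count "x"; count "y" ]
```

Each fibre runs until it performs Yield, at which point the handler queues its continuation and resumes the next fibre, so the two loops interleave in the same order as the output shown above.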

Tracing

The library can write traces in CTF format, showing when threads (fibres) are created, when they run, and how they interact. We can run the previous code with tracing enabled (writing to a new trace.ctf file) like this:

# let () =
    let buffer = Ctf.Unix.mmap_buffer ~size:0x100000 "trace.ctf" in
    let trace_config = Ctf.Control.make buffer in
    Ctf.Control.start trace_config;
    run main;
    Ctf.Control.stop trace_config;;
x = 1
y = 1
x = 2
y = 2
x = 3
y = 3

The trace can be viewed using mirage-trace-viewer. This should work even while the program is still running. The file is a ring buffer, so when it gets full old events will start to be overwritten with new ones.
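The overwrite behaviour of a ring buffer can be sketched in a few lines. This is only an in-memory illustration with hypothetical names (ring, create, push, to_list); it says nothing about the actual layout of the CTF trace file.

```ocaml
(* A fixed-capacity buffer that overwrites its oldest entry when full.
   [capacity] must be greater than zero. *)
type 'a ring = {
  slots : 'a option array;
  mutable next : int;   (* index where the next event will be written *)
  mutable count : int;  (* number of events currently stored *)
}

let create capacity = { slots = Array.make capacity None; next = 0; count = 0 }

let push r x =
  let cap = Array.length r.slots in
  r.slots.(r.next) <- Some x;
  r.next <- (r.next + 1) mod cap;
  r.count <- min (r.count + 1) cap

(* Returns the stored events, oldest first. *)
let to_list r =
  let cap = Array.length r.slots in
  let start = (r.next - r.count + cap) mod cap in
  List.init r.count (fun i ->
    match r.slots.((start + i) mod cap) with
    | Some x -> x
    | None -> assert false (* unreachable: only filled slots are counted *))

let () =
  let r = create 3 in
  List.iter (push r) [ 1; 2; 3; 4; 5 ];
  (* Events 1 and 2 have been overwritten by 4 and 5. *)
  List.iter (Printf.printf "%d ") (to_list r);
  print_newline ()
```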

This shows the two counting threads, as well as the lifetime of the sw switch. Note that the output from traceln appears in the trace as well as on the console.

Switches, errors and cancellation

A switch is used to group fibres together so that they can be cancelled or waited on together. This is a form of structured concurrency.

Here's what happens if one of the two threads above fails:

# run @@ fun _env ->
  Switch.top @@ fun sw ->
  Fibre.both ~sw
    (fun () -> for x = 1 to 3 do traceln "x = %d" x; Fibre.yield ~sw () done)
    (fun () -> failwith "Simulated error");;
x = 1
Error: Simulated error
- : unit = ()

What happened here was:

The first fibre ran, printed x = 1 and yielded.
The second fibre raised an exception.
Fibre.both caught the exception and turned off the switch.
The first thread's yield saw the switch was off and raised the exception there too.
Once both threads had finished, Fibre.both re-raised the exception.

Note that turning off a switch only asks the other thread(s) to cancel. A thread is free to ignore the switch and continue (perhaps to clean up some resources).

Any operation that can be cancelled should take a ~sw argument.
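The cooperative nature of cancellation can be sketched without any concurrency at all. The following is a simplified model, not Eio's implementation: the type switch and the functions make_switch, turn_off, check and count_to are hypothetical names chosen for this example. A cancellable operation takes ~sw and checks it at each point where it could be interrupted.

```ocaml
(* Simplified model of a switch: either on, or off with the first error. *)
type state = On | Off of exn
type switch = { mutable state : state }

let make_switch () = { state = On }

(* Turning off an already-off switch keeps the original exception. *)
let turn_off sw ex =
  match sw.state with
  | On -> sw.state <- Off ex
  | Off _ -> ()

(* Raises the stored exception if the switch has been turned off. *)
let check sw =
  match sw.state with
  | On -> ()
  | Off ex -> raise ex

(* A cancellable operation: it checks the switch before each step. *)
let count_to ~sw ~log n =
  for i = 1 to n do
    check sw;
    log i
  done

let () =
  let sw = make_switch () in
  let log i =
    Printf.printf "x = %d\n" i;
    (* Simulate another fibre failing while we are at step 2. *)
    if i = 2 then turn_off sw (Failure "Simulated error")
  in
  try count_to ~sw ~log 5
  with Failure msg -> Printf.printf "Error: %s\n" msg
```

Code that never calls check simply keeps running, which matches the point above that turning off a switch is only a request.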

Switches can also be used to wait for threads even when there isn't an error. For example:

# run @@ fun _env ->
  Switch.top (fun sw ->
    Fibre.fork_ignore ~sw (fun () -> for i = 1 to 3 do traceln "i = %d" i; Fibre.yield ~sw () done);
    traceln "First thread forked";
    Fibre.fork_ignore ~sw (fun () -> for j = 1 to 3 do traceln "j = %d" j; Fibre.yield ~sw () done);
    traceln "Second thread forked; top-level code is finished"
  );
  traceln "Switch is finished";;
i = 1
First thread forked
j = 1
i = 2
Second thread forked; top-level code is finished
j = 2
i = 3
j = 3
Switch is finished
- : unit = ()

Switch.top is used for top-level switches. You can also use Fibre.fork_sub_ignore to create a child sub-switch. Turning off the parent switch will also turn off the child switch, but turning off the child doesn't disable the parent.

For example, a web-server might use one switch for the whole server and then create one sub-switch for each incoming connection. This allows you to end all fibres handling a single connection by turning off that connection's switch, or to exit the whole application using the top-level switch.
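The parent and child relationship can be modelled as a small tree. This is a sketch under stated assumptions, not how Eio stores switches: the record type and the functions top, sub, is_on and turn_off are hypothetical names for this example only.

```ocaml
(* Simplified model: a switch remembers its failure and its sub-switches. *)
type switch = {
  mutable failed : exn option;    (* None while the switch is on *)
  mutable children : switch list;
}

let top () = { failed = None; children = [] }

(* A sub-switch starts in the same state as its parent. *)
let sub parent =
  let child = { failed = parent.failed; children = [] } in
  parent.children <- child :: parent.children;
  child

let is_on sw =
  match sw.failed with
  | None -> true
  | Some _ -> false

(* Turning off a switch propagates down to its children, never up. *)
let rec turn_off sw ex =
  match sw.failed with
  | Some _ -> ()
  | None ->
    sw.failed <- Some ex;
    List.iter (fun child -> turn_off child ex) sw.children

let () =
  let server = top () in
  let conn1 = sub server in
  let conn2 = sub server in
  turn_off conn1 (Failure "connection reset");
  Printf.printf "after conn1 off: server=%b conn2=%b\n" (is_on server) (is_on conn2);
  turn_off server Exit;
  Printf.printf "after server off: conn2=%b\n" (is_on conn2)
```

In the web-server scenario, conn1 and conn2 play the role of per-connection sub-switches and server is the switch for the whole application.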

Design note: results vs exceptions

The OCaml standard library uses exceptions to report errors in most cases. Many libraries instead use the result type, which has the advantage of tracking the possible errors in the type system. However, using result is slower, as it requires more allocations, and explicit code to propagate errors.

As part of the effects work, OCaml is expected to gain a typed effects extension to the type system, allowing it to track both effects and exceptions statically. In anticipation of this, the Eio library prefers to use exceptions in most cases, reserving the use of result for cases where the caller is likely to want to handle the problem immediately rather than propagate it.
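The trade-off can be seen in a small stdlib-only sketch. The functions below (parse_port, parse_port_exn, sum_ports and sum_ports_exn) are hypothetical examples and not part of Eio; they only contrast the two styles of reporting the same error.

```ocaml
(* Exception style: the error does not appear in the type. *)
let parse_port_exn s =
  match int_of_string_opt s with
  | Some n when n > 0 && n < 65536 -> n
  | _ -> failwith ("invalid port: " ^ s)

(* Result style: the possible error is visible in the return type,
   at the cost of allocating an Ok or Error value for each call. *)
let parse_port s =
  match int_of_string_opt s with
  | Some n when n > 0 && n < 65536 -> Ok n
  | _ -> Error (`Invalid_port s)

(* With exceptions, errors propagate without any extra code. *)
let sum_ports_exn a b = parse_port_exn a + parse_port_exn b

(* With result, each step must be chained explicitly. *)
let sum_ports a b =
  Result.bind (parse_port a) (fun x ->
    Result.bind (parse_port b) (fun y -> Ok (x + y)))

let () =
  (match sum_ports "80" "oops" with
   | Ok n -> Printf.printf "sum = %d\n" n
   | Error (`Invalid_port s) -> Printf.printf "invalid port %S\n" s);
  try Printf.printf "sum = %d\n" (sum_ports_exn "80" "oops")
  with Failure msg -> Printf.printf "Error: %s\n" msg
```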

Performance

As mentioned above, Eio allows you to supply your own implementations of its abstract interfaces. This is in contrast to OCaml's standard library, for example, which only operates on OS file descriptors. You might wonder what the performance impact of this is. Here's a simple implementation of cat using the standard OCaml functions:

# let () =
    let buf = Bytes.create 4096 in
    let rec copy () =
      match input stdin buf 0 4096 with
      | 0 -> ()
      | got ->
        output stdout buf 0 got;
        copy ()
    in
    copy ()

And here is the equivalent using Eio:

# let () =
    Eio_main.run @@ fun env ->
    Eio.Flow.copy
      (Eio.Stdenv.stdin env)
      (Eio.Stdenv.stdout env)

Testing on a fresh 10G file with pv on my machine gives:

$ truncate -s 10G dummy

$ cat_ocaml_unix.exe < dummy | pv >/dev/null
10.0GiB 0:00:04 [2.33GiB/s]

$ cat                < dummy | pv >/dev/null
10.0GiB 0:00:04 [2.42GiB/s]

$ cat_ocaml_eio.exe  < dummy | pv >/dev/null
10.0GiB 0:00:03 [3.01GiB/s]

Eio.Flow.copy src dst asks dst to copy from src. As dst here is a Unix file descriptor, it first calls the probe method on the src object to check whether it is too. Discovering that src is also a file descriptor, it switches to a faster code path optimised for that case. On my machine, this code path uses the Linux-specific splice system call for maximum performance.
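The probing idea can be sketched with plain OCaml objects. Everything here is a simplified assumption rather than Eio's actual interface: the class type source, its probe_fd method, the int standing in for a file descriptor, and the path type are all hypothetical. The sketch only shows how a copy function might pick a fast path when the source reveals that it is backed by a descriptor, and fall back to a generic read loop otherwise.

```ocaml
(* Hypothetical abstract source. [probe_fd] returns Some fd when the
   source is backed by an OS descriptor (modelled here as an int). *)
class type source = object
  method read : bytes -> int   (* returns 0 at end of input *)
  method probe_fd : int option
end

(* Which code path the copy took, so the choice is observable. *)
type path = Fast_fd_to_fd | Generic_loop

(* A user-supplied source that is not backed by a descriptor. *)
class string_source (s : string) = object
  val mutable pos = 0
  method read (buf : bytes) : int =
    let n = min (Bytes.length buf) (String.length s - pos) in
    Bytes.blit_string s pos buf 0 n;
    pos <- pos + n;
    n
  method probe_fd : int option = None
end

(* A stand-in for a descriptor-backed source. *)
class fd_source (fd : int) = object
  method read (_ : bytes) : int = 0 (* placeholder: never used on the fast path *)
  method probe_fd : int option = Some fd
end

let copy (src : source) (out : Buffer.t) : path =
  match src#probe_fd with
  | Some _fd ->
    (* A real backend would hand both descriptors to the OS here. *)
    Fast_fd_to_fd
  | None ->
    let buf = Bytes.create 4096 in
    let rec loop () =
      match src#read buf with
      | 0 -> ()
      | n -> Buffer.add_subbytes out buf 0 n; loop ()
    in
    loop ();
    Generic_loop

let () =
  let out = Buffer.create 16 in
  let describe = function Fast_fd_to_fd -> "fast" | Generic_loop -> "generic" in
  print_endline (describe (copy (new string_source "hello") out));
  print_endline (describe (copy (new fd_source 0) out))
```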

Note that not all cases are well optimised yet, but the idea is for each backend to choose the most efficient way to implement the operation.

Networking

Eio provides a simple high-level API for networking. Here is a client that connects to address addr using network and sends a message:

let run_client ~sw ~network ~addr =
  traceln "Connecting to server...";
  let flow = Eio.Network.connect ~sw network addr in
  Eio.Flow.copy_string "Hello from client" flow;
  Eio.Flow.close flow

Note: The flow is attached to sw and will be closed automatically when it finishes. The explicit close here just ensures it is closed promptly, rather than waiting for the server to finish too.

Here is a server that listens on socket and handles a single connection by reading a message:

let run_server ~sw socket =
  Eio.Network.Listening_socket.accept_sub socket ~sw (fun ~sw flow _addr ->
    traceln "Server accepted connection from client";
    let b = Buffer.create 100 in
    Eio.Flow.copy flow (Eio.Flow.buffer_sink b);
    traceln "Server received: %S" (Buffer.contents b)
  ) ~on_error:(fun ex -> traceln "Error handling connection: %s" (Printexc.to_string ex));
  traceln "(normally we'd loop and accept more connections here)"

Notes:

accept_sub handles the connection in a new fibre, with its own sub-switch.
Normally, a server would call accept_sub in a loop to handle multiple connections.
When the child switch created by accept_sub finishes, flow is closed automatically.

We can test them in a single process using Fibre.both:

let main ~network ~addr =
  Switch.top @@ fun sw ->
  let server = Eio.Network.bind network ~sw ~reuse_addr:true addr in
  Eio.Network.Listening_socket.listen server 5;
  traceln "Server ready...";
  Fibre.both ~sw
    (fun () -> run_server ~sw server)
    (fun () -> run_client ~sw ~network ~addr)

# run @@ fun env ->
  main
    ~network:(Eio.Stdenv.network env)
    ~addr:Unix.(ADDR_INET (inet_addr_loopback, 8080));;
Server ready...
Connecting to server...
Server accepted connection from client
(normally we'd loop and accept more connections here)
Server received: "Hello from client"
- : unit = ()

Design note: object capabilities

The Eio high-level API follows the principles of the Object-capability model (ocaps). In this model, having a reference to an "object" (which could be a function or closure) grants permission to use it. The only ways to get a reference are to create a new object, or to be passed an existing reference by another object. For A to pass a reference B to another object C, A requires access (i.e. references) to both B and C. In particular, for B to get a reference to C there must be a path in the reference graph between them on which all objects allow it.

This is all just standard programming practice, really, except that it disallows patterns that break this model:

Global variables are not permitted. Otherwise, B could store itself in a global variable and C could collect it.
Modules that use C code or the OS to provide the effect of globals are also not permitted.

For example, OCaml's Unix module provides access to the network and filesystem to any code that wants it. By contrast, an Eio module that wants such access must receive it explicitly.

Consider the network example in the previous section. Imagine this is a large program and we want to know:

Does this program modify the filesystem?
Does this program send telemetry data over the network?

In an ocap language, we don't have to read the entire code-base to find the answers:

All authority starts at the (privileged) run function with the env parameter, so we must check this code.
We see that only env's network access is used, so we know this program doesn't access the filesystem, answering question 1 immediately.
To check whether telemetry is sent, we need to follow the network authority as it is passed to main.
main uses network to open a listening socket on the loopback interface, which it passes to run_server. run_server does not get the full network access, so we probably don't need to read that code (we might want to check whether we granted other parties access to this port on our loopback network).
run_client does get network, so we do need to read that. We could make that code easier to audit by passing it (fun () -> Eio.Network.connect network addr) instead of network. Then we could see that run_client could only connect to our loopback address.
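The last point, passing a closure instead of the whole network, can be sketched as follows. The network record, the string addresses and both run_client variants are hypothetical stand-ins written for this example and do not use the Eio API. The idea is that the attenuated client receives a capability that can only reach one address.

```ocaml
(* Hypothetical stand-in for a network capability: it can connect anywhere.
   It records each address it is asked to connect to. *)
type network = { connect : string -> string }

let make_network log =
  { connect = (fun addr -> log := addr :: !log; "connected to " ^ addr) }

(* Full authority: the client may connect to any address it likes,
   so an auditor has to read its code to see what it does. *)
let run_client_full ~network ~addr =
  ignore (network.connect "telemetry.example:443");
  network.connect addr

(* Attenuated authority: the client can only invoke the closure it was
   given, so it cannot reach any address other than the one chosen
   by the caller. *)
let run_client_attenuated ~connect = connect ()

let () =
  let log = ref [] in
  let network = make_network log in
  let addr = "127.0.0.1:8080" in
  print_endline (run_client_attenuated ~connect:(fun () -> network.connect addr));
  Printf.printf "addresses contacted: %s\n" (String.concat ", " (List.rev !log))
```

With the attenuated version, the caller's code alone is enough to see which addresses can be contacted.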

Since OCaml is not an ocap language, code can ignore Eio and use the non-ocap APIs directly. Therefore, this cannot be used as a security mechanism. However, it still makes non-malicious code easier to understand and test, and may allow for an ocap extension to the language in the future. See Emily for a previous attempt at this.
