
Add caml_alloc_custom_mem #1738

Merged
damiendoligez merged 10 commits into ocaml:trunk from damiendoligez:custom-obj-mem
Nov 6, 2018

Conversation

@damiendoligez
Member

This PR is related to #1476 and MPR#7750.

It adds one function to the runtime and three new GC parameters.

The function (caml_alloc_custom_mem) is to be used by C code to allocate custom objects that consist of a small block in the heap pointing to a large amount of memory outside the heap (typically allocated with malloc). This function takes only one size parameter, the number of bytes of memory outside the heap used by this custom block.

The user can then control how much of this memory is retained by dead blocks because the GC does not immediately collect dead blocks ("floating garbage"). This is done with three new GC parameters:

  • M (custom_major_ratio) This is a percentage relative to the major heap size. A complete GC cycle is forced every time 2/3 of that much memory has been allocated for blocks in the major heap. Assuming constant allocation and deallocation rates, this means there are at most (M/100 * major-heap-size) bytes of floating garbage at any time. The factor of 2/3 (equivalently, 1.5) comes, roughly speaking, from the fact that the major GC takes 1.5 cycles (previous cycle + marking phase) before it starts to deallocate dead blocks allocated during the previous cycle.

  • m (custom_minor_ratio) This is a percentage relative to the minor heap size. A minor GC is forced when the memory allocated for blocks in the minor heap reaches this value.

  • n (custom_minor_max_size) This is a number of bytes. Blocks allocated in the minor heap that exceed this size are assumed to be long-lived and are counted immediately against the major heap rather than the minor heap. This avoids large allocations forcing a minor GC every time.

The function is then used to allocate bigarrays, allowing us to get rid of the infamous CAML_BA_MAX_MEMORY constant.

When applied to the test program of MPR#7750, the results are pretty good: we can make the process size vary from about 6 GB (with M=10) to 15 GB (with M=100) to 25 GB (with M=200).

Still to be done: documentation.

@alainfrisch
Contributor

This should also hopefully address MPR#7100.

@alainfrisch
Contributor

Not strictly speaking related to the proposed change, but I think it's good to maintain an overall understanding of how custom blocks are handled by the GC. Could you explain the following heuristic in caml_adjust_gc_speed:

  if (caml_extra_heap_resources
           > (double) caml_minor_heap_wsz / 2.0
             / (double) caml_stat_heap_wsz) {
    CAML_INSTR_INT ("request_major/adjust_gc_speed_2@", 1);
    caml_request_major_slice ();
  }

As the heap grows during the life of the program, this will trigger more and more often. For instance, each i/o channel allocation (currently with ratio 1/1000) will trigger a major slice as soon as:

  caml_stat_heap_wsz > 500 * caml_minor_heap_wsz

and this can easily happen. Does it really make sense to trigger GC slices more often when the heap grows for a constant allocation rate of custom blocks?

@damiendoligez
Member Author

A bit of digging with git blame reveals that this heuristic was already present in 1995 in the very first commit in OCaml's history, so it dates back to Caml Light... It's probably time to re-assess because I really don't remember the reason for this.

@xavierleroy
Contributor

xavierleroy commented Apr 26, 2018 via email

@stedolan
Contributor

stedolan commented May 1, 2018

This looks really nice. The only bit that worries me is custom_minor_max_size, because it looks like performance might change abruptly when allocation sizes go from 8191 to 8193 bytes. Rather than a threshold, how about something smoother, like say adding the first 8k to extra_heap_resources_minor and the rest to extra_heap_resources?

@alainfrisch
Contributor

If you're in the mood for archeology, the Caml Light sources (with CVS history) are here.

This leads to:

camllight/camllight@67c1dc3#diff-d0f533866c552cc04626e6d892579278R153

which does not give any rationale for that specific heuristic.

@damiendoligez
Member Author

This looks really nice. The only bit that worries me is custom_minor_max_size, because it looks like performance might change abruptly when allocation sizes go from 8191 to 8193 bytes. Rather than a threshold, how about something smoother, like say adding the first 8k to extra_heap_resources_minor and the rest to extra_heap_resources?

Sounds like a good idea. I'll try it.

@damiendoligez
Member Author

@stedolan Done.

@alainfrisch
Contributor

@damiendoligez Should this new API also be used for i/o channels?

@xavierleroy
Contributor

@alainfrisch: I/O buffers consume both one file descriptor (a system resource, perhaps more appropriate for the old API?) and an out-of-heap buffer (a memory resource, very appropriate for the new API). So, it's a tie.

That said, I'm very happy to see the new API. If the experts are OK with it, I suggest merging into trunk ASAP.

@alainfrisch
Contributor

I/O buffers consume both one file descriptor (a system resource, perhaps more appropriate for the old API?)

I don't think that's the case. The lifetime of the file descriptor does not depend on the GC: it can and must be released explicitly (with close), and the GC finalizer won't touch it (no implicit close). So as far as I can tell, the channel object is equivalent to a pure memory buffer, as far as the GC is concerned.

@alainfrisch
Contributor

(About i/o channels: with an extra indirection, one could even destroy (free) the buffer when the channel is (explicitly) closed, but it's tricky to make this thread-safe.)

@damiendoligez
Member Author

@alainfrisch you're right, the FD part of the channel is unmanaged, and the memory part is a good fit for the new API, so I'm going to convert it too.

Don't merge before I do that.

@delamonpansie
Contributor

Why not just use caml_alloc_dependent_memory and caml_free_dependent_memory?
Yes, this would require extending caml_ba_proxy with a field holding the actual size of the allocation. It's simple and relies on existing GC infrastructure.

Here is a small test case (it requires 4.07):

module Gc_helper = struct
  type void
  external alloc_dependent_memory : (int [@untagged]) -> void = "caml_alloc_dependent_memory"
  external free_dependent_memory : (int [@untagged]) -> void = "caml_free_dependent_memory"

  let allocated = ref 0
  let minor_heap_size = Gc.((get ()).minor_heap_size) / (Sys.word_size / 8)

  let bigarray v =
    let size = Bigarray.(Array1.size_in_bytes v) in
    allocated := !allocated + size;
    if !allocated > minor_heap_size then begin
        allocated := size;
        Gc.major_slice (-1) |> ignore; (* heuristics from https://github.com/ocaml/ocaml/blob/trunk/byterun/memory.c#L522 *)
      end;
    alloc_dependent_memory size |> ignore;
    Gc.finalise_last (fun () -> free_dependent_memory size |> ignore) v; (* this should go to caml_ba_finaliser *)
    v
end

module Memstat = struct
  let word_size = Sys.word_size / 8 (* word size in bytes *)

  let kmgt size =
    let sprintf = Printf.sprintf in
    if size < 512 * 1024 then
      sprintf "%.2fk" (float size /. 0x1.0p10)
    else if size < 512 * 1024 * 1024 then
      sprintf "%.2fm" (float size /. 0x1.0p20)
    else if size < 512 * 1024 * 1024 * 1024 then
      sprintf "%.2fg" (float size /. 0x1.0p30)
    else
      sprintf "%.2ft" (float size /. 0x1.0p40)

  let statm () =
    let open Scanf in
    let fd = Scanning.from_file "/proc/self/statm" in
    let result = bscanf fd "%i %i %i"
                   (fun size resident shared -> (size, resident, shared) ) in
    Scanning.close_in fd;
    result

  let alive_buf_size = ref 0
  let total_buf_size = ref 0

  let alloc size v =
    total_buf_size := !total_buf_size + size;
    alive_buf_size := !alive_buf_size + size;
    Gc.finalise_last (fun () -> total_buf_size := !total_buf_size - size) v
  let free size =
    alive_buf_size := !alive_buf_size - size

  let rec memstat () =
    let stat = Gc.stat () in
    let vss, _, _ = statm () in
    Printf.printf "% 9s vss; % 9s heap; % 9s live; % 9s alive buf; % 9s total buf; % 9.0f%% garbage\n%!"
      (kmgt (vss * 4096))
      (kmgt (stat.heap_words * word_size))
      (kmgt (stat.live_words * word_size))
      (kmgt !alive_buf_size)
      (kmgt !total_buf_size)
      (100. *. float (!total_buf_size - !alive_buf_size) /. float !alive_buf_size);
    Unix.sleepf 0.5;
    memstat ()

  let _ =
    Thread.create memstat ()
end


(* Create some memory pressure, simulating a long-running program with a
   reasonable number of small objects *)
let tree =
  let limit = 1024 * 1024 * (try int_of_string (Sys.getenv "LIMIT")
                             with _ -> 10) in
  let key size =
    Bytes.init size (fun _ -> Random.int 9 + Char.code '0' |> Char.chr) in
  let module T = Set.Make(Bytes) in
  let rec build total_size t =
    if total_size > limit then
      t
    else
      let size = Random.int 16 in
      build (total_size + size) (T.add (key size) t)
  in
  print_string "Allocating some objects\n";
  build 0 T.empty

let run message alloc =
  Gc.compact ();
  print_string message;
  print_newline ();

  let q = Queue.create () in
  let deadline = Unix.gettimeofday () +. 15.0 in
  while Unix.gettimeofday () < deadline do
    let size = 2 * 1024 * 1024 + Random.int (2 * 1024 * 1024) in
    let buf = alloc size in
    Queue.push (size, buf) q;
    Memstat.alloc size buf;

    (* pretend we have about 30 active buffers at any moment *)
    while Queue.length q > 30 do
      let size, _ = Queue.pop q in
      Memstat.free size;
    done;
  done;

  (* drain queue *)
  while Queue.length q > 0 do
    let size, _ = Queue.pop q in
    Memstat.alive_buf_size := !Memstat.alive_buf_size - size
  done


let _ =
  run "Bytes" (fun size -> Bytes.create size);
  run "helped Bigarray" (fun size -> Bigarray.(Array1.create char c_layout size) |> Gc_helper.bigarray);
  run "plain Bigarray" (fun size -> Bigarray.(Array1.create char c_layout size))


(* Local Variables: *)
(* compile-command: "ocamlopt -thread unix.cmxa threads.cmxa bigarray.cmxa gc_sim.ml -o gc_sim" *)
(* End: *)

@damiendoligez
Member Author

Why not just use caml_alloc_dependent_memory and caml_free_dependent_memory?

That API is harder to use (it mandates the use of finalizers) and never caught on. It doesn't let the user control the ratio of heap to external memory.

@gasche
Member

gasche commented Sep 14, 2018

As far as I can tell, this PR has not been fully reviewed (and not approved), and @damiendoligez indicates that it is ready. @stedolan, would you maybe be willing to do a review?

@gasche
Member

gasche commented Oct 28, 2018

( @alainfrisch or @jhjourdan might be qualified to review as well. )

Contributor

@alainfrisch alainfrisch left a comment


LGTM, only cosmetic comments.

@damiendoligez Did you test MPR#7100 against this branch?

@alainfrisch
Contributor

alainfrisch commented Oct 29, 2018

You could also use the new API in win32graph/draw.c and graph/image.c and avoid the magic constants in those files. It seems all other uses in the core distribution pass the default values for mem/max (0/1), so they could also be switched to the new API (or not).

@damiendoligez
Member Author

I'm not sure about Win32, but on X Window, the memory held by a Pixmap is in the server, not in the user process, so I don't think it makes sense to use the new API there.

For the other uses, you're right that they all use 0/1 so converting them wouldn't change anything.

@damiendoligez
Member Author

I tried the test program of MPR#7100 and indeed with this PR it doesn't fail.

@damiendoligez
Member Author

Note: it will be time to revisit #1961 after this PR is merged and tested in the wild for some time.

@alainfrisch
Contributor

I tried the test program of MPR#7100 and indeed with this PR it doesn't fail.

(Actually, I'm confused on whether #1476 already fixed that one.)

Contributor

@alainfrisch alainfrisch left a comment


LGTM

@damiendoligez
Member Author

(Actually, I'm confused on whether #1476 already fixed that one.)

I don't think it did. I tried the test program with 4.07.1 and it failed.

@damiendoligez
Member Author

CI passed, so I'm merging. Thanks for the review!

@damiendoligez damiendoligez merged commit 17b64ac into ocaml:trunk Nov 6, 2018
@damiendoligez damiendoligez deleted the custom-obj-mem branch November 6, 2018 12:43
@alainfrisch
Contributor

Other tickets more or less directly related to this PR:

https://caml.inria.fr/mantis/view.php?id=7198
https://caml.inria.fr/mantis/view.php?id=7676 (somehow)
https://caml.inria.fr/mantis/view.php?id=6294
https://caml.inria.fr/mantis/view.php?id=7158

(I'm not sure it's worth mentioning them in the Changelog, but I'll let @damiendoligez decide.)

@damiendoligez
Member Author

This PR resolves MPR#7198 but the others will also need changes in other software.

I'll add it to Changes.

@damiendoligez
Member Author

I added a mention of MPR#7198 (3f46a1d). The others are not directly fixed by this PR.
