
Add caml_alloc_custom_mem #1738

Merged
damiendoligez merged 10 commits into ocaml:trunk from damiendoligez:custom-obj-mem
Nov 6, 2018

Conversation

@damiendoligez
Member

This PR is related to #1476 and MPR#7750.

It adds one function to the runtime and three new GC parameters.

The function (caml_alloc_custom_mem) is to be used by C code to allocate custom objects that consist of a small block in the heap pointing to a large amount of memory outside the heap (typically allocated with malloc). This function takes only one size parameter, the number of bytes of memory outside the heap used by this custom block.

The user can then control how much of this memory is retained by dead blocks because the GC does not immediately collect dead blocks ("floating garbage"). This is done with three new GC parameters:

  • M (custom_major_ratio) This is a percentage relative to the major heap size. A complete GC cycle is forced every time 2/3 of that much memory has been allocated for blocks in the major heap. Assuming constant allocation and deallocation rates, this means there are at most (M/100 * major-heap-size) bytes of floating garbage at any time. The factor of 2/3 (equivalently, 1.5) comes, roughly speaking, from the fact that the major GC takes 1.5 cycles (previous cycle + marking phase) before it starts to deallocate dead blocks allocated during the previous cycle.

  • m (custom_minor_ratio) This is a percentage relative to the minor heap size. A minor GC is forced when the memory allocated for blocks in the minor heap reaches this value.

  • n (custom_minor_max_size) This is a number of bytes. Blocks allocated in the minor heap that exceed this size are assumed to be long-lived and are counted immediately against the major heap rather than the minor heap. This avoids large allocations forcing a minor GC every time.

The function is then used to allocate bigarrays, allowing us to get rid of the infamous CAML_BA_MAX_MEMORY constant.

When applied to the test program of MPR#7750, the results are pretty good: we can make the process size vary from about 6 GB (with M=10) to 15 GB (with M=100) to 25 GB (with M=200).

Still to be done: documentation.

@alainfrisch
Contributor

This should also hopefully address MPR#7100.

@alainfrisch
Contributor

Not strictly speaking related to the proposed change, but I think it's good to maintain an overall understanding of how custom blocks are handled by the GC. Could you explain the following heuristic in caml_adjust_gc_speed:

  if (caml_extra_heap_resources
           > (double) caml_minor_heap_wsz / 2.0
             / (double) caml_stat_heap_wsz) {
    CAML_INSTR_INT ("request_major/adjust_gc_speed_2@", 1);
    caml_request_major_slice ();
  }

As the heap grows during the life of the program, this will trigger more and more often. For instance, each i/o channel allocation (currently with ratio 1/1000) will trigger a major slice as soon as:

  caml_stat_heap_wsz > 500 * caml_minor_heap_wsz

and this can easily happen. Does it really make sense to trigger GC slices more often when the heap grows for a constant allocation rate of custom blocks?

@damiendoligez
Member Author

A bit of digging with git blame reveals that this heuristic was already present in 1995 in the very first commit in OCaml's history, so it dates back to Caml Light... It's probably time to re-assess because I really don't remember the reason for this.

@xavierleroy
Contributor

xavierleroy commented Apr 26, 2018 via email

@stedolan
Contributor

stedolan commented May 1, 2018

This looks really nice. The only bit that worries me is custom_minor_max_size, because it looks like performance might change abruptly when allocation sizes go from 8191 to 8193 bytes. Rather than a threshold, how about something smoother, like say adding the first 8k to extra_heap_resources_minor and the rest to extra_heap_resources?

@alainfrisch
Contributor

If you're in the mood for archeology, the Caml Light sources (with CVS history) are here.

This leads to:

camllight/camllight@67c1dc3#diff-d0f533866c552cc04626e6d892579278R153

which does not give any rationale for that specific heuristic.

@damiendoligez
Member Author

This looks really nice. The only bit that worries me is custom_minor_max_size, because it looks like performance might change abruptly when allocation sizes go from 8191 to 8193 bytes. Rather than a threshold, how about something smoother, like say adding the first 8k to extra_heap_resources_minor and the rest to extra_heap_resources?

Sounds like a good idea. I'll try it.

@damiendoligez
Member Author

@stedolan Done.

@alainfrisch
Contributor

@damiendoligez Should this new API also be used for i/o channels?

@xavierleroy
Contributor

@alainfrisch: I/O buffers consume both one file descriptor (a system resource, perhaps more appropriate for the old API?) and an out-of-heap buffer (a memory resource, very appropriate for the new API). So, it's a tie.

That said, I'm very happy to see the new API. If the experts are OK with it, I suggest merging into trunk ASAP.

@alainfrisch
Contributor

I/O buffers consume both one file descriptor (a system resource, perhaps more appropriate for the old API?)

I don't think that's the case. The lifetime of the file descriptor does not depend on the GC: it can and must be released explicitly (with close), and the GC finalizer won't touch it (no implicit close). So as far as I can tell, the channel object is equivalent to a pure memory buffer, as far as the GC is concerned.

@alainfrisch
Contributor

(About i/o channels: with an extra indirection, one could even destroy (free) the buffer when the channel is (explicitly) closed, but it's tricky to make this thread-safe.)

@damiendoligez
Member Author

@alainfrisch you're right, the FD part of the channel is unmanaged, and the memory part is a good fit for the new API, so I'm going to convert it too.

Don't merge before I do that.

@delamonpansie
Contributor

Why not just use caml_alloc_dependent_memory and caml_free_dependent_memory?
Yes, this would require extending caml_ba_proxy with a field holding the actual size of the allocation. It's simple and relies on existing GC infrastructure.

Here is a small test case (it requires 4.07):

module Gc_helper = struct
  type void
  external alloc_dependent_memory : (int [@untagged]) -> void = "caml_alloc_dependent_memory"
  external free_dependent_memory : (int [@untagged]) -> void = "caml_free_dependent_memory"

  let allocated = ref 0
  let minor_heap_size = Gc.((get ()).minor_heap_size) / (Sys.word_size / 8)

  let bigarray v =
    let size = Bigarray.(Array1.size_in_bytes v) in
    allocated := !allocated + size;
    if !allocated > minor_heap_size then begin
        allocated := size;
        Gc.major_slice (-1) |> ignore; (* heuristics from https://github.com/ocaml/ocaml/blob/trunk/byterun/memory.c#L522 *)
      end;
    alloc_dependent_memory size |> ignore;
    Gc.finalise_last (fun () -> free_dependent_memory size |> ignore) v; (* this should go to caml_ba_finaliser *)
    v
end

module Memstat = struct
  let word_size = Sys.word_size / 8 (* word size in bytes *)

  let kmgt size =
    let sprintf = Printf.sprintf in
    if size < 512 * 1024 then
      sprintf "%.2fk" (float size /. 0x1.0p10)
    else if size < 512 * 1024 * 1024 then
      sprintf "%.2fm" (float size /. 0x1.0p20)
    else if size < 512 * 1024 * 1024 * 1024 then
      sprintf "%.2fg" (float size /. 0x1.0p30)
    else
      sprintf "%.2ft" (float size /. 0x1.0p40)

  let statm () =
    let open Scanf in
    let fd = Scanning.from_file "/proc/self/statm" in
    let result = bscanf fd "%i %i %i"
                   (fun size resident shared -> (size, resident, shared) ) in
    Scanning.close_in fd;
    result

  let alive_buf_size = ref 0
  let total_buf_size = ref 0

  let alloc size v =
    total_buf_size := !total_buf_size + size;
    alive_buf_size := !alive_buf_size + size;
    Gc.finalise_last (fun () -> total_buf_size := !total_buf_size - size) v
  let free size =
    alive_buf_size := !alive_buf_size - size

  let rec memstat () =
    let stat = Gc.stat () in
    let vss, _, _ = statm () in
    Printf.printf "% 9s vss; % 9s heap; % 9s live; % 9s alive buf; % 9s total buf; % 9.0f%% garbage\n%!"
      (kmgt (vss * 4096))
      (kmgt (stat.heap_words * word_size))
      (kmgt (stat.live_words * word_size))
      (kmgt !alive_buf_size)
      (kmgt !total_buf_size)
      (100. *. float (!total_buf_size - !alive_buf_size) /. float !alive_buf_size);
    Unix.sleepf 0.5;
    memstat ()

  let _ =
    Thread.create memstat ()
end


(* Create some memory pressure, simulating a long-running program with a
   reasonable number of small objects *)
let tree =
  let limit = 1024 * 1024 * (try int_of_string (Sys.getenv "LIMIT")
                             with _ -> 10) in
  let key size =
    Bytes.init size (fun _ -> Random.int 9 + Char.code '0' |> Char.chr) in
  let module T = Set.Make(Bytes) in
  let rec build total_size t =
    if total_size > limit then
      t
    else
      let size = Random.int 16 in
      build (total_size + size) (T.add (key size) t)
  in
  print_string "Allocating some objects\n";
  build 0 T.empty

let run message alloc =
  Gc.compact ();
  print_string message;
  print_newline ();

  let q = Queue.create () in
  let deadline = Unix.gettimeofday () +. 15.0 in
  while Unix.gettimeofday () < deadline do
    let size = 2 * 1024 * 1024 + Random.int (2 * 1024 * 1024) in
    let buf = alloc size in
    Queue.push (size, buf) q;
    Memstat.alloc size buf;

    (* pretend we have about 30 active buffers at any moment *)
    while Queue.length q > 30 do
      let size, _ = Queue.pop q in
      Memstat.free size;
    done;
  done;

  (* drain queue *)
  while Queue.length q > 0 do
    let size, _ = Queue.pop q in
    Memstat.alive_buf_size := !Memstat.alive_buf_size - size
  done


let _ =
  run "Bytes" (fun size -> Bytes.create size);
  run "helped Bigarray" (fun size -> Bigarray.(Array1.create char c_layout size) |> Gc_helper.bigarray);
  run "plain Bigarray" (fun size -> Bigarray.(Array1.create char c_layout size))


(* Local Variables: *)
(* compile-command: "ocamlopt -thread unix.cmxa threads.cmxa bigarray.cmxa gc_sim.ml -o gc_sim" *)
(* End: *)

@damiendoligez
Member Author

Why not just use caml_alloc_dependent_memory and caml_free_dependent_memory?

That API is harder to use (it mandates the use of finalizers) and never caught on. It doesn't let the user control the ratio of heap to external memory.

@gasche
Member

gasche commented Sep 14, 2018

As far as I can tell, this PR has not been fully reviewed (and not approved), and @damiendoligez indicates that it is ready. @stedolan, would you maybe be willing to do a review?

@gasche
Member

gasche commented Oct 28, 2018

( @alainfrisch or @jhjourdan might be qualified to review as well. )

Contributor

@alainfrisch alainfrisch left a comment


LGTM, only cosmetic comments.

@damiendoligez Did you test MPR#7100 against this branch?

@alainfrisch
Contributor

alainfrisch commented Oct 29, 2018

You could also use the new API in win32graph/draw.c and graph/image.c and avoid the magic constants in those files. It seems all other uses in the core distribution pass the default values for mem/max (0/1), so they could also be switched to the new API (or not).

@damiendoligez
Member Author

I'm not sure about Win32, but on X Window, the memory held by a Pixmap is in the server, not in the user process, so I don't think it makes sense to use the new API there.

For the other uses, you're right that they all use 0/1 so converting them wouldn't change anything.

@damiendoligez
Member Author

I tried the test program of MPR#7100 and indeed with this PR it doesn't fail.

@damiendoligez
Member Author

Note: it will be time to revisit #1961 after this PR is merged and tested in the wild for some time.

@alainfrisch
Contributor

I tried the test program of MPR#7100 and indeed with this PR it doesn't fail.

(Actually, I'm confused on whether #1476 already fixed that one.)

Contributor

@alainfrisch alainfrisch left a comment


LGTM

@damiendoligez
Member Author

(Actually, I'm confused on whether #1476 already fixed that one.)

I don't think it did. I tried the test program with 4.07.1 and it failed.

@damiendoligez
Member Author

CI passed, so I'm merging. Thanks for the review!

@damiendoligez damiendoligez merged commit 17b64ac into ocaml:trunk Nov 6, 2018
@damiendoligez damiendoligez deleted the custom-obj-mem branch November 6, 2018 12:43
@alainfrisch
Contributor

Other tickets more or less directly related to this PR:

https://caml.inria.fr/mantis/view.php?id=7198
https://caml.inria.fr/mantis/view.php?id=7676 (somehow)
https://caml.inria.fr/mantis/view.php?id=6294
https://caml.inria.fr/mantis/view.php?id=7158

(I'm not sure it's worth mentioning them in the Changelog, but I'll let @damiendoligez decide.)

@damiendoligez
Member Author

This PR resolves MPR#7198 but the others will also need changes in other software.

I'll add it to Changes.

@damiendoligez
Member Author

I added a mention of MPR#7198 (3f46a1d). The others are not directly fixed by this PR.
