I spent some time benchmarking our linear collections, and wanted to create an issue with my findings.
Reading from our linear arrays seems to almost an order of magnitude slower than reading from a vector.
I benchmarked these two functions:
linear :: Array.Linear.Array Int %1-> ()
linear hm =
hm
Linear.& Array.Linear.size
Linear.& \(Linear.Ur sz, arr) -> arr
Linear.& go 0 sz
where
go :: Int -> Int -> Array.Linear.Array Int %1-> ()
go start end arr
| start < end =
Array.Linear.unsafeGet start arr
Linear.& \(Linear.Ur i, arr) -> i `Linear.seq` go (start + 1) end arr
| otherwise = arr `Linear.lseq` ()
dataVector :: Data.Vector.Vector Int -> ()
dataVector v =
let sz = Data.Vector.length v
in go 0 sz
where
go :: Int -> Int -> ()
go start end
| start < end =
(v Data.Vector.! start) `seq` go (start + 1) end
| otherwise = ()
And here's the result on my system (this also includes allocating the array/vector with an initial element, but I measured that that is not dominating the runtime):
benchmarked arrays/reads/Data.Array.Mutable.Linear
time 27.26 ms (26.39 ms .. 28.05 ms)
0.997 R² (0.996 R² .. 0.999 R²)
mean 29.13 ms (28.44 ms .. 30.28 ms)
std dev 1.972 ms (1.178 ms .. 2.606 ms)
variance introduced by outliers: 26% (moderately inflated)
benchmarked arrays/reads/Data.Vector
time 3.841 ms (3.796 ms .. 3.887 ms)
0.999 R² (0.998 R² .. 0.999 R²)
mean 3.971 ms (3.943 ms .. 4.011 ms)
std dev 107.8 μs (80.19 μs .. 173.3 μs)
When looking at the relevant -ddump-simpl output (with -O), it can be seen that the generated code of Data.Vector is a pretty efficient loop:
$wbReads1_raMu
:: GHC.Prim.Int# -> GHC.Prim.Int# -> GHC.Prim.Array# Int -> ()
[GblId, Arity=3, Str=<L,U><L,U><L,U>, Unf=OtherCon []]
$wbReads1_raMu
= \ (ww_sat7 :: GHC.Prim.Int#)
(ww1_sat8 :: GHC.Prim.Int#)
(ww2_sat9 :: GHC.Prim.Array# Int) ->
join {
exit_X0 [Dmd=<L,C(U)>] :: GHC.Prim.Int# -> ()
[LclId[JoinId(1)], Arity=1, Str=<B,U>b]
exit_X0 (ww3_sasX [OS=OneShot] :: GHC.Prim.Int#)
= case lvl4_raM6 ww3_sasX ww1_sat8 of wild_00 { } } in
join {
exit1_X1 [Dmd=<L,C(U)>] :: GHC.Prim.Int# -> ()
[LclId[JoinId(1)], Arity=1, Str=<B,U>b]
exit1_X1 (ww3_sasX [OS=OneShot] :: GHC.Prim.Int#)
= case lvl4_raM6 ww3_sasX ww1_sat8 of wild_00 { } } in
joinrec {
$wgo2_sat3 [InlPrag=NOUSERINLINE[2], Occ=LoopBreaker]
:: GHC.Prim.Int# -> GHC.Prim.Int# -> ()
[LclId[JoinId(2)], Arity=2, Str=<L,U><L,U>, Unf=OtherCon []]
$wgo2_sat3 (ww3_sasX :: GHC.Prim.Int#) (ww4_sat1 :: GHC.Prim.Int#)
= case GHC.Prim.<# ww3_sasX ww4_sat1 of {
__DEFAULT -> GHC.Tuple.();
1# ->
case GHC.Prim.>=# ww3_sasX 0# of {
__DEFAULT -> jump exit_X0 ww3_sasX;
1# ->
case GHC.Prim.<# ww3_sasX ww1_sat8 of {
__DEFAULT -> jump exit1_X1 ww3_sasX;
1# ->
case GHC.Prim.indexArray#
@Int ww2_sat9 (GHC.Prim.+# ww_sat7 ww3_sasX)
of
{ (# ipv_a97a #) ->
case ipv_a97a of { GHC.Types.I# ipv1_s8LP ->
jump $wgo2_sat3 (GHC.Prim.+# ww3_sasX 1#) ww4_sat1
}
}
}
}
}; } in
jump $wgo2_sat3 0# ww1_sat8
However, the generated Core for our linear arrays is not as pretty:
Rec {
-- RHS size: {terms: 47, types: 183, coercions: 8, joins: 0/0}
$wgo1_raMr
:: GHC.Prim.Int#
-> GHC.Prim.Int#
-> Data.Array.Mutable.Unlifted.Linear.Array# Int
%1 -> ()
[GblId, Arity=3, Str=<L,U><L,U><L,U>, Unf=OtherCon []]
$wgo1_raMr
= \ (ww_sasC :: GHC.Prim.Int#)
(ww1_sasG :: GHC.Prim.Int#)
(ww2_sasK :: Data.Array.Mutable.Unlifted.Linear.Array# Int) ->
case GHC.Prim.<# ww_sasC ww1_sasG of {
__DEFAULT ->
case Unsafe.Coerce.unsafeEqualityProof
@(*)
@(GHC.Types.Any -> GHC.Types.Any)
@((Data.Array.Mutable.Unlifted.Linear.Array# Int -> () -> ())
%1 -> Data.Array.Mutable.Unlifted.Linear.Array# Int
%1 -> ()
%1 -> ())
of
{ Unsafe.Coerce.UnsafeRefl co_a8Ic ->
((\ (x_a8Ib [OS=OneShot] :: GHC.Types.Any) -> x_a8Ib)
`cast` (Sub (Sym co_a8Ic)
:: (GHC.Types.Any -> GHC.Types.Any)
~R# ((Data.Array.Mutable.Unlifted.Linear.Array# Int -> () -> ())
%1 -> Data.Array.Mutable.Unlifted.Linear.Array# Int
%1 -> ()
%1 -> ())))
(Data.Array.Mutable.Unlifted.Linear.lseq1 @Int @())
ww2_sasK
GHC.Tuple.()
};
1# ->
case Unsafe.Coerce.unsafeEqualityProof
@(*)
@(GHC.Types.Any -> GHC.Types.Any)
@((Data.Array.Mutable.Unlifted.Linear.Array# Int
-> (# Linear.Ur Int,
Data.Array.Mutable.Unlifted.Linear.Array# Int #))
%1 -> Data.Array.Mutable.Unlifted.Linear.Array# Int
%1 -> (# Linear.Ur Int,
Data.Array.Mutable.Unlifted.Linear.Array# Int #))
of
{ Unsafe.Coerce.UnsafeRefl co_a8py ->
case ((\ (x_a8px [OS=OneShot] :: GHC.Types.Any) -> x_a8px)
`cast` (Sub (Sym co_a8py)
:: (GHC.Types.Any -> GHC.Types.Any)
~R# ((Data.Array.Mutable.Unlifted.Linear.Array# Int
-> (# Linear.Ur Int,
Data.Array.Mutable.Unlifted.Linear.Array# Int #))
%1 -> Data.Array.Mutable.Unlifted.Linear.Array# Int
%1 -> (# Linear.Ur Int,
Data.Array.Mutable.Unlifted.Linear.Array# Int #))))
(\ (ds1_a8pC :: Data.Array.Mutable.Unlifted.Linear.Array# Int) ->
GHC.Magic.runRW#
@('GHC.Types.TupleRep
'[ 'GHC.Types.LiftedRep, 'GHC.Types.UnliftedRep])
@(# Linear.Ur Int, Data.Array.Mutable.Unlifted.Linear.Array# Int #)
(\ (s_a8pK [OS=OneShot] :: GHC.Prim.State# GHC.Prim.RealWorld) ->
case GHC.Prim.readArray#
@GHC.Prim.RealWorld
@Int
(ds1_a8pC
`cast` (Data.Array.Mutable.Unlifted.Linear.N:Array#[0] <Int>_N
:: Data.Array.Mutable.Unlifted.Linear.Array# Int
~R# GHC.Prim.MutableArray# GHC.Prim.RealWorld Int))
ww_sasC
s_a8pK
of
{ (# ipv_a8pM, ipv1_a8pN #) ->
(# Data.Unrestricted.Internal.Ur.Ur @Int ipv1_a8pN, ds1_a8pC #)
}))
ww2_sasK
of
{ (# ipv_a8pQ, ipv1_a8pR #) ->
case ipv_a8pQ of { Linear.Ur i_a89p ->
$wgo1_raMr (GHC.Prim.+# ww_sasC 1#) ww1_sasG ipv1_a8pR
}
}
}
}
end Rec }
I can not read core well, but I think I can see a few things:
I'll try to look into these further.
I spent some time benchmarking our linear collections, and wanted to create an issue with my findings.
Reading from our linear arrays seems to almost an order of magnitude slower than reading from a
vector.I benchmarked these two functions:
And here's the result on my system (this also includes allocating the array/vector with an initial element, but I measured that that is not dominating the runtime):
When looking at the relevant
-ddump-simploutput (with-O), it can be seen that the generated code ofData.Vectoris a pretty efficient loop:However, the generated Core for our linear arrays is not as pretty:
I can not read core well, but I think I can see a few things:
Data.Array.Mutable.Unlifted.Linear, because the functions there are not usually inlined (see the comment and the whole discussion around it: Replace Unsafe.MutableArray with Data.Array.Mutable.Unlifted.Linear #187 (comment)).Ur, which is a boxed value. I do not know if there is a way around this.I'll try to look into these further.