feat(java): Chunk by chunk predictive map serialization protocol #1722
Hi @Hen1ng, I noticed that we create a MapChunkWriter for serialization/deserialization every time we serialize a map. This introduces extra object allocation cost. And I found that we use
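To illustrate the allocation concern, here is a minimal sketch of the two patterns. `ChunkWriter` and `ReusingSerializer` are hypothetical stand-ins written for this example; they are not Fury classes and do not reflect how `MapChunkWriter` is actually implemented.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ReusableWriterSketch {
  /** Hypothetical stand-in for a helper like MapChunkWriter; not Fury's class. */
  static final class ChunkWriter {
    static int instancesCreated = 0;
    private final StringBuilder out = new StringBuilder();

    ChunkWriter() {
      instancesCreated++;
    }

    /** Clears per-call state so the same instance can serve the next map. */
    void reset() {
      out.setLength(0);
    }

    void writeEntry(Object key, Object value) {
      out.append(key).append('=').append(value).append(';');
    }

    String result() {
      return out.toString();
    }
  }

  /** Allocates a fresh writer on every call (the pattern the comment flags). */
  static String serializeAllocating(Map<?, ?> map) {
    ChunkWriter writer = new ChunkWriter();
    map.forEach(writer::writeEntry);
    return writer.result();
  }

  /** Holds one writer for its lifetime and resets it per call. Not thread-safe. */
  static final class ReusingSerializer {
    private final ChunkWriter writer = new ChunkWriter();

    String serialize(Map<?, ?> map) {
      writer.reset();
      map.forEach(writer::writeEntry);
      return writer.result();
    }
  }

  public static void main(String[] args) {
    Map<String, Integer> map = new LinkedHashMap<>();
    map.put("a", 1);
    map.put("b", 2);

    for (int i = 0; i < 1000; i++) {
      serializeAllocating(map);
    }
    System.out.println("allocating: " + ChunkWriter.instancesCreated + " writers");

    ChunkWriter.instancesCreated = 0;
    ReusingSerializer serializer = new ReusingSerializer();
    for (int i = 0; i < 1000; i++) {
      serializer.serialize(map);
    }
    System.out.println("reusing: " + ChunkWriter.instancesCreated + " writers");
  }
}
```

The reusing variant trades a per-call allocation for mutable state on the serializer, so it only fits if each serializer instance is used by one thread at a time.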

Another thing I found is that this PR does not seem to support predicting that consecutive entries have the same type. For example:

```java
import java.util.HashMap;
import java.util.Map;
import org.apache.fury.Fury;

public static void main(String[] args) {
  // Every key is a String and every value is an Integer.
  Map<String, Integer> map = new HashMap<>(20);
  for (int i = 0; i < 20; i++) {
    map.put("Key" + i, i);
  }
  Fury fury = Fury.builder().withChunkSerializeMapEnable(true).build();
  byte[] result = null;
  // Serialize the same map repeatedly.
  for (int i = 0; i < 1000000000; i++) {
    result = fury.serialize(map);
  }
  fury.deserialize(result);
}
```

With this PR, we still need to write the key type and value type for every element. Could we optimize this in the current PR?
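The saving being asked for can be sketched with a toy encoder that emits tokens instead of bytes. This is an illustration only, not Fury's wire format: the token names and the `encodePerEntry`/`encodeChunked` helpers are invented for this example, and a real protocol would probably also need a per-chunk entry count and null/reference flags, which are omitted here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ChunkTypeHeaderSketch {
  static String typeName(Object o) {
    return o == null ? "null" : o.getClass().getSimpleName();
  }

  /** Writes a type tag before every key and every value. */
  static List<String> encodePerEntry(Map<?, ?> map) {
    List<String> out = new ArrayList<>();
    for (Map.Entry<?, ?> e : map.entrySet()) {
      out.add("type:" + typeName(e.getKey()));
      out.add("key:" + e.getKey());
      out.add("type:" + typeName(e.getValue()));
      out.add("value:" + e.getValue());
    }
    return out;
  }

  /** Writes one header per run of entries that share key and value types. */
  static List<String> encodeChunked(Map<?, ?> map) {
    List<String> out = new ArrayList<>();
    String chunkKeyType = null;
    String chunkValueType = null;
    for (Map.Entry<?, ?> e : map.entrySet()) {
      String keyType = typeName(e.getKey());
      String valueType = typeName(e.getValue());
      // Start a new chunk only when the predicted types stop matching.
      if (!keyType.equals(chunkKeyType) || !valueType.equals(chunkValueType)) {
        out.add("chunk:" + keyType + "," + valueType);
        chunkKeyType = keyType;
        chunkValueType = valueType;
      }
      out.add("key:" + e.getKey());
      out.add("value:" + e.getValue());
    }
    return out;
  }

  /** Counts tokens that carry type information. */
  static long countTypeTokens(List<String> tokens) {
    return tokens.stream()
        .filter(t -> t.startsWith("type:") || t.startsWith("chunk:"))
        .count();
  }

  public static void main(String[] args) {
    Map<String, Integer> map = new LinkedHashMap<>();
    for (int i = 0; i < 20; i++) {
      map.put("Key" + i, i);
    }
    // prints "per-entry type tags: 40"
    System.out.println("per-entry type tags: " + countTypeTokens(encodePerEntry(map)));
    // prints "chunk headers: 1"
    System.out.println("chunk headers: " + countTypeTokens(encodeChunked(map)));
  }
}
```

For a homogeneous map like the one in the comment, type information is written once instead of once per key and once per value; a map with mixed types degrades toward one header per entry.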

Benchmark (size) (tracking) Mode Cnt Score Error Units

generalWrite benchmark comparison. Benchmark code:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.TimeUnit;
import org.apache.fury.Fury;
import org.apache.fury.config.Language;
import org.openjdk.jmh.annotations.*;

// Bean, MapBean, and BeanB are test classes whose definitions are not shown here.
@State(Scope.Benchmark)
@BenchmarkMode({Mode.AverageTime})
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 3, time = 3)
@Measurement(iterations = 3, time = 5)
@Threads(1)
@Fork(5)
public class HnBenchmark {
  private Fury furyMapChunk;
  private Fury fury;
  Map<Integer, Integer> map;
  MapBean mapBean = new MapBean();
  Map<BeanB, BeanB> beanBBeanBMap = new HashMap<>();
  Bean bean;

  @Param({"64", "128", "256", "512"})
  int size;

  @Param({"true", "false"})
  boolean tracking = false;

  @Setup(Level.Trial)
  public void init() {
    // Same configuration twice; only the chunk-serialization switch differs.
    furyMapChunk =
        Fury.builder()
            .withLanguage(Language.JAVA)
            .withRefTracking(tracking)
            .withCodegen(false)
            .withChunkSerializeMapEnable(true)
            .requireClassRegistration(false)
            .build();
    fury =
        Fury.builder()
            .withLanguage(Language.JAVA)
            .withRefTracking(tracking)
            .withCodegen(false)
            .withChunkSerializeMapEnable(false)
            .requireClassRegistration(false)
            .build();
    map = new HashMap<>(size);
    for (int i = 0; i < size; i++) {
      map.put(i, i);
      beanBBeanBMap.put(new BeanB(), new BeanB());
    }
    bean = new Bean();
    bean.setMap(map);
    mapBean.setMap(beanBBeanBMap);
  }

  // Serializes the raw Map<Integer, Integer> with chunking enabled.
  @Benchmark
  public void testGeneralChunkWrite() {
    final byte[] serialize = furyMapChunk.serialize(map);
  }

  // Baseline: the same map with chunking disabled.
  @Benchmark
  public void testGeneralWrite() {
    final byte[] serialize = fury.serialize(map);
  }
}
```

Benchmark (size) (tracking) Mode Cnt Score Error Units

finalWrite benchmark comparison. Benchmark code is below:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.TimeUnit;
import org.apache.fury.Fury;
import org.apache.fury.config.Language;
import org.openjdk.jmh.annotations.*;

// Bean, MapBean, and BeanB are test classes whose definitions are not shown here.
@State(Scope.Benchmark)
@BenchmarkMode({Mode.AverageTime})
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 3, time = 3)
@Measurement(iterations = 3, time = 5)
@Threads(1)
@Fork(5)
public class HnBenchmark {
  private Fury furyMapChunk;
  private Fury fury;
  Map<Integer, Integer> map;
  MapBean mapBean = new MapBean();
  Map<BeanB, BeanB> beanBBeanBMap = new HashMap<>();
  Bean bean;

  @Param({"64", "128", "256", "512"})
  int size;

  @Param({"true", "false"})
  boolean tracking = false;

  @Setup(Level.Trial)
  public void init() {
    // Same configuration twice; only the chunk-serialization switch differs.
    furyMapChunk =
        Fury.builder()
            .withLanguage(Language.JAVA)
            .withRefTracking(tracking)
            .withCodegen(false)
            .withChunkSerializeMapEnable(true)
            .requireClassRegistration(false)
            .build();
    fury =
        Fury.builder()
            .withLanguage(Language.JAVA)
            .withRefTracking(tracking)
            .withCodegen(false)
            .withChunkSerializeMapEnable(false)
            .requireClassRegistration(false)
            .build();
    map = new HashMap<>(size);
    for (int i = 0; i < size; i++) {
      map.put(i, i);
    }
    bean = new Bean();
    bean.setMap(map);
  }

  // Serializes a Bean that holds the Integer map, with chunking enabled.
  @Benchmark
  public void testFinalChunkWrite() {
    final byte[] serialize = furyMapChunk.serialize(bean);
  }

  // Baseline: the same Bean with chunking disabled.
  @Benchmark
  public void testFinalWrite() {
    final byte[] serialize = fury.serialize(bean);
  }
}
```

HnBenchmark.testNoFinalGenericChunkWrite 64 true avgt 15 5173.244 ± 551.374 ns/op

testNoFinalGenericWrite benchmark comparison. Benchmark code is below:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.TimeUnit;
import org.apache.fury.Fury;
import org.apache.fury.config.Language;
import org.openjdk.jmh.annotations.*;

// Bean, MapBean, and BeanB are test classes whose definitions are not shown here.
@State(Scope.Benchmark)
@BenchmarkMode({Mode.AverageTime})
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 3, time = 3)
@Measurement(iterations = 3, time = 5)
@Threads(1)
@Fork(5)
public class HnBenchmark {
  private Fury furyMapChunk;
  private Fury fury;
  Map<Integer, Integer> map;
  MapBean mapBean = new MapBean();
  Map<BeanB, BeanB> beanBBeanBMap = new HashMap<>();
  Bean bean;

  @Param({"64", "128", "256"})
  int size;

  @Param({"true", "false"})
  boolean tracking = false;

  @Setup(Level.Trial)
  public void init() {
    // Same configuration twice; only the chunk-serialization switch differs.
    furyMapChunk =
        Fury.builder()
            .withLanguage(Language.JAVA)
            .withRefTracking(tracking)
            .withCodegen(false)
            .withChunkSerializeMapEnable(true)
            .requireClassRegistration(false)
            .build();
    fury =
        Fury.builder()
            .withLanguage(Language.JAVA)
            .withRefTracking(tracking)
            .withCodegen(false)
            .withChunkSerializeMapEnable(false)
            .requireClassRegistration(false)
            .build();
    map = new HashMap<>(size);
    for (int i = 0; i < size; i++) {
      map.put(i, i);
      beanBBeanBMap.put(new BeanB(), new BeanB());
    }
    bean = new Bean();
    bean.setMap(map);
    mapBean.setMap(beanBBeanBMap);
  }

  // Serializes a MapBean that holds the BeanB map, with chunking enabled.
  @Benchmark
  public void testNoFinalGenericChunkWrite() {
    final byte[] serialize = furyMapChunk.serialize(mapBean);
  }

  // Baseline: the same MapBean with chunking disabled.
  @Benchmark
  public void testNoFinalGenericWrite() {
    final byte[] serialize = fury.serialize(mapBean);
  }
}
```

Benchmark (size) (tracking) Mode Cnt Score Error Units

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.TimeUnit;
import org.apache.fury.Fury;
import org.apache.fury.config.Language;
import org.openjdk.jmh.annotations.*;

// Bean, MapBean, and BeanB are test classes whose definitions are not shown here.
@State(Scope.Benchmark)
@BenchmarkMode({Mode.AverageTime})
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 3, time = 3)
@Measurement(iterations = 3, time = 5)
@Threads(1)
@Fork(5)
public class HnBenchmark {
  private Fury furyMapChunk;
  private Fury fury;
  Map<Integer, Integer> map;
  MapBean mapBean = new MapBean();
  Map<BeanB, BeanB> beanBBeanBMap = new HashMap<>();
  Bean bean;

  @Param({"64", "128", "256"})
  int size;

  @Param({"true", "false"})
  boolean tracking = false;

  @Setup(Level.Trial)
  public void init() {
    // Same configuration twice; only the chunk-serialization switch differs.
    furyMapChunk =
        Fury.builder()
            .withLanguage(Language.JAVA)
            .withRefTracking(tracking)
            .withCodegen(false)
            .withChunkSerializeMapEnable(true)
            .requireClassRegistration(false)
            .build();
    fury =
        Fury.builder()
            .withLanguage(Language.JAVA)
            .withRefTracking(tracking)
            .withCodegen(false)
            .withChunkSerializeMapEnable(false)
            .requireClassRegistration(false)
            .build();
    map = new HashMap<>(size);
    for (int i = 0; i < size; i++) {
      // One entry with a null key.
      if (i == 0) {
        map.put(null, i);
        beanBBeanBMap.put(null, new BeanB());
        continue;
      }
      // One entry with a null value.
      if (i == 10) {
        map.put(i, null);
        beanBBeanBMap.put(new BeanB(), null);
        continue;
      }
      map.put(i, i);
      beanBBeanBMap.put(new BeanB(), new BeanB());
    }
    bean = new Bean();
    bean.setMap(map);
    mapBean.setMap(beanBBeanBMap);
  }

  // Serializes the map containing a null key and a null value, with chunking enabled.
  @Benchmark
  public void testGeneralChunkWriteWithNull() {
    final byte[] serialize = furyMapChunk.serialize(map);
  }

  // Baseline: the same map with chunking disabled.
  @Benchmark
  public void testGeneralWriteWithNull() {
    final byte[] serialize = fury.serialize(map);
  }
}
```

The performance is basically the same, which is a little unexpected. Could you share some profiling data here?
It seems MapChunkWriter is not used anymore. Could we remove it? @Hen1ng
LGTM. This was a lot of work, thank you so much. The performance work and the codegen version can be left to a later PR.
…p serialization (#2025)

## What does this PR do?

This PR provides a new implementation for chunk-based map serialization.

## Related issues

#1571 #1549 #1722

Closes #925

## Does this PR introduce any user-facing change?

- [ ] Does this PR introduce any public API change?
- [ ] Does this PR introduce any binary protocol compatibility change?

## Benchmark

Deserialization is much faster than the no-chunk version; serialization is faster when the map is larger.

Using the benchmark code in #1722 (comment), this PR runs faster, up to **3x faster**:

```java
Benchmark                                  (size)  (tracking)  Mode  Cnt     Score        Error  Units
HnBenchmark.testGeneralChunkWriteWithNull      64        true  avgt    3   965.521 ±   1830.936  ns/op
HnBenchmark.testGeneralChunkWriteWithNull      64       false  avgt    3  1060.411 ±   3424.719  ns/op
HnBenchmark.testGeneralChunkWriteWithNull     128        true  avgt    3  2404.445 ±   8687.122  ns/op
HnBenchmark.testGeneralChunkWriteWithNull     128       false  avgt    3  1814.507 ±   1722.751  ns/op
HnBenchmark.testGeneralChunkWriteWithNull     256        true  avgt    3  3944.632 ±   2203.076  ns/op
HnBenchmark.testGeneralChunkWriteWithNull     256       false  avgt    3  3288.805 ±    867.047  ns/op
HnBenchmark.testGeneralWriteWithNull           64        true  avgt    3  1962.688 ±   2828.210  ns/op
HnBenchmark.testGeneralWriteWithNull           64       false  avgt    3  1490.634 ±    962.836  ns/op
HnBenchmark.testGeneralWriteWithNull          128        true  avgt    3  3659.806 ±   7227.436  ns/op
HnBenchmark.testGeneralWriteWithNull          128       false  avgt    3  4084.654 ±   7374.774  ns/op
HnBenchmark.testGeneralWriteWithNull          256        true  avgt    3  9596.658 ±  20767.262  ns/op
HnBenchmark.testGeneralWriteWithNull          256       false  avgt    3  6679.325 ±   5472.179  ns/op
```

With the StringMap and IntMap benchmarks:

```java
Benchmark                                   (enableChunkEncoding)  (mapSize)   Mode  Cnt        Score          Error  Units
MapSerializationSuite.deserializeIntMap                     false          5  thrpt    3  3804604.842 ± 15328547.705  ops/s
MapSerializationSuite.deserializeIntMap                     false         20  thrpt    3  1254687.969 ±   388949.724  ops/s
MapSerializationSuite.deserializeIntMap                     false         50  thrpt    3   495176.849 ±   335702.097  ops/s
MapSerializationSuite.deserializeIntMap                     false        100  thrpt    3   258875.012 ±    32886.176  ops/s
MapSerializationSuite.deserializeIntMap                     false        200  thrpt    3   134137.015 ±   114908.454  ops/s
MapSerializationSuite.deserializeIntMap                      true          5  thrpt    3  5997383.562 ±  4598913.048  ops/s
MapSerializationSuite.deserializeIntMap                      true         20  thrpt    3  1797855.524 ±  3853406.173  ops/s
MapSerializationSuite.deserializeIntMap                      true         50  thrpt    3   582412.110 ±  1047668.070  ops/s
MapSerializationSuite.deserializeIntMap                      true        100  thrpt    3   389066.866 ±   151297.708  ops/s
MapSerializationSuite.deserializeIntMap                      true        200  thrpt    3   188316.860 ±    35331.909  ops/s
MapSerializationSuite.deserializeStringMap                  false          5  thrpt    3  2898963.533 ±  1930240.310  ops/s
MapSerializationSuite.deserializeStringMap                  false         20  thrpt    3   872196.086 ±   871637.268  ops/s
MapSerializationSuite.deserializeStringMap                  false         50  thrpt    3   308761.737 ±    58099.196  ops/s
MapSerializationSuite.deserializeStringMap                  false        100  thrpt    3   157261.914 ±   397356.241  ops/s
MapSerializationSuite.deserializeStringMap                  false        200  thrpt    3    86576.549 ±   102489.156  ops/s
MapSerializationSuite.deserializeStringMap                   true          5  thrpt    3  3701089.567 ±  1529899.331  ops/s
MapSerializationSuite.deserializeStringMap                   true         20  thrpt    3  1048550.399 ±   130102.760  ops/s
MapSerializationSuite.deserializeStringMap                   true         50  thrpt    3   407559.246 ±    38205.273  ops/s
MapSerializationSuite.deserializeStringMap                   true        100  thrpt    3   172109.437 ±   397927.346  ops/s
MapSerializationSuite.deserializeStringMap                   true        200  thrpt    3    92525.977 ±   379321.772  ops/s
MapSerializationSuite.serializeIntMap                       false          5  thrpt    3  7958692.983 ±  1934287.574  ops/s
MapSerializationSuite.serializeIntMap                       false         20  thrpt    3  2425269.897 ±  3763706.776  ops/s
MapSerializationSuite.serializeIntMap                       false         50  thrpt    3  1079804.122 ±   215967.411  ops/s
MapSerializationSuite.serializeIntMap                       false        100  thrpt    3   369848.671 ±   433172.821  ops/s
MapSerializationSuite.serializeIntMap                       false        200  thrpt    3   192858.945 ±    71543.709  ops/s
MapSerializationSuite.serializeIntMap                        true          5  thrpt    3  7239453.648 ±  3855324.170  ops/s
MapSerializationSuite.serializeIntMap                        true         20  thrpt    3  2137006.685 ±  3823762.656  ops/s
MapSerializationSuite.serializeIntMap                        true         50  thrpt    3   811639.511 ±  2407986.801  ops/s
MapSerializationSuite.serializeIntMap                        true        100  thrpt    3   412728.569 ±   149199.142  ops/s
MapSerializationSuite.serializeIntMap                        true        200  thrpt    3   236602.475 ±   253662.098  ops/s
MapSerializationSuite.serializeStringMap                    false          5  thrpt    3  5821603.026 ±  1397740.496  ops/s
MapSerializationSuite.serializeStringMap                    false         20  thrpt    3  1712819.341 ±   321017.433  ops/s
MapSerializationSuite.serializeStringMap                    false         50  thrpt    3   615260.241 ±   806075.165  ops/s
MapSerializationSuite.serializeStringMap                    false        100  thrpt    3   265117.558 ±   146904.745  ops/s
MapSerializationSuite.serializeStringMap                    false        200  thrpt    3   128618.697 ±    94723.953  ops/s
MapSerializationSuite.serializeStringMap                     true          5  thrpt    3  4503474.325 ± 11254674.336  ops/s
MapSerializationSuite.serializeStringMap                     true         20  thrpt    3  1732501.942 ±   373691.778  ops/s
MapSerializationSuite.serializeStringMap                     true         50  thrpt    3   596678.154 ±   173893.988  ops/s
MapSerializationSuite.serializeStringMap                     true        100  thrpt    3   336814.584 ±   134582.563  ops/s
MapSerializationSuite.serializeStringMap                     true        200  thrpt    3   143124.619 ±   200889.695  ops/s
```

## What does this PR do?

Implement chunk-based map serialization in #925. This PR doesn't provide JIT support; it will be implemented in a later PR.

## Related issues

## Does this PR introduce any user-facing change?

- [ ] Does this PR introduce any public API change?
- [ ] Does this PR introduce any binary protocol compatibility change?

## Benchmark

---------

Co-authored-by: hening <ninghe.hn@alibaba-inc.com>
Co-authored-by: chaokunyang <shawn.ck.yang@gmail.com>
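As a worked check on the HnBenchmark figures quoted in this commit message, the per-configuration ratios can be recomputed from the mean scores. This is a sketch for reading the table, with the class and method names invented for the example; it uses only the `Score` column, and since `Cnt` is 3 and several error intervals exceed the scores themselves, the ratios are rough indicators at best.

```java
public class SpeedupRatios {
  // Each row: {size, tracking (1 = true, 0 = false), chunk ns/op, non-chunk ns/op},
  // copied from the testGeneral(Chunk)WriteWithNull rows.
  static final double[][] ROWS = {
    {64, 1, 965.521, 1962.688},
    {64, 0, 1060.411, 1490.634},
    {128, 1, 2404.445, 3659.806},
    {128, 0, 1814.507, 4084.654},
    {256, 1, 3944.632, 9596.658},
    {256, 0, 3288.805, 6679.325},
  };

  /** Average time without chunking divided by average time with chunking. */
  static double speedup(double[] row) {
    return row[3] / row[2];
  }

  /** Largest ratio across all listed configurations. */
  static double maxSpeedup() {
    double max = 0;
    for (double[] row : ROWS) {
      max = Math.max(max, speedup(row));
    }
    return max;
  }

  public static void main(String[] args) {
    for (double[] row : ROWS) {
      System.out.printf(
          "size=%d tracking=%b speedup=%.2fx%n", (int) row[0], row[1] == 1, speedup(row));
    }
    System.out.printf("max=%.2fx%n", maxSpeedup());
  }
}
```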
What does this PR do?
Implement chunk-based map serialization in #925. This PR doesn't provide JIT support; it will be implemented in a later PR.
Related issues
Does this PR introduce any user-facing change?
Benchmark