HADOOP-19385. S3A: Add iceberg bulk delete test #7316
steveloughran wants to merge 8 commits into apache:trunk
Conversation
Force-pushed 09b4682 to d37310c
🎊 +1 overall
This message was automatically generated.
      <type>test-jar</type>
    </dependency>
    <dependency>
"fs.s3a.performance.flags";
/**
 * All performance flags in the enumeration.
 * @return a configuration for subclasses to extend
 */
@Override
cut for now, or make this suite parameterized on bulk delete enabled
/**
 * Format tests.
 */
package org.apache.fs.test.formats;
private static final int DELETE_FILE_COUNT = 7;

@Parameterized.Parameters(name = "multiobjectdelete-{0}-usebulk-{1}")
flip params and add a javadoc
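A minimal sketch of what the review asks for, with the parameters flipped so the bulk-delete flag comes first and a javadoc describing the tuple. The class name `BulkDeleteParams` is hypothetical; in the real test this method would carry JUnit's `@Parameterized.Parameters(name = "usebulk-{0}-multiobjectdelete-{1}")` annotation.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the flipped, javadoc'd parameter supplier. */
public class BulkDeleteParams {

  /**
   * Test parameters: {iceberg bulk delete enabled, s3a multi-object delete enabled}.
   * @return all four combinations of the two flags
   */
  public static List<Object[]> params() {
    List<Object[]> cases = new ArrayList<>();
    for (boolean useBulk : new boolean[]{true, false}) {
      for (boolean multiDelete : new boolean[]{true, false}) {
        cases.add(new Object[]{useBulk, multiDelete});
      }
    }
    return cases;
  }

  public static void main(String[] args) {
    // print each case as it would appear in the parameterized test name
    for (Object[] c : params()) {
      System.out.println("usebulk-" + c[0] + "-multiobjectdelete-" + c[1]);
    }
  }
}
```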
 * The classic invocation mechanism reports a failure.
 */
@Test
public void testDeleteDirectory() throws Throwable {
expand detail in the name: directory deletion does not happen
public void testDeleteManyFiles() throws Throwable {
  Path path = methodPath();
  final FileSystem fs = getFileSystem();
  final List<Path> files = createFiles(fs, path, 1, DELETE_FILE_COUNT, 0);
add a local temp file to show mixing of filesystems
assertSuccessfulBulkDelete(bulkDelete_delete(getFileSystem(), basePath, paths));
}

public static List<String> stringList(List<Path> files) {
javadocs and explain why not .toURI()
}

@Test
public void testDeleteManyFiles() throws Throwable {
comment that this is how it is expected to be used
/** Is bulk delete enabled on hadoop runtimes with API support: {@value}. */
public static final String ICEBERG_BULK_DELETE_ENABLED = "iceberg.hadoop.bulk.delete.enabled";

private static final int DELETE_PAGE_SIZE = 3;
explain choice of this and the delete file count
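One plausible rationale (an assumption, not stated in the source): with `DELETE_FILE_COUNT = 7` and `DELETE_PAGE_SIZE = 3`, deleting all files takes more than one page and the final page is partial, so both the full-page and remainder paths of a paged bulk delete are exercised. The class name `DeletePageMath` is hypothetical.

```java
/**
 * Hypothetical sketch of the page arithmetic behind the constants:
 * 7 files at a page size of 3 means 3 pages, the last holding 1 file.
 */
public class DeletePageMath {
  static final int DELETE_FILE_COUNT = 7;
  static final int DELETE_PAGE_SIZE = 3;

  /** Number of pages needed to delete all files (ceiling division). */
  static int pageCount() {
    return (DELETE_FILE_COUNT + DELETE_PAGE_SIZE - 1) / DELETE_PAGE_SIZE;
  }

  /** Size of the final page; here nonzero and smaller than a full page. */
  static int lastPageSize() {
    int r = DELETE_FILE_COUNT % DELETE_PAGE_SIZE;
    return r == 0 ? DELETE_PAGE_SIZE : r;
  }

  public static void main(String[] args) {
    System.out.println(pageCount() + " pages, last page of " + lastPageSize());
  }
}
```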
final FileSystem fs = getFileSystem();
final List<Path> files = createFiles(fs, path, 1, DELETE_FILE_COUNT, 0);
try (HadoopFileIO fileIO = createFileIO()) {
  fileIO.deleteFiles(stringList(files));
add iostats assertions
}

/**
 * Get a file status value or, if the path doesn't exist, return null.
Perhaps an opportunity to return Optional<FileStatus>?
done. Actually, maybe we should put that into some new IOFunctions class in the util.functional package, which we can use to make some of the basic IO ops stuff we can chain together. Presumably you have stuff which can go in there too
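A self-contained sketch of the `Optional`-returning probe discussed above, written against `java.nio` rather than Hadoop's `FileSystem`/`FileStatus` so it compiles standalone; the real helper would wrap `FileSystem.getFileStatus(Path)` and could live in the `util.functional` IOFunctions-style class the reply proposes. The class name `StatusProbe` is hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.Optional;

/** Hypothetical sketch: status probe returning Optional instead of null. */
public class StatusProbe {

  /** Return the attributes of a path, or empty if it does not exist. */
  public static Optional<BasicFileAttributes> statusIfPresent(Path path) {
    try {
      return Optional.of(Files.readAttributes(path, BasicFileAttributes.class));
    } catch (NoSuchFileException e) {
      // the "doesn't exist" case maps to an empty Optional, not null
      return Optional.empty();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) throws IOException {
    Path tmp = Files.createTempFile("probe", ".txt");
    System.out.println(statusIfPresent(tmp).isPresent());
    Files.delete(tmp);
    System.out.println(statusIfPresent(tmp).isPresent());
  }
}
```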
basePath = path(getClass().getName());
dynamicWrappedIO = new DynamicWrappedIO();
pageSize = dynamicWrappedIO.bulkDelete_pageSize(fs, basePath);
fs.mkdirs(basePath);
Assert that mkdirs succeeds (returns true)?
}

public static List<String> stringList(List<Path> files) {
  return files.stream().map(p -> toString(p)).collect(Collectors.toList());
Can this be reduced to:
return files.stream().map(Path::toString).collect(Collectors.toList());
...and then remove the static toString method below? It just seems to turn around and call toString() on the object.
Chris, not forgotten this. Need to get that Iceberg PR in first, with this one showing we can do it. Though I would like to get that logging into the next hadoop release.
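The reviewer's simplification can be sketched as below. This uses `java.nio.file.Path` so it compiles standalone; the real code maps `org.apache.hadoop.fs.Path`, whose `toString()` also yields the path string, and the review separately asks for javadocs explaining why `.toURI()` is not used. The class name `StringListSketch` is hypothetical.

```java
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical sketch: map Path::toString directly, no static helper needed. */
public class StringListSketch {

  public static List<String> stringList(List<Path> files) {
    // method reference replaces the lambda routed through a toString(Path) helper
    return files.stream().map(Path::toString).collect(Collectors.toList());
  }

  public static void main(String[] args) {
    System.out.println(stringList(List.of(Path.of("a/b"), Path.of("c"))));
  }
}
```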
hadoop-tools/hadoop-aws/pom.xml (outdated)
<!-- Adds a *test only* java 17 build profile-->
<profile>
  <id>java-17-or-later</id>

Suggested change: <id>java-17-or-later</id> → <id>jdk17+</id>
better to follow
https://github.com/apache/maven-apache-parent/blob/ce14a4641f0f21f98e51f564e9d290910ae75e7c/pom.xml#L500
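A sketch of what the suggested profile might look like, following the naming and activation style of maven-apache-parent; the `<id>`, the JDK activation range, and the comment body are all assumptions, not the PR's actual pom.

```xml
<!-- Sketch only: jdk17+ profile in the maven-apache-parent style. -->
<profile>
  <id>jdk17+</id>
  <activation>
    <jdk>[17,)</jdk>
  </activation>
  <build>
    <!-- add the src/test/java17 source root and the iceberg-core
         test dependency for this profile here -->
  </build>
</profile>
```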
## <a name="java17"></a> Java 17 Tests

This module includes a test source tree which compiles and runs on
Java 17+ _only_. This is to allow external libraries to be used
Iceberg 1.8 requires Java 11+, instead of Java 17+
really? Even better. I do seem to have to switch to a java17 JVM for a build though, but maybe that's because I do that for parquet.
This means I'll be able to wire Yetus up to test this in CI even before we've got that java17 image up.
actually going to add a special fileformat profile which is needed to explicitly include the format stuff, rather than do it by java version.
Why so? It stops another loop getting into the build; there's enough going on there with avro and hbase being hadoop dependencies. Keeping format (regression) testing optional will keep release engineers happy. Or at least not unhappy with me.
Force-pushed d37310c to 8d9201e

🎊 +1 overall
🎊 +1 overall
Add Iceberg core to the hadoop-aws test classpath.

Iceberg is java17+ only, so this adds:
* A new test source path src/test/java17
* A new profile "java-17-or-later" which includes this and declares the dependency on iceberg-core.

The new test is ITestIcebergBulkDelete; it is parameterized on Iceberg bulk delete enabled/disabled and s3a multi-object delete enabled/disabled.

There is a superclass contract test org.apache.fs.test.formats.AbstractIcebergDeleteTest to support future stores which implement bulk delete. This is currently a minimal superclass; all tests are currently in ITestIcebergBulkDelete.

Change-Id: Ica8682f4efdd0cb04ca7f4762b2e47d396552910
Code in sync with PRs, more assertions, use other parts of API for coverage there too.
use listing API to understand it
All parameterized contract test setup is utterly broken. Need to think of a way to address that, probably something where contract setup is postponed
Force-pushed d2fef70 to 1a3d25f
💔 -1 overall
tested with local build as this is now parameterized for junit5 and works; will now need to fix up the rest of the tests, which will need pulling d2aec90 to trunk, fixing up what was needed here (all setup/teardown to tag as BeforeAll/AfterAll even though the superclass method already is).
Tests are invoked iff the "format-tests" option is set. This means that a java17 build doesn't require a compatible version of iceberg (or, later, other formats) just to build. This avoids adding more loops in a from-scratch build; there are enough with Avro and HBase.
Force-pushed 0830758 to 3a3e6f3
💔 -1 overall
💔 -1 overall
Forms initial iceberg page. To add:
* analytics stream
* using s3 as the URLs
💔 -1 overall
We're closing this stale PR because it has been open for 100 days with no activity. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
Add Iceberg core to the hadoop-aws test classpath to verify that it is correctly invoked through the HadoopFileIO component when enabled in Iceberg.

Iceberg is java17+ only, so this adds:
* A new test source path src/test/java17
* A new profile "java-17-or-later" which includes this and declares the dependency on iceberg-core.

The new test is ITestIcebergBulkDelete; it is parameterized on Iceberg bulk delete enabled/disabled and s3a multi-object delete enabled/disabled.

There is a superclass contract test org.apache.fs.test.formats.AbstractIcebergDeleteTest. This is to support future stores which implement bulk delete. This is currently a minimal superclass; all tests are currently in ITestIcebergBulkDelete.

How was this patch tested?
Fun!
Once we have iceberg PR 10233 merged in, we can merge this. Until then it doesn't compile for testing unless we move to DynMethods there (maybe I should).
Important: Yetus cannot test this yet!
Yetus doesn't have a java17 run, so the new test is not executed, even if the Iceberg JAR were there.
For code changes:
LICENSE, LICENSE-binary, NOTICE-binary files?