Add Linux x86-64bits native method to retrieve the number of allocated bytes on disk for a file#80437
Conversation
…d bytes on disk for a file Follow up elastic#79698
63eff2b to
49901c3
Compare
49901c3 to
0803d10
Compare
|
Pinging @elastic/es-distributed (Team:Distributed) |
|
@elasticmachine run elasticsearch-ci/part-2 |
|
@DaveCTurner I'd appreciate if you find the time to look at this :) Otherwise I can ask someone else for sure. |
DaveCTurner
left a comment
There was a problem hiding this comment.
Sorry for the delay here @tlrx, I left a few comments.
| } catch (UnsatisfiedLinkError ue0) { | ||
| try { | ||
| logger.debug("trying to register XStat64Method..."); | ||
| Native.register(XStat64Method.class, Platform.C_LIBRARY_NAME); |
There was a problem hiding this comment.
Could you add some comments on when these fallbacks are necessary? My reading of https://linux.die.net/man/2/stat64 is that stat() alone is enough, but if it isn't then I suspect stat() doesn't return something laid out as a Stat64 structure.
There was a problem hiding this comment.
Oh good grief this is just cursed. stat() in glibc is actually a macro, not a real function:
$ nm -AD /lib/x86_64-linux-gnu/libc.so.6 | grep 'stat$'
/lib/x86_64-linux-gnu/libc.so.6:00000000000f6d20 T __fxstat
/lib/x86_64-linux-gnu/libc.so.6:0000000000078b70 T _IO_file_stat
/lib/x86_64-linux-gnu/libc.so.6:00000000000f6d70 T __lxstat
/lib/x86_64-linux-gnu/libc.so.6:00000000001057b0 T ustat
/lib/x86_64-linux-gnu/libc.so.6:00000000000f6cd0 T __xstat
I think the macro might also reorganise the layout of the struct stat to be different from the one that __xstat populates. Here we're relying on the ABI of the secret __xstat function instead. How sure can we be that this ABI won't change?
There was a problem hiding this comment.
Libc uses symbol versioning so it would be more reliable to call __xstat@GLIBC_2.2.5. I don't know if specifying a symbol version is actually possible in JNA. Maybe FunctionMapper can cope with the full symbol name?
There was a problem hiding this comment.
@DaveCTurner Yeah, this is not ideal, but kinda unavoidable if we don't write our own C code. In general, such usage leads to me think that a small lightweight native library in ES could be worth the effort - since it would resolve many of these kinda issues ( and these issues have the potential to be significant and fail in catastrophic ways ).
| public long reserved3; // __syscall_slong_t | ||
|
|
||
| @Override | ||
| protected List<String> getFieldOrder() { |
There was a problem hiding this comment.
JNA apparently prefers the class-level annotation @Structure.FieldOrder({"st_dev", "st_ino", ...}) these days. Not sure why we don't do this elsewhere, maybe it's a relatively new feature.
| public long reserved1; // __syscall_slong_t | ||
| public long reserved2; // __syscall_slong_t | ||
| public long reserved3; // __syscall_slong_t |
There was a problem hiding this comment.
I don't think these reserved fields exist in the libc interface stat(), although they might be there in the syscall interface that it wraps.
There was a problem hiding this comment.
Actually I'm not sure what on earth is going on any more. I thought we were implementing glibc's struct stat as defined here but that seems to be completely different from what you get if you #include <sys/stat.h>.
There was a problem hiding this comment.
Ok, on my machine it's actually coming from here. It's cursed:
#ifndef __x86_64__
__mode_t st_mode; /* File mode. */
__nlink_t st_nlink; /* Link count. */
#else
__nlink_t st_nlink; /* Link count. */
__mode_t st_mode; /* File mode. */
#endif
| * | ||
| * https://linux.die.net/man/3/strerror | ||
| */ | ||
| private static native String strerror(int errno); |
| assert method == StatMethod.class; | ||
| rc = StatMethod.stat(path.toString(), stats); | ||
| } | ||
| if (rc != 0) { |
There was a problem hiding this comment.
I think we should declare the native methods as throws LastErrorException in order to get a proper exception with an appropriate error message, rather than just logging the numeric error.
There was a problem hiding this comment.
I did not know about throws LastErrorException, thanks
There was a problem hiding this comment.
and apparently throwing an exception is preferred since it has better performance.
| long nbBlocks = (cacheFile.getLength() / blockSize); | ||
| if (cacheFile.getLength() % blockSize > 0L) { | ||
| nbBlocks += 1L; | ||
| } |
There was a problem hiding this comment.
Suggested oneliner:
| long nbBlocks = (cacheFile.getLength() / blockSize); | |
| if (cacheFile.getLength() % blockSize > 0L) { | |
| nbBlocks += 1L; | |
| } | |
| final long nbBlocks = (cacheFile.getLength() + blockSize - 1) / blockSize; // ceil(cacheFile.getLength() / blockSize) |
| public int st_mode; // mode_t | ||
| public int st_uid; // uid_t | ||
| public int st_gid; // gid_t | ||
| public int pad0; // int |
There was a problem hiding this comment.
Oof bet it was fun to discover that this was needed...
There was a problem hiding this comment.
It came from a struct stat64 but now I can't find it anymore.
|
Tests are green but let's have more runs @elasticmachine run elasticsearch-ci/part-2 |
| @Structure.FieldOrder( | ||
| { | ||
| "st_dev", | ||
| "st_ino", | ||
| "st_mode", | ||
| "st_nlink", | ||
| "st_uid", | ||
| "st_gid", | ||
| "st_rdev", | ||
| "st_size", | ||
| "st_blksize", | ||
| "st_blocks", | ||
| "st_atim", | ||
| "st_mtim", | ||
| "st_ctim", | ||
| "st_atim_tv_sec", | ||
| "st_mtim_tv_sec", | ||
| "st_ctim_tv_sec" } | ||
| ) | ||
| public static class Stat extends Structure { | ||
|
|
||
| public long st_dev; // __dev_t st_dev; /* Device. */ | ||
| public long st_ino; // __ino_t st_ino; /* File serial number. */ | ||
| public int st_mode; // __mode_t st_mode; /* File mode. */ | ||
| public long st_nlink; // __nlink_t st_nlink; /* Link count. */ | ||
| public int st_uid; // __uid_t st_uid; /* User ID of the file's owner. */ | ||
| public int st_gid; // __gid_t st_gid; /* Group ID of the file's group. */ | ||
| public long st_rdev; // __dev_t st_rdev; /* Device number, if device. */ | ||
| public long st_size; // __off_t st_size; /* Size of file, in bytes. */ | ||
| public long st_blksize; // __blksize_t st_blksize; /* Optimal block size for I/O. */ | ||
| public long st_blocks; // __blkcnt_t st_blocks; /* Number 512-byte blocks allocated. */ | ||
| public Time st_atim; // struct timespec st_atim; /* Time of last access. */ | ||
| public Time st_mtim; // struct timespec st_mtim; /* Time of last modification. */ | ||
| public Time st_ctim; // struct timespec st_ctim; /* Time of last status change. */ | ||
| public long st_atim_tv_sec; // backward compatibility | ||
| public long st_mtim_tv_sec; // backward compatibility | ||
| public long st_ctim_tv_sec; // backward compatibility |
There was a problem hiding this comment.
I suspect this puts st_blocks in the right place but it doesn't match the struct stat that my system sees:
$ uname -a
Linux worker-772973 4.15.0-120-generic #122~16.04.1-Ubuntu SMP Wed Sep 30 15:09:00 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
$ ldd --version
ldd (Ubuntu GLIBC 2.23-0ubuntu11.3) 2.23
Copyright (C) 2016 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Written by Roland McGrath and Ulrich Drepper.
$ echo "#include <sys/stat.h>" | gcc -xc - -E -dD | grep -ve '^$' | grep -A23 '^struct stat'
struct stat
{
__dev_t st_dev;
__ino_t st_ino;
__nlink_t st_nlink;
__mode_t st_mode;
__uid_t st_uid;
__gid_t st_gid;
int __pad0;
__dev_t st_rdev;
__off_t st_size;
__blksize_t st_blksize;
__blkcnt_t st_blocks;
# 91 "/usr/include/x86_64-linux-gnu/bits/stat.h" 3 4
struct timespec st_atim;
struct timespec st_mtim;
struct timespec st_ctim;
#define st_atime st_atim.tv_sec
#define st_mtime st_mtim.tv_sec
#define st_ctime st_ctim.tv_sec
# 106 "/usr/include/x86_64-linux-gnu/bits/stat.h" 3 4
__syscall_slong_t __glibc_reserved[3];
# 115 "/usr/include/x86_64-linux-gnu/bits/stat.h" 3 4
};
In particular, st_mode and st_nlink are the other way around, there's 4 bytes of padding between st_gid and st_rdev, and the last three long fields are apparently unrelated to time.
Is this how some other x86_64 GNU/Linux system declares its struct stat? If so, I think we need to understand the differences.
(might also be worth recording the command echo "#include <sys/stat.h>" | gcc -xc - -E -dD | grep -ve '^$' | grep -A23 '^struct stat' in a comment so that future readers can reproduce this analysis)
There was a problem hiding this comment.
Yes, I'm just trying to make sense of all of this, given the number of indirection each system can have given the compiling options that were provided.
There was a problem hiding this comment.
Ok, I think I finally managed to link the struct stat compiled on my system (and yours in a different GLIBC version) with the glibc source code. The way the glibc source includes header files depending of the platform is puzzling and there was more recent changes that confused me even more.
I updated the code to finally match what is expected by our systems. I also add documentation to link to the command you provided as well as the source files for the two glibc versions we analyzed.
Hopefully we should be in a better place now 🤞🏻
DaveCTurner
left a comment
There was a problem hiding this comment.
LGTM (mod checkstyle pickiness) - thanks for the extra iterations @tlrx
|
Thanks @DaveCTurner and @arteam ! I'm going to trigger more tests here before merging - just in case. |
|
@elasticmachine run elasticsearch-ci/part-2 |
1 similar comment
|
@elasticmachine run elasticsearch-ci/part-2 |
|
🚀 |
|
Let's see if CI complains 😬 |
This pull request introduces a new native method for Linux 64bits platforms to retrieve the number of bytes allocated on disk for a given sparse file. It tries to call the native methods
stat64,__xstat64orstat.Follow up #79698