Skip to content

Add Linux x86-64bits native method to retrieve the number of allocated bytes on disk for a file#80437

Merged
tlrx merged 18 commits intoelastic:masterfrom
tlrx:linux64b-native-method
Dec 6, 2021
Merged

Add Linux x86-64bits native method to retrieve the number of allocated bytes on disk for a file#80437
tlrx merged 18 commits intoelastic:masterfrom
tlrx:linux64b-native-method

Conversation

@tlrx
Copy link
Copy Markdown
Member

@tlrx tlrx commented Nov 5, 2021

This pull request introduces a new native method for Linux 64bits platforms to retrieve the number of bytes allocated on disk for a given sparse file. It tries to call the native methods stat64, __xstat64 or stat.

Follow up #79698

@tlrx tlrx added >enhancement :Distributed/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs v8.0.0 v7.16.1 v8.1.0 labels Nov 5, 2021
@tlrx tlrx force-pushed the linux64b-native-method branch 3 times, most recently from 63eff2b to 49901c3 Compare November 8, 2021 15:33
@tlrx tlrx force-pushed the linux64b-native-method branch from 49901c3 to 0803d10 Compare November 8, 2021 16:32
@tlrx tlrx requested review from DaveCTurner and arteam November 9, 2021 09:30
@tlrx tlrx marked this pull request as ready for review November 9, 2021 09:30
@elasticmachine elasticmachine added the Team:Distributed Meta label for distributed team. label Nov 9, 2021
@elasticmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

Copy link
Copy Markdown
Contributor

@arteam arteam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@tlrx
Copy link
Copy Markdown
Member Author

tlrx commented Nov 30, 2021

@elasticmachine run elasticsearch-ci/part-2

@tlrx
Copy link
Copy Markdown
Member Author

tlrx commented Nov 30, 2021

@DaveCTurner I'd appreciate if you find the time to look at this :) Otherwise I can ask someone else for sure.

Copy link
Copy Markdown
Member

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay here @tlrx, I left a few comments.

} catch (UnsatisfiedLinkError ue0) {
try {
logger.debug("trying to register XStat64Method...");
Native.register(XStat64Method.class, Platform.C_LIBRARY_NAME);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add some comments on when these fallbacks are necessary? My reading of https://linux.die.net/man/2/stat64 is that stat() alone is enough, but if it isn't then I suspect stat() doesn't return something laid out as a Stat64 structure.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh good grief this is just cursed. stat() in glibc is actually a macro, not a real function:

$ nm -AD /lib/x86_64-linux-gnu/libc.so.6 | grep 'stat$'
/lib/x86_64-linux-gnu/libc.so.6:00000000000f6d20 T __fxstat
/lib/x86_64-linux-gnu/libc.so.6:0000000000078b70 T _IO_file_stat
/lib/x86_64-linux-gnu/libc.so.6:00000000000f6d70 T __lxstat
/lib/x86_64-linux-gnu/libc.so.6:00000000001057b0 T ustat
/lib/x86_64-linux-gnu/libc.so.6:00000000000f6cd0 T __xstat

I think the macro might also reorganise the layout of the struct stat to be different from the one that __xstat populates. Here we're relying on the ABI of the secret __xstat function instead. How sure can we be that this ABI won't change?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Libc uses symbol versioning so it would be more reliable to call __xstat@GLIBC_2.2.5. I don't know if specifying a symbol version is actually possible in JNA. Maybe FunctionMapper can cope with the full symbol name?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/cc @ChrisHegarty: yet more JNA-related risks ^

Copy link
Copy Markdown
Contributor

@ChrisHegarty ChrisHegarty Dec 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@DaveCTurner Yeah, this is not ideal, but kinda unavoidable if we don't write our own C code. In general, such usage leads to me think that a small lightweight native library in ES could be worth the effort - since it would resolve many of these kinda issues ( and these issues have the potential to be significant and fail in catastrophic ways ).

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++ I reopened #80903 because of this

public long reserved3; // __syscall_slong_t

@Override
protected List<String> getFieldOrder() {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JNA apparently prefers the class-level annotation @Structure.FieldOrder({"st_dev", "st_ino", ...}) these days. Not sure why we don't do this elsewhere, maybe it's a relatively new feature.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to know, thanks

Comment on lines +161 to +163
public long reserved1; // __syscall_slong_t
public long reserved2; // __syscall_slong_t
public long reserved3; // __syscall_slong_t
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think these reserved fields exist in the libc interface stat(), although they might be there in the syscall interface that it wraps.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I'm not sure what on earth is going on any more. I thought we were implementing glibc's struct stat as defined here but that seems to be completely different from what you get if you #include <sys/stat.h>.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, on my machine it's actually coming from here. It's cursed:

#ifndef __x86_64__
    __mode_t st_mode;                   /* File mode.  */
    __nlink_t st_nlink;                 /* Link count.  */
#else
    __nlink_t st_nlink;         /* Link count.  */
    __mode_t st_mode;           /* File mode.  */
#endif

*
* https://linux.die.net/man/3/strerror
*/
private static native String strerror(int errno);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems unused.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed it.

assert method == StatMethod.class;
rc = StatMethod.stat(path.toString(), stats);
}
if (rc != 0) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should declare the native methods as throws LastErrorException in order to get a proper exception with an appropriate error message, rather than just logging the numeric error.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not know about throws LastErrorException, thanks

Copy link
Copy Markdown
Member Author

@tlrx tlrx Dec 3, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and apparently throwing an exception is preferred since it has better performance.

Comment on lines +459 to +462
long nbBlocks = (cacheFile.getLength() / blockSize);
if (cacheFile.getLength() % blockSize > 0L) {
nbBlocks += 1L;
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested oneliner:

Suggested change
long nbBlocks = (cacheFile.getLength() / blockSize);
if (cacheFile.getLength() % blockSize > 0L) {
nbBlocks += 1L;
}
final long nbBlocks = (cacheFile.getLength() + blockSize - 1) / blockSize; // ceil(cacheFile.getLength() / blockSize)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks

public int st_mode; // mode_t
public int st_uid; // uid_t
public int st_gid; // gid_t
public int pad0; // int
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oof bet it was fun to discover that this was needed...

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It came from a struct stat64 but now I can't find it anymore.

@tlrx
Copy link
Copy Markdown
Member Author

tlrx commented Dec 2, 2021

Tests are green but let's have more runs

@elasticmachine run elasticsearch-ci/part-2

Comment on lines +89 to +125
@Structure.FieldOrder(
{
"st_dev",
"st_ino",
"st_mode",
"st_nlink",
"st_uid",
"st_gid",
"st_rdev",
"st_size",
"st_blksize",
"st_blocks",
"st_atim",
"st_mtim",
"st_ctim",
"st_atim_tv_sec",
"st_mtim_tv_sec",
"st_ctim_tv_sec" }
)
public static class Stat extends Structure {

public long st_dev; // __dev_t st_dev; /* Device. */
public long st_ino; // __ino_t st_ino; /* File serial number. */
public int st_mode; // __mode_t st_mode; /* File mode. */
public long st_nlink; // __nlink_t st_nlink; /* Link count. */
public int st_uid; // __uid_t st_uid; /* User ID of the file's owner. */
public int st_gid; // __gid_t st_gid; /* Group ID of the file's group. */
public long st_rdev; // __dev_t st_rdev; /* Device number, if device. */
public long st_size; // __off_t st_size; /* Size of file, in bytes. */
public long st_blksize; // __blksize_t st_blksize; /* Optimal block size for I/O. */
public long st_blocks; // __blkcnt_t st_blocks; /* Number 512-byte blocks allocated. */
public Time st_atim; // struct timespec st_atim; /* Time of last access. */
public Time st_mtim; // struct timespec st_mtim; /* Time of last modification. */
public Time st_ctim; // struct timespec st_ctim; /* Time of last status change. */
public long st_atim_tv_sec; // backward compatibility
public long st_mtim_tv_sec; // backward compatibility
public long st_ctim_tv_sec; // backward compatibility
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect this puts st_blocks in the right place but it doesn't match the struct stat that my system sees:

$ uname -a
Linux worker-772973 4.15.0-120-generic #122~16.04.1-Ubuntu SMP Wed Sep 30 15:09:00 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
$ ldd --version
ldd (Ubuntu GLIBC 2.23-0ubuntu11.3) 2.23
Copyright (C) 2016 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Written by Roland McGrath and Ulrich Drepper.
$ echo "#include <sys/stat.h>" | gcc -xc - -E -dD | grep -ve '^$' | grep -A23 '^struct stat'
struct stat
  {
    __dev_t st_dev;
    __ino_t st_ino;
    __nlink_t st_nlink;
    __mode_t st_mode;
    __uid_t st_uid;
    __gid_t st_gid;
    int __pad0;
    __dev_t st_rdev;
    __off_t st_size;
    __blksize_t st_blksize;
    __blkcnt_t st_blocks;
# 91 "/usr/include/x86_64-linux-gnu/bits/stat.h" 3 4
    struct timespec st_atim;
    struct timespec st_mtim;
    struct timespec st_ctim;
#define st_atime st_atim.tv_sec
#define st_mtime st_mtim.tv_sec
#define st_ctime st_ctim.tv_sec
# 106 "/usr/include/x86_64-linux-gnu/bits/stat.h" 3 4
    __syscall_slong_t __glibc_reserved[3];
# 115 "/usr/include/x86_64-linux-gnu/bits/stat.h" 3 4
  };

In particular, st_mode and st_nlink are the other way around, there's 4 bytes of padding between st_gid and st_rdev, and the last three long fields are apparently unrelated to time.

Is this how some other x86_64 GNU/Linux system declares its struct stat? If so, I think we need to understand the differences.

(might also be worth recording the command echo "#include <sys/stat.h>" | gcc -xc - -E -dD | grep -ve '^$' | grep -A23 '^struct stat' in a comment so that future readers can reproduce this analysis)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I'm just trying to make sense of all of this, given the number of indirection each system can have given the compiling options that were provided.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I think I finally managed to link the struct stat compiled on my system (and yours in a different GLIBC version) with the glibc source code. The way the glibc source includes header files depending of the platform is puzzling and there was more recent changes that confused me even more.

I updated the code to finally match what is expected by our systems. I also add documentation to link to the command you provided as well as the source files for the two glibc versions we analyzed.

Hopefully we should be in a better place now 🤞🏻

Copy link
Copy Markdown
Member

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM (mod checkstyle pickiness) - thanks for the extra iterations @tlrx

@tlrx
Copy link
Copy Markdown
Member Author

tlrx commented Dec 3, 2021

Thanks @DaveCTurner and @arteam ! I'm going to trigger more tests here before merging - just in case.

@tlrx
Copy link
Copy Markdown
Member Author

tlrx commented Dec 3, 2021

@elasticmachine run elasticsearch-ci/part-2

1 similar comment
@tlrx
Copy link
Copy Markdown
Member Author

tlrx commented Dec 6, 2021

@elasticmachine run elasticsearch-ci/part-2

@tlrx tlrx merged commit 0c9ba49 into elastic:master Dec 6, 2021
@tlrx tlrx deleted the linux64b-native-method branch December 6, 2021 08:40
@DaveCTurner
Copy link
Copy Markdown
Member

🚀

@tlrx
Copy link
Copy Markdown
Member Author

tlrx commented Dec 6, 2021

Let's see if CI complains 😬

tlrx added a commit that referenced this pull request Dec 7, 2021
…ectures (#81376)

The native method added in #80437 requires some adjustments 
to work on aarch64 architectures, which hopefully this change 
addresses. The stat version has been tested on AWS 64-bit 
ARM instances (on ubuntu-18.04 and on Amazon Linux 2).

Closes #81362
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Distributed/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs >enhancement Team:Distributed Meta label for distributed team. v8.1.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants