mmap() - backed istream implementation#150
Conversation
|
My comment about Its probably not as important and may actually help here since you do read most of the data in the mmap region, at least the weights, only once (it does seem like it scans through the file a few times skipping weights to do the name-based lookups, but that's not a huge deal) I'm also not sure much is gained by creating an Generally the advantages of |
|
Thank you for sending us this change! This change would result in an 18% speedup of loading in the average case, however what @apage43 says is true. It doesn't make sense to use mmap() as a read() replacement, because the true power of mmap() is it gives us the ability to not read the file at all. I've just written a blog post for a command line utility I wrote here: https://justine.lol/rusage/ which uses what we've done here as an example use case, and at the bottom adds further clarity to these points. We need to go back to the drawing board on this. An mmap() change should reduce startup latency not by 18%, but by 100%. In order to make that change, we need to find a way to have the tensor data structures that exist on disk be the same as what llama.cpp needs at runtime in memory. I propose that we discuss our ideas for how we might do this in #91. Thank you again for your contribution. I hope you'll continue to keep working with us on the issue, while we figure this out! |
* iq4_ks_r4: Zen4 * iq4_ks_r4: AVX2 * iq4_ks_r4: WIP * iq4_ks_r4: slightly better Zen4 * iq4_ks_r4: slightly better Zen4 * iq4_ks_r4: NEON * Minor --------- Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
This is for issue #91.
Treat this as a first draft. There are definitely some thing that need to be changed and will be changed shortly.
I have not benchmarked. I'm submitting the PR early to get feedback.
Questions/Observations for discussion:
llama_istreambe a part ofutils.cpp?MADV_SEQUENTIALis probably wrong, as brought up in Should usemmapfor model loading #91. I'll benchmark to make sure, then fix that before it's merged.llama_istream, the intent was that you wouldn't have to checkUSE_MMAPat the site where it's constructed. Reduce model loading time #43 throws a wrench in that. I could create a quick wrapper class, rather than usingusing. It might be a little confusing when people merge though.std::istream's copy constructor is explicitly deleted. So I can't get it out of a function except as an rvalue as the return value (or a global variable which I think would be worse), and the convention elsewhere is that the return value is for the error.istreamwas WAY more painful than I expected. For some reason, if I don't reimplementseekoff()andseekpos()this way,tellg()seems to return garbage. Couldn't tell you why. Nor can I find any hint of why on the internet. The signature is also very finicky. If you have any insight into why it needs to be exactly this way, let me know.mmap()arguments to use. If you have a better suggestion, let me know.