Skip to content

Towards better Windows compatibility: TOOLS, DATA_TOOLS & documentation#2521

Merged
kevinkreiser merged 26 commits intovalhalla:masterfrom
gis-ops:2300-build-tools-windows
Aug 21, 2020
Merged

Towards better Windows compatibility: TOOLS, DATA_TOOLS & documentation#2521
kevinkreiser merged 26 commits intovalhalla:masterfrom
gis-ops:2300-build-tools-windows

Conversation

@nilsnolde
Copy link
Copy Markdown
Member

@nilsnolde nilsnolde commented Aug 10, 2020

#2300

This is the first bit to improve Windows compatibility for Valhalla with big thanks to @mattrjackson who did most of the changes:

  • update Lua syntax to 5.3 (include new bit operator instead of separate library from 5.2: http://www.lua.org/manual/5.3/manual.html#3.4.2) (misunderstanding on my part)
  • more OS specific path issues (which probably could be handles more gracefully than what I did)
  • build TOOLS & DATA_TOOLS in CI & test a few executables to prevent runtime regressions for Win

With these modifications, it's possible to run the valhalla_build_tiles.exe and the valhalla_run_* executables.

Tasklist

Let me know what you think of the tasks below...

  • Add #fixes with the issue number that this PR addresses
  • Generally use squash merge to rebase and clean comments before merging
  • dev documentation for Windows in /docs, short and crispy
  • build Python bindings on Windows (hopefully only a documentation item as well)
  • maybe: as long as no proper testing is possible on Azure, manually test some of the executables in the Win pipelines?
  • Update the changelog
  • make x86 Azure CI tests work
  • enable caching for installed vcpkg packages (too intense for now, see Towards better Windows compatibility: TOOLS, DATA_TOOLS & documentation #2521 (comment))

@purew
Copy link
Copy Markdown
Contributor

purew commented Aug 10, 2020

update Lua syntax to 5.3

I believe we're using LuaJIT for performance reasons which uses the 5.1 syntax?

Comment thread src/mjolnir/graphtilebuilder.cc Outdated
@danpat
Copy link
Copy Markdown
Member

danpat commented Aug 10, 2020

@nilsnolde @purew An early version of my LuaJIT PR did support multiple syntaxes (by re-writing the .lua file when it was converted to a string - it's a simple search/replace operation currently), but we scrapped that for simplicity and decided to just stick with the LuaJIT 5.1 syntax + extensions.

@nilsnolde From #2352, LuaJIT outperformed Lua5.3 by ~12% for the parse phase of valhalla_build_tiles - do you think the gains are significant enough to justify regressing performance there?

@nilsnolde
Copy link
Copy Markdown
Member Author

Thanks for the insight @danpat ! I found it a little confusing, but maybe you can shed some light on it: using LuaJIT means one can entirely get rid of Lua5.x libraries? I have both installed on Win, might explain why I have to manually tell cmake how to link the proper paths to include and lib (and likely doing it wrong on top of that). Then it might use Lua5.3 which is why it can't make sense of the bit.bor calls. I'll try that again on Wednesday, family day tmrw:)

@purew
Copy link
Copy Markdown
Contributor

purew commented Aug 10, 2020

using LuaJIT means one can entirely get rid of Lua5.x libraries?

You need luajit libraries. It's a completely different implementation of lua 5.1.

@nilsnolde
Copy link
Copy Markdown
Member Author

Right, it probably got confused since I had both installed, luajit and 5.3.

BTW, no Windows CI builds on PRs? Or are subsequent webhooks disabled when the first CI fails? (and I'll run format.sh, forgot about it)

@nilsnolde
Copy link
Copy Markdown
Member Author

@danpat @purew removing Lua5.3 from my packages helped! Lua back to previous state.

I also followed Kevin's advice and allowed an optional arg for GraphTile::Suffix, so it's quite a few changed files. Didn't touch the tests, they'd fail currently (if they'd run for Windows). For some other time:)

Not sure if the README addition is really needed. I'm rather considering writing a blog about it, being more detailed. In the current state a Windows build is pretty deducible from the Linux instructions (+ the CI). What do you think?

Once you're happy I'll properly prepare the PR. Was only working on Windows so far and don't wanna deal with why format.sh isn't working in the WSL.

Comment thread lua/graph.lua
@nilsnolde
Copy link
Copy Markdown
Member Author

sorry for all the bs commits! guess that'll need a squash..

@nilsnolde nilsnolde changed the title Towards better Windows compatibility: TOOLS, documentation & Python Towards better Windows compatibility: TOOLS, DATA_TOOLS & documentation Aug 14, 2020
@nilsnolde
Copy link
Copy Markdown
Member Author

nilsnolde commented Aug 14, 2020

Finally the Win CI builds work. Will do the same for appveyor then.

Before I finish this off:

  • quick (wasn't meant to sound rushing, just wanna have an approval before tidying up on linux) approval of the code changes pls
  • should we switch to latest VS 2019 in CI? Rather than extending the matrix with 2019?
  • how about "manually" running e.g. valhalla_build_tiles.exe & valhalla_run_route.exe after the build in CI with one of the demo OSM files? As long as test suite isn't running (would like to look into that as well).
  • took longer than I expected, so I'll save Python support for another day

@purew
Copy link
Copy Markdown
Contributor

purew commented Aug 14, 2020

  • quick approval of the code changes pls

Looks ok to me.

  • should we switch to latest VS 2019 in CI? Rather than extending the matrix with 2019?

Yes please!

  • how about "manually" running e.g. valhalla_build_tiles.exe & valhalla_run_route.exe after the build in CI with one of the demo OSM files? As long as test suite isn't running (would like to look into that as well).

Not sure what this means, almost none of us build valhalla on Windows and have no ability to do this?

@nilsnolde
Copy link
Copy Markdown
Member Author

Not sure what this means, almost none of us build valhalla on Windows and have no ability to do this?

I meant running a few TOOLS/DATA_TOOLS executables on CI after it built. Cheap testing, but wouldn't be much effort and prevent at least some runtime regressions for Win?

@purew
Copy link
Copy Markdown
Contributor

purew commented Aug 14, 2020

Not sure what this means, almost none of us build valhalla on Windows and have no ability to do this?

I meant running a few TOOLS/DATA_TOOLS executables on CI after it built. Cheap testing, but wouldn't be much effort and prevent at least some runtime regressions for Win?

Ohh yea definitely, that's a great idea.

@nilsnolde
Copy link
Copy Markdown
Member Author

Ok, should be good for review now.

Few more things:

  • I wanted to also "test" valhalla_build_admins.exe (never tried before on Win), but had trouble with spatialite: sqlite couldn't load the module. Tried for an hour (adding "spatialite.dll" as module name in the code; copying latest official spatialite DLLs which almost worked but errored on another spatialite dependency), didn't work. Smth wrong with the vcpkg installation of spatialite, but couldn't find any Github issues.
  • When I enabled the address sanitizer, the tests don't pass currently:
[FAIL] truckcost
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from TruckCost
[ RUN      ] TruckCost.testTruckCostParams
[       OK ] TruckCost.testTruckCostParams (888 ms)
[----------] 1 test from TruckCost (888 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test suite ran. (889 ms total)
[  PASSED  ] 1 test.

=================================================================
==11043==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 57 byte(s) in 1 object(s) allocated from:
    #0 0x7f9222b9483a in __interceptor_realloc /build/gcc/src/gcc/libsanitizer/asan/asan_malloc_linux.cpp:164
    #1 0x7f9221d67ffb in __argz_add_sep (/usr/lib/libc.so.6+0x91ffb)
    #2 0x7f92fffffffe  (<unknown module>)

SUMMARY: AddressSanitizer: 57 byte(s) leaked in 1 allocation(s).
make[3]: *** [test/CMakeFiles/run-truckcost.dir/build.make:81: test/truckcost.log] Error 1
make[3]: *** Deleting file 'test/truckcost.log'
make[2]: *** [CMakeFiles/Makefile2:9069: test/CMakeFiles/run-truckcost.dir/all] Error 2
make[1]: *** [CMakeFiles/Makefile2:4897: test/CMakeFiles/check.dir/rule] Error 2
make: *** [Makefile:2052: check] Error 2

@nilsnolde nilsnolde marked this pull request as ready for review August 17, 2020 04:22
@nilsnolde
Copy link
Copy Markdown
Member Author

x86 build for win is still failing.. don't really understand why.. CMake is set to x86 & Release build, yet it's complaining that there's no config for Win32 & Debug build. Found a similar issue over at CMake.

Anyone got an idea (cc @mloskot @mattrjackson)? If not, I'll try a cross compile myself later this week and see if I can resolve. My Win disk is completely full already, 88 GB to work on a single project 💥

Will also add some caching on Azure once x86 works (almost 1 hour CI builds right now!): https://docs.microsoft.com/en-us/azure/devops/pipelines/release/caching?view=azure-devops#get-started

@nilsnolde
Copy link
Copy Markdown
Member Author

nilsnolde commented Aug 17, 2020

sqlite couldn't load the [spatialite] module.

This is where it exits:

bool load_spatialite(sqlite3* db_handle) {
sqlite3_enable_load_extension(db_handle, 1);
// we do a bunch of failover for changes to the module file name over the years
for (const auto& mod_name : std::vector<std::string>{"mod_spatialite", "mod_spatialite.so",
"libspatialite", "libspatialite.so"}) {
std::string sql = "SELECT load_extension('" + mod_name + "')";
char* err_msg = nullptr;
if (sqlite3_exec(db_handle, sql.c_str(), nullptr, nullptr, &err_msg) == SQLITE_OK) {
LOG_INFO("SpatiaLite loaded as an extension");
return true;
} else {
LOG_WARN("load_extension() warning: " + std::string(err_msg));
sqlite3_free(err_msg);
}
}
LOG_ERROR("sqlite3 load_extension() failed to load spatialite module");
return false;
}

But this worked:

spatialite_init(0);

So, the code knows where to find spatialite but sqlite doesn't. I tried sorts of mod_names (DLL is named spatialite.dll on Win), even specifying the whole path, which works on SQLite command line (though a different SQLite installation than the one linked here). I have no idea why it doesn't find spatialite.. Someone got a tip what I could try?

@SvetlanaBayarovich
Copy link
Copy Markdown
Contributor

@nilsnolde As for the sanitizer question, this looks like a known issue with Leak Sanitizer (which is a part of Address Sanitizer): see the #2296(comment). Due to this Leak Sanitizer is disabled in CI. You can also disable it while testing locally, e.g. ASAN_OPTIONS=detect_leaks=0 make check

@mloskot
Copy link
Copy Markdown
Collaborator

mloskot commented Aug 18, 2020

x86 build for win is still failing.. don't really understand why.. CMake is set to x86 & Release build, yet it's complaining that there's no config for Win32 & Debug build.
Anyone got an idea (cc @mloskot @mattrjackson)?

The problem is in the CMake command line(s) in this PR: cmake -A %PLATFORM% translates to cmake -A x86 which is wrong.
This should read cmake -A Win32 as per allowed values for -A and the VS generator.

Comment thread .azure-pipelines.yml Outdated
@nilsnolde
Copy link
Copy Markdown
Member Author

Good for merge from my side @kevinkreiser @purew

Caching on Azure with vcpkg is still painful.. which is mostly the vcpkg part as there's no preconfigured task for it, we'd have to bake our own (which needs privileges on Azure) and follow this one: https://github.com/lukka/CppBuildTasks#developers-information

@mloskot
Copy link
Copy Markdown
Collaborator

mloskot commented Aug 19, 2020

Thanks @nilsnolde for the great set of improvements.

I used to get excited and configured all free CI services (AzP, AV, Travis, CircleCI) for projects I maintain, but that eventually becomes pain to maintain. Then, the disrupting GitHub Actions arrived and IMHO we should consider complete switch to GA as single CI with jobs for the big three: Linux, Windows and macOS.

@nilsnolde
Copy link
Copy Markdown
Member Author

Yeah, I also loathe all the external CI stuff by now.. For internal stuff we keep using Gitlab with its amazing infrastructure, but the lack of SEO prevents public repos unfortunately. Didn't try Github Actions yet, but I'd be all in favor of ditching external CI.

Now that I got you here @mloskot, did you ever encounter this on Win with spatialite? #2521 (comment)

@mloskot
Copy link
Copy Markdown
Collaborator

mloskot commented Aug 19, 2020

Didn't try Github Actions yet, but I'd be all in favor of ditching external CI.

I am moving home during next weeks, and GSoC is finalising, so I won't be able to do anything soon, but I will keep this task in mind for the autumn.

Now that I got you here @mloskot, did you ever encounter this on Win with spatialite?

I have seen this issue, but it did not ring any bells. I may be wrong, can't verify it now, but I think did not build Valhalla with SQLite support.

Comment thread src/baldr/graphtile.cc
struct url_facet : dir_facet {
protected:
virtual char do_thousands_sep() const {
return '/';
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this will also need the do_grouping function that is what splits it every 3 digits

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't it inherited from dir_facet?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm. yeah, just read up quickly about virtual inheritance and seems like url_facet inherits do_grouping from the base class numpunct here (?!).. I'll add that. curious that the tests wouldn't fail then..

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my eyes completely missed that you inherited from dir_facet! im sorry that would work and explain the passing tests... sorry for the stupid mistake there!

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No worries:) I also misunderstood what virtual inheritance is and actually already did what you asked. Anyways, should we just stick with the way it is now? The azure build time problem is a show-stopper right now anyways I think?!

@nilsnolde
Copy link
Copy Markdown
Member Author

Oh crap.. In the last run, we were running out of Azure CI build time.. build times are very close to 1 hour now and x64 must've just passed that limit.. damn slow windows builds.. not sure how to speed this up without losing the additional TOOLS & DATA_TOOLS..

any ideas?

@kevinkreiser
Copy link
Copy Markdown
Member

@nilsnolde one idea is to run some unit tests instead of building utrecht. we have a bunch of unit tests that exorcise tile building etc notably the gurka ones. perhaps you could pick one interesting one and run that? it might be cheaper than all of utrecht?

the only way this is possible on our linux CI is we use ccache. i googled a while back about on windows but it wasnt clear if its possible or what state it is in.

@kevinkreiser kevinkreiser merged commit 9e9eab1 into valhalla:master Aug 21, 2020
@nilsnolde
Copy link
Copy Markdown
Member Author

@kevinkreiser all "tests" (running some exes) take less than a minute actually, incl building all of Utrecht.

Cache is an option on Azure too. Looks like I overlooked a few things last time I looked at it. With this config caching vcpkg modules should work (and shave off ~25 mins when cache hits). Couple of caveats:

  • we could (should?) add vcpkg as a submodule to Valhalla (packages available there are not explicitly versioned, rather follow the vcpkg version)
  • even more files need to be added to version control: vcpkg_list_x64.txt & CMakeSettings.json
  • if the cache doesn't hit (based on hashing vcpkg_list_x64.txt & CMakeSettings.json) and needs to be rebuilt, it'll cross the 1 h limit again maybe

Github Actions seems to be a LOT more relaxed with their OSS plans: https://docs.github.com/en/actions/getting-started-with-github-actions/about-github-actions#usage-limits

@kevinkreiser
Copy link
Copy Markdown
Member

@nilsnolde

The caching of the dependencies is a great idea. We avoid this on linux using docker, but if there is a good mechanism for windows we should definitely do that. After that it would be great for us to cache the actual build contents like ccache does. If we do those two things (or even one to start) we should be in the clear.

Github actions is also fine with me if you can find the time to look at it. I just dont have the time at the moment.

@nilsnolde
Copy link
Copy Markdown
Member Author

Ok, let me look at the deps cache first. Should be easy enough. Not sure I understand why caching the build contents is interesting, but I'm not the build (tools) expert..

Ccache is available for Win, but not for Visual Studio compiler it seems. Phuu, that's really exhausting.. fingers crossed Win is based on linux soon 🤞

@mloskot
Copy link
Copy Markdown
Collaborator

mloskot commented Aug 21, 2020

we could (should?) add vcpkg as a submodule to Valhalla (packages available there are not explicitly versioned, rather follow the vcpkg version)

I think it's easier to track working vcpkg state in CI configuration like it's done now

# TODO: check if https://github.com/microsoft/vcpkg/issues/12606 resolves regularly
git -C %VCPKG_DIR% checkout f4bd6423

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants