Skip to content

Bugfixes and Tests for nvcc#193

Closed
seeraven wants to merge 1 commit intoccache:masterfrom
seeraven:bugfix/nvcc_without_preprocessed_input
Closed

Bugfixes and Tests for nvcc#193
seeraven wants to merge 1 commit intoccache:masterfrom
seeraven:bugfix/nvcc_without_preprocessed_input

Conversation

@seeraven
Copy link
Contributor

@seeraven seeraven commented Aug 27, 2017

This pull request contains two bugfixes:

  • nvcc can not use preprocessed input. If the option run_second_cpp is set to false, ccache falls back to the behaviour of run_second_cpp=true for the cuda compiler.
  • Unhandled command line options with directories or files caused ccache to fail and fall back to calling nvcc directly without caching. The command line options --compiler-bindir/-cdir, --output-directory/-odir and --libdevice-directory/-ldir are now handled by ccache. This should fix Support nvcc (CUDA) -ccbin/--compiler-bindir option #144 .

In addition, tests for nvcc were added by adding two more travis jobs and extending the test.sh script.

Copy link

@lukeyeager lukeyeager left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Working for me!

#145 is a sufficient solution for me (I've been using it for months), but this is obviously much more complete. Hopefully the tests will be enough to get this approved for merge!

@lukeyeager
Copy link

/cc @colesbury

@cbeck88
Copy link

cbeck88 commented Nov 15, 2017

We are now using this patch in tesla autopilot firmware builds.

Previously, we could not use ccache with any of our (numerous) cuda targets. Thank you!

@jrosdahl jrosdahl added the issue: feature New feature label Jan 9, 2018
Copy link
Member

@jrosdahl jrosdahl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks promising. Thanks for including lots of tests!

However, I have trouble applying the commits to current master, mainly due to the spaces→tabs changes in test.sh in the first commit. Could you please squash the commits into one that only contains the actual changes you propose?

@@ -0,0 +1,29 @@
#!/bin/bash
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the file only is meant to be sourced: remove the shebang and make it non-executable?

ccache.c Outdated
// Alternate form of -o with no space.
if (str_startswith(argv[i], "-o")) {
// Alternate form of -o with no space. Ignore nvcc -odir.
if (str_startswith(argv[i], "-o") && !str_eq(argv[i], "-odir")) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So

ccache gcc dir.c -odir

will no longer work, right? That's not acceptable. Perhaps you can detect that nvcc is being used and only then ignore -odir?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed it and added a test for it. Actually, the option -optf caused the same problem, so I fixed that as well. Since nvcc can't handle the -o option without space, that part is not ignored if nvcc was detected.

ccache.c Outdated
goto out;
}

if (!conf->run_second_cpp && str_eq(actual_language,"cuda")) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a space after the comma.


expect_different_files() {
if [ ! -e "$1" ]; then
test_failed "compare_files: $1 missing"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't use tabs in shell scripts.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There were no tabs in it. Perhaps from the first commit? I've squashed everything and rebased it on master.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(There were lots of space→tab indent changes in the first commit which then were changed back in a subsequent commit.)

@seeraven seeraven force-pushed the bugfix/nvcc_without_preprocessed_input branch 3 times, most recently from fc4c457 to d58c324 Compare January 14, 2018 12:52
@seeraven
Copy link
Contributor Author

The following changes were made:

  • Squashed the commit.
  • Rebased on master.
  • The -o option without a separating space is ignored for nvcc. This resolves the -odir bug when used with gcc.
  • The already existing option -optf is also only parsed when used with nvcc.
  • Added another test to ensure the commands gcc -optf test.c and gcc -odir test.c work as expected, that is they create the files ptf resp. dir.
  • Removed the combination "clang with cuda" from the .travis.yml file. Travis uses now clang 5.0 which nvcc does not support (even with CUDA 9). A configuration with clang 4.x could be added, but it would probably require rewriting the .travis.yml file and putting everything in the matrix section.

@jrosdahl
Copy link
Member

Thanks for the updates! Looks mostly good.

Some questions and comments:

  1. The nvcc test suites are painfully slow. Would it be possible to speed them up? Otherwise I'm afraid they mostly will be seen as an annoyance... Isn't there a way of compiling a "hello world" type program that is more or less instantaneous?
  2. $compiler_location in test.sh could point to a masquerading ccache (see the uncached_compile function), and if so, the "option --compiler-bindir" test will fail. Also, $compiler_version isn't supposed to be used by tests (hence the lowercase name). However, I've improved things on master now (see 2b045cb), so please update to use $REAL_COMPILER instead.
  3. I get the following error locally:
    Compiler version: gcc (Ubuntu 5.4.0-6ubuntu1~16.04.5) 5.4.0 20160609
    CUDA compiler:    Cuda compilation tools, release 7.5, V7.5.17 (/usr/bin/nvcc)
    
    Running test suite nvcc.......sh: 1: cicc: not found
    cuobjdump fatal   : Could not open input file 'reference_test1.o'
    sh: 1: cicc: not found
    
    FAILED
    
    Test suite:     nvcc
    Test case:      option --libdevice-directory
    Failure reason: Expected "cache miss" to be 1, actual 0
    
    Any ideas? I'm using Ubuntu 16.04 and the nvidia-cuda-toolkit package contains /usr/lib/nvidia-cuda-toolkit/bin/cicc.

@seeraven seeraven force-pushed the bugfix/nvcc_without_preprocessed_input branch from d58c324 to 5bc80ef Compare January 24, 2018 09:58
@seeraven
Copy link
Contributor Author

Update:

I've rebased on master and modified the test.sh script to incorporate your points 1 and 2. nvcc is painfully slow when compiling cuda code. I've modified to use the c++ mode wherever I can to gain speed. However, a few tests must remain using cuda code because that is what we need to test at least once.

Currently, travis is unable to complete the tests because of an error in the pch test suite. I've checked that the same error occurs on the last commit on master, so it wasn't introduced by my latest modifications in the nvcc test suite.

Regarding the last point, I've tested with a travis configuration to ensure supporting 7.5 using the official nvidia repos - that worked. I'll assume you installed the canonical package (because 7.5 is not provided for 16.04 by NVidia). I'll try to reproduce the error with the canonical package, I'll assume the directories used in the failed test are not available. In the updated test.sh I've added a check to ensure the directories used in the test exist.

@seeraven seeraven force-pushed the bugfix/nvcc_without_preprocessed_input branch from 5bc80ef to ab3c026 Compare January 24, 2018 11:35
Feature: Added cuda compiler in separate travis job and implemented
         tests for nvcc.

Feature: Added support for nvcc compiler options --compiler-bindir/-ccbin,
         --output-directory/-odir and --libdevice-directory/-ldir.
         Added tests for these options in test.sh.

Bugfix:  Original patch had a statement to avoid using the preprocessed
         input files for nvcc when run_second_cpp is false. Otherwise,
         when a build is necessary and the preprocessed output was used,
         nvcc results with a compiler error. The patch simply ensures
         run_second_cpp is always set to true for the cuda compiler.

Bugfix:  The -optf and -odir options are only accepted for nvcc. For other
         compilers, they behave like '-o ptf' resp. '-o dir'.
         A test was added to check this behaviour.
@seeraven seeraven force-pushed the bugfix/nvcc_without_preprocessed_input branch from ab3c026 to b8d2f48 Compare January 24, 2018 12:21
@seeraven
Copy link
Contributor Author

Update:
The test.sh script now detects the correct paths for the canonical ubuntu package as well as for the official nvidia ubuntu packages. In addition, if any of the required paths is not found, the test of the --libdevice-directory option is skipped.
It's just a feeling, but I think nvcc of CUDA 8 is faster than the one of 7.5.

@jrosdahl
Copy link
Member

I've rebased on master and modified the test.sh script to incorporate your points 1 and 2. nvcc is painfully slow when compiling cuda code. I've modified to use the c++ mode wherever I can to gain speed. However, a few tests must remain using cuda code because that is what we need to test at least once.

OK, thanks. For reference, the "different GPU architectures" test takes ≈10s on my system, which is slightly more than how much time the entire existing test suite takes... The full test suite takes ≈9s without nvcc tests and ≈47s with nvcc tests for me.

Currently, travis is unable to complete the tests because of an error in the pch test suite. I've checked that the same error occurs on the last commit on master, so it wasn't introduced by my latest modifications in the nvcc test suite.

Yes, fixed in 958c4c2.

The test.sh script now detects the correct paths for the canonical ubuntu package as well as for the official nvidia ubuntu packages. In addition, if any of the required paths is not found, the test of the --libdevice-directory option is skipped.

Now all tests work if I disable my masquerading gcc, but with my default setup, the "option --libdevice-directory" test fails. But I was able to fix it like this:

diff --git a/test.sh b/test.sh
index 8d37791..41c1f28 100755
--- a/test.sh
+++ b/test.sh
@@ -3790,7 +3790,7 @@ SUITE_nvcc_ldir_SETUP() {
 }
 
 SUITE_nvcc_ldir() {
-    NVCC_OPTS_CUDA="-Wno-deprecated-gpu-targets -c"
+    NVCC_OPTS_CUDA="-Wno-deprecated-gpu-targets -c -ccbin $REAL_COMPILER"
     CCACHE_NVCC_CUDA="$CCACHE $REAL_NVCC $NVCC_OPTS_CUDA"
     CUOBJDUMP="$REAL_CUOBJDUMP -all -elf -symbols -ptx -sass"
     NVCC_DIR=$(dirname $REAL_NVCC)

@jrosdahl jrosdahl added this to the 3.4 milestone Jan 27, 2018
@jrosdahl
Copy link
Member

Merged with the mentioned fix and some cleanups. Thanks!

@jrosdahl jrosdahl closed this Jan 27, 2018
@jrosdahl
Copy link
Member

Any ideas on how to speed up the tests are of course still welcome.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

issue: feature New feature

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support nvcc (CUDA) -ccbin/--compiler-bindir option

4 participants