
WIP: nixos/kubernetes: Fix tests on Hydra and OfBorg #37199

Closed
srhb wants to merge 6 commits into NixOS:master from srhb:fix-kube-tests

@srhb
Contributor

@srhb srhb commented Mar 16, 2018

Motivation for this change

Fix the test issues discussed in #36739

Things done
  • Tested using sandboxing (nix.useSandbox on NixOS, or option build-use-sandbox in nix.conf on non-NixOS)
  • Built on platform(s)
    • NixOS
    • macOS
    • other Linux distributions
  • Tested via one or more NixOS test(s) if existing and applicable for the change (look inside nixos/tests)
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nox --run "nox-review wip"
  • Tested execution of all binary files (usually in ./result/bin/)
  • Fits CONTRIBUTING.md.

@GrahamcOfBorg GrahamcOfBorg added 6.topic: nixos Issues or PRs affecting NixOS modules, or package usability issues specific to NixOS 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin. 10.rebuild-linux: 0 This PR does not cause any packages to rebuild on Linux. labels Mar 16, 2018
@srhb
Contributor Author

srhb commented Mar 17, 2018

I think I've now incorporated the changes suggested in the discussion in #36739 (plus a few commits that should let the whole thing succeed once certs.nix is fixed).

The remaining problem is that every certificate/key file the kube components point at actually contains the path to the cert/key rather than the cert/key data itself.

I do not think I'll manage to resolve this.
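A quick way to confirm this symptom on a test VM is to check whether each cert/key file starts with a PEM header or instead holds an absolute path to another file. The sketch below is a hypothetical bash helper (`classify_secret_file` is not part of nixpkgs) and only inspects the first line of the file.

```bash
# Hypothetical helper: classify a cert/key file as real PEM data, a file
# whose content is a path to another existing file, or something else.
classify_secret_file() {
  local file=$1 first_line
  # Read only the first line; "|| true" keeps going if there is no newline.
  IFS= read -r first_line < "$file" || true
  if [[ $first_line == "-----BEGIN "* ]]; then
    echo "pem"
  elif [[ $first_line == /* && -e $first_line ]]; then
    # The content is an absolute path that exists on disk.
    echo "path"
  else
    echo "unknown"
  fi
}
```

Running it over the files the kube components are configured with would print `path` for every file affected by the problem described above.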

@GrahamcOfBorg GrahamcOfBorg added 10.rebuild-darwin: 1-10 This PR causes between 1 and 10 packages to rebuild on Darwin. 10.rebuild-linux: 1-10 This PR causes between 1 and 10 packages to rebuild on Linux. and removed 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin. 10.rebuild-linux: 0 This PR does not cause any packages to rebuild on Linux. labels Mar 17, 2018
@cstrahan
Contributor

cstrahan commented Mar 17, 2018

I'm also taking a stab at this. This is what I have so far:

diff --git a/nixos/release.nix b/nixos/release.nix
index 6a3fcea1768..842bf264c08 100644
--- a/nixos/release.nix
+++ b/nixos/release.nix
@@ -296,7 +296,11 @@ in rec {
   tests.kernel-copperhead = callTest tests/kernel-copperhead.nix {};
   tests.kernel-latest = callTest tests/kernel-latest.nix {};
   tests.kernel-lts = callTest tests/kernel-lts.nix {};
-  tests.kubernetes = callSubTestsOnTheseSystems ["x86_64-linux"] tests/kubernetes/default.nix {};
+  #tests.kubernetes = callSubTestsOnTheseSystems ["x86_64-linux"] tests/kubernetes/default.nix {};
+  tests.kubernetes.dns = callSubTestsOnTheseSystems ["x86_64-linux"] tests/kubernetes/dns.nix {};
+  ## kubernetes.e2e should eventually replace kubernetes.rbac when it works
+  #tests.kubernetes.e2e = callSubTestsOnTheseSystems ["x86_64-linux"] tests/kubernetes/e2e.nix {};
+  tests.kubernetes.rbac = callSubTestsOnTheseSystems ["x86_64-linux"] tests/kubernetes/rbac.nix {};
   tests.latestKernel.login = callTest tests/login.nix { latestKernel = true; };
   tests.ldap = callTest tests/ldap.nix {};
   #tests.lightdm = callTest tests/lightdm.nix {};
diff --git a/nixos/tests/kubernetes/certs.nix b/nixos/tests/kubernetes/certs.nix
index d3eff910c46..846b2ca6dbf 100644
--- a/nixos/tests/kubernetes/certs.nix
+++ b/nixos/tests/kubernetes/certs.nix
@@ -7,28 +7,59 @@
 }:
 let
   runWithCFSSL = name: cmd:
-    builtins.fromJSON (builtins.readFile (
-      pkgs.runCommand "${name}-cfss.json" {
-        buildInputs = [ pkgs.cfssl ];
-      } "cfssl ${cmd} > $out"
-    ));
+    let secrets = pkgs.runCommand "${name}-cfss.json" {
+        buildInputs = [ pkgs.cfssl pkgs.jq ];
+        outputs = [ "out" "cert" "key" "csr" ];
+      }
+      ''
+        (
+          echo "${cmd}"
+          cfssl ${cmd} > tmp
+          cat tmp | jq -r .key > $key
+          cat tmp | jq -r .cert > $cert
+          cat tmp | jq -r .csr > $csr
+
+          touch $out
+        ) 2>&1 | fold -w 80 -s
+      '';
+    in {
+      key = secrets.key;
+      cert = secrets.cert;
+      csr = secrets.csr;
+    };
 
   writeCFSSL = content:
     pkgs.runCommand content.name {
-      buildInputs = [ pkgs.cfssl ];
+      buildInputs = [ pkgs.cfssl pkgs.jq ];
     } ''
       mkdir -p $out
       cd $out
-      cat ${writeFile content} | cfssljson -bare ${content.name}
+
+      json=${pkgs.lib.escapeShellArg (builtins.toJSON content)}
+
+      # for a given $field in the json, treat the associated value as a
+      # file path and substitute the contents thereof into the $json
+      # object.
+      expandFileField() {
+        local field=$1
+        local path="$(echo "$json" | jq -r ".$field")"
+        json="$(echo "$json" | jq --arg val "$(cat "$path")" ".$field = \$val")"
+      }
+
+      ${pkgs.lib.optionalString (content ? key) "expandFileField key"}
+      ${pkgs.lib.optionalString (content ? ca) "expandFileField ca"}
+      ${pkgs.lib.optionalString (content ? cert) "expandFileField cert"}
+
+      echo "$json" | cfssljson -bare ${content.name}
     '';
 
   noCSR = content: pkgs.lib.filterAttrs (n: v: n != "csr") content;
   noKey = content: pkgs.lib.filterAttrs (n: v: n != "key") content;
 
-  writeFile = content: pkgs.writeText "content" (
-    if pkgs.lib.isAttrs content then builtins.toJSON content
-    else toString content
-  );
+  writeFile = content:
+    if pkgs.lib.isDerivation content
+    then content
+    else pkgs.writeText "content" (builtins.toJSON content);
 
   createServingCertKey = { ca, cn, hosts? [], size ? 2048, name ? cn }:
     noCSR (

The remaining problem is that every certificate/key file the kube components point at actually contains the path to the cert/key rather than the cert/key data itself.

@srhb The changes I made to writeCFSSL should fix that.

The tests are now running, and they're spewing a whole lot of output. I'm currently combing through it to see what's up.

@cstrahan
Contributor

The tests seem to get stuck here:

machine1: running command: kubectl get pod redis | grep Running
[...]
machine1# [   70.895417] kubelet[1394]: I0317 03:50:04.953107    1394 plugins.go:412] Calling network plugin cni to set up pod "probe_default"
machine1# [   70.898231] kubelet[1394]: I0317 03:50:04.955162    1394 cni.go:284] Got netns path /proc/3271/ns/net
machine1# [   70.899945] kubelet[1394]: I0317 03:50:04.955179    1394 cni.go:285] Using podns path default
machine1# [   70.902613] kubelet[1394]: I0317 03:50:04.955243    1394 cni.go:256] About to add CNI network cni-loopback (type=loopback)
machine1# [   70.905511] kubelet[1394]: E0317 03:50:04.955271    1394 cni.go:259] Error adding network: failed to find plugin "loopback" in path [/opt/loopback/bin /opt/cni/bin]
machine1# [   70.910955] kubelet[1394]: E0317 03:50:04.955287    1394 cni.go:220] Error while adding to cni lo network: failed to find plugin "loopback" in path [/opt/loopback/bin /opt/cni/bin]

Note that last line. I'm not a K8S guru, so I'll have to give some thought to both

  • what is wrong, and
  • how to fix it
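The kubelet error reads as a plain search: the `loopback` plugin type is looked up as a binary name in each directory of the configured path ([/opt/loopback/bin /opt/cni/bin]), and the error is raised when nothing matches. A simplified bash emulation of that lookup, handy for checking a node by hand, might look like this; `find_cni_plugin` is hypothetical and does not reproduce the CNI library's exact logic.

```bash
# Simplified emulation (an assumption, not the CNI library's code): search
# each directory in order and return the first executable named like the plugin.
find_cni_plugin() {
  local plugin=$1; shift
  local dir
  for dir in "$@"; do
    if [[ -x "$dir/$plugin" && ! -d "$dir/$plugin" ]]; then
      echo "$dir/$plugin"
      return 0
    fi
  done
  # Mirror the shape of the kubelet message when nothing is found.
  echo "failed to find plugin \"$plugin\" in path [$*]" >&2
  return 1
}
```

On machine1, `find_cni_plugin loopback /opt/loopback/bin /opt/cni/bin` would fail the same way the kubelet does, which points at the contents of /opt/cni/bin rather than at the kubelet configuration.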

@cstrahan
Contributor

Ah, so this is a problem. Early in the logs:

machine1# [   28.301180] kubelet-bootstrap-start[984]: rm: cannot remove '/opt/cni/bin/*': No such file or directory
machine1# [   28.328598] kubelet-bootstrap-start[984]: Linking cni package: /nix/store/y9vzvrnkbl5cg1j4x0vpkryw1p6l2ihd-cni-0.6.0

Here's the Nix responsible for that service:

      systemd.services.kubelet-bootstrap = {
        description = "Boostrap Kubelet";
        wantedBy = ["kubernetes.target"];
        after = ["docker.service" "network.target"];
        path = with pkgs; [ docker ];
        script = ''
          ${concatMapStrings (img: ''
            echo "Seeding docker image: ${img}"
            docker load <${img}
          '') cfg.kubelet.seedDockerImages}

          rm /opt/cni/bin/* || true
          ${concatMapStrings (package: ''
            echo "Linking cni package: ${package}"
            ln -fs ${package.plugins}/* /opt/cni/bin
          '') cfg.kubelet.cni.packages}
        '';
        serviceConfig = {
          Slice = "kubernetes.slice";
          Type = "oneshot";
        };
      };
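The `rm: cannot remove '/opt/cni/bin/*'` line comes from bash's default globbing: when `/opt/cni/bin/*` matches nothing, the pattern is passed to `rm` as a literal word. This self-contained demonstration (run in a throwaway temp directory) shows the default behaviour next to `shopt -s nullglob`, under which an unmatched glob expands to nothing.

```bash
# Demonstrate how an unmatched glob behaves, using an empty temp directory.
dir=$(mktemp -d)

show_glob() {
  # Print each word the glob expanded to, one per line.
  local word
  for word in "$dir"/*; do
    printf '%s\n' "$word"
  done
}

default_result=$(show_glob)   # the literal "$dir/*", since nothing matched

shopt -s nullglob             # unmatched globs now expand to zero words
nullglob_result=$(show_glob)  # empty
shopt -u nullglob
```

Since `|| true` already swallows the failure, the message is cosmetic; `rm -f` would also silence it, because `-f` ignores operands that do not exist.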

So that first log line was benign. Now let's look at the directory that was linked:

$ tree /nix/store/y9vzvrnkbl5cg1j4x0vpkryw1p6l2ihd-cni-0.6.0/bin
/nix/store/y9vzvrnkbl5cg1j4x0vpkryw1p6l2ihd-cni-0.6.0/bin
└── cnitool

That's no good: it's supposed to contain all of the reference plugins, not just the cnitool binary. I'm not sure how we got here, but I can picture something like upstream splitting one repo into two separate repos; we then update our cni package and unintentionally break Kubernetes along the way. Oops.
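Because the bootstrap script links whatever happens to be in the package, a package change like this one fails silently until the kubelet tries to use a plugin. A hypothetical check (`missing_cni_plugins` is not an existing tool) that lists the expected plugin names missing from a directory would catch it earlier:

```bash
# Hypothetical sanity check: print the expected plugin names that are missing
# (or not executable) in a directory, and return 1 if any are missing.
missing_cni_plugins() {
  local dir=$1; shift
  local name
  local -a missing=()
  for name in "$@"; do
    [[ -x "$dir/$name" ]] || missing+=("$name")
  done
  if (( ${#missing[@]} > 0 )); then
    printf '%s\n' "${missing[@]}"
    return 1
  fi
}
```

Pointed at /nix/store/y9vzvrnkbl5cg1j4x0vpkryw1p6l2ihd-cni-0.6.0/bin, which contains only `cnitool`, a call checking `loopback bridge host-local` would report all three names and return 1.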

@cstrahan
Contributor

cstrahan commented Mar 17, 2018

This seems to be the culprit:

      # Always include cni plugins
      services.kubernetes.kubelet.cni.packages = [pkgs.cni];

should be

      # Always include cni plugins
      services.kubernetes.kubelet.cni.packages = [pkgs.cni.plugins];

The cni package has all of the plugins installed in the plugins output.

EDIT: That was actually fine before -- I didn't catch the .plugins in ln -fs ${package.plugins}/* /opt/cni/bin.

However, that's not sufficient. Let's look at that output:

$ tree /nix/store/z04pzvaffrzbv0cgw9fcrhvaqx1q6zr0-cni-0.6.0-plugins
/nix/store/z04pzvaffrzbv0cgw9fcrhvaqx1q6zr0-cni-0.6.0-plugins
└── noop

noop is the only plugin in the cni source tree as of version 0.6.0. If we go back before 0d961f0, we'll see:

$ tree /nix/store/jrv5bf6ls95j5mzzw54i536a3ilf4lm0-cni-0.5.2-plugins
/nix/store/jrv5bf6ls95j5mzzw54i536a3ilf4lm0-cni-0.5.2-plugins
├── bridge
├── dhcp
├── flannel
├── host-local
├── ipvlan
├── loopback
├── macvlan
├── noop
├── ptp
└── tuning

So we can revert that commit, or (preferably, IMO) add a new package for the reference plugins (and we probably should remove the plugins output from cni).
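Comparing the two `plugins` outputs mechanically makes the regression explicit. A small bash sketch (the function name `removed_plugins` is hypothetical) using `comm -23`, which keeps only the lines unique to the first sorted input:

```bash
# Sketch: list entries present in the old plugin directory but not the new one.
removed_plugins() {
  local old=$1 new=$2
  # Both listings go through sort so comm sees consistently ordered input.
  comm -23 <(ls -1 "$old" | sort) <(ls -1 "$new" | sort)
}
```

Applied to the 0.5.2 and 0.6.0 `plugins` outputs listed above, it would print every plugin except `noop`.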

@cstrahan cstrahan mentioned this pull request Mar 17, 2018
@cstrahan
Contributor

Heads up: I opened #37218, which includes the fixes from this PR along with the fixes I outlined above. Thanks @srhb for getting this ball rolling!

@srhb srhb closed this Mar 17, 2018
@srhb srhb deleted the fix-kube-tests branch March 18, 2018 17:07
cstrahan added a commit to cstrahan/nixpkgs that referenced this pull request Mar 30, 2018
* Fix reference CNI plugins
  * The plugins were split out of the upstream cni repo around version
    0.6.0

* Fix RBAC and DNS tests
  * Fix broken apiVersion fields
  * Change plugin linking to look in ${package}/bin rather than
    ${package.plugins}

* Initial work towards a working e2e test
  * Test still fails, but at least the expression evaluates now

Continues @srhb's work in NixOS#37199

Fixes NixOS#37199
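The linking change described in the commit message can be sketched in plain bash. This is an illustration under the assumption that each package passed in exposes its plugins under `bin/`; `link_cni_packages` is a hypothetical name, and the module shown earlier in the thread generates its linking loop from Nix with `concatMapStrings` instead.

```bash
# Sketch: link every file from each package's bin/ directory into the target
# CNI bin directory, replacing whatever links were there before.
link_cni_packages() {
  local target=$1; shift
  local pkg bin
  mkdir -p "$target"
  rm -f -- "$target"/*            # -f: quiet if the directory is empty
  for pkg in "$@"; do
    echo "Linking cni package: $pkg"
    for bin in "$pkg"/bin/*; do
      [[ -e $bin ]] || continue   # skip the literal glob when bin/ is empty
      ln -fs "$bin" "$target"
    done
  done
}
```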
globin pushed a commit to mayflower/nixpkgs that referenced this pull request May 26, 2018

(cherry picked from commit 709b6f6)

Labels

6.topic: nixos Issues or PRs affecting NixOS modules, or package usability issues specific to NixOS 10.rebuild-darwin: 1-10 This PR causes between 1 and 10 packages to rebuild on Darwin. 10.rebuild-linux: 1-10 This PR causes between 1 and 10 packages to rebuild on Linux.

4 participants