There is a new experimental implementation of gradient boosted trees (GBTs) that is reported to be orders of magnitude faster than the vanilla GBTs on datasets with more than roughly 10K samples, which is a common use case for SKLL users.
We should add these to SKLL once they move from experimental to stable, but not before then.
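
One way to act on this later is to expose the new estimators only when they can be imported without the experimental opt-in. The sketch below assumes the implementation in question is scikit-learn's histogram-based gradient boosting (`HistGradientBoostingClassifier` and `HistGradientBoostingRegressor` in `sklearn.ensemble`). The helper name `stable_hist_gbt` is hypothetical and not part of SKLL.

```python
import importlib


def stable_hist_gbt(name="HistGradientBoostingClassifier"):
    """Return the estimator class if it is importable without the
    experimental opt-in, otherwise None.

    Assumption: the estimator lives in ``sklearn.ensemble`` once stable.
    """
    try:
        ensemble = importlib.import_module("sklearn.ensemble")
    except ImportError:
        # scikit-learn is not installed at all
        return None
    try:
        return getattr(ensemble, name)
    except (AttributeError, ImportError):
        # Estimator missing, or still gated behind the experimental flag
        return None


clf_class = stable_hist_gbt()
reg_class = stable_hist_gbt("HistGradientBoostingRegressor")
```

A learner registry could then skip these estimators whenever the helper returns `None`, so nothing changes for users on releases where they are still experimental.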