
DOC: provide methods to circumvent unknown chunk sizes #637

Merged
TomAugspurger merged 6 commits into dask:master from stsievert:incremental-search-dataframes
Apr 9, 2020


@stsievert
Member

What does this PR implement?
It makes the conversion to Dask Array clearer. This reflects the documentation in dask/dask#4516.

Reference issues/PRs

@stsievert force-pushed the incremental-search-dataframes branch from 3ab5e16 to 1eac415 on April 8, 2020 17:48
@stsievert
Member Author

I think the CI failure is unrelated. Here's the error:

================================== FAILURES ===================================
______________________ test_singular_values[randomized] _______________________

svd_solver = 'randomized'

    @pytest.mark.parametrize("svd_solver", ["full", "auto", "randomized"])
    def test_singular_values(svd_solver):
        # Check that the IncrementalPCA output has the correct singular values
    
        rng = np.random.RandomState(0)
        n_samples = 1000
        n_features = 100
    
        X = datasets.make_low_rank_matrix(
            n_samples, n_features, tail_strength=0.0, effective_rank=10, random_state=rng
        )
        X = da.from_array(X, chunks=[200, -1])
    
        pca = PCA(n_components=10, svd_solver=svd_solver, random_state=rng).fit(X)
        ipca = IncrementalPCA(n_components=10, batch_size=100, svd_solver=svd_solver).fit(X)
        assert_array_almost_equal(pca.singular_values_, ipca.singular_values_, 2)
    
        # Compare to the Frobenius norm
        X_pca = pca.transform(X)
        X_ipca = ipca.transform(X)
        assert_array_almost_equal(
            np.sum(pca.singular_values_ ** 2.0), np.linalg.norm(X_pca, "fro") ** 2.0, 12
E        x: array(6.38)
E        y: array(6.39)

tests\test_incremental_pca.py:368: AssertionError

@TomAugspurger
Member

Thanks. This looks like a nice change until we can support unknown chunk sizes directly.
