Unable to Take Cartesian Products of Many Arrays #16350

@hcars

Description

Describe the bug

When running cartesian from sklearn.utils.extmath on more than 32 arrays, the call crashes because np.indices rejects the input shape inside the implementation.
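The underlying limit can be shown directly, independent of sklearn (a minimal sketch, assuming the cap is NumPy's NPY_MAXDIMS: 32 dimensions in NumPy 1.x as used in this report, raised to 64 in NumPy 2.x):

```python
import numpy as np

# cartesian() allocates one array dimension per input array via
# np.indices, so NumPy's dimensionality cap applies. A shape longer
# than the cap is rejected before any product is computed:
try:
    np.indices((2,) * 65)  # 65 dims exceeds the cap on any NumPy release
except ValueError as exc:
    print(exc)  # e.g. "sequence too large; cannot be greater than 32"
```

With 40 input arrays, as in the reproduction above, np.indices is asked for a 40-entry shape and fails the same way on NumPy 1.x.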

Steps/Code to Reproduce

cartesian(([(1, 2), (3, 4)], [(5, 6), (7, 8)], [(9, 10), (7, 11)], [(3, 12), (13, 14)], [(9, 15), (3, 16)], [(17, 18), (3, 19)], [(1, 20), (21, 22)], [(21, 23), (3, 24)], [(25, 26), (27, 28)], [(21, 29), (3, 30)], [(9, 31), (27, 32)], [(9, 33), (1, 34)], [(21, 35), (17, 36)], [(13, 37), (7, 38)], [(25, 39), (7, 40)], [(1, 41), (27, 42)], [(21, 43), (17, 44)], [(17, 45), (3, 46)], [(25, 47), (17, 48)], [(21, 49), (17, 50)], [(5, 51), (13, 52)], [(1, 53), (7, 54)], [(25, 55), (13, 56)], [(5, 57), (7, 58)], [(9, 59), (1, 60)], [(25, 61), (5, 62)], [(3, 63), (27, 64)], [(25, 65), (7, 66)], [(1, 67), (27, 68)], [(27, 69), (7, 70)], [(21, 71), (17, 72)], [(17, 73), (5, 74)], [(9, 75), (21, 76)], [(21, 77), (13, 78)], [(25, 79), (3, 80)], [(9, 81), (25, 82)], [(9, 83), (7, 84)], [(17, 85), (27, 86)], [(5, 87), (27, 88)], [(25, 89), (13, 90)]))


Expected Results

The cartesian product of the lists.
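For reference, the expected product can be computed for any number of arrays with itertools.product, which has no dimension cap (a pure-Python sketch, not sklearn's implementation; shown with 3 arrays but the same for 40 or more):

```python
from itertools import product

# Each input contributes one factor to the product; itertools.product
# iterates combinations lazily instead of allocating an index grid.
arrays = [[(1, 2), (3, 4)], [(5, 6), (7, 8)], [(9, 10), (7, 11)]]
combos = list(product(*arrays))
print(len(combos))  # 2 choices per array -> 2 ** 3 = 8 combinations
```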

Actual Results

Traceback (most recent call last):
  File "/home/hlc5v/.conda/envs/graph_env/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/hlc5v/.conda/envs/graph_env/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/hlc5v/.conda/envs/graph_env/lib/python3.6/cProfile.py", line 160, in <module>
    main()
  File "/home/hlc5v/.conda/envs/graph_env/lib/python3.6/cProfile.py", line 153, in main
    runctx(code, globs, None, options.outfile, options.sort)
  File "/home/hlc5v/.conda/envs/graph_env/lib/python3.6/cProfile.py", line 20, in runctx
    filename, sort)
  File "/home/hlc5v/.conda/envs/graph_env/lib/python3.6/profile.py", line 64, in runctx
    prof.runctx(statement, globals, locals)
  File "/home/hlc5v/.conda/envs/graph_env/lib/python3.6/cProfile.py", line 100, in runctx
    exec(cmd, globals, locals)
  File "../../DNF_Approx/GraphConvert/graphtodnf.py", line 191, in <module>
    main()
  File "../../DNF_Approx/GraphConvert/graphtodnf.py", line 176, in main
    pi, vars, weights = createPathSet(curr, infected, args.susceptible[j], t=int(args.time_steps))
  File "../../DNF_Approx/GraphConvert/graphtodnf.py", line 103, in createPathSet
    PI = cartesian_product(*pathSet)
  File "../../DNF_Approx/GraphConvert/graphtodnf.py", line 113, in cartesian_product
    arr = np.empty([len(a) for a in arrays] + [la], dtype=tuple)
ValueError: sequence too large; cannot be greater than 32

Versions

System:
python: 3.7.1 (v3.7.1:260ec2c36a, Oct 20 2018, 14:57:15) [MSC v.1915 64 bit (AMD64)]
executable: C:\Users\hcars\PycharmProjects\DNF_git\GraphForecastApprox\venv\Scripts\python.exe
machine: Windows-10-10.0.17763-SP0
Python dependencies:
pip: 19.0.3
setuptools: 40.8.0
sklearn: 0.22.1
numpy: 1.17.2
scipy: 1.4.1
Cython: None
pandas: 0.25.1
matplotlib: 3.1.1
joblib: 0.14.1
Built with OpenMP: True
