Skip to content

Slicing #9

Closed
mrocklin wants to merge 8 commits intodask:masterfrom
nevermindewe:slicing
Closed

Slicing #9
mrocklin wants to merge 8 commits intodask:masterfrom
nevermindewe:slicing

Conversation

@mrocklin
Copy link
Member

This is @nevermindewe s work. Creating a PR for comments.

Wesley Emeneker added 6 commits January 14, 2015 10:23
…us mode.

In synchronous mode, only one thread in the pool will be used.
This is not a complete/fully baked solution; it is incremental progress.
Things that need to be done before it can be considered fully baked:
 - deciding whether or not to create the threadpool in synchronous mode
 - adding a configuration mechanism to specify sync/async mode when threaded.get is called. (Currently it relies on kwargs to determine sync or async)

To support sync mode, the "state" dict locks had to be rearranged.
Now, locking is a little finer-grained and immediately surrounds the code that updates the state dict.

Side-effects of sync mode:
 - state['num-active-threads'] will never be non-zero (based on when it is updated).
   This means that the queue would never be emptied while jobs were waiting to run because "state['num-active-threads'] < nthreads" was always true.
   This was solved by adding a max_queue_size variable that tells the maximum number of items allowed in the queue before we start emptying it.
…ial_scheduler

This merges the changesets ef08fcd and 6c2cdef with the synchronous/asynchronous changes committed earlier today.

Conflicts:
	dask/threaded.py
… and tests that verify the functionality.

Current state:
 - a reference implementation for 1D slicing is implemented
 - a helper function that (so far) constructs a correct dask slice graph (minus step, indexing without slicing, and ellipsis) for n-dimensional slicing passes tests for 1D and 2D slicing

Remaining to be written:
 - a front-end function (or perhaps a few functions) that setup the data necessary to correctly call the helper _dask_slice()

Remaining to be figured out:
 - how to deal with slice steps in a clear, concise, easy-to-understand way
 - how to deal with indexing and ellipsis (array[0,0:4,...]) or open colon (array[:,0:3])
This is based on @mrocklin's ideas and prototyping for how things should work.
The result is cleaner (so far).

This implementation passes basic tests, and does not include any optimizations yet.
Adding comments and doctests revealed some bugs in the previous code that didn't properly handle steps with offsets.
The updated code fixes these errors, but I'm not happy with it.
I think that the code can be significantly simpler and clearer, but I don't understand it well-enough yet.
As it stands, steps and starting offsets should be properly handled now.
These additions should better explain what and why the function.
Added doctests to give examples of what/why should be returned.
dask/array.py Outdated
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we still use this function? If not should remove it.

Wesley Emeneker added 2 commits January 19, 2015 17:37
 - non-slice index examples added to doctest
 - Reformatted and updated docstrings
 - tested against python 3.4 and fixed integer vs. float division problems
 - reformatted to have 80 max 80 columns per line in code, and max 72 columns per line in comments
Added optimization where if all slices are slice(None,None,None), the original object is returned.
Tested in both py2.7 and py3.4
@mrocklin mrocklin mentioned this pull request Jan 20, 2015
@mrocklin mrocklin closed this Jan 28, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant