Description
Abstract problem
I have y = f(x1, x2, ...), where x1, x2, ... are 1-dimensional arrays with the same shape, and f is a generic elementwise, embarrassingly parallel function (e.g. a sum, y = x1 + x2 + ...). As a consequence, y has the same shape as the inputs.
I need to find:
- the top k elements of y;
- the elements of x1, x2, ... which produced the top k elements of y.
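For in-memory data, both points can be sketched in plain numpy. This is a minimal sketch, assuming two inputs and f being an elementwise sum; `top_k_with_inputs` is a hypothetical helper name, not an existing API. It uses `np.argpartition` to select the k largest entries in linear time, then sorts only those k and uses their indices to pick the matching inputs.

```python
import numpy as np


def top_k_with_inputs(y, xs, k):
    """Return the k largest values of y (largest first) and the matching
    elements of each input array in xs.

    Hypothetical helper for illustration; y and every array in xs are
    1-D and share the same shape.
    """
    # Unordered indices of the k largest elements: O(n) selection.
    idx = np.argpartition(y, -k)[-k:]
    # Order just those k indices from largest to smallest y.
    idx = idx[np.argsort(y[idx])[::-1]]
    # Integer-array indexing picks the inputs that produced each top value.
    return y[idx], [x[idx] for x in xs]


x1 = np.array([3.0, -1.0, 4.0, 1.0, -5.0, 9.0])
x2 = np.array([2.0, 7.0, -5.0, 3.0, 5.0, -8.0])
y = x1 + x2  # f is elementwise, here a plain sum
top_y, (top_x1, top_x2) = top_k_with_inputs(y, [x1, x2], k=2)
```

The second step (indexing every input with the same index array) is exactly the part that needs integer-array indexing.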
Real-life use case
x1, x2, ... are the values of financial assets under a Monte Carlo risk simulation, one point per scenario, and y is the value of the portfolio containing them. I need to find:
- the 99% worst value of y, i.e. the threshold below which only the worst 1% of scenarios fall, where the portfolio loses the most money (Value at Risk, or VaR);
- the value of each individual asset in the portfolio in that 99% worst scenario (risk attribution).
In qualitative terms, the second point answers the question: when you're on the brink of bankruptcy, what is pushing you there? Which of the products you own are time bombs that you should sell NOW if you want to survive a disaster, e.g. a new global market crisis?
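A toy version of the VaR use case, as a minimal numpy sketch. The asset distributions, scenario count and seed are made up for illustration, and the portfolio is assumed to be a plain sum of asset values. Because f is a sum here, the asset values in the VaR scenario add up exactly to the VaR itself, which is what makes them usable as a risk attribution.

```python
import numpy as np

rng = np.random.default_rng(42)
n_scenarios = 10_000

# Hypothetical asset values, one entry per Monte Carlo scenario.
x1 = rng.normal(100.0, 10.0, n_scenarios)
x2 = rng.normal(50.0, 20.0, n_scenarios)
x3 = rng.normal(-30.0, 5.0, n_scenarios)
y = x1 + x2 + x3  # portfolio value per scenario

k = int(round(y.size * .01))  # size of the worst 1% tail

# Scenario index of the k-th smallest portfolio value (0-based position k - 1).
var_idx = np.argsort(y)[k - 1]
var = y[var_idx]

# Risk attribution: each asset's value in that same scenario.
attribution = {"x1": x1[var_idx], "x2": x2[var_idx], "x3": x3[var_idx]}
```

`np.argsort` sorts all N scenarios; for large N, `np.argpartition(y, k - 1)[k - 1]` selects the same scenario index in linear time.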
Solution
The first point is readily solvable with topk:
k = int(round(y.size * .01))  # number of scenarios in the worst 1%
-(-y).topk(k)[-1]  # k-th smallest value of y, i.e. the 99% VaR
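To see what the sign trick computes, here is a numpy stand-in for topk, assuming topk returns the k largest elements sorted from largest to smallest. `topk_np` is a hypothetical helper written for this illustration, not dask's implementation. Negating the input turns "largest" into "smallest", so the last element of the top k of -y, negated back, is the k-th smallest value of y.

```python
import numpy as np


def topk_np(a, k):
    """Hypothetical numpy stand-in: the k largest elements of a 1-D
    array, sorted from largest to smallest."""
    return np.sort(a)[::-1][:k]


y = np.array([12.0, -3.0, 7.0, -8.0, 0.5, -1.0, 4.0])
k = 3

# Negate, take the top k of -y (the k smallest y), keep the last one,
# and negate back: this is the k-th smallest value of y.
kth_worst = -topk_np(-y, k)[-1]
```

In ascending order the k-th smallest value sits at 0-based position k - 1, which matters when translating this into an argsort-based expression.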
As far as I understand, the second point is not possible in dask; in numpy it would be x[argsort(y)[k - 1]], where k - 1 is the 0-based position of the k-th smallest value of y.
Two things are missing to make this possible in dask:
- an argtopk function, which returns the indices of the top k elements of an array;
- the ability to slice a[b], where b is a dask array of integers.
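One way such an argtopk could stay embarrassingly parallel is a two-stage reduction: take the top k candidates within each chunk, remembering their global positions, then take the top k among all candidates. The sketch below does this in plain numpy over a list of chunks; `chunked_argtopk` is a hypothetical name and the chunking is only simulated with `np.array_split`, so it illustrates the idea rather than any actual dask implementation. The returned indices are then used for a[b]-style slicing of another input.

```python
import numpy as np


def chunked_argtopk(chunks, k):
    """Global indices of the k largest elements across a sequence of
    1-D chunks, ordered from largest to smallest value.

    Hypothetical sketch: each chunk is reduced independently (the
    parallel step), then the small candidate sets are merged.
    """
    cand_idx, cand_val = [], []
    offset = 0
    for chunk in chunks:
        if chunk.size == 0:
            continue  # nothing to contribute, offset is unchanged
        kk = min(k, chunk.size)
        # Local top-kk positions within this chunk only.
        local = np.argpartition(chunk, -kk)[-kk:]
        cand_idx.append(local + offset)  # shift to global positions
        cand_val.append(chunk[local])
        offset += chunk.size
    idx = np.concatenate(cand_idx)
    val = np.concatenate(cand_val)
    # Final reduction over at most k * n_chunks candidates.
    order = np.argsort(val)[::-1][:k]
    return idx[order]


y = np.array([5.0, 1.0, 9.0, 3.0, 7.0, 2.0, 8.0, 6.0])
chunks = np.array_split(y, 3)  # simulated chunking: sizes 3, 3, 2
idx = chunked_argtopk(chunks, 3)

x1 = np.arange(y.size) * 10.0  # another input with the same shape
picked = x1[idx]  # a[b] with b an integer index array
```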
With these, the dask solution to the problem would be x[(-y).argtopk(k)[-1]]
Working on a POC - stay tuned...