Risk attribution: a[argsort(b)[:k]] #3396

@crusaderky

Abstract problem

I have y = f(x1, x2, ...), where x1, x2, ... are 1-dimensional arrays with the same shape, and f is a generic elementwise, embarrassingly parallel function (e.g. sum()). As a consequence, y has the same shape as the inputs.
I need to find:

  1. the top k elements of y
  2. the elements of x1, x2,... which produced the top k elements of y
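
To make the two goals concrete, here is a minimal NumPy sketch on made-up data. The arrays, the choice of a plain sum for f, and the names idx, top_y, top_x1 and top_x2 are all illustrative assumptions, not part of any existing API.

```python
import numpy as np

# Hypothetical toy inputs: two 1-dimensional arrays with the same shape.
x1 = np.array([1.0, 5.0, 2.0, 8.0, 3.0])
x2 = np.array([4.0, 0.5, 7.0, 1.5, 1.0])

# f is elementwise; a plain sum is assumed here.
y = x1 + x2

k = 2

# Indices of the k largest elements of y, largest first.
idx = np.argsort(y)[::-1][:k]

top_y = y[idx]    # (1) the top k elements of y
top_x1 = x1[idx]  # (2) the elements of x1 that produced them
top_x2 = x2[idx]  # (2) the elements of x2 that produced them
```

The same index array is reused for y and for every input, which is why an index-returning variant of topk is the key building block.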

Real life use case

x1, x2... are the values of financial assets under a Monte Carlo risk simulation, one point per scenario, and y is the value of the portfolio containing them. I need to find

  1. the 99% worst value of y, i.e. the portfolio value at the boundary of the worst 1% of scenarios (Value at Risk, or VaR)
  2. the value of each individual asset in the portfolio in the scenario where the portfolio hits that 99% worst value (risk attribution)

In qualitative terms, (2) answers the question: When you're on the brink of bankruptcy, what is causing it? Which of the products that you own are a time bomb that you should sell NOW if you fancy surviving a disaster, e.g. a new global market crisis?

Solution

The first point is readily solvable with topk:

k = int(round(y.size * .01))
-(-y).topk(k)[-1]

To my understanding, the second point is not possible in dask; in numpy it would be: x[argsort(y)[k - 1]]
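
For reference, a minimal NumPy sketch of both points on simulated data. The random inputs, the 2-dimensional layout of x (one row per asset) and the names var_idx, var and attribution are assumptions made for illustration. The sketch indexes the sorted order at k - 1 so that it picks the k-th smallest value of y, which is the same element that -(-y).topk(k)[-1] selects.

```python
import numpy as np

rng = np.random.default_rng(0)
n_assets = 3
n_scenarios = 10_000

# Hypothetical simulated asset values: one row per asset, one column per scenario.
x = rng.normal(loc=100.0, scale=10.0, size=(n_assets, n_scenarios))

# Portfolio value per scenario (f is assumed to be a plain sum).
y = x.sum(axis=0)

k = int(round(y.size * 0.01))

# Scenario index of the k-th smallest portfolio value.
order = np.argsort(y)
var_idx = order[k - 1]

var = y[var_idx]             # point 1: the 99% worst portfolio value
attribution = x[:, var_idx]  # point 2: value of each asset in that scenario
```

By construction, the entries of attribution add up to var, so each entry shows how much an individual asset contributes in the VaR scenario.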

There are two things missing to make it happen in dask:

  1. an argtopk function, which returns the indexes of the top k elements of an array
  2. the ability to slice a[b], where b is a dask array of integers.
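
As a rough illustration of how the two pieces could fit together on chunked data, here is a pure-NumPy sketch. It is not dask's implementation: the helper names chunk_argtopk and argtopk, the list-of-chunks representation, and the assumptions that every chunk is non-empty and k >= 1 are all hypothetical. Each chunk reports its own k best candidates with global indices, and a final reduction picks the overall top k.

```python
import numpy as np

def chunk_argtopk(chunk, offset, k):
    """Return (values, global indices) of up to k largest elements of one chunk."""
    k = min(k, chunk.size)
    # argpartition places the k largest elements in the last k slots, unordered.
    local = np.argpartition(chunk, -k)[-k:]
    return chunk[local], local + offset

def argtopk(chunks, k):
    """Global indices of the k largest elements across all chunks, largest first."""
    values, indices, offset = [], [], 0
    for chunk in chunks:
        v, i = chunk_argtopk(chunk, offset, k)
        values.append(v)
        indices.append(i)
        offset += chunk.size
    values = np.concatenate(values)
    indices = np.concatenate(indices)
    # Final reduction: sort only the surviving candidates.
    order = np.argsort(values)[::-1][:k]
    return indices[order]

# Hypothetical array b, split into three chunks, and a second array a to slice.
b_chunks = [np.array([3.0, 9.0, 1.0]), np.array([7.0, 2.0]), np.array([8.0, 4.0, 6.0])]
a = np.arange(8) * 10

idx = argtopk(b_chunks, 3)  # missing piece 1: indexes of the top k elements
picked = a[idx]             # missing piece 2: slicing a with an integer array
```

Only k candidates per chunk reach the final step, so the reduction never has to sort the full array.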

With these, the dask solution to the problem would be x[(-y).argtopk(k)[-1]]

Working on a POC - stay tuned...
