Skip to content

RANN with ball trees fails to make theoretical guarantees #338

@rcurtin

Description

@rcurtin

Reported by rcurtin on 8 Dec 44565964 13:44 UTC
The recent solution of #320 introduced a new issue in r16855, in allkrann_search_test.cpp; the SingleBallTreeTest and DualBallTreeTest tests fail to make the theoretical guarantees given for the success probability of RANN.

After some amount of digging, I think I may have isolated the problem for the single-tree case (but I am not certain). I think the dual-tree problem is the same, though, so we can consider the simpler single-tree case.

Score(queryIndex, referenceNode) has the following basic structure:

  if ((SortPolicy::IsBetter(distance, bestDistance))
      && numSamplesMade[< numSamplesReqd)
  {
    // Do either real approximation or brute-force search.
  }
  else
  {
    // Do "fake" approximation.  We don't need to actually sample.
  }

This makes reasonable sense; if the node is too far from the point or the required number of samples has already been made, then the node can be pruned, and no more sampling is necessary. Otherwise, we have to either sample the node, or recurse deeper for brute-force search.

However, consider the case where the node is sufficiently close to the point that it can't be pruned (i.e. dist_to_node(q, R) < ub(q)) but numSamplesMadequeryIndex >= numSamplesReqd. In this situation we have already sampled enough points from the dataset. But now suppose that R contains, as its descendants, all of the first many nearest neighbors of q; this means that by pruning R and not sampling it (which is what will happen in this case), the final rank-approximate nearest neighbor result for the query q will not be in the top many true nearest neighbors (we can call this a "failure").

Basically, what appears to have happened here is that we have sampled the necessary number of points from the dataset, but they have not been sampled uniformly because we are pruning some nodes that could contain better points and not sampling them.

I've sent an email to Pari about this to get his input on what the best way to solve the issue is, or if I have overlooked something simpler.

Migrated-From: http://trac.research.cc.gatech.edu/fastlab/ticket/356

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions