Thursday, April 13, 2017

How to Run N Nearest Neighbor Search in Python using kdtree Package

Given S points scattered in a K-dimensional space, an N nearest neighbor search finds, for a given point, which N of the S points are its closest neighbors. To implement N nearest neighbor search efficiently, a kd tree is first constructed over all S points. Searching the internet, the most popular way to set up a kd tree in Python is the scipy package. However, I kept failing to install scipy because it requires Lapack, a linear algebra library, which was missing on my machine. Instead, I found another way. Since other folks probably run into the same scipy installation issue, let me share what I learnt.
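
To make the definition concrete before getting to the kd tree, here is a tiny brute-force sketch of the same idea using only the standard library (the function name n_nearest is just for this illustration, not part of any package):

import math

def n_nearest(points, query, n):
    # Brute force: measure the Euclidean distance from the query point
    # to every candidate point, sort, and keep the n smallest.
    dists = sorted(
        (math.sqrt(sum((a - b) ** 2 for a, b in zip(p, query))), p)
        for p in points
    )
    return dists[:n]

# Which 2 of these points are closest to (0, 0, 0)?
print(n_nearest([(1, 0, 0), (2, 2, 0), (0, 0, 3), (5, 5, 5)], (0, 0, 0), 2))
# -> [(1.0, (1, 0, 0)), (2.83, (2, 2, 0))] approximately

This brute-force version compares the query against every point, so it becomes slow when S is large; that is exactly what the kd tree speeds up.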

My solution is to install the kdtree package, which can be found at https://github.com/stefankoegl/kdtree. It is pretty straightforward to use. Below is my example code for N nearest neighbor search. All points here live in 3D space and N = 2. For each point, the average distance to its N nearest neighbors is printed out.



import kdtree
import math

data = ((1, 0, 0), (2, 0, 0), (2, 0, 0), (3, 0, 0))

# Set up the kd tree
myTree = kdtree.create(dimensions=3)
for point in data:
    myTree.add((float(point[0]), float(point[1]), float(point[2])))

# Nearest neighbor search
NN = 2  # Number of neighbors to search
for i in range(len(data)):
    # Ask for NN+1 results because the closest match is the point itself
    searchResult = myTree.search_knn(data[i], NN + 1)
    sumDist = 0.0
    for p in range(NN):
        # search_knn returns (node, distance) pairs sorted by distance;
        # skip the first entry (the point itself) and take the square root,
        # since the default metric reports squared distances
        sumDist += math.sqrt(searchResult[p + 1][1])
    avgDist = sumDist / NN
    print(avgDist)
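
For readers who can install scipy, my understanding is that the same computation can be expressed with scipy.spatial.cKDTree roughly as below (a sketch based on the scipy API; I could not run it myself for the reason above):

import numpy
from scipy.spatial import cKDTree

data = numpy.array([(1, 0, 0), (2, 0, 0), (2, 0, 0), (3, 0, 0)], dtype=float)
NN = 2

tree = cKDTree(data)
# Query NN+1 neighbors for every point; the nearest one is the point itself.
dist, idx = tree.query(data, k=NN + 1)
# Drop the self-match in column 0 and average the remaining distances.
avgDist = dist[:, 1:].mean(axis=1)
print(avgDist)

Note that scipy's query reports Euclidean distances directly, while the kdtree package above appears to report squared distances, which is why my code takes a square root.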
