K-nearest neighbor (k-NN) algorithm and its Weka version: How do you calculate the weight 1/d?

classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

K-nearest neighbor (k-NN) algorithm and its Weka version: How do you calculate the weight 1/d?

Zazzaro Gaetano

Dear all,

 

I have two questions for you.

 

I suppose that weight by 1/distance that I can find in the IBk algorithm of Weka tool is computed by the formula 1/(distance)^2, but I'm not sure.

Can you help me to be sure of the presence of (^2)?

 

Moreover, a very difficult question for you:

If I choose the set S as training set and also as test set, and k = 1, obviously all instances are correctly labeled (d (x, x) = 0). If I increase k (for example k = 9) and choose the weight = 1/d, then all the instances are always labeled correctly, because the weight associated with the nearest point tends to infinity. But why if I increase k a lot (for example k = 299), I get a lot of labeling errors? I suppose 1/0 is considered very big, but how big? I add that in the dataset there are no duplicate instances. Also, I am aware that there can be many points very close to x with the opposite label to the label of x. But, in your opinion, shouldn't the label of x always prevail, since its weight is "infinitely" greater than the other weights? Is it possible that somewhere in the algorithm (for example, for the computation of Euclidean distances or in the search technique (nearestNeighbourSearchAlgorithm = LinearNNSerach)) there is some approximation that is too strong or limiting?

What do you think about my question?

 

Thank you very much

Gaetano

 


_______________________________________________
Wekalist mailing list -- [hidden email]
Send posts to [hidden email]
To unsubscribe send an email to [hidden email]
To subscribe, unsubscribe, etc., visit https://list.waikato.ac.nz/postorius/lists/wekalist.list.waikato.ac.nz
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html
Reply | Threaded
Open this post in threaded view
|

Re: K-nearest neighbor (k-NN) algorithm and its Weka version: How do you calculate the weight 1/d?

Peter Reutemann
> I have two questions for you.
>
> I suppose that weight by 1/distance that I can find in the IBk algorithm of Weka tool is computed by the formula 1/(distance)^2, but I'm not sure.
>
> Can you help me to be sure of the presence of (^2)?

See the code in the makeDistribution method:
https://github.com/Waikato/weka-trunk/blob/6162515ec4799331b8b63a66136a89fd334f4f94/weka/src/main/java/weka/classifiers/lazy/IBk.java#L866

> Moreover, a very difficult question for you:
>
> If I choose the set S as training set and also as test set, and k = 1, obviously all instances are correctly labeled (d (x, x) = 0). If I increase k (for example k = 9) and choose the weight = 1/d, then all the instances are always labeled correctly, because the weight associated with the nearest point tends to infinity. But why if I increase k a lot (for example k = 299), I get a lot of labeling errors? I suppose 1/0 is considered very big, but how big? I add that in the dataset there are no duplicate instances. Also, I am aware that there can be many points very close to x with the opposite label to the label of x. But, in your opinion, shouldn't the label of x always prevail, since its weight is "infinitely" greater than the other weights? Is it possible that somewhere in the algorithm (for example, for the computation of Euclidean distances or in the search technique (nearestNeighbourSearchAlgorithm = LinearNNSerach)) there is some approximation that is too strong or limiting?

For the 1/d case, Weka adds 0.001 to avoid division by zero
(makeDistribution method):
https://github.com/Waikato/weka-trunk/blob/6162515ec4799331b8b63a66136a89fd334f4f94/weka/src/main/java/weka/classifiers/lazy/IBk.java#L870

Cheers, Peter
--
Peter Reutemann
Dept. of Computer Science
University of Waikato, NZ
+64 (7) 577-5304
http://www.cms.waikato.ac.nz/~fracpete/
http://www.data-mining.co.nz/
_______________________________________________
Wekalist mailing list -- [hidden email]
Send posts to [hidden email]
To unsubscribe send an email to [hidden email]
To subscribe, unsubscribe, etc., visit https://list.waikato.ac.nz/postorius/lists/wekalist.list.waikato.ac.nz
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html