Dear all, I have two questions for you. I suppose that weight by 1/distance that I can find in the IBk algorithm of Weka tool is computed
by the formula 1/(distance)^2, but I'm not sure. Can you help me to be sure of the presence of (^2)? Moreover, a very difficult question for you: If I choose the set S as training set and also as test set, and k = 1, obviously all instances are
correctly labeled (d (x, x) = 0). If I increase k (for example k = 9) and choose the weight = 1/d, then all the instances are always labeled correctly, because the weight associated with the nearest point tends to infinity. But why if I increase k a lot (for
example k = 299), I get a lot of labeling errors? I suppose 1/0 is considered very big, but how big? I add that in the dataset there are no duplicate instances. Also, I am aware that there can be many points very close to x with the opposite label to the label
of x. But, in your opinion, shouldn't the label of x always prevail, since its weight is "infinitely" greater than the other weights? Is it possible that somewhere in the algorithm (for example, for the computation of Euclidean distances or in the search technique
(nearestNeighbourSearchAlgorithm = LinearNNSerach)) there is some approximation that is too strong or limiting? What do you think about my question? Thank you very much Gaetano _______________________________________________ Wekalist mailing list -- [hidden email] Send posts to [hidden email] To unsubscribe send an email to [hidden email] To subscribe, unsubscribe, etc., visit https://list.waikato.ac.nz/postorius/lists/wekalist.list.waikato.ac.nz List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html |
> I have two questions for you.
> > I suppose that weight by 1/distance that I can find in the IBk algorithm of Weka tool is computed by the formula 1/(distance)^2, but I'm not sure. > > Can you help me to be sure of the presence of (^2)? See the code in the makeDistribution method: https://github.com/Waikato/weka-trunk/blob/6162515ec4799331b8b63a66136a89fd334f4f94/weka/src/main/java/weka/classifiers/lazy/IBk.java#L866 > Moreover, a very difficult question for you: > > If I choose the set S as training set and also as test set, and k = 1, obviously all instances are correctly labeled (d (x, x) = 0). If I increase k (for example k = 9) and choose the weight = 1/d, then all the instances are always labeled correctly, because the weight associated with the nearest point tends to infinity. But why if I increase k a lot (for example k = 299), I get a lot of labeling errors? I suppose 1/0 is considered very big, but how big? I add that in the dataset there are no duplicate instances. Also, I am aware that there can be many points very close to x with the opposite label to the label of x. But, in your opinion, shouldn't the label of x always prevail, since its weight is "infinitely" greater than the other weights? Is it possible that somewhere in the algorithm (for example, for the computation of Euclidean distances or in the search technique (nearestNeighbourSearchAlgorithm = LinearNNSerach)) there is some approximation that is too strong or limiting? For the 1/d case, Weka adds 0.001 to avoid division by zero (makeDistribution method): https://github.com/Waikato/weka-trunk/blob/6162515ec4799331b8b63a66136a89fd334f4f94/weka/src/main/java/weka/classifiers/lazy/IBk.java#L870 Cheers, Peter -- Peter Reutemann Dept. of Computer Science University of Waikato, NZ +64 (7) 577-5304 http://www.cms.waikato.ac.nz/~fracpete/ http://www.data-mining.co.nz/ _______________________________________________ Wekalist mailing list -- [hidden email] Send posts to [hidden email] To unsubscribe send an email to [hidden email] To subscribe, unsubscribe, etc., visit https://list.waikato.ac.nz/postorius/lists/wekalist.list.waikato.ac.nz List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html |
Free forum by Nabble | Edit this page |