Explanation of misclassification

Explanation of misclassification

Alexander Osherenko
I wonder: are there articles that aim at explaining the errors in classification results, taking into account the chosen classifier, the data, or other aspects, and that give the probable reason for a misclassification? For example, a typical answer to this question would be "a classifier does not work well with sparse data" or "a classifier does not work well because of overfitting".

Best, Alexander

Re: Explanation of misclassification

Mark Hall
Your best bet would be to read a good book on machine learning. Strengths, weaknesses and representational power of algorithms will be discussed, along with data characteristics that each is best suited to handle.

Cheers,
Mark.


Re: Explanation of misclassification

Alexander Osherenko
I have read a couple of good books on machine learning, but they did not cover the question I am interested in. Maybe you can recommend something?

Best, Alexander


Re: Explanation of misclassification

Eibe Frank-2
Administrator
In reply to this post by Alexander Osherenko
To some extent, the answer to the first question, regarding sparse data, depends on how an algorithm is implemented. In WEKA, all algorithms that use standard WEKA distance or kernel functions should be able to efficiently process sparse data.

If you grep through the source code for calls to the valueSparse() method, you will find all classes that are likely to make efficient use of sparse data when configured appropriately. Here is an example, from a grep run on the source code of the core WEKA distribution, with duplicate and irrelevant hits removed:

grep -r "valueSparse" . | grep ".java:"
./associations/ItemSet.java:            if (m_items[itemIndex] != (int) instance.valueSparse(p1)) {
./attributeSelection/CorrelationAttributeEval.java:            nomAtts[current.index(j)][(int) current.valueSparse(j)][i] += 1;
./attributeSelection/InfoGainAttributeEval.java:              counts[inst.index(i)][(int) inst.valueSparse(i)][numClasses] += inst
./attributeSelection/ReliefFAttributeEval.java:            m_minArray[instance.index(j)] = instance.valueSparse(j);
./classifiers/bayes/NaiveBayesMultinomial.java:              double numOccurrences = instance.valueSparse(a) * instance.weight();
./classifiers/bayes/NaiveBayesMultinomialUpdateable.java:            double numOccurrences = instance.valueSparse(a) * instance.weight();
./classifiers/functions/SGD.java:          result += inst1.valueSparse(p1) * weights[p2];
./classifiers/functions/SMO.java:              result += m_weights[inst.index(p)] * inst.valueSparse(p);
./classifiers/functions/supportVector/CachedKernel.java:          result += inst1.valueSparse(p1) * inst2.valueSparse(p2);
./classifiers/functions/supportVector/RBFKernel.java:          sum += inst.valueSparse(j) * inst.valueSparse(j);
./classifiers/functions/supportVector/RegOptimizer.java:          result += m_weights[inst.index(i)] * inst.valueSparse(i);
./classifiers/functions/VotedPerceptron.java:                result += i1.valueSparse(p1) *
./clusterers/FarthestFirst.java:        diff = difference(firstI, first.valueSparse(p1), second.valueSparse(p2));
./core/DictionaryBuilder.java:              docLength += inst.valueSparse(j) * inst.valueSparse(j);
./core/neighboursearch/balltrees/BallNode.java:        attrVals[j] += temp.valueSparse(j);
./core/neighboursearch/balltrees/BottomUpConstructor.java:      attrVals[k] += node1.anchor.valueSparse(k)*anchr1Ratio;
./core/neighboursearch/balltrees/MiddleOutConstructor.java:      attrVals[k] += node1.anchor.valueSparse(k) * anchr1Ratio;
./core/NormalizableDistance.java:        diff = difference(firstI, first.valueSparse(p1), second.valueSparse(p2));
./filters/unsupervised/attribute/CartesianProduct.java:          newVals[inst.index(i)] = inst.valueSparse(i);
./filters/unsupervised/attribute/NumericToBinary.java:          vals[j] = instance.valueSparse(j);
./filters/unsupervised/attribute/RandomProjection.java:        double value = instance.valueSparse(i);
./filters/unsupervised/attribute/RandomSubset.java:          classValue = instance.valueSparse(p1);
./filters/unsupervised/attribute/ReplaceMissingValues.java:    double value = inst.valueSparse(i);

This list does not include some methods, such as IBk and GaussianProcesses, that access the data through one of the classes listed above. For example, EuclideanDistance and ManhattanDistance, which are used by IBk, inherit the relevant code from NormalizableDistance. GaussianProcesses can be used with standard kernels, e.g., PolynomialKernel and RBFKernel, that extend CachedKernel. Apriori for item set mining uses the ItemSet class.
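
For illustration, here is a minimal sketch of the iteration idiom these hits share (my own example, not taken from the WEKA source): numValues(), index() and valueSparse() loop only over the values that are actually stored, so the zero entries of a SparseInstance are skipped entirely.

import weka.core.Instance;

// Sketch: dot product of an instance with a dense weight vector, iterating
// only over the stored (non-zero) values. For a DenseInstance, numValues()
// equals numAttributes(); for a SparseInstance it covers only the non-zeros.
public class SparseDotProduct {
  public static double dot(Instance inst, double[] weights) {
    double result = 0;
    for (int p = 0; p < inst.numValues(); p++) {
      int attIndex = inst.index(p);        // attribute index of stored value p
      double value = inst.valueSparse(p);  // the stored value itself
      if (attIndex != inst.classIndex()) { // skip the class attribute
        result += value * weights[attIndex];
      }
    }
    return result;
  }
}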

Note that, in SMO and GaussianProcesses, etc., you will want to turn off normalisation to make efficient use of sparse data.
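
As a rough illustration (my own snippet, not from the WEKA documentation), the corresponding setting can be made in Java as follows; on the command line, the same effect should be achievable with SMO's -N 2 option.

import weka.classifiers.functions.SMO;
import weka.core.SelectedTag;

// Sketch: configure SMO so that it neither normalizes nor standardizes the
// input data, as recommended above when working with sparse data.
public class SparseFriendlySMO {
  public static SMO newInstance() {
    SMO smo = new SMO();
    smo.setFilterType(new SelectedTag(SMO.FILTER_NONE, SMO.TAGS_FILTER));
    return smo;
  }
}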

One way to find out if a WEKA method is configured correctly to make efficient use of sparse data is to use the SparseToNonSparse and NonSparseToSparse filters to create two versions of a (sparse) dataset and compare runtime on the two versions.
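
Here is a minimal sketch of such a runtime comparison (the ARFF path and the choice of NaiveBayesMultinomial are placeholders, so treat this as an assumption rather than a recipe):

import weka.classifiers.Classifier;
import weka.classifiers.bayes.NaiveBayesMultinomial;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.instance.NonSparseToSparse;
import weka.filters.unsupervised.instance.SparseToNonSparse;

// Sketch: train the same classifier on a sparse and a non-sparse copy of one
// dataset and compare the training times.
public class SparseRuntimeCheck {
  public static void main(String[] args) throws Exception {
    Instances data = DataSource.read("data.arff"); // hypothetical file name
    data.setClassIndex(data.numAttributes() - 1);

    NonSparseToSparse toSparse = new NonSparseToSparse();
    toSparse.setInputFormat(data);
    Instances sparse = Filter.useFilter(data, toSparse);

    SparseToNonSparse toDense = new SparseToNonSparse();
    toDense.setInputFormat(sparse);
    Instances dense = Filter.useFilter(sparse, toDense);

    System.out.println("sparse: " + time(new NaiveBayesMultinomial(), sparse) + " ms");
    System.out.println("dense:  " + time(new NaiveBayesMultinomial(), dense) + " ms");
  }

  static long time(Classifier c, Instances data) throws Exception {
    long start = System.currentTimeMillis();
    c.buildClassifier(data);
    return System.currentTimeMillis() - start;
  }
}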

Regarding overfitting, most algorithms have a parameter (or parameters) that controls how closely the model fits the training data. For example, in J48, if you prune the tree all the way back to the root node, you will get the ZeroR classifier. Another way to combat overfitting is to apply Bagging to the base classifier that overfits, perhaps combined with the RandomSubSpace classifier.
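
As a rough sketch of these two options in Java (my own example; the parameter values are just illustrative, not recommendations):

import weka.classifiers.meta.Bagging;
import weka.classifiers.meta.RandomSubSpace;
import weka.classifiers.trees.J48;

// Sketch: two ways of reducing overfitting with J48.
public class OverfittingControls {

  // Option 1: fit the training data less closely via J48's pruning options.
  public static J48 heavilyPrunedTree() {
    J48 tree = new J48();
    tree.setConfidenceFactor(0.1f); // lower value => more aggressive pruning
    tree.setMinNumObj(10);          // require more instances per leaf
    return tree;
  }

  // Option 2: Bagging on top of RandomSubSpace, with J48 as the base learner,
  // mirroring the combination used in the BVDecompose examples below.
  public static Bagging baggedSubspaceTrees() {
    RandomSubSpace subspace = new RandomSubSpace();
    subspace.setClassifier(new J48());
    Bagging bagging = new Bagging();
    bagging.setClassifier(subspace);
    return bagging;
  }
}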

You may want to perform a bias-variance decomposition if you suspect overfitting to the particular dataset at hand. The WEKA implementation of the decomposition can currently only be run from the command-line. Here is an example:

java weka.classifiers.BVDecompose -t ~/datasets/UCI/diabetes.arff -W .J48

Bias-Variance Decomposition

Classifier   : weka.classifiers.trees.J48-C 0.25 -M 2
Data File    : /Users/eibe/datasets/UCI/diabetes.arff
Class Index  : last
Training Pool: 100
Iterations   : 50
Seed         : 1
Error        : 0.3877
Sigma^2      : 0    
Bias^2       : 0.1902
Variance     : 0.1935

Bagging reduces error, primarily by reducing variance:

java weka.classifiers.BVDecompose -t ~/datasets/UCI/diabetes.arff -W .Bagging -- -W .J48

Bias-Variance Decomposition

Classifier   : weka.classifiers.meta.Bagging-P 100 -S 1 -num-slots 1 -I 10 -W weka.classifiers.trees.J48 -- -C 0.25 -M 2
Data File    : /Users/eibe/datasets/UCI/diabetes.arff
Class Index  : last
Training Pool: 100
Iterations   : 50
Seed         : 1
Error        : 0.3326
Sigma^2      : 0    
Bias^2       : 0.1727
Variance     : 0.1567

Throwing the RandomSubSpace method into the mix reduces variance even further:

java weka.classifiers.BVDecompose -t ~/datasets/UCI/diabetes.arff -W .Bagging -- -W .RandomSubSpace -- -W .J48

Bias-Variance Decomposition

Classifier   : weka.classifiers.meta.Bagging-P 100 -S 1 -num-slots 1 -I 10 -W weka.classifiers.meta.RandomSubSpace -- -P 0.5 -S 1 -num-slots 1 -I 10 -W weka.classifiers.trees.J48 -- -C 0.25 -M 2
Data File    : /Users/eibe/datasets/UCI/diabetes.arff
Class Index  : last
Training Pool: 100
Iterations   : 50
Seed         : 1
Error        : 0.3161
Sigma^2      : 0    
Bias^2       : 0.175
Variance     : 0.1383

Cheers,
Eibe


Re: Explanation of misclassification

Alexander Osherenko
My question was a little more theoretical. Are there other characteristics of the data (besides sparseness and overfitting) that are considered in WEKA?

Best, Alexander


Re: Explanation of misclassification

Davide Barbieri
Hello Alexander,

Overfitting may happen when data are sparse, because the model "adheres" too tightly to the given data and does not generalize well.

The theoretical answer to your question may lie in the fact that the process you are observing is non-deterministic in nature, as is usually the case in statistics and data mining, which deal with uncertainty (in medicine, biology, finance, etc.).

Think of classification as a form of regression. In linear regression, if (most) data points are close to the line, predictions may be accurate. If they always lay exactly on the line, that would be a deterministic "law" rather than a statistical model (for example, Newton's laws): no residuals (errors due to randomness) would be present.

If the underlying process that generates the data points you have collected, which we do not know but want to model, is highly random, then predictability will be poor no matter how much data you collect. Furthermore, the variables you are observing may not be as "diagnostic" as expected and may be only poorly correlated with the process you want to infer.

I hope I have been able to address your question correctly,


--
Davide Barbieri

http://docente.unife.it/davide.barbieri/

Universita' di Ferrara - http://www.unife.it/


Re: Explanation of misclassification

Eibe Frank-2
Administrator
In reply to this post by Alexander Osherenko
You can also differentiate based on attribute types present in a dataset and whether missing values are present.

For example, decision tree learners are often a good choice when you have a mix of numeric attributes and nominal attributes, and there are also missing values in your data.
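
For instance, here is a minimal sketch (my own example; the file name is a placeholder) of cross-validating J48 on such a dataset, relying on the fact that J48 handles nominal attributes, numeric attributes and missing values natively:

import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

// Sketch: 10-fold cross-validation of J48 on a dataset with mixed attribute
// types and missing values, without any imputation or encoding filters.
public class MixedTypeTree {
  public static void main(String[] args) throws Exception {
    Instances data = DataSource.read("mixed-with-missing.arff"); // hypothetical
    data.setClassIndex(data.numAttributes() - 1);
    Evaluation eval = new Evaluation(data);
    eval.crossValidateModel(new J48(), data, 10, new Random(1));
    System.out.println(eval.toSummaryString());
  }
}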

Page 351 of Hastie et al.’s book (free PDF download) has a nice table (Table 10.1):

  http://statweb.stanford.edu/~tibs/ElemStatLearn/

(Note that it considers only basic decision trees, not ensemble classifiers.)

Cheers,
Eibe
