A thousand thanks to Professor Eibe and everyone's help! I only have one final process left that needs confirmation.
Dear Weka data mining team,
A thousand thanks to everyone for clarifying my past questions! They really helped a ton! Again, thanks to everyone for your diligent insights!
Sorry for bothering you one more time; I only have one more process I need confirmation of, and I will have no further questions after this one. I normally wouldn't bother the same people with repeated questions; it's just that Weka is a piece of software rather than an algorithm, so I cannot simply ask someone else from a related field.
The question I need confirmed is whether my interpretation of the results of the Weka Experimenter is completely correct. I have already read the Weka online appendix thoroughly and set everything up accordingly, so I should be correct; however, I cannot afford mistakes in my current situation. Again, the text below does not contain any real investigative questions; they are really just basic points needing a yes or no confirmation.
I want to run 5 repeated classifications on 3 different datasets (iris.arff, weather.numeric.arff, labor.arff) and need to collect the average classification performance (accuracy and kappa) over the 5 repetitions for each dataset.
I ran the Weka Experimenter (Simple mode) according to the manual, using 10-fold cross-validation (Classification), 5 repetitions, and the "Data sets first" option.
The algorithm used is J48 -C 0.25 -M 2.
All 3 datasets have nominal classes.
iris.arff has 4 numeric attributes.
weather.numeric.arff and labor.arff contain a mix of numeric and nominal attributes.
I did no data preprocessing, such as scaling or converting all attributes to nominal.
Decision tree methods such as J48 and Weka's random forest can handle all of these input datasets without preprocessing, regardless of whether the input datasets are purely numeric, purely nominal, or a mix of both. And both algorithms
have an intrinsic attribute selection capability, hence I do not have to use an external feature selector such as CfsSubsetEval.
Am I correct on the above points?
After the Experimenter run is complete, I click on the "Analyze" tab, click "Experiment", and further tick "Show std. deviations".
In the "Comparison field" I select "Percent_correct" and leave everything else at its defaults.
So iris would have, for example, a kappa of 0.92±0.08… and so on. Am I correct in my understanding?
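To show how I am reading the ± values, here is a small Python sketch of the arithmetic I assume the Analyse panel performs (the per-repetition accuracies below are invented purely for illustration; this is not Weka's actual code):

```python
import statistics

# Hypothetical per-repetition accuracies for one dataset
# (5 repetitions of 10-fold CV); numbers are made up for illustration.
accuracies = [94.67, 94.00, 95.33, 94.67, 94.00]

mean_acc = statistics.mean(accuracies)   # the value before the "+/-"
std_acc = statistics.stdev(accuracies)   # the value after the "+/-" (sample std dev)

print(f"{mean_acc:.2f} +/- {std_acc:.2f}")
```

Is this mean-and-standard-deviation-over-repetitions reading the correct one?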
As I understand it, the results of the Experimenter should correspond to the results from the Explorer: "Percent_correct" in the EXPERIMENTER should correspond to "Correctly Classified Instances 94.6667 %" in the EXPLORER (see below), and
"Kappa_statistic" in the EXPERIMENTER should correspond to "Kappa statistic 0.92" in the EXPLORER (see below).
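To confirm I understand what the kappa number measures, here is how I would compute Cohen's kappa from a confusion matrix in Python (the 3-class matrix below is invented; I assume this is the standard formula Weka uses, but please correct me if not):

```python
# Cohen's kappa from a confusion matrix (rows = actual, cols = predicted).
def kappa(cm):
    n = sum(sum(row) for row in cm)
    # Observed agreement: fraction of instances on the diagonal
    observed = sum(cm[i][i] for i in range(len(cm))) / n
    # Expected agreement by chance, from the row/column marginals
    expected = sum(
        (sum(cm[i]) * sum(row[i] for row in cm)) / (n * n)
        for i in range(len(cm))
    )
    return (observed - expected) / (1 - expected)

# A made-up 3-class confusion matrix for illustration only
cm = [[50, 0, 0],
      [0, 47, 3],
      [0, 5, 45]]
print(round(kappa(cm), 2))  # prints 0.92
```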
I would also like to point out that repeated runs in the Experimenter show the accuracy and kappa changing slightly, which is expected. However, using the Explorer, I get the same 94.6667% correct every time, even
if I use a different random seed each run. Why is this so?
Just 2 quick minor questions I would like clarified:
(1) I can convert CSV files into ARFF files by using the Weka ArffViewer to open the CSV file and then saving it as an ARFF file. Is this correct? The inverse can also be done, using the ArffViewer to convert an ARFF file to a CSV file, correct?
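Just to show what I understand the conversion to produce, here is a toy Python sketch of CSV-to-ARFF (illustration only; the real ArffViewer infers attribute types properly, whereas I simply assume all columns but the last are numeric and the last is a nominal class):

```python
import csv
import io

# Toy CSV -> ARFF conversion. Assumes the last column is the nominal
# class and all other columns are numeric; the real ArffViewer does
# proper type inference.
def csv_to_arff(csv_text, relation="converted"):
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    classes = sorted({row[-1] for row in data})
    lines = [f"@relation {relation}"]
    lines += [f"@attribute {name} numeric" for name in header[:-1]]
    lines.append(f"@attribute {header[-1]} {{{','.join(classes)}}}")
    lines.append("@data")
    lines += [",".join(row) for row in data]
    return "\n".join(lines)

print(csv_to_arff("sepallength,class\n5.1,Iris-setosa\n6.2,Iris-virginica\n"))
```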
(2) Professor Eibe mentioned, in response to my last question on meta.AttributeSelectedClassifier, that if the test set does not undergo feature selection, the attribute set (of the training set) will be 'tuned' to the test set.
Does this mean that if the AttributeSelectedClassifier is not used, WEKA will add artificial features to the training set to make it the same size (same number of attributes) as the test set? That would invalidate the testing
process, wouldn't it?
Also, can I manually create training and test sets with the same attribute set instead of using the AttributeSelectedClassifier? E.g., I use CfsSubsetEval on a training set and it gives me the set of attributes that were selected.
I write down this set of attributes, open both the training and test set data files, and strip them down to the set of attributes I wrote down.
Then, during classification, I no longer use the AttributeSelectedClassifier. Can I do that?
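To make the manual procedure I have in mind concrete, here is a Python sketch on invented data (in practice I would of course do the stripping in the ArffViewer or with a filter; the attribute names and values below are hypothetical):

```python
import csv
import io

# Keep only the attributes CfsSubsetEval selected (plus the class) in
# BOTH the training and the test file, so the two files share an
# identical attribute set.
def keep_columns(csv_text, keep):
    rows = list(csv.reader(io.StringIO(csv_text)))
    header = rows[0]
    idx = [header.index(name) for name in keep]
    return [[row[i] for i in idx] for row in rows]

selected = ["petalwidth", "class"]   # pretend CfsSubsetEval chose these
train = "sepallength,petalwidth,class\n5.1,0.2,Iris-setosa\n"
test = "sepallength,petalwidth,class\n6.2,1.8,Iris-virginica\n"

train_stripped = keep_columns(train, selected)
test_stripped = keep_columns(test, selected)
assert train_stripped[0] == test_stripped[0]  # identical attribute sets
```

Is this manual equivalent acceptable, or does it lose something that the AttributeSelectedClassifier does internally?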
Finally, a big THANK YOU for everyone's help! I could not have come this far without your support!