GridWeka crossvalidation task problems

Nisarg Vyas


Hi all,

  This is a question about GridWeka, the parallel-architecture version of Weka.
  I want to run cross-validation tasks in parallel on different 'server' machines (in GridWeka's terminology).
The original GridWeka supports running cross-validation tasks in a distributed environment, but for me it sometimes fails.

The control flow, as I understand it, is as follows. Several servers wait for a connection. A client initiates the process and communicates with the servers, and each server carries out the cross-validation task for its corresponding fold (e.g. server 1 handles fold 1: splitting the data, building the classifier, and classifying; server 2 handles fold 2; and so on). If the number of servers is less than the number of folds, the client assigns the folds to servers by round-robin scheduling (it also checks the computational load and resources on the servers). Finally, all servers respond to the client, and the client displays the final results.
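
The round-robin assignment described above can be sketched in a few lines. This is only an illustration of the scheduling idea, not GridWeka's actual code: the function name `assign_folds` and the server names are hypothetical, and the load and resource check mentioned above is left out.

```python
def assign_folds(num_folds, servers):
    """Map each fold number (1-based) to a server, cycling through the list.

    Hypothetical helper for illustration; GridWeka's real scheduler also
    considers server load and resources, which this sketch ignores.
    """
    if not servers:
        raise ValueError("at least one server is required")
    assignment = {}
    for fold in range(num_folds):
        # fold 0 -> servers[0], fold 1 -> servers[1], ..., then wrap around
        assignment[fold + 1] = servers[fold % len(servers)]
    return assignment


# 10 folds spread over 3 (made-up) servers
plan = assign_folds(10, ["server1", "server2", "server3"])
for fold, server in plan.items():
    print(f"fold {fold} -> {server}")
```

With fewer servers than folds, each server receives several folds in turn, so a single stalled server would block every fold assigned to it.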

When I perform this task, it works for several folds, and then the communication between client and servers suddenly stops and my operation hangs. If I do not use the parallel options, the same CV task is carried out perfectly. Any suggestions or insight regarding this matter would be highly appreciated.
I do not have a crystal-clear idea of how GridWeka (and hence Java) handles client/server communication and tasks. Any insight about that would be helpful, too.

Thank you,
Nisarg




_______________________________________________
Wekalist mailing list
[hidden email]
https://list.scms.waikato.ac.nz/mailman/listinfo/wekalist

Interpreting the 'genetic search' results using the 'Wrapper Subset Evaluator'

rich@thevillas.eclipse.co.uk
Hi,


I have run a 'genetic search' using the 'Wrapper Subset Evaluator';
the output is below.


I understand that the merit is a score whereby 'good feature subsets
contain features highly correlated with the class, yet uncorrelated with
each other', but what is the 'scaled' score?
I expected the merit scores to be in ascending or descending order of merit, but
they don't appear to be.
Did the first member of the initial population evolve into the first member in
the list after 20 generations?

Thanks for any help.

Rich

=== Attribute Selection on all input data ===

Search Method:
    Genetic search.
    Start set: no attributes
    Population size: 20
    Number of generations: 20
    Probability of crossover:  0.6
    Probability of mutation:  0.033
    Report frequency: 20
    Random number seed: 1

Initial population
merit       scaled      subset
 0.27898     0.23804    9 16 17 27 29 30
 0.28407     0.22905    1 19 20 27
 0.21856     0.34474    7 8 13 16 20 21 22 24 27 28 29 30 32 33 34 36 39
 0.31133     0.18092    2 16 17 22 23 24 25 26 28 32 33 35 36
 0.39737     0.02899    9 16 21 24 38
 0.38144     0.05712    8 12 17 18 25 26 29 34 37
 0.37274     0.07249    4 15 25 34
 0.26453     0.26356    3 4 6 7 9 10 12 14 15 16 17 18 19 20 21 22 26 28 29 30 31 32 33 34 39
 0.22397     0.33517    7 9 12 14 18 19 21 22 23 24 27 29 34 36 39
 0.25977     0.27196    1 2 5 7 8 9 10 11 12 13 14 18 19 22 23 24 25 27 28 31 32 35 36
 0.24056     0.30589    1 2 3 4 5 6 14 16 18 19 20 21 22 26 27 31 33 34 37 39
 0.16338     0.44216    1 2 7 10 11 12 15 16 19 20 21 23 29 30 32 33 35 37 38 39
 0.15764     0.45231    2 3 4 5 6 9 12 13 14 18 20 21 22 23 24 27 28 32 36 38 39
 0.18506     0.40389    1 4 6 7 8 9 10 11 13 14 15 19 20 22 23 24 25 27 29 31 32 33 38 39
 0.41379     0          4 9 11 16 26 28 36
 0.23645     0.31314    1 2 4 5 7 8 10 11 13 15 18 19 20 21 28 29 32 34 35 36 37 38
 0.20854     0.36243    5 7 8 11 13 18 22 27 28 34 36 39
 0.29475     0.21021    2 4 7 15 16 17 20 25 27 32 34 38
 0.15714     0.45318    1 2 3 5 7 9 10 15 18 20 22 24 27 28 29 30 38 39
 0.2335      0.31835    1 2 7 8 9 10 11 12 14 15 16 20 21 23 25 27 28 29 31 35 37 38

Generation: 20
merit       scaled      subset
 0.13498     0.18752    10 14 18 27 28 29 30 33 35 36 37 38 39
 0.13498     0.18752    10 14 18 27 28 29 30 33 35 36 37 38 39
 0.13498     0.18752    10 14 18 27 28 29 30 33 35 36 37 38 39
 0.14745     0.15513    10 11 14 18 27 28 29 30 33 36 37 38 39
 0.14466     0.16237    3 7 8 10 18 21 26 27 28 29 30 33 35 36 37 38 39
 0.15665     0.13126    4 10 14 18 23 27 28 29 32 33 34 37 38 39
 0.15402     0.13808    2 3 10 14 16 18 27 29 30 31 33 34 35 36 37 38 39
 0.14433     0.16323    1 2 3 4 6 10 13 15 17 20 22 26 27 28 30 32 33 35 36 37 38 39
 0.14581     0.15939    1 2 3 4 6 10 11 12 15 17 20 22 26 27 28 29 32 34 35 36 37 38 39
 0.13498     0.18752    10 14 18 27 28 29 30 33 35 36 37 38 39
 0.20722     0          4 10 11 14 18 21 23 27 28 29 36 37 39
 0.14433     0.16323    2 3 4 10 14 18 19 26 27 28 29 30 33 35 36 37 38 39
 0.15747     0.12913    1 2 4 7 8 10 18 23 27 28 29 32 34 37 38 39
 0.14844     0.15257    3 4 6 10 11 15 17 20 22 23 26 27 28 29 32 34 35 36 37 38 39
 0.15534     0.13467    10 18 27 28 29 30 32 33 37 38 39
 0.14614     0.15854    6 7 8 10 14 27 28 29 30 33 35 38 39
 0.15944     0.12402    14 18 19 20 27 28 29 30 32 33 37 38 39
 0.15205     0.1432     6 7 8 10 18 20 27 28 29 30 33 36 37 38 39
 0.13826     0.179      10 14 18 21 27 28 29 30 33 35 36 37 38 39
 0.15025     0.14788    1 10 13 14 18 21 23 25 26 27 28 29 36 37 38 39

Attribute Subset Evaluator (supervised, Class (nominal): 40 disease):
    Wrapper Subset Evaluator
    Learning scheme: weka.classifiers.trees.J48
    Scheme options: -C 0.25 -M 2          
    Accuracy estimation: classification error
    Number of folds for accuracy estimation: 5

Selected attributes: 10,14,18,27,28,29,30,33,35,36,37,38,39 : 13
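
A quick arithmetic check on the posted numbers hints at what the 'scaled' column is. The sketch below is an inference from this output only, not a description of Weka's internals. It takes a few (merit, scaled) rows from the 'Initial population' table and tests whether scaled is proportional to the gap between a row's merit and the largest merit in the population. The variable names are made up for illustration.

```python
# (merit, scaled) pairs copied from the "Initial population" table above
pairs = [
    (0.27898, 0.23804),
    (0.28407, 0.22905),
    (0.21856, 0.34474),
    (0.39737, 0.02899),
    (0.41379, 0.0),      # largest merit in the population, scaled = 0
    (0.15714, 0.45318),  # smallest merit in the population, largest scaled
]

# Assumption being tested: scaled = k * (worst - merit) for some constant k
worst = max(merit for merit, _ in pairs)
ratios = [scaled / (worst - merit) for merit, scaled in pairs if merit != worst]

for r in ratios:
    print(round(r, 4))

# If the assumption holds, all ratios agree up to rounding in the printed table
print("spread:", max(ratios) - min(ratios))
```

For these rows the ratios agree to about three decimal places, so within one generation 'scaled' looks like a linear rescaling of merit that is 0 for the largest merit and largest for the smallest merit. Since the accuracy estimation here is classification error, a smaller merit would mean a better subset, which fits the selected subset (10,14,18,...) being the row with the lowest merit and the highest scaled value in generation 20.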
