LSA implementation

andria lan

Hi everyone,

Is there any reliable article (one recommended by the WEKA team) about using LSA with the Ranker search method?

Thanks in advance.

Andria


_______________________________________________
Wekalist mailing list
Send posts to: [hidden email]
List info and subscription status: https://list.waikato.ac.nz/mailman/listinfo/wekalist
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html

Re: LSA implementation

Eibe Frank-2
Administrator
I'd use MultiSearch to find an appropriate number of latent variables. Here is an example:

  java weka.Run .MultiSearch -E ACC -W .AttributeSelectedClassifier -search ".MathParameter -property search.numToSelect -min 1 -max 4 -step 1.0 -base 10.0 -expression I" -t ~/datasets/UCI/iris.arff -- -W .SMO -E ".LatentSemanticAnalysis -N -R 0.99999" -S .Ranker

In this example, which uses the iris data (just for illustrative purposes, not because LSA is particularly useful there), SMO is the base learner and normalization is turned on for LSA. The -R parameter is set to a value close to 1 so that *all* latent variables are actually considered. The remaining parameters make MultiSearch consider 1 to 4 extracted features. The metric being optimized is classification accuracy (ACC).

You can paste this configuration into the Classify panel of the Explorer if you omit "java weka.Run" at the start and also omit the specification of the training set ("-t ~/datasets/UCI/iris.arff").
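
To make the search range in the command above concrete, here is a minimal Python sketch (not WEKA code) of how the settings "-min 1 -max 4 -step 1.0 -expression I" translate into the candidate values tried for search.numToSelect. The function name candidate_values is hypothetical, and the sketch assumes that the expression "I" simply means "use the loop index itself as the value", in which case the -base setting plays no role.

```python
def candidate_values(minimum, maximum, step, expression=lambda i: i):
    """Enumerate the values a min/max/step search range would generate.

    `expression` maps the loop index to the actual parameter value;
    the default is the identity, mirroring "-expression I" (assumption).
    """
    if step <= 0:
        raise ValueError("step must be positive")
    values = []
    count = 0
    i = minimum
    # Recompute i from the counter instead of adding step repeatedly,
    # so floating-point error does not accumulate.
    while i <= maximum + 1e-9:
        values.append(expression(i))
        count += 1
        i = minimum + count * step
    return values


# Range from the command above: 1 to 4 in steps of 1.
print(candidate_values(1, 4, 1.0))

# A non-identity expression, e.g. powers of ten over the indices 0..2.
print(candidate_values(0, 2, 1, lambda i: 10 ** i))
```

With the identity expression the search simply tries 1, 2, 3 and 4 selected attributes, which matches the "1 to 4 extracted features" described above.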

Cheers,
Eibe




> On 3 Jun 2017, at 17:24, Andria Lan <[hidden email]> wrote:
>
> Hi everyone,
>
> Is there any reliable article (one recommended by the WEKA team) about using LSA with the Ranker search method?
>
> Thanks in advance.
>
> Andria

Re: LSA implementation

andria lan
Dear Eibe, 

First of all, thank you for the very useful contribution. I'd highly appreciate it if you could help me with the following questions:

1- Why did you select the "MultiSearch" method?
2- What are the recommended cases for using the "MultiSearch" method?
3- Why is normalization turned on for LSA, especially given that the iris dataset does not seem to require normalization?
4- When could one turn normalization on?
5- What type of normalization does LSA apply (i.e., attribute or instance)?
6- Why has the -R parameter been set close to 1?
7- Is setting the -R parameter close to 1 always recommended, every time we use LatentSemanticAnalysis?
8- Do you think that LatentSemanticAnalysis is *only* useful when performing natural language processing?

Thanks and regards, 
Andria


On Sun, Jun 4, 2017 at 9:21 AM, Eibe Frank <[hidden email]> wrote:
> I'd use MultiSearch to find an appropriate number of latent variables. [...]


Re: LSA implementation

Eibe Frank-2
Administrator

> On 4 Jun 2017, at 15:16, Andria Lan <[hidden email]> wrote:
>
> 1- Why did you select the "MultiSearch" method?

GridSearch only works for exactly two parameters, and CVParameterSelection cannot optimise nested parameters.

> 2- What are the recommended cases for using the "MultiSearch" method?

Principally, whenever one of the above two methods is not applicable.

> 3- Why is normalization turned on for LSA, especially given that the iris dataset does not seem to require normalization?

No strong reason. If I remember correctly, I observed somewhat better accuracy with normalisation.

> 4- When could one turn normalization on?

When you have bag-of-words vectors such as those generated by StringToWordVector, I wouldn't use attribute normalisation.

> 5- What type of normalization does LSA apply (i.e., attribute or instance)?

Attribute normalisation.

> 6- Why has the -R parameter been set close to 1?

If you set it to exactly 1, it will select exactly 1 latent component. If you set it to a value significantly smaller than 1, it will not give you all components.

> 7- Is setting the -R parameter close to 1 always recommended, every time we use LatentSemanticAnalysis?

Only if you don't want to limit the number of latent variables based on the sum of their singular values.

> 8- Do you think that LatentSemanticAnalysis is *only* useful when performing natural language processing?

It's generally used for NLP but there might be other cases where it is useful, not sure.
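
The answers about -R above can be illustrated with a small Python sketch. It is not WEKA's implementation: the function name components_kept and the example singular values are made up, and the sketch assumes, following the description above, that a value of 1 or more is read as a component count, while a value strictly between 0 and 1 is read as the proportion of the summed singular values to cover. Non-positive values are deliberately not handled here.

```python
def components_kept(singular_values, r):
    """Return how many latent components an -R style setting would keep.

    r >= 1     -> treated as an absolute number of components
    0 < r < 1  -> smallest k whose top-k singular values cover a
                  fraction r of the sum of all singular values
    """
    if r <= 0:
        raise ValueError("this sketch only covers r > 0")
    ordered = sorted(singular_values, reverse=True)
    n = len(ordered)
    if n == 0:
        return 0
    if r >= 1:
        return min(int(r), n)
    total = sum(ordered)
    if total == 0:
        return n
    running = 0.0
    for k, value in enumerate(ordered, start=1):
        running += value
        if running / total >= r:
            return k
    return n


# Hypothetical singular values for a four-attribute dataset.
sv = [4.0, 3.0, 2.0, 1.0]

print(components_kept(sv, 1))        # exactly 1: read as a count
print(components_kept(sv, 0.99999))  # close to 1: read as a proportion
print(components_kept(sv, 0.5))      # clearly below 1: fewer components
```

Under these assumptions, 0.99999 keeps all four components, whereas exactly 1 keeps a single one, which is why the command uses a value just below 1.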

Cheers,
Eibe


Re: LSA implementation

andria lan

Thanks a lot for the extensive information. Finally, just wanted to confirm the following issue:

> GridSearch only works for exactly two parameters, and CVParameterSelection cannot optimise nested parameters.

Did you mean that MultiSearch works similarly to GridSearch for the purpose of parameter optimisation?

Andria


Re: LSA implementation

Eibe Frank-2
Administrator

> On 5/06/2017, at 10:27 PM, Andria Lan <[hidden email]> wrote:
>
> Thanks a lot for the extensive information. Finally, just wanted to confirm the following issue:
>
> > GridSearch only works for exactly two parameters, and CVParameterSelection cannot optimise nested parameters.
>
> Did you mean that MultiSearch works similarly to GridSearch for the purpose of parameter optimisation?

Yes. GridSearch has the option to extend the grid if necessary, which MultiSearch doesn’t have, but GridSearch can only deal with optimisation problems involving exactly two parameters.
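
As a rough illustration of what an exhaustive search over more than two parameters looks like, here is a minimal Python sketch. It is not how MultiSearch or GridSearch is implemented: the function exhaustive_search, the parameter names, and the toy scoring function (a stand-in for something like cross-validated accuracy) are all hypothetical.

```python
from itertools import product


def exhaustive_search(param_grid, score):
    """Try every combination in param_grid; return (best_params, best_score).

    param_grid maps a parameter name to the list of values to try.
    score is a callable taking a dict of parameters; higher is better.
    """
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for combo in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, combo))
        current = score(params)
        if current > best_score:
            best_params, best_score = params, current
    return best_params, best_score


def toy_score(p):
    # Made-up objective that peaks at num_to_select=3, c=1.0, normalize=True.
    penalty = 0.0 if p["normalize"] else 0.5
    return -((p["num_to_select"] - 3) ** 2) - abs(p["c"] - 1.0) - penalty


# Three parameters at once, i.e. more than a two-parameter grid covers.
grid = {
    "num_to_select": [1, 2, 3, 4],
    "c": [0.1, 1.0, 10.0],
    "normalize": [True, False],
}

best, best_score = exhaustive_search(grid, toy_score)
print(best, best_score)
```

The number of combinations is the product of the list lengths (4 x 3 x 2 = 24 here), so the cost grows quickly as parameters are added.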

Cheers,
Eibe



Re: LSA implementation

andria lan

Many thanks, Eibe. I highly appreciate your extensive help.

Good luck to you.

Kind regards,
Andria

On 6 Jun 2017 6:48 a.m., "Eibe Frank" <[hidden email]> wrote:

> Yes. GridSearch has the option to extend the grid if necessary, which MultiSearch doesn’t have, but GridSearch can only deal with optimisation problems involving exactly two parameters.