Doubt about StringToWordVector result

Edward Wiskers
Hi, 

I am trying to use the StringToWordVector from the Preprocess panel on 2 sentences:

  * nice duck.
  * If it looks like a duck, swims like a duck, and quacks like a duck, then it probably is a duck.

I activated the TFTransform option in StringToWordVector and got the result in the attached file. My question is: why did I get 0.69 for the word duck? How is this computed from the formula log(1 + f_{ij}), where f_{ij} is the frequency of word i in document (instance) j?


Thank you.
Edward




_______________________________________________
Wekalist mailing list -- [hidden email]
Send posts to [hidden email]
To unsubscribe send an email to [hidden email]
To subscribe, unsubscribe, etc., visit https://list.waikato.ac.nz/postorius/lists/wekalist.list.waikato.ac.nz
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html

Attachment: result.png (6K)

Re: Doubt about StringToWordVector result

Manuel Leuenberger
Hi Edward,

If I am not mistaken, the default for the StringToWordVector is to output only word presence, i.e. 0 or 1, then apply the transformation. If you want word counts instead of presence, you need to set it explicitly using filter.setOutputWordCounts(true).

Cheers,
Manuel


Re: Doubt about StringToWordVector result

Edward Wiskers


Hi Manuel,


Appreciate the reply. Your answer is well-known, but has nothing to do with my results and doesn't help. For this reason, I am kindly asking *Weka developers* to answer my question:


I am trying to use the StringToWordVector from the Preprocess panel on 2 sentences:

  * nice duck.
  * If it looks like a duck, swims like a duck, and quacks like a duck, then it probably is a duck.

I activated the TFTransform option in StringToWordVector and got the result in the attached file. My question is: why did I get 0.69 for the word duck (in the 2nd sentence)? How is this computed from the formula log(1 + f_{ij}), where f_{ij} is the frequency of word i in document (instance) j?


Thank you.
Edward


Re: Doubt about StringToWordVector result

Eibe Frank-2
Administrator
Manuel’s answer is very much on the money. Perhaps you should try to think about things a little bit before dismissing really helpful and well-written information and posting your question again.

$ bc -l
l(1+1)
.69314718055994530941

f_{ij} is either 0 or 1 unless you turn on word counting.
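To spell out the arithmetic, here is a worked example. It assumes natural logarithms, as in the bc session above. The second line further assumes that word counting is turned on, that all four occurrences of duck in the 2nd sentence are counted, and that no other normalization is applied:

```latex
\begin{align*}
\text{presence (default):} \quad & f_{ij} = 1 \;\Rightarrow\; \log(1 + f_{ij}) = \log 2 \approx 0.693 \\
\text{word counts enabled:} \quad & f_{ij} = 4 \;\Rightarrow\; \log(1 + f_{ij}) = \log 5 \approx 1.609
\end{align*}
```

So the 0.69 in the result is what one would expect for any word that is present, however often it occurs.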

Cheers,
Eibe


Re: Doubt about StringToWordVector result

Edward Wiskers


> $ bc -l
> l(1+1)
> .69314718055994530941
>
> f_{ij} is either 0 or 1 unless you turn on word counting.

So this is what is applied to the 2nd sentence, the one I'm concerned about?

Edward


Re: Doubt about StringToWordVector result

Eibe Frank-2
Administrator
Yes, the word occurs four times, but the default configuration of StringToWordVector will just use binary values (i.e., 0 or 1) to indicate whether a word is present or absent in a particular string attribute value.
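The difference between the two modes can be sketched in a few lines of plain Java. This is only an illustration, not Weka's code: the class name TfTransformSketch, its helper methods, and the split-on-non-letters tokenizer are assumptions made for this example, and Weka's own tokenizer and options may behave differently.

```java
import java.util.Locale;

// Illustrative sketch only: this is NOT Weka's tokenizer or filter code.
class TfTransformSketch {

    // Count occurrences of `word` in `document`, splitting on anything
    // that is not a letter (a simplifying assumption for this example).
    static int countOccurrences(String document, String word) {
        int count = 0;
        for (String token : document.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (token.equals(word)) {
                count++;
            }
        }
        return count;
    }

    // TF transform from the thread: log(1 + f_ij), natural logarithm.
    static double tfTransform(double f) {
        return Math.log(1.0 + f);
    }

    public static void main(String[] args) {
        String[] docs = {
            "nice duck.",
            "If it looks like a duck, swims like a duck, and quacks like a duck, then it probably is a duck."
        };
        for (String doc : docs) {
            int count = countOccurrences(doc, "duck");
            int presence = count > 0 ? 1 : 0; // binary presence: 0 or 1
            System.out.printf(Locale.ROOT,
                "count=%d  presence-based=%.4f  count-based=%.4f%n",
                count, tfTransform(presence), tfTransform(count));
        }
    }
}
```

With presence values both sentences give 0.6931 for duck. Only when the raw count (4 in the 2nd sentence) is fed into the transform does the value become 1.6094.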

Cheers,
Eibe
