string Attributes and addStringValue()

Stephen E. Dewey

Hi there,

 

I inherited a program which uses the Weka API, and I’m a bit confused about the required usage for an Attribute of the “string” type.

 

As per the manual, the second argument to the Attribute constructor in such a case will be null:

Attribute string = new Attribute("name_of_attr", (ArrayList<String>) null);

 

After this, I’ve seen code which loops over all of the training data, adding all of the strings one-by-one:

for (int i = 0; i < data.size(); i++) {
    _textAttribute.addStringValue(data.get(i));
}

 

But I’ve also seen code that skips that step. It never calls addStringValue; instead it does something like:

Instance instance = new SparseInstance(instances.numAttributes());
String messageStr = "abc";
String label = "xyz";
instance.setValue(classAttr, label);
instance.setValue(textAttr, messageStr);
instances.add(instance);

 

Note: I think the program might currently be broken. In any case, I’m a bit confused: is it required to add all of the string values to a string Attribute, or can you just leave it empty? Are there other ways to “fill” it besides addStringValue?

 

Thanks a bunch!

Stephen


_______________________________________________
Wekalist mailing list
Send posts to: [hidden email]
List info and subscription status: https://list.waikato.ac.nz/mailman/listinfo/wekalist
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html

Re: string Attributes and addStringValue()

Eibe Frank-2
Administrator
It should also work if you start with an empty string attribute; see the Javadoc of setValue():

--------------

/**
 * Sets a value of a nominal or string attribute to the given value. Performs
 * a deep copy of the vector of attribute values before the value is set.
 *
 * @param attIndex the attribute's index
 * @param value the new attribute value (If the attribute is a string
 *          attribute and the value can't be found, the value is added to the
 *          attribute).
 * @throws UnassignedDatasetException if the dataset is not set
 * @throws IllegalArgumentException if the selected attribute is not nominal
 *           or a string, or the supplied value couldn't be found for a
 *           nominal attribute
 */
public void setValue(int attIndex, String value);

--------------

Note that each call to setValue() makes a deep copy of the underlying double[] array that holds the data for the Instance, so you don’t want to use it to create instances with many attributes: the time required to construct an instance this way is quadratic in the size of the instance. It is more efficient to create a double[] array in your program and then construct an Instance object from that array using an appropriate constructor. The value of a string attribute in that double[] array needs to be set to the integer returned by addStringValue(), which is the index of the value in the list of strings stored in the Attribute (the integer needs to be converted to a double).
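To make the index encoding in the paragraph above concrete, here is a minimal sketch in plain Java. It is a toy stand-in, not Weka's Attribute class: the class name StringAttributeSketch and its value() helper are made up for illustration, and only the idea (the string is stored in a list on the attribute, and the row holds its index as a double) is taken from the description above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy stand-in for a string attribute. This is NOT Weka's Attribute class. */
public class StringAttributeSketch {
    // The strings live on the attribute; rows only hold indices into this list.
    private final List<String> values = new ArrayList<>();
    private final Map<String, Integer> indexOf = new HashMap<>();

    /** Returns the index of the string, storing it first if it is new. */
    public int addStringValue(String s) {
        Integer existing = indexOf.get(s);
        if (existing != null) {
            return existing;
        }
        int index = values.size();
        values.add(s);
        indexOf.put(s, index);
        return index;
    }

    /** Hypothetical helper: turns the number kept in a row back into its string. */
    public String value(double encoded) {
        return values.get((int) encoded);
    }

    public static void main(String[] args) {
        StringAttributeSketch text = new StringAttributeSketch();
        double[] row = new double[2];
        row[0] = text.addStringValue("abc"); // int index widened to double
        row[1] = 1.0;                        // placeholder for some other attribute
        System.out.println(row[0] + " -> " + text.value(row[0])); // prints "0.0 -> abc"
    }
}
```

With the real API, the filled double[] would then be handed to an Instance constructor, as the paragraph above describes.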

Anyway, if you just have a few attributes, I wouldn’t worry about the computational inefficiency and just use setValue().
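For a rough sense of what "a few attributes" means here, the following toy cost model counts how many array elements get copied when every write first copies the whole array, which is the behaviour described above for setValue(). It is plain Java and does not call Weka; the class and method names are made up for illustration.

```java
import java.util.Arrays;

/** Toy cost model, not Weka code: counts elements copied while filling a row. */
public class CopyCostSketch {

    /** Fills n slots one at a time, copying the whole array before each write. */
    public static long elementsCopied(int n) {
        double[] row = new double[n];
        long copied = 0;
        for (int i = 0; i < n; i++) {
            row = Arrays.copyOf(row, n); // fresh copy before every write
            copied += n;
            row[i] = i;
        }
        return copied; // n copies of n elements each
    }

    public static void main(String[] args) {
        System.out.println(elementsCopied(5));    // prints 25
        System.out.println(elementsCopied(5000)); // prints 25000000
    }
}
```

Under this model, a handful of attributes costs a few dozen element copies per instance, while thousands of attributes cost millions, which is where building the double[] array yourself starts to pay off.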

Cheers,
Eibe


> On 14/02/2018, at 5:02 PM, Stephen E. Dewey <[hidden email]> wrote:
> [...]
