In both the Explorer and at the command line, WEKA’s output shows, for each class, that class’s F-measure (obtained by treating that class as the positive class and the union of all other classes as the negative class).
WEKA will also show you the weighted average of these per-class F-measures. The weight for a class C is set to N(C) / N, where N(C) is the number of test instances in class C and N is the total number of test instances.
The Explorer and the command-line interface do not output an unweighted average of the per-class F-measures.
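To make the weighted average concrete, here is a small Python sketch with made-up labels for a hypothetical three-class test set (the data is illustrative, not WEKA output). It computes the one-vs-rest F-measure per class, WEKA's weighted average with weights N(C)/N, and the unweighted (macro) average that WEKA does not report:

```python
from collections import Counter

# Hypothetical test set: true labels and a classifier's predictions.
y_true = ["a", "a", "a", "a", "b", "b", "c", "c", "c", "c"]
y_pred = ["a", "a", "b", "a", "b", "c", "c", "c", "a", "c"]

def f_measure(cls):
    # One-vs-rest: treat `cls` as positive, all other classes as negative.
    tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
    fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
    fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

classes = sorted(set(y_true))
counts = Counter(y_true)          # N(C) for each class C
n = len(y_true)                   # N, total number of test instances

per_class = {c: f_measure(c) for c in classes}

# WEKA's weighted average: weight for class C is N(C) / N.
weighted = sum(counts[c] / n * per_class[c] for c in classes)

# Unweighted (macro) average -- not part of WEKA's standard output.
macro = sum(per_class.values()) / len(classes)

print(per_class)  # {'a': 0.75, 'b': 0.5, 'c': 0.75}
print(weighted)   # 0.7
print(macro)      # 0.666...
```

With a skewed class distribution, the two averages can diverge substantially, which is why it matters which one you report.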
In my view, AUROC or AUPRC is in most cases preferable to F-measure when comparing models. The F-measure corresponds to a single point on the precision-recall curve, and the classifier is typically not optimised for that particular operating point. AUPRC, which WEKA also outputs, considers all points along the curve and is therefore more robust to the choice of decision threshold.
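The threshold dependence is easy to see numerically. The sketch below (with invented scores and labels, purely for illustration) computes the two-class F-measure at several decision thresholds; each threshold picks out a different point on the precision-recall curve, and the F-measure changes accordingly:

```python
# Hypothetical prediction scores for the positive class on 8 test instances,
# sorted in descending order, with their true labels (1 = positive).
scores = [0.95, 0.85, 0.80, 0.60, 0.55, 0.40, 0.30, 0.10]
labels = [1, 1, 0, 1, 0, 1, 0, 0]

def f_at_threshold(t):
    # Predict positive whenever the score reaches the threshold t.
    preds = [1 if s >= t else 0 for s in scores]
    tp = sum(p == 1 and y == 1 for p, y in zip(preds, labels))
    fp = sum(p == 1 and y == 0 for p, y in zip(preds, labels))
    fn = sum(p == 0 and y == 1 for p, y in zip(preds, labels))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# The same model yields very different F-measures depending on the threshold;
# the conventional 0.5 cut-off is just one operating point among many.
for t in (0.9, 0.7, 0.5, 0.2):
    print(t, round(f_at_threshold(t), 3))
# 0.9 0.4
# 0.7 0.571
# 0.5 0.667
# 0.2 0.727
```

An area-based measure such as AUPRC integrates over all of these operating points instead of committing to one of them.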
If you really want to use F-measure, make sure you wrap every algorithm you are comparing in WEKA’s ThresholdSelector meta-classifier and set it to optimise F-measure, so that each classifier is evaluated at its best operating point. Note, however, that ThresholdSelector is only applicable to datasets with two classes.