I am looking to use a deep learning approach, specifically a neural network with one hidden layer. I noticed that there are algorithms such as those in the multiLayerPerceptrons package by Eibe Frank. One problem I am finding with WekaDeepLearning4J is that it takes a relatively long time to complete compared with the other algorithms. Is there a significant difference between this algorithm and WekaDeepLearning4J with one hidden layer?
Re: WekaDeepLearning4J Versus Multilayer Perceptron
The MLPClassifier, MLPRegressor, and MLPAutoEncoder in the multiLayerPerceptrons package are all limited to neural networks with a single hidden layer, and they build fully connected networks. If these constraints suit your application, they are good options because they require little configuration to give reasonable results, particularly the former two: tuning the number of units in the hidden layer and the ridge parameter (the multiplier for the L2 penalty) is generally all you need to do.
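To make "fully connected with a single hidden layer" concrete, here is a minimal, self-contained Java sketch of the forward pass of such a network. All names, sizes, weights, and the sigmoid activation here are illustrative assumptions of mine, not Weka's actual implementation:

```java
// Sketch of a fully connected network with one hidden layer.
// Everything here is illustrative, not taken from Weka's code.
public class OneHiddenLayerNet {
    static double sigmoid(double x) { return 1.0 / (1.0 + Math.exp(-x)); }

    // Forward pass: input -> hidden layer (fully connected) -> single output.
    static double forward(double[] input, double[][] hiddenW, double[] hiddenB,
                          double[] outW, double outB) {
        double[] hidden = new double[hiddenW.length];
        for (int j = 0; j < hiddenW.length; j++) {
            double sum = hiddenB[j];
            for (int i = 0; i < input.length; i++) {
                sum += hiddenW[j][i] * input[i]; // every input feeds every hidden unit
            }
            hidden[j] = sigmoid(sum);
        }
        double out = outB;
        for (int j = 0; j < hidden.length; j++) {
            out += outW[j] * hidden[j];          // hidden units feed the output
        }
        return out;
    }

    public static void main(String[] args) {
        double[] x = {1.0, -2.0};
        double[][] hw = {{0.5, -0.5}, {1.0, 1.0}};
        double[] hb = {0.0, 0.0};
        double[] ow = {1.0, -1.0};
        System.out.println(forward(x, hw, hb, ow, 0.0));
    }
}
```

The only architectural choices left to the user in this setup are the number of hidden units (the length of `hidden`) and the strength of the ridge penalty on the weights.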
The idea behind these implementations is that they run until full convergence to a local minimum of the penalised loss on the training data. The assumption is that, with a suitable value of the ridge parameter, early stopping is not needed to prevent overfitting in simple networks of this type.
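Concretely, the penalised loss being minimised has the standard ridge form (the notation below is mine, not the package's): with training pairs $(x_i, y_i)$, network output $f(x_i; \mathbf{w})$, per-example loss $L$, and ridge parameter $\lambda$,

```latex
E(\mathbf{w}) \;=\; \sum_{i=1}^{n} L\big(y_i, f(x_i; \mathbf{w})\big) \;+\; \lambda \sum_{j} w_j^2
```

The second term shrinks the weights towards zero, which is what lets a well-chosen $\lambda$ stand in for early stopping as the mechanism controlling overfitting.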
Rather than plain gradient descent, optimisation of the loss uses either a quasi-Newton approach (BFGS, the default) or conjugate gradient descent, with a line search in both cases, so no choice of learning rate is required (a big plus in my book!).
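To see why a line search removes the need for a hand-tuned learning rate, here is a small self-contained Java sketch: steepest descent with a backtracking (Armijo) line search on a simple quadratic. The function, constants, and class name are illustrative; this is not the package's BFGS or conjugate gradient implementation, only the line-search idea they share:

```java
// Steepest descent with a backtracking line search on f(x, y) = x^2 + 5*y^2.
// Illustrative only: the point is that the step size is chosen automatically
// at every iteration, so there is no learning rate to tune.
public class LineSearchDemo {
    static double f(double[] p) { return p[0] * p[0] + 5.0 * p[1] * p[1]; }
    static double[] grad(double[] p) { return new double[]{2.0 * p[0], 10.0 * p[1]}; }

    static double[] minimize(double[] p, int iters) {
        for (int it = 0; it < iters; it++) {
            double[] g = grad(p);
            double fp = f(p);
            double gg = g[0] * g[0] + g[1] * g[1];
            double step = 1.0;                 // always start from a unit step
            // Backtracking (Armijo) line search: halve the step until the
            // decrease in f is "sufficient" relative to the gradient norm.
            while (f(new double[]{p[0] - step * g[0], p[1] - step * g[1]})
                   > fp - 1e-4 * step * gg) {
                step *= 0.5;
            }
            p = new double[]{p[0] - step * g[0], p[1] - step * g[1]};
        }
        return p;
    }

    public static void main(String[] args) {
        double[] opt = minimize(new double[]{3.0, 2.0}, 100);
        System.out.println(f(opt)); // close to the minimum at (0, 0)
    }
}
```

BFGS and conjugate gradient pick better search directions than the raw negative gradient, so they converge in far fewer iterations, but they rely on the same mechanism: the line search decides how far to move along each direction.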
Note that most people would consider these “shallow” networks, not “deep” ones.