Optimisers

Functions:

adadelta(lr, clip_args, **kwargs)

Adadelta optimization is a stochastic gradient descent method that is based on an adaptive learning rate per dimension.

adagrad(lr, clip_args, **kwargs)

Adagrad is an optimizer with parameter-specific learning rates, which are adapted relative to how frequently a parameter gets updated during training.

adam(lr, clip_args, **kwargs)

Adam optimization is a stochastic gradient descent method that is based on adaptive estimation of first-order and second-order moments. According to the paper Adam: A Method for Stochastic Optimization (Kingma et al., 2014), the method is ‘computationally efficient, has little memory requirement, invariant to diagonal rescaling of gradients, and is well suited for problems that are large in terms of data/parameters’.

adamax(lr, clip_args, **kwargs)

Adamax is a variant of Adam based on the infinity norm.

get_optimiser([optimiser_type, lr])

Utility function for returning a Keras optimiser.

rmsprop(lr, clip_args, **kwargs)

RMSprop is usually a good choice for recurrent neural networks.

sgd(lr, clip_args, **kwargs)

Stochastic Gradient Descent.

optimisers.adadelta(lr, clip_args, **kwargs)

Adadelta optimization is a stochastic gradient descent method that is based on an adaptive learning rate per dimension to address two drawbacks: 1) the continual decay of learning rates throughout training, 2) the need for a manually selected global learning rate.

Two accumulation steps are required: 1) the accumulation of gradients squared, 2) the accumulation of updates squared.

Adadelta is a more robust extension of Adagrad that adapts learning rates based on a moving window of gradient updates, instead of accumulating all past gradients. This way, Adadelta continues learning even after many updates have been made. Unlike Adagrad, the original version of Adadelta does not require an initial learning rate. In this version, an initial learning rate can be set, as in most other Keras optimizers.
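The two accumulation steps described above can be sketched for a single scalar parameter as follows. This is a minimal pure-Python illustration, not the Keras implementation that this module wraps; the function name adadelta_step, the state dictionary, and the values chosen for rho and eps are assumptions made for the example.

```python
import math


def adadelta_step(param, grad, state, lr=1.0, rho=0.95, eps=1e-6):
    """Apply one Adadelta update to a scalar parameter (illustrative only)."""
    # Accumulation 1: decaying average of squared gradients.
    state["avg_sq_grad"] = rho * state["avg_sq_grad"] + (1 - rho) * grad ** 2
    # The step is scaled by RMS(previous updates) / RMS(gradients).
    rms_update = math.sqrt(state["avg_sq_update"] + eps)
    rms_grad = math.sqrt(state["avg_sq_grad"] + eps)
    update = -(rms_update / rms_grad) * grad
    # Accumulation 2: decaying average of squared updates.
    state["avg_sq_update"] = rho * state["avg_sq_update"] + (1 - rho) * update ** 2
    # lr = 1.0 leaves the step unscaled; other values rescale it.
    return param + lr * update


# Minimise f(x) = x**2 (gradient 2 * x), starting from x = 3.
x = 3.0
state = {"avg_sq_grad": 0.0, "avg_sq_update": 0.0}
for _ in range(100):
    x = adadelta_step(x, 2 * x, state)
print(x)  # x has moved from 3.0 towards the minimum at 0
```

Because both accumulators are decaying averages rather than running sums, the effective step size does not shrink towards zero as training continues.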

optimisers.adagrad(lr, clip_args, **kwargs)

Adagrad is an optimizer with parameter-specific learning rates, which are adapted relative to how frequently a parameter gets updated during training. The more updates a parameter receives, the smaller the updates.
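The shrinking-update behaviour can be seen in a small scalar sketch. It is a pure-Python illustration rather than the wrapped Keras optimiser; the name adagrad_step, the zero starting accumulator, and the lr and eps values are assumptions for the example.

```python
import math


def adagrad_step(param, grad, accum, lr=0.1, eps=1e-7):
    """Apply one Adagrad update to a scalar parameter (illustrative only)."""
    # The accumulator is a running sum of squared gradients, so it never shrinks.
    accum += grad ** 2
    # Dividing by its square root makes each later step smaller.
    param -= lr * grad / (math.sqrt(accum) + eps)
    return param, accum


# Feed the same gradient repeatedly and record the size of each step.
param, accum = 1.0, 0.0
steps = []
for _ in range(3):
    new_param, accum = adagrad_step(param, 2.0, accum)
    steps.append(param - new_param)
    param = new_param
print(steps)  # each step is smaller than the one before it
```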

optimisers.adam(lr, clip_args, **kwargs)

Adam optimization is a stochastic gradient descent method that is based on adaptive estimation of first-order and second-order moments. According to the paper Adam: A Method for Stochastic Optimization (Kingma et al., 2014), the method is ‘computationally efficient, has little memory requirement, invariant to diagonal rescaling of gradients, and is well suited for problems that are large in terms of data/parameters’.
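The moment estimates and the rescaling invariance can be sketched for a single scalar parameter. This is a pure-Python illustration, not the wrapped Keras optimiser; the name adam_step and the default values used here are assumptions for the example.

```python
import math


def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-7):
    """Apply one Adam update to a scalar parameter at step t >= 1 (illustrative only)."""
    # First-order moment: decaying average of gradients.
    m = beta1 * m + (1 - beta1) * grad
    # Second-order moment: decaying average of squared gradients.
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Bias correction compensates for the zero initialisation of m and v.
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    param -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


# Scaling the gradient by 100 leaves the first step almost unchanged,
# because the gradient is divided by an estimate of its own magnitude.
small, _, _ = adam_step(1.0, 5.0, 0.0, 0.0, t=1)
large, _, _ = adam_step(1.0, 500.0, 0.0, 0.0, t=1)
print(1.0 - small, 1.0 - large)  # both steps are close to lr
```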

optimisers.adamax(lr, clip_args, **kwargs)

Adamax is a variant of Adam based on the infinity norm. Default parameters follow those provided in the paper. Adamax is sometimes superior to Adam, especially in models with embeddings.
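The infinity-norm variant replaces Adam's squared-gradient average with a decayed running maximum of gradient magnitudes. The scalar sketch below illustrates this under stated assumptions: the name adamax_step is hypothetical, lr, beta1 and beta2 use the values suggested in the Adam paper, eps is added only to avoid division by zero, and the defaults of the wrapped Keras optimiser may differ.

```python
def adamax_step(param, grad, m, u, t, lr=0.002, beta1=0.9, beta2=0.999, eps=1e-7):
    """Apply one Adamax update to a scalar parameter at step t >= 1 (illustrative only)."""
    # First-order moment, as in Adam.
    m = beta1 * m + (1 - beta1) * grad
    # Infinity-norm term: decayed running maximum of |grad|.
    u = max(beta2 * u, abs(grad))
    # Only the first moment needs bias correction.
    param -= (lr / (1 - beta1 ** t)) * m / (u + eps)
    return param, m, u


param, m, u = adamax_step(1.0, 5.0, 0.0, 0.0, t=1)
print(1.0 - param)  # the first step is close to lr
```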

optimisers.get_optimiser(optimiser_type='sgd', lr=0.01, **kwargs)

Utility function for returning a Keras optimiser. See https://www.tensorflow.org/api_docs/python/tf/keras/optimizers

If no arguments are provided, it defaults to SGD with a learning rate of 0.01.

Parameters
  • optimiser_type (str) – The name of the optimisation algorithm; must be a key in the optimisers dict

  • lr (float) – The learning rate to use

  • kwargs (dict) – Any keyword arguments accepted by the chosen Keras optimiser; default values are used for any that are omitted

Returns

An instance of a Keras optimiser

Return type

optimiser (tf.keras.optimizers.Optimizer)
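The name-based lookup described above might work roughly as in the sketch below. It is only a guess at the dispatch pattern, written without TensorFlow so that it runs anywhere: OPTIMISERS, _make_stub, and get_optimiser_sketch are hypothetical names, the stub factories return plain dicts instead of Keras optimisers, and raising ValueError for an unknown name is an assumption rather than documented behaviour.

```python
def _make_stub(name):
    """Build a stand-in factory that just records its configuration (hypothetical)."""
    def factory(lr, **kwargs):
        return {"name": name, "lr": lr, **kwargs}
    return factory


# Hypothetical registry; the real module presumably maps these names
# to the wrapper functions documented on this page.
OPTIMISERS = {
    name: _make_stub(name)
    for name in ("adadelta", "adagrad", "adam", "adamax", "rmsprop", "sgd")
}


def get_optimiser_sketch(optimiser_type="sgd", lr=0.01, **kwargs):
    """Look up a factory by name and call it with the learning rate and extras."""
    if optimiser_type not in OPTIMISERS:
        # Assumed behaviour: reject names that are not in the dict.
        raise ValueError(f"Unknown optimiser type: {optimiser_type!r}")
    return OPTIMISERS[optimiser_type](lr, **kwargs)


print(get_optimiser_sketch())
print(get_optimiser_sketch("adam", lr=0.001, beta_1=0.8))
```

Keeping the mapping in a dict means a new optimiser can be supported by adding one entry, without touching the lookup function.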

optimisers.rmsprop(lr, clip_args, **kwargs)

RMSprop is usually a good choice for recurrent neural networks.
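For orientation, the basic RMSprop update divides each gradient by a decaying root-mean-square of recent gradients. The scalar sketch below is a pure-Python illustration, not the wrapped Keras optimiser; the name rmsprop_step and the values of lr, rho and eps are assumptions for the example.

```python
import math


def rmsprop_step(param, grad, avg_sq, lr=0.001, rho=0.9, eps=1e-7):
    """Apply one RMSprop update to a scalar parameter (illustrative only)."""
    # Decaying average of squared gradients.
    avg_sq = rho * avg_sq + (1 - rho) * grad ** 2
    # Normalise the gradient by the root of that average.
    param -= lr * grad / (math.sqrt(avg_sq) + eps)
    return param, avg_sq


param, avg_sq = rmsprop_step(1.0, 4.0, 0.0)
print(param, avg_sq)
```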

optimisers.sgd(lr, clip_args, **kwargs)

Stochastic Gradient Descent.