
class pystruct.learners.SubgradientSSVM(model, max_iter=100, C=1.0, verbose=0, momentum=0.0, learning_rate='auto', n_jobs=1, show_loss_every=0, decay_exponent=1, break_on_no_constraints=True, logger=None, batch_size=None, decay_t0=10, averaging=None, shuffle=False)

Structured SVM solver using subgradient descent.

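Implements margin rescaling with an l1 slack penalty. The learning rate follows the decay schedule set by decay_exponent and decay_t0 (it decays with the default decay_exponent=1 and is constant with decay_exponent=0). It is also possible to use the adaptive learning rate of AdaGrad.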

This class implements online subgradient descent. If n_jobs != 1, small batches of size n_jobs are used to exploit parallel inference. If inference is fast, use n_jobs=1.
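The following pure-Python sketch illustrates the kind of update this class performs. It runs online subgradient descent on a margin-rescaled structured SVM with an l1 slack penalty. It is not pystruct's implementation. The 3-class block joint feature, the zero-one task loss, brute-force (loss-augmented) inference and the fixed learning rate are all hypothetical choices that keep the example self-contained. The early return only imitates break_on_no_constraints.

```python
# Hypothetical toy setup: a 3-class problem whose joint feature psi(x, y)
# copies x into the block of the weight vector that belongs to class y.
N_CLASSES = 3


def joint_feature(x, y):
    """psi(x, y): x placed in block y, zeros elsewhere."""
    psi = [0.0] * (len(x) * N_CLASSES)
    for j, xj in enumerate(x):
        psi[y * len(x) + j] = xj
    return psi


def task_loss(y_true, y):
    """Zero-one task loss Delta(y_true, y)."""
    return 0.0 if y == y_true else 1.0


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def inference(w, x):
    """argmax_y <w, psi(x, y)>, by brute-force enumeration."""
    return max(range(N_CLASSES), key=lambda y: dot(w, joint_feature(x, y)))


def loss_augmented_inference(w, x, y_true):
    """Margin rescaling: argmax_y Delta(y_true, y) + <w, psi(x, y)>."""
    return max(range(N_CLASSES),
               key=lambda y: task_loss(y_true, y) + dot(w, joint_feature(x, y)))


def subgradient_ssvm(X, Y, C=1.0, max_iter=100, learning_rate=0.1):
    """Online subgradient descent on 0.5 * ||w||^2 + C * sum_i slack_i."""
    n_samples = len(X)
    w = [0.0] * (len(X[0]) * N_CLASSES)
    for epoch in range(max_iter):
        n_violations = 0
        for x, y in zip(X, Y):
            y_hat = loss_augmented_inference(w, x, y)
            psi_hat, psi_true = joint_feature(x, y_hat), joint_feature(x, y)
            # Margin-rescaled slack of this sample (never negative).
            slack = task_loss(y, y_hat) + dot(w, psi_hat) - dot(w, psi_true)
            # This sample's share of the regularizer gradient ...
            grad = [wi / n_samples for wi in w]
            if slack > 0:
                # ... plus C times the subgradient of the linear (l1) slack term.
                n_violations += 1
                grad = [g + C * (ph - pt)
                        for g, ph, pt in zip(grad, psi_hat, psi_true)]
            w = [wi - learning_rate * g for wi, g in zip(w, grad)]
        if n_violations == 0:
            # Imitates break_on_no_constraints=True: no sample violated its margin.
            return w, epoch + 1
    return w, max_iter


X = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
Y = [0, 1, 2]
w, n_epochs = subgradient_ssvm(X, Y)
print([inference(w, x) for x in X])  # [0, 1, 2]
```

Loss-augmented inference is the only model-specific step in this loop. In pystruct, the model's loss_augmented_inference plays that role, and that inference step is what n_jobs parallelizes.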

Parameters:
model : StructuredModel

Object containing model structure. Has to implement loss, inference and loss_augmented_inference.

max_iter : int, default=100

Maximum number of passes over dataset to find constraints and perform updates.

C : float, default=1.

Regularization parameter.

verbose : int, default=0

Verbosity level.

learning_rate : float or 'auto', default='auto'

Learning rate used in subgradient descent. If 'auto', the Pegasos schedule is used, which starts with learning_rate = n_samples * C.

momentum : float, default=0.0

Momentum used in subgradient descent.
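This documentation does not specify how momentum enters the update. The sketch below assumes one common form: an exponential moving average of past subgradients. The function name momentum_step and the exact formula are illustrative and are not taken from pystruct's code.

```python
def momentum_step(w, grad, velocity, learning_rate, momentum=0.0):
    """One update w <- w - learning_rate * v, where v averages past subgradients.

    Assumed form: v <- momentum * v + (1 - momentum) * grad, so that
    momentum=0.0 reduces to a plain subgradient step.
    """
    velocity = [momentum * v + (1.0 - momentum) * g for v, g in zip(velocity, grad)]
    w = [wi - learning_rate * v for wi, v in zip(w, velocity)]
    return w, velocity


# Two steps with the same subgradient: the second step is larger because
# the velocity has built up.
w, v = [1.0], [0.0]
w, v = momentum_step(w, [2.0], v, learning_rate=0.5, momentum=0.5)
print(w, v)  # [0.5] [1.0]
w, v = momentum_step(w, [2.0], v, learning_rate=0.5, momentum=0.5)
print(w, v)  # [-0.25] [1.5]
```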

n_jobs : int, default=1

Number of parallel jobs for inference. -1 means as many jobs as there are CPUs.

batch_size : int, default=None

Ignored if n_jobs > 1. If n_jobs=1, inference is done in mini-batches of size batch_size. If batch_size=-1, batch learning is performed, that is, the whole dataset is used to compute each subgradient.
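Here is a minimal sketch of how batch_size could partition the samples when n_jobs=1. The helper make_batches is hypothetical. Treating None as purely online updates (one sample per step) is an assumption, based on the class being described as online subgradient descent. The value -1 is read as a batch_size that selects the whole dataset.

```python
def make_batches(n_samples, batch_size=None):
    """Split sample indices 0..n_samples-1 into the batches used per update."""
    if batch_size is None:
        size = 1              # assumed default: online, one sample per update
    elif batch_size == -1:
        size = n_samples      # full batch: the whole dataset per update
    else:
        size = batch_size     # mini-batches of batch_size samples
    return [list(range(start, min(start + size, n_samples)))
            for start in range(0, n_samples, size)]


print(make_batches(5))      # [[0], [1], [2], [3], [4]]
print(make_batches(5, 2))   # [[0, 1], [2, 3], [4]]
print(make_batches(5, -1))  # [[0, 1, 2, 3, 4]]
```

Larger batches give less noisy subgradients but fewer updates per pass over the data.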

show_loss_every : int, default=0

Controls how often the Hamming loss is computed (for monitoring purposes). Zero means never; otherwise it is computed every show_loss_every-th epoch.

decay_exponent : float, default=1

Exponent for the decaying learning rate. The effective learning rate is learning_rate / (decay_t0 + t) ** decay_exponent. Zero means no decay.

decay_t0 : float, default=10

Offset for the decaying learning rate. The effective learning rate is learning_rate / (decay_t0 + t) ** decay_exponent.
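The decay rule can be evaluated directly. In this sketch (the function name is hypothetical), learning_rate='auto' is replaced by n_samples * C, as stated for the Pegasos schedule. The step counter t is simply passed in; whether pystruct advances t per update or per epoch is not specified here.

```python
def effective_learning_rate(t, learning_rate="auto", n_samples=None, C=1.0,
                            decay_t0=10, decay_exponent=1):
    """Step size learning_rate / (decay_t0 + t) ** decay_exponent at step t."""
    if learning_rate == "auto":
        # Starting value described for learning_rate='auto'.
        learning_rate = n_samples * C
    # decay_exponent=0 makes the denominator 1, i.e. a constant step size.
    return learning_rate / (decay_t0 + t) ** decay_exponent


# With 100 samples and C=1.0, 'auto' gives 100 / (10 + t).
print([effective_learning_rate(t, n_samples=100) for t in (0, 10, 90)])
# [10.0, 5.0, 1.0]
```

decay_t0 keeps the first steps from being too large: without it, the step at t=0 would be undefined and the step at t=1 would equal the full learning_rate.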

break_on_no_constraints : bool, default=True

Stop training when a pass over the data finds no new constraints.

logger : logger object, default=None

averaging : string, default=None

Whether and how to average weights. Possible options are 'linear', 'squared' and None. The string reflects the weighting of the averaging:

  • linear: w_avg ~ w_1 + 2 * w_2 + ... + t * w_t
  • squared: w_avg ~ w_1 + 4 * w_2 + ... + t**2 * w_t

Uniform averaging is not implemented as it is worse than linear weighted averaging or no averaging.
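The two weightings can be written out explicitly. This sketch normalizes by the sum of the weights, since the '~' above only states proportionality. It also shows an online form that updates the average one iterate at a time, so the past w_t do not need to be stored. Both helpers are hypothetical illustrations, not pystruct's code.

```python
POWERS = {"linear": 1, "squared": 2}


def weighted_average(iterates, averaging="linear"):
    """Explicit average of w_1..w_t with weights k (linear) or k**2 (squared)."""
    power = POWERS[averaging]
    coeffs = [k ** power for k in range(1, len(iterates) + 1)]
    total = sum(coeffs)
    return [sum(c * w[j] for c, w in zip(coeffs, iterates)) / total
            for j in range(len(iterates[0]))]


def running_average(iterates, averaging="linear"):
    """Same average, updated online after each new iterate."""
    power = POWERS[averaging]
    w_avg, total = None, 0
    for k, w in enumerate(iterates, start=1):
        c = k ** power
        total += c
        if w_avg is None:
            w_avg = list(w)
        else:
            # Move the average towards w by the new iterate's share of the total weight.
            w_avg = [a + c / total * (wj - a) for a, wj in zip(w_avg, w)]
    return w_avg


iterates = [[1.0, 0.0], [2.0, 0.0], [3.0, 0.0]]
print(weighted_average(iterates, "linear"))   # about [2.333, 0.0]
print(running_average(iterates, "squared"))   # about [2.571, 0.0]
```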

shuffle : bool, default=False

Whether to shuffle the dataset in each iteration.

Attributes:
w : ndarray, shape=(model.size_joint_feature,)

The learned weights of the SVM.

loss_curve_ : list of float

List of loss values if show_loss_every > 0.

objective_curve_ : list of float

Primal objective after each pass through the dataset.

timestamps_ : list of int

Total training time stored before each iteration.

References:
  • Nathan Ratliff, J. Andrew Bagnell and Martin Zinkevich: (Online) Subgradient Methods for Structured Prediction, AISTATS 2007.
  • Shai Shalev-Shwartz, Yoram Singer, Nathan Srebro and Andrew Cotter: Pegasos: Primal estimated sub-gradient solver for SVM, Mathematical Programming 2011.

Methods:
fit(X, Y[, constraints, warm_start, initialize]) Learn parameters using subgradient descent.
get_params([deep]) Get parameters for this estimator.
predict(X) Predict output on examples in X.
score(X, Y) Compute score as 1 - loss over whole data set.
set_params(**params) Set the parameters of this estimator.
__init__(model, max_iter=100, C=1.0, verbose=0, momentum=0.0, learning_rate='auto', n_jobs=1, show_loss_every=0, decay_exponent=1, break_on_no_constraints=True, logger=None, batch_size=None, decay_t0=10, averaging=None, shuffle=False)
fit(X, Y, constraints=None, warm_start=False, initialize=True)

Learn parameters using subgradient descent.

Parameters:
X : iterable

Training instances. Contains the structured input objects. No requirement on the particular form of entries of X is made.

Y : iterable

Training labels. Contains the structured labels for inputs in X. Needs to have the same length as X.

constraints : None

Ignored; currently present only for API compatibility.

warm_start : boolean, default=False

Whether to continue training from a previous fit instead of starting from scratch.

initialize : boolean, default=True

Whether to initialize the model for the data. Leave this True unless you know exactly what you are doing.

get_params(deep=True)
Get parameters for this estimator.

Parameters:
deep : boolean, optional

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params : mapping of string to any

Parameter names mapped to their values.

predict(X)
Predict output on examples in X.

Parameters:
X : iterable

Instances to predict on. Contains the structured input objects.

Returns:
Y_pred : list

List of inference results for X using the learned parameters.

score(X, Y)

Compute score as 1 - loss over whole data set.

Returns the average accuracy (in terms of model.loss) over X and Y.

Parameters:
X : iterable

Evaluation data.

Y : iterable

True labels.

Returns:
score : float

Average of 1 - loss over the examples in X.
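The sketch below makes "1 - loss, averaged over the data" concrete using a per-example Hamming loss normalized to [0, 1]. The loss and the helper names are assumptions. pystruct's score uses model.loss, and its scale depends on the model.

```python
def hamming_loss(y_true, y_pred):
    """Hypothetical per-example loss: fraction of positions that differ."""
    return sum(a != b for a, b in zip(y_true, y_pred)) / len(y_true)


def score(Y_true, Y_pred, loss=hamming_loss):
    """Average of 1 - loss over the examples."""
    return sum(1.0 - loss(y, y_hat) for y, y_hat in zip(Y_true, Y_pred)) / len(Y_true)


Y_true = [[0, 1, 1, 0], [2, 2]]
Y_pred = [[0, 1, 0, 0], [2, 2]]
print(score(Y_true, Y_pred))  # 0.875
```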

set_params(**params)
Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it is possible to update each component of a nested object.

Returns:
self : object

The estimator instance.
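A toy class can illustrate the <component>__<parameter> convention. ToyEstimator is hypothetical. It only mimics the routing of nested keys described for set_params and is not scikit-learn's or pystruct's implementation.

```python
class ToyEstimator:
    """Hypothetical estimator that stores its parameters in a dict."""

    def __init__(self, **params):
        self.params = dict(params)

    def set_params(self, **params):
        nested = {}
        for key, value in params.items():
            component, sep, sub_key = key.partition("__")
            if sep:
                # "<component>__<parameter>": collect for the nested object.
                nested.setdefault(component, {})[sub_key] = value
            else:
                self.params[key] = value
        for component, sub_params in nested.items():
            self.params[component].set_params(**sub_params)
        return self  # returning self allows chained calls


model = ToyEstimator(C=1.0)
learner = ToyEstimator(model=model, max_iter=100)
learner.set_params(max_iter=50, model__C=0.1)
print(learner.params["max_iter"], model.params["C"])  # 50 0.1
```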