Cumulative Bag-of-Words Model

class convokit.forecaster.cumulativeBoW.CumulativeBoW(vectorizer=None, clf_model=None, use_tokens=False, forecast_attribute_name: str = 'prediction', forecast_prob_attribute_name: str = 'score', decision_policy=None)

A cumulative bag-of-words forecasting model.

Parameters
  • vectorizer – optional vectorizer; default: scikit-learn CountVectorizer (min_df=10, max_df=0.5, ngram_range=(1,1), max_features=15000)

  • clf_model – optional classifier model; default standard-scaled logistic regression

  • use_tokens – if using the default vectorizer, set this to True if the input is already tokenized

  • forecast_attribute_name – name for DataFrame column containing predictions, default: “prediction”

  • forecast_prob_attribute_name – name for column containing prediction scores, default: “score”
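The defaults described above can be approximated directly with scikit-learn. The following is a minimal sketch, not the library's own code: the pipeline layout, the with_mean=False setting, and the toy corpus are all assumptions made for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Vectorizer configured with the default settings listed in the parameters.
vectorizer = CountVectorizer(
    min_df=10, max_df=0.5, ngram_range=(1, 1), max_features=15000
)

# One possible reading of "standard-scaled logistic regression".
# with_mean=False is an assumption: it lets the scaler accept the sparse
# count matrix without densifying it.
clf_model = Pipeline([
    ("scaler", StandardScaler(with_mean=False)),
    ("logreg", LogisticRegression()),
])

# Hypothetical toy corpus: each word occurs in exactly 10 of 20 documents,
# so it satisfies both min_df=10 and max_df=0.5.
texts = ["please thanks"] * 10 + ["wrong stupid"] * 10
labels = [0] * 10 + [1] * 10

X = vectorizer.fit_transform(texts)
clf_model.fit(X, labels)

vocabulary = sorted(vectorizer.vocabulary_)
probs = clf_model.predict_proba(vectorizer.transform(["wrong stupid"]))[:, 1]
```

Objects built this way could then be passed through the vectorizer and clf_model arguments of the constructor. Note that min_df=10 drops any term seen in fewer than 10 documents, so very small corpora may need a lower value.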

fit(contexts, val_contexts=None)

Train this conversational forecasting model on the given data by fitting both the belief estimator and the decision policy.

Parameters
  • contexts – an iterator over context tuples

  • val_contexts – an optional second iterator over context tuples to be used as a separate held-out validation set. Concrete ForecasterModel implementations may choose to ignore this, or conversely even enforce its presence.

fit_belief_estimator(contexts, val_contexts=None)

Fit only the belief estimator component that produces continuous scores.

score(context) → float

Produce the belief estimator score for a context.

transform(contexts, forecast_attribute_name, forecast_prob_attribute_name)

Apply this trained conversational forecasting model to the given data, and return its forecasts in the form of a DataFrame indexed by (current) utterance ID.

Parameters
  • contexts – an iterator over context tuples

Returns

A pandas DataFrame with one row for each context, indexed by the ID of that context’s current utterance. It contains two columns: one with raw probabilities, named according to forecast_prob_attribute_name, and one with discretized (binary) forecasts, named according to forecast_attribute_name. Subclass implementations of ForecasterModel MUST adhere to this return value specification!
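The return value specification can be illustrated with a small pandas sketch. The helper build_forecast_frame, the utterance IDs, and the fixed 0.5 threshold are hypothetical and chosen only for this example; the actual discretization is determined by the model's decision policy.

```python
import pandas as pd


def build_forecast_frame(
    utt_ids,
    probs,
    forecast_attribute_name="prediction",
    forecast_prob_attribute_name="score",
    threshold=0.5,  # assumed cutoff, for illustration only
):
    """Build a DataFrame shaped like the one described for transform()."""
    return pd.DataFrame(
        {
            # Raw probabilities from the belief estimator.
            forecast_prob_attribute_name: list(probs),
            # Discretized (binary) forecasts derived from the scores.
            forecast_attribute_name: [int(p > threshold) for p in probs],
        },
        # One row per context, indexed by the current utterance's ID.
        index=list(utt_ids),
    )


forecasts = build_forecast_frame(["utt_1", "utt_2", "utt_3"], [0.12, 0.81, 0.5])
```

A subclass of ForecasterModel that assembles its output along these lines, using the column names passed to transform, would match the shape described above.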