Cumulative Bag-of-Words Model
-
class convokit.forecaster.cumulativeBoW.CumulativeBoW(vectorizer=None, clf_model=None, use_tokens=False, forecast_attribute_name: str = 'prediction', forecast_prob_attribute_name: str = 'score', decision_policy=None)
A cumulative bag-of-words forecasting model.
- Parameters
vectorizer – optional vectorizer; defaults to a scikit-learn CountVectorizer (min_df=10, max_df=0.5, ngram_range=(1,1), max_features=15000)
clf_model – optional classifier model; default standard-scaled logistic regression
use_tokens – if using the default vectorizer, set this to True if the input is already tokenized
forecast_attribute_name – name for DataFrame column containing predictions, default: “prediction”
forecast_prob_attribute_name – name for column containing prediction scores, default: “score”
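As a rough illustration of the defaults listed above, the following sketch builds a comparable vectorizer and classifier directly in scikit-learn. It is not the library's own code: the `StandardScaler(with_mean=False)` step is an assumed reading of "standard-scaled logistic regression", and the `cumulative_text` helper, the toy conversations, and the relaxed `toy_vectorizer` thresholds are hypothetical, introduced only so the example runs on a tiny corpus.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Vectorizer settings quoted in the parameter list above.
documented_vectorizer = CountVectorizer(
    min_df=10, max_df=0.5, ngram_range=(1, 1), max_features=15000
)

# Assumption: "standard-scaled logistic regression" means scale, then classify.
# with_mean=False lets the scaler work on the sparse count matrix.
clf_model = Pipeline([
    ("scaler", StandardScaler(with_mean=False)),
    ("logreg", LogisticRegression()),
])

# Hypothetical toy data: each conversation is the list of utterances seen so far.
conversations = [
    ["thanks for the help", "happy to assist"],
    ["you are wrong", "stop reverting my edits"],
    ["great point", "thanks for the source"],
    ["this is nonsense", "you are wrong again"],
]
labels = [0, 1, 0, 1]


def cumulative_text(utterances):
    """Join every utterance seen so far into a single document (hypothetical helper)."""
    return " ".join(utterances)


# The documented min_df=10 would discard every term in a corpus this small,
# so the toy run relaxes the document-frequency thresholds.
toy_vectorizer = CountVectorizer(
    min_df=1, max_df=1.0, ngram_range=(1, 1), max_features=15000
)

docs = [cumulative_text(convo) for convo in conversations]
X = toy_vectorizer.fit_transform(docs)
clf_model.fit(X, labels)

# Probability of the positive class for each cumulative document.
scores = clf_model.predict_proba(X)[:, 1]
```

Objects built this way could plausibly be passed as the `vectorizer` and `clf_model` arguments, though the exact interface each argument must satisfy is not stated in this reference.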
-
fit(contexts, val_contexts=None)
Train this conversational forecasting model on the given data by fitting both the belief estimator and the decision policy.
- Parameters
contexts – an iterator over context tuples
val_contexts – an optional second iterator over context tuples to be used as a separate held-out validation set. Concrete ForecasterModel implementations may choose to ignore this, or conversely even enforce its presence.
-
fit_belief_estimator(contexts, val_contexts=None)
Fit only the belief estimator component that produces continuous scores.
-
score(context) → float
Produce the belief estimator score for a context.
-
transform(contexts, forecast_attribute_name, forecast_prob_attribute_name)
Apply this trained conversational forecasting model to the given data, and return its forecasts as a DataFrame indexed by the ID of each context's current utterance.
- Parameters
contexts – an iterator over context tuples
- Returns
a Pandas DataFrame, with one row for each context, indexed by the ID of that context’s current utterance. Contains two columns, one with raw probabilities named according to forecast_prob_attribute_name, and one with discretized (binary) forecasts named according to forecast_attribute_name. Subclass implementations of ForecasterModel MUST adhere to this return value specification!
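The return format described above can be pictured with a small pandas sketch. The utterance IDs, the scores, and the 0.5 cutoff are all hypothetical stand-ins; in the real model the binary forecasts come from the fitted decision policy, which may not be a fixed threshold.

```python
import pandas as pd

# Hypothetical (utterance ID, raw probability) pairs standing in for model output.
scored = [("utt_1", 0.12), ("utt_2", 0.81), ("utt_3", 0.50)]

# Column names matching the documented defaults.
forecast_attribute_name = "prediction"
forecast_prob_attribute_name = "score"

# Assumed cutoff for illustration only.
threshold = 0.5

utt_ids = [utt_id for utt_id, _ in scored]
probs = [prob for _, prob in scored]

# One row per context, indexed by the current utterance's ID, with a raw
# probability column and a discretized (binary) forecast column.
forecasts = pd.DataFrame(
    {
        forecast_prob_attribute_name: probs,
        forecast_attribute_name: [int(prob > threshold) for prob in probs],
    },
    index=utt_ids,
)
```

Any subclass of ForecasterModel would need to return a frame with this shape, whatever names are chosen for the two columns.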