Classifier
Example usage: politeness classification.
- 
class 
convokit.classifier.classifier.Classifier(obj_type: str, labeller: Callable[[convokit.model.corpusComponent.CorpusComponent], bool] = <function Classifier.<lambda>>, clf_model: convokit.classifier.classifierModel.ClassifierModel = None, clf_attribute_name: str = 'prediction', clf_prob_attribute_name: str = 'probability', pred_feats: List[str] = None)¶ Transformer that trains a classifier on the specified features of a Corpus’s objects.
Runs on the Corpus’s Speakers, Utterances, or Conversations (as specified by obj_type).
- Parameters
 obj_type – type of Corpus object to classify: ‘conversation’, ‘speaker’, or ‘utterance’
labeller – a (lambda) function that takes a Corpus object and returns True (y=1) or False (y=0) - i.e. labeller defines the y value of the object for fitting
clf_model – instance of a classifier model of type convokit.classifier.classifierModel.ClassifierModel
clf_attribute_name – the metadata attribute name to store the classifier prediction value under; default: “prediction”
clf_prob_attribute_name – the metadata attribute name to store the classifier prediction score under; default: “probability”
pred_feats – (Please note: usage of pred_feats is no longer recommended; users should define their own prediction features using their own custom dataset.) List of metadata attributes containing the features to be used in prediction.
 If the metadata attribute contains a dictionary, all the keys of the dictionary will be included in pred_feats. Each feature used should have a numeric/boolean type.
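As a rough illustration of two ideas from the parameter list (a boolean labeller, and a dictionary-valued metadata attribute contributing one feature per key), the sketch below uses a small stand-in class rather than real Corpus objects. The `FakeObj` class, the `expand_features` helper, and the attribute names are hypothetical and not part of ConvoKit; the library's own feature extraction may differ.

```python
from typing import Callable, Dict, List, Union

Number = Union[int, float, bool]


class FakeObj:
    """Hypothetical stand-in for a Corpus object: an id plus a meta dict."""

    def __init__(self, obj_id: str, meta: dict):
        self.id = obj_id
        self.meta = meta


def expand_features(obj: FakeObj, pred_feats: List[str]) -> Dict[str, Number]:
    """Flatten the listed metadata attributes into one feature dict.

    A dict-valued attribute contributes one feature per key; a scalar
    attribute contributes a single feature under its own name.
    """
    features: Dict[str, Number] = {}
    for name in pred_feats:
        value = obj.meta[name]
        if isinstance(value, dict):
            features.update(value)
        else:
            features[name] = value
    return features


# A labeller maps an object to True (y=1) or False (y=0).
labeller: Callable[[FakeObj], bool] = lambda obj: obj.meta["is_polite"]

obj = FakeObj("u1", {
    "is_polite": True,
    "strategies": {"has_please": 1, "has_thanks": 0},
    "num_words": 12,
})

features = expand_features(obj, ["strategies", "num_words"])
label = int(labeller(obj))
```

Here `features` holds three numeric entries (two from the dictionary, one scalar) and `label` is the y value the labeller assigns to the object.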
- 
accuracy(corpus, selector: Callable[[convokit.model.corpusComponent.CorpusComponent], bool] = <function Classifier.<lambda>>)¶ Calculate the accuracy of the classification
- Parameters
 corpus – target Corpus
selector – (lambda) function selecting objects to include in this accuracy calculation; uses all objects by default
- Returns
 float value
- 
base_accuracy(corpus, selector: Callable[[convokit.model.corpusComponent.CorpusComponent], bool] = <function Classifier.<lambda>>)¶ Get the base accuracy, i.e. the maximum of the percentages of results that are y=1 and y=0
- Parameters
 corpus – the classified Corpus
selector – (lambda) function selecting objects to include in this accuracy calculation; uses all objects by default
- Returns
 float value
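The base accuracy described above is the score a trivial majority-class predictor would reach. A minimal sketch, assuming the proportions are taken over binary true labels (the function below is a standalone illustration, not ConvoKit's implementation):

```python
from typing import List


def base_accuracy(y_true: List[int]) -> float:
    """Return the share of the more frequent class among binary labels."""
    if not y_true:
        raise ValueError("y_true must not be empty")
    frac_positive = sum(y_true) / len(y_true)
    return max(frac_positive, 1 - frac_positive)


labels = [1, 1, 1, 0]  # three positives, one negative
majority_baseline = base_accuracy(labels)
```

A classifier is only informative if its accuracy on the same objects exceeds this baseline.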
- 
classification_report(corpus, selector: Callable[[convokit.model.corpusComponent.CorpusComponent], bool] = <function Classifier.<lambda>>)¶ Generate classification report for transformed corpus using labeller for y_true and clf_attribute_name as y_pred
- Parameters
 corpus – target Corpus
selector – (lambda) function selecting objects to include in this classification report
- Returns
 classification report
- 
confusion_matrix(corpus, selector: Callable[[convokit.model.corpusComponent.CorpusComponent], bool] = <function Classifier.<lambda>>)¶ Generate confusion matrix for transformed corpus using labeller for y_true and clf_attribute_name as y_pred
- Parameters
 corpus – target Corpus
selector – (lambda) function selecting objects to include in this confusion_matrix; uses all objects by default
- Returns
 sklearn confusion matrix
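To help read the returned matrix: scikit-learn's convention puts true labels on the rows and predicted labels on the columns. The pure-Python sketch below reproduces that layout for binary labels; the helper name `binary_confusion_matrix` is hypothetical and is only meant to show how the four cells are counted.

```python
from typing import List


def binary_confusion_matrix(y_true: List[int], y_pred: List[int]) -> List[List[int]]:
    """Count (true, predicted) pairs; rows are true labels, columns are predictions."""
    matrix = [[0, 0], [0, 0]]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 1]

matrix = binary_confusion_matrix(y_true, y_pred)
tn, fp = matrix[0]  # row for true label 0
fn, tp = matrix[1]  # row for true label 1
```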
- 
evaluate_with_cv(corpus: convokit.model.corpus.Corpus = None, objs: List[convokit.model.corpusComponent.CorpusComponent] = None, cv=sklearn.model_selection.KFold, selector: Callable[[convokit.model.corpusComponent.CorpusComponent], bool] = <function Classifier.<lambda>>)¶ Please note that Classifier.pred_feats is a deprecated attribute, and so this function may have undefined behavior. Evaluate the performance of predictive features (Classifier.pred_feats) in predicting for the label, using cross-validation for data splitting.
This method can be run on either a Corpus (passed in as the corpus parameter) or a list of Corpus component objects (passed in as the objs parameter). If run on a Corpus, the cross-validation will be run with the Classifier’s labeller and obj_type settings, and the selector parameter of this function.
- Parameters
 corpus – target Corpus (do not pass in objs if using this)
objs – target list of Corpus objects (do not pass in corpus if using this)
cv – cross-validation model to use: KFold(n_splits=5, shuffle=True) by default.
selector – if running on a Corpus, this is a (lambda) function that takes a Corpus object and returns True or False (i.e. include / exclude). By default, the selector includes all objects of the specified type in the Corpus.
- Returns
 cross-validated accuracy score
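For readers unfamiliar with the default splitter, the sketch below shows the idea behind a shuffled 5-fold split: each object lands in exactly one test fold, and the remaining objects form that fold's training set. It is a simplified standalone illustration, not scikit-learn's KFold; the `k_fold_indices` helper and its `seed` parameter are assumptions made for the example.

```python
import random
from typing import List, Tuple


def k_fold_indices(n: int, n_splits: int = 5, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Shuffle range(n) and return (train_indices, test_indices) for each fold."""
    if not 2 <= n_splits <= n:
        raise ValueError("need 2 <= n_splits <= n")
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    # Spread the remainder over the first folds so sizes differ by at most one.
    base, extra = divmod(n, n_splits)
    folds, start = [], 0
    for i in range(n_splits):
        size = base + (1 if i < extra else 0)
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        folds.append((train, test))
        start += size
    return folds


folds = k_fold_indices(n=12, n_splits=5)
fold_sizes = [len(test) for _, test in folds]
```

With 12 objects and 5 folds, the first two folds hold three test objects each and the rest hold two.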
- 
evaluate_with_train_test_split(corpus: convokit.model.corpus.Corpus = None, objs: List[convokit.model.corpusComponent.CorpusComponent] = None, selector: Callable[[convokit.model.corpusComponent.CorpusComponent], bool] = <function Classifier.<lambda>>, test_size: float = 0.2)¶ Please note that Classifier.pred_feats is a deprecated attribute, and so this function may have undefined behavior. Evaluate the performance of predictive features (Classifier.pred_feats) in predicting for the label, using a train-test split.
Run either on a Corpus (with Classifier labeller, selector, obj_type settings) or a list of Corpus objects
- Parameters
 corpus – target Corpus
objs – target list of Corpus objects
selector – if running on a Corpus, this is a (lambda) function that takes a Corpus object and returns True or False (i.e. include / exclude). By default, the selector includes all objects of the specified type in the Corpus.
test_size – size of test set
- Returns
 accuracy and confusion matrix
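The effect of test_size can be sketched as follows: a fraction of the shuffled objects is held out for testing and the rest is used for training. This is a standalone illustration with a hypothetical `split_train_test` helper and `seed` parameter; the rounding rule used here is an assumption and may not match what the library does internally.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_train_test(items: Sequence[T], test_size: float = 0.2, seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle items and hold out roughly test_size of them as the test set."""
    if not 0.0 < test_size < 1.0:
        raise ValueError("test_size must be strictly between 0 and 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    # Assumption: round to the nearest count, keeping at least one test item.
    n_test = max(1, round(test_size * len(shuffled)))
    return shuffled[n_test:], shuffled[:n_test]


train, test = split_train_test(list(range(10)), test_size=0.2)
```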
- 
fit(context_type: str, corpus: convokit.model.corpus.Corpus, y=None, context_selector: Callable[[convokit.model.corpusComponent.CorpusComponent], bool] = <function Classifier.<lambda>>, val_context_selector: Optional[Callable[[convokit.model.corpusComponent.CorpusComponent], bool]] = None)¶ Trains the Transformer’s classifier model, with an optional selector that filters for objects to be fit on.
- Parameters
 context_type – type of Corpus object to classify: ‘conversation’, ‘speaker’, or ‘utterance’
corpus – target Corpus
context_selector – a (lambda) function that takes a Corpus object and returns True or False (i.e. include / exclude). By default, the context_selector includes all objects of the specified type in the Corpus.
val_context_selector – a (lambda) function that takes a Corpus object and returns True or False (i.e. include / exclude). By default, the val_context_selector is None.
- Returns
 the fitted Classifier Transformer
- 
fit_transform(corpus: convokit.model.corpus.Corpus, y=None, selector: Callable[[convokit.model.corpusComponent.CorpusComponent], bool] = <function Classifier.<lambda>>) → convokit.model.corpus.Corpus¶ Fit and run the Transformer on a single Corpus.
- Parameters
 corpus – the Corpus to use
- Returns
 same as transform
- 
get_coefs(feature_names: List[str], coef_func=None)¶ Get dataframe of classifier coefficients
- Parameters
 feature_names – list of feature names to get coefficients for
coef_func – function for accessing the list of coefficients from the classifier model; by default, assumes it is a pipeline with a logistic regression component
- Returns
 DataFrame of features and coefficients, indexed by feature names
- 
get_model()¶ Gets the Classifier’s internal model
- 
get_y_true_pred(corpus, selector: Callable[[convokit.model.corpusComponent.CorpusComponent], bool] = <function Classifier.<lambda>>)¶ Get lists of true and predicted labels
- Parameters
 corpus – target Corpus
selector – (lambda) function selecting objects to get labels for; uses all objects by default
- Returns
 list of true labels, and list of predicted labels
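Conceptually, the true labels come from the labeller and the predicted labels come from the metadata attribute written during classification. The sketch below mimics that pairing on stand-in objects and then derives an accuracy from the two lists; `FakeObj`, `y_true_pred`, and the `gold` attribute are hypothetical names used only for illustration.

```python
from typing import Callable, List, Tuple


class FakeObj:
    """Hypothetical stand-in for a Corpus object: an id plus a meta dict."""

    def __init__(self, obj_id: str, meta: dict):
        self.id = obj_id
        self.meta = meta


def y_true_pred(
    objs: List[FakeObj],
    labeller: Callable[[FakeObj], bool],
    selector: Callable[[FakeObj], bool] = lambda obj: True,
    clf_attribute_name: str = "prediction",
) -> Tuple[List[int], List[int]]:
    """Collect labeller outputs and stored predictions for the selected objects."""
    y_true, y_pred = [], []
    for obj in objs:
        if selector(obj):
            y_true.append(int(labeller(obj)))
            y_pred.append(int(obj.meta[clf_attribute_name]))
    return y_true, y_pred


objs = [
    FakeObj("a", {"gold": True, "prediction": 1}),
    FakeObj("b", {"gold": False, "prediction": 1}),
    FakeObj("c", {"gold": True, "prediction": 1}),
    FakeObj("d", {"gold": False, "prediction": 0}),
]

y_true, y_pred = y_true_pred(objs, labeller=lambda o: o.meta["gold"])
# Accuracy is the fraction of positions where the two lists agree.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```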
- 
set_model(clf)¶ Sets the Classifier’s internal model
- 
summarize(corpus: convokit.model.corpus.Corpus, selector: Callable[[convokit.model.corpusComponent.CorpusComponent], bool] = <function Classifier.<lambda>>)¶ Generate a pandas DataFrame (indexed by object id, with prediction and prediction score columns) of classification results.
Runs on a target Corpus; to summarize a list of Corpus objects, use summarize_objs.
- Parameters
 corpus – target Corpus
selector – a (lambda) function that takes a Corpus object and returns True or False (i.e. include / exclude). By default, the selector includes all objects of the specified type in the Corpus.
- Returns
 pandas DataFrame indexed by Corpus object id
- 
summarize_objs(objs: List[convokit.model.corpusComponent.CorpusComponent])¶ Generate a pandas DataFrame (indexed by object id, with prediction and prediction score columns) of classification results.
Runs on a list of Corpus objects.
- Parameters
 objs – list of Corpus objects
- Returns
 pandas DataFrame indexed by Corpus object id
- 
transform(corpus: convokit.model.corpus.Corpus, selector: Callable[[convokit.model.corpusComponent.CorpusComponent], bool] = <function Classifier.<lambda>>) → convokit.model.corpus.Corpus¶ Run classifier on given corpus’s objects and annotate them with the predictions and prediction scores, with an optional selector that filters for objects to be classified. Objects that are not selected will get a metadata value of ‘None’ instead of the classifier prediction.
- Parameters
 corpus – target Corpus
selector – a (lambda) function that takes a Corpus object and returns True or False (i.e. include / exclude). By default, the selector includes all objects of the specified type in the Corpus.
- Returns
 annotated Corpus
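The selector behaviour described above can be pictured with the following sketch: selected objects receive a prediction and a score under the configured attribute names, while unselected objects receive None. The `FakeObj` class, the `annotate` helper, and the toy scoring rule are hypothetical; writing None for the score attribute as well is an assumption, since the description only mentions the prediction.

```python
from typing import Callable, List


class FakeObj:
    """Hypothetical stand-in for a Corpus object: an id plus a meta dict."""

    def __init__(self, obj_id: str, meta: dict):
        self.id = obj_id
        self.meta = meta


def annotate(
    objs: List[FakeObj],
    score: Callable[[FakeObj], float],
    selector: Callable[[FakeObj], bool] = lambda obj: True,
    clf_attribute_name: str = "prediction",
    clf_prob_attribute_name: str = "probability",
) -> List[FakeObj]:
    """Write a prediction and score to selected objects, None to the others."""
    for obj in objs:
        if selector(obj):
            prob = score(obj)
            obj.meta[clf_attribute_name] = int(prob >= 0.5)
            obj.meta[clf_prob_attribute_name] = prob
        else:
            obj.meta[clf_attribute_name] = None
            obj.meta[clf_prob_attribute_name] = None  # assumption, see lead-in
    return objs


def toy_score(obj: FakeObj) -> float:
    """Toy scoring rule standing in for a trained model."""
    return min(1.0, obj.meta["num_words"] / 20)


objs = [
    FakeObj("u1", {"num_words": 15}),
    FakeObj("u2", {"num_words": 4}),
    FakeObj("u3", {"num_words": 30}),
]

# Only objects with at most 20 words are selected for classification.
annotate(objs, score=toy_score, selector=lambda o: o.meta["num_words"] <= 20)
```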
- 
transform_objs(objs: List[convokit.model.corpusComponent.CorpusComponent], selector: Callable[[convokit.model.corpusComponent.CorpusComponent], bool] = <function Classifier.<lambda>>) → List[convokit.model.corpusComponent.CorpusComponent]¶ Run classifier on list of Corpus objects and annotate them with the predictions and prediction scores
- Parameters
 objs – list of Corpus objects
- Returns
 list of annotated Corpus objects