Conversation

class convokit.model.conversation.Conversation(owner, id: Optional[str] = None, utterances: Optional[List[str]] = None, meta: Optional[Dict] = None)

Represents a discrete subset of utterances in the dataset, connected by a reply-to chain.

Parameters

owner – The Corpus that this Conversation belongs to
id – The unique ID of this Conversation
utterances – A list of the IDs of the Utterances in this Conversation
meta – Table of initial values for conversation-level metadata

Variables

id – the ID of the Conversation
meta – A dictionary-like view object providing read-write access to conversation-level metadata.

add_meta(key: str, value) → None

Adds a key-value pair to the metadata of this Corpus component object (here, the Conversation).

Parameters

key – name of metadata attribute
value – value of metadata attribute

Returns

None

add_vector(vector_name: str)

Records, in this Corpus component object’s internal list of vectors, that the object has a row associated with it in the vector matrix named vector_name.

Transformers that add vectors to the Corpus should use this to update the relevant component objects during the transform() step.

Parameters: vector_name – name of vector matrix
Returns: None

check_integrity(verbose: bool = True) → bool

Check the integrity of this Conversation; i.e. do the constituent utterances form a complete reply-to chain?

Parameters: verbose – whether to print errors indicating the problems with the Conversation
Returns: True if the conversation structure is complete else False
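
To make the idea of a "complete reply-to chain" concrete, here is a minimal sketch in plain Python. It is not ConvoKit's implementation, and the exact criteria check_integrity() applies are not spelled out on this page. The sketch assumes that "complete" means exactly one root utterance, every reply pointing at an utterance inside the conversation, and no cycles. The function name and the reply_to mapping are hypothetical stand-ins for a Conversation.

```python
from typing import Dict, Optional


def forms_complete_reply_chain(reply_to: Dict[str, Optional[str]]) -> bool:
    """Return True if the utterances form a single tree via reply-to links.

    `reply_to` maps each utterance ID to the ID it replies to, or None for
    the root. It is a hypothetical stand-in for a Conversation.
    """
    # Assumption: a complete conversation has exactly one root utterance.
    roots = [utt_id for utt_id, parent in reply_to.items() if parent is None]
    if len(roots) != 1:
        return False

    # Every reply must point at an utterance inside this conversation.
    for parent in reply_to.values():
        if parent is not None and parent not in reply_to:
            return False

    # Walk upward from every utterance; a repeated ID means a cycle.
    for utt_id in reply_to:
        seen = set()
        current: Optional[str] = utt_id
        while current is not None:
            if current in seen:
                return False
            seen.add(current)
            current = reply_to[current]
    return True
```

Under these assumptions, a mapping such as {"a": None, "b": "a", "c": "b"} passes, while a reply that targets an ID missing from the mapping fails.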

delete_vector(vector_name: str)

Delete a vector associated with this Corpus component object.

Parameters: vector_name – name of the vector to delete
Returns: None

get_chronological_speaker_list(selector: Callable[[convokit.model.speaker.Speaker], bool] = <function Conversation.<lambda>>)

Get the speakers in the conversation in chronological order of their utterances (a speaker may appear more than once).

Parameters: selector – a (lambda) function that takes a Speaker and returns True or False (i.e. include / exclude); all speakers are included by default
Returns: list of speakers for each chronological utterance

get_chronological_utterance_list(selector: Callable[[convokit.model.utterance.Utterance], bool] = <function Conversation.<lambda>>)

Get the utterances in the conversation sorted in increasing order of timestamp

Parameters: selector – a (lambda) function that takes an Utterance and returns True or False (i.e. include / exclude); all utterances are included by default
Returns: list of utterances, sorted by timestamp
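
The selector-then-sort behaviour described for the two chronological methods can be illustrated with a small standalone sketch. The Utt tuple and the chronological() helper are hypothetical stand-ins written for this example; they are not ConvoKit classes or functions, and only the filtering and timestamp ordering are modelled.

```python
from typing import Callable, List, NamedTuple


class Utt(NamedTuple):
    """Hypothetical stand-in for an Utterance; not ConvoKit's class."""
    id: str
    speaker: str
    timestamp: int


def chronological(
    utts: List[Utt],
    selector: Callable[[Utt], bool] = lambda utt: True,
) -> List[Utt]:
    # Keep only the utterances the selector accepts, then sort by timestamp.
    return sorted((u for u in utts if selector(u)), key=lambda u: u.timestamp)


utts = [Utt("u2", "bob", 20), Utt("u1", "alice", 10), Utt("u3", "alice", 30)]

# Default selector: every utterance, oldest first.
all_ids = [u.id for u in chronological(utts)]

# Custom selector: only utterances by one speaker.
alice_ids = [u.id for u in chronological(utts, lambda u: u.speaker == "alice")]

# One speaker per chronological utterance, so speakers can repeat.
speakers_in_order = [u.speaker for u in chronological(utts)]
```

Here speakers_in_order contains "alice" twice, which mirrors the note that speakers may appear more than once in the chronological speaker list.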

get_longest_paths() → List[List[convokit.model.utterance.Utterance]]

Finds the Utterances that form the longest path (i.e. root to leaf) in the Conversation tree. If multiple paths tie for the longest length, all of them are returned as a list of lists. If only one such path exists, a list containing a single list of Utterances is returned.

Returns: a list of lists of Utterances

get_root_to_leaf_paths() → List[List[convokit.model.utterance.Utterance]]

Get the paths (stored as a list of lists of utterances) from the root to each of the leaves in the conversational tree

Returns: List of lists of Utterances
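
As an illustration of what root-to-leaf paths and longest paths mean for a reply tree, the following sketch computes both from a plain reply-to mapping. It is an assumption-laden stand-in rather than ConvoKit's code: the function names and the reply_to dictionary are hypothetical, paths hold utterance IDs instead of Utterance objects, and the input is assumed to have exactly one root.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def root_to_leaf_paths(reply_to: Dict[str, Optional[str]]) -> List[List[str]]:
    """Enumerate every path from the root to a leaf.

    `reply_to` maps utterance ID -> parent ID (None for the single root).
    """
    children = defaultdict(list)
    root = None
    for utt_id, parent in reply_to.items():
        if parent is None:
            root = utt_id
        else:
            children[parent].append(utt_id)

    paths: List[List[str]] = []
    stack = [[root]]  # each stack entry is a partial path starting at the root
    while stack:
        path = stack.pop()
        kids = children.get(path[-1], [])
        if not kids:
            paths.append(path)  # reached a leaf, so the path is complete
        for kid in kids:
            stack.append(path + [kid])
    return paths


def longest_paths(reply_to: Dict[str, Optional[str]]) -> List[List[str]]:
    """Return all root-to-leaf paths that tie for the maximum length."""
    paths = root_to_leaf_paths(reply_to)
    max_len = max(len(p) for p in paths)
    return [p for p in paths if len(p) == max_len]
```

The result is always a list of lists, even when a single path is the longest, which matches the return shape described for get_longest_paths().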

get_speaker(speaker_id: str) → convokit.model.speaker.Speaker

Looks up the Speaker with the given ID. Raises a KeyError if no speaker with that ID exists.

Returns: the Speaker with the given speaker_id

get_speaker_ids() → List[str]

Produces a list of ids of all speakers in the Conversation, which can be used in calls to get_speaker() to retrieve specific speakers. Provides no ordering guarantees for the list.

Returns: a list of speaker ids

get_speakers_dataframe(selector: Optional[Callable[[convokit.model.speaker.Speaker], bool]] = <function Conversation.<lambda>>, exclude_meta: bool = False)

Get a DataFrame of the Speakers that have participated in the Conversation with fields and metadata attributes, with an optional selector that filters Speakers that should be included. Edits to the DataFrame do not change the corpus in any way.

Parameters

exclude_meta – whether to exclude metadata
selector – a (lambda) function that takes a Speaker and returns True or False (i.e. include / exclude). By default, the selector includes all Speakers in the Conversation.

Returns

a pandas DataFrame

get_subtree(root_utt_id)

Get the utterance node of the specified input id

Parameters: root_utt_id – id of the root node that the subtree starts from
Returns: UtteranceNode object

get_utterance(ut_id: str) → convokit.model.utterance.Utterance

Looks up the Utterance associated with the given ID. Raises a KeyError if no utterance by that ID exists.

Returns: the Utterance with the given ID

get_utterance_ids() → List[str]

Produces a list of the unique IDs of all utterances in the Conversation, which can be used in calls to get_utterance() to retrieve specific utterances. Provides no ordering guarantees for the list.

Returns: a list of IDs of Utterances in the Conversation

get_utterances_dataframe(selector=<function Conversation.<lambda>>, exclude_meta: bool = False)

Get a DataFrame of the Utterances in the Conversation with fields and metadata attributes. Set an optional selector that filters Utterances that should be included. Edits to the DataFrame do not change the corpus in any way.

Parameters

exclude_meta – whether to exclude metadata
selector – a (lambda) function that takes an Utterance and returns True or False (i.e. include / exclude). By default, the selector includes all Utterances in the Conversation.

Returns

a pandas DataFrame

get_vector(vector_name: str, as_dataframe: bool = False, columns: Optional[List[str]] = None)

Get the vector stored as vector_name for this object.

Parameters

vector_name – name of vector
as_dataframe – whether to return the vector as a dataframe (True) or in its raw array form (False). False by default.
columns – optional list of named columns of the vector to include. All columns returned otherwise. This parameter is only used if as_dataframe is set to True

Returns

a numpy / scipy array, or a dataframe if as_dataframe is set to True

iter_speakers(selector: Callable[[convokit.model.speaker.Speaker], bool] = <function Conversation.<lambda>>) → Generator[convokit.model.speaker.Speaker, None, None]

Get Speakers that have participated in the Conversation, with an optional selector that filters for Speakers that should be included.

Parameters: selector – a (lambda) function that takes a Speaker and returns True or False (i.e. include / exclude). By default, the selector includes all Speakers in the Conversation.
Returns: a generator of Speakers

iter_utterances(selector: Callable[[convokit.model.utterance.Utterance], bool] = <function Conversation.<lambda>>) → Generator[convokit.model.utterance.Utterance, None, None]

Get utterances in the Conversation, with an optional selector that filters for Utterances that should be included.

Parameters: selector – a (lambda) function that takes an Utterance and returns True or False (i.e. include / exclude). By default, the selector includes all Utterances in the Conversation.
Returns: a generator of Utterances

print_conversation_stats()

Helper function for printing the number of Utterances and Speakers in the Conversation.

Returns: None (prints output)

print_conversation_structure(utt_info_func: Callable[[convokit.model.utterance.Utterance], str] = <function Conversation.<lambda>>, limit: int = None) → None

Prints an indented representation of utterances in the Conversation with conversation reply-to structure determining the indented level. The details of each utterance to be printed can be configured.

If limit is set to a value other than None, this will annotate utterances with an ‘order’ metadata field indicating their temporal order in the conversation, where the first utterance in the conversation is annotated with 1.

Parameters

utt_info_func – callable function taking an utterance as input and returning a string of the desired utterance information. By default, this is a lambda function returning the utterance’s speaker’s id
limit – maximum number of utterances to print out. If set to k, only the first k utterances are printed.

Returns

None. Prints to stdout.
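
The indentation-by-reply-depth idea can be sketched without ConvoKit as follows. This is only an approximation of the described output: the structure_lines() helper, the reply_to and speaker_of dictionaries, the four-space indent, and the sibling order are all assumptions made for this example, and the limit option is not modelled. The per-utterance text is the speaker ID, matching the default described for utt_info_func.

```python
from typing import Dict, List, Optional


def structure_lines(
    reply_to: Dict[str, Optional[str]],
    speaker_of: Dict[str, str],
    indent: int = 4,
) -> List[str]:
    """Build one line per utterance, indented by its depth in the reply tree.

    `reply_to` maps utterance ID -> parent ID (None for the single root);
    `speaker_of` maps utterance ID -> speaker ID. Both are hypothetical inputs.
    """
    children: Dict[str, List[str]] = {}
    root = None
    for utt_id, parent in reply_to.items():
        if parent is None:
            root = utt_id
        else:
            children.setdefault(parent, []).append(utt_id)

    lines: List[str] = []

    def visit(utt_id: str, depth: int) -> None:
        # Each reply is indented one level deeper than the utterance it answers.
        lines.append(" " * (indent * depth) + speaker_of[utt_id])
        for child in children.get(utt_id, []):
            visit(child, depth + 1)

    visit(root, 0)
    return lines


reply_to = {"u1": None, "u2": "u1", "u3": "u2", "u4": "u1"}
speaker_of = {"u1": "alice", "u2": "bob", "u3": "alice", "u4": "carol"}
print("\n".join(structure_lines(reply_to, speaker_of)))
```

Returning the lines instead of printing inside the helper keeps the sketch easy to inspect; the real method prints directly to stdout.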

retrieve_meta(key: str)

Retrieves the value stored under the given key in the metadata of this Corpus component object (here, the Conversation).

Parameters: key – name of metadata attribute
Returns: value

traverse(traversal_type: str, as_utterance: bool = True)

Traverse through the Conversation tree structure in a breadth-first search (‘bfs’), depth-first search (‘dfs’), pre-order (‘preorder’), or post-order (‘postorder’) way.

Parameters

traversal_type – dfs, bfs, preorder, or postorder
as_utterance – whether the iterator should yield the utterance (True) or the utterance node (False)

Returns

an iterator of the utterances or utterance nodes
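
For readers unfamiliar with the traversal orders named above, here is a small standalone sketch over a plain child-list mapping. It does not use or reproduce ConvoKit's implementation: traverse_ids() and the children dictionary are hypothetical, the generator yields utterance IDs rather than Utterance or node objects, and the ‘dfs’ option is left out because this page does not say how it differs from ‘preorder’.

```python
from collections import deque
from typing import Dict, Iterator, List


def traverse_ids(children: Dict[str, List[str]], root: str, traversal_type: str) -> Iterator[str]:
    """Yield utterance IDs from a reply tree in the requested order.

    `children` maps an utterance ID to the IDs of its direct replies.
    """
    if traversal_type == "bfs":
        # Breadth-first: visit level by level using a FIFO queue.
        queue = deque([root])
        while queue:
            node = queue.popleft()
            yield node
            queue.extend(children.get(node, []))
    elif traversal_type == "preorder":
        # Pre-order: a node is yielded before any of its replies.
        yield root
        for child in children.get(root, []):
            yield from traverse_ids(children, child, "preorder")
    elif traversal_type == "postorder":
        # Post-order: a node is yielded after all of its replies.
        for child in children.get(root, []):
            yield from traverse_ids(children, child, "postorder")
        yield root
    else:
        raise ValueError(f"unsupported traversal_type: {traversal_type!r}")


# "a" has replies "b" and "c"; "b" has reply "d".
children = {"a": ["b", "c"], "b": ["d"]}
bfs_order = list(traverse_ids(children, "a", "bfs"))
pre_order = list(traverse_ids(children, "a", "preorder"))
post_order = list(traverse_ids(children, "a", "postorder"))
```

Because the function is a generator, an unsupported traversal_type only raises once iteration starts.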