The EvaluationManager and Evaluation classes assess the quality, compliance, and accuracy of generated responses. After a message has been generated, the orchestrator calls the EvaluationManager to run the selected evaluations on it, so that the response is checked against the required standards before it is finalized.
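The Evaluation class is an abstract base class that defines the structure and behavior of individual evaluations. It defines the following methods; get_name is implemented in the base class, while the other three raise NotImplementedError and must be overridden by subclasses: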
get_name: Returns the unique name of the evaluation.
get_assessment_type: Specifies the type of assessment (e.g., hallucination or compliance).
run: Executes the evaluation logic on the response.
evaluation_metric_to_assessment: Converts the evaluation result into a user-facing assessment message.
class Evaluation(ABC):
    def __init__(self, name: EvaluationMetricName):
        self.name = name

    def get_name(self) -> EvaluationMetricName:
        """Returns the unique name of the evaluation."""
        return self.name

    def get_assessment_type(self) -> ChatMessageAssessmentType:
        """Specifies the type of assessment (e.g., hallucination or compliance)."""
        raise NotImplementedError(
            "Subclasses must implement this method to return the assessment type."
        )

    async def run(
        self, loop_response: LanguageModelStreamResponse
    ) -> EvaluationMetricResult:
        """Executes the evaluation logic."""
        raise NotImplementedError("Subclasses must implement this method.")

    async def evaluation_metric_to_assessment(
        self, evaluation_result: EvaluationMetricResult
    ) -> EvaluationAssessmentMessage:
        """Converts the evaluation result into a user-facing assessment message."""
        raise NotImplementedError(
            "Subclasses must implement this method to convert evaluation results to assessment messages."
        )
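As an illustration of the contract, here is a minimal sketch of a concrete evaluation. Everything in it other than the method names is an assumption made so the example runs on its own: the dataclasses and enums are simplified stand-ins for the toolkit types, ResponseLengthEvaluation and its max_chars parameter are hypothetical, and "GREEN" is an assumed label for a passing result (the source only shows "RED").

```python
import asyncio
from abc import ABC
from dataclasses import dataclass
from enum import Enum


# --- Simplified stand-ins (assumptions, not the toolkit's real definitions) ---
class EvaluationMetricName(str, Enum):
    RESPONSE_LENGTH = "response_length"  # hypothetical metric name


class ChatMessageAssessmentType(str, Enum):
    COMPLIANCE = "COMPLIANCE"


@dataclass
class LanguageModelStreamResponse:
    text: str  # the real response object carries more than plain text


@dataclass
class EvaluationMetricResult:
    name: EvaluationMetricName
    is_positive: bool
    value: str
    reason: str


@dataclass
class EvaluationAssessmentMessage:
    title: str
    explanation: str


class Evaluation(ABC):
    """Trimmed version of the base class described above."""

    def __init__(self, name: EvaluationMetricName):
        self.name = name

    def get_name(self) -> EvaluationMetricName:
        return self.name


# --- Hypothetical concrete evaluation ---
class ResponseLengthEvaluation(Evaluation):
    def __init__(self, max_chars: int):
        super().__init__(EvaluationMetricName.RESPONSE_LENGTH)
        self.max_chars = max_chars

    def get_assessment_type(self) -> ChatMessageAssessmentType:
        return ChatMessageAssessmentType.COMPLIANCE

    async def run(
        self, loop_response: LanguageModelStreamResponse
    ) -> EvaluationMetricResult:
        # Pass when the response fits within the configured limit.
        length = len(loop_response.text)
        ok = length <= self.max_chars
        return EvaluationMetricResult(
            name=self.get_name(),
            is_positive=ok,
            value="GREEN" if ok else "RED",  # "GREEN" is an assumed label
            reason=f"{length} characters (limit {self.max_chars})",
        )

    async def evaluation_metric_to_assessment(
        self, evaluation_result: EvaluationMetricResult
    ) -> EvaluationAssessmentMessage:
        # Turn the raw metric into text suitable for display.
        return EvaluationAssessmentMessage(
            title="Response length",
            explanation=evaluation_result.reason,
        )
```

A call such as asyncio.run(ResponseLengthEvaluation(max_chars=10).run(LanguageModelStreamResponse(text="short"))) would then yield a positive result.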
The EvaluationManager is responsible for managing and executing evaluations. It registers evaluations, executes them asynchronously, and integrates their results into the chat interface so that the outcomes are displayed to the user.
run_evaluations(selected_evaluation_names: list[EvaluationMetricName], loop_response: LanguageModelStreamResponse, assistant_message_id: str)
Executes the selected evaluations asynchronously. Results are processed and returned as a list of EvaluationMetricResult.
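The source does not show the body of run_evaluations, so the following is only a sketch of one way to run several evaluations concurrently and collect a result per name. It uses asyncio.gather with return_exceptions=True, which is a substitute technique and not necessarily what the manager does internally. The registry dict, run_evaluations_sketch, and the two demo evaluators are hypothetical, and EvaluationMetricResult is a simplified stand-in. The fallback fields mirror those used in the source for failed evaluations.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Optional


@dataclass
class EvaluationMetricResult:  # simplified stand-in
    name: str
    is_positive: bool
    value: str
    reason: str
    error: Optional[BaseException] = None


# Hypothetical registry type: evaluation name -> async callable on the response text
EvaluationFn = Callable[[str], Awaitable[EvaluationMetricResult]]


async def run_evaluations_sketch(
    registry: dict[str, EvaluationFn],
    selected_names: list[str],
    response_text: str,
) -> list[EvaluationMetricResult]:
    async def run_one(name: str) -> EvaluationMetricResult:
        fn = registry.get(name)
        if fn is None:
            raise LookupError(f"Evaluation named {name} not found")
        return await fn(response_text)

    # Run all selected evaluations concurrently; exceptions are returned, not raised.
    outcomes = await asyncio.gather(
        *(run_one(name) for name in selected_names), return_exceptions=True
    )

    results = []
    for name, outcome in zip(selected_names, outcomes):
        if isinstance(outcome, BaseException):
            # Fallback fields mirror the ones the source uses for failures.
            results.append(
                EvaluationMetricResult(
                    name=name,
                    is_positive=True,
                    value="RED",
                    reason=str(outcome),
                    error=outcome,
                )
            )
        else:
            results.append(outcome)
    return results


# Demo evaluators (hypothetical)
async def always_green(text: str) -> EvaluationMetricResult:
    return EvaluationMetricResult("always_green", True, "GREEN", "nothing to report")


async def always_fails(text: str) -> EvaluationMetricResult:
    raise ValueError("boom")


demo_results = asyncio.run(
    run_evaluations_sketch(
        {"always_green": always_green, "always_fails": always_fails},
        ["always_green", "always_fails", "missing"],
        "some generated answer",
    )
)
```

Because gather preserves input order, demo_results lines up with the requested names, and a failing or unknown evaluation does not prevent the others from completing.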
execute_evaluation_call(evaluation_name: EvaluationMetricName, loop_response: LanguageModelStreamResponse, assistant_message_id: str)
Executes a single evaluation and returns its result.
async def execute_evaluation_call(
    self,
    evaluation_name: EvaluationMetricName,
    loop_response: LanguageModelStreamResponse,
    assistant_message_id: str,
) -> EvaluationMetricResult:
    evaluation_instance = self.get_evaluation_by_name(evaluation_name)
    if evaluation_instance:
        # Show a placeholder, run the evaluation, then publish the outcome.
        await self._create_assistant_message(evaluation_instance, assistant_message_id)
        evaluation_metric_result = await evaluation_instance.run(loop_response)
        await self._show_message_assessment(
            evaluation_instance, evaluation_metric_result, assistant_message_id
        )
        return evaluation_metric_result

    # No evaluation is registered under this name: return a fallback result.
    return EvaluationMetricResult(
        name=evaluation_name,
        is_positive=True,
        value="RED",
        reason=f"Evaluation named {evaluation_name} not found",
        error=Exception(f"Evaluation named {evaluation_name} not found"),
    )
_create_evaluation_metric_result(result: Result[EvaluationMetricResult], evaluation_name: EvaluationMetricName)
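Unwraps the result of an evaluation. If the call did not succeed, it returns a fallback EvaluationMetricResult with value "RED" whose reason is the text of the captured exception; otherwise it returns the unpacked result.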
def _create_evaluation_metric_result(
    self,
    result: Result[EvaluationMetricResult],
    evaluation_name: EvaluationMetricName,
) -> EvaluationMetricResult:
    if not result.success:
        return EvaluationMetricResult(
            name=evaluation_name,
            is_positive=True,
            value="RED",
            reason=str(result.exception),
            error=Exception("Evaluation result is not successful"),
        )
    return result.unpack()
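The method above relies on three members of Result: success, exception, and unpack(). The sketch below shows a hypothetical stand-in that exposes only those members, to make the contract concrete. It is not the toolkit's Result class, and to_metric_result uses a plain dict in place of EvaluationMetricResult to keep the example short.

```python
from dataclasses import dataclass
from typing import Generic, Optional, TypeVar

T = TypeVar("T")


@dataclass
class Result(Generic[T]):
    """Hypothetical stand-in exposing only the members used above."""

    value: Optional[T] = None
    exception: Optional[Exception] = None

    @property
    def success(self) -> bool:
        # A result counts as successful when no exception was captured.
        return self.exception is None

    def unpack(self) -> T:
        # Re-raise the captured exception instead of returning a missing value.
        if self.exception is not None:
            raise self.exception
        return self.value  # type: ignore[return-value]


def to_metric_result(result: Result[dict], evaluation_name: str) -> dict:
    # Same branching as the method above, with a dict as the result type.
    if not result.success:
        return {
            "name": evaluation_name,
            "is_positive": True,
            "value": "RED",
            "reason": str(result.exception),
        }
    return result.unpack()


ok = to_metric_result(Result(value={"name": "demo", "value": "GREEN"}), "demo")
failed = to_metric_result(Result(exception=TimeoutError("model timed out")), "demo")
```

Here ok is the wrapped dict unchanged, while failed is the fallback entry with the exception text as its reason.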
_show_message_assessment(evaluation_instance: Evaluation, evaluation_metric_result: EvaluationMetricResult, assistant_message_id: str)
Updates the chat interface with the evaluation results.
_create_assistant_message(evaluation_instance: Evaluation, assistant_message_id: str)
Creates a placeholder message in the chat interface while the evaluation is pending.
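The two helpers together form a placeholder-then-update pattern: a pending entry appears first and is replaced once the evaluation finishes. The sketch below illustrates that flow against an in-memory store. InMemoryChat, its method names, and the "PENDING" and "DONE" status strings are all hypothetical and stand in for whatever the chat service actually provides.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Awaitable, Callable


@dataclass
class InMemoryChat:
    """Hypothetical stand-in for the chat service; keeps assessments in a dict."""

    assessments: dict = field(default_factory=dict)

    async def create_pending(self, message_id: str, evaluation_name: str) -> None:
        self.assessments[(message_id, evaluation_name)] = {
            "status": "PENDING",
            "label": None,
        }

    async def update(self, message_id: str, evaluation_name: str, label: str) -> None:
        self.assessments[(message_id, evaluation_name)].update(
            status="DONE", label=label
        )


async def evaluate_with_placeholder(
    chat: InMemoryChat,
    message_id: str,
    evaluation_name: str,
    run: Callable[[], Awaitable[str]],
) -> str:
    # 1. Show a placeholder so the user sees that an evaluation is in progress.
    await chat.create_pending(message_id, evaluation_name)
    # 2. Run the (possibly slow) evaluation.
    label = await run()
    # 3. Replace the placeholder with the final assessment.
    await chat.update(message_id, evaluation_name, label)
    return label


demo_chat = InMemoryChat()
seen_during_run = []


async def demo_run() -> str:
    # Record what the chat shows while the evaluation is still running.
    seen_during_run.append(demo_chat.assessments[("msg-1", "demo")]["status"])
    await asyncio.sleep(0)
    return "GREEN"


demo_label = asyncio.run(
    evaluate_with_placeholder(demo_chat, "msg-1", "demo", demo_run)
)
```

While demo_run is executing, the stored entry is still pending; once it returns, the same entry carries the final label.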