

Implementation:Microsoft Semantic Kernel QualityCheck NLP Server

Knowledge Sources
Domains: Python, FastAPI, NLP_Evaluation, Quality_Metrics
Last Updated: 2026-02-11 00:00 GMT

Overview

FastAPI server that provides NLP evaluation endpoints for summarization and translation quality metrics (BERTScore, METEOR, BLEU, and COMET). It is part of the QualityCheck demo in the Semantic Kernel repository.

Description

This Python file implements a FastAPI web server that exposes four HTTP POST endpoints for computing NLP evaluation metrics. It uses the Hugging Face evaluate library for BERTScore, METEOR, and BLEU, and Unbabel's COMET library (the comet package) for COMET translation quality scores. It belongs to the QualityCheck demo sample, which shows how to integrate NLP quality metrics into Semantic Kernel workflows.

The server defines two Pydantic request models:

  • SummarizationEvaluationRequest - Accepts sources and summaries (lists of strings) for evaluating summarization quality
  • TranslationEvaluationRequest - Accepts sources and translations (lists of strings) for evaluating translation quality

Four endpoints are provided:

  • POST /bert-score/ - Computes BERTScore precision, recall, and F1 for each summary against its source text
  • POST /meteor-score/ - Computes the METEOR score for summarization evaluation
  • POST /bleu-score/ - Computes the BLEU score for summarization evaluation
  • POST /comet-score/ - Computes COMET scores with the Unbabel/wmt22-cometkiwi-da model, a reference-free quality-estimation model that needs only the sources and their translations
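The precision, recall, and F1 reported by /bert-score/ come from greedy matching of token embeddings between each summary and its source. The following is a minimal, hypothetical sketch of that matching step only. It uses tiny hand-written vectors in place of contextual embeddings (for lang="en", the bert_score package behind evaluate defaults to roberta-large) and omits IDF weighting and baseline rescaling. `cosine` and `greedy_bertscore` are illustrative names, not part of the server.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def greedy_bertscore(candidate: List[Vector], reference: List[Vector]) -> Tuple[float, float, float]:
    """Hypothetical helper: BERTScore-style P/R/F1 from per-token embeddings.

    Precision: each candidate token is matched to its most similar reference token.
    Recall: each reference token is matched to its most similar candidate token.
    """
    precision = sum(max(cosine(c, r) for r in reference) for c in candidate) / len(candidate)
    recall = sum(max(cosine(r, c) for c in candidate) for r in reference) / len(reference)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy 2-D "embeddings" (assumption: real BERTScore uses contextual model outputs).
reference = [[1.0, 0.0], [0.0, 1.0]]  # stands in for the tokens of a source text
candidate = [[1.0, 0.0]]              # stands in for a shorter summary covering one token
print(greedy_bertscore(candidate, reference))
```

Here the one-token "summary" matches the first source token exactly, so precision is 1.0, while recall is 0.5 because the second source token has no close match; F1 is their harmonic mean, about 0.667.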

Usage

The server runs as a standalone FastAPI application during the QualityCheck demo: a .NET Semantic Kernel application calls these endpoints to evaluate the quality of AI-generated summaries and translations. Start the server locally before running the demo so that the NLP scoring backend is available. It requires the Python packages fastapi, uvicorn, evaluate, and unbabel-comet (imported as comet); the metric scripts and model weights are downloaded on first use.

Code Reference

Source Location

Signature

# Copyright (c) Microsoft. All rights reserved.

from typing import List
from pydantic import BaseModel
from fastapi import FastAPI
from evaluate import load
from comet import download_model, load_from_checkpoint

app = FastAPI()

class SummarizationEvaluationRequest(BaseModel):
    sources: List[str]
    summaries: List[str]

class TranslationEvaluationRequest(BaseModel):
    sources: List[str]
    translations: List[str]

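@app.post("/bert-score/")
def bert_score(request: SummarizationEvaluationRequest):
    # The metric is loaded on every request; each summary is scored against the source at the same index.
    bertscore = load("bertscore")
    return bertscore.compute(predictions=request.summaries, references=request.sources, lang="en")

# /meteor-score/, /bleu-score/, and /comet-score/ are defined in the same file (not reproduced here).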

Import

# Run the FastAPI server using uvicorn
cd dotnet/samples/Demos/QualityCheck/python-server
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000

I/O Contract

Inputs

  • sources (List[str], required) - List of source/reference texts to compare against
  • summaries (List[str], required for summarization) - List of generated summaries to evaluate; used by /bert-score/, /meteor-score/, and /bleu-score/
  • translations (List[str], required for translation) - List of generated translations to evaluate; used by /comet-score/
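Each endpoint compares `sources[i]` with the generated text at the same index, so a client can check alignment before sending a request. This is a minimal sketch, assuming the two lists must be index-aligned (the server-side validation is not shown in the source); `build_payload` and `OUTPUT_FIELD` are hypothetical names.

```python
from typing import Dict, List

# Assumption: this mapping mirrors the I/O contract (summaries for the three
# summarization endpoints, translations for /comet-score/).
OUTPUT_FIELD = {
    "bert-score": "summaries",
    "meteor-score": "summaries",
    "bleu-score": "summaries",
    "comet-score": "translations",
}


def build_payload(endpoint: str, sources: List[str], outputs: List[str]) -> Dict[str, List[str]]:
    """Hypothetical client helper: build the JSON body for one endpoint.

    Raises ValueError for an unknown endpoint or when the lists are not
    index-aligned (one generated text per source text).
    """
    if endpoint not in OUTPUT_FIELD:
        raise ValueError(f"unknown endpoint: {endpoint}")
    if len(sources) != len(outputs):
        raise ValueError(f"{len(sources)} sources but {len(outputs)} outputs")
    return {"sources": list(sources), OUTPUT_FIELD[endpoint]: list(outputs)}


payload = build_payload("comet-score", ["The weather is nice today."], ["El clima es agradable hoy."])
print(payload)
```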

Outputs

  • /bert-score/ (object) - BERTScore results: precision, recall, and f1 lists with one value per prediction, plus a hashcode string
  • /meteor-score/ (object) - METEOR result with a single meteor float value
  • /bleu-score/ (object) - BLEU result with a bleu float, a precisions list (one value per n-gram order), a brevity_penalty float, and the length_ratio, translation_length, and reference_length fields
  • /comet-score/ (object) - COMET model prediction results with a score for each source-translation pair
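The precisions and brevity_penalty values returned by /bleu-score/ come from the standard BLEU computation. The sketch below is a simplified re-implementation for illustration only. It assumes one reference per prediction, whitespace tokenization, and no smoothing, whereas evaluate's bleu metric uses a 13a-style tokenizer by default and accepts several references per prediction. `simple_bleu` is a hypothetical name; its output keys follow the evaluate bleu result dictionary.

```python
import math
from collections import Counter
from typing import Dict, List


def ngrams(tokens: List[str], n: int) -> Counter:
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(predictions: List[str], references: List[str], max_order: int = 4) -> Dict[str, object]:
    """Hypothetical sketch of corpus BLEU: one reference per prediction,
    whitespace tokenization, no smoothing."""
    matches = [0] * max_order
    possible = [0] * max_order
    pred_len = ref_len = 0
    for pred, ref in zip(predictions, references):
        p_tok, r_tok = pred.split(), ref.split()
        pred_len += len(p_tok)
        ref_len += len(r_tok)
        for n in range(1, max_order + 1):
            overlap = ngrams(p_tok, n) & ngrams(r_tok, n)  # clipped n-gram counts
            matches[n - 1] += sum(overlap.values())
            possible[n - 1] += max(len(p_tok) - n + 1, 0)
    precisions = [m / p if p > 0 else 0.0 for m, p in zip(matches, possible)]
    # Geometric mean of the n-gram precisions (zero if any precision is zero).
    if min(precisions) > 0:
        geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_order)
    else:
        geo_mean = 0.0
    ratio = pred_len / ref_len
    # Penalize predictions that are shorter than their references.
    brevity_penalty = 1.0 if ratio > 1.0 else math.exp(1 - 1 / ratio)
    return {
        "bleu": geo_mean * brevity_penalty,
        "precisions": precisions,
        "brevity_penalty": brevity_penalty,
        "length_ratio": ratio,
        "translation_length": pred_len,
        "reference_length": ref_len,
    }


result = simple_bleu(["the cat sat on the mat"], ["the cat sat on the mat today"])
print(result["precisions"], result["brevity_penalty"], result["bleu"])
```

The prediction contains only n-grams found in the reference, so every precision is 1.0 and the score is reduced solely by the brevity penalty exp(1 - 7/6), roughly 0.846, because the prediction is one token shorter than the reference.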

Usage Examples

Calling the BERT Score Endpoint

import requests

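response = requests.post("http://localhost:8000/bert-score/", json={
    "sources": ["The cat sat on the mat."],
    "summaries": ["A cat was sitting on a mat."]
}, timeout=300)  # the first call can be slow while the model is downloaded
response.raise_for_status()
result = response.json()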
# result contains: {"precision": [...], "recall": [...], "f1": [...], "hashcode": "..."}
print(f"BERT F1 Score: {result['f1'][0]:.4f}")
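When a request contains several source/summary pairs, the /bert-score/ response holds one score per pair; the endpoint code above returns the metric output directly without aggregating it. A small sketch for averaging those lists on the client side; `summarize_bertscore` is a hypothetical helper, and a plain mean is only one possible aggregation.

```python
from statistics import mean
from typing import Any, Dict, Mapping


def summarize_bertscore(result: Mapping[str, Any]) -> Dict[str, float]:
    """Hypothetical helper: average the per-pair lists in a /bert-score/ response."""
    return {key: mean(result[key]) for key in ("precision", "recall", "f1")}


# Example response shape (values invented for illustration, not real model output).
sample = {"precision": [0.9, 0.8], "recall": [0.7, 0.9], "f1": [0.8, 0.85], "hashcode": "..."}
print(summarize_bertscore(sample))
```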

Calling the COMET Translation Score Endpoint

import requests

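response = requests.post("http://localhost:8000/comet-score/", json={
    "sources": ["The weather is nice today."],
    "translations": ["El clima es agradable hoy."]
}, timeout=300)  # the first call can be slow while the COMET checkpoint is downloaded
response.raise_for_status()
result = response.json()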
# COMET score for translation quality evaluation
print(f"COMET Score: {result}")
