Implementation:Microsoft Semantic kernel QualityCheck NLP Server

Knowledge Sources	Microsoft_Semantic_kernel
Domains	Python, FastAPI, NLP_Evaluation, Quality_Metrics
Last Updated	2026-02-11 00:00 GMT

Overview

FastAPI server providing NLP evaluation endpoints for summarization and translation quality metrics (BERT score, METEOR, BLEU, COMET), used as part of the QualityCheck demo in the Semantic Kernel repository.

Description

This Python file implements a FastAPI web server that exposes four HTTP POST endpoints for computing NLP evaluation metrics. The server uses the Hugging Face evaluate library for BERTScore, METEOR, and BLEU, and the comet library (Unbabel) for COMET translation quality scores. It is part of the QualityCheck demo sample, which shows how to integrate NLP quality metrics into Semantic Kernel workflows.

The server defines two Pydantic request models:

SummarizationEvaluationRequest - Accepts sources and summaries (lists of strings) for evaluating summarization quality
TranslationEvaluationRequest - Accepts sources and translations (lists of strings) for evaluating translation quality
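
As listed, the request models declare only field types, so nothing in them requires the two lists to have the same length. The sketch below is a hypothetical client-side helper (build_summarization_payload is not part of the server) that builds a body shaped like SummarizationEvaluationRequest and, on the assumption that each summary is scored against the source at the same index, rejects mismatched lists before any request is sent.

```python
from typing import Dict, List


def build_summarization_payload(
    sources: List[str], summaries: List[str]
) -> Dict[str, List[str]]:
    """Build a JSON-serializable body shaped like SummarizationEvaluationRequest.

    Hypothetical helper for illustration; it does not exist in the repository.
    """
    for name, values in (("sources", sources), ("summaries", summaries)):
        if not all(isinstance(value, str) for value in values):
            raise TypeError(f"{name} must contain only strings")
    # Assumption: entries are paired one-to-one by index.
    if len(sources) != len(summaries):
        raise ValueError(
            f"expected equal lengths, got {len(sources)} sources "
            f"and {len(summaries)} summaries"
        )
    return {"sources": list(sources), "summaries": list(summaries)}


payload = build_summarization_payload(
    ["The cat sat on the mat."], ["A cat was sitting on a mat."]
)
```

The same pattern would apply to TranslationEvaluationRequest with the translations field in place of summaries.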

Four endpoints are provided:

POST /bert-score/ - Computes BERTScore (precision, recall, F1) comparing summaries to source references
POST /meteor-score/ - Computes METEOR score for summarization evaluation
POST /bleu-score/ - Computes BLEU score for summarization evaluation
POST /comet-score/ - Computes COMET score using the Unbabel wmt22-cometkiwi-da model for translation quality evaluation
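
The four endpoints differ only in their path and in which field carries the generated text. A minimal sketch of that mapping, using only the standard library, is shown below. ENDPOINT_FIELDS and make_request are hypothetical names introduced for illustration, and the base URL assumes the server is listening on localhost port 8000.

```python
import json
import urllib.request
from typing import List

# Which request field carries the generated text for each endpoint,
# following the request models described above.
ENDPOINT_FIELDS = {
    "/bert-score/": "summaries",
    "/meteor-score/": "summaries",
    "/bleu-score/": "summaries",
    "/comet-score/": "translations",
}


def make_request(
    base_url: str, path: str, sources: List[str], outputs: List[str]
) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for one metric endpoint."""
    field = ENDPOINT_FIELDS[path]  # KeyError for an unknown path
    body = json.dumps({"sources": sources, field: outputs}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = make_request("http://localhost:8000", "/comet-score/", ["Hello."], ["Hola."])
# Sending is left out because it needs a running server:
# with urllib.request.urlopen(req) as resp:
#     result = json.load(resp)
```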

Usage

This server runs as a standalone FastAPI application alongside the QualityCheck demo. The demo's .NET Semantic Kernel application calls these endpoints to evaluate the quality of AI-generated summaries and translations, so developers start the server locally before running the demo to provide the NLP scoring backend. It requires the Python packages fastapi, evaluate, and comet (distributed on PyPI as unbabel-comet), an ASGI server such as uvicorn, and the underlying models, which the libraries typically download on first use.

Code Reference

Source Location

Repository: Microsoft_Semantic_kernel
File: dotnet/samples/Demos/QualityCheck/python-server/app/main.py
Lines: 1-40

Signature

# Copyright (c) Microsoft. All rights reserved.

from typing import List
from pydantic import BaseModel
from fastapi import FastAPI
from evaluate import load
from comet import download_model, load_from_checkpoint

app = FastAPI()

class SummarizationEvaluationRequest(BaseModel):
    sources: List[str]
    summaries: List[str]

class TranslationEvaluationRequest(BaseModel):
    sources: List[str]
    translations: List[str]

@app.post("/bert-score/")
def bert_score(request: SummarizationEvaluationRequest):
    bertscore = load("bertscore")
    return bertscore.compute(predictions=request.summaries, references=request.sources, lang="en")
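
The handler above calls load("bertscore") on every request. One possible variation, not taken from the repository, is to cache the loaded metric so repeated requests reuse the same object, which may avoid repeated setup work. The sketch below shows the caching pattern with a stand-in loader (fake_load) so that it runs without the evaluate package; make_cached_loader is a hypothetical helper, and in the server the real load function would be passed in instead.

```python
from functools import lru_cache
from typing import Callable, Dict


def make_cached_loader(load: Callable[[str], object]) -> Callable[[str], object]:
    """Wrap a metric loader so each metric name is loaded at most once."""

    @lru_cache(maxsize=None)
    def cached(name: str) -> object:
        return load(name)

    return cached


# Stand-in for evaluate.load, used only to make the sketch runnable.
calls: Dict[str, int] = {}


def fake_load(name: str) -> object:
    calls[name] = calls.get(name, 0) + 1
    return object()


get_metric = make_cached_loader(fake_load)
first = get_metric("bertscore")
second = get_metric("bertscore")  # served from the cache, loader not called again
```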

Import

# Run the FastAPI server using uvicorn
cd dotnet/samples/Demos/QualityCheck/python-server
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000

I/O Contract

Inputs

Name	Type	Required	Description
sources	List[str]	yes	List of source/reference texts to compare against
summaries	List[str]	yes (summarization)	List of generated summaries to evaluate (used by /bert-score/, /meteor-score/, /bleu-score/)
translations	List[str]	yes (translation)	List of generated translations to evaluate (used by /comet-score/)

Outputs

Name	Type	Description
/bert-score/ response	object	BERTScore results with precision, recall, and F1 arrays for each prediction
/meteor-score/ response	object	METEOR score result with a single meteor float value
/bleu-score/ response	object	BLEU score result with a bleu float value, a precisions array (one entry per n-gram order), and a brevity_penalty float
/comet-score/ response	object	COMET model prediction results with scores for each source-translation pair
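
To make the relationship between the BLEU fields concrete, the sketch below recombines precisions and brevity_penalty into a single score. It assumes the standard BLEU definition with uniform weights: the geometric mean of the n-gram precisions multiplied by the brevity penalty, with a score of zero when any precision is zero. The function name bleu_from_parts and the sample numbers are made up for illustration and are not output from the server.

```python
import math
from typing import Sequence


def bleu_from_parts(precisions: Sequence[float], brevity_penalty: float) -> float:
    """Recombine n-gram precisions and a brevity penalty into a BLEU value."""
    if not precisions or min(precisions) <= 0:
        # A zero precision makes the geometric mean zero.
        return 0.0
    log_mean = sum(math.log(p) for p in precisions) / len(precisions)
    return brevity_penalty * math.exp(log_mean)


# Illustrative values only.
example = {"precisions": [0.5, 0.5, 0.5, 0.5], "brevity_penalty": 1.0}
score = bleu_from_parts(example["precisions"], example["brevity_penalty"])
```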

Usage Examples

Calling the BERT Score Endpoint

import requests

response = requests.post("http://localhost:8000/bert-score/", json={
    "sources": ["The cat sat on the mat."],
    "summaries": ["A cat was sitting on a mat."]
})
response.raise_for_status()  # surface HTTP errors before parsing the body
result = response.json()
# result contains: {"precision": [...], "recall": [...], "f1": [...], "hashcode": "..."}
print(f"BERT F1 Score: {result['f1'][0]:.4f}")
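
When several summaries are scored in one request, each BERTScore field holds one value per prediction. The sketch below averages those per-prediction arrays into one number per field. It assumes the response shape shown in the comment above (precision, recall, and f1 lists plus a hashcode string); summarize_bertscore is a hypothetical helper, and the sample values are invented.

```python
from statistics import mean
from typing import Any, Dict


def summarize_bertscore(result: Dict[str, Any]) -> Dict[str, float]:
    """Average the per-prediction precision, recall, and F1 arrays."""
    return {key: mean(result[key]) for key in ("precision", "recall", "f1")}


# Invented sample in the documented response shape.
sample = {
    "precision": [0.9, 0.8],
    "recall": [0.7, 0.9],
    "f1": [0.8, 0.85],
    "hashcode": "...",
}
summary = summarize_bertscore(sample)
print(f"Mean BERT F1: {summary['f1']:.4f}")
```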

Calling the COMET Translation Score Endpoint

import requests

response = requests.post("http://localhost:8000/comet-score/", json={
    "sources": ["The weather is nice today."],
    "translations": ["El clima es agradable hoy."]
})
response.raise_for_status()  # surface HTTP errors before parsing the body
result = response.json()
# COMET score for translation quality evaluation
print(f"COMET Score: {result}")
