Principle:Mistralai Client python GCP Chat And FIM
| Knowledge Sources | |
|---|---|
| Domains | Cloud_Deployment, GCP, LLM_Inference, Code_Generation |
| Last Updated | 2026-02-15 14:00 GMT |
Overview
A cloud-specific inference pattern that provides both chat completion and fill-in-the-middle (FIM) code completion through GCP Vertex AI, with automatic URL rewriting to Vertex AI rawPredict endpoints.
Description
GCP Chat and FIM extends the standard inference patterns for models deployed on Google Cloud. In addition to the standard chat completion API (chat.complete(), chat.stream()), the GCP client provides a unique fim.complete() method for fill-in-the-middle code completion. The GoogleCloudBeforeRequestHook transparently rewrites all request URLs to the Vertex AI rawPredict format. FIM allows code completion given both preceding and following code context.
Usage
Use this principle for inference on GCP-deployed Mistral models. The chat API is identical to the standard client. FIM is unique to the GCP client and is used for code completion tasks where both prefix and suffix context are available.
Theoretical Basis
FIM (Fill-in-the-Middle) completion:
- Provides a prompt (prefix code) and suffix (code after cursor)
- The model generates code to fill the gap between prefix and suffix
- Useful for IDE integration, code suggestion, and auto-completion
- The URL rewriting transforms: /v1/fim/completions → .../{model}:rawPredict