Implementation:Run llama Llama index CohereRerankerFinetuneEngine
Overview
The CohereRerankerFinetuneEngine provides a fine-tuning engine for Cohere's reranking models. It handles data upload, custom model creation via the Cohere API, and retrieval of the fine-tuned model as a LlamaIndex CohereRerank postprocessor. This module resides in the llama-index-finetuning package under the rerankers submodule.
Source file: llama-index-finetuning/llama_index/finetuning/rerankers/cohere_reranker.py (78 lines)
Dependencies
| Dependency | Purpose |
|---|---|
| importlib.util | Checking whether the cohere package is installed |
| os | Reading the COHERE_API_KEY environment variable |
| llama_index.finetuning.types.BaseCohereRerankerFinetuningEngine | Abstract base class defining the fine-tuning engine interface |
| llama_index.postprocessor.cohere_rerank.CohereRerank | LlamaIndex Cohere reranker postprocessor returned by get_finetuned_model |
| cohere | Cohere SDK client (imported conditionally at runtime) |
| cohere.custom_model_dataset.JsonlDataset | Dataset wrapper for JSONL training/validation files (imported at runtime in finetune) |
Class: CohereRerankerFinetuneEngine
Inherits from: BaseCohereRerankerFinetuningEngine
Constructor
def __init__(
self,
train_file_name: str = "train.jsonl",
val_file_name: Optional[str] = None,
model_name: str = "exp_finetune",
model_type: str = "RERANK",
base_model: str = "english",
api_key: Optional[str] = None,
) -> None
| Parameter | Type | Default | Description |
|---|---|---|---|
| train_file_name | str | "train.jsonl" | Path to the JSONL training data file |
| val_file_name | Optional[str] | None | Optional path to a JSONL validation data file |
| model_name | str | "exp_finetune" | Name for the custom model on Cohere's platform |
| model_type | str | "RERANK" | Type of custom model to create (always "RERANK" for rerankers) |
| base_model | str | "english" | Base reranking model to fine-tune on top of |
| api_key | Optional[str] | None | Cohere API key; falls back to the COHERE_API_KEY environment variable |
Initialization behavior:
- Checks if the cohere module is available using importlib.util.find_spec("cohere"). If not installed, raises ImportError with installation instructions.
- Resolves the API key from the parameter or the COHERE_API_KEY environment variable. Raises ValueError if neither is available.
- Creates a cohere.Client with the API key and sets client_name="llama_index".
- Stores all configuration parameters as private instance attributes.
- Initializes self._finetune_model to None.
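The availability check in the first step can be sketched as follows. This is a minimal illustration rather than the engine's own code: the helper name require_package is hypothetical, and the sketch relies only on importlib.util.find_spec returning None for a top-level module that cannot be located.

```python
import importlib.util


def require_package(name: str) -> None:
    """Raise ImportError if the top-level package `name` cannot be found.

    Hypothetical helper for illustration; the engine performs the
    equivalent check inline for the "cohere" package.
    """
    # find_spec returns None when a top-level module cannot be located.
    if importlib.util.find_spec(name) is None:
        raise ImportError(
            f"Cannot import {name}. Please install the package using "
            f"`pip install {name}`."
        )


# A standard-library module is always found, so this call does not raise.
require_package("json")
```

Checking with find_spec before importing lets the error message carry installation instructions instead of a bare import failure.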
Method: finetune
def finetune(self) -> None
Launches the fine-tuning job on Cohere's platform.
Workflow:
- Imports JsonlDataset from cohere.custom_model_dataset.
- Creates a JsonlDataset instance:
  - If a validation file is provided, passes both train_file and eval_file.
  - If no validation file is provided, passes only train_file.
- Calls self._model.create_custom_model() with:
  - name -- the configured model name
  - dataset -- the JSONL dataset object
  - model_type -- the model type string (e.g., "RERANK")
  - base_model -- the base model identifier
- Stores the returned custom model object in self._finetune_model.
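The branching in this workflow can be sketched without the Cohere SDK by using stand-ins. In the sketch below, FakeDataset and FakeClient are hypothetical replacements for JsonlDataset and the Cohere client, and launch_finetune is an illustrative name; only the keyword names (train_file, eval_file, name, dataset, model_type, base_model) come from the description above.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class FakeDataset:
    """Stand-in for JsonlDataset (hypothetical, for illustration only)."""

    train_file: str
    eval_file: Optional[str] = None


class FakeClient:
    """Stand-in for the Cohere client; records the call instead of sending it."""

    def create_custom_model(self, **kwargs: Any) -> Dict[str, Any]:
        return {"id": "fake-model-id", "request": kwargs}


def launch_finetune(
    client: FakeClient,
    train_file: str,
    val_file: Optional[str] = None,
    model_name: str = "exp_finetune",
    model_type: str = "RERANK",
    base_model: str = "english",
) -> Dict[str, Any]:
    # Pass eval_file only when a validation file was configured.
    if val_file:
        dataset = FakeDataset(train_file=train_file, eval_file=val_file)
    else:
        dataset = FakeDataset(train_file=train_file)
    # Forward the four arguments listed in the workflow.
    return client.create_custom_model(
        name=model_name,
        dataset=dataset,
        model_type=model_type,
        base_model=base_model,
    )


job = launch_finetune(FakeClient(), "train.jsonl", val_file="val.jsonl")
print(job["request"]["dataset"])
```

Recording the request in a stub like this is also a convenient way to unit-test the argument wiring without uploading data.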
Method: get_finetuned_model
def get_finetuned_model(self, top_n: int = 5) -> CohereRerank
Returns a LlamaIndex CohereRerank postprocessor configured with the fine-tuned model.
| Parameter | Type | Default | Description |
|---|---|---|---|
| top_n | int | 5 | Number of top-ranked results the reranker should return |
Workflow:
- Checks that self._finetune_model is not None; raises RuntimeError if finetune() has not been called yet.
- Returns CohereRerank(model=self._finetune_model.id, top_n=top_n, api_key=self.api_key).
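Put together, a typical call sequence might look like the sketch below. It assumes that llama-index-finetuning and cohere are installed, that COHERE_API_KEY is set, and that train.jsonl and val.jsonl exist locally. The import path is derived from the source file location listed above, and the wrapper name build_finetuned_reranker is hypothetical. The import is deferred into the function so that defining it does not require the packages.

```python
def build_finetuned_reranker(top_n: int = 5):
    """Fine-tune a Cohere reranker and return it as a CohereRerank postprocessor.

    Hypothetical wrapper for illustration; calling it contacts the Cohere API.
    """
    # Deferred import: only needed when the function is actually called.
    from llama_index.finetuning.rerankers.cohere_reranker import (
        CohereRerankerFinetuneEngine,
    )

    engine = CohereRerankerFinetuneEngine(
        train_file_name="train.jsonl",
        val_file_name="val.jsonl",  # assumed to exist; omit to skip validation
        model_name="exp_finetune",
    )
    # Launches the job and stores the returned custom model object.
    engine.finetune()
    # Raises RuntimeError if finetune() has not been called first.
    return engine.get_finetuned_model(top_n=top_n)
```

Calling get_finetuned_model before finetune raises the RuntimeError described above, so the order of the two calls matters.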
Environment Variables
| Variable | Required | Default | Purpose |
|---|---|---|---|
| COHERE_API_KEY | Yes (unless api_key parameter is provided) | none | API key for Cohere authentication |
Error Handling
| Condition | Exception | Message |
|---|---|---|
| cohere package not installed | ImportError | "Cannot import cohere. Please install the package using pip install cohere." |
| No API key found | ValueError | "Must pass in cohere api key or specify via COHERE_API_KEY environment variable" |
| get_finetuned_model called before finetune | RuntimeError | "Finetuned model is not set yet. Please run the finetune method first." |
Note: The API key fallback uses os.environ["COHERE_API_KEY"] which raises KeyError if the variable is not set, but the except clause catches IndexError instead. This means a missing environment variable would raise an unhandled KeyError rather than the intended ValueError.
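The mismatch described in this note can be reproduced with the standard library alone. The sketch below is an illustration, not the library's code: resolve_key_as_written mirrors the pattern the note describes, and resolve_key_fixed is one possible fix (an assumption) that uses os.environ.get so the intended ValueError is raised.

```python
import os
from typing import Optional

ERROR_MESSAGE = (
    "Must pass in cohere api key or specify via COHERE_API_KEY environment variable"
)


def resolve_key_as_written(api_key: Optional[str] = None) -> str:
    """Mirror the described pattern: the except clause names the wrong exception."""
    try:
        return api_key or os.environ["COHERE_API_KEY"]
    except IndexError:  # never triggered: a missing variable raises KeyError
        raise ValueError(ERROR_MESSAGE)


def resolve_key_fixed(api_key: Optional[str] = None) -> str:
    """One possible fix (hypothetical, not the library's code)."""
    key = api_key or os.environ.get("COHERE_API_KEY")
    if not key:
        raise ValueError(ERROR_MESSAGE)
    return key


# Simulate a missing variable and compare the two behaviours.
os.environ.pop("COHERE_API_KEY", None)
for resolver in (resolve_key_as_written, resolve_key_fixed):
    try:
        resolver()
    except (KeyError, ValueError) as exc:
        print(resolver.__name__, "raised", type(exc).__name__)
```

Until the except clause is corrected, callers who want a uniform error can catch KeyError alongside ValueError when constructing the engine.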
See Also
- Run_llama_Llama_index_Reranker_Dataset_Gen -- Dataset generation for Cohere reranker fine-tuning
- Run_llama_Llama_index_CrossEncoderFinetuneEngine -- Cross-encoder fine-tuning engine (sentence-transformers based)