To improve the performance of an application, we need to set up an iterative loop that lets us systematically measure and improve our application's performance:
1. Set up your test-cases in a `src/evals/<test-group>/index.ts` file
2. Run an evaluation
3. Analyze results
Test-cases are defined in the `evals` directory. Each test group should have an `index.ts` file that exports a `TestDatasetFN` function. This function should return an array of test-cases.
Here’s an example of statically defining test-cases:
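A minimal sketch, assuming a test-case shape with `input`, `tags`, and `metrics` fields; the import path is a placeholder, and only `TestDatasetFN` and `containsAnyEvalMetric` come from this guide:

```typescript
// Minimal sketch: the import path and test-case fields are assumptions,
// not confirmed API; `TestDatasetFN` and `containsAnyEvalMetric` are the
// names used in this guide.
import {
  TestDatasetFN,
  containsAnyEvalMetric,
} from '@your-framework/evals'; // placeholder package path

// A TestDatasetFN returns the array of test-cases for this test group.
const testDataset: TestDatasetFN = async () => [
  {
    input: { userMessage: 'What is the capital of France?' },
    tags: { intent: 'factual-qa' },
    metrics: [
      // Passes if the response mentions any of the substrings
      // (assumed factory signature).
      containsAnyEvalMetric(['Paris']),
    ],
  },
  {
    input: { userMessage: 'Greet a new user.' },
    tags: { intent: 'greeting' },
    metrics: [containsAnyEvalMetric(['hello', 'hi', 'welcome'])],
  },
];

export default testDataset;
```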
The following evaluation metrics are provided out-of-the-box:

| Metric Name | Description |
| --- | --- |
| `containsAllEvalMetric` | Checks if the response contains all of the provided substrings |
| `containsAnyEvalMetric` | Checks if the response contains any of the provided substrings |
| `exactMatchEvalMetric` | Checks if the response exactly matches the provided string |
| `levensteinEvalMetric` | NLP similarity metric based on the Levenshtein distance |
| `rougeLCSSimilarityEvalMetric` | Similarity metric for sentence structure (Learn more) |
| `rougeNGramSimilarityEvalMetric` | Similarity metric at the word and phrase level (Learn more) |
| `rougeSkipBigramSimilarityEvalMetric` | Captures more flexible phrase structures, reflecting coherence in non-contiguous word patterns (Learn more) |
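As a rough illustration, a similarity metric might be attached to a test-case like this (the `expected` option name is an assumption about the metric factories' signatures):

```typescript
// Sketch: score a response against a reference answer. The `expected`
// option name is an assumption, not confirmed API.
import {
  levensteinEvalMetric,
  rougeLCSSimilarityEvalMetric,
} from '@your-framework/evals'; // placeholder package path

const reference = 'The capital of France is Paris.';

const metrics = [
  levensteinEvalMetric({ expected: reference }),
  rougeLCSSimilarityEvalMetric({ expected: reference }),
];
```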
Additional information about a response can be tracked through `response.metadata`. Here’s an example of tracking total cost in your `Chat` handler:
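A minimal sketch of such a handler, assuming a `{ message, metadata }` return shape and a hypothetical `callModel` helper; only `ResponseMetadataKey.TotalCost` comes from this guide:

```typescript
// Sketch of a Chat handler attaching cost metadata to its response.
// `ResponseMetadataKey.TotalCost` is the key used in this guide; the
// handler shape, `callModel`, and the pricing math are illustrative.
import { ResponseMetadataKey } from '@your-framework/evals'; // placeholder path

// Hypothetical model call; replace with your LLM client.
async function callModel(prompt: string) {
  return { text: `Echo: ${prompt}`, inputTokens: 12, outputTokens: 8 };
}

export const handler = async (input: { userMessage: string }) => {
  const result = await callModel(input.userMessage);

  // Rough cost estimate from token usage (example prices, not real ones).
  const totalCost =
    result.inputTokens * 0.000001 + result.outputTokens * 0.000002;

  return {
    message: result.text,
    metadata: {
      [ResponseMetadataKey.TotalCost]: totalCost,
    },
  };
};
```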
Available metadata keys include:

- `ResponseMetadataKey.TotalCost`
- `ResponseMetadataKey.InputTokens`
- `ResponseMetadataKey.OutputTokens`
- `ResponseMetadataKey.TotalTokens`
- Execution Time
You can also create custom metrics by defining your own `EvalMetric` object. Here’s an example of a custom metric that checks if the response length is within a specific range:
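A minimal sketch, assuming an `EvalMetric` object with an `evaluate` method that returns a numeric `score` (the real interface may differ):

```typescript
// Sketch of a custom metric. The `evaluate(response)` signature and the
// `{ score }` result shape are assumptions about the EvalMetric object.
import type { EvalMetric } from '@your-framework/evals'; // placeholder path

const responseLengthMetric = (min: number, max: number): EvalMetric => ({
  name: 'response-length-in-range',
  evaluate: async (response: { message: string }) => {
    const length = response.message.length;
    // Score 1 when the length falls inside [min, max], 0 otherwise.
    return { score: length >= min && length <= max ? 1 : 0 };
  },
});

// Usage: add to a test-case's `metrics` array.
const metric = responseLengthMetric(50, 400);
```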