llm_load_test run

Load test the wisdom model

Parameters

host

  • The host endpoint of the gRPC call

port

  • The gRPC port on the specified host

duration

  • The duration of the load test

plugin

  • The llm-load-test plugin to use (currently tgis_grpc_plugin or caikit_client_plugin)

  • default value: tgis_grpc_plugin

interface

  • The interface (http or grpc) to use, for llm-load-test plugins that support both

  • default value: grpc

model_id

  • The ID of the model to pass along with the gRPC call

  • default value: not-used

src_path

  • Path where llm-load-test has been cloned

  • default value: projects/llm_load_test/subprojects/llm-load-test/

streaming

  • Whether to stream the llm-load-test requests

  • default value: True

use_tls

  • Whether to set use_tls: True (for gRPC in Serverless mode)

concurrency

  • Number of concurrent simulated users sending requests

  • default value: 16

max_input_tokens

  • The maximum number of input tokens, used by llm-load-test to filter the dataset

  • default value: 1024

max_output_tokens

  • The maximum number of output tokens, used by llm-load-test to filter the dataset

  • default value: 512

max_sequence_tokens

  • The maximum number of total sequence tokens (input + output), used by llm-load-test to filter the dataset

  • default value: 1536
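How the three token limits combine into a dataset filter can be sketched as follows. This is an assumption inferred from the defaults (1024 + 512 = 1536), not the actual llm-load-test filtering code; the `keep_sample` helper is hypothetical.

```python
# Hedged sketch of how the three token-limit parameters plausibly
# interact when filtering the dataset. Assumed, not taken from the
# llm-load-test source.

MAX_INPUT_TOKENS = 1024      # default of max_input_tokens
MAX_OUTPUT_TOKENS = 512      # default of max_output_tokens
MAX_SEQUENCE_TOKENS = 1536   # default of max_sequence_tokens

def keep_sample(input_tokens: int, output_tokens: int) -> bool:
    """Return True if a dataset sample fits within all three limits."""
    return (
        input_tokens <= MAX_INPUT_TOKENS
        and output_tokens <= MAX_OUTPUT_TOKENS
        and input_tokens + output_tokens <= MAX_SEQUENCE_TOKENS
    )

print(keep_sample(800, 400))   # within every limit
print(keep_sample(1100, 200))  # rejected: input exceeds max_input_tokens
```

Note that with the defaults, a sample at both individual limits (1024 input, 512 output) exactly fills the sequence budget of 1536 tokens.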

endpoint

  • Name of the endpoint to query (for openai plugin only)

  • default value: /v1/completions
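A minimal sketch of how the parameters above could be assembled into a command line. The `./run_toolbox.py` entry point and the `--key=value` flag spelling are illustrative assumptions; only the parameter names and default values come from this page.

```python
# Hedged sketch: merge user-supplied values over the documented defaults
# and render them as command-line flags. The entry-point name is an
# assumption for illustration.

DEFAULTS = {
    "plugin": "tgis_grpc_plugin",
    "interface": "grpc",
    "model_id": "not-used",
    "src_path": "projects/llm_load_test/subprojects/llm-load-test/",
    "streaming": True,
    "concurrency": 16,
    "max_input_tokens": 1024,
    "max_output_tokens": 512,
    "max_sequence_tokens": 1536,
    "endpoint": "/v1/completions",
}

def build_command(host: str, port: int, duration: str, **overrides) -> list:
    """Return an argv list combining required args, defaults, and overrides."""
    params = {**DEFAULTS, "host": host, "port": port, "duration": duration}
    params.update(overrides)
    cmd = ["./run_toolbox.py", "llm_load_test", "run"]
    cmd += [f"--{key}={value}" for key, value in sorted(params.items())]
    return cmd

print(" ".join(build_command("model.example.com", 443, "60s", use_tls=True)))
```

Overrides such as `use_tls=True` or `concurrency=32` take precedence over the documented defaults, while `host`, `port`, and `duration` are always required.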