Documentation AIME API Benchmark

class run_api_benchmark.BenchmarkApiEndpoint

Benchmark tool that measures and monitors GPU performance by sending multiple asynchronous requests to the llama2_chat and stable_diffusion_xl_txt2img endpoints.

load_flags()

Parsing the command line arguments.

Returns:

The argparse object containing the command line arguments

Return type:

argparse.Namespace
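
As a rough illustration of what a method like load_flags() returns, the following sketch builds an argparse.Namespace from a small set of flags. The flag names (--endpoint_name, --concurrent_requests, --total_requests) and their defaults are assumptions made for this example and are not taken from the benchmark tool itself.

```python
import argparse


def load_flags(argv=None):
    """Parse a hypothetical set of benchmark flags (names are assumptions)."""
    parser = argparse.ArgumentParser(description="API benchmark sketch")
    # Assumed flag: which endpoint to benchmark.
    parser.add_argument("--endpoint_name", default="llama2_chat")
    # Assumed flag: how many requests may run at the same time.
    parser.add_argument("--concurrent_requests", type=int, default=4)
    # Assumed flag: how many requests are sent in total.
    parser.add_argument("--total_requests", type=int, default=16)
    # Passing argv=None makes argparse read sys.argv[1:].
    return parser.parse_args(argv)


args = load_flags(["--concurrent_requests", "8"])
print(type(args).__name__, args.endpoint_name, args.concurrent_requests)
```

Passing an explicit argument list, as done here, keeps the sketch testable without touching the real command line.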

run()

Starting the benchmark.

async progress_callback(progress_info, progress_data)

Called when job progress is received from the API server. Initializes or updates the job-related progress bar, updates the title, and counts the currently running jobs.

Parameters:
  • progress_info (dict) – Job progress information containing the job_id and the progress state, such as the number of tokens generated so far or the percentage completed.

  • progress_data (dict) – The content generated so far, such as tokens or interim images.

async result_callback(result)

Called when the final job result is received. Removes the job-related progress bar, processes information about the server and the worker, and updates the title.

Parameters:

result (dict) – The final job result, such as generated text, audio, or images.
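
The interplay of the progress and result callbacks can be pictured with a minimal sketch that tracks running jobs in a dictionary instead of drawing progress bars. The class ProgressTracker, the "progress" key, and the presence of a job_id in the result dict are assumptions made for illustration; only the job_id in progress_info is documented above.

```python
import asyncio


class ProgressTracker:
    """Hypothetical stand-in for the benchmark's progress bookkeeping."""

    def __init__(self):
        # Maps job_id -> latest reported progress (stands in for a progress bar).
        self.progress_by_job = {}

    async def progress_callback(self, progress_info, progress_data):
        # Create or update the entry for this job.
        job_id = progress_info["job_id"]
        self.progress_by_job[job_id] = progress_info.get("progress", 0)  # assumed key

    async def result_callback(self, result):
        # Remove the entry once the final result has arrived.
        self.progress_by_job.pop(result.get("job_id"), None)  # assumed key

    @property
    def num_running_jobs(self):
        return len(self.progress_by_job)


async def demo():
    tracker = ProgressTracker()
    await tracker.progress_callback({"job_id": "a", "progress": 10}, {})
    await tracker.progress_callback({"job_id": "b", "progress": 3}, {})
    await tracker.progress_callback({"job_id": "a", "progress": 25}, {})
    running_before = tracker.num_running_jobs
    await tracker.result_callback({"job_id": "a"})
    return running_before, tracker.num_running_jobs, tracker.progress_by_job


running_before, running_after, remaining = asyncio.run(demo())
print(running_before, running_after, remaining)
```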

print_benchmark_summary_string()

Printing the benchmark summary and the results.

async handle_first_batch(progress_info)

Detecting the jobs of the first batch to exclude them from the benchmark results.

Parameters:

progress_info (dict) – Job progress information containing the job_id and the progress state, such as the number of tokens generated so far or the percentage completed.

update_title(result=None)

Updating the title bars in the header with information about the benchmark.

update_worker_and_endpoint_data_in_title(result={})

Updating the title bars with information about the API server and the workers.

make_benchmark_result_string()

Making a string containing the mean benchmark results.

Returns:

Result string

Return type:

str
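
To make the idea of a mean result string concrete, here is a small sketch that averages per-job measurements and formats them. The function signature, the chosen metrics, and the output format are assumptions for this example; the actual method takes no arguments and its string layout is not documented here.

```python
from statistics import mean


def make_result_string(durations_s, units_per_job, unit):
    """Format mean job duration and mean output size (hypothetical layout)."""
    mean_duration = mean(durations_s)
    mean_units = mean(units_per_job)
    return (
        f"Mean job duration: {mean_duration:.2f} s, "
        f"mean {unit} per job: {mean_units:.1f}"
    )


summary = make_result_string([2.0, 4.0], [100, 200], "tokens")
print(summary)
```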

print_start_message()

Printing benchmark parameters at the start.

get_default_values_from_config()

Parsing the default job parameters from the related endpoint config file.

Returns:

Job parameters for the API request.

Return type:

dict

async do_request_with_semaphore()

Limiting the concurrent requests using asyncio.Semaphore().
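
The pattern of limiting concurrency with asyncio.Semaphore can be sketched as follows. The real method takes no arguments; here the semaphore, a job id, and a stats dictionary are passed in explicitly, and asyncio.sleep() stands in for the API request. All of these are assumptions for the sake of a self-contained example.

```python
import asyncio


async def do_request_with_semaphore(semaphore, job_id, stats):
    """Run one simulated request while holding a semaphore slot."""
    async with semaphore:  # waits here if all slots are taken
        stats["running"] += 1
        stats["peak"] = max(stats["peak"], stats["running"])
        await asyncio.sleep(0.01)  # stand-in for the actual API request
        stats["running"] -= 1
        return job_id


async def main(total_requests=10, concurrent_requests=3):
    semaphore = asyncio.Semaphore(concurrent_requests)
    stats = {"running": 0, "peak": 0}
    # All tasks are created at once; the semaphore decides how many proceed.
    results = await asyncio.gather(
        *(do_request_with_semaphore(semaphore, i, stats) for i in range(total_requests))
    )
    return results, stats["peak"]


results, peak = asyncio.run(main())
print(results, peak)
```

Creating all tasks up front and letting the semaphore gate them keeps the scheduling logic simple: the number of requests in flight never exceeds the value the semaphore was created with.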

static get_unit(args)

Getting the unit of the generated objects, such as ‘tokens’ for llama2_chat and ‘images’ for image generators.

Returns:

The unit string of the generated objects

Return type:

str
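
A minimal sketch of such a unit lookup is shown below. It assumes that args carries the endpoint in an attribute named endpoint_name, which is a guess made for this example, and it treats every endpoint other than llama2_chat as an image generator.

```python
from types import SimpleNamespace


def get_unit(args):
    """Return the unit of the generated objects for the chosen endpoint."""
    # Assumption: args.endpoint_name holds the endpoint; not confirmed by the docs.
    return "tokens" if args.endpoint_name == "llama2_chat" else "images"


unit_llm = get_unit(SimpleNamespace(endpoint_name="llama2_chat"))
unit_img = get_unit(SimpleNamespace(endpoint_name="stable_diffusion_xl_txt2img"))
print(unit_llm, unit_img)
```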