Toolbox Documentation
cluster
Commands relating to cluster scaling, upgrading and environment capture
- build_push_image Build and publish an image to quay using either a Dockerfile or git repo. 
- capture_environment Captures the cluster environment 
- create_htpasswd_adminuser Create an htpasswd admin user. 
- create_osd Create an OpenShift Dedicated cluster. 
- deploy_operator Deploy an operator from OperatorHub catalog entry. 
- destroy_ocp Destroy an OpenShift cluster 
- destroy_osd Destroy an OpenShift Dedicated cluster. 
- dump_prometheus_db Dump Prometheus database into a file 
- fill_workernodes Fills the worker nodes with place-holder Pods with the maximum available amount of a given resource name. 
- preload_image Preload a container image on all the nodes of a cluster. 
- query_prometheus_db Query Prometheus with a list of PromQueries read from a file 
- reset_prometheus_db Resets Prometheus database, by destroying its Pod 
- set_project_annotation Set an annotation on a given project, or for any new projects. 
- set_scale Ensures that the cluster has exactly `scale` nodes of instance type `instance_type` 
- update_pods_per_node Update the maximum number of Pods per Node and Pods per Core. See also: https://docs.openshift.com/container-platform/4.14/nodes/nodes/nodes-nodes-managing-max-pods.html 
- upgrade_to_image Upgrades the cluster to the given image 
- wait_fully_awake Waits for the cluster to be fully awake after Hive restart 
configure
Commands relating to TOPSAIL testing configuration
container_bench
Commands relating to the performance evaluation
- capture_container_engine_info Captures the info of the container engine 
- capture_system_state Captures the state of the remote Mac system 
- exec_benchmark Runs the exec benchmark with the given runtime 
- helloworld_benchmark Runs the helloworld benchmark with the given runtime 
- image_build_large_build_context_benchmark Runs the image build large build context benchmark with the given runtime 
- prepare_benchmark_script_on_remote Prepares the benchmark script on the remote machine 
crc
Commands relating to CRC
- refresh_image Update a CRC AMI image with a given SNC repo commit 
fine_tuning
Commands relating to RHOAI scheduler testing
- ray_fine_tuning_job Run a simple Ray fine-tuning Job. 
- run_fine_tuning_job Run a simple fine-tuning Job. 
run
Run `topsail` toolbox commands from a single config file.
gpu_operator
Commands for deploying, building and testing the GPU operator in various ways
- capture_deployment_state Captures the GPU operator deployment state 
- deploy_cluster_policy Creates the ClusterPolicy from the OLM ClusterServiceVersion 
- deploy_from_bundle Deploys the GPU Operator from a bundle 
- deploy_from_operatorhub Deploys the GPU operator from OperatorHub 
- enable_time_sharing Enable time-sharing in the GPU Operator ClusterPolicy 
- extend_metrics Enable time-sharing in the GPU Operator ClusterPolicy 
- get_csv_version Get the version of the GPU Operator currently installed from OLM. Stores the version in the ‘ARTIFACT_EXTRA_LOGS_DIR’ artifacts directory. 
- run_gpu_burn Runs the GPU burn on the cluster 
- undeploy_from_operatorhub Undeploys a GPU-operator that was deployed from OperatorHub 
- wait_deployment Waits for the GPU operator to deploy 
- wait_stack_deployed Waits for the GPU Operator stack to be deployed on the GPU nodes 
jump_ci
Commands to run TOPSAIL scripts in a jump host
- ensure_lock Ensure that the cluster lock with a given name is taken. Fails otherwise. 
- prepare_step Prepares the jump host for running a CI test step: 
- prepare_topsail Prepares the jump host for running TOPSAIL: clones the TOPSAIL repository and builds the TOPSAIL image on the remote host 
- release_lock Release a cluster lock with a given name on a remote node 
- retrieve_artifacts Prepares the jump host for running a CI test step: 
- take_lock Take a lock with a given cluster name on a remote node 
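The take/ensure/release semantics described above can be illustrated with a minimal sketch. It assumes a lock is represented by a directory named after the cluster; the function names mirror the commands, but the `lock_root` parameter and the directory-based mechanism are hypothetical and are not taken from the TOPSAIL implementation.

```python
import tempfile
from pathlib import Path


def take_lock(lock_root: Path, cluster: str) -> None:
    """Take the lock named `cluster`; raises FileExistsError if already taken."""
    # Creating a directory is atomic: only one caller can succeed.
    (lock_root / cluster).mkdir(parents=False, exist_ok=False)


def ensure_lock(lock_root: Path, cluster: str) -> None:
    """Fail unless the lock named `cluster` is currently taken."""
    if not (lock_root / cluster).is_dir():
        raise RuntimeError(f"lock '{cluster}' is not taken")


def release_lock(lock_root: Path, cluster: str) -> None:
    """Release the lock named `cluster`; fails if it was not taken."""
    ensure_lock(lock_root, cluster)
    (lock_root / cluster).rmdir()


if __name__ == "__main__":
    # Hypothetical lock directory, created only for this demonstration.
    root = Path(tempfile.mkdtemp())
    take_lock(root, "my-cluster")
    ensure_lock(root, "my-cluster")   # passes: the lock is held
    release_lock(root, "my-cluster")
```

Relying on an atomic filesystem operation avoids a separate check-then-create step, which could let two callers take the same lock at once.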
kserve
Commands relating to RHOAI KServe component
- capture_operators_state Captures the state of the operators of the KServe serving stack 
- capture_state Captures the state of the KServe stack in a given namespace 
- deploy_model Deploy a KServe model 
- extract_protos Extracts the protos of an inference service 
- extract_protos_grpcurl Extracts the protos of an inference service, with GRPCurl observe 
- undeploy_model Undeploy a KServe model 
- validate_model Validate the proper deployment of a KServe model 
llm_load_test
Commands relating to llm-load-test
- run Load test the wisdom model 
local_ci
Commands to run the CI scripts in a container environment similar to the one used by the CI
mac_ai
Commands relating to the macOS AI performance evaluation
- remote_build_virglrenderer Builds the Virglrenderer library 
- remote_capture_cpu_ram_usage Captures the CPU and RAM usage on macOS 
- remote_capture_power_usage Captures the power usage on macOS 
- remote_capture_system_state Captures the state of the remote Mac system 
- remote_capture_virtgpu_memory Captures the virt-gpu memory usage 
- remote_llama_cpp_pull_model Pulls a model with llama-cpp, on a remote host 
- remote_llama_cpp_run_bench Benchmark a model with llama_cpp, on a remote host 
- remote_llama_cpp_run_model Runs a model with llama_cpp, on a remote host 
- remote_ollama_pull_model Pulls a model with ollama, on a remote host 
- remote_ollama_run_model Runs a model with ollama, on a remote host 
- remote_ollama_start Starts ollama, on a remote host 
- remote_ramalama_run_bench Benchmark a model with ramalama, on a remote host 
- remote_ramalama_run_model Runs a model with ramalama, on a remote host 
nfd
Commands for NFD related tasks
- has_gpu_nodes Checks if the cluster has GPU nodes 
- has_labels Checks if the cluster has NFD labels 
- wait_gpu_nodes Wait until NFD finds GPU nodes 
- wait_labels Wait until NFD labels the nodes 
nfd_operator
Commands for deploying, building and testing the NFD operator in various ways
- deploy_from_operatorhub Deploys the NFD Operator from OperatorHub 
- undeploy_from_operatorhub Undeploys an NFD-operator that was deployed from OperatorHub 
pipelines
Commands relating to RHODS
- capture_notebooks_state Capture information about the cluster and the RHODS notebooks deployment 
- capture_state Captures the state of a Data Science Pipeline Application in a given namespace. 
- deploy_application Deploy a Data Science Pipeline Application in a given namespace. 
- run_kfp_notebook Run a notebook in a given notebook image. 
remote
Commands relating to the setup of remote hosts
- build_image Builds a podman image 
- clone Clones a GitHub repository on a remote host 
- download Downloads a file on a remote host 
- retrieve Retrieves remote files locally 
repo
Commands to perform consistency validations on this repo itself
- generate_ansible_default_settings Generate the defaults/main/config.yml file of the Ansible roles, based on the Python definition. 
- generate_middleware_ci_secret_boilerplate Generate the boilerplate code to include a new secret in the Middleware CI configuration 
- generate_toolbox_related_files Generate the rst document and Ansible default settings, based on the Toolbox Python definition. 
- generate_toolbox_rst_documentation Generate the doc/toolbox.generated/*.rst file, based on the Toolbox Python definition. 
- send_cpt_notification Send a CPT notification to Slack about the completion of a CPT job. 
- send_job_completion_notification Send a job completion notification to GitHub and/or Slack about the completion of a test job. 
- validate_no_broken_link Ensure that all the symlinks point to a file 
- validate_no_wip Ensures that none of the commits have the WIP flag in their message title. 
- validate_role_files Ensures that all the Ansible variables defining a filepath (project/*/toolbox/) do point to an existing file. 
- validate_role_vars_used Ensure that all the Ansible variables defined are actually used in their role (with an exception for symlinks) 
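As an illustration of the kind of check `validate_no_broken_link` describes, the sketch below walks a directory tree and reports symlinks whose target does not exist. The function name `find_broken_symlinks` is hypothetical and this is not the repository's actual implementation; it also treats a symlink to an existing directory as valid, which is an assumption.

```python
import os
import tempfile
from pathlib import Path


def find_broken_symlinks(root: Path) -> list[Path]:
    """Return the symlinks under `root` whose target does not exist."""
    broken = []
    # os.walk does not follow symlinked directories by default.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = Path(dirpath) / name
            # exists() follows the link, so it is False for a dangling symlink.
            if path.is_symlink() and not path.exists():
                broken.append(path)
    return sorted(broken)


if __name__ == "__main__":
    # Throwaway tree with one valid and one dangling symlink.
    root = Path(tempfile.mkdtemp())
    (root / "target.txt").write_text("content")
    os.symlink(root / "target.txt", root / "good_link")
    os.symlink(root / "missing.txt", root / "bad_link")
    for link in find_broken_symlinks(root):
        print(f"broken symlink: {link}")
```

A validation command built on such a helper would typically exit with a non-zero status when the returned list is not empty.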
rhods
Commands relating to RHODS
- capture_state Captures the state of the RHOAI deployment 
- delete_ods Forces ODS operator deletion 
- deploy_addon Installs the RHODS OCM addon 
- deploy_ods Deploy ODS operator from its custom catalog 
- dump_prometheus_db Dump Prometheus database into a file 
- reset_prometheus_db Resets RHODS Prometheus database, by destroying its Pod. 
- undeploy_ods Undeploy ODS operator 
- update_datasciencecluster Update RHOAI datasciencecluster resource 
- wait_odh Wait for ODH to finish its deployment 
- wait_ods Wait for ODS to finish its deployment 
server
Commands relating to the deployment of servers on OpenShift
- deploy_ldap Deploy OpenLDAP and LDAP OAuth 
- deploy_minio_s3_server Deploy Minio S3 server 
- deploy_nginx_server Deploy an NGINX HTTP server 
- deploy_opensearch Deploy OpenSearch and OpenSearch-Dashboards 
- deploy_redis_server Deploy a redis server 
- undeploy_ldap Undeploy OpenLDAP and LDAP OAuth 
storage
Commands relating to OpenShift file storage
- deploy_aws_efs Deploy AWS EFS CSI driver and configure AWS accordingly. 
- deploy_nfs_provisioner Deploy NFS Provisioner 
- download_to_image Downloads a dataset into an image in the internal registry 
- download_to_pvc Downloads a dataset into a PVC of the cluster