Toolbox Documentation
busy_cluster
Commands for making a cluster busy with a lot of resources
cleanup Cleans up namespaces to make a cluster un-busy
create_configmaps Creates configmaps and secrets to make a cluster busy
create_deployments Creates deployments to make a cluster busy
create_jobs Creates jobs to make a cluster busy
create_namespaces Creates namespaces to make a cluster busy
status Shows the busyness of the cluster
cluster
Commands relating to cluster scaling, upgrading and environment capture
build_push_image Build and publish an image to Quay using either a Dockerfile or a git repo.
capture_environment Captures the cluster environment
create_htpasswd_adminuser Create an htpasswd admin user.
create_osd Create an OpenShift Dedicated cluster.
deploy_operator Deploy an operator from OperatorHub catalog entry.
destroy_ocp Destroy an OpenShift cluster
destroy_osd Destroy an OpenShift Dedicated cluster.
dump_prometheus_db Dump Prometheus database into a file
fill_workernodes Fills the worker nodes with place-holder Pods with the maximum available amount of a given resource name.
preload_image Preload a container image on all the nodes of a cluster.
query_prometheus_db Query Prometheus with a list of PromQueries read in a file
reset_prometheus_db Resets Prometheus database, by destroying its Pod
set_project_annotation Set an annotation on a given project, or for any new projects.
set_scale Ensures that the cluster has exactly `scale` nodes with instance_type `instance_type`
update_pods_per_node Update the maximum number of Pods per Node and Pods per Core. See also: https://docs.openshift.com/container-platform/4.14/nodes/nodes/nodes-nodes-managing-max-pods.html
upgrade_to_image Upgrades the cluster to the given image
wait_fully_awake Waits for the cluster to be fully awake after Hive restart
configure
Commands relating to TOPSAIL testing configuration
cpt
Commands relating to continuous performance testing management
deploy_cpt_dashboard Deploy and configure the CPT Dashboard
fine_tuning
Commands relating to RHOAI fine-tuning testing
ray_fine_tuning_job Run a simple Ray fine-tuning Job.
run_fine_tuning_job Run a simple fine-tuning Job.
run_quality_evaluation Run a simple quality evaluation Job.
run
Run `topsail` toolbox commands from a single config file.
gpu_operator
Commands for deploying, building and testing the GPU operator in various ways
capture_deployment_state Captures the GPU operator deployment state
deploy_cluster_policy Creates the ClusterPolicy from the OLM ClusterServiceVersion
deploy_from_bundle Deploys the GPU Operator from a bundle
deploy_from_operatorhub Deploys the GPU operator from OperatorHub
enable_time_sharing Enable time-sharing in the GPU Operator ClusterPolicy
extend_metrics Extends the GPU Operator metrics
get_csv_version Get the version of the GPU Operator currently installed from OLM. Stores the version in the ‘ARTIFACT_EXTRA_LOGS_DIR’ artifacts directory.
run_gpu_burn Runs the GPU burn on the cluster
undeploy_from_operatorhub Undeploys a GPU-operator that was deployed from OperatorHub
wait_deployment Waits for the GPU operator to deploy
wait_stack_deployed Waits for the GPU Operator stack to be deployed on the GPU nodes
jump_ci
Commands to run TOPSAIL scripts in a jump host
ensure_lock Ensure that cluster lock with a given name is taken. Fails otherwise.
prepare_step Prepares the jump host for running a CI test step.
prepare_topsail Prepares the jump host for running TOPSAIL: clones the TOPSAIL repository and builds the TOPSAIL image on the remote host.
release_lock Release a cluster lock with a given name on a remote node
retrieve_artifacts Retrieves the artifacts of a CI test step from the jump host
take_lock Take a lock with a given cluster name on a remote node
kepler
Commands relating to kepler deployment
deploy_kepler Deploy the Kepler operator and monitor to track energy consumption
undeploy_kepler Cleanup the Kepler operator and associated resources
kserve
Commands relating to RHOAI KServe component
capture_operators_state Captures the state of the operators of the KServe serving stack
capture_state Captures the state of the KServe stack in a given namespace
deploy_model Deploy a KServe model
extract_protos Extracts the protos of an inference service
extract_protos_grpcurl Extracts the protos of an inference service, with GRPCurl observe
undeploy_model Undeploy a KServe model
validate_model Validate the proper deployment of a KServe model
kubemark
Commands relating to kubemark deployment
deploy_capi_provider Deploy the Kubemark Cluster-API provider
deploy_nodes Deploy a set of Kubemark nodes
kwok
Commands relating to KWOK deployment
deploy_kwok_controller Deploy the KWOK hollow node provider
set_scale Deploy a set of KWOK nodes
llm_load_test
Commands relating to llm-load-test
run Load test the wisdom model
local_ci
Commands to run the CI scripts in a container environment similar to the one used by the CI
nfd
Commands for NFD related tasks
has_gpu_nodes Checks if the cluster has GPU nodes
has_labels Checks if the cluster has NFD labels
wait_gpu_nodes Wait until NFD finds GPU nodes
wait_labels Wait until NFD labels the nodes
nfd_operator
Commands for deploying, building and testing the NFD operator in various ways
deploy_from_operatorhub Deploys the NFD Operator from OperatorHub
undeploy_from_operatorhub Undeploys an NFD-operator that was deployed from OperatorHub
notebooks
Commands relating to RHOAI Notebooks
benchmark_performance Benchmark the performance of a notebook image.
capture_state Capture information about the cluster and the RHODS notebooks deployment
cleanup Clean up the resources created along with the notebooks, during the scale tests.
dashboard_scale_test End-to-end scale testing of the RHOAI dashboard, at user level.
locust_scale_test End-to-end testing of RHOAI notebooks at scale, at API level
ods_ci_scale_test End-to-end scale testing of RHOAI notebooks, at user level.
pipelines
Commands relating to RHODS
capture_state Captures the state of a Data Science Pipeline Application in a given namespace.
deploy_application Deploy a Data Science Pipeline Application in a given namespace.
run_kfp_notebook Run a notebook in a given notebook image.
repo
Commands to perform consistency validations on this repo itself
generate_ansible_default_settings Generate the defaults/main/config.yml file of the Ansible roles, based on the Python definition.
generate_middleware_ci_secret_boilerplate Generate the boilerplate code to include a new secret in the Middleware CI configuration
generate_toolbox_related_files Generate the rst document and Ansible default settings, based on the Toolbox Python definition.
generate_toolbox_rst_documentation Generate the doc/toolbox.generated/*.rst file, based on the Toolbox Python definition.
send_job_completion_notification Send a notification to GitHub and/or Slack about the completion of a test job.
validate_no_broken_link Ensure that all the symlinks point to a file
validate_no_wip Ensures that none of the commits have the WIP flag in their message title.
validate_role_files Ensures that all the Ansible variables defining a filepath (project/*/toolbox/) do point to an existing file.
validate_role_vars_used Ensure that all the Ansible variables defined are actually used in their role (with an exception for symlinks)
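The two simplest checks in this group, validate_no_broken_link and validate_no_wip, can be illustrated with a minimal Python sketch. This is not the TOPSAIL implementation: the helper names `find_broken_symlinks` and `find_wip_titles` are hypothetical, and the WIP rule (a standalone, case-insensitive "WIP" word in the commit title) is an assumption made for illustration.

```python
import os
import re
from pathlib import Path

# Assumption: a commit is flagged when its title contains "WIP" as a
# standalone word, in any letter case. The real rule may differ.
WIP_PATTERN = re.compile(r"\bWIP\b", re.IGNORECASE)


def find_broken_symlinks(root):
    """Return the symlinks under `root` whose target does not exist."""
    broken = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = Path(dirpath) / name
            # Path.exists() follows symlinks, so it is False for a dangling link.
            if path.is_symlink() and not path.exists():
                broken.append(path)
    return sorted(broken)


def find_wip_titles(titles):
    """Return the commit titles that carry a WIP marker."""
    return [title for title in titles if WIP_PATTERN.search(title)]
```

A validation built on these helpers would fail when either function returns a non-empty list, and would print the offending paths or titles.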
rhods
Commands relating to RHODS
capture_state Captures the state of the RHOAI deployment
delete_ods Forces ODS operator deletion
deploy_addon Installs the RHODS OCM addon
deploy_ods Deploy ODS operator from its custom catalog
dump_prometheus_db Dump Prometheus database into a file
reset_prometheus_db Resets RHODS Prometheus database, by destroying its Pod.
undeploy_ods Undeploy ODS operator
update_datasciencecluster Update RHOAI datasciencecluster resource
wait_odh Wait for ODH to finish its deployment
wait_ods Wait for ODS to finish its deployment
scheduler
Commands relating to RHOAI scheduler testing
cleanup Clean up the scheduler load namespace
create_mcad_canary Create a canary for MCAD Appwrappers and track the time it takes to be scheduled
deploy_mcad_from_helm Deploys MCAD from Helm
generate_load Generate scheduler load
server
Commands relating to the deployment of servers on OpenShift
deploy_ldap Deploy OpenLDAP and LDAP OAuth
deploy_minio_s3_server Deploy MinIO S3 server
deploy_nginx_server Deploy an NGINX HTTP server
deploy_opensearch Deploy OpenSearch and OpenSearch-Dashboards
deploy_redis_server Deploy a Redis server
undeploy_ldap Undeploy OpenLDAP and LDAP OAuth
storage
Commands relating to OpenShift file storage
deploy_aws_efs Deploy AWS EFS CSI driver and configure AWS accordingly.
deploy_nfs_provisioner Deploy NFS Provisioner
download_to_image Downloads a dataset into an image in the internal registry
download_to_pvc Downloads a dataset into a PVC of the cluster