Contributing¶
First off, thanks for taking the time to contribute!
The following is a set of guidelines for contributing to Red Hat
OpenShift PSAP ci-artifacts. These are mostly guidelines, not
rules. Use your best judgment, and feel free to propose changes to
this document in a pull request.
—
The primary goal of the repository is to host the tools required for the nightly testing of the OpenShift operators under Red Hat PSAP team responsibility, and in particular, NVIDIA GPU Operator and the Special Resource Operator (SRO).
The secondary goal of the repository is to offer a toolbox for interacting with our operators, and configuring the cluster as required.
Pull Request Guidelines¶
Pull Requests (PRs) need to be
/approveand reviewed/lgtmby PSAP team members before being merged.PRs should have a proper description explaining the problem being solved, or the new feature being introduced.
PRs introducing or modifying
toolboxcommands should include a documentation commit, so thatdocsis kept up-to-date.
Review Guidelines¶
Reviews can be performed by anyone interested in the good health of the repository; but approval and/or
/lgtmis reserved to PSAP team members at the moment.Reviewers should ensure that the relevant testing (only
/test gpu-operator-e2eat the moment) has been successfully executing before the PR can be merged.In order to save unnecessary AWS cloud time, the testing is not automatically executed by Prow; it must be manually triggered.
OpenShift GitHub Botwill not merge a PR when thegpu-operator-e2etest failed, but it will merged it if it was never executed (or if it completed successfully, of course)
Style Guidelines¶
YAML style¶
Align nested lists with their parent’s label
- block:
- name: ...
block:
- name: ...
YAML files use the .yml extension
Ansible style¶
We strive to follow Ansible best practices in the different playbooks.
This command is executed as a GitHub-Action hook on all the new PRs, to help keeping a consistent code style:
ansible-lint -v --force-color -c config/ansible-lint.yml playbooks roles
Try to avoid using
shelltasks as much as possibleMake sure that
set -o pipefail;is part of the shell command whenever a|is involved (ansible-lintforgets some of them)Redirection into a
{{ artifact_extra_logs_dir }}file is a common exception
Use inline stanza for
debugandfailtasks, eg:
- name: The GFD did not label the nodes
fail: msg="The GFD did not label the nodes"
Coding guidelines¶
Keep the main log file clean when everything goes right, and store all the relevant information in the
{{ artifact_extra_logs_dir }}directory, eg:
- name: Inspect the Subscriptions status (debug)
shell:
oc describe subscriptions.operators.coreos.com/gpu-operator-certified
-n openshift-operators
> {{ artifact_extra_logs_dir }}/gpu_operator_Subscription.log
failed_when: false
Include troubleshooting inspection commands whenever possible/relevant (see above for an example)
mark them as
failed_when: falseto ensure that their execution doesn’t affect the testingadd
(debug)in the task name to make it clear that the command is not part of the proper testing.
Use
ignore_errors: trueonly for tracking known failures.use
failed_when: falseto ignore the task return codebut whenever possible, write tasks that do not fail, eg:
oc delete --ignore-not-found=true $MY_RESOURCE
Try to group related modifications in a dedicated commit, and stack commits in logical order (eg, 1/ add role, 2/ add toolbox script 3/ integrate the toolbox scrip in the nightly CI)
Commits are not squashed, so please avoid commits “fixing” another commit of the PR.
Hints: git revise
use
git revise <commit>to modify an older commit (not older thatmaster;-)use
git revise --cut <commit>to split a commit in two logical commitsor simply use
git commit --amendto modify the most recent commit
Getting Started¶
Duplicate the
templaterole to prepare the skeleton the new roleThe
gpu_operator_run_gpu-burnrole can be studied an example of a standalone role & toolbox script. New features should follow a similar model:
roles/gpu_operator_run_gpu-burn
Define the tasks of the new role:
├── tasks
│ └── main.yml
Define the role dependencies (at least
check_deps):
├── meta
│ └── main.yml
Define the role configuration variables and their default values:
├── defaults
│ └── main
│ └── config.yml
Define the script constant variables
├── files
│ ├── gpu_burn_cm_entrypoint.yml
│ └── gpu_burn_pod.yml
└── vars
└── main
└── resources.yml
Add a toolbox script entrypoint setting the role configuration variables
toolbox/gpu-operator/
└── run_gpu_burn.sh
If relevant, call the toolbox script from the right nightly CI entrypoint:
# in build/root/usr/local/bin/ci_entrypoint_gpu-operator.sh
validate_gpu_operator_deployment() {
...
toolbox/gpu-operator/run_gpu_burn.sh
}