September 6th, 2020¶
We released a module that helps Researchers perform Hyperparameter Optimization (HPO). HPO runs many smaller experiments with varying parameters to help determine the optimal parameter set. For further information, see the Hyperparameter Optimization Walk-through.
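To make the idea of "many smaller experiments with varying parameters" concrete, here is a minimal grid-search sketch in plain Python. It does not use the Run:AI HPO module or its API; `toy_experiment`, `search_space`, and `grid_search` are hypothetical names, and the objective is a stand-in for a short training run.

```python
import itertools


def toy_experiment(learning_rate, batch_size):
    # Hypothetical stand-in for one small training run.
    # Returns a loss value; lower is better.
    return (learning_rate - 0.01) ** 2 + (batch_size - 64) ** 2 * 1e-6


# Hypothetical search space: every combination becomes one experiment.
search_space = {
    "learning_rate": [0.001, 0.01, 0.1],
    "batch_size": [32, 64, 128],
}


def grid_search(objective, space):
    # Try every combination of parameter values and keep the best one.
    names = list(space)
    best_params, best_score = None, float("inf")
    for values in itertools.product(*(space[name] for name in names)):
        params = dict(zip(names, values))
        score = objective(**params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


best_params, best_score = grid_search(toy_experiment, search_space)
print(best_params, best_score)
```

In practice each combination would be submitted as its own job so that the experiments can run in parallel; the loop above only illustrates the search itself.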
September 3rd, 2020¶
GPU Fractions are now supported for training jobs, not only interactive jobs. Training jobs that use GPU Fractions can be preempted, bin-packed, and consolidated just like jobs that request whole (integer) GPUs. See Run:AI Scheduler Fraction for more.
August 10th, 2020¶
Run:AI now supports Distributed Training and Gang Scheduling. For further information, see the Launch Distributed Training Workloads Walkthrough.
August 4th, 2020¶
There is now an optional second level of Project hierarchy called Departments. For further information on how to configure and use Departments, see Working with Departments.
July 28th, 2020¶
You can now enforce a cluster-wide setting that requires all containers launched with the Run:AI CLI to run as non-root. For further information, see Enforce non-root Containers.
July 21st, 2020¶
It is now possible to mount a Persistent Volume Claim (PVC) using the Run:AI CLI. See the --pvc flag of the runai submit command.
June 13th, 2020¶
New Settings for the Allocation of CPU and Memory¶
It is now possible to set limits for CPU and memory as well as to establish defaults based on the ratio of GPU to CPU and GPU to memory.
For further information see: Allocation of CPU and Memory
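As a rough illustration of ratio-based defaults, the sketch below derives CPU and memory defaults from a job's GPU request. It assumes a simple proportional rule; the source does not specify how Run:AI computes these values, and the function name, parameter names, and ratio values are all hypothetical.

```python
def default_cpu_memory(gpus, cpu_per_gpu, memory_gb_per_gpu):
    # Assumption: defaults scale linearly with the number of GPUs requested.
    # gpus may be fractional (for example 0.5 for half a GPU).
    if gpus < 0:
        raise ValueError("gpus must be non-negative")
    return {
        "cpu": gpus * cpu_per_gpu,
        "memory_gb": gpus * memory_gb_per_gpu,
    }


# Hypothetical ratios: 4 CPUs and 16 GB of memory per GPU.
whole_gpu_defaults = default_cpu_memory(gpus=2, cpu_per_gpu=4, memory_gb_per_gpu=16)
half_gpu_defaults = default_cpu_memory(gpus=0.5, cpu_per_gpu=4, memory_gb_per_gpu=16)
print(whole_gpu_defaults)
print(half_gpu_defaults)
```

With these example ratios, a 2-GPU job would default to 8 CPUs and 32 GB of memory, and a half-GPU job to 2 CPUs and 8 GB.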
June 3rd, 2020¶
Node Group Affinity¶
Projects now support Node Affinity. This feature allows the administrator to assign specific projects to run only on specific nodes (machines). Example use cases:
- The project team needs specialized hardware (e.g. with enough memory)
- The project team is the owner of specific hardware which was acquired with a specialized budget
- We want to direct build/interactive workloads to weaker hardware and longer training/unattended workloads to faster nodes
For further information see: Working with Projects
Limit Duration of Interactive Jobs¶
Researchers frequently forget to close Interactive jobs. This may lead to a waste of resources. Some organizations prefer to limit the duration of interactive jobs and close them automatically.
For further information on how to set up duration limits see: Working with Projects
May 24th, 2020¶
Cluster installation now works with Kubernetes Operators. Operators make it easy to install, update, and delete a Run:AI cluster.
March 3rd, 2020¶
Admin Overview Dashboard¶
A new admin overview dashboard provides a more holistic view across multiple clusters. It is applicable to customers with more than one cluster.