Data Volumes
Data volumes offer a powerful solution for storing, managing, and sharing AI training data within the NVIDIA Run:ai platform. They promote collaboration, simplify data access control, and streamline the AI development lifecycle.
Acting as a central repository for organizational data resources, data volumes can represent datasets or raw data stored in Kubernetes Persistent Volume Claims (PVCs).
Why use a data volume?
- Sharing with multiple scopes
 Unlike other Run:ai data sources, data volumes can be shared across projects, departments, or clusters, encouraging data reuse and collaboration within the organization.
- Storage saving
 A single copy of the data can be used across multiple scopes, so the data does not need to be duplicated for each one.
Typical use cases
- Sharing large datasets
 In large organizations, data is often stored in a remote location, which can be a barrier to training large models. Even after the data is transferred into the cluster, sharing it with multiple users remains challenging. Data volumes let you share the data across scopes while preserving security and access control.
- Sharing data with colleagues
 When you need to share training results, generated datasets, or other artifacts with team members, data volumes make the data easy to access.
Prerequisites
To create a data volume, there must be a project with a PVC in its namespace.
Data volumes can currently be managed using the API. To view the available actions, go to the Data volumes API reference.
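The PVC prerequisite above can be illustrated with a minimal sketch that builds a standard Kubernetes PersistentVolumeClaim manifest. The PVC name `training-data`, the namespace `runai-team-a`, the 10Gi size, and the `ReadWriteMany` access mode are all assumptions chosen for illustration; use the namespace that actually belongs to your project and an access mode that your storage class supports.

```python
import json


def build_pvc_manifest(name, namespace, size="10Gi", access_mode="ReadWriteMany"):
    """Return a Kubernetes PersistentVolumeClaim manifest as a plain dict.

    No storageClassName is set, so the cluster's default storage class
    (if one is configured) would be used.
    """
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "accessModes": [access_mode],
            "resources": {"requests": {"storage": size}},
        },
    }


# Hypothetical names: replace them with your own PVC name and project namespace.
manifest = build_pvc_manifest("training-data", "runai-team-a")
manifest_json = json.dumps(manifest, indent=2)
print(manifest_json)
```

Because `kubectl` accepts JSON as well as YAML, the printed output could be saved to a file and applied with `kubectl apply -f <file>`.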
Adding a new data volume
Data volume creation is limited to specific roles.
Adding scopes for a data volume
Data volume sharing (adding scopes) is limited to specific roles.
Once created, the data volume is available to its originating project (see the prerequisites above).
Data volumes can be shared with additional scopes in the organization.
Who can use a data volume?
Data volumes are used when submitting workloads. Any user, application or SSO group with a role that has permissions to create workloads can also use data volumes.
Researchers can list available data volumes within their permitted scopes for easy selection.
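The sharing behavior described above can be pictured with a small, purely illustrative model. It is not the platform's implementation: the `Scope`, `Project`, and `DataVolume` classes and the `list_volumes` helper are hypothetical, and the sketch assumes that sharing a volume with a department or cluster scope makes it visible to every project inside that scope.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Scope:
    kind: str  # "cluster", "department", or "project"
    name: str


@dataclass
class Project:
    name: str
    department: str
    cluster: str

    def scopes(self):
        """Every scope this project falls under (project, department, cluster)."""
        return {
            Scope("project", self.name),
            Scope("department", self.department),
            Scope("cluster", self.cluster),
        }


@dataclass
class DataVolume:
    name: str
    origin: Scope  # the originating project scope
    shared_with: set = field(default_factory=set)

    def share(self, scope):
        """Add an additional scope to the volume."""
        self.shared_with.add(scope)

    def visible_to(self, project):
        """True if the origin or any shared scope covers the project."""
        allowed = {self.origin} | self.shared_with
        return bool(allowed & project.scopes())


def list_volumes(volumes, project):
    """Names of the volumes a project could select, sorted alphabetically."""
    return sorted(v.name for v in volumes if v.visible_to(project))


team_a = Project("team-a", "research", "cluster-1")
team_b = Project("team-b", "research", "cluster-1")
team_c = Project("team-c", "sales", "cluster-1")

volume = DataVolume("imagenet", origin=Scope("project", "team-a"))
before_sharing = list_volumes([volume], team_b)

volume.share(Scope("department", "research"))
after_sharing = list_volumes([volume], team_b)
other_department = list_volumes([volume], team_c)
```

In this model the volume starts out visible only to `team-a`. After it is shared with the `research` department, `team-b` can list it as well, while `team-c` in another department still cannot.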