1. Storage Systems
RCIC supports several storage systems, each with its own “sweet spot” for price and performance.
The storage systems described below are all available from the HPC3 cluster. Campus Storage (CRSP) is unique in that it can also be accessed from desktops, laptops, and other systems without going through HPC3.
The two major parallel file systems are DFS and CRSP; the CRSP vs. DFS section below can help you choose the right system (or combination of systems) to store your data.
Connectivity, file system architecture, and physical hardware all contribute to the performance of each HPC3 storage system.
Attention
The following summary explains what each storage system provides, what it should be used for, and links to in-depth usage guides.
- Home
  - See details in the HOME storage guide.
  - Provides convenient access on all nodes via an NFS mount
  - Slowest performance, yet sufficient when used properly
  - Use to keep small source code or compiled binaries
  - Use for small (order of MBs) data files
  - Do not use for data-intensive batch jobs
- Scratch
  - See details in the Scratch storage guide.
  - Local disk space unique to each compute node
  - Fastest performance; data is removed when the job completes
  - Use as scratch storage for batch jobs that repeatedly access many small files or make frequent small reads/writes (see the sketch after this list)
  - Not available on login nodes
- Parallel
  - See details in the DFS storage guide.
  - Provides convenient access on all nodes via mount
  - Performance is best for processing medium/large data files (order of 100s of MBs/GBs)
  - Use for batch jobs; the most common place for data used in batch jobs
  - Use to keep source code and binaries
  - Do not use for writing/reading many small files
- Campus Storage
  - See details in the CRSP storage guide.
  - Provides convenient access on all nodes via an NFS mount
  - Performance is best for processing medium/large data files (order of 100s of MBs/GBs)
  - Use sometimes for batch jobs; usually better to use DFS or local $TMPDIR storage
  - Use to keep source code and binaries
  - Do not use for writing/reading many small files
- Campus Storage Annex
  - See details in the CRSP ANNEX storage guide.
  - Provides convenient access on all nodes via a BeeGFS mount
  - Performance is best for processing medium/large data files (order of 100s of MBs/GBs)
  - Do not use for writing/reading many small files
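To make the “many small files” guidance concrete, the following is a minimal Python sketch of the stage-in/compute/stage-out pattern for node-local scratch storage. The DFS paths and file names are hypothetical examples; the Scratch and DFS storage guides document the actual locations and job-script details.

```python
import os
import shutil
from pathlib import Path

# Node-local scratch directory provided to the job (typically via $TMPDIR);
# fall back to /tmp when testing interactively (assumption for this sketch).
scratch = Path(os.environ.get("TMPDIR", "/tmp"))

# Hypothetical input/output locations on DFS -- substitute your lab's paths.
dfs_input = Path("/dfs6/pub/mylab/dataset")    # assumption: example path
dfs_output = Path("/dfs6/pub/mylab/results")   # assumption: example path

# 1. Stage the dataset onto node-local scratch once, up front.
local_data = scratch / "dataset"
shutil.copytree(dfs_input, local_data, dirs_exist_ok=True)

# 2. Do the many small reads/writes against local scratch, not DFS/CRSP.
local_out = scratch / "results"
local_out.mkdir(exist_ok=True)
for f in local_data.glob("*.txt"):
    (local_out / f.name).write_text(f.read_text().upper())

# 3. Copy the (fewer, larger) result files back to DFS before the job ends;
#    node-local scratch is purged when the job completes.
dfs_output.mkdir(parents=True, exist_ok=True)
shutil.copytree(local_out, dfs_output / "run1", dirs_exist_ok=True)
```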
1.1. CRSP vs. DFS
The largest capacity storage systems available are CRSP and DFS. Both are parallel filesystems but have different cost, availability, and usage models. The table below highlights the key differences and similarities between these two systems.
Feature | CRSP | DFS
---|---|---
Cost | $50/TB/year | $100/TB/5 years
Availability | Highly available. No routinely planned outages. Can survive many types of hardware failures without downtime | Routine maintenance outage about 4x/year. Survives disk failures (RAID) only
Access | Access from any campus IP or VPN-connected laptop | Access only from HPC3
Snapshots | Daily file system snapshots allow users to self-recover from deletions or overwrites of files | No snapshots
Backups | Backed up daily offsite with 90-day retention of deleted/changed files | No backups
Quota Management | Labs have a space/#files quota. Users and groups can have (sub)quotas set within the lab | All users share the same group quota. All files must be written with the same Unix group ID to access the quota'ed space
Performance | High-performance, but DFS is a better match for direct use from HPC3 | High-performance. Most common storage used on HPC3
Encryption at rest | All data is encrypted at rest | Only dfs3b is encrypted at rest
File System | IBM Storage Scale (aka GPFS) | BeeGFS with ThinkParQ support
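Because DFS quota accounting is per group, files written with the wrong Unix group ID do not land in the lab's quota'ed space. The sketch below is one way to check and fix group ownership from Python, assuming a hypothetical lab group and directory; see the DFS storage guide for the supported procedure on HPC3.

```python
import grp
import os
import stat
from pathlib import Path

# Hypothetical lab group and DFS directory -- substitute your own.
lab_group = "mylab"                      # assumption: example Unix group
target = Path("/dfs6/pub/mylab/shared")  # assumption: example path

gid = grp.getgrnam(lab_group).gr_gid

# Ensure the directory is owned by the lab group and has the setgid bit,
# so files created inside inherit the group and count against the group quota.
# (Requires ownership of the directory and membership in the group.)
os.chown(target, -1, gid)
mode = target.stat().st_mode
os.chmod(target, mode | stat.S_ISGID)

# Fix the group on any existing files that were written with a different group.
for p in target.rglob("*"):
    if p.stat().st_gid != gid:
        os.chown(p, -1, gid)
```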