COE Data Storage
Engineering and scientific research produces a lot of data. Our College's storage needs continue to grow, whether it be storage for research data to maintain compliance, college and departmental operational data, student team and project data, or data needed for internal or external collaboration.
We also support storage offered through Office 365, including OneDrive and SharePoint.
All operational and research folders have a file called AccessList.txt that is updated nightly to reflect all users with access to a particular folder. All folder owners should review these files on a regular basis. Any required changes to access can be sent in as a ticket.
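One way a folder owner might make the regular review less manual is to compare the current AccessList.txt against a list of people who are expected to have access. The sketch below is only an illustration: it assumes the file holds one user or group name per line, which may not match the real layout, and the function names are hypothetical.

```python
from typing import Iterable, Set, Tuple


def load_entries(text: str) -> Set[str]:
    """Return the non-blank lines of the text, stripped and lower-cased.

    Assumption: AccessList.txt lists one user or group per line.
    """
    return {line.strip().lower() for line in text.splitlines() if line.strip()}


def compare_access(actual: Iterable[str], expected: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Return (unexpected, missing) entries relative to the expected list."""
    actual_set = {a.strip().lower() for a in actual}
    expected_set = {e.strip().lower() for e in expected}
    return actual_set - expected_set, expected_set - actual_set


# Example with made-up user IDs; in practice the text would come from AccessList.txt.
unexpected, missing = compare_access(
    load_entries("abc123\nxyz789\n"),
    ["abc123", "def456"],
)
print("Unexpected access:", sorted(unexpected))
print("Expected but not listed:", sorted(missing))
```

Anything reported as unexpected or missing would then be a candidate for a ticket requesting an access change.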
COE Main Storage Hardware
Our operational and research file storage solution consists of two 60-drive storage arrays. One array functions as the main file server repository, while the other is dedicated to backups. Both are configured as RAID10 arrays with multiple hot spares.
researchfiles.coe.drexel.edu are both virtual machines, with all storage volumes mounted as separate virtual hard drives formatted using NTFS with deduplication enabled. Currently, deduplication is showing space savings of an average of 50%. Because it is block-based, it works best with text files and worst with already-compressed files. Because each research folder is a separate volume, each research group gets the full benefit of the deduplication savings on its own data. As a real-world example, one research group has over 15TB of data, deduplicated down to under 5TB, leaving them 5TB of free space. If you copy a large amount of data to your folder and later notice that the free space has gone back up, that is the result of the new data being deduplicated. If you want to understand how this process works, Microsoft has a write-up that explains it very well.
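To make the arithmetic in the research-group example concrete, here is a small sketch of how a savings percentage relates to the amount of data before and after deduplication. It uses the rounded figures from the example (15TB of data stored in 5TB); the function name is illustrative and is not part of any tool used on the file servers.

```python
def dedup_savings_rate(logical_tb: float, stored_tb: float) -> float:
    """Return the fraction of space saved by deduplication (0.0 to 1.0).

    logical_tb: size of the data as users see it.
    stored_tb:  space the data actually occupies after deduplication.
    """
    if logical_tb <= 0:
        raise ValueError("logical_tb must be positive")
    return (logical_tb - stored_tb) / logical_tb


# Rounded figures from the example: about 15TB of data stored in about 5TB.
rate = dedup_savings_rate(logical_tb=15.0, stored_tb=5.0)
print(f"Savings rate: {rate:.0%}")  # prints "Savings rate: 67%"
```

With 5TB occupied and 5TB free, the volume in that example would be roughly 10TB, so the group is keeping more data than the volume's nominal size.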
Every night, a script updates a file called StorageUsageReport.txt in each group's folder. In it, the line marked SavingsRate gives the percentage saved through deduplication.
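For groups that want to track this number over time, the value can be pulled out of the report with a few lines of code. The following is a minimal sketch, not a description of how the nightly script works: the exact layout of StorageUsageReport.txt is an assumption here (a line such as "SavingsRate : 50" or "SavingsRate: 50%"), and the function names are hypothetical.

```python
import re
from pathlib import Path
from typing import Optional

# Assumed line shape: the label "SavingsRate", some separator, then a number.
_SAVINGS_RE = re.compile(r"SavingsRate\D*?(\d+(?:\.\d+)?)")


def parse_savings_rate(report_text: str) -> Optional[float]:
    """Return the first SavingsRate value found in the report text, or None."""
    for line in report_text.splitlines():
        match = _SAVINGS_RE.search(line)
        if match:
            return float(match.group(1))
    return None


def read_savings_rate(report_path: Path, encoding: str = "utf-8") -> Optional[float]:
    """Read a report file and return its SavingsRate value, or None.

    The encoding is a guess; files written on Windows are sometimes UTF-16,
    in which case pass encoding="utf-16".
    """
    return parse_savings_rate(report_path.read_text(encoding=encoding))


# Example using an in-memory string in the assumed format.
print(parse_savings_rate("SavingsRate : 50"))
```

If the function returns None, the report most likely uses a different layout than assumed, and the pattern would need adjusting.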
File server backups
CTS backs up data on researchfiles according to this schedule.