Tech Blog: UA Mathematics Research Team Develops New Data Error Correction and Retention Technology

March 27, 2017

RAID, or “redundant array of independent disks,” is a means of storing data across multiple hard disks [1]. Doing so balances the overlap of input/output operations, ultimately improving the performance and reliability of data storage [2]. RAID is indispensable in applications that produce or consume data at a rate exceeding, often by a large factor, the maximum data rate of an individual disk drive, such as recording video from a motion picture camera or a high-speed medical imaging device. RAID 6, the current and most commonly used version, offers “very high fault- and drive-failure tolerance” and is employed in environments where long-term data retention is critical [3]. However, due to the ever-increasing data density on disk drives, RAID 6 is expected to be obsolete by 2019 [4].

With this deadline in mind, Ph.D. student Mohamad Moussa and Professor Marek Rychlik of the University of Arizona Department of Mathematics in the College of Science sought to develop an improved method for mitigating data loss due to faults in the storage medium (disk).

Why is this an important development? As an example, when data is read at a rate of 1 gigabyte per second, a bit will be read incorrectly roughly every three hours. The loss of a single bit is problematic because, compounded over time, it can lead to massive data corruption. In some applications, the loss of even a single bit can be catastrophic. For instance, when computing the total in a spreadsheet spanning an entire disk, a single bit read incorrectly can change the result, effectively randomizing it.
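
To see where the “every three hours” figure comes from, here is a quick back-of-the-envelope check. The article does not state the underlying error rate, so the sketch below assumes a typical unrecoverable bit-error rate of about one error per 10^14 bits read:

```python
READ_RATE_BYTES_PER_SEC = 1e9   # 1 gigabyte per second, as in the example
BIT_ERROR_RATE = 1e-14          # assumed: ~1 unrecoverable error per 10^14 bits

bits_per_second = READ_RATE_BYTES_PER_SEC * 8
seconds_per_error = 1 / (bits_per_second * BIT_ERROR_RATE)

print(f"Expected time between bit errors: {seconds_per_error / 3600:.1f} hours")
# -> about 3.5 hours, consistent with the figure quoted above
```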

RAID systems are also subject to human error. A known vulnerability of RAID 6 is that it fails when two drives fail at nearly the same time. Moussa and Rychlik explain a possible scenario: “Let’s say a server has one hundred drives. When one fails and it’s neglected for a few hours, the RAID goes into its repair phase. During this time it is possible for a second drive to fail. If you lose the second one, you lose everything.”
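
To get a feel for the odds involved, here is a rough model of the scenario above. The drive count comes from the example; the mean time between failures (MTBF) and the length of the repair window are illustrative assumptions, and the model (independent, exponentially distributed failures) is a simplification:

```python
import math

SURVIVING_DRIVES = 99      # the 100-drive server above, with one drive down
MTBF_HOURS = 1_000_000     # assumed per-drive mean time between failures
REBUILD_HOURS = 24         # assumed length of the repair window

# P(a given drive fails within the window), exponential failure model
p_one = 1 - math.exp(-REBUILD_HOURS / MTBF_HOURS)
# P(at least one surviving drive fails before the rebuild finishes)
p_any = 1 - (1 - p_one) ** SURVIVING_DRIVES

print(f"Chance of a second failure during the rebuild: {p_any:.2%}")
# -> roughly 0.24% per incident; small, but far from negligible when a
#    data center goes through thousands of rebuilds per year
```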

Data centers, like this one in the UK National Archives, can hold tens of thousands of servers and are therefore at risk of experiencing data corruption and loss. Photo credit: The UK National Archives

Even if the failed disk drive is immediately replaced, it can take hours or days to reconstruct its content onto the new drive. During this repair phase, a RAID 6 system is subject to catastrophic data loss if a second hard drive fails. The problem is exacerbated by ever-larger hard disks and the growing quantities of data that today’s applications require.
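
A quick estimate shows why the repair phase is so long. The drive capacity and rebuild throughput below are illustrative assumptions, not figures from the article:

```python
DRIVE_CAPACITY_TB = 10          # assumed size of a modern drive
REBUILD_RATE_MB_PER_SEC = 100   # assumed sustained rebuild throughput

rebuild_seconds = DRIVE_CAPACITY_TB * 1e6 / REBUILD_RATE_MB_PER_SEC
print(f"Estimated rebuild time: {rebuild_seconds / 3600:.0f} hours")
# -> about 28 hours, and real rebuilds are often slower because the array
#    must keep serving regular traffic at the same time
```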

To prevent such disasters, Moussa and Rychlik developed a RAID controller, a computer program responsible for reading and writing data to disks [4]. The controller computes the content of the parity drives via a complex formula, detects which data has been corrupted, and then re-computes that data from the remaining unaffected data. In the case of a large data set such as a motion picture file, this technology would continue functioning after the loss of a single drive, and even after it detects the loss of a second. The new controller uses an algorithm that achieves the required degree of data protection without significantly reducing processing speed or requiring expensive hardware, both of which are known drawbacks of competing methods capable of withstanding a two-drive failure.
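
The exact formula behind Moussa and Rychlik’s controller is not disclosed in the article, so the sketch below illustrates the general idea with the simplest possible stand-in: single-parity (RAID 5-style) reconstruction, where the parity block is the byte-wise XOR of the data blocks and any one lost block can be re-derived from the rest. Schemes that survive two lost drives, such as RAID 6, add a second, independent parity equation (typically Reed-Solomon arithmetic over a Galois field):

```python
def compute_parity(blocks: list[bytes]) -> bytes:
    """Parity block = byte-wise XOR of all data blocks."""
    parity = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            parity[i] ^= b
    return bytes(parity)

def reconstruct(surviving: list[bytes], parity: bytes) -> bytes:
    """Recover the single missing block: XOR of the parity and survivors."""
    return compute_parity(surviving + [parity])

# Four data blocks striped across four drives, plus one parity drive.
data = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
parity = compute_parity(data)

lost = data[2]                                        # pretend drive 3 failed
recovered = reconstruct(data[:2] + data[3:], parity)  # rebuild from the rest
assert recovered == lost
print("Recovered block:", recovered)                  # -> b'CCCC'
```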

While the technology is appropriate for massive data storage solutions like data centers and cloud providers, it is also applicable in other areas requiring the storage of large, enterprise-level data, such as medical and financial records; large, single-user video-editing workstations; and correction of errors in signal transmission in telecommunications. In essence, the method arranges data in a pattern that is resistant to partial destruction, making it suitable for any kind of data transmission vulnerable to interference or partial data loss.

Moussa and Rychlik presented this technology during their participation in TLA’s first NSF I-Corps cohort of 2017. This NSF-designated program helps inventors outline their technology’s value proposition and better understand their potential customer base.

To learn more about this technology, check out:

UA17-114 Error Correction System and Method

Click here to learn more about the data storage technologies available from the University of Arizona.

References:

[1] Rouse, Margaret. “What is RAID (Redundant Array of Independent Disks)? - Definition from WhatIs.com.” SearchStorage, TechTarget, Apr. 2015, searchstorage.techtarget.com/definition/RAID. Accessed 9 Mar. 2017.

[2] “RAID.” Prepressure.com, 17 Jan. 2017, www.prepressure.com/library/technology/raid. Accessed 9 Mar. 2017.

[3] Rouse, Margaret. “What is RAID 6 (Redundant Array of Independent Disks)? - Definition from WhatIs.com.” SearchStorage, TechTarget, Dec. 2014, searchstorage.techtarget.com/definition/RAID-6-redundant-array-of-independent-disks. Accessed 10 Mar. 2017.

[4] Rychlik, Marek, and Mohamad Moussa. “I-Corps Pitch.” 22 Feb. 2017, Tucson, TLA I-Corps.

Contacts
Paul Tumarkin