Last May, Western Digital introduced its 26TB hard drive, the industry’s highest-capacity drive available. It allows cloud-scale customers to achieve the ginormous capacities required for today’s massive data storage needs. The storage leader’s unceasing capacity gains are made possible by integrating multiple technology innovations. One such compilation is UltraSMR, which helped bring the 26TB hard drive to fruition.
UltraSMR fuses hardware, controller, and read channel technologies with proprietary firmware and algorithms to significantly expand the capacity advantage that Shingled Magnetic Recording (SMR) provides over Conventional Magnetic Recording (CMR).
Collectively, these technologies enhance the capacity of an SMR drive by an additional 10%, allowing a gain of 20% over a traditional CMR drive.
But reaching industry milestones like these does not happen overnight. The journey to UltraSMR was a multi-step, multi-year process as successive innovations were incorporated.
First there was SMR
Chad Mitchell, director of HDD CTO innovation and IP at Western Digital, leads a team of renowned engineers, each holding many patents and several with patent counts in the triple digits. Most of the engineers are based in Rochester, Minn., where UltraSMR has been in the making for the past decade. Mitchell’s team designs the features for each generation of HDD system on chip (SOC), including how the chip encodes, decodes, and converts the 0s and 1s of data reads and writes.
“It started back in 2013 with the first SMR version Western Digital put out in the world’s first 14TB HDD,” said Mitchell. “Incrementally, we added the technologies that culminate into today’s UltraSMR.”
SMR, like shingles on a roof, overlays tracks on a disk so more tracks and data can be squeezed into the same space, enabling higher density and impressive capacity gains.
“If you put shingles all across your roof and they’re only adjacent to each other, you can only put on so many shingles,” said Ravi Pendekanti, SVP of HDD product management at Western Digital, in an interview with TechRepublic. “If you make the shingles overlap a bit, you can add more shingles in the same capacity or in the same area. That’s essentially what SMR is.”
But because the tracks overlap instead of sitting apart, the only way to write data without corrupting neighboring tracks is to write sequentially. That’s why, in SMR drives, data isn’t written to its final location immediately. It is first written into a cache and then onto the media as one large unit, so the drive handles entire tracks instead of individual sectors.
SMR allows coding over larger areas because the data is coalesced before it’s written, thereby minimizing waste and increasing mechanical efficiencies. For enterprise use cases, it requires some upfront software changes to perform this type of data handling, but the one-time investment brings huge, ongoing benefits in terms of capacity efficiencies.
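To make the sequential-write constraint concrete, here is a minimal Python sketch of a single shingled band. It is a toy model built on stated assumptions, not Western Digital firmware: the `ShingledBand` and `StagedWriter` classes are hypothetical, and we assume that writing track i simply makes track i + 1 unreadable. Random updates are staged in a cache, then rewritten in order from the first modified track to the end of the band.

```python
class ShingledBand:
    """Toy shingled band (hypothetical): writing track i damages track i + 1."""

    def __init__(self, num_tracks):
        self.tracks = [None] * num_tracks
        self.valid = [True] * num_tracks  # empty tracks read back as None

    def write_track(self, i, data):
        self.tracks[i] = data
        self.valid[i] = True
        if i + 1 < len(self.tracks):
            # The overlapping write clips the next track in the band.
            self.valid[i + 1] = False

    def read_track(self, i):
        if not self.valid[i]:
            raise OSError(f"track {i} was overwritten by a shingled write")
        return self.tracks[i]


class StagedWriter:
    """Hypothetical cache that turns random updates into sequential writes."""

    def __init__(self, band):
        self.band = band
        self.cache = {}  # track index -> pending data

    def write(self, i, data):
        self.cache[i] = data  # nothing reaches the media yet

    def flush(self):
        if not self.cache:
            return
        start = min(self.cache)
        # Read from the first dirty track to the end of the band BEFORE
        # writing, because the sequential rewrite will clobber those tracks.
        merged = [self.band.read_track(i)
                  for i in range(start, len(self.band.tracks))]
        for i, data in self.cache.items():
            merged[i - start] = data
        # Rewrite in track order: each write only clips a track that is
        # about to be rewritten anyway.
        for offset, data in enumerate(merged):
            self.band.write_track(start + offset, data)
        self.cache.clear()


band = ShingledBand(4)
writer = StagedWriter(band)
for i, data in enumerate(["a", "b", "c", "d"]):
    writer.write(i, data)
writer.flush()
writer.write(1, "B")  # a random update in the middle of the band
writer.flush()
print([band.read_track(i) for i in range(4)])  # ['a', 'B', 'c', 'd']
```

Calling `band.write_track(1, ...)` directly, bypassing the cache, would leave track 2 unreadable in this model. Staging the update and rewriting the rest of the band in order avoids that, at the cost of rereading and rewriting the downstream tracks.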
UltraSMR: A giant leap forward
As hard drive capacities increased, so did the challenge of error correction. As more bits were squeezed into the same form factor, it became evident that a larger sector size would allow for more complex error correction algorithms and processes.
“Before SMR, CMR couldn’t do error correction on big blocks because there wasn’t an opportunity to code over large areas without degrading the random performance, but SMR is different,” said Rick Galbraith, distinguished engineer in Western Digital’s HDD Business.
UltraSMR allows for coding over a much larger area than in the past, and its sequential nature, once seen as a hurdle, now offers new opportunities for intelligent data handling.
To fix errors, the engineering team combines blocks of data with redundancy information to generate codewords. The larger the codeword, the more powerful an error correction code can be. With partial redundancy covering an entire track, it’s a new world for error correction algorithms, which can be more powerful and more efficient at the same time.
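Why would a larger codeword be more powerful? One way to see it is with an idealized model that we assume here: bits fail independently with probability p, and a codeword of n bits can be decoded whenever it contains at most t errors. As n grows, the number of errors per codeword clusters more tightly around its average. The Python sketch below, with a hypothetical function name, holds the error rate and the correction budget fixed as a fraction of the codeword and changes only its length.

```python
import math


def block_failure_probability(n, t, p):
    """P(more than t of n bits are in error), with bits failing independently.

    Idealized assumption: a codeword of n bits decodes whenever it holds
    at most t bit errors. Terms are computed in log space to avoid overflow.
    """
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for k in range(t + 1, n + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(k + 1)
                    - math.lgamma(n - k + 1) + k * log_p + (n - k) * log_q)
        total += math.exp(log_term)
    return total


# Same bit error rate (1%) and same correction capability (2% of the
# codeword), but increasingly long codewords:
for n in (100, 1_000, 10_000):
    print(n, block_failure_probability(n, n // 50, 0.01))
```

Under these assumptions, the chance that a codeword is uncorrectable drops sharply as n grows, even though the correction capability stays at the same fraction of the codeword. Real drive codes and error statistics are far more complex than this model.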
In addition, the larger spatial area covered by UltraSMR helps even out the hard drive’s signal-to-noise ratio: spreading data over a wider area dilutes the effect of any defects.
“The [UltraSMR] advantage is about signal processing on the read back,” said Mitchell. “Solving errors on the fly, which you couldn’t do before.”
UltraSMR isn’t just about a single technology. It is an accumulation of hardware, software, and firmware advances that enabled the corresponding teams to reach the landmark capacity. These technologies include two-dimensional magnetic recording (TDMR), soft-track error correction code (sTECC), Distributed Sector (DSEC), and OptiNAND™, all built to work together.
Two-dimensional magnetic recording (TDMR) refers to adding a second reader on each arm suspended over the media surface. This helps to improve the signal-to-noise ratio (SNR) and avoid confusing the data on a desired track with what is written on adjacent tracks.
“TDMR is the difference between having one eye and two,” explains Galbraith. “TDMR allows depth perception and better judgment of what’s being viewed and reduces the electrical noise components of the read signal.”
In nature, doubled sensory organs provide much more information about the surrounding environment. Two eyes rather than one enable depth perception. Two ears rather than one help pinpoint where a predator is lurking.
In signal processing, TDMR allows the controller and firmware to combine the signals from both readers to filter out off-track noise and better cancel inter-track interference.
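As a toy illustration of that interference cancellation, the sketch below assumes a simple linear model chosen for clarity: each reader sees the desired track plus a different, known fraction of one adjacent track. A real read channel has to estimate such weights adaptively and contend with noise, so the function name, parameters, and numbers here are all hypothetical.

```python
def combine_readers(r1, r2, a1, a2):
    """Cancel one adjacent track's interference using two readers.

    Assumed toy model (not a real read-channel design):
        r1[k] = s[k] + a1 * x[k]
        r2[k] = s[k] + a2 * x[k]
    s is the wanted track, x the adjacent track, and a1 != a2 because
    the two readers sit at slightly different cross-track positions.
    """
    if a1 == a2:
        raise ValueError("readers must see different interference levels")
    # a2*r1 - a1*r2 = (a2 - a1) * s, so these weights cancel x exactly.
    w1 = a2 / (a2 - a1)
    w2 = -a1 / (a2 - a1)
    return [w1 * u + w2 * v for u, v in zip(r1, r2)]


wanted = [1, -1, 1, 1, -1]
adjacent = [1, 1, -1, 1, -1]
a1, a2 = 0.1, 0.4  # hypothetical interference fractions
r1 = [s + a1 * x for s, x in zip(wanted, adjacent)]
r2 = [s + a2 * x for s, x in zip(wanted, adjacent)]
recovered = combine_readers(r1, r2, a1, a2)
print([round(v, 6) for v in recovered])
```

With a single reader, the adjacent-track term cannot be separated from the wanted signal in this model; the second reader supplies the extra equation that makes the cancellation possible.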
Many of UltraSMR’s advances are enabled by groundbreaking error correction algorithms and processes.
“You can’t build anything without an error correction code,” said Galbraith. “We add redundancy to form codewords. A codeword is the combination of a block of data combined with a type of parity redundancy that can detect and correct errors,” he explained. How those codewords are constructed and deciphered is a playground of ongoing innovation.
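For a concrete, textbook instance of a codeword, the sketch below uses the classic Hamming(7,4) code: 4 data bits plus 3 parity bits, which is enough to locate and fix any single flipped bit. It illustrates the general idea only. It is not the code Western Digital uses, and modern drive read channels rely on far longer and more powerful codes.

```python
def hamming74_encode(data):
    """Encode 4 data bits into a 7-bit Hamming codeword."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # checks positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # checks positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # checks positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(codeword):
    """Correct up to one flipped bit and return the 4 data bits."""
    c = list(codeword)
    # Each failing parity check contributes its weight; the sum (the
    # syndrome) is the 1-based position of the flipped bit, or 0 if none.
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s4
    if syndrome:
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


data = [1, 0, 1, 1]
codeword = hamming74_encode(data)
codeword[5] ^= 1  # simulate a single bit flipped on the media
print(hamming74_decode(codeword) == data)  # True
```

Note that 3 of every 7 bits here are redundancy, a steep overhead for correcting just one error per codeword.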
Soft-track error correction code (sTECC) is a coding mechanism that enabled Western Digital to debut a 20TB SMR product in 2020. It adds error correction parity within tracks, using a small bit of real estate to check for data integrity.
In UltraSMR, sTECC introduces parity through a correlation scheme that lets the read channel analyze whether data is 100% correct. Galbraith describes it as “taking advantage of solving a set of separate puzzles by using information that connects the puzzles together.”
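Western Digital doesn’t detail how sTECC connects its “puzzles,” so the sketch below swaps in a much simpler technique to convey the flavor: one XOR parity sector computed across all the sectors of a track. If a single sector can’t be recovered on its own, it can be rebuilt from the remaining sectors plus the parity. The helper names are hypothetical, and this is not the sTECC algorithm.

```python
from functools import reduce


def xor_bytes(a, b):
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def add_track_parity(sectors):
    """Append one parity sector: the XOR of every data sector on the track."""
    return list(sectors) + [reduce(xor_bytes, sectors)]


def recover_sector(track, failed):
    """Rebuild the sector at index `failed` from all the other sectors.

    XOR-ing every sector (data plus parity) gives all zeros, so the
    missing one equals the XOR of the rest.
    """
    return reduce(xor_bytes, (s for i, s in enumerate(track) if i != failed))


track = add_track_parity([b"DATA-A", b"DATA-B", b"DATA-C"])
print(recover_sector(track, 1))  # b'DATA-B'
```

The parity sector is the piece of information “connecting” the per-sector puzzles: on its own it says little, but combined with the sectors that did decode, it pins down the one that did not.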
The combination of large block encoding with this advanced error correction algorithm ultimately allows engineers to increase tracks per inch (TPI) and deliver higher capacities.
More coding innovation
Distributed Sector (DSEC) came next. This innovation breaks the data in a group of sectors apart and rewrites it across multiple physical sectors, averaging any error across the entire track. It’s like a diversified stock portfolio or a mutual fund: you don’t have all your eggs in one basket. Because increasing capacity means tracks are written closer together, distributing data over many sectors averages out signal noise and reduces the impact of positioning errors between written tracks.
“Logical sectors are distributed physically into many physical sectors,” Galbraith explains. “Physical defects are also distributed logically into many logical sectors. Both properties allow the logical sectors to become more correctable.”
Essentially, if data is spread across the entire track, any error is also spread out in smaller portions. As a result, data can be more easily retrieved instead of being lost entirely.
“The great thing about distributed sector is that it’s rate-less; it takes zero overhead,” said Galbraith. “You get good gains out of the fact that you didn’t add any redundancy to anything and it’s 100% efficient.”
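The Python sketch below illustrates the distributed-sector idea with a simple interleaver. The mapping is an assumption about how such a layout could look, not the actual DSEC design, and the function names are hypothetical. Each logical sector is spread evenly across n physical sectors, so a defect that wipes out one physical sector costs every logical sector a small share of its symbols instead of destroying one sector outright. No redundancy is added: the interleaved data has exactly as many symbols as the original.

```python
def interleave(logical, n):
    """Spread each of n logical sectors evenly over n physical sectors.

    Physical sector p holds symbols p, p + n, p + 2n, ... of every logical
    sector. Assumes each logical sector's length is a multiple of n.
    """
    return [[sym for sector in logical for sym in sector[p::n]]
            for p in range(n)]


def deinterleave(physical, n, length):
    """Invert interleave(), rebuilding n logical sectors of `length` symbols."""
    per = length // n  # symbols each logical sector leaves in each physical one
    logical = [[None] * length for _ in range(n)]
    for p, sector in enumerate(physical):
        for j in range(n):
            logical[j][p::n] = sector[j * per:(j + 1) * per]
    return logical


n, length = 4, 8
logical = [[f"L{j}.{k}" for k in range(length)] for j in range(n)]
physical = interleave(logical, n)
physical[2] = [None] * len(physical[2])  # a defect wipes out one physical sector
damaged = deinterleave(physical, n, length)
print([sector.count(None) for sector in damaged])  # [2, 2, 2, 2]
```

Without interleaving, the same defect would have erased all 8 symbols of a single logical sector; here each logical sector loses only 2, a much easier job for its error correction code.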
Innovation across HDD and Flash
The final piece of the UltraSMR puzzle was the introduction of OptiNAND, a technology that enhances HDDs with flash embedded into the drive.
The technology is unique to Western Digital, made possible by the company’s manufacturing capabilities and its vertical integration across HDD and SSD engineering teams. Its inventors, including David Hall, distinguished engineer at Western Digital, began thinking about new ways to use NAND back in 2015, even before Western Digital’s acquisition of SanDisk.
OptiNAND improves hard drive capacity and performance by storing metadata in non-volatile flash instead of on the rotating media.
Just as building a house is easier when you start with blueprints, OptiNAND uses non-volatile memory to enhance the reliability of the data stored on the disk. The pairing allows the write tracks to be narrowed, and it provides a memory option that can hold far more cache than DRAM or NOR flash, increasing performance.
The ultimate compilation for capacity leadership
Integrating these technologies over time has brought UltraSMR to fruition, along with its significant capacity gains over both CMR and SMR.
If SMR is a shingled roof, then UltraSMR is a thatched roof with a better weave, making it stronger and able to weather a windstorm.
“It’s a team effort,” said Mitchell. “The original idea of the innovation goes through a transformation that makes it real. On its journey, a large group of world-class engineers with diverse specialties are needed to write and integrate the new features into the existing customer code.”
Collaboration between hardware, servo mechanical, manufacturing, and firmware teams contributed to UltraSMR’s debut. Firmware engineers were enlisted to integrate the new innovations into the code and ensure they would all work together.
“Combining the innovations carefully and incrementally over the past 10 years allowed Western Digital to continue providing HDDs that maintain our high standard of quality. The team of engineers that made this happen are some of the best in our industry, and their innovative thinking and global collaboration across the entire company had to happen to make it all work together in one system,” said Mitchell.
With this kind of teamwork, the company is on the road to 50TB, continuing to push boundaries and make leaps in capacity. It’s about incremental steps, understanding how different technologies come together in an impactful way, and having a clear roadmap for the future of HDDs.