Microsoft open sources its data compression algorithm and hardware for the cloud

The amount of data that the big cloud computing providers now store is staggering, so it’s no surprise that most store all of this information as compressed data in some form or another — just like you used to zip your files back in the days of floppy disks, CD-ROMs and low-bandwidth connections. Typically, those systems are closely guarded secrets, but today, Microsoft open sourced the algorithm, hardware specification and Verilog source code for how it compresses data in its Azure cloud. The company is contributing all of this to the Open Compute Project (OCP).

Project Zipline, as Microsoft calls it, can achieve 2x higher compression ratios than the standard Zlib-L4 64KB model. To do this, the algorithm and its hardware implementation were specifically tuned for the kind of large data sets Microsoft sees in its cloud. Because the compression runs at the systems level, there is virtually no overhead, and Microsoft says it actually delivers higher throughput and lower latency than other algorithms currently achieve.
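To get a feel for the baseline in that comparison, here is a minimal Python sketch that measures a compression ratio with zlib at level 4 over independent 64 KB blocks. That is our reading of "Zlib-L4 64KB", not a definition Microsoft gives here. The function name `zlib_l4_ratio` and the sample log data are hypothetical, chosen only for illustration; this is not Project Zipline's algorithm.

```python
import zlib

BLOCK_SIZE = 64 * 1024  # assumption: "64KB" means independent 64 KB blocks
LEVEL = 4               # assumption: "L4" means zlib compression level 4


def zlib_l4_ratio(data: bytes, block_size: int = BLOCK_SIZE) -> float:
    """Return uncompressed size divided by compressed size.

    Each block is compressed on its own, so no history is shared
    between blocks.
    """
    if not data:
        raise ValueError("data must be non-empty")
    compressed_total = 0
    for start in range(0, len(data), block_size):
        block = data[start:start + block_size]
        compressed_total += len(zlib.compress(block, LEVEL))
    return len(data) / compressed_total


# Hypothetical sample: repetitive log-style text, which compresses well.
sample = b"timestamp=2019-03-14 level=INFO msg=request served\n" * 20000
baseline = zlib_l4_ratio(sample)
print(f"zlib level 4, 64 KB blocks: {baseline:.1f}x")
```

On this reading, a 2x improvement would mean roughly twice the ratio this function reports for the same data. The actual figure depends heavily on the data set, which is why Microsoft tuned Zipline against its own cloud workloads.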

Microsoft stresses that it is also contributing the Verilog source code for the register-transfer level (RTL) design, that is, the low-level hardware description that makes this all work. “Contributing RTL at this level of detail as open source to OCP is industry leading,” Kushagra Vaid, the general manager for Azure hardware infrastructure, writes. “It sets a new precedent for driving frictionless collaboration in the OCP ecosystem for new technologies and opening the doors for hardware innovation at the silicon level.”

Microsoft is currently using this system in its own Azure cloud, but it is now also partnering with others in the Open Compute Project. Among these partners are Intel, AMD, Ampere, Arm, Marvell, SiFive, Broadcom, Fungible, Mellanox, NGD Systems, Pure Storage, Synopsys and Cadence.

“Over time, we anticipate Project Zipline compression technology will make its way into several market segments and usage models such as network data processing, smart SSDs, archival systems, cloud appliances, general purpose microprocessor, IoT, and edge devices,” writes Vaid.