What Will Blockchain Mean for Data Storage?

This is part two in a three-part series covering blockchain technologies. Read part one to learn how blockchain is modernizing enterprise apps.

Emerging technologies almost always raise an important question for companies on the brink of breakthroughs: What will this innovation mean for our existing IT infrastructure? Do we have the foundation to support it?

Blockchain may be one of these scenarios. Organizations adopting it will face new implications for an already complex center of gravity: data management. Whether the goal is to improve applications, supply chains, contracts, transactions, or processes, getting data management right is a foundational step. Let’s look at why.

Blockchain Data Basics

As I noted in part one, blockchains are permanent, uneditable digital records of information, or “immutable ledgers.” (Immutable means you can’t delete or edit them, and ledgers are files where transactions are recorded.) These ledgers are distributed across a collection of decentralized nodes powered by computers around the world, rather than one centralized location, like a bank’s server. And because records exist in so many places, they aren’t owned by one entity.

In theory, no one can delete, change, or counterfeit records once they’re on the chain. And when records can’t be deleted, the associated data piles up.
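The tamper-evidence behind that “in theory” comes from hash chaining: each record commits to the hash of the one before it. The sketch below is a single-process toy under our own assumptions (the `Ledger` class, field names, and JSON encoding are illustrative, not any real blockchain’s format). It shows that an edit to an old record is detectable. In a real network, it is consensus across many independent nodes holding copies that makes rewriting history impractical.

```python
import hashlib
import json
from dataclasses import dataclass, field


def block_hash(index: int, prev_hash: str, payload: dict) -> str:
    """SHA-256 over a canonical JSON encoding of a block's contents."""
    encoded = json.dumps(
        {"index": index, "prev_hash": prev_hash, "payload": payload},
        sort_keys=True,
    ).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


@dataclass
class Ledger:
    """Hypothetical append-only ledger: each block stores the previous block's hash."""
    blocks: list = field(default_factory=list)

    def append(self, payload: dict) -> str:
        prev = self.blocks[-1]["hash"] if self.blocks else "0" * 64
        index = len(self.blocks)
        h = block_hash(index, prev, payload)
        self.blocks.append({"index": index, "prev_hash": prev, "payload": payload, "hash": h})
        return h

    def verify(self) -> bool:
        """Recompute every hash; editing any earlier block breaks the chain."""
        prev = "0" * 64
        for b in self.blocks:
            if b["prev_hash"] != prev:
                return False
            if block_hash(b["index"], b["prev_hash"], b["payload"]) != b["hash"]:
                return False
            prev = b["hash"]
        return True


ledger = Ledger()
ledger.append({"event": "purchase", "order_id": "A-1001"})
ledger.append({"event": "shipped", "order_id": "A-1001"})
print(ledger.verify())  # True

# Tamper with the first record in place: verification now fails.
ledger.blocks[0]["payload"]["order_id"] = "A-9999"
print(ledger.verify())  # False
```

Note that nothing is ever removed: corrections would be new appended records, which is exactly why the associated data keeps piling up.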

Plus, blockchains are meant to be fast, streamlined, and lightweight, so they’re not ideal for storing large amounts of data. Instead, when a transaction such as a purchase is logged onto a blockchain, only the core record of that event is replicated across nodes. Any other data related to the transaction, for example, an image of the purchase or a description, is stored elsewhere. That’s called “off-chain” data, and it needs to be easily accessible, even in a distributed environment. (This is not dissimilar to solutions like Portworx® by Pure Storage®, which allows containerized apps to be stateful by connecting them to underlying storage.)
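The on-chain/off-chain split can be sketched in a few lines. Here a plain dictionary stands in for off-chain object storage and a list stands in for the ledger; both are hypothetical placeholders, as is the `record_purchase` helper. The large attachment is stored off-chain under its SHA-256 digest, and only a small record containing that digest goes on-chain.

```python
import hashlib

# Hypothetical stand-in for off-chain object storage; keys are content hashes.
off_chain_store: dict[str, bytes] = {}

# Hypothetical stand-in for the on-chain ledger: small records only.
on_chain_log: list[dict] = []


def record_purchase(order_id: str, amount_cents: int, receipt_image: bytes) -> dict:
    """Store the bulky attachment off-chain and log only a compact reference on-chain."""
    digest = hashlib.sha256(receipt_image).hexdigest()
    off_chain_store[digest] = receipt_image  # large payload stays off-chain
    entry = {"order_id": order_id, "amount_cents": amount_cents, "receipt_sha256": digest}
    on_chain_log.append(entry)  # only the small record goes on-chain
    return entry


image = b"\x89PNG..." * 10_000  # placeholder bytes for a large attachment
entry = record_purchase("A-1001", 4999, image)
print(len(image), len(entry["receipt_sha256"]))  # 70000 64
```

Keying the off-chain object by its own hash (content addressing) means the on-chain digest doubles as both a lookup key and an integrity check, regardless of how large the attachment grows.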

How Might Data Flow Through a Blockchain?

Say a blockchain is recording the transfer process for an international shipment. When the shipment passes through customs, this is recorded on the blockchain, with metadata relating to its contents, the date, destination, etc. Aboard the ship, IoT sensors record the temperature and humidity in the container during transit. That data is permanently recorded so it can be consulted if there’s a quality concern upon receipt.

The beauty of this is that no one party owns the ledger, so records on it are hard to dispute. Delays can be traced immediately, and recorded data can’t be quietly manipulated or removed. However, most of that related data likely won’t be stored on the blockchain itself. Instead, a cryptographic hash of the data is recorded on-chain as a tamper-evident reference to off-chain storage where the shipment data is logged, perhaps connected via an oracle network.
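To see how an on-chain hash keeps off-chain data honest, consider the shipment example. The sketch below assumes SHA-256 over canonical JSON; the field names, the `SHP-42` identifier, and the sensor values are all hypothetical. Whoever later fetches the sensor log from off-chain storage can recompute its hash and compare it with the one anchored on-chain.

```python
import copy
import hashlib
import json


def digest(record: dict) -> str:
    # Canonical JSON (sorted keys, no extra whitespace) so identical data always hashes identically.
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Off-chain: the full sensor log for the container (hypothetical fields and values).
sensor_log = {
    "shipment_id": "SHP-42",
    "readings": [
        {"hour": 0, "temp_c": 4.1, "humidity_pct": 61},
        {"hour": 1, "temp_c": 4.3, "humidity_pct": 60},
    ],
}

# On-chain: only the customs event metadata plus the hash of the off-chain log.
customs_event = {
    "shipment_id": "SHP-42",
    "event": "customs_cleared",
    "sensor_log_sha256": digest(sensor_log),
}


def verify_off_chain(event: dict, fetched: dict) -> bool:
    """True only if the fetched off-chain record matches the hash anchored on-chain."""
    return digest(fetched) == event["sensor_log_sha256"]


print(verify_off_chain(customs_event, sensor_log))  # True

tampered = copy.deepcopy(sensor_log)
tampered["readings"][1]["temp_c"] = 9.8  # someone edits a reading after the fact
print(verify_off_chain(customs_event, tampered))  # False
```

A hash is not encryption: it doesn’t hide the data, it only fingerprints it. Confidentiality of the off-chain log would need separate access controls or encryption.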

How Oracle Networks Connect Blockchains to Data

On their own, blockchains make great ledgers and platforms for smart contracts, which can carry out relatively simple computations, but they often lack advanced capabilities and efficiencies. For one, they can’t access off-chain data on their own. Without a way to “plug” them into real-world data and applications, it’s hard to leverage the benefits of blockchain. Yet hitching a blockchain to a single server, API, or database undermines the point of using one. Why? Relying on a single source reintroduces centralization: that source becomes a single point of failure and of trust.

That’s where oracle networks come in.

An oracle network, such as Chainlink, is a decentralized third-party technology that connects blockchain ledgers and smart contracts to the real world, including data storage. Oracle networks provide this connective tissue while remaining decentralized.
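How does an oracle network stay decentralized? One common idea is to collect the same data point from several independent oracle nodes and aggregate their answers, so no single source has to be trusted. The sketch below takes the median of the reports and requires a minimum number of responses; the node names, readings, and `min_responses` threshold are hypothetical, and this is a simplified illustration of the aggregation idea, not Chainlink’s actual protocol.

```python
from statistics import median


def aggregate_reports(reports: dict[str, float], min_responses: int) -> float:
    """Combine independent oracle node reports into one value for a smart contract.

    The median limits the influence of a minority of faulty or dishonest nodes:
    one wildly wrong report cannot drag the result arbitrarily far.
    """
    if len(reports) < min_responses:
        raise ValueError(f"need at least {min_responses} reports, got {len(reports)}")
    return median(reports.values())


# Hypothetical container-temperature reports (deg C) from five oracle nodes;
# node-e is faulty or malicious.
reports = {"node-a": 4.2, "node-b": 4.1, "node-c": 4.3, "node-d": 4.2, "node-e": 40.0}
print(aggregate_reports(reports, min_responses=3))  # 4.2
```

Compare this with wiring the contract to one API: if that single source reported 40.0, the contract would simply act on it.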

But that can’t be just any storage—especially as blockchain applications scale. To uphold the promise of blockchain’s speed and efficiency, storage has to be fast, incredibly scalable, and able to consolidate diverse types of data.

Is a Blockchain a Replacement for a Database?

Yes and no. Both deal in the storage of data, but they do it differently. Where the blockchain excels in immutability, it falls short in efficiency. Many blockchain applications depend on oracle networks to connect them to underlying database storage. You could think of a blockchain as a next-gen database in that it does store data, but with some key differences:

  • Blockchains are distributed, not centralized. Typically, a database lives in one place, and you, as its sole administrator, control what is written to it. A blockchain doesn’t exist on one server owned by one entity; it exists across many nodes operated by many different participants.
  • Blockchains are immutable. This means that once something is stored on the blockchain, it can’t be deleted or changed. It’s a system of record that can only be added to. Traditional, transactional databases are designed to be modified. Right away, this makes blockchains ideal for some use cases but not all.
  • Blockchains have many administrators, not just one. Records are validated by consensus among many nodes rather than by a single administrator, which removes the need to trust any one person on the blockchain. The blockchain itself is the proof of validity and the defense against fraud or mistrust.
  • Blockchains aren’t efficient for storing large files. Storing large amounts of data on a public blockchain is both costly and slow. On-chain storage isn’t a scalable or efficient route for anything other than core ledger data and related hashes. Costs add up with every byte written in each transaction, plus fees each time you want to read that data.¹ Writes can also take minutes per megabyte, time that SLAs can’t afford. This makes blockchains nearly dependent on some sort of off-chain storage.
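A back-of-the-envelope estimate shows where “minutes per megabyte” comes from on a public chain such as Ethereum. The sketch uses the EVM’s 20,000-gas cost to set a fresh 32-byte storage slot; the block gas limit and block time are assumptions that change over time, and the estimate ignores base transaction fees, calldata, and cold-access surcharges, so real costs are higher. It also assumes, unrealistically, that every block is filled with nothing but our writes.

```python
import math

# Assumptions, not measurements:
GAS_PER_WORD = 20_000         # EVM gas to set a fresh 32-byte storage slot (SSTORE, zero to nonzero)
WORD_BYTES = 32               # EVM storage slot size in bytes
BLOCK_GAS_LIMIT = 30_000_000  # assumed block gas limit; varies over time
BLOCK_TIME_S = 12             # assumed seconds per block


def on_chain_storage_estimate(num_bytes: int) -> dict:
    """Best-case gas and wall-clock time to write num_bytes into contract storage."""
    words = math.ceil(num_bytes / WORD_BYTES)
    gas = words * GAS_PER_WORD
    blocks = math.ceil(gas / BLOCK_GAS_LIMIT)  # as if whole blocks were dedicated to us
    return {"words": words, "gas": gas, "blocks": blocks, "seconds": blocks * BLOCK_TIME_S}


print(on_chain_storage_estimate(1024 * 1024))
# {'words': 32768, 'gas': 655360000, 'blocks': 22, 'seconds': 264}
```

Multiplying the gas figure by the prevailing gas price gives the fee. Even in this idealized case, a single mebibyte monopolizes roughly 22 blocks, which is why only hashes and core ledger data normally go on-chain.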

These differences can make blockchain a good fit when you need a system of record wrapped in total security, validity, and traceability. But for storage of larger files and more associated metadata, underlying databases will still be critical.

Blockchain Needs Dedicated, Modern Storage to Deliver

Blockchain is still maturing: good news for enterprises, but a challenge for storage planning. Off-chain data is likely to accumulate rapidly, so better data storage platforms must be built into these new strategies. Blockchain initiatives will also require modified data management practices, access permissions, data models, and datastores so they don’t cannibalize storage for existing apps.

“Blockchain won’t be able to disrupt any real-world industry unless the problem of data storage is resolved.” –JaxEnter.com

For blockchain applications to meet their SLAs, off-chain data storage will need to be powerful, elastic, and scalable. Unified fast file and object storage (UFFO), in particular, will be important for managing data on a distributed system. As enterprises wade into this new territory, their best bet is to leverage and connect to existing, proven technologies such as Pure Storage FlashBlade® with NVMe.

Stay tuned for part three!


  1. https://medium.com/coinmonks/storing-on-ethereum-analyzing-the-costs-922d41d6b316