In 2006, a British mathematician and data science entrepreneur coined a phrase that quickly spread: “data is the new oil”. Perhaps it is no coincidence that the phrase was coined when fields working directly with data, e.g., artificial intelligence, machine learning, and data science, were beginning to attract public attention.
Blockchain technology has made data, which was already growing at a staggering rate, even more abundant. Add to this the many kinds of Web3-specific data, such as
- historical price data on centralized and decentralized exchanges (CEXs and DEXs),
- trading data (futures and perpetual futures volume and volatility, among others),
- off-chain data imported onto the chain,
- data on the blocks produced by blockchains (Ethereum alone, the largest smart contract blockchain, has produced more than 18 million blocks since its launch, as of this writing),
and you get a humongous amount of data in Web3.
In this article I’ll try to explain what the different categories of players in the Web3 data landscape do.
Block explorers
One of the most praised features of blockchain technology is its transparency: all the data is public and can be accessed by anyone. But there is so much information about transactions, fees, addresses, etc., that even the most passionate crypto enthusiast would get baffled.
Enter the blockchain explorer. A blockchain explorer (also called a block explorer) is a piece of software that retrieves data from multiple blockchain nodes through their APIs. Explorers then organize the fetched data in databases in a searchable format that can be accessed with a web browser. Below is a snapshot from Blockchain.com, a blockchain explorer for Bitcoin.
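To make the fetch-then-index idea concrete, here is a minimal sketch of what an explorer’s indexer might do for an Ethereum-style chain, assuming access to a node that exposes the standard Ethereum JSON-RPC API (`eth_blockNumber`, `eth_getBlockByNumber`). It decodes the hex-encoded fields of a block and stores them in SQLite tables that can be searched by block number or address. The node URL, the table layout and all hashes and addresses are hypothetical placeholders; real explorers such as Etherscan or Blockchain.com run far more elaborate pipelines. To stay runnable offline, the demo feeds the indexer a hard-coded block shaped like a JSON-RPC response instead of calling a live node.

```python
import json
import sqlite3
import urllib.request

# Hypothetical endpoint of an Ethereum node exposing the standard JSON-RPC API.
NODE_URL = "http://localhost:8545"


def rpc_payload(method: str, params: list, request_id: int = 1) -> dict:
    """Build a JSON-RPC 2.0 request body."""
    return {"jsonrpc": "2.0", "method": method, "params": params, "id": request_id}


def call_node(method: str, params: list):
    """POST a JSON-RPC request to the node and return its 'result' field."""
    body = json.dumps(rpc_payload(method, params)).encode()
    req = urllib.request.Request(NODE_URL, data=body, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["result"]


def open_index() -> sqlite3.Connection:
    """Create searchable tables for blocks and transactions (illustrative schema)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE blocks (number INTEGER PRIMARY KEY, hash TEXT, timestamp INTEGER, tx_count INTEGER)")
    db.execute("CREATE TABLE txs (hash TEXT PRIMARY KEY, block INTEGER, sender TEXT, receiver TEXT, value_wei TEXT)")
    db.execute("CREATE INDEX txs_by_sender ON txs (sender)")  # enables 'search by address'
    return db


def index_block(db: sqlite3.Connection, block: dict) -> None:
    """Decode the hex fields of an eth_getBlockByNumber result and store them."""
    number = int(block["number"], 16)
    txs = block["transactions"]  # full objects when requested with the 'true' flag
    db.execute("INSERT INTO blocks VALUES (?, ?, ?, ?)",
               (number, block["hash"], int(block["timestamp"], 16), len(txs)))
    for tx in txs:
        # Wei amounts can exceed SQLite's 64-bit integers, so store them as text.
        db.execute("INSERT INTO txs VALUES (?, ?, ?, ?, ?)",
                   (tx["hash"], number, tx["from"], tx.get("to"), str(int(tx["value"], 16))))
    db.commit()


if __name__ == "__main__":
    # Offline stand-in for a node response; hashes and addresses are placeholders.
    # Against a live node you would instead use:
    #   latest = int(call_node("eth_blockNumber", []), 16)
    #   block = call_node("eth_getBlockByNumber", [hex(latest), True])
    block = {
        "number": "0x112a880", "hash": "0xb10c", "timestamp": "0x6543a1b0",
        "transactions": [{"hash": "0x7a1", "from": "0xaaa", "to": "0xbbb", "value": "0xde0b6b3a7640000"}],
    }
    db = open_index()
    index_block(db, block)
    print(db.execute("SELECT block, value_wei FROM txs WHERE sender = ?", ("0xaaa",)).fetchone())
    # (18000000, '1000000000000000000')
```

The index on `sender` is what turns a pile of decoded blocks into something a user can search by address, which is essentially the service a block explorer’s web page sits on top of.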
A typical block explorer gives information on various metrics, such as average block size, the most recent blocks, latest transactions, average transaction fees (gas), and mempool status (unconfirmed transactions), among others. Block explorers can also be used to track the trades of big crypto players, aka whales, whose moves have the potential to impact the crypto markets.
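Metrics like these boil down to simple aggregations over decoded transactions. The minimal sketch below computes an average fee and flags "whale" transfers above a threshold. The `Tx` record, the sample values and the 1,000-coin whale threshold are all hypothetical; explorers and whale-tracking services each use their own definitions.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Tx:
    """Hypothetical, already-decoded transaction record."""
    tx_hash: str
    sender: str
    receiver: str
    value: float  # amount transferred, in whole coins
    fee: float    # fee paid, in whole coins


def average_fee(txs: list[Tx]) -> float:
    """Average fee over a batch of transactions, e.g. those in the latest blocks."""
    return mean(tx.fee for tx in txs)


def whale_transfers(txs: list[Tx], threshold: float = 1_000.0) -> list[Tx]:
    """Transfers at or above an (arbitrary) whale threshold, largest first."""
    return sorted((tx for tx in txs if tx.value >= threshold), key=lambda tx: tx.value, reverse=True)


if __name__ == "__main__":
    recent = [
        Tx("0x01", "0xa", "0xb", 2.5, 0.0004),
        Tx("0x02", "0xc", "0xd", 4_200.0, 0.0006),
        Tx("0x03", "0xe", "0xf", 1_500.0, 0.0002),
    ]
    print(f"average fee: {average_fee(recent):.4f}")
    print([tx.tx_hash for tx in whale_transfers(recent)])  # ['0x02', '0x03']
```

In practice the threshold would be set in a fiat currency rather than in coins, and the sender addresses of the flagged transfers are what a whale watcher would then keep an eye on.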
Block explorers are usually built to scan a particular blockchain network. Released in 2015, Etherscan is an Ethereum block explorer and analytics platform where you can find a wide range of data on the Ethereum network, from addresses and transaction hashes to tokens and blocks. Solana Explorer and Solana Beach are block explorers designed for, well, Solana.
Trading data providers
Billions of dollars are traded every day on crypto exchanges. Daily trading volume at Binance, the largest crypto exchange, is just shy of $8 billion.
Decentralized exchanges (DEXs), though not as large as CEXs yet, are still a significant part of the crypto and DeFi ecosystem.
This trading activity generates a lot of data: OHLC (open, high, low, and close prices for a particular time frame), tick-level order books (bids and offers), and volatility, to name just a few. This kind of data is a valuable tool for quant and algo crypto traders, who build their models upon it. The problem is that storing so much data for long periods requires a lot of storage capacity and servers, so it isn’t surprising that exchanges toss away this trove of data after some time. This created a niche market, which was filled by data providers such as Kaiko and Amberdata. They offer data on almost anything CEX- and DEX-related: real-time and historical price data on both spot and derivatives markets, full order book data, and liquidations, among others.
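As an illustration of how raw trades become the OHLC bars mentioned above, here is a minimal sketch that buckets a stream of (timestamp, price, size) trades into fixed time windows. The trade tuples and the 60-second window are made-up inputs; data vendors such as Kaiko and Amberdata compute their bars with their own methodologies.

```python
from dataclasses import dataclass


@dataclass
class Bar:
    start: int      # window start, Unix seconds
    open: float
    high: float
    low: float
    close: float
    volume: float


def to_ohlc(trades: list[tuple[int, float, float]], window: int = 60) -> list[Bar]:
    """Aggregate (timestamp, price, size) trades into fixed-width OHLC bars."""
    bars: dict[int, Bar] = {}
    # Stable sort by time: the first trade in a window sets the open, the last sets the close.
    for ts, price, size in sorted(trades, key=lambda t: t[0]):
        start = ts - ts % window  # align to the start of the window
        bar = bars.get(start)
        if bar is None:
            bars[start] = Bar(start, price, price, price, price, size)
        else:
            bar.high = max(bar.high, price)
            bar.low = min(bar.low, price)
            bar.close = price
            bar.volume += size
    return [bars[k] for k in sorted(bars)]


if __name__ == "__main__":
    trades = [(0, 100.0, 1.0), (15, 102.0, 0.5), (59, 101.0, 2.0), (61, 99.0, 1.0)]
    for bar in to_ohlc(trades):
        print(bar)
```

Windows without any trades simply produce no bar here; production pipelines usually decide explicitly whether to skip such gaps or fill them with the previous close.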
Crypto intelligence platforms
Though Kaiko and Amberdata offer a huge amount of trading data, they cannot reveal which protocols “smart money” is investing in at the moment. They are not in the business of analyzing on-chain data. This is what crypto intelligence platforms do.
One of the biggest players in this field is Arkham Intelligence, a company that scans on-chain activity “and deanonymizes blockchain transactions, showing users the people and companies behind blockchain activity”. Arkham uses its algorithmic address-matching engine, Ultra, to link wallets to specific real-world entities, such as crypto venture funds or exchanges. This can be very valuable for making informed decisions in the crypto markets.
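Arkham does not publish how Ultra works, so the following is not its method. It is a sketch of one well-known and much simpler chain-analysis technique, the common-input-ownership heuristic for Bitcoin: addresses spent together as inputs of the same transaction are assumed to share an owner, so they can be merged into clusters with a union-find structure. Once any address in a cluster carries a label (say, an exchange hot wallet), the label extends to the whole cluster. The addresses, transactions and label below are hypothetical.

```python
from typing import Optional


class UnionFind:
    """Disjoint-set structure used to merge addresses into clusters."""

    def __init__(self) -> None:
        self.parent: dict[str, str] = {}

    def find(self, x: str) -> str:
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: str, b: str) -> None:
        self.parent[self.find(a)] = self.find(b)


def cluster_addresses(tx_inputs: list[list[str]]) -> UnionFind:
    """Common-input-ownership heuristic: inputs spent together share an owner."""
    uf = UnionFind()
    for inputs in tx_inputs:
        for addr in inputs:
            uf.find(addr)  # register every address, even single-input ones
        for other in inputs[1:]:
            uf.union(inputs[0], other)
    return uf


def label_of(uf: UnionFind, address: str, labels: dict[str, str]) -> Optional[str]:
    """Propagate a known label to every address in the same cluster."""
    root = uf.find(address)
    for known, name in labels.items():
        if uf.find(known) == root:
            return name
    return None


if __name__ == "__main__":
    # Each inner list holds the input addresses of one (hypothetical) Bitcoin transaction.
    uf = cluster_addresses([["addr1", "addr2"], ["addr2", "addr3"], ["addr4"]])
    labels = {"addr3": "Example Exchange hot wallet"}  # hypothetical known label
    print(label_of(uf, "addr1", labels))  # Example Exchange hot wallet
    print(label_of(uf, "addr4", labels))  # None
```

The heuristic is known to break on collaborative transactions such as CoinJoin, where inputs from different owners are deliberately combined, which is one reason commercial engines combine many signals.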
At the time of writing, Arkham covers not only the largest blockchains, Bitcoin and Ethereum, but also Base, BNB Chain, Polygon, and Optimism, among others. In July, the company released its on-chain intelligence marketplace, the first of its kind, which connects buyers and sellers of crypto intelligence. An example should make clear how it works. Below is one of the offers currently listed in the marketplace.
As you can see, a user has compiled a set of wallets known for their pump-and-dump activity. You can buy the information for 100,000 $ARKM, the platform’s native token, or simply place a bid of 10,000 $ARKM. You can also act as a buyer of crypto intelligence. The author of the listing below wants to find an address of Kosmos Ventures, a multi-strategy investment firm specializing in digital assets.
Although Arkham Intelligence definitely creates value for crypto traders by deanonymizing the blockchain, the platform is controversial, especially among privacy advocates. They fear that bringing this kind of transparency to the blockchain can dox crypto wallet owners, i.e., reveal their real-world identities.
Dashboards (Dune Analytics)
Dune is a platform containing data on “pretty much anything web3-related, including for EVMs like Ethereum, Polygon, Goerli, and Optimism — and non-EVM chains like Solana and Bitcoin.” It is a goldmine for retail crypto investors because it offers tools to analyze data on various protocols and tokens. Dune is maintained by its community: anyone interested in on-chain data who can query it with SQL can contribute to the platform and thus become a Dune Wizard.
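Dune queries are written in SQL against decoded blockchain tables (Dune’s own query engine is DuneSQL). Running a real Dune query requires an account, so the sketch below only imitates the workflow locally. It loads a few made-up rows into an in-memory SQLite table loosely modeled on a transactions table and runs the kind of aggregate that typically sits behind a dashboard chart: daily transaction count, active senders and volume. The table name, columns and rows are assumptions, not Dune’s actual schema.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Hypothetical table loosely modeled on a decoded transactions table.
db.execute("CREATE TABLE transactions (block_date TEXT, sender TEXT, value_eth REAL)")
db.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?)",
    [
        ("2023-10-01", "0xa", 1.5),
        ("2023-10-01", "0xb", 0.5),
        ("2023-10-02", "0xa", 3.0),
    ],
)

# The kind of query behind a "daily activity" chart on a dashboard.
DAILY_ACTIVITY = """
SELECT block_date,
       COUNT(*)               AS tx_count,
       COUNT(DISTINCT sender) AS active_senders,
       SUM(value_eth)         AS volume_eth
FROM transactions
GROUP BY block_date
ORDER BY block_date
"""

if __name__ == "__main__":
    for row in db.execute(DAILY_ACTIVITY):
        print(row)
    # ('2023-10-01', 2, 2, 2.0)
    # ('2023-10-02', 1, 1, 3.0)
```

On Dune itself, a query of this shape would be adapted to the platform’s real tables and SQL dialect, and its result set would then be turned into a chart on a dashboard.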
Blockchain oracles
A blockchain oracle is a piece of software that links blockchain networks to external systems. Oracles carry off-chain, real-world data onto blockchain networks; without them, most of DeFi would be impossible.
Many DeFi applications need external data. For example, an on-chain betting market would need real-time odds from multiple bookmakers, and a decentralized trading app where you can trade a security linked to the ETH futures price should be able to fetch ETH futures prices from outside exchanges, such as the Chicago Mercantile Exchange (CME). So there is a need to connect smart contracts with real-world information.
This is what a blockchain oracle does: it is a third-party service feeding real-world data into the smart contracts that power DeFi. Decentralized oracles go one step further by combining many independent oracles into one system. They query multiple data sources and return the aggregated data to the blockchain. The aim is to remove the single point of failure that a lone oracle would represent.
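The benefit of querying several sources can be shown in a few lines. The sketch below assumes a handful of hypothetical price sources, one of which is offline and one of which reports a wildly wrong number. Taking the median of the answers that did arrive, and refusing to answer at all below a minimum number of responses, keeps any single bad source from deciding the result. This illustrates the general idea only; it is not how any particular oracle network is implemented.

```python
from statistics import median
from typing import Callable, Optional


def aggregate_price(sources: list[Callable[[], float]], min_responses: int = 3) -> Optional[float]:
    """Query every source, skip the ones that fail, and return the median answer.

    Returns None if fewer than `min_responses` sources answered.
    """
    answers = []
    for fetch in sources:
        try:
            answers.append(fetch())
        except Exception:
            continue  # one failing source must not take the whole feed down
    if len(answers) < min_responses:
        return None
    return median(answers)


def unavailable_source() -> float:
    """Hypothetical source that is down."""
    raise ConnectionError("source unavailable")


if __name__ == "__main__":
    # Three honest sources, one wildly wrong (buggy or manipulated), one offline.
    sources = [lambda: 1999.0, lambda: 2000.0, lambda: 2001.0, lambda: 1_000_000.0, unavailable_source]
    print(aggregate_price(sources))  # 2000.5
```

The median is used rather than the mean because a single extreme value cannot drag it far: here the 1,000,000 report only nudges the result from 2000.0 to 2000.5, whereas a mean would be ruined by it.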
Chainlink is a leading decentralized oracle network. Its architecture consists of three parts: the Basic Request Model, the Decentralized Data Model, and Off-Chain Reporting. The Basic Request Model is what its name suggests: if a smart contract needs to know the price at which SOL is currently trading on Binance, the Basic Request Model handles it. This part of the Chainlink architecture is responsible for querying data from a single data source.
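In Chainlink, the pieces of this round trip are Solidity contracts plus an off-chain node. The Python below only models the shape of the request-and-callback flow: a consumer asks an oracle contract for data, the contract records the request and emits an event, an off-chain node picks up the event, queries its single source and calls back with the answer. All class and method names, the job description and the returned price are hypothetical placeholders.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Callable, Optional


@dataclass
class OracleContract:
    """Hypothetical stand-in for the on-chain oracle contract."""
    pending: dict = field(default_factory=dict)   # request id -> consumer callback
    events: list = field(default_factory=list)    # emitted (request id, job) events
    ids: count = field(default_factory=count)     # request id generator

    def request(self, job: str, callback: Callable[[float], None]) -> int:
        """Called by a consumer: remember who is waiting and emit an event."""
        request_id = next(self.ids)
        self.pending[request_id] = callback
        self.events.append((request_id, job))
        return request_id

    def fulfill(self, request_id: int, answer: float) -> None:
        """Called by the node: hand the answer back to the waiting consumer."""
        self.pending.pop(request_id)(answer)


def run_node(oracle: OracleContract, fetch: Callable[[str], float]) -> None:
    """Off-chain node: read new events, query the single source, write back."""
    while oracle.events:
        request_id, job = oracle.events.pop(0)
        oracle.fulfill(request_id, fetch(job))


class PriceConsumer:
    """Hypothetical consumer contract that needs one price from one source."""

    def __init__(self, oracle: OracleContract) -> None:
        self.oracle = oracle
        self.price: Optional[float] = None

    def request_price(self, job: str) -> None:
        self.oracle.request(job, self.on_price)

    def on_price(self, answer: float) -> None:
        self.price = answer


if __name__ == "__main__":
    oracle = OracleContract()
    consumer = PriceConsumer(oracle)
    consumer.request_price("SOL/USDT spot price on Binance")
    # Placeholder fetch; a real node would call the exchange's API here.
    run_node(oracle, fetch=lambda job: 21.5)
    print(consumer.price)  # 21.5
```

Note that the consumer’s state changes only when the node calls back, which is why this model is asynchronous: the answer arrives in a later step (on a real chain, a later transaction), not as the return value of the request.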
The Decentralized Data Model (DDM) introduces the idea of on-chain aggregation. Data is aggregated from multiple independent oracle nodes, which increases the reliability and trustworthiness of the answer. Chainlink Data Feeds are based on the Decentralized Data Model. Data Feeds are sources of off-chain data, such as weather events, business financials, outcomes of sports events, or asset prices. The data is aggregated on-chain so that consumers can always retrieve the answer.
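On-chain, consumers of a Chainlink Data Feed typically call `latestRoundData()` on the feed’s aggregator contract (part of Chainlink’s `AggregatorV3Interface`), which returns, among other fields, the round ID, the answer and the time it was last updated. The Python below is only a hypothetical model of that idea, not Chainlink’s code. Each round collects answers from independent nodes, the median becomes the round’s answer once enough nodes have reported, and a consumer reads the latest answer and rejects it if it is too old. The quorum of 3 and the one-hour staleness limit are arbitrary assumptions.

```python
import time
from statistics import median
from typing import Optional


class FeedAggregator:
    """Hypothetical model of a data feed that aggregates node answers per round."""

    def __init__(self, min_answers: int = 3) -> None:
        self.min_answers = min_answers
        self.round_id = 0
        self.submissions: dict[str, float] = {}
        self.latest: Optional[tuple[int, float, float]] = None  # (round, answer, updated_at)

    def submit(self, node: str, answer: float, now: Optional[float] = None) -> None:
        """A node reports its answer; the round closes once the quorum is reached."""
        self.submissions[node] = answer  # one answer per node per round
        if len(self.submissions) >= self.min_answers:
            self.round_id += 1
            updated_at = time.time() if now is None else now
            self.latest = (self.round_id, median(self.submissions.values()), updated_at)
            self.submissions = {}

    def latest_round_data(self) -> tuple[int, float, float]:
        """Hypothetical analogue of the aggregator's latest-round read."""
        if self.latest is None:
            raise LookupError("no round has completed yet")
        return self.latest


def read_price(feed: FeedAggregator, max_age: float = 3600.0, now: Optional[float] = None) -> float:
    """Consumer side: use the latest answer only if it is fresh enough."""
    _, answer, updated_at = feed.latest_round_data()
    current = time.time() if now is None else now
    if current - updated_at > max_age:
        raise ValueError("feed answer is stale")
    return answer


if __name__ == "__main__":
    feed = FeedAggregator(min_answers=3)
    for node, answer in [("node-a", 2000.0), ("node-b", 2002.0), ("node-c", 1998.0)]:
        feed.submit(node, answer, now=1_000.0)
    print(feed.latest_round_data())       # (1, 2000.0, 1000.0)
    print(read_price(feed, now=2_000.0))  # 2000.0
```

The staleness check on the consumer side matters because an aggregated answer is only as useful as it is recent; a contract that blindly trusts the last stored value can act on a price that no longer reflects the market.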
Finally, Off-Chain Reporting (OCR) is what makes Chainlink truly special in the context of decentralization. Execution happens mainly off-chain: oracle operators (nodes) communicate with each other over a peer-to-peer network. Each node regularly reports its data and signs it. All reports are aggregated into one transaction, which is the final answer for that round and which is then transmitted on-chain. The main advantage of aggregating reports into a single transaction is that oracle nodes pay much less in gas. Submitting one transaction instead of many also reduces congestion on the underlying blockchain.
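The sketch below imitates the shape of such a round. Each hypothetical node signs its observation, a leader packs the median answer together with all signed observations into one report, and the receiving side accepts the report only if enough valid signatures are attached and the median checks out. The whole round therefore needs a single transaction. To keep the example dependency-free, HMAC with per-node secret keys stands in for real public-key signatures; the actual protocol uses proper digital signatures and a considerably more involved consensus process. Node names, keys and the quorum value are assumptions.

```python
import hashlib
import hmac
from statistics import median

# Hypothetical node keys. Real OCR uses public-key signatures; HMAC is a stand-in.
NODE_KEYS = {f"node-{i}": f"secret-{i}".encode() for i in range(4)}
QUORUM = 3  # minimum number of valid signatures a report needs (assumed value)

Observation = tuple[str, float, str]  # (node, value, signature)


def sign(node: str, round_id: int, value: float) -> str:
    message = f"{round_id}:{value}".encode()
    return hmac.new(NODE_KEYS[node], message, hashlib.sha256).hexdigest()


def observe(node: str, round_id: int, value: float) -> Observation:
    """A node reports what it sees for this round and signs it."""
    return node, value, sign(node, round_id, value)


def build_report(round_id: int, observations: list[Observation]) -> dict:
    """Leader: fold all signed observations into ONE report (one transaction)."""
    return {
        "round": round_id,
        "answer": median(value for _, value, _ in observations),
        "observations": observations,
    }


def verify_report(report: dict) -> bool:
    """Receiving side: count valid signatures, enforce the quorum, recheck the median."""
    valid = [
        value
        for node, value, sig in report["observations"]
        if node in NODE_KEYS and hmac.compare_digest(sign(node, report["round"], value), sig)
    ]
    return len(valid) >= QUORUM and median(valid) == report["answer"]


if __name__ == "__main__":
    prices = [2000.0, 2001.0, 1999.0, 2003.0]
    observations = [observe(node, 7, p) for node, p in zip(NODE_KEYS, prices)]
    report = build_report(7, observations)
    print(report["answer"], verify_report(report))  # 2000.5 True
    report["answer"] = 5000.0  # tampering with the aggregated answer...
    print(verify_report(report))  # False
```

The gas argument is simple arithmetic: on Ethereum every transaction pays a fixed base cost of 21,000 gas, so if 31 nodes each submitted their own answer, that base cost would be paid 31 times instead of once, before even counting the storage writes each separate submission would trigger.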
Conclusion
The Web3 data landscape is rich and multifaceted, with players providing various services. In this article I tried to categorize several types of actors related to Web3 data:
- crypto intelligence platforms, the most popular being Arkham and Nansen, which analyze on-chain activity and link wallets to real-world entities;
- block explorers, software that retrieves data from the nodes of a particular blockchain network through APIs;
- trading data providers, such as Kaiko and Amberdata, which offer real-time and historical data on everything trading-related, from token prices to order book data;
- decentralized oracle networks, which bring off-chain data onto the chain and without which it is hard to imagine the blockchain ecosystem;
- platforms such as Dune, which provide beautiful visualizations of different protocols and tokens.
I am pretty sure this list is far from complete. Think of this article as a humble attempt to shed some light on the Web3 data landscape.