Dat is a new peer-to-peer hypermedia protocol that provides public-key-addressed file archives which can be synced securely and browsed on demand. (We mentioned it briefly in another recent post on newly emerging decentralized and peer-to-peer technologies; Beaker is a dedicated browser that handles the dat protocol.) Dat is a dataset synchronization protocol that does not assume a dataset is static or that the entire dataset will be downloaded. The protocol is agnostic to the underlying transport layer, and data is stored in a format called SLEEP (Syncable Lightweight Event Emitting Persistence), whose files are append-only.
The key features of Dat’s protocol and network layer design are:
- Content Integrity - Data and publisher integrity are verified through signed hashes of the content.
- Decentralized Mirroring - Users sharing the same Dat automatically discover each other and exchange data in a swarm.
- Network Privacy - Dat provides certain privacy guarantees, including end-to-end encryption.
- Incremental Versioning - Datasets can be efficiently synced, even in real time, to other peers.
- Random Access - Huge file hierarchies can be efficiently traversed remotely.
A reference implementation is available in JavaScript, and a Rust implementation is under active development.
Dat links are Ed25519 public keys, which are 32 bytes long (64 characters when hex encoded). A Dat link can be written in any of the following ways, and Dat clients will understand all of them:
- The standalone public key:
8e1c7189b1b2dbb5c4ec2693787884771…
- Using the dat:// protocol:
dat://8e1c7189b1b2dbb5c4ec2693787884771…
- As part of an HTTP URL:
https://datproject.org/8e1c7189b1b2dbb5…
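To make the three link forms concrete, here is a minimal Python sketch of how a client might normalize them to the underlying 32-byte key. The function name `parse_dat_link` is hypothetical and not part of any Dat library, and the sketch assumes that in the HTTP form the key is the first path segment. The keys shown above are truncated, so the usage example uses an obviously fake placeholder key.

```python
import re
from urllib.parse import urlparse

# A 32-byte key is exactly 64 hex characters.
KEY_RE = re.compile(r"[0-9a-fA-F]{64}")


def parse_dat_link(link: str) -> bytes:
    """Return the 32-byte public key from any of the three link forms (hypothetical helper)."""
    link = link.strip()
    if KEY_RE.fullmatch(link):
        candidate = link                      # standalone public key
    else:
        parsed = urlparse(link)
        if parsed.scheme == "dat":
            candidate = parsed.netloc         # dat://<key>
        elif parsed.scheme in ("http", "https"):
            # Assumption: the key is the first path segment, as in https://host/<key>
            candidate = parsed.path.lstrip("/").split("/", 1)[0]
        else:
            raise ValueError(f"unrecognized Dat link: {link!r}")
    if not KEY_RE.fullmatch(candidate):
        raise ValueError(f"expected 64 hex characters, got {candidate!r}")
    return bytes.fromhex(candidate)


# Placeholder key for illustration only; it is not a real Dat archive.
placeholder = "ab" * 32
key_bytes = parse_dat_link("dat://" + placeholder)
```

All three forms reduce to the same bytes, which is why clients can treat them interchangeably.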
Dat resembles IPFS in some ways and is also meant as an alternative hypermedia protocol to HTTP.
Hypercore and Hyperdrive: Protocol Stack + Filesystem
The storage, content integrity and networking protocols are implemented in a module called Hypercore (in JavaScript and Python). For synchronizing datasets, a file system module called Hyperdrive sits on top of Hypercore. Hyperdrive works well when the data can be represented as files on a filesystem, which is the main use case for Dat.
Hypercore Registers: Digital Record-keeping
Hypercore Registers, one of the core mechanisms of Dat, are append-only streams of cryptographically hashed and signed content that can be verified by anyone who has the public key of the writer associated with that feed. Dat has two registers, one for content and one for metadata. The former contains the files of the repository and the latter stores metadata about the files (name, size, last-modified time and so on). Both registers are replicated when synchronizing with another peer. When files are added, each one is split into a number of chunks, and the chunks are then arranged into a Merkle tree, which is later used for version control and replication.
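The chunk-then-hash idea can be sketched in a few lines of Python. This is a simplified binary Merkle tree, not Hypercore's actual on-disk layout: the chunk size, the `0x00`/`0x01` type prefixes, the rule for promoting an odd node, and the helper names are all assumptions made for illustration. Signing the root with the writer's key is left out.

```python
import hashlib

CHUNK_SIZE = 64 * 1024  # assumed chunk size for this sketch


def chunk(data: bytes, size: int = CHUNK_SIZE) -> list[bytes]:
    """Split data into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)] or [b""]


def h(*parts: bytes) -> bytes:
    """32-byte BLAKE2b digest of the concatenated parts."""
    return hashlib.blake2b(b"".join(parts), digest_size=32).digest()


def merkle_root(chunks: list[bytes]) -> bytes:
    """Hash chunks into leaves, then pair hashes upward until one root remains."""
    # Distinct prefixes keep a leaf hash from being mistaken for a parent hash.
    level = [h(b"\x00", c) for c in chunks]
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level), 2):
            if i + 1 < len(level):
                nxt.append(h(b"\x01", level[i], level[i + 1]))
            else:
                nxt.append(level[i])  # an unpaired node is promoted unchanged
        level = nxt
    return level[0]


data = b"a" * 10 + b"b" * 10 + b"c" * 5
root = merkle_root(chunk(data, size=10))
```

Changing any byte of any chunk changes the root, so a peer that trusts a signed root can verify chunks received from untrusted mirrors.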
Decentralized Mirroring
Dat is a p2p protocol designed to exchange pieces of a dataset amongst a swarm of peers. As soon as a peer acquires their first piece of data in the dataset they can choose to become a partial mirror for the dataset. If someone else contacts them and needs the piece they have, they can choose to share it. This can happen simultaneously while the peer is still downloading the pieces they want from others.
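The partial-mirror behaviour described above can be modelled with a toy in-memory swarm. The `Peer` class and its method names are hypothetical, there is no networking, and integrity checks are omitted. In this sketch a peer always shares what it holds, whereas a real peer may decline.

```python
class Peer:
    """A swarm member that holds some subset of a dataset's pieces."""

    def __init__(self, name, pieces=None):
        self.name = name
        self.pieces = dict(pieces or {})  # piece index -> bytes held so far

    def request(self, index):
        """Serve a piece if we hold it; return None otherwise."""
        return self.pieces.get(index)

    def fetch(self, index, swarm):
        """Ask each known peer in turn until one can supply the piece."""
        for other in swarm:
            if other is self:
                continue
            piece = other.request(index)
            if piece is not None:
                self.pieces[index] = piece
                return other.name  # report who served the piece
        return None


origin = Peer("origin", {0: b"alpha", 1: b"beta", 2: b"gamma"})
alice = Peer("alice")
bob = Peer("bob")

alice.fetch(0, [origin])                 # alice now holds piece 0 only
served_by = bob.fetch(0, [alice, origin])  # bob is served by alice, a partial mirror
```

Here alice serves piece 0 to bob even though she has not yet downloaded pieces 1 and 2 herself.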
Source Discovery
Source discovery can happen over many kinds of networks, as long as the following actions can be modeled:
• join(key, [port]): perform lookups for key at a regular interval; pass port to also announce that you share key.
• leave(key, [port]): stop looking up key; pass port to also stop announcing that you share key.
• foundpeer(key, ip, port): called whenever a lookup finds a peer.
In the Dat implementation, the join, leave and foundpeer actions are built on top of three types of discovery networks:
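As a rough illustration of this three-action model, the sketch below implements it over an in-memory registry instead of a real network. The class name `InMemoryDiscovery`, the `seed` helper that simulates a remote announcement, and the explicit `tick` method that stands in for the lookup interval are all assumptions for the example, not part of Dat.

```python
from collections import defaultdict


class InMemoryDiscovery:
    """Toy discovery network exposing join / leave / foundpeer."""

    def __init__(self, foundpeer, ip="127.0.0.1"):
        self._foundpeer = foundpeer           # callback: foundpeer(key, ip, port)
        self._ip = ip                         # address we announce ourselves under
        self._lookups = set()                 # keys we are currently looking up
        self._announced = defaultdict(set)    # key -> {(ip, port)} of known sharers

    def seed(self, key, ip, port):
        """Simulate a remote peer announcing that it shares key (hypothetical helper)."""
        self._announced[key].add((ip, port))

    def join(self, key, port=None):
        self._lookups.add(key)
        if port is not None:                  # also announce that we share key
            self._announced[key].add((self._ip, port))

    def leave(self, key, port=None):
        self._lookups.discard(key)
        if port is not None:                  # also stop announcing
            self._announced[key].discard((self._ip, port))

    def tick(self):
        """One lookup round; a real network would repeat this on an interval."""
        for key in sorted(self._lookups):
            for ip, port in sorted(self._announced.get(key, ())):
                self._foundpeer(key, ip, port)


found = []
net = InMemoryDiscovery(lambda key, ip, port: found.append((key, ip, port)))
net.seed("somekey", "10.0.0.5", 3282)
net.join("somekey")
net.tick()
```

Any backend that can provide these three operations can be swapped in without the rest of the stack noticing.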
• (Centralized) DNS Name Servers: a standard Internet mechanism for resolving keys to addresses (implemented in JavaScript).
• Multicast DNS (mDNS): a protocol that resolves hostnames to IP addresses without a central name server, which makes it useful for discovering peers on local networks (to be replaced with Hyperswarm later on).
• Kademlia DHT (Distributed Hash Table): mitigates central points of failure and increases the probability that Dat keeps working even when DNS servers are unreachable (see Hyperswarm).
Other discovery networks can also be implemented as needed. These three were chosen as a starting point because together they form a complementary set of strategies that increases the probability of source discovery.
Peer Connections
After discovery, Dat should have a list of potential sources to try to contact. The reference implementation supports TCP, HTTP and UTP (a transport layered on top of UDP), although Dat itself is transport agnostic.
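Working through the list of discovered sources can be sketched as a simple fallback loop. The example below uses plain TCP from Python's standard library only. The function name `connect_first` and the timeout value are assumptions, and the handshake and replication that would follow a successful connection are not shown.

```python
import socket


def connect_first(candidates, timeout=2.0):
    """Try each (ip, port) in order; return the first connected socket, or None."""
    for ip, port in candidates:
        try:
            return socket.create_connection((ip, port), timeout=timeout)
        except OSError:
            continue  # unreachable or refused: move on to the next source
    return None
```

A fuller client would likely try several sources in parallel and keep more than one connection open, but the fallback idea is the same.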
Further Links and Resources
https://datprotocol.github.io/how-dat-works/
https://github.com/datprotocol/whitepaper/blob/master/dat-paper.pdf
https://github.com/datproject/dat
https://github.com/datproject/dat-desktop
https://github.com/datproject/sdk — Write your own dat app.
https://github.com/datproject/dat-node — Node module for creating dat:// applications on distributed file systems.
https://github.com/datrs ; https://datrs.yoshuawuyts.com/ — Rust implementation.
Summary/Conclusion
From what I can conclude for myself, Dat is mostly a good learning resource on how to design protocols by combining primitives and how to build apps on top of them. Building web apps and websites on top of Dat with the Beaker browser is easy and straightforward, but at the moment it is not particularly secure, and the implementations are reference implementations, i.e. clunky.