STORJ （STORJ）白皮书.pdf-_环境100文库huanjing100.com

资源描述：

StorjA Peer-to-Peer Cloud Storage NetworkShawn Wilkinson shawnstorj.io,Tome Boshevski tomestorj.io,Josh Brando josh.brando ,James Prestwich jamesstorj.io,Gordon Hall gordonhallopenmailbox.org,Patrick Gerbes Philip Hutchins Chris Pollard With contributions from Vitalik Buterin December 15, 2016v2.0AbstractA peer-to-peer cloud storage network implementing client-side encryp-tion would allow users to transfer and share data without reliance on athird party storage provider. The removal of central controls would mit-igate most traditional data failures and outages, as well as signi cantlyincrease security, privacy, and data control. Peer-to-peer networks aregenerally unfeasible for production storage systems, as data availabilityis a function of popularity, rather than utility. We propose a solution inthe of a challenge-response veri cation system coupled with directpayments. In this way we can periodically check data integrity, and of-fer rewards to peers maintaining data. We further propose a model foraddressing access and perance concerns with a set of independent orfederated nodes.11 IntroductionCloud storage has come to rely almost exclusively on large storage providers act-ing as trusted third parties to transfer and store data. This system su ers fromthe inherent weaknesses of a trust-based model. Because client-side encryp-tion is nonstandard, the traditional cloud is vulnerable to a variety of securitythreats, including man-in-the-middle attacks, malware, and application awsthat expose private consumer and corporate data. Moreover, because manystorage devices rely on the same infrastructure, failures is correlated across lesand systems.A decentralized cloud storage network o ers many advantages compared todatacenter-based cloud storage. Data security can be maintained using client-side encryption, while data integrity will be maintained via a proof of retrievabil-ity. The impact of infrastructure failures and security breaches will be greatlyreduced. An open market for data storage may drive down costs for variousstorage services by enabling more parties to compete using existing devices.Data on the network will be resistant to censorship, tampering, unauthorizedaccess, and data failures. This paper describes a concrete implementation ofsuch a network, and a set of tools for interacting with that network.2 DesignStorj is a protocol that creates a distributed network for the ation andcution of storage contracts between peers. The Storj protocol enables peerson the network to negotiate contracts, transfer data, verify the integrity andavailability of remote data, retrieve data, and pay other nodes. Each peer isan autonomous agent, capable of pering these actions without signi canthuman interaction. Many of the basic tools for these interactions are describedin this document. Full protocol documentation can be found elsewhere [1].2.1 Files as Encrypted ShardsA shard is a portion of an encrypted le to be stored on this network. Shardinghas a number of advantages to security, privacy, perance, and availability.Files should be encrypted client-side before being sharded. The referenceimplementation uses AES256-CTR, but convergent encryption or any other de-sirable system could be implemented. This protects the content of the data2from the storage provider, or farmer, housing the data. The data owner retainscomplete control over the encryption key, and thus over access to the data.The data owner may separately secure knowledge of how a le is shardedand where in the network the shards are located. As the set of shards in thenetwork grows, it becomes exponentially more di cult to locate any given shardset without prior knowledge of their locations see Section 6.3. This impliesthat security of the le is proportional to the square of the size of the network.Shard size is a negotiable contract parameter. To preserve privacy, it isrecommended that shard sizes be standardized as a byte multiple, such as 8 or32 MB. Smaller les may be lled with zeroes or random data. Standardizedsizes dissuade side-channel attempts to determine the content of a given shard,and can mask the ow of shards through the network.Sharding large les like video content and distributing the shards acrossnodes reduces the impact of content delivery on any given node. Bandwidthdemands are distributed more evenly across the network. In addition, the end-user can take advantage of parallel transfer, similar to BitTorrent [2] or otherpeer-to-peer networks.Because peers generally rely on separate hardware and infrastructure, datafailure is not correlated. This implies that creating redundant mirrors of shards,or applying a parity scheme across the set of shards is an extremely e ective of securing availability. Availability is proportional to the number ofnodes storing the data.Figure 1 Visualizing the Sharding Process1. Files are encrypted.2. Encrypted les are split into shards, or multiple les are combined to a shard.33. Audit pre-processing is pered for each shard see Section 2.3.4. Shards may be transmitted to the network.2.2 Kademlia and Modi cationsStorj is built on Kademlia [3], a distributed hash table DHT. It is importantto note that shards are not stored in the hash table. Rather, Kademlia creates adistributed network with e cient message routing and other desirable qualities.Storj adds several message types, and enhancements to core Kademlia function-ality see Appendix A. In the future, the hash table may be used as a store fordata location ination, or other purposes.2.2.1 Signature Veri cationSimilar to S/Kademlia [4], the Storj network requires peers to sign messages.To join the network a node must create an ECDSA keypair, kpriv;kpub. TheKademlia Node ID corresponds to ripemd160sha256kpub. As such, eachNode ID in the Storj network is also a valid Bitcoin address, which the nodecan spend from. Nodes sign all messages, and validate message signatures be-fore processing messages. This modi cation enforces long-term identity on thenetwork, and provides a proof of work deterrent to Eclipse attacks on Kademliarouting see Section 5.2. In the future there are a variety of other uses for thisaddress.2.3 Proofs of RetrievabilityProofs of retrievability guarantee the existence of a certain piece of data on aremote host. The ideal proof minimizes message size, can be calculated quickly,requires minimal pre-processing, and provides a high degree of con dence thatthe le is available and intact. To provide knowledge of data integrity andavailability to the data owner, Storj provides a standard at for issuing andverifying proofs of retrievability via a challenge-response interaction called anaudit or heartbeat.Our reference implementation uses Merkle trees [5] and Merkle proofs. Afterthe sharding process the data owner generates a set of n random challenge saltss0;s1;sn 1 and stores the set of salts s. The challenge salts are each prependedto the data d, and the resulting string is hashed to a pre-leaf p as suchpi Hsi d. Pre-leaves are hashed again, and the resulting digests become4the set of leaves l of a standard Merkle tree such that li HHsi d. Theleaf set is lled with hashes of a blank string until its cardinality is a power oftwo, to simplify the proof process.Figure 2 Storj Audit Tree withjlj 4Red outlines indicate the elements of a Merkle proof for s0The data owner stores the set of challenges, the Merkle root and the depthof the Merkle tree, then transmits the Merkle trees leaves to the farmer. Thefarmer stores the leaves along with the shard. Periodically, the data ownerselects a challenge from the stored set, and transmits it to the farmer. Challengesmay be selected according to any reasonable pattern, but should not be reused.The farmer uses the challenge and the data to generate the pre-leaf. The pre-leaf, along with the set of leaves, is used to generate a Merkle proof, which issent back to the data owner.The Storj Merkle proof always consists of exactly log2jlj 1 hashes, andthus is a compact transmission, even for large trees. The data owner usesthe stored Merkle root and tree depth to verify the proof by verifying that itslength is equal to the tree depth and the hashes provided recreate the storedroot. This scheme does not allow false negatives or false positives, as the hashfunction requires each bit to remain intact to produce the same output.52.3.1 Partial AuditsThe Merkle tree audit scheme requires signi cant computational overhead forthe data owner, as the entire shard must be hashed many times to generatepre-leaves. An extension of this scheme utilizes subsets of the data to perpartial audits, reducing computational overhead. This also has the advantageof signi cantly reducing I/O burden on farmer resources.This extension relies on two additional selectable parameters a set of byteindices x within the shard and a set of section lengths in bytes, b. The dataowner stores a set of 3-tuples s;x;b. To generate pre-leaf i, the data ownerprepends si to the bi bytes found at xi. During the audit process, the veri ertransmits s;x;bi, which the farmer uses to generate a pre-leaf. The Merkleproof is generated and veri ed as normal.Figure 3 Storj Audit Tree withjlj 4 and Partial AuditsRed outlines indicate the elements of a Merkle proof for s0Partial audits provide only probabilistic assurance that the farmer retainsthe entire le. They allow for false positive results, where the veri er believes thefarmer retains the intact shard, when it has actually been modi ed or partiallydeleted. The probability of a false positive on an individual partial audit iseasily calculable see Section 6.46Thus the data owner can have a known con dence level that a shard isstill intact and available. In practice, this is more complex, as farmers mayimplement intelligent strategies to attempt to defeat partial audits. Fortunately,this is a bounded problem in the case of iterative audits. The probability ofseveral consecutive false positives becomes very low, even when small portionsof the le have been deleted.In addition, partial audits can be easily mixed with full audits without re-structuring the Merkle tree or modifying the proof veri cation process. Manyaudit strategies that mix full and partial veri cation can be envisioned, each ofwhich provides di erent levels of con dence over time.A further extension of this scheme could use a deterministic seed instead ofa set of byte inds. This seed would be used to generate inds of many non-consecutive bytes in the le. Requiring many non-consecutive random byteswould provide additional resistance against malicious farmers attempting toimplement audit evasion strategies without signi cant extra overhead from pro-cessing or I/O.2.3.2 Other Proof-of-Retrievability SchemesOther audit schemes were examined, but deemed generally unfeasible. For ex-ample, Shacham and Waters proposed a compact proof [6] with several advan-tages over Merkle-tree schemes. This construction allows for an endless streamof challenges to be generated by the data owner with minimal stored ina-tion. It also allows for public veri ability of challenge responses.However, initial implementations indicate that the client-side pre-processingrequired for the Shacham-Waters scheme requires at least one order of magni-tude more computation time than hash-based s, rendering it too slowfor most applications.Proof of retrievability is an area of ongoing research, and other practicalschemes may be discovered in the future. As proof of retrievability schemesare discovered and implemented, the choice of scheme may become a negotiablecontract parameter. This would allow each data owner and node to implementa wide variety of schemes, and select the most advantageous scheme for a givenpurpose.72.3.3 Issuing AuditsTo issue audits, Storj extends the Kademlia message set with a new type AU-DIT for a full list of Kademlia extensions, see Appendix A. These messagesare sent from data owners to farmers and contain the hash of the data and achallenge. The farmer must respond with a Merkle proof as described above.Upon receipt and validation of the Merkle proof, the data owner must issuepayment to the farmer according to agreed-upon terms.Figure 4 Issuing and Verifying Storj Audits2.4 Contracts and NegotiationData storage is negotiated via a standard contract at [7]. The contract is aversioned data structure that describes the relationship between data owner andfarmer. Contracts should contain all ination necessary for each node to a relationship, transfer the data, create and respond to audits over time, andarbitrate payments. This includes shard hash, shard size, audit strategy, andpayment ination. Storj implements a publish/subscribe system to connectparties interested in ing a contract see Section 2.6.Each party should store a signed copy of the contract. Contracts exist solelyfor the bene t of the data owner and farmer, as no other node can verify theterms or state of the relationship. In the future, contract ination may bestored in the DHT, or in an external ledger like a Blockchain, which may allowsome outside veri cation of relationship terms.The contracting system extends Kademlia with four new message typesOFFER, CONSIGN, MIRROR, and RETRIEVE.To negotiate a contract, a node creates an OFFER message and sends itto a prospective partner. Prospective partners are found via the publish/sub-8scribe system described in Section 2.6. The OFFER message contains a fully-constructed contract that describes the desired relationship. The two nodesrepeatedly swap signed OFFER messages. For each new message in the OF-FER loop, the node either chooses to terminate negotiations, respond with anew signed counter-o er, or accept the contract by countersigning it. Once anOFFER is signed by both parties, they each store it locally, keyed by the hashof the data.Once an agreement is reached, the data owner sends a CONSIGN messageto the farmer. The message contains the leaves of the audit-tree. The farmermust respond with a PUSH token that authorizes the data owner to upload thedata via HTTP transfer see Section 2.10. This token is a random number,and can be delegated to a third party.RETRIEVE messages signify the intent to retrieve a shard from a farmer.These messages are nearly identical to CONSIGN messages, but do not containthe audit-tree leaves. The farmer responds to a valid RETRIEVE messagewith a PULL token that authorizes download of the data via a separate HTTPtransfer.MIRROR messages instruct a farmer to retrieve data from another farmer.This allows data owners to create redundant copies of a shard without expendingsigni cant additional bandwidth or time. After a successful

展开阅读全文

STORJ （STORJ）白皮书.pdf

最新标签