Improving Tezos Storage

Date: 2019-01-15
Catégorie: Blockchains



Running a Tezos node currently costs a lot of disk space, about 59 GB for the context database, the place where the node stores the states corresponding to every block in the blockchain, since the first one. Of course, this is going to decrease once garbage collection is integrated, i.e. removing very old information, that is not used and cannot change anymore (PR720 by Thomas Gazagnaire, Tarides, some early tests show a decrease to 14GB ,but with no performance evaluation). As a side note, this is different from pruning, i.e. transmitting only the last cycles for “light” nodes (PR663 by Thomas Blanc, OCamlPro). Anyway, as Tezos will be used more and more, contexts will keep growing, and we need to keep decreasing the space and performance cost of Tezos storage.

As one part of our activity at OCamlPro is to allow companies to deploy their own private Tezos networks, we decided to experiment with new storage layouts. We implemented two branches: our branch IronTez1 is based on a full LMDB database, as Tezos currently, but with optimized storage representation ; our branch IronTez2 is based on a mixed database, with both LMDB and file storage.

To test these branches, we started a node from scratch, and recorded all the accesses to the context database, to be able to replay it with our new experimental nodes. The node took about 12 hours to synchronize with the network, on which about 3 hours were used to write and read in the context database. We then replayed the trace, either only the writes or with both reads and writes.

Here are the results:

plot_sizes.png

The mixed storage is the most interesting: it uses half the storage of a standard Tezos node !

plot_times-1.png

Again, the mixed storage is the most efficient : even with reads and writes, IronTez2 is five time faster than the current Tezos storage.

Finally, here is a graph that shows the impact of the two attacks that happened in November 2018, and how it can be mitigated by storage improvement:

plot_diffs.png

The graph shows that, using mixed storage, it is possible to restore the storage growth of Tezos to what it was before the attack !

Interestingly, although these experiments have been done on full traces, our branches are completely backward-compatible : they could be used on an already existing database, to store the new contexts in our optimized format, while keeping the old data in the ancient format.

Of course, there is still a lot of work to do, before this work is finished. We think that there are still more optimizations that are possible, and we need to test our branches on running nodes for some time to get confidence (TzScan might be the first tester !), but this is a very encouraging work for the future of Tezos !



Au sujet d'OCamlPro :

OCamlPro développe des applications à haute valeur ajoutée depuis plus de 10 ans, en utilisant les langages les plus avancés, tels que OCaml et Rust, visant aussi bien rapidité de développement que robustesse, et en ciblant les domaines les plus exigeants (méthodes formelles, cybersécurité, systèmes distribués/blockchain, conception de DSLs). Fort de plus de 20 ingénieurs R&D, avec une expertise unique sur les langages de programmation, aussi bien théorique (plus de 80% de nos ingénieurs ont une thèse en informatique) que pratique (participation active au développement de plusieurs compilateurs open-source, prototypage de la blockchain Tezos, etc.), diversifiée (OCaml, Rust, Cobol, Python, Scilab, C/C++, etc.) et appliquée à de multiples domaines. Nous dispensons également des [formations sur mesure certifiées Qualiopi sur OCaml, Rust, et les méthodes formelles] (https://training.ocamlpro.com/) Pour nous contacter : contact@ocamlpro.com.