rkyv (archive) is a zero-copy deserialization framework for Rust.

This book covers the motivation, architecture, and major features of rkyv. It is the best way to learn and understand rkyv, but won't go as in-depth on specifics as the documentation will. Don't be afraid to consult these other resources as you need while you read through.

Resources

Learning Materials

  • The rkyv Discord is a great place to get help with specific issues and meet other people using rkyv.
  • The rkyv GitHub hosts the source and tracks project issues and milestones.

Documentation

  • rkyv, the core library
  • rkyv_dyn, which adds trait object support to rkyv

Benchmarks

  • The Rust serialization benchmark is a shootout-style benchmark comparing many Rust serialization solutions. It includes special benchmarks for zero-copy serialization solutions like rkyv.

Sister Crates

  • bytecheck, which rkyv uses for validation
  • ptr_meta, which rkyv uses for pointer manipulation
  • rend, which rkyv uses for endian-agnostic features

Motivation

First and foremost, the motivation behind rkyv is improved performance. The way that it achieves that goal can also lead to gains in memory use, correctness, and security along the way.

Familiarity with other serialization frameworks and how traditional serialization works will help, but isn't necessary to understand how rkyv works.

Most serialization frameworks like serde define an internal data model that consists of basic types such as primitives, strings, and byte arrays. This splits the work of serializing a type into two stages: the frontend and the backend. The frontend takes some type and breaks it down into the serializable types of the data model. The backend then takes the data model types and writes them using some data format such as JSON, Bincode, TOML, etc. This allows a clean separation between the serialization of a type and the data format it is written to.

Serde describes its data model in the serde book. Everything serialized with serde eventually boils down to some combination of those types!

A major downside of traditional serialization is that it takes a considerable amount of time to read, parse, and reconstruct types from their serialized values.

In JSON for example, strings are encoded by surrounding the contents with double quotes and escaping invalid characters inside of them:

{ "line": "\"All's well that ends well\"" }
          ^^                          ^ ^

numbers are turned into characters:

{ "pi": 3.1415926 }
        ^^^^^^^^^

and even field names, which could be implicit in most cases, are turned into strings:

{ "message_size": 334 }
  ^^^^^^^^^^^^^^^

All those characters are not only taking up space, they're also taking up time. Every time we read and parse JSON, we're picking through those characters in order to figure out what the values are and reproduce them in memory. An f32 is only four bytes of memory, but it's encoded using nine bytes and we still have to turn those nine characters into the right f32!
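The difference can be sketched with nothing but the standard library. The two function names below are made up for illustration: the first has to walk the nine characters of the text form, while the second only reinterprets four bytes.

```rust
/// Text path: scan the ASCII characters and convert them into a float.
fn f32_from_text(text: &str) -> f32 {
    text.parse().expect("not a valid float")
}

/// Binary path: reinterpret four little-endian bytes as a float.
fn f32_from_bytes(bytes: [u8; 4]) -> f32 {
    f32::from_le_bytes(bytes)
}
```

Calling `f32_from_text("3.1415926")` and then round-tripping the result through `to_le_bytes` and `f32_from_bytes` yields the same value from less than half the input.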

This deserialization time adds up quickly, and in data-heavy applications such as games and media editing it can come to dominate load times. rkyv provides a solution through a serialization technique called zero-copy deserialization.

Zero-copy deserialization

Zero-copy deserialization is a technique that reduces the time and memory required to access and use data by directly referencing bytes in the serialized form.

This takes advantage of the fact that data must already be loaded in memory before it can be deserialized. If we had some JSON:

{ "quote": "I don't know, I didn't listen." }
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Instead of copying those characters into a String, we could just borrow it from the JSON buffer as a &str. The lifetime of that &str would depend on our buffer and we wouldn't be allowed to drop it until we had dropped the string we were using.
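As a minimal sketch of that borrow, using only the standard library: `borrow_quote` is a hypothetical helper, and the caller is assumed to already know where the quote starts and ends in the buffer.

```rust
/// Borrows the text between `start` and `end` straight out of `buffer`.
/// No bytes are copied, and the returned `&str` cannot outlive the buffer.
fn borrow_quote(buffer: &[u8], start: usize, end: usize) -> Option<&str> {
    std::str::from_utf8(buffer.get(start..end)?).ok()
}
```

The returned `&str` points at the same address as the bytes inside the JSON buffer, which is what makes it zero-copy.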

Partial zero-copy

Serde and others have support for partial zero-copy deserialization, where bits and pieces of the deserialized data are borrowed from the serialized form. Strings, for example, can borrow their bytes directly from the serialized form in encodings like bincode that don't perform any character escaping. However, a string object must still be created to hold the deserialized length and point to the borrowed characters.

A good way to think about this is that even though we're borrowing lots of data from the buffer, we still have to parse the structure out:

struct Example<'a> {
  quote: &'a str,
  a: &'a [u8; 12],
  b: u64,
  c: char,
}

So a buffer might break down like this:

I don't know, I didn't listen.AAAAAAAAAAAABBBBBBBBCCCC
^-----------------------------^-----------^-------^---
 quote: str                    a: [u8; 12] b: u64  c: char

We do a lot less work, but we still have to parse, create, and return an Example<'a>:

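Example {
  quote: str::from_utf8(&buffer[0..30]).unwrap(),
  // Borrow the 12 bytes in place as a reference to a fixed-size array
  a: (&buffer[30..42]).try_into().unwrap(),
  // from_le_bytes takes an array by value, so these bytes are copied out
  b: u64::from_le_bytes(buffer[42..50].try_into().unwrap()),
  c: char::from_u32(u32::from_le_bytes(buffer[50..54].try_into().unwrap())).unwrap(),
}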

And we can't borrow types like u64 or char that have alignment requirements since our buffer might not be properly aligned. We have to immediately parse and store those! Even though we borrowed 42 of the buffer's bytes, we missed out on the last 12 and still had to parse through the buffer to find out where everything is.

Partial zero-copy deserialization can considerably improve memory usage and often speeds up deserialization, but with some work we can go further.

Total zero-copy

rkyv implements total zero-copy deserialization, which guarantees that no data is copied during deserialization and no work is done to deserialize data. It achieves this by structuring its encoded representation so that it is the same as the in-memory representation of the source type.

This is more like if our buffer was an Example:

struct Example {
  quote: String,
  a: [u8; 12],
  b: u64,
  c: char,
}

And our buffer looked like this:

I don't know, I didn't listen.__QOFFQLENAAAAAAAAAAAABBBBBBBBCCCC
^-----------------------------  ^---^---^-----------^-------^---
 quote bytes                    pointer  a           b       c
                                and len
                                ^-------------------------------
                                 Example

In this case, the bytes are padded to the correct alignment and the fields of Example are laid out exactly the same as they would be in memory. Our deserialization code can be much simpler:

unsafe { &*buffer.as_ptr().add(32).cast::<Example>() }

This operation is almost zero work, and more importantly it doesn't scale with our data. No matter how much or how little data we have, it's always just a pointer offset and a cast to access our data.
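A runnable sketch of the same idea, using only the standard library rather than rkyv itself. `Record`, `AlignedBuf`, `access`, and `example_buffer` are hypothetical names; `Record` uses only plain fixed-size fields so that any correctly aligned 16 bytes are a valid value.

```rust
use std::mem::{align_of, size_of};

/// A hypothetical fixed-layout record. `#[repr(C)]` pins the field order, and
/// these field types leave no padding, so any 16 bytes form a valid `Record`.
#[repr(C)]
#[derive(Debug, PartialEq)]
struct Record {
    a: [u8; 4],
    b: u32,
    c: u64,
}

/// A byte buffer forced to 16-byte alignment (an illustrative helper).
#[repr(C, align(16))]
struct AlignedBuf([u8; 32]);

/// "Deserializes" by checking bounds and alignment, then casting.
/// The cost is the same no matter how large the buffer is.
fn access(buffer: &[u8], offset: usize) -> Option<&Record> {
    let end = offset.checked_add(size_of::<Record>())?;
    let bytes = buffer.get(offset..end)?;
    let ptr = bytes.as_ptr();
    if (ptr as usize) % align_of::<Record>() != 0 {
        return None;
    }
    // SAFETY: `ptr` is in bounds for `size_of::<Record>()` bytes, is aligned
    // for `Record`, and every bit pattern is valid for its fields.
    Some(unsafe { &*ptr.cast::<Record>() })
}

/// Builds a buffer holding 16 bytes of unrelated data followed by a `Record`
/// written in native byte order.
fn example_buffer() -> AlignedBuf {
    let mut buf = AlignedBuf([0; 32]);
    buf.0[16..20].copy_from_slice(b"rkyv");
    buf.0[20..24].copy_from_slice(&7u32.to_ne_bytes());
    buf.0[24..32].copy_from_slice(&42u64.to_ne_bytes());
    buf
}
```

With `let buf = example_buffer();`, the call `access(&buf.0, 16)` returns a reference into the buffer, while a misaligned or out-of-range offset returns `None`. rkyv's unsafe accessors skip these checks, and validation performs far more thorough ones.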

This opens up blazingly-fast data loading and enables data access orders of magnitude more quickly than traditional serialization.

Architecture

The core of rkyv is built around relative pointers and three core traits: Archive, Serialize, and Deserialize. Each of these traits has a corresponding variant that supports unsized types: ArchiveUnsized, SerializeUnsized, and DeserializeUnsized.

A good way to think about it is that sized types are the foundation that unsized types are built on. That's not a fluke either: rkyv is built precisely so that you can build more complex abstractions out of lower-level machinery in a safe and composable way. It's not much different from what you normally do while programming!

The system is built to be flexible and can be extended beyond the provided types. For example, the rkyv_dyn crate adds support for trait objects by introducing new traits and defining how they build up to allow trait objects to be serialized and deserialized.

Relative pointers

Relative pointers are the bread and butter of total zero-copy deserialization, completely replacing the use of normal pointers. But why can't we use normal pointers?

Consider some zero-copy data on disk. Before we can use it, we need to load it into memory. But we can't control where in memory it gets loaded. Every time we load it, it could be located at a different address, and therefore the objects inside of it will be located at different addresses.

One of the major reasons for this is actually security. Every time you run your program, it may run in a completely different random location in memory. This is called address space layout randomization and it helps prevent exploitation of memory corruption vulnerabilities.

At most, we can only control the alignment of our zero-copy data, so we need to work within those constraints.

This means that we can't store any pointers to that data, inside of it or outside of it. As soon as we reload the data, it might not be at the same address. That would leave our pointers dangling, and would almost definitely result in memory access violations. Some other libraries like abomonation store some extra data and perform a fast fixup step that takes the place of deserialization, but we can do better.

In order to perform that fixup step, abomonation requires that the buffer has a mutable backing. This is okay for many use cases, but there are also cases where we won't be able to mutate our buffer. One example is if we used memory-mapped files.

While normal pointers hold an absolute address in memory, relative pointers hold an offset to an address. This changes how the pointer behaves under moves:

Pointer  | Self is moved                     | Self and target are moved
Absolute | ✅ Target is still at address     | ❌ Target no longer at address
Relative | ❌ Relative distance has changed  | ✅ Self and target same relative distance apart

This is exactly the property we need to build data structures with total zero-copy deserialization. By using relative pointers, we can load data at any position in memory and still have valid pointers inside of it. Relative pointers don't require write access to memory either, so we can memory map entire files and instantly have access to their data in a structured manner.

rkyv's implementation of relative pointers is the RelPtr type.
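The idea can be sketched without rkyv by working with positions in a byte buffer instead of real addresses. This is not how RelPtr is implemented; `write_rel_ptr` and `resolve_rel_ptr` are hypothetical helpers, and the 32-bit little-endian offset is an assumption made for the example.

```rust
/// Stores, at `ptr_pos`, the signed distance from `ptr_pos` to `target`.
fn write_rel_ptr(buffer: &mut [u8], ptr_pos: usize, target: usize) {
    let offset = (target as i64 - ptr_pos as i64) as i32;
    buffer[ptr_pos..ptr_pos + 4].copy_from_slice(&offset.to_le_bytes());
}

/// Reads the offset stored at `ptr_pos` and returns the position it points to,
/// or `None` if the pointer or its target falls outside the buffer.
fn resolve_rel_ptr(buffer: &[u8], ptr_pos: usize) -> Option<usize> {
    let raw = buffer.get(ptr_pos..ptr_pos.checked_add(4)?)?;
    let mut bytes = [0u8; 4];
    bytes.copy_from_slice(raw);
    let target = ptr_pos as i64 + i64::from(i32::from_le_bytes(bytes));
    if target < 0 || (target as usize) > buffer.len() {
        return None;
    }
    Some(target as usize)
}
```

If the whole archive is copied somewhere else, for example appended after 16 bytes of unrelated data, the stored offset is unchanged and still resolves to the moved target. An absolute position stored in the same place would now be wrong.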

Archive

Types that implement Archive have an alternate representation that supports zero-copy deserialization. The construction of archived types happens in two steps:

  1. Any dependencies of the type are serialized. For strings this would be the characters of the string, for boxes it would be the boxed value, and for vectors it would be any contained elements. Any bookkeeping from this step is bundled into a Resolver type and held onto for later. This is the serialize step.
  2. The resolver and original value are used to construct the archived value in the output buffer. For strings the resolver would be the position of the characters, for boxes it would be the position of the boxed value, and for vectors it would be the position of the archived elements. With the original values and resolvers combined, the archived version can be constructed. This is the resolve step.

Resolvers

A good example of why resolvers are necessary is when archiving a tuple. Say we have two strings:

let value = ("hello".to_string(), "world".to_string());

The archived tuple needs to have both of the strings right next to each other:

0x0000      AA AA AA AA BB BB BB BB
0x0008      CC CC CC CC DD DD DD DD

A and B might be the length and pointer for the first string of the tuple, and C and D might be the length and pointer for the second string.

When archiving, we might be tempted to serialize and resolve the first string, then serialize and resolve the second one. But this might place the second string's bytes ("world") between the two! Instead, we need to write out the bytes for both strings, and then finish archiving both of them. The tuple doesn't know what information the strings need to finish archiving themselves, so they have to provide it to the tuple through their Resolver.

This way, the tuple can:

  1. Serialize the first string (save the resolver)
  2. Serialize the second string (save the resolver)
  3. Resolve the first string with its resolver
  4. Resolve the second string with its resolver

And we're guaranteed that the two strings are placed right next to each other like we need.
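The four steps above can be sketched with the standard library alone. This is a simplified model, not rkyv's actual code: the names are hypothetical, each archived string is assumed to be a 4-byte length followed by a 4-byte relative offset, and alignment padding is ignored.

```rust
/// Bookkeeping produced by the serialize step: where the string's bytes went.
struct StringResolver {
    bytes_pos: usize,
}

/// Serialize step: write the dependencies (the characters) and remember where.
fn serialize_str(value: &str, out: &mut Vec<u8>) -> StringResolver {
    let bytes_pos = out.len();
    out.extend_from_slice(value.as_bytes());
    StringResolver { bytes_pos }
}

/// Resolve step: write the fixed-size part, a length and a relative offset.
fn resolve_str(value: &str, resolver: StringResolver, out: &mut Vec<u8>) {
    let here = out.len() as i64;
    let offset = (resolver.bytes_pos as i64 - here) as i32;
    out.extend_from_slice(&(value.len() as u32).to_le_bytes());
    out.extend_from_slice(&offset.to_le_bytes());
}

/// Archives a pair of strings and returns the position of the archived tuple.
fn archive_pair(value: &(String, String), out: &mut Vec<u8>) -> usize {
    // Serialize both strings first, saving their resolvers...
    let first = serialize_str(&value.0, out);
    let second = serialize_str(&value.1, out);
    // ...then resolve both, so the two fixed-size parts end up adjacent.
    let root = out.len();
    resolve_str(&value.0, first, out);
    resolve_str(&value.1, second, out);
    root
}

/// Reads back the string whose fixed-size part starts at `pos`.
fn read_str(buf: &[u8], pos: usize) -> &str {
    let mut word = [0u8; 4];
    word.copy_from_slice(&buf[pos..pos + 4]);
    let len = u32::from_le_bytes(word) as usize;
    word.copy_from_slice(&buf[pos + 4..pos + 8]);
    let start = (pos as i64 + i64::from(i32::from_le_bytes(word))) as usize;
    std::str::from_utf8(&buf[start..start + len]).expect("valid UTF-8")
}
```

For `("hello", "world")` the output is the ten character bytes followed by two adjacent 8-byte entries, so the second archived string sits exactly 8 bytes after the first.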

Serialize

Types implement Serialize separately from Archive. Serialize creates a resolver for some object, then Archive turns the value and that resolver into an archived type. Having a separate Serialize trait is necessary because although a type may have only one archived representation, you may have options of what requirements to meet in order to create one.

The Serialize trait is parameterized over the serializer. The serializer is just a mutable object that helps the type serialize itself. The most basic types like u32 or char don't bound their serializer type because they can serialize themselves with any kind of serializer. More complex types like Box and String require a serializer that implements Serializer, and even more complex types like Rc and Vec require a serializer that additionally implements SharedSerializeRegistry or ScratchSpace.

Unlike Serialize, Archive doesn't parameterize over the serializer used to make it. It shouldn't matter what serializer a resolver was made with, only that it's made correctly.
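The way bounds on the serializer express requirements can be shown with a pair of toy traits. These are deliberately simplified stand-ins, not rkyv's real definitions: `Writer` and `SerializeSketch` are hypothetical names, and the resolver is reduced to a position.

```rust
/// A stand-in for the capability that more complex types need: writing bytes.
trait Writer {
    fn pos(&self) -> usize;
    fn write(&mut self, bytes: &[u8]);
}

impl Writer for Vec<u8> {
    fn pos(&self) -> usize {
        self.len()
    }
    fn write(&mut self, bytes: &[u8]) {
        self.extend_from_slice(bytes);
    }
}

/// Simplified serialize trait: generic over the serializer `S`, and
/// returning a resolver for the later resolve step.
trait SerializeSketch<S: ?Sized> {
    type Resolver;
    fn serialize(&self, serializer: &mut S) -> Self::Resolver;
}

/// `u32` has no dependencies, so it places no bound on `S` at all.
impl<S: ?Sized> SerializeSketch<S> for u32 {
    type Resolver = ();
    fn serialize(&self, _serializer: &mut S) -> Self::Resolver {}
}

/// `str` must write its bytes somewhere, so it requires `S: Writer`.
impl<S: Writer + ?Sized> SerializeSketch<S> for str {
    type Resolver = usize;
    fn serialize(&self, serializer: &mut S) -> Self::Resolver {
        let pos = serializer.pos();
        serializer.write(self.as_bytes());
        pos
    }
}
```

A `u32` can be serialized even with `()` as the serializer, while `str` only compiles against a serializer that can write, such as `Vec<u8>` here.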

Serializer

rkyv provides serializers that provide all the functionality needed to serialize standard library types, as well as serializers that combine other serializers into a single object with all of the components' capabilities.

The provided serializers offer a wide range of strategies and capabilities, but most use cases will be best suited by AllocSerializer.

Many types require scratch space to serialize. This is some extra allocated space that they can use temporarily and return when they're done. For example, Vec might request scratch space to store the resolvers for its elements until it can serialize all of them. Requesting scratch space from the serializer allows scratch space to be reused many times, which reduces the number of slow memory allocations performed while serializing.

Deserialize

Similarly to Serialize, Deserialize parameterizes over and takes a deserializer, and converts a type from its archived form back to its original one. Unlike serialization, deserialization occurs in a single step and doesn't have an equivalent of a resolver.

Deserialize also parameterizes over the type that is being deserialized into. This allows the same archived type to deserialize into multiple different unarchived types depending on what's being asked for. This helps enable lots of very powerful abstractions, but might require you to annotate types when deserializing.

This provides more or less traditional deserialization, with the added benefit of being sped up somewhat by having very compatible representations. It also incurs both the memory and performance penalties of traditional deserialization, so make sure that it's what you need before you use it. Deserialization is not required to access archived data as long as you can do so through the archived versions.

Even the highest-performance serialization frameworks will hit a deserialization speed limit because of the amount of memory allocation that needs to be performed.

A good use for Deserialize is deserializing portions of archives. You can easily traverse the archived data to locate some subobject, then deserialize just that piece instead of the archive as a whole. This granular approach provides the benefits of both zero-copy deserialization as well as traditional deserialization.

Deserializer

Deserializers, like serializers, provide capabilities to objects during deserialization. Most types don't bound their deserializers, but some like Rc require special deserializers in order to deserialize memory properly.

Alignment

The alignment of a type restricts where it can be located in memory to optimize hardware loads and stores. Because rkyv creates references to values located in your serialized bytes, it has to ensure that the references it creates are properly aligned for the type.

In order to perform arithmetic and logical operations on data, modern CPUs need to load that data from memory into their registers. However, there's usually a hardware limitation on how the CPU can access that data: it can only access data starting at word boundaries. These words are the natural size for the CPU to work with; the word size is 4 bytes for 32-bit machines and 8 bytes for 64-bit machines. Imagine we had some data laid out like this:

0   4   8   C
AAAABBBBCCCCDDDD

On a 32-bit CPU, accesses could occur at any address that's a multiple of 4 bytes. For example, one could access A by loading 4 bytes from address 0, B by loading 4 bytes from address 4, and so on. This works great because our data is aligned to word boundaries. Unaligned data can throw a wrench in that:

0   4   8   C
..AAAABBBBCCCC

Now if we want to load A into a register, we have to:

  1. Load 4 bytes from address 0
  2. Throw away the first two bytes
  3. Load 4 bytes from address 4
  4. Throw away the last two bytes
  5. Combine our four bytes together

That forces us to do twice as many loads and perform some correction logic. That can have a real impact on our performance across the board, so we require all of our data to be properly aligned.
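The five steps can be simulated in software. This sketch models an imaginary 32-bit CPU whose only memory operation is an aligned 4-byte load; `aligned_load` and `load` are hypothetical names, and real hardware behaves differently in the details.

```rust
/// An aligned load: the only kind of access this imaginary CPU supports.
fn aligned_load(memory: &[u8], addr: usize) -> [u8; 4] {
    assert!(addr % 4 == 0, "address must be a multiple of the word size");
    let mut word = [0u8; 4];
    word.copy_from_slice(&memory[addr..addr + 4]);
    word
}

/// Loads 4 bytes from any address and reports how many aligned loads it took.
fn load(memory: &[u8], addr: usize) -> ([u8; 4], usize) {
    let skip = addr % 4;
    let first = aligned_load(memory, addr - skip);
    if skip == 0 {
        return (first, 1);
    }
    let second = aligned_load(memory, addr - skip + 4);
    let mut value = [0u8; 4];
    // Throw away the first `skip` bytes of the first word...
    value[..4 - skip].copy_from_slice(&first[skip..]);
    // ...and keep only the first `skip` bytes of the second word.
    value[4 - skip..].copy_from_slice(&second[..skip]);
    (value, 2)
}
```

Reading `B` from the aligned layout takes one load, while reading `A` at address 2 in the unaligned layout takes two loads plus the combining logic.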

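rkyv provides two main utilities for aligning byte buffers: AlignedVec, an aligned replacement for Vec<u8>, and AlignedBytes, a wrapper that aligns the byte array it contains.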

Both of these types align the bytes inside to 16-byte boundaries. This should be enough for almost all use cases, but if your particular situation requires even higher alignment then you may need to manually align your bytes.

In practice

rkyv has a very basic unaligned data check built in that may not catch every case. If you also validate your data, then it will always make sure that your data is properly aligned.

Common pitfalls

In some cases, your archived data may be prefixed by some extra data like the length of the buffer. If this extra data misaligns the following data, then the buffer will have to have the prefixing data removed before accessing it.

In other cases, your archived data may not be tight to the end of the buffer. Functions like archived_root rely on the end of the buffer being tight to the end of the data, and may miscalculate the positions of the contained values if it is not.

Format

Types which derive Archive generate an archived version of the type where:

  • Member types are replaced with their archived counterparts
  • Enums have #[repr(N)] where N is u8, u16, u32, u64, or u128, choosing the smallest possible type that can represent all of the variants.

For example, a struct like:

struct Example {
    a: u32,
    b: String,
    c: Box<(u32, String)>,
}

Would have the archived counterpart:

struct ArchivedExample {
    a: u32,
    b: ArchivedString,
    c: ArchivedBox<(u32, ArchivedString)>,
}

With the strict feature, these structs are additionally annotated with #[repr(C)] for guaranteed portability and stability.

In most cases, the strict feature will not be necessary and can reduce the space efficiency of archived types. Make sure you understand your use case carefully and read the crate documentation for details on the strict feature.

rkyv provides Archive implementations for common core and std types by default. In general they follow the same format as derived implementations, but may differ in some cases. For example, ArchivedString performs a small string optimization which helps reduce memory use.

Object order

rkyv lays out subobjects in depth-first order from the leaves to the root. This means that the root object is stored at the end of the buffer, not the beginning. For example, this tree:

  a
 / \
b   c
   / \
  d   e

would be laid out like this in the buffer:

b d e c a

from this serialization order:

a -> b
a -> c -> d
a -> c -> e
a -> c
a

This deterministic layout means that you don't need to store the position of the root object in most cases. As long as your buffer ends right at the end of your root object, you can use archived_root with your buffer.
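The ordering is a post-order traversal: every child is written before its parent. A minimal sketch with a hypothetical `Node` type, where each node's data is reduced to a single character:

```rust
/// A hypothetical tree node; `name` stands in for the node's own data.
struct Node {
    name: char,
    children: Vec<Node>,
}

/// Writes every child (and its whole subtree) before the node itself.
fn serialize(node: &Node, out: &mut String) {
    for child in &node.children {
        serialize(child, out);
    }
    out.push(node.name);
}

/// Convenience constructor for building the example tree.
fn node(name: char, children: Vec<Node>) -> Node {
    Node { name, children }
}
```

Serializing the tree from the diagram produces `bdeca`, with the root as the final element.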

Wrapper types

Wrapper types make it easy to customize the way that fields of types are archived. They make it easier to adapt rkyv to existing data models, and make serializing and deserializing idiomatic for even complicated types.

Annotating a field with #[with(...)] will wrap that field with the given types when the struct is serialized or deserialized. There's no performance penalty to actually wrap types, but doing more or less work during serialization and deserialization can affect performance. This excerpt is from the documentation for ArchiveWith:

#[derive(Archive, Deserialize, Serialize)]
struct Example {
    #[with(Incremented)]
    a: i32,
    // Another i32 field, but not incremented this time
    b: i32,
}

The Incremented wrapper is wrapping a, and the definition causes that field to be incremented in its archived form.

With

The core type behind wrappers is With. This struct is transparent, meaning that it's like another name for the type inside of it. rkyv uses With to wrap your fields when serializing and deserializing, and when you write your own wrappers they will be used with With as well.

See ArchiveWith for an example of how to write your own wrapper types.

Shared Pointers

The implementation details of shared pointers may be of interest to those using them. Specifically, the rules surrounding how and when shared and weak pointers are serialized and pooled may affect how you choose to use them.

Serialization

Shared pointers (Rc and Arc) are serialized whenever they're encountered for the first time, and the data address is reused when subsequent shared pointers point to the same data. This means that you can expect shared pointers to always point to the same value when archived, even if they are unsized to different types.

Weak pointers (rc::Weak and sync::Weak) have serialization attempted as soon as they're encountered. The serialization process upgrades them, and if it succeeds it serializes them like shared pointers. Otherwise, it serializes them like None.
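These rules can be modeled with a small map from data address to output position. This is a toy, not rkyv's serializer: `SharedSerializer` is a hypothetical type, it only handles `Rc<u32>`, and it assumes the values stay alive for as long as the serializer is in use.

```rust
use std::collections::HashMap;
use std::rc::{Rc, Weak};

/// A toy serializer that remembers where each shared value was written.
#[derive(Default)]
struct SharedSerializer {
    out: Vec<u8>,
    positions: HashMap<*const u32, usize>,
}

impl SharedSerializer {
    /// Serializes the pointee on first encounter and reuses its position afterward.
    fn serialize_shared(&mut self, value: &Rc<u32>) -> usize {
        let address = Rc::as_ptr(value);
        if let Some(&pos) = self.positions.get(&address) {
            return pos;
        }
        let pos = self.out.len();
        self.out.extend_from_slice(&(**value).to_le_bytes());
        self.positions.insert(address, pos);
        pos
    }

    /// Tries to upgrade first; a dead weak pointer is serialized like `None`.
    fn serialize_weak(&mut self, value: &Weak<u32>) -> Option<usize> {
        value.upgrade().map(|strong| self.serialize_shared(&strong))
    }
}
```

Two clones of the same `Rc` resolve to one position, a separate `Rc` holding an equal value gets its own, and a weak pointer whose target has been dropped yields `None`.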

Deserialization

Similarly, shared pointers are deserialized on the first encounter and reused afterward. Weak pointers do a similar upgrade attempt when they're encountered for the first time.

Serializers and Deserializers

The serializers for shared pointers hold the location of the serialized data. This means it's safe to serialize shared pointers to an archive across multiple serialize calls as long as you use the same serializer for each one. Using a new serializer will still do the right thing, but may end up duplicating the shared data.

The deserializers for shared pointers hold a shared pointer to any deserialized values, and will hold them in memory until the deserializer is dropped. This means that if you serialize only weak pointers to some shared data, they will point to the correct value when deserialized but will point to nothing as soon as the deserializer is dropped.

Unsized Types

rkyv supports unsized types out of the box and ships with implementations for the most common unsized types (strs and slices). Trait objects can also be supported with rkyv_dyn; see "Trait Objects" for more details.

Metadata

The core concept that enables unsized types is metadata. In Rust, pointers to types can be different sizes, in contrast with languages like C and C++ where all pointers are the same size. This is important for the concept of sizing, which you may have encountered through Rust's Sized trait.

Pointers are composed of two pieces: a data address and some metadata. The data address is what most people think of when they think about pointers; it's the location of the pointed data. The metadata for a pointer is some extra data that is needed to work safely with the data at the pointed location. It can be almost anything, or nothing at all for Sized types. Pointers with no extra metadata are sometimes called "thin" pointers, and pointers with metadata are sometimes called "wide" or "fat" pointers.

rkyv uses the ptr_meta crate to split pointers into these two pieces and put them back together safely. In the future, these operations may be incorporated as part of the standard library.

Fundamentally, the metadata of a pointer exists to provide the program enough information to safely access, drop, and deallocate structures that are pointed to. For slices, the metadata carries the length of the slice, for trait objects it carries the virtual function table (vtable) pointer, and for custom unsized structs it carries the metadata of the single trailing unsized member.
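The extra metadata is visible in the size of the pointer itself. The helper names below are made up for illustration, and the exact sizes are what current Rust compilers produce rather than a formal language guarantee.

```rust
use std::fmt::Debug;
use std::mem::size_of;

/// A thin pointer: just a data address.
fn thin_ptr_size() -> usize {
    size_of::<&u8>()
}

/// A slice pointer: a data address plus the length of the slice.
fn slice_ptr_size() -> usize {
    size_of::<&[u8]>()
}

/// A trait object pointer: a data address plus a vtable pointer.
fn trait_object_ptr_size() -> usize {
    size_of::<&dyn Debug>()
}
```

On common platforms the thin pointer is one machine word, and both wide pointers are two.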

Archived Metadata

For unsized types, the metadata for a type is archived separately from the relative pointer to the data. This mirrors how Rust works internally to support archiving shared pointers and other exotic use cases. This does complicate things somewhat, but for most people the metadata archiving process will end up as just filling out a few functions and returning ().

This is definitely one of the more complicated parts of the library, and can be difficult to wrap your head around. Reading the documentation for ArchiveUnsized may help you understand how the system works by working through an example.

Trait Objects

Trait object serialization is supported through the rkyv_dyn crate. This crate is maintained as part of rkyv, but is separate from the main crate to allow other implementations to be used instead. This section will focus primarily on the architecture of rkyv_dyn and how to use it effectively.

rkyv_dyn may not work in some exotic environments due to the ✨magic✨ it uses to register trait objects. If you want these capabilities but rkyv_dyn doesn't work in your environment, feel free to file an issue or drop by the Discord to talk it through.

Core traits

The new traits introduced by rkyv_dyn are SerializeDyn and DeserializeDyn. These are effectively type-erased versions of SerializeUnsized and DeserializeUnsized so that the traits are object-safe. Likewise, it introduces type-erased versions of serializers and deserializers: DynSerializer and DynDeserializer. These attempt to provide the basic functionality required to serialize most types, but may be more or less capable than custom types require.

DynSerializer implements the Serializer and ScratchSpace traits, but that may not be suitable for all use cases. If you need more capabilities, file an issue or drop by the Discord to talk it through.

Architecture

It is highly recommended to use the provided archive_dyn macro to implement the new traits and set everything up correctly.

Using archive_dyn on a trait definition creates another trait definition with supertraits of your trait and SerializeDyn. This "shim" trait is blanket implemented for all types that implement your trait and SerializeDyn, so you should only ever have to implement your trait to use it.

The shim trait should be used everywhere that you have a trait object of your trait that you want to serialize. By default, it will be named "Serialize" + your trait name. A different approach that similar libraries take is directly adding SerializeDyn as a supertrait of your trait. While more ergonomic, this approach does not allow the implementation of the trait on types that cannot or should not implement SerializeDyn, so the shim trait approach was favored for rkyv_dyn.

When the shim trait is serialized, it stores the type hash of the underlying type in its metadata so it can get the correct vtable for it when accessed. This requires that all vtables for implementing types must be known ahead of time, which is when we use archive_dyn for the second time.

Validation

Validation can be enabled with the bytecheck feature. Validation leverages the bytecheck crate to perform archive validation, and allows the consumption of untrusted and malicious data.

To validate an archive, you first have to derive CheckBytes for your archived type:

use rkyv::{Archive, Deserialize, Serialize};

#[derive(Archive, Deserialize, Serialize)]
#[archive(check_bytes)]
pub struct Example {
    a: i32,
    b: String,
    c: Vec<bool>,
}

The #[archive(check_bytes)] attribute derives CheckBytes on the archived type. Finally, you can use check_archived_root to check an archive and get a reference to the archived value if it was successful:

use rkyv::check_archived_root;

let archived_example = check_archived_root::<Example>(buffer).unwrap();

More examples of how to enable and perform validation can be found in the rkyv_test crate's validation module.

The validation context

When checking an archive, a validation context is created automatically using some good defaults that will work for most archived types. If your type requires special validation logic, you may need to augment the capabilities of the validation context in order to check your type and use check_archived_root_with_context.

The DefaultValidator supports all builtin rkyv types, but changes depending on whether you have the alloc feature enabled or not.

Bounds checking and subtree ranges

All pointers are checked to make sure that they:

  • point inside the archive
  • are properly aligned
  • and have enough space afterward to hold the desired object
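The three checks can be written as a small standalone function. This is only a sketch of the rules listed above, not bytecheck's implementation; `check_pointer` and `PointerError` are hypothetical names, and `align` is assumed to be a nonzero power of two.

```rust
#[derive(Debug, PartialEq)]
enum PointerError {
    OutsideArchive,
    Misaligned,
    NotEnoughSpace,
}

/// Checks a pointer to an object of `size` bytes with alignment `align`
/// located at byte offset `pos` of an archive that is `archive_len` bytes
/// long and loaded at address `base`.
fn check_pointer(
    base: usize,
    archive_len: usize,
    pos: usize,
    size: usize,
    align: usize,
) -> Result<(), PointerError> {
    // The pointer must point inside the archive.
    if pos > archive_len {
        return Err(PointerError::OutsideArchive);
    }
    // The resulting address must be properly aligned.
    if base.wrapping_add(pos) % align != 0 {
        return Err(PointerError::Misaligned);
    }
    // There must be enough space afterward to hold the object.
    if size > archive_len - pos {
        return Err(PointerError::NotEnoughSpace);
    }
    Ok(())
}
```

For a 64-byte archive loaded at `0x1000`, an 8-byte object at offset 16 passes, while offsets 80, 12, and 60 fail the first, second, and third check respectively.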

However, this alone is not enough to secure against recursion attacks and memory sharing violations, so rkyv uses a system to verify that the archive follows its strict ownership model.

Archive validation uses a memory model where all subobjects are located in contiguous memory. This is called a subtree range. When validating an object, the archive context keeps track of where subobjects are allowed to be located, and can reduce the subtree range from the beginning with push_prefix_subtree_range or the end with push_suffix_subtree_range. After pushing a subtree range, any subobjects in that range can be checked by calling their CheckBytes implementations. Once the subobjects are checked, pop_prefix_subtree_range and pop_suffix_subtree_range can be used to restore the original range with the checked section removed.

Validation and Shared Pointers

While validating shared pointers is supported, some additional restrictions are in place to prevent malicious data from validating:

Shared pointers that point to the same object will fail to validate if they are different types. This can cause issues if you have a shared pointer to the same array, but the pointers are an array pointer and a slice pointer. Similarly, it can cause issues if you have shared pointers to the same value as a concrete type (e.g. i32) and a trait object (e.g. dyn Any).

rkyv still supports these use cases, but it's not possible or feasible to ensure data integrity with these use cases. Alternative validation solutions like archive signatures and data hashes may be a better approach in these cases.

Feature Comparison

This is a best-effort feature comparison between rkyv, FlatBuffers, and Cap'n Proto. This is by no means completely comprehensive, and pull requests that improve this are welcomed.

Feature matrix

Feature                          | rkyv      | Cap'n Proto | FlatBuffers
Open type system                 | yes       | no          | no
Scalars                          | yes       | no          | yes
Tables                           | no*       | yes         | yes
Schema evolution                 | no*       | yes         | yes
Zero-copy                        | yes       | yes         | yes
Random-access reads              | yes       | yes         | yes
Validation                       | upfront*  | on-demand   | yes
Reflection                       | no*       | yes         | yes
Object order                     | bottom-up | either      | bottom-up
Schema language                  | derive    | custom      | custom
Usable as mutable state          | yes       | limited     | limited
Padding takes space on wire?     | yes*      | optional    | no
Unset fields take space on wire? | yes       | yes         | no
Pointers take space on wire?     | yes       | yes         | yes
Cross-language                   | no        | yes         | yes
Hash maps and B-trees            | yes       | no          | no
Shared pointers                  | yes       | no          | no

* rkyv's open type system allows extension types that provide these capabilities

Open type system

One of rkyv's primary features is that its type system is open. This means that users can write custom types and control their properties very finely. You can think of rkyv as a solid foundation to build many other features on top of. In fact, the open type system is already a fundamental part of how rkyv works.

Unsized types

Even though they're part of the main library, unsized types are built on top of the core serialization functionality. Types like Box and Rc/Arc that can hold unsized types are entry points for unsized types into the sized system.

Trait objects

Trait objects are further built on top of unsized types to make serializing and using trait objects easy and safe.

FAQ

Because it's so different from traditional serialization systems, a lot of people have questions about rkyv. This is meant to serve as a comprehensive, centralized source for answers.

How is rkyv zero-copy? It definitely copies the archive into memory.

Traditional serialization works in two steps:

  1. Read the data from disk into a buffer (maybe in pieces)
  2. Process the data in the buffer into the deserialized data structure

The copy happens when the data in the buffer ends up duplicated in the data structure. Zero-copy deserialization doesn't deserialize the buffer into a separate structure and thus avoids this copy.

You can actually even avoid reading the data from disk into a buffer in most environments by using memory mapping.

How does rkyv handle endianness?

rkyv supports three endiannesses: native, little, and big. Native endianness will be either little or big, but removes the abstraction layer to more easily work with the underlying types.

You can enable specific endiannesses with the little_endian and big_endian features.
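What a fixed endianness means in practice can be shown with a tiny wrapper that stores its bytes explicitly. `U32Le` is a hypothetical type written for this example; it is much simpler than the types rkyv actually uses, and unlike an ordinary `u32` it has an alignment of 1.

```rust
/// A hypothetical little-endian `u32` that is stored as explicit bytes.
#[derive(Clone, Copy, Debug, PartialEq)]
struct U32Le([u8; 4]);

impl U32Le {
    /// Converts from the native representation when the value is stored.
    fn new(value: u32) -> Self {
        U32Le(value.to_le_bytes())
    }

    /// Converts back to the native representation on every access.
    fn get(self) -> u32 {
        u32::from_le_bytes(self.0)
    }
}
```

The stored bytes are identical on every machine, at the price of a conversion on access. With native endianness that conversion disappears, but the archive is only readable on machines with the same byte order.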

Is rkyv cross-platform?

Yes, but rkyv has been tested mostly on x86 machines and wasm. There may be bugs that need to get fixed for other architectures.

Can I use this in embedded and #![no_std] environments?

Yes, disable the std feature for no_std. You can additionally disable the alloc feature to disable all memory allocation capabilities.

Safety

Isn't this very unsafe if you access untrusted data?

Yes, but you can still access untrusted data if you validate the archive first with bytecheck. It's an extra step, but it's usually still less than the cost of deserializing using a traditional format. rkyv has proven to round-trip faster than bincode for all tested use cases.

Doesn't that mean I always have to validate?

No. There are many other ways you can verify your data, for example with checksums and signed buffers.

Isn't it kind of deceptive to say rkyv is fast and then require validation?

The fastest path to access archived data is marked as unsafe. This doesn't mean that it's unusable, it means that it's only safe to call if you can verify its preconditions:

The value must be archived at the given position in the byte array.

As long as you can (reasonably) guarantee that, then accessing the archive is safe. Not every archive needs to be validated, and you can use a variety of different techniques to guarantee data integrity and security.

Even if you do need to always validate your data before accessing it, validation is always faster than deserializing with other high-performance formats. A round-trip is still faster, even though it's not by the same margins.

Contributors

Thanks to all the contributors who have helped document rkyv:

If you feel you're missing from this list, feel free to add yourself in a PR.