<?xml version="1.0" encoding="utf-8"?>
<resources>
    <string name="">:idprefix: rationale_</string>
    <string name="">= Implementation Rationale</string>
    <string name="">Closed-addressing Containers</string>
    <string name="">`boost::unordered_[multi]set` and `boost::unordered_[multi]map` adhere to the standard requirements for unordered associative containers, so the interface was fixed. But there are still some implementation decisions to make. The priorities are conformance to the standard and portability.</string>
    <string name="">The http://en.wikipedia.org/wiki/Hash_table[Wikipedia article on hash tables^] has a good summary of the implementation issues for hash tables in general.</string>
    <string name="">Data Structure</string>
    <string name="">By specifying an interface for accessing the buckets of the container the standard pretty much requires that the hash table uses closed addressing.</string>
    <string name="">It would be conceivable to write a hash table that uses another method. For example, it could use open addressing, and use the lookup chain to act as a bucket but there are some serious problems with this:</string>
    <string name="">The standard requires that pointers to elements aren\'t invalidated, so</string>
    <string name="">the elements can\'t be stored in one array, but will need a layer of indirection instead - losing the efficiency and most of the memory gain, the main advantages of open addressing.</string>
    <string name="">Local iterators would be very inefficient and may not be able to</string>
    <string name="">meet the complexity requirements.</string>
    <string name="">There are also the restrictions on when iterators can be invalidated. Since</string>
    <string name="">open addressing degrades badly when there are a high number of collisions the restrictions could prevent a rehash when it\'s really needed. The maximum load factor could be set to a fairly low value to work around this - but the standard requires that it is initially set to 1.0.</string>
    <string name="">And since the standard is written with a eye towards closed</string>
    <string name="">addressing, users will be surprised if the performance doesn\'t reflect that.</string>
    <string name="">So closed addressing is used.</string>
    <string name="">Number of Buckets</string>
    <string name="">There are two popular methods for choosing the number of buckets in a hash table. One is to have a prime number of buckets, another is to use a power of 2.</string>
    <string name="">Using a prime number of buckets, and choosing a bucket by using the modulus of the hash function\'s result will usually give a good result. The downside is that the required modulus operation is fairly expensive. This is what the containers used to do in most cases.</string>
    <string name="">Using a power of 2 allows for much quicker selection of the bucket to use, but at the expense of losing the upper bits of the hash value. For some specially designed hash functions it is possible to do this and still get a good result but as the containers can take arbitrary hash functions this can\'t be relied on.</string>
    <string name="">To avoid this a transformation could be applied to the hash function, for an example see http://web.archive.org/web/20121102023700/http://www.concentric.net/~Ttwang/tech/inthash.htm[Thomas Wang\'s article on integer hash functions^]. Unfortunately, a transformation like Wang\'s requires knowledge of the number of bits in the hash value, so it was only used when `size_t` was 64 bit.</string>
    <string name="">Since release 1.79.0, https://en.wikipedia.org/wiki/Hash_function#Fibonacci_hashing[Fibonacci hashing] is used instead. With this implementation, the bucket number is determined by using `(h * m) &gt;&gt; (w - k)`, where `h` is the hash value, `m` is `2^w` divided by the golden ratio, `w` is the word size (32 or 64), and `2^k` is the number of buckets. This provides a good compromise between speed and distribution.</string>
    <string name="">Since release 1.80.0, prime numbers are chosen for the number of buckets in tandem with sophisticated modulo arithmetic. This removes the need for \"mixing\" the result of the user\'s hash function as was used for release 1.79.0.</string>
    <string name="">Open-addresing Containers</string>
    <string name="">The C++ standard specification of unordered associative containers impose severe limitations on permissible implementations, the most important being that closed addressing is implicitly assumed. Slightly relaxing this specification opens up the possibility of providing container variations taking full advantage of open-addressing techniques.</string>
    <string name="">The design of `boost::unordered_flat_set`/`unordered_node_set` and `boost::unordered_flat_map`/`unordered_node_map` has been guided by Peter Dimov\'s https://pdimov.github.io/articles/unordered_dev_plan.html[Development Plan for Boost.Unordered^]. We discuss here the most relevant principles.</string>
    <string name="">Hash Function</string>
    <string name="">Given its rich functionality and cross-platform interoperability, `boost::hash` remains the default hash function of open-addressing containers. As it happens, `boost::hash` for integral and other basic types does not possess the statistical properties required by open addressing; to cope with this, we implement a post-mixing stage:</string>
    <string name="">{nbsp}{nbsp}{nbsp}{nbsp} _a_ &lt;- _h_ *mul* _C_, + {nbsp}{nbsp}{nbsp}{nbsp} _h_ &lt;- *high*(_a_) *xor* *low*(_a_),</string>
    <string name="">where *mul* is an _extended multiplication_ (128 bits in 64-bit architectures, 64 bits in 32-bit environments), and *high* and *low* are the upper and lower halves of an extended word, respectively. In 64-bit architectures, _C_ is the integer part of 2^64^&amp;#8725;https://en.wikipedia.org/wiki/Golden_ratio[_&amp;phi;_], whereas in 32 bits _C_ = 0xE817FB2Du has been obtained from https://arxiv.org/abs/2001.05304[Steele and Vigna (2021)^].</string>
    <string name="">When using a hash function directly suitable for open addressing, post-mixing can be opted out of via a dedicated `link:../../../../container_hash/doc/html/hash.html#ref_hash_is_avalanchinghash[hash_is_avalanching]` trait. `boost::hash` specializations for string types are marked as avalanching.</string>
    <string name="">Platform Interoperability</string>
    <string name="">The observable behavior of `boost::unordered_flat_set`/`unordered_node_set` and `boost::unordered_flat_map`/`unordered_node_map` is deterministically identical across different compilers as long as their ``std::size_t``s are the same size and the user-provided hash function and equality predicate are also interoperable —this includes elements being ordered in exactly the same way for the same sequence of operations.</string>
    <string name="">Although the implementation internally uses SIMD technologies, such as https://en.wikipedia.org/wiki/SSE2[SSE2^] and https://en.wikipedia.org/wiki/ARM_architecture_family#Advanced_SIMD_(NEON)[Neon^], when available, this does not affect interoperatility. For instance, the behavior is the same for Visual Studio on an x64-mode Intel CPU with SSE2 and for GCC on an IBM s390x without any supported SIMD technology.</string>
    <string name="">Concurrent Containers</string>
    <string name="">The same data structure used by Boost.Unordered open-addressing containers has been chosen also as the foundation of `boost::concurrent_flat_set`/`boost::concurrent_node_set` and `boost::concurrent_flat_map`/`boost::concurrent_node_map`:</string>
    <string name="">Open-addressing is faster than closed-addressing alternatives, both in non-concurrent and</string>
    <string name="">concurrent scenarios.</string>
    <string name="">Open-addressing layouts are eminently suitable for concurrent access and modification</string>
    <string name="">with minimal locking. In particular, the metadata array can be used for implementations of lookup that are lock-free up to the last step of actual element comparison.</string>
    <string name="">Layout compatibility with Boost.Unordered flat containers allows for</string>
    <string name="">xref:concurrent.adoc#concurrent_interoperability_with_non_concurrent_containers[fast transfer] of all elements between a concurrent container and its non-concurrent counterpart, and vice versa.</string>
    <string name="">Hash Function and Platform Interoperability</string>
    <string name="">Concurrent containers make the same decisions and provide the same guarantees as Boost.Unordered open-addressing containers with regards to xref:#rationale_hash_function[hash function defaults] and xref:#rationale_platform_interoperability[platform interoperability].</string>
</resources>
