Headline
CVE-2022-29210: Fix TensorKey hash function. · tensorflow/tensorflow@1b85a28
TensorFlow is an open source platform for machine learning. In version 2.8.0, the `TensorKey` hash function used only the total estimated `AllocatedBytes()`, which (a) is an estimate per tensor, and (b) is a very poor hash function for constants (e.g. `int32_t`). It also tried to access individual tensor bytes through `tensor.data()`, treating the buffer as having size `AllocatedBytes()`. This led to ASAN failures because `AllocatedBytes()` is an estimate of the total bytes allocated by a tensor, including any pointed-to constructs (e.g. strings), and does not refer to contiguous bytes in the `.data()` buffer. The byte vector could not have been used anyway, because types such as `tstring` include pointers, whereas the hash needs to cover the string values themselves. This issue is patched in TensorFlow versions 2.9.0 and 2.8.1.
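To illustrate point (b), here is a minimal sketch of why a size-only hash is weak for constants. It is not TensorFlow code: `ToyInt32Tensor`, `EstimatedBytes()`, and `SizeOnlyHash()` are hypothetical names, and the size estimate is assumed to depend only on the element count.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical stand-in for a numeric tensor; NOT TensorFlow's Tensor class.
struct ToyInt32Tensor {
  std::vector<int32_t> values;

  // Assumed analogue of an allocated-size estimate: it depends only on
  // how many elements are stored, never on what the elements are.
  std::size_t EstimatedBytes() const {
    return values.size() * sizeof(int32_t);
  }
};

// Size-only hash: every tensor with the same element count collides,
// because the stored values never reach the hash.
inline std::size_t SizeOnlyHash(const ToyInt32Tensor& t) {
  return std::hash<std::size_t>{}(t.EstimatedBytes());
}
```

Under these assumptions, any two scalar `int32_t` constants (say 1 and 2) hash to the same value, so a hash map keyed this way would place them all in one bucket.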
Fix TensorKey hash function.
The original hash function only used total estimated `AllocatedBytes()`, which (a) is an estimate per tensor, and (b) is a very poor hash function for constants (e.g. `int32_t`). It also tried to access individual tensor bytes through `tensor.data()` of size `AllocatedBytes()`. This led to ASAN failures because the `AllocatedBytes()` is an estimate of total bytes allocated by a tensor, including any pointed-to constructs (e.g. strings), and does not refer to contiguous bytes in the `.data()` buffer. We couldn’t use this byte vector anyways, since types like `tstring` include pointers, whereas we need to hash the string values themselves.
Modified the hash function to more closely mirror the `==` operator. This correctly handles `tstring` and any numeric types that do have contiguous storage. Other types are currently left as unimplemented.
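The idea of a hash that mirrors `==` can be sketched as follows. This is a self-contained illustration, not the code in `tensor_key.h`: `ToyTensor`, `ToyDType`, `ToyEquals`, `ToyHash`, and `HashBytes` are hypothetical names, and FNV-1a is swapped in as a simple byte hash in place of whatever hashing framework the real code uses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical toy types; NOT TensorFlow's Tensor or TensorKey.
enum class ToyDType { kInt32, kString };

struct ToyTensor {
  ToyDType dtype;
  std::vector<int32_t> ints;         // used when dtype == kInt32
  std::vector<std::string> strings;  // used when dtype == kString
};

// FNV-1a over a byte range, continuing from the running state `h`.
inline uint64_t HashBytes(uint64_t h, const void* data, std::size_t n) {
  const unsigned char* p = static_cast<const unsigned char*>(data);
  for (std::size_t i = 0; i < n; ++i) {
    h ^= p[i];
    h *= 1099511628211ULL;  // FNV-1a 64-bit prime
  }
  return h;
}

// Equality compares dtype and element values.
inline bool ToyEquals(const ToyTensor& a, const ToyTensor& b) {
  return a.dtype == b.dtype && a.ints == b.ints && a.strings == b.strings;
}

// Hash over the same data that equality inspects.
inline uint64_t ToyHash(const ToyTensor& t) {
  uint64_t h = 14695981039346656037ULL;  // FNV-1a 64-bit offset basis
  const int tag = static_cast<int>(t.dtype);
  h = HashBytes(h, &tag, sizeof(tag));
  if (t.dtype == ToyDType::kInt32) {
    // Numeric elements are stored contiguously, so hashing exactly
    // size() * sizeof(element) bytes stays inside the buffer.
    h = HashBytes(h, t.ints.data(), t.ints.size() * sizeof(int32_t));
  } else {
    // Strings own heap memory: hash each string's characters,
    // never the bytes of the std::string objects themselves.
    for (const std::string& s : t.strings) {
      const std::size_t len = s.size();
      h = HashBytes(h, &len, sizeof(len));
      h = HashBytes(h, s.data(), len);
    }
  }
  return h;
}
```

In this sketch, two string tensors holding equal values in separate allocations compare equal and hash identically, while numeric tensors with different values no longer collide just because they have the same size.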
PiperOrigin-RevId: 446265413
Showing with 17 additions and 11 deletions.
- +17 −9 tensorflow/core/framework/tensor_key.h
- +0 −2 tensorflow/python/kernel_tests/data_structures/BUILD
Related news
### Impact

The [`TensorKey` hash function](https://github.com/tensorflow/tensorflow/blob/f3b9bf4c3c0597563b289c0512e98d4ce81f886e/tensorflow/core/framework/tensor_key.h#L53-L64) used total estimated `AllocatedBytes()`, which (a) is an estimate per tensor, and (b) is a very poor hash function for constants (e.g. `int32_t`). It also tried to access individual tensor bytes through `tensor.data()` of size `AllocatedBytes()`. This led to ASAN failures because the `AllocatedBytes()` is an estimate of total bytes allocated by a tensor, including any pointed-to constructs (e.g. strings), and does not refer to contiguous bytes in the `.data()` buffer. We couldn't use this byte vector anyways, since types like `tstring` include pointers, whereas we need to hash the string values themselves.

### Patches

We have patched the issue in GitHub commit [1b85a28d395dc91f4d22b5f9e1e9a22e92ccecd6](https://github.com/tensorflow/tensorflow/commit/1b85a28d395dc91f4d22b5f9e1e9a22e92ccecd6). The fix will b...