tft.bucketize
Returns a bucketized column, with a bucket index assigned to each input.
tft.bucketize(
    x: common_types.ConsistentTensorType,
    num_buckets: int,
    epsilon: Optional[float] = None,
    weights: Optional[tf.Tensor] = None,
    elementwise: bool = False,
    name: Optional[str] = None
) -> common_types.ConsistentTensorType
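Used in the tutorials: TFX Estimator Component Tutorial (https://www.tensorflow.org/tfx/tutorials/tfx/components), TFX Keras Component Tutorial (https://www.tensorflow.org/tfx/tutorials/tfx/components_keras)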
Args |
x | A numeric input Tensor, SparseTensor, or RaggedTensor whose values should be mapped to buckets. For a CompositeTensor, only non-missing values are included in the quantiles computation, and the result of bucketize is a CompositeTensor with non-missing values mapped to buckets. If elementwise=True, x must be dense. |
num_buckets | The requested number of buckets; must be an int greater than 1. Values in the input x are divided into approximately equal-sized buckets. |
epsilon | (Optional) Error tolerance, typically a small fraction close to zero. If the caller does not specify a value, a suitable one is chosen based on experimental results. For num_buckets less than 100, the value 0.01 is used, which handles a dataset of up to ~1 trillion input values. If num_buckets is larger, epsilon is set to 1/num_buckets to enforce a stricter error tolerance: more buckets means a smaller range for each bucket, so the boundaries need to be less fuzzy. See analyzers.quantiles() for details. |
weights | (Optional) Weights tensor for the quantiles. Tensor must have the same shape as x. |
elementwise | (Optional) If true, bucketize each element of the tensor independently. |
name | (Optional) A name for this operation. |
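The default described for epsilon can be written out as a small rule. The following is a minimal sketch of that rule only, not the library's code; `default_epsilon` is a hypothetical helper name introduced here for illustration.

```python
def default_epsilon(num_buckets: int) -> float:
    """Hypothetical helper: the default epsilon as described in the Args table.

    Assumption: this only restates the documented rule; it is not taken
    from the tft source.
    """
    if num_buckets < 100:
        # Documented default for fewer than 100 buckets.
        return 0.01
    # More buckets means narrower buckets, so the tolerance tightens.
    return 1.0 / num_buckets


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, default_epsilon(n))
```

Note that the two branches agree at num_buckets = 100, so the tolerance changes continuously as the bucket count grows.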
Returns |
A Tensor of the same shape as x, with each element holding the bucket index of the corresponding input. Bucket indices are in the range [0, actual_num_buckets). num_buckets is a hint: the actual number of buckets can differ from it, for example when the number of distinct values is smaller than num_buckets or when the input values are not uniformly distributed. NaN values are mapped to the last bucket. Values with NaN weights are ignored when computing bucket boundaries. |
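The behaviour described above can be illustrated in plain Python. This is a sketch under stated assumptions, not how tft computes buckets: it uses exact cut points on the sorted data instead of the approximate quantiles controlled by epsilon, it ignores weights, and both `quantile_boundaries` and `bucketize_sketch` are hypothetical names.

```python
import bisect
import math


def quantile_boundaries(values, num_buckets):
    """Hypothetical helper: exact cut points, with duplicates dropped."""
    finite = sorted(v for v in values if not math.isnan(v))
    if not finite:
        return []
    boundaries = []
    for i in range(1, num_buckets):
        idx = min(len(finite) - 1, (i * len(finite)) // num_buckets)
        cut = finite[idx]
        # Repeated cut points collapse, which is why the actual number
        # of buckets can be smaller than the num_buckets hint.
        if not boundaries or cut > boundaries[-1]:
            boundaries.append(cut)
    return boundaries


def bucketize_sketch(values, num_buckets):
    """Returns (bucket indices, actual_num_buckets) for a list of floats."""
    boundaries = quantile_boundaries(values, num_buckets)
    actual_num_buckets = len(boundaries) + 1
    indices = []
    for v in values:
        if math.isnan(v):
            # NaN values go to the last bucket, as documented.
            indices.append(actual_num_buckets - 1)
        else:
            # A value equal to a boundary falls into the upper bucket.
            indices.append(bisect.bisect_right(boundaries, v))
    return indices, actual_num_buckets


if __name__ == "__main__":
    print(bucketize_sketch([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0], 4))
    print(bucketize_sketch([1.0, 1.0, 1.0, 2.0], 4))
```

With eight distinct values and four requested buckets, each bucket receives two values. With only two distinct values, the sketch produces fewer than four buckets, yet every index still lies in [0, actual_num_buckets).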
Raises |
TypeError | If num_buckets is not an int. |
ValueError | If num_buckets is not greater than 1. |
ValueError | If elementwise=True and x is a CompositeTensor. |
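The conditions in the Raises table can be checked before building a pipeline. The sketch below mirrors only the documented conditions; `check_bucketize_args` and its `x_is_composite` flag are hypothetical stand-ins, and the error messages are illustrative, not those produced by tft.

```python
def check_bucketize_args(num_buckets, elementwise=False, x_is_composite=False):
    """Hypothetical pre-check mirroring the documented Raises conditions.

    Assumption: x_is_composite stands in for "x is a CompositeTensor"
    (for example a SparseTensor or RaggedTensor).
    """
    if not isinstance(num_buckets, int):
        raise TypeError(f"num_buckets must be an int, got {type(num_buckets).__name__}")
    if num_buckets <= 1:
        raise ValueError(f"num_buckets must be greater than 1, got {num_buckets}")
    if elementwise and x_is_composite:
        raise ValueError("elementwise=True requires a dense x")


if __name__ == "__main__":
    check_bucketize_args(10)  # passes silently
    try:
        check_bucketize_args(1)
    except ValueError as err:
        print("rejected:", err)
```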
Last updated 2024-11-01 UTC.