@@ -148,6 +148,109 @@ Fixed shape tensor
148 148 by this specification. Instead, this extension type lets one use fixed shape tensors
149 149 as elements in a field of a RecordBatch or a Table.
150 150
151+ .. _variable_shape_tensor_extension:
152+
153+ Variable shape tensor
154+ =====================
155+
156+ * Extension name: `arrow.variable_shape_tensor`.
157+
158+ * The storage type of the extension is: ``StructArray`` where struct
159+ is composed of **data** and **shape** fields describing a single
160+ tensor per row:
161+
162+ * **data** is a ``List`` holding tensor elements (each list element is
163+ a single tensor). The List's value type is the value type of the tensor,
164+ such as an integer or floating-point type.
165+ * **shape** is a ``FixedSizeList<int32>[ndim]`` of the tensor shape where
166+ the size of the list ``ndim`` is equal to the number of dimensions of the
167+ tensor.
168+
169+ * Extension type parameters:
170+
171+ * **value_type** = the Arrow data type of individual tensor elements.
172+
173+ Optional parameters describing the logical layout:
174+
175+ * **dim_names** = explicit names of the tensor dimensions,
176+ given as an array. Its length should equal the length of the shape,
177+ that is, the number of dimensions.
178+
179+ ``dim_names`` can be used if the dimensions have well-known
180+ names and they map to the physical layout (row-major).
181+
182+ * **permutation** = indices of the desired ordering of the
183+ original dimensions, defined as an array.
184+
185+ The indices contain a permutation of the values [0, 1, .., N-1] where
186+ N is the number of dimensions. The permutation indicates which
187+ dimension of the logical layout corresponds to which dimension of the
188+ physical tensor (the i-th dimension of the logical view corresponds
189+ to the dimension with number ``permutation[i]`` of the physical tensor).
190+
191+ A permutation is useful when the logical order of
192+ the tensor is a permutation of the physical order (row-major).
193+
194+ When the logical and physical layouts are equal, the permutation is always
195+ [0, 1, .., N-1] and can therefore be omitted.
196+
197+ * **uniform_shape** = sizes of each tensor's dimensions, which are
198+ guaranteed to stay constant in uniform dimensions and can vary in
199+ non-uniform dimensions. This holds over all tensors in the array.
200+ Sizes in uniform dimensions are represented with int32 values, while
201+ sizes of the non-uniform dimensions are not known in advance and are
202+ represented with null. If ``uniform_shape`` is not provided it is assumed
203+ that all dimensions are non-uniform.
204+ An array containing a tensor with shape (2, 3, 4) and whose first and
205+ last dimensions are uniform would have ``uniform_shape`` (2, null, 4).
206+ This allows for interpreting the tensor correctly without accounting for
207+ uniform dimensions while still permitting optional optimizations that
208+ take advantage of the uniformity.
209+
210+ * Description of the serialization:
211+
212+ The metadata must be a valid JSON object that optionally includes
213+ dimension names with key **"dim_names"** and ordering of dimensions
214+ with key **"permutation"**.
215+ Shapes of tensors can be defined in a subset of dimensions by providing
216+ key **"uniform_shape"**.
217+ Minimal metadata is an empty string.
218+
219+ - Example with ``dim_names`` metadata for NCHW ordered data (note that the first
220+ logical dimension, ``N``, is mapped to the **data** List array: each element in the List
221+ is a CHW tensor and the List of tensors implicitly constitutes a single NCHW tensor):
222+
223+ ``{ "dim_names": ["C", "H", "W"] }``
224+
225+ - Example with ``uniform_shape`` metadata for a set of color images
226+ with fixed height, variable width and three color channels:
227+
228+ ``{ "dim_names": ["H", "W", "C"], "uniform_shape": [400, null, 3] }``
229+
230+ - Example of a permuted 3-dimensional tensor:
231+
232+ ``{ "permutation": [2, 0, 1] }``
233+
234+ For example, if the physical **shape** of an individual tensor
235+ is ``[100, 200, 500]``, this permutation would denote a logical shape
236+ of ``[500, 100, 200]``.
237+
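The serialization rules above can be sketched in plain Python. This is an illustrative sketch only, not part of the specification: the helper names ``parse_metadata`` and ``matches_uniform_shape`` are hypothetical, and two assumptions are made here: an empty metadata string is treated like an empty JSON object, and ``uniform_shape`` has one entry per dimension (as the examples above suggest).

```python
import json


def parse_metadata(serialized, ndim):
    """Parse serialized extension metadata and check it against ndim.

    Hypothetical helper; assumes an empty string means "no optional keys".
    """
    meta = json.loads(serialized) if serialized else {}
    # Assumption: every optional array has exactly one entry per dimension.
    for key in ("dim_names", "permutation", "uniform_shape"):
        if key in meta and len(meta[key]) != ndim:
            raise ValueError(f"{key} must have {ndim} entries")
    # A permutation must contain each of 0..ndim-1 exactly once.
    if "permutation" in meta and sorted(meta["permutation"]) != list(range(ndim)):
        raise ValueError("permutation must rearrange 0..ndim-1")
    return meta


def matches_uniform_shape(shape, uniform_shape):
    """True if shape agrees with every non-null entry of uniform_shape."""
    return all(u is None or u == s for s, u in zip(shape, uniform_shape))


meta = parse_metadata(
    '{ "dim_names": ["H", "W", "C"], "uniform_shape": [400, null, 3] }', ndim=3
)
print(matches_uniform_shape([400, 640, 3], meta["uniform_shape"]))  # True
print(matches_uniform_shape([300, 640, 3], meta["uniform_shape"]))  # False
```

JSON ``null`` is decoded to Python ``None``, so non-uniform dimensions are simply skipped during the comparison.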
238+ .. note::
239+
240+ With the exception of ``permutation``, the parameters and storage
241+ of VariableShapeTensor relate to the *physical* storage of the tensor.
242+
243+ For example, consider a tensor with::
244+ shape = [10, 20, 30]
245+ dim_names = [x, y, z]
246+ permutation = [2, 0, 1]
247+
248+ This means the logical tensor has names [z, x, y] and shape [30, 10, 20].
249+
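The mapping from physical to logical layout in the note above can be reproduced with a few lines of Python. This is a minimal sketch; ``logical_view`` is a hypothetical helper name, not an Arrow API. It applies the rule that the i-th logical dimension is physical dimension ``permutation[i]``.

```python
def logical_view(shape, permutation, dim_names=None):
    """Return the logical shape and names for one physical tensor."""
    # Logical dimension i is physical dimension permutation[i].
    logical_shape = [shape[p] for p in permutation]
    logical_names = [dim_names[p] for p in permutation] if dim_names else None
    return logical_shape, logical_names


print(logical_view([10, 20, 30], [2, 0, 1], ["x", "y", "z"]))
# ([30, 10, 20], ['z', 'x', 'y'])
```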
250+ .. note::
251+ Values inside each **data** tensor element are stored in row-major/C-contiguous
252+ order according to the corresponding **shape**.
253+
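To illustrate the row-major rule, the sketch below looks up one element of a stored tensor from its flat **data** list and its **shape**. Plain Python lists and dicts stand in for the struct fields here; ``element_at`` and ``rows`` are hypothetical names used only for this example.

```python
from math import prod


def element_at(data, shape, index):
    """Return the element at a multi-dimensional index of one stored tensor."""
    if len(data) != prod(shape):
        raise ValueError("data length must equal the product of shape")
    offset = 0
    for size, i in zip(shape, index):
        if not 0 <= i < size:
            raise IndexError("index out of range")
        # Row-major: the last dimension varies fastest.
        offset = offset * size + i
    return data[offset]


# Two rows with different shapes, one tensor per row.
rows = [
    {"data": [1, 2, 3, 4, 5, 6], "shape": [2, 3]},
    {"data": [7, 8, 9, 10], "shape": [4, 1]},
]
print(element_at(rows[0]["data"], rows[0]["shape"], (1, 0)))  # 4
print(element_at(rows[1]["data"], rows[1]["shape"], (2, 0)))  # 9
```

Each row carries its own shape, so the same lookup works for tensors of different sizes within one array.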
151 254 =========================
152 255 Community Extension Types
153 256 =========================