| Append input B to the end of input A.
|
| - It is required that this operation
| run in-place, meaning that the input A
| blob must match the output blob.
|
| - All dimensions except the outer-most
| must be the same between A and B.
|
| - Input A may have to be re-allocated
| in order to accommodate the new size.
| Currently, an exponential growth ratio
| is used in order to ensure amortized
| constant time complexity.
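|
| The growth strategy can be sketched in plain Python (a simplified
| illustration of geometric re-allocation; append_rows and growth_factor
| are names invented here, not the operator's actual interface):

```python
def append_rows(storage, used, new_rows, growth_factor=2):
    """Append new_rows to storage[:used], growing capacity geometrically.

    storage: backing list of capacity len(storage); used: rows in use.
    Growing by a constant ratio keeps the amortized cost of each
    append O(1), at the price of occasional re-allocation.
    """
    needed = used + len(new_rows)
    capacity = len(storage)
    if needed > capacity:
        # Grow by the constant ratio until the new rows fit.
        new_capacity = max(capacity, 1)
        while new_capacity < needed:
            new_capacity *= growth_factor
        storage = storage[:used] + [None] * (new_capacity - used)
    storage[used:needed] = new_rows
    return storage, needed
```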
|
| GitHub Links:
|
| - https://github.com/pytorch/pytorch/blob/master/caffe2/operators/dataset_ops.cc
|
| Checks that the given data fields represent
| a consistent dataset under the schema
| specified by the fields argument.
|
| The operator fails if the fields are not
| consistent. If the data is consistent,
| each field's data can be safely appended
| to an existing dataset, keeping it consistent.
|
| Collect tensors into a tensor vector by
| reservoir sampling; the num_to_collect
| argument indicates the maximum number of
| tensors that will be collected.
|
| The first half of the inputs are tensor
| vectors, which are also the outputs.
| The second half of the inputs are the
| tensors to be collected into each vector
| (in the same order).
|
| The input tensors are collected in an
| all-or-none manner: if they are collected,
| they will be placed at the same index in the
| output vectors.
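|
| The collection decision follows standard reservoir sampling, which can
| be sketched in Python (reservoir_collect is a name invented here; the
| all-or-none behavior means one decision is reused for all paired inputs):

```python
import random

def reservoir_collect(reservoir, num_visited, tensor, num_to_collect, rng=random):
    """Maybe add `tensor` to `reservoir` (capped at num_to_collect items).

    Standard reservoir sampling: the first num_to_collect items are
    always kept; afterwards, the item numbered num_visited (0-based)
    replaces a uniformly random slot with probability
    num_to_collect / (num_visited + 1).
    Returns the slot index used, or -1 if the item was not collected.
    """
    if len(reservoir) < num_to_collect:
        reservoir.append(tensor)
        return len(reservoir) - 1
    j = rng.randint(0, num_visited)  # uniform over [0, num_visited]
    if j < num_to_collect:
        reservoir[j] = tensor
        return j
    return -1
```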
|
| Compute the offsets matrix given the cursor
| and data blobs. It needs to be run at the
| beginning or after resetting the cursor.
|
| Input(0) is a blob pointing to a TreeCursor,
| and [Input(1),… Input(num_fields)] is
| a list of tensors containing the data
| for each field of the dataset.
|
| ComputeOffset is thread safe.
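|
| Conceptually, the offsets for a single field are a prefix sum over its
| lengths (a sketch of the idea, not the operator's exact matrix layout):

```python
from itertools import accumulate

def compute_offsets(lengths):
    """Prefix sums over a lengths field.

    offsets[i] is the position in the values field where record i
    starts; offsets[-1] is the total number of values.
    """
    return [0] + list(accumulate(lengths))
```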
|
| Concatenate the tensors held in the
| std::unique_ptr<std::vector> along the
| first dimension.
|
| Creates a cursor to iterate through a list of
| tensors, where some of those tensors contain the
| lengths in a nested schema. The schema is
| determined by the fields argument.
|
| For example, to represent the following schema:
|
| Struct(
|   a=Int(),
|   b=List(List(Int)),
|   c=List(
|     Struct(
|       c1=String,
|       c2=List(Int),
|     ),
|   ),
| )
|
| the field list will be:
| [
|   "a",
|   "b:lengths",
|   "b:values:lengths",
|   "b:values:values",
|   "c:lengths",
|   "c:c1",
|   "c:c2:lengths",
|   "c:c2:values",
| ]
|
| And for the following instance of the struct:
|
| Struct(
|   a=3,
|   b=[[4, 5], [6, 7, 8], [], [9]],
|   c=[
|     Struct(c1='alex', c2=[10, 11]),
|     Struct(c1='bob', c2=[12]),
|   ],
| )
|
| The values of the fields will be:
| {
|   "a": [3],
|   "b:lengths": [4],
|   "b:values:lengths": [2, 3, 0, 1],
|   "b:values:values": [4, 5, 6, 7, 8, 9],
|   "c:lengths": [2],
|   "c:c1": ["alex", "bob"],
|   "c:c2:lengths": [2, 1],
|   "c:c2:values": [10, 11, 12],
| }
|
| In general, every field name in the format
| "{prefix}:lengths" defines a domain "{prefix}",
| and every subsequent field in the format
| "{prefix}:{field}" will be in that domain, and the
| length of the domain is provided for each entry of
| the parent domain. In the example, "b:lengths"
| defines a domain of length 4, so every field under
| domain "b" will have 4 entries. The "lengths"
| field for a given domain must appear before any
| reference to that domain.
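|
| The mapping from the nested struct to the flat field values above can be
| reproduced with a short Python sketch (plain lists and dicts stand in for
| the schema types; flatten_lengths is a helper invented for illustration):

```python
def flatten_lengths(nested):
    """Split a list-of-lists into (lengths, flat_values)."""
    lengths = [len(x) for x in nested]
    flat = [v for x in nested for v in x]
    return lengths, flat

# The Struct instance from the example above, as plain Python data.
a = 3
b = [[4, 5], [6, 7, 8], [], [9]]
c = [{'c1': 'alex', 'c2': [10, 11]},
     {'c1': 'bob', 'c2': [12]}]

fields = {}
fields['a'] = [a]
# b is a List(List(Int)): outer lengths, then inner lengths and values.
fields['b:lengths'] = [len(b)]
fields['b:values:lengths'], fields['b:values:values'] = flatten_lengths(b)
# c is a List(Struct): one lengths entry, then each struct field flattened.
fields['c:lengths'] = [len(c)]
fields['c:c1'] = [s['c1'] for s in c]
fields['c:c2:lengths'], fields['c:c2:values'] = flatten_lengths([s['c2'] for s in c])
```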
|
| Returns a pointer to an instance of the Cursor,
| which keeps the current offset on each of the
| domains defined by fields. The cursor also
| ensures thread-safety such that ReadNextBatch
| and ResetCursor can be used safely in parallel.
|
| A cursor does not contain data per se, so calls to
| ReadNextBatch actually need to pass a list of
| blobs containing the data to read for each one of
| the fields.
|
| Get the current offset in the cursor.
|
| Given a dataset under a schema specified
| by the fields argument, pack all the
| input tensors into one, where each tensor
| element represents a row of data (a batch
| of size 1). This format allows easier
| use with the rest of the Caffe2 operators.
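|
| Conceptually, packing turns parallel per-field columns into one row-major
| sequence and unpacking reverses it (a toy sketch with Python lists, not
| the operator's tensor-based implementation):

```python
def pack_rows(columns):
    """Pack parallel per-field columns into one row-major sequence;
    each element is one record (a batch of size 1)."""
    return [tuple(col[i] for col in columns) for i in range(len(columns[0]))]

def unpack_rows(rows):
    """Inverse of pack_rows: recover the per-field columns."""
    return [list(col) for col in zip(*rows)]
```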
|
| Read the next batch of examples out of
| the given cursor and data blobs.
|
| Input(0) is a blob pointing to a TreeCursor,
| and [Input(1),… Input(num_fields)] is
| a list of tensors containing the data
| for each field of the dataset.
|
| ReadNextBatch is thread safe.
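|
| For a single (lengths, values) field pair, the batching logic can be
| sketched as follows (a simplified model invented for illustration; the
| real operator handles many fields, and the cursor is advanced in place,
| which is why thread safety matters):

```python
def read_next_batch(cursor, lengths, values, batch_size):
    """Read up to batch_size records from a (lengths, values) field pair.

    cursor = [record_offset, value_offset], advanced in place.
    Returns the lengths and values slices for the batch.
    """
    row, off = cursor
    n = min(batch_size, len(lengths) - row)
    total = sum(lengths[row:row + n])
    batch_lengths = lengths[row:row + n]
    batch_values = values[off:off + total]
    cursor[0] = row + n
    cursor[1] = off + total
    return batch_lengths, batch_values
```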
|
| Read the next batch of examples out of the given cursor,
| idx blob, offset matrix and data blobs.
|
| Input(0) is a blob pointing to a TreeCursor,
| Input(1) is a blob pointing to the shuffled idx,
| Input(2) is a blob pointing to the offset matrix, and
| [Input(3),… Input(num_fields)] is a list of tensors containing the data for
| each field of the dataset.
|
| ReadRandomBatch is thread safe.
|
| Resets the offsets for the given TreeCursor.
| This operation is thread safe.
|
| Compute the sorted indices given a field index
| to sort by, break the sorted indices into chunks
| of shuffle_size * batch_size, and shuffle each
| chunk; finally, we shuffle between batches. If
| sort_by_field_idx is -1, the sort is skipped.
|
| For example, suppose we have data sorted as
| 1,2,3,4,5,6,7,8,9,10,11,12
|
| with batchSize = 2 and shuffleSize = 3. When we
| shuffle each chunk we may get:
| [3,1,4,6,5,2] [12,10,11,8,9,7]
|
| After this we shuffle among the batches
| of size 2:
| [3,1],[4,6],[5,2],[12,10],[11,8],[9,7]
|
| We may end up with something like
| [9,7],[5,2],[12,10],[4,6],[3,1],[11,8]
|
| Input(0) is a blob pointing to a TreeCursor, and
| [Input(1),… Input(num_fields)] is a list of tensors
| containing the data for each field of the dataset.
|
| SortAndShuffle is thread safe.
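|
| The chunked shuffle described above can be sketched in Python
| (sort_and_shuffle is an invented name; the input is assumed already
| sorted, or left in original order when the sort step is skipped):

```python
import random

def sort_and_shuffle(indices, batch_size, shuffle_size, rng=random):
    """Shuffle within chunks of shuffle_size * batch_size,
    then shuffle the order of the resulting batches."""
    chunk = batch_size * shuffle_size
    shuffled = []
    # Shuffle each chunk independently.
    for start in range(0, len(indices), chunk):
        part = indices[start:start + chunk]
        rng.shuffle(part)
        shuffled.extend(part)
    # Split into batches and shuffle the batch order.
    batches = [shuffled[i:i + batch_size]
               for i in range(0, len(shuffled), batch_size)]
    rng.shuffle(batches)
    return [i for b in batches for i in b]
```

Note that every output batch still draws all of its elements from a
single chunk, so locality from the sort is partially preserved.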
|
| Provides functionality to iterate
| across a list of tensors where some of
| those tensors represent lengths in
| a hierarchical structure.
|
| Simple wrapper class allowing easy
| traversal of the tensors representing
| the hierarchical structure.
|
| Simple proxy class exposing a nicer API
| for field access.
|
| Trim the given dataset in-place, given
| the dataset blobs and the field specs.
|
| Trimming happens such that the dataset
| will contain the largest possible number
| of records that is a multiple of the
| multiple_of argument.
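|
| The resulting record count is simply the largest multiple of multiple_of
| not exceeding the current count (trimmed_size is a name invented here):

```python
def trimmed_size(num_records, multiple_of):
    """Largest count <= num_records that is a multiple of multiple_of."""
    return (num_records // multiple_of) * multiple_of
```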
|
| Given a packed dataset (packed by the
| PackRecordsOp) and the fields argument
| describing the dataset's schema, return
| the original dataset format. The number
| of returned tensors is equal to the number
| of fields in the fields argument.
|
| The first input is the packed tensor
| to be unpacked. Optionally, you can
| provide prototype tensors to give the
| expected shapes of the output tensors.
| This is helpful when you expect to
| unpack an empty tensor, e.g., the output
| of a sampling process.
|