Crate caffe2op_dataset

Structs

  • Append input B to the end of input A.

    - This operation is required to run in-place, meaning that the input A blob must match the output blob.
    - All dimensions except the outer-most must be the same between A and B.
    - Input A may have to be re-allocated in order to accommodate the new size. Currently, an exponential growth ratio is used to ensure amortized constant time complexity. (A sketch of this growth strategy appears after this list.)

    Github Links:

    - https://github.com/pytorch/pytorch/blob/master/caffe2/operators/dataset_ops.cc
  • Checks that the given data fields represent a consistent dataset under the schema specified by the fields argument. The operator fails if the fields are not consistent. If the data is consistent, each field's data can be safely appended to an existing dataset, keeping it consistent.
  • Collect tensors into tensor vectors by reservoir sampling; the argument num_to_collect indicates the maximum number of tensors that will be collected. (A reservoir-sampling sketch appears after this list.)

    The first half of the inputs are tensor vectors, which are also the outputs. The second half of the inputs are the tensors to be collected into each vector (in the same order).

    The input tensors are collected in an all-or-none manner: if they are collected, they are placed at the same index in the output vectors.
  • Compute the offsets matrix given cursor and data blobs. This needs to be run at the beginning, or after resetting the cursor.

    Input(0) is a blob pointing to a TreeCursor, and [Input(1), … Input(num_fields)] is a list of tensors containing the data for each field of the dataset.

    ComputeOffset is thread safe.
  • Concatenate tensors in the std::unique_ptr<std::vector> along the first dimension.

  • Creates a cursor to iterate through a list of tensors, where some of those tensors contain the lengths in a nested schema. The schema is determined by the fields arguments.

    For example, to represent the following schema:

      Struct(
          a=Int(),
          b=List(List(Int)),
          c=List(
              Struct(
                  c1=String,
                  c2=List(Int),
              ),
          ),
      )

    the field list will be:

      [
          "a",
          "b:lengths",
          "b:values:lengths",
          "b:values:values",
          "c:lengths",
          "c:c1",
          "c:c2:lengths",
          "c:c2:values",
      ]

    And for the following instance of the struct:

      Struct(
          a=3,
          b=[[4, 5], [6, 7, 8], [], [9]],
          c=[
              Struct(c1='alex', c2=[10, 11]),
              Struct(c1='bob', c2=[12]),
          ],
      )

    the values of the fields will be:

      {
          "a": [3],
          "b:lengths": [4],
          "b:values:lengths": [2, 3, 0, 1],
          "b:values:values": [4, 5, 6, 7, 8, 9],
          "c:lengths": [2],
          "c:c1": ["alex", "bob"],
          "c:c2:lengths": [2, 1],
          "c:c2:values": [10, 11, 12],
      }

    In general, every field name in the format "{prefix}:lengths" defines a domain "{prefix}", and every subsequent field in the format "{prefix}:{field}" will be in that domain, with the length of the domain provided for each entry of the parent domain. In the example, "b:lengths" defines a domain of length 4, so every field under domain "b" will have 4 entries. The "lengths" field for a given domain must appear before any reference to that domain. (A sketch of this lengths/values flattening appears after this list.)

    Returns a pointer to an instance of the Cursor, which keeps the current offset on each of the domains defined by fields. The Cursor also ensures thread safety, such that ReadNextBatch and ResetCursor can be used safely in parallel.

    A cursor does not contain data per se, so calls to ReadNextBatch actually need to pass a list of blobs containing the data to read for each one of the fields.
  • Get the current offset in the cursor.
  • Given a dataset under a schema specified by the fields argument, pack all the input tensors into one, where each tensor element represents a row of data (batch of size 1). This format allows easier use with the rest of the Caffe2 operators.
  • Read the next batch of examples out of the given cursor and data blobs.

    Input(0) is a blob pointing to a TreeCursor, and [Input(1), … Input(num_fields)] is a list of tensors containing the data for each field of the dataset.

    ReadNextBatch is thread safe.
  • Read the next batch of examples out of the given cursor, idx blob, offset matrix, and data blobs.

    Input(0) is a blob pointing to a TreeCursor, Input(1) is a blob pointing to the shuffled idx, Input(2) is a blob pointing to the offset matrix, and [Input(3), … Input(num_fields)] is a list of tensors containing the data for each field of the dataset.

    ReadRandomBatch is thread safe.
  • Resets the offsets for the given TreeCursor. This operation is thread safe.

  • Compute the sorted indices given a field index to sort by, break the sorted indices into chunks of shuffle_size * batch_size, shuffle each chunk, and finally shuffle between batches. If sort_by_field_idx is -1, the sort is skipped. (A sketch of this chunked shuffle appears after this list.)

    For example, given data sorted as

      1,2,3,4,5,6,7,8,9,10,11,12

    with batch_size = 2 and shuffle_size = 3, shuffling the chunks may produce:

      [3,1,4,6,5,2] [12,10,11,8,9,7]

    After this, we shuffle among the batches of size 2:

      [3,1],[4,6],[5,2],[12,10],[11,8],[9,7]

    and may end up with something like:

      [9,7],[5,2],[12,10],[4,6],[3,1],[11,8]

    Input(0) is a blob pointing to a TreeCursor, and [Input(1), … Input(num_fields)] is a list of tensors containing the data for each field of the dataset.

    SortAndShuffle is thread safe.


  • Provides functionality to iterate across a list of tensors where some of those tensors represent lengths in a hierarchical structure.

  • Simple wrapper class allowing an easy traversal of the tensors representing the hierarchical structure.
  • Simple proxy class exposing a nicer API for field access.
  • Trim the given dataset in-place, given the dataset blobs and the field specs. Trimming happens such that the dataset will contain the largest possible number of records that is a multiple of the multiple_of argument. (A one-line sketch appears after this list.)
  • Given a packed dataset (packed by the PackRecordsOp) and the fields argument describing the dataset's schema, return the original dataset format. The number of returned tensors is equal to the number of fields in the fields argument.

    The first input is the packed tensor to be unpacked. Optionally, you can provide prototype tensors to give the expected shapes of the output tensors. This is helpful when you expect to unpack an empty tensor, e.g., the output of a sampling process.
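
The re-allocation strategy mentioned for the append operator is the standard amortized-growth trick: when capacity runs out, grow by a constant factor rather than by the exact amount needed. A minimal Rust sketch, assuming a doubling ratio (the actual growth ratio is an implementation detail of dataset_ops.cc; GrowableBuffer is an illustrative name, not this crate's API):

```rust
// Illustrative only: the real operator works on Caffe2 tensor blobs, and
// the growth ratio used here (doubling) is an assumption.
struct GrowableBuffer {
    data: Vec<f32>, // backing storage for "input A"
    len: usize,     // logical number of valid elements
}

impl GrowableBuffer {
    fn append(&mut self, b: &[f32]) {
        let needed = self.len + b.len();
        if needed > self.data.len() {
            // Exponential growth: at least double the capacity so a
            // sequence of appends costs amortized constant time per element.
            let new_cap = needed.max(self.data.len() * 2);
            self.data.resize(new_cap, 0.0);
        }
        self.data[self.len..needed].copy_from_slice(b);
        self.len = needed;
    }
}
```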
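
The collect operator is a reservoir sampler. Below is a minimal sketch of reservoir sampling over single items, assuming the rand crate; Reservoir and maybe_collect are illustrative names, not this crate's API. In the operator, the same keep-or-drop decision (and the same slot index) is applied to every input tensor of a record, which is the "all-or-none" behavior described above.

```rust
use rand::Rng;

struct Reservoir<T> {
    num_to_collect: usize, // max number of items to keep
    items: Vec<T>,
    num_visited: usize, // how many items have been offered so far
}

impl<T> Reservoir<T> {
    fn new(num_to_collect: usize) -> Self {
        Self { num_to_collect, items: Vec::new(), num_visited: 0 }
    }

    /// Offers an item; returns the slot it was stored at, or None if dropped.
    fn maybe_collect(&mut self, item: T, rng: &mut impl Rng) -> Option<usize> {
        self.num_visited += 1;
        if self.items.len() < self.num_to_collect {
            self.items.push(item);
            Some(self.items.len() - 1)
        } else {
            // Keep the new item with probability num_to_collect / num_visited,
            // which leaves every item seen so far equally likely to be retained.
            let j = rng.gen_range(0..self.num_visited);
            if j < self.num_to_collect {
                self.items[j] = item;
                Some(j)
            } else {
                None
            }
        }
    }
}
```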
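
The "{prefix}:lengths" / "{prefix}:values" encoding used by the cursor can be illustrated by flattening one nesting level. A minimal sketch, not this crate's API, reproducing the "b" field from the schema example above:

```rust
// Flatten one nesting level into parallel lengths/values vectors.
fn flatten(nested: &[Vec<i64>]) -> (Vec<i64>, Vec<i64>) {
    let lengths: Vec<i64> = nested.iter().map(|v| v.len() as i64).collect();
    let values: Vec<i64> = nested.iter().flatten().copied().collect();
    (lengths, values)
}

fn main() {
    // b = [[4, 5], [6, 7, 8], [], [9]] from the example instance:
    let b = vec![vec![4, 5], vec![6, 7, 8], vec![], vec![9]];
    let (lengths, values) = flatten(&b);
    assert_eq!(lengths, vec![2, 3, 0, 1]);      // "b:values:lengths"
    assert_eq!(values, vec![4, 5, 6, 7, 8, 9]); // "b:values:values"
    // "b:lengths" would be [4], since b itself has 4 entries.
}
```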
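
A minimal sketch of SortAndShuffle's chunked shuffle, assuming the rand crate; sort_and_shuffle here is an illustrative free function, not this crate's API, and the sort-by-field step is elided:

```rust
use rand::seq::SliceRandom;
use rand::Rng;

fn sort_and_shuffle(
    mut idx: Vec<usize>, // indices, assumed already sorted by the chosen field
    batch_size: usize,
    shuffle_size: usize,
    rng: &mut impl Rng,
) -> Vec<Vec<usize>> {
    // Shuffle within each chunk of shuffle_size * batch_size indices.
    for chunk in idx.chunks_mut(batch_size * shuffle_size) {
        chunk.shuffle(rng);
    }
    // Break into batches, then shuffle the order of the batches themselves.
    let mut batches: Vec<Vec<usize>> =
        idx.chunks(batch_size).map(|b| b.to_vec()).collect();
    batches.shuffle(rng);
    batches
}

fn main() {
    // batch_size = 2, shuffle_size = 3, as in the example above.
    let data: Vec<usize> = (1..=12).collect();
    println!("{:?}", sort_and_shuffle(data, 2, 3, &mut rand::thread_rng()));
}
```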
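
Trimming to a multiple reduces to integer arithmetic. A one-line sketch (trim_len is a hypothetical helper, not this crate's API):

```rust
/// Largest record count <= num_records that is a multiple of `multiple_of`.
fn trim_len(num_records: usize, multiple_of: usize) -> usize {
    (num_records / multiple_of) * multiple_of
}

fn main() {
    assert_eq!(trim_len(13, 4), 12); // 13 records trimmed down to 12
}
```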

Constants

Type Definitions