ptars 0.0.7

Fast conversion from protobuf to Apache Arrow and back

ptars


Protobuf to Arrow, using Rust

Example

Take a protobuf:

message SearchRequest {
  string query = 1;
  int32 page_number = 2;
  int32 result_per_page = 3;
}

And convert serialized messages directly to pyarrow.RecordBatch:

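from ptars import HandlerPool

# SearchRequest is the Python class generated by protoc from the message above;
# import it from your own generated module.
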
messages = [
    SearchRequest(
        query="protobuf to arrow",
        page_number=0,
        result_per_page=10,
    ),
    SearchRequest(
        query="protobuf to arrow",
        page_number=1,
        result_per_page=10,
    ),
]
payloads = [message.SerializeToString() for message in messages]

pool = HandlerPool([SearchRequest.DESCRIPTOR.file])
handler = pool.get_for_message(SearchRequest.DESCRIPTOR)
record_batch = handler.list_to_record_batch(payloads)

The resulting record batch:

query              page_number  result_per_page
protobuf to arrow  0            10
protobuf to arrow  1            10

You can also convert a pyarrow.RecordBatch back to serialized protobuf messages:

import pyarrow as pa

array: pa.BinaryArray = handler.record_batch_to_array(record_batch)
messages_back: list[SearchRequest] = [
    SearchRequest.FromString(s.as_py()) for s in array
]

Benchmark against protarrow

Ptars is a Rust implementation of protarrow, which is implemented in plain Python. In the benchmark below, comparing mean times, it is:

  • about 2.5 times faster when converting from proto to arrow.
  • about 3 times faster when converting from arrow to proto.
---- benchmark 'to_arrow': 2 tests ----
Name (time in ms)        Mean          
---------------------------------------
protarrow_to_arrow     9.4863 (2.63)   
ptars_to_arrow         3.6009 (1.0)    
---------------------------------------

---- benchmark 'to_proto': 2 tests -----
Name (time in ms)         Mean          
----------------------------------------
protarrow_to_proto     20.8297 (3.20)   
ptars_to_proto          6.5013 (1.0)    
----------------------------------------
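
The ratios in parentheses in the tables above are each mean divided by the fastest mean in its group. As a quick check, the following sketch recomputes them from the reported means. The numbers are copied from the tables; the variable names are illustrative only.

```python
# Mean timings in milliseconds, copied from the benchmark tables above.
means_ms = {
    "to_arrow": {"protarrow": 9.4863, "ptars": 3.6009},
    "to_proto": {"protarrow": 20.8297, "ptars": 6.5013},
}

# Speedup of ptars over protarrow = protarrow mean / ptars mean.
speedups = {
    name: round(timing["protarrow"] / timing["ptars"], 2)
    for name, timing in means_ms.items()
}

print(speedups)  # {'to_arrow': 2.63, 'to_proto': 3.2}
```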