This week’s featured open-source project is Apache Arrow Flight, an RPC framework for high-performance data services based on Arrow data. The project, which is built on top of gRPC and the Arrow IPC format, was co-developed by data lake engine company Dremio, which recently added Flight support to its own product.
According to the team, Flight defines a set of RPC methods for uploading and downloading data, retrieving metadata about a data stream, listing the available data streams, and implementing application-specific RPC methods.
Additionally, a single Flight client can connect to any Flight service to perform these operations, and Flight supports application-implemented authentication methods.
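To make that method set concrete, here is a minimal sketch in Python. It is a toy, in-memory stand-in and not the real `pyarrow.flight` API: the class `ToyFlightService`, the `drop` action, and the use of plain dicts in place of Arrow record batches are all assumptions made for illustration. Only the method names are borrowed from the Flight RPCs (DoPut, DoGet, GetFlightInfo, ListFlights, DoAction).

```python
# Toy, in-memory model of the Flight RPC surface (NOT the pyarrow.flight API).
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterator, List

Row = dict  # hypothetical stand-in for an Arrow record batch


@dataclass
class ToyFlightService:
    """Hypothetical service whose methods mirror the Flight RPC names."""
    streams: Dict[str, List[Row]] = field(default_factory=dict)
    actions: Dict[str, Callable[["ToyFlightService", str], str]] = field(default_factory=dict)

    def do_put(self, path: str, rows: Iterator[Row]) -> None:
        # DoPut: upload a stream of data under a descriptor (here, a path).
        self.streams.setdefault(path, []).extend(rows)

    def do_get(self, path: str) -> Iterator[Row]:
        # DoGet: download a stream. Real Flight takes an opaque ticket
        # obtained from GetFlightInfo; this toy simply reuses the path.
        yield from self.streams[path]

    def get_flight_info(self, path: str) -> dict:
        # GetFlightInfo: metadata about a single stream.
        rows = self.streams[path]
        return {"path": path, "total_records": len(rows),
                "columns": sorted(rows[0]) if rows else []}

    def list_flights(self) -> List[str]:
        # ListFlights: enumerate the available streams.
        return sorted(self.streams)

    def do_action(self, name: str, body: str = "") -> str:
        # DoAction: dispatch to an application-specific method.
        return self.actions[name](self, body)


def drop_stream(service: ToyFlightService, body: str) -> str:
    # Hypothetical application-specific action: delete a stream by path.
    service.streams.pop(body, None)
    return "dropped " + body


service = ToyFlightService(actions={"drop": drop_stream})
service.do_put("trips", iter([{"id": 1, "fare": 9.5}, {"id": 2, "fare": 12.0}]))
info = service.get_flight_info("trips")   # metadata for one stream
rows = list(service.do_get("trips"))      # download the stream
```

In an actual deployment these calls would travel over gRPC and the payloads would be Arrow record batches rather than Python dicts; the sketch only shows how the upload, download, metadata, listing, and custom-action methods relate to one another.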
For error handling, Arrow Flight defines its own set of error codes, according to the Apache Arrow site.
With the new Dremio support, clients can now communicate with Dremio’s data lake service up to ten times faster than with decades-old technologies such as Open Database Connectivity (ODBC) and Java Database Connectivity (JDBC).
“While [ODBC and JDBC] are fine for applications that require small datasets, they are a bottleneck for modern applications, such as machine learning, where millions of records are retrieved over the wire. Today we are announcing the availability of Arrow Flight in Dremio, which will open the door for new applications of data and set the performance standard for high-speed data transfer in the modern enterprise,” said Tomer Shiran, the founder and chief product officer at Dremio.