705 questions
Tooling
0
votes
0
replies
15
views
Send streams of interleaved dataframe-like data
I am working on a co-simulation framework. As part of its job, it connects to several other tools to send and receive information from them as time in the simulation progresses. This data mostly takes ...
0
votes
0
answers
97
views
ADBC Flight SQL query on StarRocks database drops column names
I am using the ADBC Flight SQL driver to query a StarRocks database. This works well (and is insanely fast) when the query is a SELECT on a single table. But as soon as I add a JOIN to the query, all ...
0
votes
1
answer
102
views
Plot histogram in R using arrow for big datasets
I am trying to plot a histogram using a huge file (a 45 GB .tsv with 600M rows and 10 columns). The file is structured as follows:
Image | X | Y | Channel1 | Channel2 | Channel3 | Channel4 | Channel1/Channel2 | ...
3
votes
1
answer
212
views
Should I expect better performance from R arrow with partitioned parquet files?
I set up a folder of partitioned parquet files for a project at work, and I'm experiencing severe performance issues: the aggregation takes several hours.
I made this minimal example to show the ...
0
votes
1
answer
283
views
PyIceberg append fails with "Signer set, but token is not available"
I'm working on writing data to an Iceberg table using PyIceberg (0.6.0+) with a Ceph S3-compatible backend, via Lakekeeper (https://github.com/lakekeeper/lakekeeper) as my REST catalog and metadata ...
0
votes
0
answers
64
views
Serializing dates using Rust's tauri-specta and Apache Arrow
I'm working on a Tauri application that uses tauri-specta for type safety and I can't figure out how to properly serialize dates. This is the file where most of the serialization and deserialization ...
0
votes
0
answers
181
views
How to loop through Apache Arrow data in C++?
struct Widget
{
    std::string foo;
    std::string bar;
    int baz;
};
So far, I've been saving Widget structs directly to binary files. To read them back, I use reinterpret_cast to convert raw ...
-3
votes
1
answer
2k
views
How do I read a `.arrow` (Apache Arrow aka Feather V2 format) file with Python Pandas?
I'm trying to read an .arrow format file with Python pandas.
pandas does not have a read_arrow function. However, it does have read_csv, read_parquet, and other similarly named functions.
How can I ...
0
votes
0
answers
41
views
How can I add a new FieldVector to an existing VectorSchemaRoot
In Java Apache Arrow, I have an existing VectorSchemaRoot that's created following this documentation:
BitVector bitVector = new BitVector("boolean", allocator);
bitVector.allocateNew();
for ...
0
votes
0
answers
77
views
Store multiple Arrow tables in a single file
We are looking at developing an exchange and archival format for data that can be represented as multiple tables: one-to-three tables to be specific, each with a different schema.
I am looking at ...
0
votes
0
answers
110
views
How to read large Arrow IPC files in batches for transformation with low memory usage?
I'm working with a Rust-based data processing pipeline using the polars and arrow2 crates. I have a flow where I batch-read CSVs and write them to an Arrow IPC file using IpcWriter with compression ...
0
votes
0
answers
41
views
Using Arrow ADBC to register UDF
While chasing better performance and cleaner code, I've run into this problem: I've found that UDFs are seemingly connection-dependent, not database-dependent. If I, for instance, used a trigger on ...
0
votes
0
answers
310
views
Issue Writing Polars DataFrame in Chunks to Arrow/Parquet Without Corruption
What I Am Trying to Do
I'm trying to write a Polars DataFrame in chunks to either an Arrow IPC file or a Parquet file ...
0
votes
2
answers
176
views
Attaching an adbc connection to an sqlite in-memory database
I have a setup where I'm using two connections for SQLite: a DBAPI-based SQLite connection from Arrow ADBC, so I can ingest and fetch Arrow data, and a native sqlite3 ...
0
votes
2
answers
111
views
ERROR [HYC00] [Apache Arrow][Flight SQL] (100) Unsupported function for parameterised query to Dremio via ODBC
I am using the ODBC driver via .NET 4.8 to connect to Dremio, but I am getting this error:
System.Data.Odbc.OdbcException
HResult=0x80131937
Message=ERROR [HYC00] [Apache Arrow][Flight SQL] (100) ...