2,838 questions
-1 votes, 0 answers, 51 views
Sum up columns with the same string in the column name
I want to sum up (horizontally) some columns that share a common string in their name. For example, given the following columns:
a_red, a_blue, a_green, b_red, b_yellow, b_blue
I would like to add in ...
Best practices
0 votes, 0 replies, 19 views
Best way to leverage Polars multithreading with scikit-learn compatibility
I've been working on a project for rapidly testing thousands of outcome variables on a standard set of predictors and covariates using Polars. It's working very well, with speedups as high as 16x ...
2 votes, 1 answer, 82 views
Polars to_date conversion from str to date fails with proper format
I have a polars dataframe with a date column I've built with the format %YW%W. I want to convert the column to a date, so I wrote the following snippet:
agg_pivoted_df = pivoted_df.with_columns([
...
2 votes, 1 answer, 194 views
Polars (Python) is unable to read Unicode character U+2019
I have a JSON file that I'm trying to read into a Polars dataframe but keep getting an error message. I've been able to pin it to a specific character, but I don't know what to do about it. JSON file ...
2 votes, 1 answer, 98 views
Cumulative sum with group_by [duplicate]
Suppose I have the following DataFrame of the number of births in each state in each year:
df = pl.DataFrame(
{
"state": ["CA", "CA", "CA", "TX...
Advice
0 votes, 3 replies, 79 views
Polars upsample without removing existing data
If I have a polars dataframe like
┌───────────┬─────────────────────┬───────────┐
│ sensor_id ┆ ts ┆ value │
│ --- ┆ --- ┆ --- │
│ i32 ┆ datetime[...
5 votes, 1 answer, 144 views
Replace elements in a List column with multiple other elements
Consider the following DataFrame:
df = pl.DataFrame({
"a":[
["55", "87.19"],
["55.11","55.12"],
["55", "27.89"]
...
4 votes, 1 answer, 116 views
Get unique elements in multiple List columns
Consider a DataFrame with multiple list columns, for instance:
df = pl.DataFrame({
"a": [range(1,3), range(5,10)],
"b": [range(4,9), range(6,11)]
})
Let's print it:
...
0 votes, 1 answer, 119 views
How to preserve all digits when reading Excel number with Polars when casting to String (Utf8)?
I’m reading an XLSX file uploaded from a React frontend to a FastAPI backend. On the frontend, I use the xlsx library to read and display the data as JSON:
reader.onload = (evt) => {
const data = ...
0 votes, 1 answer, 135 views
Debugging problems with uv, Polars, and Visual Studio Code
I'm having a lot of problems in the interaction between uv, Polars, and Visual Studio Code.
I run my Polars code within a virtual environment built with uv.
Sometimes the kernel just dies.
Sometimes a ...
Advice
0 votes, 2 replies, 82 views
How to ergonomically perform matrix multiplication inside Polars streaming engine?
I have a large table of data, in the range of hundreds of millions of rows/events, each of which has around 50 numerical columns, call them c1 through c50. For each event, say I want to perform matrix-...
2 votes, 2 answers, 137 views
Polars add elements to list
Suppose I have the following polars DataFrame:
df = pl.DataFrame({"a": [["A111", "A110"], ["Z254"], ["B897", "C768", "D456"]]})
...
2 votes, 1 answer, 87 views
Polars list aggregation: what are "aggregation expression"?
The documentation for polars.Expr.list.agg says:
Run any polars aggregation expression against the lists’ elements.
One would think that the "aggregation expression"s the documentation ...
4 votes, 1 answer, 154 views
Filter empty string in a polars lazyframe
I am trying to filter out the rows whose URI column contains an empty string, from a Parquet file with over 50 million rows, using
import polars as pl
lf = pl.scan_parquet("data.parquet")
lf.filter(pl....
0 votes, 0 answers, 122 views
Polars Out-of-Memory when performing a series of joins with 1:1 match and high number of columns
I am performing a series of left joins on Polars LazyFrames:
final_df = (
lf1.join(lf2, on="id", how="left")
.join(lf3, on="id", how="left")
....