database-performance – A Passionate Techie

I have a DB query that matches the desired rows. Let’s say (for simplicity):

select * from stats where id in (1, 2);

Now I want to extract several frequency statistics (count of distinct values) for multiple columns, across these matching rows:

-- `stats.status` is one such column
select status, count(*) from stats where id in (1, 2) group by 1 order by 2 desc;

-- `stats.category` is another column
select category, count(*) from stats where id in (1, 2) group by 1 order by 2 desc;

-- etc.

Is there a way to re-use the same underlying query in SqlAlchemy? Raw SQL works too.

Or even better, return all the histograms at once, in a single command?

I’m mostly interested in performance, because I don’t want Postgres to run the same row-matching many times, once for each column, over and over. The only change is which column is used for the histogram grouping. Otherwise it’s the same set of rows.

Solution:

User Abelisto‘s comment & the other answer both have the correct sql required to generate the histogram for multiple fields in 1 single query.

The only edit I would suggest to their efforts is to add an ORDER BY clause, as it seems from OP’s attempts that more frequent labels are desired at the top of the result. You might find that sorting the results in python rather than in the database is simpler. In that case, disregard the complexity brought on the order by clause.

Thus, the modified query would be:

SELECT category, status, count(*)
FROM stats
WHERE id IN (1, 2)
GROUP BY GROUPING SETS ( 
  (category), (status) 
)
ORDER BY 
  category IS NULL, status IS NULL, 3 DESC

It is also possible to express the same query using sqlalchemy.

from sqlalchemy import *
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()
class Stats(Base):
    __tablename__ = 'stats'
    id = Column(Integer, primary_key=True)
    category = Column(Text)
    status = Column(Text)

stmt = select(
    [Stats.category, Stats.status, func.count(1)]
).where(
    Stats.id.in_([1, 2])
).group_by(
    func.grouping_sets(tuple_(Stats.category), 
                       tuple_(Stats.status))
).order_by(
    Stats.category.is_(None),
    Stats.status.is_(None),
    literal_column('3').desc()
)

Investigating the output, we see that it generates the desired query (extra newlines added in output for legibility)

print(stmt.compile(compile_kwargs={'literal_binds': True}))
# outputs:
SELECT stats.category, stats.status, count(1) AS count_1 
FROM stats 
WHERE stats.id IN (1, 2) 
GROUP BY GROUPING SETS((stats.category), (stats.status)) 
ORDER BY stats.category IS NULL, stats.status IS NULL, 3 DESC

	gamejudilebaran.word… on Chef: Roles and Environme…
	WARN: Waiting for se… on OSSEC start problem due to…
	aratik711 on Ansible issues
	aratik711 on Chef: Roles and Environme…
	situs judi on Chef: Roles and Environme…

Tag: database-performance

Reuse same query across multiple group-bys?

Rate this:

Share this: