
Django: Return One Filtered Object Per Foreign Key

Is it possible to return querysets that contain only one object per foreign key? For instance, I want to get the latest comments from django_comments, but I only want one comme

Solution 1:

This is a fairly difficult thing to do in SQL at all; you probably won't be able to do it through the ORM.

You can't use GROUP BY alone for this. GROUP BY tells SQL how to group rows for aggregation, which isn't what you're doing here. "SELECT x, y FROM table GROUP BY x" is not valid standard SQL, because y is neither grouped nor aggregated, so its value within each group is meaningless.

Let's look at this with a clear schema in mind:

CREATE TABLE objects ( id INTEGER PRIMARY KEY, name VARCHAR );
CREATE TABLE comments ( object_id INTEGER REFERENCES objects (id), text VARCHAR NOT NULL, date TIMESTAMP NOT NULL );

INSERT INTO objects (id, name) VALUES (1, 'object 1'), (2, 'object 2');
INSERT INTO comments (object_id, text, date) VALUES
   (1, 'object 1 comment 1', '2010-01-02'),
   (1, 'object 1 comment 2', '2010-01-05'),
   (2, 'object 2 comment 1', '2010-01-08'),
   (2, 'object 2 comment 2', '2010-01-09');

SELECT * FROM objects o JOIN comments c ON (o.id = c.object_id);

The most elegant way I've seen for doing this is PostgreSQL 8.4's window functions.

SELECT * FROM (
    SELECT
        o.*, c.*,
        rank() OVER (PARTITION BY object_id ORDER BY date DESC) AS r
    FROM objects o JOIN comments c ON (o.id = c.object_id)
) AS s
WHERE r = 1;

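That'll select the first comment for each object by date, newest first. If you don't see what this is doing, execute the inner SELECT on its own and watch how it generates rank(), which makes it pretty straightforward. Note that rank() gives tied rows the same rank, so two comments on the same object with identical timestamps would both come back; use row_number() instead if you need exactly one row per object.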
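To watch the ranking happen without a PostgreSQL server, here is a minimal sketch that runs the same schema and query through Python's built-in sqlite3 module. It assumes the bundled SQLite is 3.25 or newer (the first release with window functions); SQLite is only a stand-in here, and the variable names are mine.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL; window functions need SQLite 3.25+.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE objects (id INTEGER PRIMARY KEY, name VARCHAR);
    CREATE TABLE comments (
        object_id INTEGER REFERENCES objects (id),
        text VARCHAR NOT NULL,
        date TIMESTAMP NOT NULL
    );
    INSERT INTO objects (id, name) VALUES (1, 'object 1'), (2, 'object 2');
    INSERT INTO comments (object_id, text, date) VALUES
        (1, 'object 1 comment 1', '2010-01-02'),
        (1, 'object 1 comment 2', '2010-01-05'),
        (2, 'object 2 comment 1', '2010-01-08'),
        (2, 'object 2 comment 2', '2010-01-09');
""")

# The inner SELECT on its own: every comment plus its rank within its object.
inner_rows = conn.execute("""
    SELECT c.object_id, c.text,
           rank() OVER (PARTITION BY c.object_id ORDER BY c.date DESC) AS r
    FROM objects o JOIN comments c ON (o.id = c.object_id)
    ORDER BY c.object_id, r
""").fetchall()
for row in inner_rows:
    print(row)

# The full query: keep only the rows ranked first, i.e. the newest per object.
latest = conn.execute("""
    SELECT object_id, text FROM (
        SELECT c.object_id, c.text,
               rank() OVER (PARTITION BY c.object_id ORDER BY c.date DESC) AS r
        FROM objects o JOIN comments c ON (o.id = c.object_id)
    ) AS s
    WHERE r = 1
    ORDER BY object_id
""").fetchall()
print(latest)  # [(1, 'object 1 comment 2'), (2, 'object 2 comment 2')]
```

The first loop prints each comment with r = 1 for the newest comment of its object and r = 2 for the older one, which is what the outer WHERE r = 1 filters on.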

I know other ways of doing this with PostgreSQL, but I don't know how to do this in other databases.

Trying to compute this dynamically is likely to give you serious headaches--and it takes more work to make these complex queries perform well, too. Chances are you're better off doing this the simple way: store a last_comment_id field for each object and update it when a comment is added or deleted, so you can just join and sort. You could probably use SQL triggers to handle this updating automatically.
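A rough sketch of that last_comment_id idea, again run through Python's sqlite3 module so it is self-contained. The trigger names, the extra id column on comments, and the SQLite trigger syntax are my assumptions for illustration; PostgreSQL triggers are written differently (a trigger function plus CREATE TRIGGER), so treat this as the shape of the solution rather than something to paste in.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE objects (
        id INTEGER PRIMARY KEY,
        name VARCHAR,
        last_comment_id INTEGER  -- denormalized pointer to the newest comment
    );
    CREATE TABLE comments (
        id INTEGER PRIMARY KEY,  -- assumed here; the schema above has no comment id
        object_id INTEGER REFERENCES objects (id),
        text VARCHAR NOT NULL,
        date TIMESTAMP NOT NULL
    );

    -- Recompute the pointer whenever a comment is added...
    CREATE TRIGGER comments_after_insert AFTER INSERT ON comments
    BEGIN
        UPDATE objects
        SET last_comment_id = (
            SELECT id FROM comments
            WHERE object_id = NEW.object_id
            ORDER BY date DESC, id DESC LIMIT 1
        )
        WHERE id = NEW.object_id;
    END;

    -- ...or deleted (it becomes NULL when no comments remain).
    CREATE TRIGGER comments_after_delete AFTER DELETE ON comments
    BEGIN
        UPDATE objects
        SET last_comment_id = (
            SELECT id FROM comments
            WHERE object_id = OLD.object_id
            ORDER BY date DESC, id DESC LIMIT 1
        )
        WHERE id = OLD.object_id;
    END;

    INSERT INTO objects (id, name) VALUES (1, 'object 1'), (2, 'object 2');
    INSERT INTO comments (id, object_id, text, date) VALUES
        (1, 1, 'object 1 comment 1', '2010-01-02'),
        (2, 1, 'object 1 comment 2', '2010-01-05'),
        (3, 2, 'object 2 comment 1', '2010-01-08'),
        (4, 2, 'object 2 comment 2', '2010-01-09');
""")

# With the pointer maintained, "latest comment per object" is a plain join and sort.
LATEST_SQL = """
    SELECT o.name, c.text
    FROM objects o JOIN comments c ON (c.id = o.last_comment_id)
    ORDER BY c.date DESC
"""
latest = conn.execute(LATEST_SQL).fetchall()
print(latest)  # [('object 2', 'object 2 comment 2'), ('object 1', 'object 1 comment 2')]

# Deleting the newest comment on object 2 moves its pointer back to the older one.
conn.execute("DELETE FROM comments WHERE id = 4")
after_delete = conn.execute(LATEST_SQL).fetchall()
print(after_delete)  # [('object 2', 'object 2 comment 1'), ('object 1', 'object 1 comment 2')]
```

The expensive "find the newest" work happens once per write instead of on every read, which is the trade being suggested.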

Solution 2:

Thanks Glenn and vdboor. Agreed, the proposed idea creates way too much SQL complexity and will seriously impact performance.

The last_comment_id suggestion is very good, but I believe that for my particular situation the best thing to do is create a separate "THREAD" model that stores the content_type and object_pk of the original object commented upon as well as the id and timestamp of the object's last comment, among a few other things. This will allow simple content object lookups and chronologically filtered querysets, and will make what's happening under the hood more closely mirror the front-end presentation, which is probably a good idea for posterity. :)
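Roughly what I have in mind, sketched as a bare table through Python's sqlite3 module rather than as a real Django model. The column names, the record_comment helper, and the "blog.entry" content type label are all hypothetical, and the sketch assumes comments are recorded in chronological order so that the most recent call always wins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE threads (
        content_type VARCHAR NOT NULL,      -- which kind of object was commented on
        object_pk VARCHAR NOT NULL,         -- its primary key, stored as text here
        last_comment_id INTEGER NOT NULL,
        last_comment_date TIMESTAMP NOT NULL,
        PRIMARY KEY (content_type, object_pk)
    )
""")


def record_comment(content_type, object_pk, comment_id, date):
    """Create or overwrite the thread row for the commented-on object.

    Assumes calls arrive in chronological order, so the latest call is the
    newest comment.
    """
    conn.execute(
        "INSERT OR REPLACE INTO threads VALUES (?, ?, ?, ?)",
        (content_type, object_pk, comment_id, date),
    )


record_comment("blog.entry", "1", 1, "2010-01-02")
record_comment("blog.entry", "1", 2, "2010-01-05")
record_comment("blog.entry", "2", 3, "2010-01-08")
record_comment("blog.entry", "2", 4, "2010-01-09")

# One row per commented-on object, most recently active first.
threads = conn.execute(
    "SELECT object_pk, last_comment_id FROM threads ORDER BY last_comment_date DESC"
).fetchall()
print(threads)  # [('2', 4), ('1', 2)]
```

In the actual project the helper would be replaced by whatever hook runs when a comment is saved, and deletions would need the same care as with last_comment_id.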

Cheers,

jnh

Solution 3:

Consider storing the last post as a foreign key somewhere (e.g. in the parent object table). Each time a message is posted or deleted, update this key.

Yes, it's duplication, but worth considering. Having to run complex queries for each request (especially the index page) could take your application performance down. This is the pragmatic way to get the desired effect without losing performance.
