Multi-table selection problem w/ SUM(), LEFT OUTER JOIN and GROUP BY

wyrd

Well-known member
Joined
Aug 23, 2002
Messages
1,408
Location
California
If youre willing to read what Im about to type and copy/paste then you may want to get a can of soda, youll be here a while. Oh.. and thanks for doing so. :)

Now to the problem. Im not really sure how to explain this, except that when I use OUTER JOIN and two SUM()s in a SELECT, it computes the sum incorrectly. This is the SELECT query in which Im having the problem with (Ill get to my table designs in just a second);

Code:
SELECT TOP 50  
	prod_items.id, 
	prod_items.item_number, 
	prod_items.description, 
	prod_items.obsolete, 
	prod_items.wholesale, 	prod_items.retail, 
	prod_cats.name AS category, 
	SUM(total_bought) AS bought, 
	SUM(total_sold) AS sold  

FROM prod_items  

JOIN prod_cats ON (prod_items.cat_id = prod_cats.id) 
LEFT OUTER JOIN prod_items_bought ON (prod_items.id = prod_items_bought.item_id) 
LEFT OUTER JOIN prod_items_sold ON (prod_items.id = prod_items_sold.item_id)

WHERE cat_id = 6 

GROUP BY 
	prod_items.id, 
	prod_items.item_number,
 	prod_items.description, 
	prod_items.obsolete, 
	prod_items.wholesale, 
	prod_items.retail, 
	prod_cats.name 

ORDER BY description, prod_items.id

To make sense out of what the heck this SELECT query is doing, Ive gone ahead and did a SELECT * FROM table in all of the tables that this SELECT query uses. Im hoping the column names are self explanatory, and the sample data in which youre about to see will help make sense of the problem as I further explain as we go. I apologize if the data is a little hard to read.

Code:
1> select * from prod_items
2> go
 id          item_number    description                    cat_id obsolete wholesale    retail
 ----------- -------------- ------------------------------ ------ -------- ------------ ------------
           1 030530144407V  56dfg                               5        0        .0000        .0000
           2 030530145237V  Hfd                                 5        0      23.0000     234.0000
           5 030531170748V  More leaves                         6        0        .0000        .0000
           7 030531170747V  Neck dupe item #                    5        1      43.0000      64.0000
           8 030603233140V  Sdf                                 6        0        .0000        .0000
           3 030531170749V  Sdf                                 5        0      23.0000      54.0000
           4 030531170747V  Small Earrings                      6        0       1.0000       3.0000
           9 030608150624V  Some item..                         6        0      11.0000      22.0000
           6 030531170747V  Test dupe item                      6        1       3.0000      10.0000

(9 rows affected)

1> select * from prod_cats
2> go
 id     name
 ------ --------------------
      5 Necklaces
      6 Leaves

(2 rows affected)

1> select * from shows
2> go
 id          name                           city            state start_date          end_date
 ----------- ------------------------------ --------------- ----- ------------------- -------------------
           6 [Show Name]                    [City]          CA    2003-06-01 00:00:00 2003-06-01 00:00:00
           7 June 2                         [City]          CA    2003-06-02 00:00:00 2003-06-02 00:00:00
           3 Some show                      Some city       ST    2003-06-07 00:00:00 2003-06-08 00:00:00
           1 Art Festival                   Pleasanton      CA    2003-06-10 00:00:00 2003-06-11 00:00:00
           5 Another show                   Another city    CA    2003-06-20 00:00:00 2003-06-21 00:00:00
           4 Cross month show               City            CA    2003-06-30 00:00:00 2003-07-01 00:00:00
           2 Art Festival                   Stockton        VA    2003-09-11 00:00:00 2003-09-12 00:00:00

(7 rows affected)

1> select * from prod_items_bought
2> go
 id          total_bought cost         item_id     date_bought
 ----------- ------------ ------------ ----------- -------------------
           4            1       1.0000           4 2003-06-09 11:23:00
           3            1       3.0000           6 2003-06-09 11:12:00
           2            2       6.0000           6 2003-06-08 23:59:00
           1            2      22.0000           9 2003-06-08 23:53:00

(4 rows affected)

1> select * from prod_items_sold
2> go
 id          total_sold amount       item_id     show_id     date_sold

 ----------- ---------- ------------ ----------- ----------- -------------------
           4          1      12.0000           6        NULL 2003-06-09 23:59:00
           2          1      30.0000           9        NULL 2003-06-09 23:54:00
           5          2      20.0000           6        NULL 2003-06-09 11:04:00
           6          5      50.0000           6        NULL 2003-06-09 11:04:00
           3          1      15.0000           6           3 2003-06-08 23:59:00
           1          1      25.0000           9           3 2003-06-08 23:54:00

Take a look closely at total_bought and total_sold from the prod_items_bought and prod_items_sold tables. As you notice, for total_bought, item_id 4 has a total of 1 bought, item_id 6 has a total of 3 bought, item_id 9 has a total of 2 bought. For total_sold, item_id 6 has a total of 9 sold and item_id 9 has a total of 2 sold. This is the information in which Id expect my SELECT query to retrieve using SUM(). Unfortunately, its not. Take a look below.

Code:
1> SELECT TOP 50
2>      prod_items.id,
3>      prod_items.item_number,
4>      prod_items.description,
5>      prod_items.obsolete,
6>      prod_items.wholesale,
8>      prod_items.retail,
9>      prod_cats.name AS category,
10>     SUM(total_bought) AS bought,
11>     SUM(total_sold) AS sold
12>
13> FROM prod_items
14>
15> JOIN prod_cats ON (prod_items.cat_id = prod_cats.id)
16> LEFT OUTER JOIN prod_items_bought ON (prod_items.id = prod_items_bought.item_id)
17> LEFT OUTER JOIN prod_items_sold ON (prod_items.id = prod_items_sold.item_id)
18>
19> WHERE cat_id = 6
20>
21> GROUP BY
22>     prod_items.id,
23>     prod_items.item_number,
24>
25>     prod_items.description,
26>     prod_items.obsolete,
27>     prod_items.wholesale,
28>     prod_items.retail,
29>     prod_cats.name
30>
31> ORDER BY description, prod_items.id
32> go
 id          item_number    description                    obsolete wholesale    retail       category             bought      sold
 ----------- -------------- ------------------------------ -------- ------------ ------------ -------------------- ----------- -----------
           5 030531170748V  More leaves                           0        .0000        .0000 Leaves                      NULL        NULL
           8 030603233140V  Sdf                                   0        .0000        .0000 Leaves                      NULL        NULL
           4 030531170747V  Small Earrings                        0       1.0000       3.0000 Leaves                         1        NULL
           9 030608150624V  Some item..                           0      11.0000      22.0000 Leaves                         4           2
           6 030531170747V  Test dupe item                        1       3.0000      10.0000 Leaves                        12          18
Warning: Null value is eliminated by an aggregate or other SET operation.

Notice the bought/sold totals? The only correct values are item_id 4, which has a total bought of 1, item_id 9 which has a total sold of 2. Yet item_id 9 has an incorrect total bought of 4 (supposed to be 2), and item_id 6 has the wrong total bought of 12 (should be 3) and total sold of 18 (should be 9). I cannot seem to figure out why.

Also while Im at it, what does the Warning that it gave me mean? :) Perhaps its linked to my problem and I just dont know it.

(post is so freakn long I gotta make a second post.. to be continued!)
 
In any case, if I remove one of the SUMs and the LEFT OUTER JOIN table that belongs to it, it calculates correctly. Weird. Take a look for yourself (for views sake, I commented out the SUM and LEFT OUTER JOIN in which I initially removed, that way you could see what was taken out of the original query)

Code:
1> SELECT TOP 50
2>      prod_items.id,
3>      prod_items.item_number,
4>      prod_items.description,
5>      prod_items.obsolete,
6>      prod_items.wholesale,   prod_items.retail,
7>      prod_cats.name AS category,
8>      SUM(total_bought) AS bought --,
9>      --SUM(total_sold) AS sold
10>
11> FROM prod_items
12>
13> JOIN prod_cats ON (prod_items.cat_id = prod_cats.id)
14> LEFT OUTER JOIN prod_items_bought ON (prod_items.id = prod_items_bought.item_id)
15> --LEFT OUTER JOIN prod_items_sold ON (prod_items.id = prod_items_sold.item_id)
16>
17> WHERE cat_id = 6
18>
19> GROUP BY
20>     prod_items.id,
21>     prod_items.item_number,         prod_items.description,
22>     prod_items.obsolete,
23>     prod_items.wholesale,
24>     prod_items.retail,
25>     prod_cats.name
26>
27> ORDER BY description, prod_items.id
28> go
 id          item_number    description                    obsolete wholesale    retail       category             bought
 ----------- -------------- ------------------------------ -------- ------------ ------------ -------------------- -----------
           5 030531170748V  More leaves                           0        .0000        .0000 Leaves                      NULL
           8 030603233140V  Sdf                                   0        .0000        .0000 Leaves                      NULL
           4 030531170747V  Small Earrings                        0       1.0000       3.0000 Leaves                         1
           9 030608150624V  Some item..                           0      11.0000      22.0000 Leaves                         2
           6 030531170747V  Test dupe item                        1       3.0000      10.0000 Leaves                         3
Warning: Null value is eliminated by an aggregate or other SET operation.

(5 rows affected)

1> SELECT TOP 50
2>      prod_items.id,
3>      prod_items.item_number,
4>      prod_items.description,
5>      prod_items.obsolete,
6>      prod_items.wholesale,   prod_items.retail,
7>      prod_cats.name AS category,
8>      --SUM(total_bought) AS bought,
9>      SUM(total_sold) AS sold
10>
11> FROM prod_items
12>
13> JOIN prod_cats ON (prod_items.cat_id = prod_cats.id)
14> --LEFT OUTER JOIN prod_items_bought ON (prod_items.id = prod_items_bought.item_id)
15> LEFT OUTER JOIN prod_items_sold ON (prod_items.id = prod_items_sold.item_id)
16>
17> WHERE cat_id = 6
18>
19> GROUP BY
20>     prod_items.id,
21>     prod_items.item_number,         prod_items.description,
22>     prod_items.obsolete,
23>     prod_items.wholesale,
24>     prod_items.retail,
25>     prod_cats.name
26>
27> ORDER BY description, prod_items.id
28> go
 id          item_number    description                    obsolete wholesale    retail       category             sold
 ----------- -------------- ------------------------------ -------- ------------ ------------ -------------------- -----------
           5 030531170748V  More leaves                           0        .0000        .0000 Leaves                      NULL
           8 030603233140V  Sdf                                   0        .0000        .0000 Leaves                      NULL
           4 030531170747V  Small Earrings                        0       1.0000       3.0000 Leaves                      NULL
           9 030608150624V  Some item..                           0      11.0000      22.0000 Leaves                         2
           6 030531170747V  Test dupe item                        1       3.0000      10.0000 Leaves                         9
Warning: Null value is eliminated by an aggregate or other SET operation.

(5 rows affected)

If anyone has any insight as to why this is so, Im all ears. At the very least, thanks for reading and a double thanks to anyone who responds.
 
Too late and too much drink to look over all that (will promise to do so in the morning though)
The Warning: Null value is eliminated by an aggregate or other SET operation message you are getting basically means there are NULL values in one or more of the fields you are perfoming aggregate operations on (SUM, AVG, MIN, MAX etc) - this can throw the results of some operations (AVG for example)., not sure why that should affect a sum though?

how are the tables related?
 
Thanks for the info. Have any suggestions on how I can force a return of 0 instead of NULL?

Here are the table relations;

prod_items.cat_id foreign key to prod_cats.id
prod_items_bought.item_id foreign key to prod_items.id
prod_items_sold.item_id foreign key to prod_items.id
prod_items_sold.show_id foreign key to prod_shows.id
 
Pondering about this myself, I decided to go ahead and re-write the SELECT query with subqueries. It works as intended, but JOINs are supposed to be more efficient with a simple SELECT query like mine (at least thats what I remember reading somewhere).

Regardless of finding another solution that works, Id still like to solve the problem in my original with JOINs. Even if JOINs really arent that more efficient then subqueries, Id still like to know why the wrong calculations were produced, that way I can spot this potential problem in the future.

Heres the solution using subqueries;

Code:
1> SELECT TOP 50
2>      prod_items.id,
3>      prod_items.item_number,
4>      prod_items.description,
5>      prod_items.obsolete,
6>      prod_items.wholesale,
7>      prod_items.retail,
8>      prod_cats.name AS category,
9>
10>     (SELECT SUM(total_bought)
11>             FROM prod_items_bought
12>             WHERE prod_items_bought.item_id = prod_items.id) AS bought,
13>
14>     (SELECT SUM(total_sold)
15>             FROM prod_items_sold
16>             WHERE prod_items_sold.item_id = prod_items.id) AS sold
17>
18> FROM prod_items
19>
20> JOIN prod_cats ON (prod_items.cat_id = prod_cats.id)
21> WHERE cat_id = 6
22> ORDER BY description, prod_items.id
23> go
 id          item_number    description                    obsolete wholesale    retail       category             bought      sold
 ----------- -------------- ------------------------------ -------- ------------ ------------ -------------------- ----------- -----------
           5 030531170748V  More leaves                           0        .0000        .0000 Leaves                      NULL        NULL
           8 030603233140V  Sdf                                   0        .0000        .0000 Leaves                      NULL        NULL
           4 030531170747V  Small Earrings                        0       1.0000       3.0000 Leaves                         1        NULL
           9 030608150624V  Some item..                           0      11.0000      22.0000 Leaves                         2           2
           6 030531170747V  Test dupe item                        1       3.0000      10.0000 Leaves                         3           9

(5 rows affected)
 
Just looked over at my thread on microsoft.public.sqlserver.programming and am getting some interesting solutions on there. If anyone is up for a read;

http://www.developersdex.com/sql/message.asp?r=3081971&p=581

Oh.. and my problem is pretty much solved. If anyone has any other solutions not given either in this thread or in the link above, Im more then willing to give em a shot to see if theyre more efficient.
 
Back
Top