Saturday, January 9, 2010

Conditional MySQL statements: Lifesaver on occasions

Have you tried building a pageview counter? Lets say you want to count the number of pageviews on a site/page on the following granularity - today, yesterday and accumulated. Now, lets assume that the Table is defined as follows:

CREATE TABLE pageviews (
     page_id varchar(100) not null,
     last_update_time int(11) not null default UNIX_TIMESTAMP(),
     today int(11) not null default 0,
     yesterday int(11) not null default 0,
     acc int(11) not null default 0
);

Now, your counter would normally apply a query like this to increment the counter:
update pageviews set today = today+1, acc = acc + 1, last_update_time = UNIX_TIMESTAMP() where page_id = 'xxx';

But when do you move the today counter to yesterday? Remember that after the stroke of midnight, today count should switch to yesterday and reset to 0. You could always write a cronjob which would do this for all the rows in the table at the stroke of midnight, but the trouble is that as the number of rows increase, the execution time of such a query increase and in the meanwhile (say between 0001hrs and 0010hrs), queries on the page counting client would either be blocked or be incorrectly executed making it an undesirable situation.

The solution to that is a conditional MySQL statement like the following:

UPDATE pageviews SET 
        yesterday = CASE WHEN last_update_time > xxx_ts THEN yesterday ELSE today END,
        today     = CASE WHEN last_update_time > xxx_ts THEN today + 1 ELSE 1     END, 
        acc = acc + 1, last_update_time = UNIX_TIMESTAMP() WHERE page_id = 'xxx';
where xxx_ts is the unix timestamp of the stroke of midnight of today. 

This is essentially crunching all that complex logic of having to worry about the switch time into the query. Remember that at the MySQL end, all the queries will be applied one after the other, and hence even if the same query is run one after the other, it will be set correctly.

4 comments:

  1. You can do the same by "Insert ... on duplicate update" construct in mysql (http://dev.mysql.com/doc/refman/5.0/en/insert-on-duplicate.html)

    -Shanky
    ReplyDelete
  2. The solution doesn't handle the count cleanly from midnight - till the first page view occurs.


    Case 1 - What happens when a particular page doesn't receive a hit till late evening. You'll be showing wrong values.
    Case 2 - A page doesn't receive clicks for an entire day. The calculations will be plain wrong.
    ReplyDelete
  3. @Gokul A: Haven't I written about the cron job you would have to run each night? Wouldn't that update both the cases you are mentioning? This solution is deployed on a very high traffic site with millions of users and it works, so if the conditions you mention are definitely handled - it is a different matter that I might be missing a step in this blog entry (though I don't see what it is when I am reading it presently.)
    ReplyDelete
  4. @Shreeni - So you take advantage of the fact that you are most surely to get hits on the page? Or do you've a combination of the cron as well as the conditional updates on the DB?

    Anyways, my point was that if the traffic isn't high and the counter solely relies on the conditional update, we might have incorrect results.
    ReplyDelete