Standalone cache according to http requests

Posted: July 27th, 2008 | Author: sofia | Filed under: curious | Tags: |

Maybe this already exists, but since my searches didn’t turn up anything, I thought I’d post this.

You have an app coded more or less REST style. Every POST request implies there was a data change (-> cache becomes stale); every GET request implies there was no change in the data (-> cache stays fresh). So you know that if a POST request was made to domain.com/admin/news, the news cache becomes stale. I won’t go really deep here: if you change item 8 of the news table, you might only have 2 stale caches, the one that lists the news and the one that shows item 8 of the news table (i.e. domain.com/news and domain.com/news/8 or domain.com/news/title-of-article), rather than every cache belonging to the news group. But let’s keep it simple here.

I would like to know if there’s anything out there that parses the Apache logs for write requests and, if there was a POST/PUT/DELETE to any URL, automatically does a GET to the corresponding URL according to a few configurable rules. For example, if a POST was made to domain.com/admin/news/8, then upon parsing the Apache logs it would do a GET request to domain.com/news/8, generating the cache before the next user comes along instead of keeping that user waiting while a fresh cache is generated. It would simply increase the cache hit ratio. It would of course run as a cron job.
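Here is a minimal sketch of what such a cron script could look like in Python. It is an illustration only, not an existing tool: the rule table, the function names `stale_urls` and `warm`, and the choice to skip failed writes are all assumptions, and it expects the default Apache common/combined log format.

```python
import re
import urllib.request

# Matches the request part of an Apache common/combined log line, e.g.
# 1.2.3.4 - - [27/Jul/2008:22:07:00 +0100] "POST /admin/news/8 HTTP/1.1" 302 512
REQUEST_RE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3})'
)

# Hypothetical rule table: a write to a URL matching the pattern makes the
# listed public URLs stale. {0} is filled with the first captured group.
RULES = [
    (re.compile(r"^/admin/news/(\d+)$"), ["/news/{0}", "/news"]),
]

WRITE_METHODS = {"POST", "PUT", "DELETE"}


def stale_urls(log_lines, rules=RULES):
    """Return the public paths to re-fetch, in first-seen order, no duplicates."""
    stale = []
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if not match or match.group("method") not in WRITE_METHODS:
            continue
        # Assumption: a 4xx/5xx write did not change any data, so skip it.
        if match.group("status")[0] in "45":
            continue
        path = match.group("path").split("?", 1)[0]  # ignore the query string
        for pattern, targets in rules:
            hit = pattern.match(path)
            if hit:
                for target in targets:
                    url = target.format(*hit.groups())
                    if url not in stale:
                        stale.append(url)
    return stale


def warm(base_url, paths):
    """Issue a GET for each stale path so the cache is rebuilt before a user asks."""
    for path in paths:
        with urllib.request.urlopen(base_url + path) as response:
            response.read()
```

A cron job would call `stale_urls` on the log lines written since its last run and pass the result to `warm`. Remembering where the previous run stopped in the log file is left out of this sketch.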

I like this solution because it really keeps the caching code (if cache exists, expire cache, use cache, etc.) outside the app, so that it becomes simply another layer, which is where it really should be.

I really think that this makes sense from a REST perspective, so I suspect it’s already out there.

So does anyone know of anything? Preferably in PHP, but Python or Ruby is OK too.

Thanx :=)

8 Comments on “Standalone cache according to http requests”

  1. #1 cpinto said at 22:07 pm on July 27th, 2008:

    Why not invalidate the cache on POST?

  2. #2 sofia said at 22:42 pm on July 27th, 2008:

    That’s what would be done, but what I would like to avoid is putting that logic in the application code itself, where it’s normally put. So in normal scenarios, you have the admin controller which, in case of a POST request (delete/update/insert actions), deletes the respective cache. Then in the frontend controller you have something like this:
    if cache:
        # use cache
    else:
        # fetch from the database, build the response and save it into the cache

    I would like to avoid this and extract the cache code from the specifics of the application code. The app code would have no cache code at all, and the caching logic could just be plugged into any existing site without any customization of the application code.
    The idea is that you have a cron job that checks for POST requests on specific URLs of the site. So you could have a rule like:
    if a POST request was made to admin/news/8 then
        delete the news/8 cache,
        do a GET to news/8 and
        save the response to the cache folder.

    If a cache then exists in the cache folder (something like cachefolder/news/8.html), it would be served directly by Apache, never reaching the application code itself.

    The idea is that there’s a cache layer completely separate from the application code. In this way it could be plugged into any existing site. And it simply seems cleaner to me..
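The rule described in the comment above (delete the news/8 cache, GET news/8, save the response to the cache folder) could be sketched in Python as follows. The names `cache_path` and `refresh`, the `index.html` fallback for the site root, and the injected `fetch` callable are assumptions made for illustration. Configuring Apache to serve the files from the cache folder is outside this sketch.

```python
import os
import tempfile


def cache_path(cache_root, url_path):
    """Map a URL path such as /news/8 to <cache_root>/news/8.html."""
    relative = url_path.strip("/") or "index"
    parts = relative.split("/")
    if ".." in parts:
        # Refuse paths that would escape the cache folder.
        raise ValueError("unsafe path: %r" % url_path)
    return os.path.join(cache_root, *parts) + ".html"


def refresh(cache_root, url_path, fetch):
    """Delete the stale file, fetch a fresh response and store it.

    `fetch` is any callable that takes the URL path and returns the
    response body as bytes (for example a wrapper around an HTTP GET).
    """
    target = cache_path(cache_root, url_path)
    folder = os.path.dirname(target)
    os.makedirs(folder, exist_ok=True)
    if os.path.exists(target):
        os.remove(target)  # until the new file lands, requests reach the app
    body = fetch(url_path)
    # Write to a temporary file and rename it, so the web server never
    # serves a half-written page.
    fd, tmp_name = tempfile.mkstemp(dir=folder)
    with os.fdopen(fd, "wb") as tmp:
        tmp.write(body)
    os.chmod(tmp_name, 0o644)  # mkstemp creates owner-only files
    os.replace(tmp_name, target)
    return target
```

The stale file has to be removed before fetching because otherwise the web server would answer the GET from the old cache file and the fresh response would never be generated.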

  3. #3 Alves said at 23:06 pm on July 27th, 2008:

    If you have already set up a cache that responds to all GET requests, then I’d suggest you use the POST-REDIRECT-GET pattern, which says that after a POST the server should return a 302 response code to redirect the browser to another page using GET. You can read the details in http://www.theserverside.com/tt/articles/article.tss?l=RedirectAfterPost but using your example, it would work like this:

    1. Browser sends> POST domain.com/news/8
    2. Server responds> REDIRECT (302) to domain.com/news/8
    3. Browser sends> GET domain.com/news/8

    As a bonus, this pattern helps avoid duplicate form submissions and sort of minimizes the “evil” back button.

    Hope this helps…
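The three steps of the pattern above can be sketched as a tiny WSGI application in Python. This is only an illustration: the function name `app` and the placeholder response body are made up, and the part that actually saves the submitted data is omitted. It uses 302 as in the comment; HTTP/1.1 also defines 303 See Other for redirecting after a POST.

```python
def app(environ, start_response):
    """Tiny WSGI app: a POST saves and redirects, a GET renders the page."""
    method = environ["REQUEST_METHOD"]
    path = environ.get("PATH_INFO", "/")
    if method == "POST":
        # ... save the submitted data here (omitted in this sketch) ...
        # Step 2: answer with a redirect instead of a page.
        start_response("302 Found", [("Location", path), ("Content-Length", "0")])
        return [b""]
    # Step 3: the browser follows the redirect with a plain GET.
    body = ("Showing " + path).encode("utf-8")
    start_response(
        "200 OK",
        [
            ("Content-Type", "text/plain; charset=utf-8"),
            ("Content-Length", str(len(body))),
        ],
    )
    return [body]
```

Because the page is always delivered in response to a GET, reloading it does not resubmit the form.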

  4. #4 cpinto said at 23:15 pm on July 27th, 2008:

    There are a few alternative strategies. One is to use versioning, so you’d have /news/8/1, /news/8/2 and so forth, where the last digit represents the version of the information. When no version is asked for, you redirect to the latest version. Couple this with a reverse proxy and set up an expiry time well into the future.

    another is to create your own http request handler that does the same thing, if it’s a post it invalidates the complete response cache, else just return whatever you have stored.

  5. #5 Alves said at 23:31 pm on July 27th, 2008:

    The far-future expiration time is a good trick, but it uses the browser cache. I believe Sofia was interested in a server cache, so that everyone can see the cached version right after the edit.

  6. #6 sofia said at 23:32 pm on July 27th, 2008:

    @alves

    I get your logic, but normally the changes to an article (keeping up with the example) are done in the backend, which is located at a different URL. For example, in WordPress it’s at wp-admin by default, so when you edit an article the URL is something like http://www.domain.com/wp-admin/post.php?action=edit&post=1. Then, when you update, it redirects either to the edit action again or to the list of posts; either way you always stay in the wp-admin part of the blog. So that is not exactly what I’m looking for, as I don’t want to force a redirect to the site itself.
    The thing is, the POST is not always done on the same URL as the GET. Maybe that’s not very RESTful, but there’s usually a separate admin interface, e.g. in Django it’s at http://site.com/admin/.

    @cpinto
    Thanx for the response :=) I didn’t really get this: ‘another is to create your own http request handler that does the same thing, if it’s a post it invalidates the complete response cache, else just return whatever you have stored.’ Doesn’t this imply that both the POST and the GET are done on the same URL? (See my response to alves.)

  7. #7 Alves said at 00:00 am on July 28th, 2008:

    Sofia, you don’t have to redirect to the same url. When the server responds with a redirect (302) you (the programmer) just set the location to whatever you want. For example, a publish action could be implemented in wordpress like this:

    1. Browser sends> POST http://www.domain.com/wp-admin/post.php?action=publish&post=1
    2. Server responds> REDIRECT (302) to http://www.domain.com/
    3. Browser sends> GET http://www.domain.com/

  8. #8 cpinto said at 00:56 am on July 28th, 2008:

    Not really. For example, if you’re using Python and WSGI or mod_python, you can put some caching code in the request handler. Since the request handler is executed independently of the requested URL, you should be able to invalidate the cache on a POST to /admin/news/8 and use the cache when you GET /news/8.

    but this really depends on what you’re using, it’s a generic enough concept but the implementation is not.
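As a rough illustration of the request-handler idea in the last comment, here is a sketch of a WSGI middleware in Python. The class name `CacheMiddleware`, the rule table and the in-memory dictionary are assumptions for the example. It ignores query strings, cookies and cache-control headers, and it does not call `close()` on the wrapped response, so treat it as a starting point only.

```python
import re

# Hypothetical rule: a write to /admin/news/<id> invalidates /news/<id> and /news.
RULES = [(re.compile(r"^/admin/news/(\d+)$"), ["/news/{0}", "/news"])]


class CacheMiddleware:
    """Wraps any WSGI app: caches 200 GET responses, drops them on writes."""

    def __init__(self, app, rules=RULES):
        self.app = app
        self.rules = rules
        self.store = {}  # path -> (status, headers, body)

    def __call__(self, environ, start_response):
        method = environ["REQUEST_METHOD"]
        path = environ.get("PATH_INFO", "/")

        if method == "GET" and path in self.store:
            status, headers, body = self.store[path]
            start_response(status, headers)
            return [body]  # served without touching the application

        if method in ("POST", "PUT", "DELETE"):
            # Let the app finish the write first, then drop the stale entries.
            body = b"".join(self.app(environ, start_response))
            self._invalidate(path)
            return [body]

        if method != "GET":
            return self.app(environ, start_response)

        captured = {}

        def capture(status, headers, exc_info=None):
            # Remember status and headers so they can be replayed from the cache.
            captured["status"], captured["headers"] = status, list(headers)
            return start_response(status, headers, exc_info)

        body = b"".join(self.app(environ, capture))
        if captured.get("status", "").startswith("200"):
            self.store[path] = (captured["status"], captured["headers"], body)
        return [body]

    def _invalidate(self, path):
        for pattern, targets in self.rules:
            hit = pattern.match(path)
            if hit:
                for target in targets:
                    self.store.pop(target.format(*hit.groups()), None)
```

The application itself stays free of caching code: it is wrapped once, as in `application = CacheMiddleware(application)`, and the rules decide which cached pages a write makes stale.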

