Standalone cache according to http requests

Posted: July 27th, 2008 | Author: sofia | Filed under: ask questions, curious, performance | Tags: | 8 Comments »

Maybe this already exists, but since my searches didn’t turn up anything, I thought I’d post this.

You have an app coded more or less REST style. Every POST request implies a data change (so the cache becomes stale); every GET request implies no change in the data (so the cache stays fresh). So if a POST request was made to domain.com/admin/news, you know the news cache is stale. I won’t go really deep here: if you change item 8 of the news table, strictly only two caches go stale, the one that lists the news and the one that shows item 8 (i.e. domain.com/news and domain.com/news/8, or domain.com/news/title-of-article), not every cache belonging to the news group. But let’s keep it simple here.
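To make the rule concrete, here is a minimal Python sketch of the finer-grained version (listing page plus item page). The function name, the `/admin` prefix argument and the path layout are assumptions taken from the example URLs above, not an existing tool.

```python
WRITE_METHODS = {"POST", "PUT", "DELETE"}

def stale_paths(method, path, admin_prefix="/admin"):
    """Return the public paths whose cache goes stale after this request.

    Hypothetical convention: a write to /admin/<section>/<item> invalidates
    /<section> (the listing) and /<section>/<item> (the item page).
    """
    if method.upper() not in WRITE_METHODS:
        return []  # a GET changes nothing, so every cache stays fresh
    if not path.startswith(admin_prefix + "/"):
        return []  # not a path we have a rule for
    parts = path[len(admin_prefix):].strip("/").split("/")
    if not parts[0]:
        return []  # bare "/admin/" with no section
    listing = "/" + parts[0]
    if len(parts) == 1:
        return [listing]
    return [listing, "/".join([listing] + parts[1:])]
```

With these assumptions, `stale_paths("POST", "/admin/news/8")` gives `["/news", "/news/8"]`, while any GET gives an empty list.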

I would like to know if there’s anything out there that parses the Apache logs for write requests (POST/PUT/DELETE) and, according to a few configurable rules, automatically does a GET to the corresponding URL. For example, if a POST was made to domain.com/admin/news/8, then upon parsing the Apache logs it would do a GET request to domain.com/news/8, generating the cache ahead of time instead of keeping the next user waiting while a fresh cache is built. It would simply increase the cache hit ratio for users. It would of course run as a cron job.
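A rough Python sketch of what such a script could look like, assuming the standard Apache common/combined log format. The rule table, the function names and the base URL are all hypothetical; this is an illustration of the idea, not an existing tool.

```python
import re
from urllib.request import urlopen

# Request part of an Apache log line, e.g. "POST /admin/news/8 HTTP/1.1"
REQUEST_RE = re.compile(r'"(?P<method>[A-Z]+) (?P<path>\S+) HTTP/[\d.]+"')

# Hypothetical configurable rules:
# (pattern matched against the written path, template for the path to warm)
RULES = [
    (re.compile(r"^/admin/(?P<section>[a-z_-]+)/(?P<id>[\w-]+)$"), "/{section}/{id}"),
    (re.compile(r"^/admin/(?P<section>[a-z_-]+)$"), "/{section}"),
]

def paths_to_warm(log_lines, rules=RULES):
    """Return the public paths to re-fetch, in first-seen order, without duplicates."""
    seen = []
    for line in log_lines:
        m = REQUEST_RE.search(line)
        if not m or m.group("method") not in ("POST", "PUT", "DELETE"):
            continue  # only writes make a cache stale
        path = m.group("path").split("?", 1)[0]  # ignore the query string
        for pattern, template in rules:
            hit = pattern.match(path)
            if hit:
                target = template.format(**hit.groupdict())
                if target not in seen:
                    seen.append(target)
                break  # first matching rule wins
    return seen

def warm(base_url, paths):
    """Issue a GET for each path so the app regenerates its cache."""
    for path in paths:
        with urlopen(base_url + path, timeout=10) as response:
            response.read()
```

A cron job would read the new part of the access log, pass the lines to `paths_to_warm`, and hand the result to `warm("http://domain.com", ...)`. Remembering how far into the log the previous run got, and deciding what to do after a DELETE (where the item page no longer exists), are left out of this sketch.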

I like this solution because it keeps the caching code (if cache exists, expire cache, use cache, etc.) outside the app, so it becomes simply another layer, which is where it really should be.

I really think that this makes sense from a REST perspective, so I suspect it’s already out there.

So, does anyone know of anything? Preferably in PHP, but Python or Ruby is OK too.

Thanx :=)


data visualization for everyone

Posted: July 14th, 2008 | Author: sofia | Filed under: Uncategorized | Tags: | No Comments »

Just found out about Many Eyes, a wonderful project by IBM that allows anyone to create data visualizations without any programming knowledge whatsoever (the usual suspects being Flash/ActionScript and Processing).

In their own words:

Many Eyes is a bet on the power of human visual intelligence to find patterns. Our goal is to ‘democratize’ visualization and to enable a new social kind of data analysis.

On the usability side, the creators are to be congratulated on the simplicity of the whole interface. The user chooses a dataset (it is also possible to upload your own), then chooses a visualization type, e.g. tag cloud, line graph, etc., previews it, and publishes it! Really simple :=)

I actually created two visualizations, one on Obama’s speeches and another on Alice in Wonderland. Since the applets are interactive you can change the words and a new visualization will pop up.

note: may take a little while to load

Obama’s ‘We’

Alice in Wonderland - you, won’t

playing with alice’s playful dialogue


google’s keeping it simple

Posted: July 10th, 2008 | Author: sofia | Filed under: simple | Tags: | No Comments »

It’s just sweet, and revealing of Google’s dedication to usability, that Google keeps a tab on the number of words on the classic homepage. If one goes in, another goes out. So, with an eye on privacy, the current count is still 28 :=)

Just read it: 13,33,53..


htaccess - display cached html version

Posted: July 9th, 2008 | Author: sofia | Filed under: performance, scalability, useful | Tags: , | No Comments »

There are several caching strategies in web application development:

  • database caching
  • fragment caching - parts of the rendered output are cached.
  • page caching - the whole output is cached. Useful when the page requires no authentication and has no personalization (e.g. no “Hello John” for a logged-in user).

Page caching is the fastest method, since the web server can serve the HTML page directly and the web app is bypassed entirely.

The idea: if a page is cached, the web server detects that it exists in the cache folder and serves it; if the page doesn’t exist, the request continues to the web application, which renders the page and saves it to the cache.

So how can the web server detect this? Well, mod_rewrite to the rescue :=)

Here’s the snippet that makes it all possible:
[code]

IndexIgnore *
DirectoryIndex index.php index.html
Options +FollowSymLinks

RewriteEngine On

#if the request is domain.com/about, check whether domain.com/cachehtml/about.html exists and, if so, serve it
#($1 in the RewriteCond refers to the group captured by the RewriteRule below it)
RewriteCond %{DOCUMENT_ROOT}/cachehtml/$1.html -f
RewriteRule ^([a-z_-]+)$ cachehtml/$1.html [NC,QSA,L]

#same thing, one level deeper
#if the request is domain.com/blog/hello-world, check whether domain.com/cachehtml/blog/hello-world.html exists and, if so, serve it
RewriteCond %{DOCUMENT_ROOT}/cachehtml/$1/$2.html -f
RewriteRule ^([a-z_-]+)/([a-z_-]+)$ cachehtml/$1/$2.html [NC,QSA,L]

[/code]

The code above assumes the site lives at the document root. If it doesn’t, just add the folder name, so that
RewriteCond %{DOCUMENT_ROOT}/cachehtml/$1.html -f
becomes
RewriteCond %{DOCUMENT_ROOT}/mysite/cachehtml/$1.html -f
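The rules above only cover the read side; the web application still has to write the rendered page into cachehtml/ on a miss. A minimal Python sketch of that half, assuming the same path shape the rewrite rules accept (one or two segments of letters, `_` or `-`). The function names and the `render` callback are hypothetical, and the cache-hit branch is only there so the sketch works standalone, since in production the web server handles hits.

```python
import os
import re

# Same shape the rewrite rules accept: one or two segments of letters, "_" or "-".
CACHEABLE = re.compile(r"[a-z_-]+(/[a-z_-]+)?", re.IGNORECASE)

def cache_file_for(request_path, cache_dir="cachehtml"):
    """Map a request path such as "/blog/hello-world" to its cache file, or None."""
    name = request_path.strip("/")
    if not CACHEABLE.fullmatch(name):
        return None  # rejects dots (so also ".."), digits and deeper paths
    return os.path.join(cache_dir, *name.split("/")) + ".html"

def serve(request_path, render, cache_dir="cachehtml"):
    """Return the page HTML, rendering and caching it on a miss."""
    target = cache_file_for(request_path, cache_dir)
    if target and os.path.isfile(target):
        with open(target, encoding="utf-8") as f:
            return f.read()  # hit
    html = render(request_path)  # miss: run the application
    if target:
        os.makedirs(os.path.dirname(target), exist_ok=True)
        tmp = target + ".tmp"
        with open(tmp, "w", encoding="utf-8") as f:
            f.write(html)
        os.replace(tmp, target)  # rename into place so a half-written file is never served
    return html
```

Expiring a page is then just a matter of deleting its file from the cache folder; the next request falls through to the application and regenerates it.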