Posted: July 27th, 2008 | Author: sofia | Filed under: ask questions, curious, performance | Tags: performance | 8 Comments »
Maybe this already exists but since my searches didn’t turn up anything, i thought i’d post this.
You have an app coded more or less REST style. Every POST request implies there was a data change (-> cache becomes stale), every GET request implies there was no change in the data (-> cache stays fresh). So you know that if a POST request was made to domain.com/admin/news, the news cache becomes stale. I won’t go really deep here: if you change item 8 of the news table, you might only have 2 stale caches, the one that lists the news and the one that shows item 8 (ie. domain.com/news and domain.com/news/8 or domain.com/news/title-of-article), and not every cache belonging to the news group. But let’s keep it simple here.
I would like to know if there’s anything out there that parses the Apache logs for write requests and, whenever there was a POST/PUT/DELETE to a URL, automatically does a GET to the corresponding URL according to a few configurable rules. For example, if a POST was made to domain.com/admin/news/8, then upon parsing the Apache logs it would do a GET request to domain.com/news/8, generating the cache right away instead of leaving the next user who comes along waiting while a fresh cache is generated. It would simply increase the cache hit ratio. It would of course run as a cron job.
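To make the idea a bit more concrete, here is a rough Python sketch of what such a script could look like. It is not an existing tool: the rule list, the function names (`urls_to_warm`, `warm`) and the assumption that the log is in Apache's common/combined format are all mine, for illustration only.

```python
import re
from urllib.parse import urljoin
from urllib.request import urlopen

# Matches the request part of an Apache common/combined log line:
# ... "METHOD /path HTTP/1.x" status ...
REQUEST_RE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3})'
)

# Hypothetical configurable rules: (pattern for the written path,
# template for the public path whose cache should be regenerated).
RULES = [
    (re.compile(r"^/admin/news/(\d+)$"), r"/news/\1"),
    (re.compile(r"^/admin/news/?$"), "/news"),
]

WRITE_METHODS = {"POST", "PUT", "DELETE"}


def urls_to_warm(log_lines, base_url, rules=RULES):
    """Return the public URLs made stale by write requests, in first-seen order."""
    urls = []
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if not match or match["method"] not in WRITE_METHODS:
            continue
        if not match["status"].startswith(("2", "3")):
            continue  # assume a failed write changed no data
        path = match["path"].split("?", 1)[0]
        for pattern, template in rules:
            if pattern.match(path):
                url = urljoin(base_url, pattern.sub(template, path))
                if url not in urls:
                    urls.append(url)
    return urls


def warm(urls, timeout=10):
    """Issue a GET to each URL so the app regenerates its cache."""
    for url in urls:
        try:
            with urlopen(url, timeout=timeout) as response:
                response.read()
        except OSError as exc:  # URLError and HTTPError are OSError subclasses
            print(f"could not warm {url}: {exc}")
```

A cron job would read the new lines of the access log, call `urls_to_warm(lines, "http://domain.com")` and pass the result to `warm`. The sketch leaves out remembering where in the log the previous run stopped, and it assumes the app has already expired the stale cache, so that the GET actually rebuilds it.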
I like this solution because it really keeps the caching code (if cache exists, expire cache, use cache, etc) outside the app, becoming simply another layer, where it really should be.
I really think that this makes sense from a rest perspective so i suspect it’s already out there..
So anyone know of anything? Preferably in php, but python or ruby is ok too.
Thanx :=)
Posted: July 9th, 2008 | Author: sofia | Filed under: performance, scalability, useful | Tags: cache, htaccess | No Comments »
There are several caching strategies in web application development:
- database caching
- fragment caching - parts of the rendered output are cached.
- page caching - the whole output is cached. Useful when the page requires no authentication and has no personalization (eg. no Hello John if he’s logged in).
Page caching is the fastest method since the webserver can serve the html page directly allowing for the web app to be totally bypassed.
The idea is: if a page is cached, the webserver detects that it exists in the cache folder and serves it; if the page doesn’t exist, the request continues to the web application, which renders the page and saves it to the cache.
So how can the webserver detect this? Well, mod_rewrite to the rescue :=)
Here’s the snippet that makes it all possible:
[code]
IndexIgnore *
DirectoryIndex index.php index.html
Options +FollowSymLinks
RewriteEngine On
#if the request is domain.com/about, it will check if domain.com/cachehtml/about.html exists, if so displays it
RewriteCond %{DOCUMENT_ROOT}/cachehtml/$1.html -f
RewriteRule ^([a-z_-]+)$ cachehtml/$1.html [NC,QSA,L]
#goes one level deep
#if the request is domain.com/blog/hello-world, it will check if domain.com/cachehtml/blog/hello-world.html exists, if so displays it
RewriteCond %{DOCUMENT_ROOT}/cachehtml/$1/$2.html -f
RewriteRule ^([a-z_-]+)/([a-z_-]+)$ cachehtml/$1/$2.html [NC,QSA,L]
[/code]
The code above assumes the site sits at the document root. If it lives in a subfolder, just add the folder name so that
RewriteCond %{DOCUMENT_ROOT}/cachehtml/$1.html -f
becomes
RewriteCond %{DOCUMENT_ROOT}/mysite/cachehtml/$1.html -f
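The rewrite rules only cover the serving side; the web app still has to write the html files into cachehtml and delete them when the data changes. Here is a minimal Python sketch of that half, assuming the same folder layout as above. The function names (`cache_file_for`, `save_page`, `expire_page`) are made up for this example and do not come from any framework.

```python
import os
import re
import tempfile

# Mirrors the rewrite rules: one or two path segments made of
# letters, "_" and "-" (case-insensitive, like the NC flag).
CACHEABLE = re.compile(r"[a-z_-]+(/[a-z_-]+)?", re.IGNORECASE)


def cache_file_for(request_path, cache_root):
    """Map "/blog/hello-world" to <cache_root>/blog/hello-world.html.

    Returns None for paths the rewrite rules would never serve from cache.
    """
    relative = request_path.strip("/")
    if not CACHEABLE.fullmatch(relative):
        return None
    return os.path.join(cache_root, *relative.split("/")) + ".html"


def save_page(request_path, html, cache_root):
    """Store rendered html so the webserver can serve it next time."""
    target = cache_file_for(request_path, cache_root)
    if target is None:
        return None
    folder = os.path.dirname(target)
    os.makedirs(folder, exist_ok=True)
    # Write to a temp file and rename it, so a half-written page is never served.
    fd, tmp_path = tempfile.mkstemp(dir=folder, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(html)
    os.chmod(tmp_path, 0o644)  # mkstemp creates the file readable by the owner only
    os.replace(tmp_path, target)
    return target


def expire_page(request_path, cache_root):
    """Delete the cached file so the next request reaches the app again."""
    target = cache_file_for(request_path, cache_root)
    if target and os.path.exists(target):
        os.remove(target)
```

The app would call `save_page` right after rendering a public page and `expire_page` whenever the underlying data changes. Note that a path like /news/8 is not cacheable here, simply because the character class in the rewrite rules has no digits.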
Posted: June 7th, 2008 | Author: sofia | Filed under: open, performance, tech | Tags: performance, tools | No Comments »
Http_load is another cool webserver performance tester that gives simple stats on how your webapp is performing.
How to install in OS X
- Download from http://www.acme.com/software/http_load/
- Open terminal, cd to the directory where the archive is and extract it
$ tar xvzf http_load-12mar2006.tar.gz
- Move to that directory
$ cd http_load-12mar2006
- Run
$ make
- Run
$ sudo make install
You’re ready! Open up a text editor, write the URL of the website you want to test (your own preferably) and save it as a .txt file, then cd to the directory where the .txt is and run
$ http_load -parallel 5 -fetches 100 name_of_file.txt
which means open 5 concurrent connections and fetch the webpage 100 times.
You’ll get something like this:
100 fetches, 5 max parallel, 1.34237e+07 bytes, in 15.842 seconds
134237 mean bytes/connection
6.31234 fetches/sec, 847351 bytes/sec
msecs/connect: 28.9069 mean, 75.011 max, 14.865 min
msecs/first-response: 435.84 mean, 2484.28 max, 96.082 min
93 bad byte counts
HTTP response codes:
code 200 — 100
The important bits are the fetches/sec and msecs/first-response lines. At the moment the webserver is handling about 6 requests per second (at 5 parallel connections) and has a mean initial latency of about 436 milliseconds.
Http_load tells you how your webapp is currently performing, allowing you to test it under different conditions. Basically it’s a benchmarking tool just like httperf, which i covered here. The next step is optimization. Have a look at the 1st part of Getting Rich with PHP 5 (what a crappy title) by Rasmus Lerdorf for tools you can use to profile your code and some tips on optimization. In the example shown he goes from 17 reqs/sec to 1100 reqs/sec.
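If you want to compare runs over time, the two numbers above are easy to pull out of the output with a few lines of Python. This is only a sketch based on the sample output shown here; the function name `parse_http_load` and the dictionary keys are my own, and other http_load versions may print things slightly differently.

```python
import re

# A plain or scientific-notation number, e.g. 6.31234 or 1.34237e+07
NUMBER = r"(\d+(?:\.\d+)?(?:e[+-]?\d+)?)"


def parse_http_load(output):
    """Extract the headline figures from http_load's summary text."""
    stats = {}
    match = re.search(NUMBER + r" fetches/sec", output)
    if match:
        stats["fetches_per_sec"] = float(match.group(1))
    match = re.search(r"msecs/first-response: " + NUMBER + r" mean", output)
    if match:
        stats["first_response_ms_mean"] = float(match.group(1))
    # As far as i can tell, "bad byte counts" means the page size differed
    # between fetches, which is common for dynamic pages.
    match = re.search(r"(\d+) bad byte counts", output)
    stats["bad_byte_counts"] = int(match.group(1)) if match else 0
    return stats
```

You could redirect the http_load output to a file after each run, feed the text to `parse_http_load` and log the result next to the date or commit you tested.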
Posted: June 7th, 2008 | Author: sofia | Filed under: open, performance | Tags: performance, tools | 6 Comments »
Httperf is a webserver performance tester. There are loads of performance testers out there (take a look here) but i was up and running with httperf in no time. So here’s a quick get started guide:
- Download the latest version from ftp://ftp.hpl.hp.com/pub/httperf/
- Install
- $ tar xvzf httperf-0.9.0.tar.gz
- $ cd httperf-0.9.0
- $ ./configure
- $ make
- $ sudo make install
Httperf is installed by default in /usr/local/bin/httperf. You then invoke httperf from the command line.
- Have a website to test (lol)
- Here’s a sample command
$ httperf --server hostname --port 80 --uri /test.html --rate 150 --num-conns 27000 --num-calls 1 --timeout 5
Example: You have your site on localhost and for now just wanna test that.
- $ httperf --server localhost --uri /about.html --num-conns 1000
- test the page about.html on the localhost server, making 1000 connections in total. This is a total and not a concurrency level: with no --rate given, httperf opens the connections one after another
- $ httperf --server=localhost --wsess=12,8,2 --rate=1 --timeout=5
- The --wsess sets the total number of sessions to generate, the number of calls per session, and the time (in seconds) that separates consecutive calls. If we use --wsess=12,8,2, we’re setting 12 sessions at eight calls per session with two seconds between each call.
- The --rate switch specifies how many connections per second httperf opens against the Web server; with one call per connection that is also the number of HTTP requests/second. [Update] Actually when used together with --wsess it specifies the number of sessions per second and not of requests -> see comment by John Wilkinson below
- The --timeout switch sets the maximum number of seconds to wait for a server response before httperf gives up. The default is forever, so it’s good practice to set it just in case the server hangs (tying up your resources too). If this timeout expires, httperf considers the corresponding call to have failed.
- The --num-conns sets how many total HTTP connections will be made during the test run - this is a cumulative number, so the higher it is, the longer the test runs
- Analyze the statistics printed to the console.
There are six groups of statistics: overall results, results pertaining to the TCP connections, results for the requests that were sent, results for the replies that were received, CPU and network utilization figures, as well as a summary of the errors that occurred.
Example printout:
“Maximum connect burst length: 1
Total: connections 100 requests 100 replies 100 test-duration 16.385 s
Connection rate: 6.1 conn/s (163.8 ms/conn, <=1 concurrent connections)
Connection time [ms]: min 135.5 avg 163.8 max 406.4 median 159.5 stddev 37.4
Connection time [ms]: connect 19.0
Connection length [replies/conn]: 1.000
Request rate: 6.1 req/s (163.8 ms/req)
Request size [B]: 64.0
Reply rate [replies/s]: min 5.8 avg 6.1 max 6.2 stddev 0.2 (3 samples)
Reply time [ms]: response 74.1 transfer 70.8
Reply size [B]: header 514.0 content 15405.0 footer 1.0 (total 15920.0)
Reply status: 1xx=0 2xx=100 3xx=0 4xx=0 5xx=0
CPU time [s]: user 3.52 system 12.78 (user 21.5% system 78.0% total 99.5%)
Net I/O: 95.3 KB/s (0.8*10^6 bps)
Errors: total 0 client-timo 0 socket-timo 0 connrefused 0 connreset 0
Errors: fd-unavail 0 addrunavail 0 ftab-full 0 other 0
“
The connection rate, the request rate and the reply rate are the ones to look at. The better a website is performing (at the rate requested) the closer the connection and reply rates will be to the rate specified in the initial command (--rate). Normally you do a series of tests, always increasing the rate, until you start to see that the reply and connection rates are no longer keeping up - that’s when you’ve hit your limit, ie. how many requests per second your webapp is able to handle.
Also check autobench for automation of the testing process, here for an example of how httperf was used to benchmark the evolution of a project, an article from the source httperf—A Tool for Measuring Web Server Performance and finally this peepcode looks interesting.
Anyway, if i’ve missed any important information please say so in the comments.
[Update] Ted Bullock, one of the developers of httperf, was kind enough to point me to his quickstart guide, a six page long doc which has much more detailed information :=)