Using /, ? and & in the path with Apache mod_rewrite and PHP
I came upon a nasty problem the other day: the rel-tag microformat, poorly designed like so many of them, requires the URL to end with the tag name. Not a cleaned-up, ASCII-compatible version, no, a 1:1 representation. On the one hand, this leads to some woes regarding encoding and non-ASCII characters (think umlauts!); on the other hand, it poses problems with characters that are usually not allowed within a URL or have a special meaning there. Think, for example, of a tag called “Q&A”.
Unfortunately, as we will see, simple escaping is not enough.
When it comes to routing requests to PHP code, a common approach is to use Apache’s mod_rewrite to redirect everything to a single file. The corresponding code usually looks like this:
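A minimal sketch of such a rule set, assuming a per-directory .htaccess file and a front controller named index.php. The two conditions and the L and QSA flags are assumptions about a typical setup, not necessarily the exact rules used here:

```apache
# .htaccess (assumed location: the directory that contains index.php)
RewriteEngine On

# Assumption: leave requests for existing files and directories alone
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d

# Pass the requested path to index.php as the query parameter "path"
RewriteRule ^(.*)$ index.php?path=$1 [L,QSA]
```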
This code turns the current path into a query parameter named ‘path’.
However, the rewriting fails if the path contains either ? or &, even if they are URL-escaped. For example,
/q%26a
where %26 is the escaped ampersand,
will be turned into
index.php?path=q&a=
Note that the escaped ampersand has been turned into a real ampersand, thus dividing query parameters. We can verify this in PHP:
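A small sketch of that check. In the real script, print_r($_GET) is enough; to keep the example runnable outside a web server, it feeds the rewritten query string to parse_str(), which parses a string the way PHP parses a URL query string. The helper name dump_query is made up for illustration:

```php
<?php
// Hypothetical helper: parse a query string like PHP does for $_GET
// and render each parameter as "name => value".
function dump_query(string $query): string
{
    parse_str($query, $params);
    $lines = [];
    foreach ($params as $name => $value) {
        $lines[] = $name . ' => ' . $value;
    }
    return implode("\n", $lines);
}

// Query string produced by the rewrite for the request /q%26a
echo dump_query('path=q&a='), "\n";
```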
returns
path => q
a =>
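Seems like Apache is ubersmart here: mod_rewrite matches against the already unescaped path, so the captured path contains a literal & by the time it is pasted into the query string.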
There are two solutions to this problem:
1.) Use the B rewrite flag:
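A sketch of a rule with the flag added, assuming index.php as the front controller. The L and QSA flags are assumptions about a typical setup:

```apache
RewriteEngine On

# B re-escapes the backreference $1 before it is placed in the query string
RewriteRule ^(.*)$ index.php?path=$1 [B,L,QSA]
```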
The ‘B’ flag tells Apache to re-escape the captured path before inserting it into the substitution, so the ampersand stays encoded. The path is now correctly turned into:
index.php?path=q%26a
And in PHP we get
path => q&a
The ‘B’ flag was added in Apache 2. But for some reason it doesn’t work reliably across distributions. While there are no problems on my local Ubuntu 10.04, it has no effect on my web hoster’s Gentoo machine.
Luckily, there is a more generic solution:
2.) Use double escaping
Escaping ?, & (and even /) twice makes any of them safe to use. Escaping twice means turning
? => %3F => %253F
& => %26 => %2526
/ => %2F => %252F
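When generating links, the double escaping can be done by applying rawurlencode() twice. A minimal sketch; the function name double_escape_segment and the assumption that tags live directly under the site root are made up for illustration:

```php
<?php
// Hypothetical helper: escape a path segment twice so that one round of
// unescaping by Apache still leaves a properly escaped value for PHP.
function double_escape_segment(string $segment): string
{
    return rawurlencode(rawurlencode($segment));
}

echo double_escape_segment('?'), "\n"; // %253F
echo double_escape_segment('&'), "\n"; // %2526
echo double_escape_segment('/'), "\n"; // %252F

// Building a link for a tag, assuming tags live directly under the site root
echo '/' . double_escape_segment('q&a'), "\n"; // /q%2526a
```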
So, if we invoke
/q%2526a
we get
index.php?path=q%26a
And in PHP we get
path => q&a
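The two decoding steps can be traced by hand. This is only a simulation, not what Apache runs internally: rawurldecode() stands in for the unescaping Apache performs on the request path, and parse_str() for the way PHP fills $_GET. The function name trace_request is made up for illustration:

```php
<?php
// Hypothetical simulation of a request passing through Apache
// (one round of unescaping) and PHP (query string parsing).
function trace_request(string $requestPath): array
{
    // Step 1: Apache unescapes the path once; %25 becomes %
    $afterApache = rawurldecode(ltrim($requestPath, '/'));

    // Step 2: the rewrite rule pastes the result into the query string
    $query = 'path=' . $afterApache;

    // Step 3: PHP decodes the value once more while parsing the query
    parse_str($query, $params);

    return ['query' => $query, 'path' => $params['path']];
}

$result = trace_request('/q%2526a');
echo 'index.php?' . $result['query'], "\n"; // index.php?path=q%26a
echo 'path => ' . $result['path'], "\n";    // path => q&a
```

Running the same function on the singly escaped /q%26a yields the truncated value q, which is the failure described at the beginning.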