WTF!? preg_replace() returns null?

| January 25th, 2008 |

On one of our sites we were running into a problem when we tried to pass HTML content from a database through an email obfuscation function to prevent spiders from scraping our clients’ email addresses. We quickly discovered that some of the longer pages were showing up completely blank. The preg_replace() function we were using to run the obfuscation code on email addresses was returning null. After some hunting I found the answer.

In PHP 5.2, the Perl Compatible Regular Expressions (PCRE) extension introduced, with little fanfare, a php.ini setting called pcre.backtrack_limit, which for the first time capped the number of backtracks a regular expression could perform before it stops and reports an error. Unfortunately, when PCRE hits this limit it doesn’t raise a notice, warning, or error. All it does is return NULL, something the preg family of functions otherwise almost never does. There were a lot of entries on the PHP.net site reporting this behavior as a bug, and sites that are regex heavy (like Wiki sites) scrambled to figure out WTF was going on.

The only way to determine that this type of PCRE error took place in your code is to call preg_last_error() right after you’ve run your regex. Before PHP 5.2, backtrack errors were handled much more gracefully (if they were even triggered): the regex function simply returned the original string that was passed to it.
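Here is a minimal sketch of that check. The pattern and subject are borrowed from the PHP manual’s preg_last_error() example, and the limit is lowered on purpose so the failure is easy to reproduce; neither comes from our actual obfuscation code.

```php
<?php
// Demo only: lower the limit so the failure is easy to trigger.
ini_set('pcre.backtrack_limit', '1000');

// The nested alternation backtracks heavily on this input.
$pattern = '/(?:\D+|<\d+>)*[!?]/';
$subject = 'foobar foobar foobar';

$result = preg_replace($pattern, '', $subject);

if ($result === null) {
    // preg_replace() gave up silently; ask PCRE why.
    $code = preg_last_error();
    if ($code === PREG_BACKTRACK_LIMIT_ERROR) {
        $message = 'Backtrack limit was exhausted';
    } else {
        $message = 'PCRE error code ' . $code;
    }
    error_log('preg_replace() failed: ' . $message);
}
```

Note the strict comparison against null: an empty string is a perfectly valid result, so a loose check like `!$result` would misreport it as a failure.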

If you’re running regex on large pages (or really long strings), the way around this is to increase pcre.backtrack_limit in your php.ini settings. I increased ours from 100,000 to 1,000,000. Of course, you still run the risk of producing an error on really, really long strings, which is why the second step is to add better error handling any place where you might run a PCRE function on a really long string. Should an error be produced, it’s up to you how to handle it, whether that means returning the original string or breaking the string up into smaller pieces and running them separately.
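Both steps might look something like the sketch below. The function name preg_replace_or_original() and the simplified email pattern are made up for illustration; they are not the code we run in production. The ini_set() call is the runtime equivalent of putting `pcre.backtrack_limit = 1000000` in php.ini.

```php
<?php
// Step 1: raise the limit (same effect as editing php.ini).
ini_set('pcre.backtrack_limit', '1000000');

/**
 * Step 2: hypothetical wrapper for string subjects. Works like
 * preg_replace(), but falls back to the original subject instead
 * of returning null when PCRE reports an error.
 */
function preg_replace_or_original($pattern, $replacement, $subject)
{
    $result = preg_replace($pattern, $replacement, $subject);
    if ($result === null) {
        error_log('preg_replace() failed, PCRE error code ' . preg_last_error());
        return $subject;
    }
    return $result;
}

// Usage with a deliberately simplified (assumed) email pattern.
$html = '<p>Contact: info@example.com</p>';
$safe = preg_replace_or_original(
    '/([\w.+-]+)@([\w-]+(?:\.[\w-]+)+)/',
    '$1 [at] $2',
    $html
);
```

Falling back to the original string means an unobfuscated page gets served rather than a blank one. Whether that trade-off is acceptable depends on the site, so logging the failure is the part I would not skip.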

Ultimately, the best thing you can do (and should always do) is optimize your regex as much as possible. For some people that just means knowing when to use regex and when a simple str_replace() will suffice.
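As a small illustration, suppose (hypothetically) the obfuscation only needed to swap every "@" for its HTML entity. That is a literal substitution, so no regex engine, and no backtrack limit, needs to be involved:

```php
<?php
// Hypothetical case: replace each "@" with its HTML entity.
$html = '<p>Contact: info@example.com</p>';
$obfuscated = str_replace('@', '&#64;', $html);
echo $obfuscated, "\n";
```

str_replace() does a plain string search, so it cannot fail the way a backtracking regex can, no matter how long the page is.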

