Cocoa/Obj-C Unicode Regular Expressions

MsgFiler is primarily an AppleScript Studio application. To perform the search on mailboxes, MsgFiler calls several home-grown PHP scripts. These scripts, however, don’t handle Unicode text all that well. As a result, when users search for mailboxes with accented characters, they invariably run into problems with MsgFiler.
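A plausible cause, assuming the PHP scripts receive UTF-8 text and their PCRE patterns omit the /u modifier: PCRE then works byte by byte. As a result, "." can match half of an accented letter, and /i folds case only for ASCII letters. This small PHP sketch uses made-up sample strings to show the difference.

```php
<?php
// Assumption: mailbox names reach PHP as UTF-8 byte strings.
// "é" (U+00E9) is two bytes in UTF-8: 0xC3 0xA9.
$name = "Caf\u{00E9}";

// Without /u, PCRE treats each byte as a character, so "." matches one byte.
$bytesMode = preg_match('/^Caf.$/', $name);   // 0: "é" is two bytes
$utf8Mode  = preg_match('/^Caf.$/u', $name);  // 1: "é" is one character

// Case-insensitive matching of non-ASCII letters also needs /u.
$caselessBytes = preg_match("/caf\u{00E9}/i", "CAF\u{00C9}");   // 0
$caselessUtf8  = preg_match("/caf\u{00E9}/iu", "CAF\u{00C9}");  // 1

var_dump($bytesMode, $utf8Mode, $caselessBytes, $caselessUtf8);
```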

I’ve added some custom subclasses and functions to the MsgFiler application that I can call from within the AppleScript Studio app. My goal is to rewrite the search algorithm in Cocoa/Objective-C. I’m eager to read some pointers on how to search Unicode strings successfully under Mac OS X. Any tips from the development community on where to start?

2 thoughts on “Cocoa/Obj-C Unicode Regular Expressions”

  1. Probably not any specific help, as you’ve likely already been through this, but PHP uses PCRE (Perl Compatible Regular Expressions) for regular expressions, which sounds like it supports Unicode fairly well (although I’ve never tested its Unicode support). Of course, PCRE can also be used in Cocoa through AGregex (a wrapper for PCRE).

    I also bumped into Unicode’s document describing how to adapt regular expressions to handle Unicode, if that’s helpful.

    HTH,

    Morgan

  2. Morgan:

    I am using PHP’s preg_match_all function to perform the string matching. Perhaps my implementation of it needs reworking:

    // Escape any regex metacharacters, including the "/" delimiter
    $search = preg_quote($search, "/");

    // Create the pattern; each space in the search term matches any run of characters.
    // (.*) rather than (.*)*, since the nested quantifier can backtrack badly.
    $pattern = "/(.*)!(.*)" . str_replace(" ", "(.*)", $search) . "(.*)/i";
    $out = array();

    // Get all the matches and stuff them into the $out array
    preg_match_all($pattern, $mboxes, $out);

    Here, $mboxes is a string containing all of the mailboxes to search. There is a /u pattern modifier that preg_match_all accepts, but it doesn’t seem to work. Any additional thoughts?
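One possible rework of the search in the second comment is sketched here under stated assumptions. It assumes that $mboxes holds one UTF-8 mailbox per line and that the "!" from the original pattern separates a prefix from the mailbox name (the comment doesn't say what it is). search_mailboxes() is a hypothetical helper name. Two things commonly make /u appear not to work. First, preg functions return false instead of matching when the pattern or subject is not valid UTF-8, for example if the text arrives from AppleScript in another encoding. Second, file names on HFS+ volumes are stored in decomposed form. If the mailbox list comes from the file system, "é" may therefore arrive as "e" plus a combining accent and never match a precomposed "é". The sketch validates the input first. Only when PHP's intl extension is installed, it also normalizes both strings to NFC with Normalizer::normalize().

```php
<?php
// Hypothetical helper: search_mailboxes() is an illustrative name, not MsgFiler code.
// Assumes $mboxes holds one mailbox per line, UTF-8 encoded, and that "!"
// separates a prefix from the mailbox name, as in the original pattern.
function search_mailboxes(string $search, string $mboxes): array
{
    // With /u, preg functions return false on invalid UTF-8, so check first.
    // An empty pattern with /u is a cheap UTF-8 validity test.
    if (preg_match('//u', $search) !== 1 || preg_match('//u', $mboxes) !== 1) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }

    // If the intl extension is available, convert decomposed (NFD) text such as
    // "e" + U+0301 into precomposed (NFC) form so both sides compare alike.
    if (class_exists('Normalizer')) {
        $search = Normalizer::normalize($search, Normalizer::FORM_C);
        $mboxes = Normalizer::normalize($mboxes, Normalizer::FORM_C);
    }

    // Escape metacharacters, including the "/" delimiter.
    $quoted = preg_quote($search, '/');

    // Each space in the search term matches any run of characters.
    // Since . does not match a newline, every match stays within one line.
    // i = caseless, m = ^ and $ at line boundaries, u = UTF-8 mode.
    $pattern = '/^.*!.*' . str_replace(' ', '.*', $quoted) . '.*$/imu';

    $out = array();
    if (preg_match_all($pattern, $mboxes, $out) === false) {
        throw new RuntimeException('PCRE error code ' . preg_last_error());
    }
    return $out[0]; // the full matching lines
}

// Made-up sample data for illustration.
$mboxes = "INBOX!Caf\u{00E9}\nArchive!R\u{00E9}sum\u{00E9}s 2007\nINBOX!Travel";
print_r(search_mailboxes("CAF\u{00C9}", $mboxes));
print_r(search_mailboxes("r\u{00E9}sum\u{00E9}s 2007", $mboxes));
```

The capture groups from the original pattern are dropped here because only the full matching lines are used. If the individual pieces are needed, the (.*) groups can be put back without affecting the Unicode handling.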
