Alternate methods of MPhone algorithms implementation

Vitalii Symon, 10 December 2010

Problem statement

The mPhone library was born when we had to parse a database of European phone numbers stored in many different, and sometimes terrible, formats.

[Image 1: Alternate methods of MPhone algorithms implementation]*

In the beginning, on day one, we had eight number formats that looked easy to handle with a small C# method, and that is what we wrote. Writing the code took about 20 minutes, and the method parsed phone numbers easily and quickly… until we got 10 more numbers. And then 28 more, each in a different format. Since the content of a database with thousands of records was hard to predict, we decided to build a simple and efficient parsing mechanism that the user could change on the fly.

Round-trip from simple to complex and back

The first version of the library interpreted expressions in an extremely simple way, where one letter stood for one digit, but we soon realized that this approach would need heaps of translation patterns to process those numbers. So we added:

A[2,4] – A is a group of 2 to 4 digits

_ – optional space, so A_B would match both 00 and 0 0

{ } – optional brackets, so +49{0}2AB would accept both +490200 and +49(0)200

These expressions were then translated to regular expressions, which were applied to the input strings. With these additions we reduced the number of patterns, but the expressions became so complex that even advanced users could not understand them, and working out the expected behaviour sometimes took a while. So we made it… even more complex.
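
To make the translation step concrete, here is a minimal sketch of how such mark-up could be turned into a regular expression. It is written in Python for brevity and is not the library's C# code. The function names to_regex and matches are hypothetical, and the reading of the syntax (an upper-case letter is one digit, braces must wrap the whole optional-bracket group) is an assumption based on the three examples above.

```python
import re

def to_regex(pattern):
    """Translate the extended mark-up into a regular expression (illustrative only)."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "{":
            # {x} means x with or without round brackets: "x" or "(x)".
            # Assumes a closing brace exists; index() raises ValueError otherwise.
            close = pattern.index("}", i)
            inner = to_regex(pattern[i + 1:close])
            out.append("(?:\\(" + inner + "\\)|" + inner + ")")
            i = close + 1
        elif "A" <= ch <= "Z":
            # A letter is one digit unless followed by a [min,max] length range.
            length = re.match(r"\[(\d+),(\d+)\]", pattern[i + 1:])
            if length:
                out.append(r"\d{" + length.group(1) + "," + length.group(2) + "}")
                i += 1 + length.end()
            else:
                out.append(r"\d")
                i += 1
        elif ch == "_":
            out.append(" ?")  # optional single space
            i += 1
        else:
            out.append(re.escape(ch))  # everything else is a literal
            i += 1
    return "".join(out)

def matches(pattern, text):
    """True if the whole input string fits the mark-up pattern."""
    return re.fullmatch(to_regex(pattern), text) is not None
```

Under these assumptions, matches("+49{0}2AB", "+49(0)200") and matches("A_B", "0 0") both hold, while an unbalanced input such as "+49(0200" is rejected.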

To keep the number of patterns small and their behaviour easier to predict, we split the number into 3 parts: country code (optional), area code (mandatory) and phone number (mandatory). Each part had its own list of patterns. E.g. the country code could be 00XX or +XX, and the area code (0) YY, YY or 0YY. Only 8 patterns were enough to parse all the available numbers, but the mark-up was very hard to understand and predict.
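
The three-part idea can be sketched as follows, again in Python and purely as an illustration. The country and area patterns are the ones quoted above. The NUMBER patterns, the optional space between parts and the names part_regex and parse are assumptions made for this example, since the original lists are not given here.

```python
import re
from itertools import product

COUNTRY = ["00XX", "+XX", ""]     # "" stands for "no country code" (the part is optional)
AREA = ["(0) YY", "0YY", "YY"]
NUMBER = ["ZZZZZZ", "ZZZ ZZZ"]    # hypothetical subscriber-number patterns

def part_regex(pattern):
    """Letters stand for one digit each; every other character is a literal."""
    return "".join(r"\d" if c.isalpha() else re.escape(c) for c in pattern)

def parse(text):
    """Try every combination of part patterns and return the first one that fits."""
    for c, a, n in product(COUNTRY, AREA, NUMBER):
        rx = (
            f"(?P<country>{part_regex(c)}) ?"
            f"(?P<area>{part_regex(a)}) ?"
            f"(?P<number>{part_regex(n)})"
        )
        m = re.fullmatch(rx, text)
        if m:
            parts = m.groupdict()
            parts["country"] = parts["country"] or None  # empty match means no country code
            return parts
    return None
```

With these lists, 3 x 3 x 2 combinations are tried from only 7 stored patterns, which shows where the saving comes from. It also hints at why the behaviour is hard to predict: which combination wins depends on the order of all three lists at once.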

Finally we decided to return to the initial model: one letter, one digit. Some may dislike that this solution needs a large number of patterns, but it is easy to understand and use: you can create a new rule in seconds to cover an unrecognized number, with no complex syntax to learn.
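
A minimal sketch of the one letter, one digit model, in Python and under stated assumptions: the rules in RULES are made up for illustration, and match_rule and parse are hypothetical names, not the library's API. Each letter matches exactly one digit and every other character must match itself.

```python
RULES = [                 # hypothetical rules, one per known input format
    "+AB CD EFGHIJ",
    "+AB(0)CD EFGHIJ",
    "00AB CD EFGHIJ",
]

def match_rule(rule, text):
    """Return the digits bound to the rule's letters, or None if text does not fit."""
    if len(rule) != len(text):
        return None
    digits = []
    for r, c in zip(rule, text):
        if r.isalpha():
            if c not in "0123456789":
                return None
            digits.append(c)   # a letter captures exactly one digit
        elif r != c:
            return None        # literals (+, spaces, brackets, fixed digits) must match exactly
    return "".join(digits)

def parse(text, rules=RULES):
    """Return (rule, captured digits) for the first rule that fits, else None."""
    for rule in rules:
        digits = match_rule(rule, text)
        if digits is not None:
            return rule, digits
    return None
```

Covering a new format is then a matter of adding one string: parse("030/123456") returns None with the rules above, and succeeds once "0CD/EFGHIJ" is appended to the list.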

*Image 1 from: http://www.bbc.co.uk/blogs/bryanburnett/2008/10/