Valid Username Regular Expression

    Warning about [aA-zZ]

    I was wondering where the unusual-looking syntax in this character class came from, and why it isn't documented anywhere (for Java or any other language). It turns out that it is not the same thing as [a-zA-Z]! Instead, it is parsed literally, meaning "the character 'a', plus everything from 'A' to 'z' (inclusive), plus the character 'Z'". An ASCII table (for example, the one on the Wikipedia ASCII page) shows what goes wrong:

    Char| Dec | Hex
    ---------------
     A  | 65  | 41 
     B  | 66  | 42 
    ... |     | 
     Y  | 89  | 59 
     Z  | 90  | 5A 
     [  | 91  | 5B // Left bracket
     \  | 92  | 5C // Backslash
     ]  | 93  | 5D // Right bracket
     ^  | 94  | 5E // Caret 
     _  | 95  | 5F // Underscore 
     `  | 96  | 60 // Backtick 
     a  | 97  | 61 
     b  | 98  | 62 
    ... |     | 
     y  | 121 | 79 
     z  | 122 | 7A 
    

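    Six symbol characters (codes 91 to 96, sitting between 'Z' and 'a') are unintentionally included! The trick seems to work because the class does contain all 52 letters (lower and upper case) of the Latin alphabet, so every valid letter is still accepted. You can also see that the 'a' and the 'Z' are silently redundant, since both already fall inside the range A-z; thus [aA-zZ] is identical to [A-z].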
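
    To check this rather than take the table on faith, here is a minimal sketch in Java. The class name CharClassCheck and the helper extraMatches are made up for this illustration (they are not part of the shared solution); the helper simply tries every ASCII character against two character classes and collects the ones that only the first class accepts.

```java
import java.util.regex.Pattern;

public class CharClassCheck {
    // Hypothetical helper: returns every ASCII character (codes 0-127)
    // that matches the class `wide` but does not match the class `strict`.
    static String extraMatches(String wide, String strict) {
        Pattern w = Pattern.compile(wide);
        Pattern s = Pattern.compile(strict);
        StringBuilder extras = new StringBuilder();
        for (char c = 0; c < 128; c++) {
            String ch = String.valueOf(c);
            if (w.matcher(ch).matches() && !s.matcher(ch).matches()) {
                extras.append(c);
            }
        }
        return extras.toString();
    }

    public static void main(String[] args) {
        // Characters accepted by [aA-zZ] but not by [a-zA-Z]
        System.out.println(extraMatches("[aA-zZ]", "[a-zA-Z]")); // prints [\]^_`
        // No character separates [aA-zZ] from [A-z]
        System.out.println(extraMatches("[aA-zZ]", "[A-z]").isEmpty()); // prints true
    }
}
```

    The first line of output lists exactly the six symbols from the table, and the second confirms that the extra 'a' and 'Z' change nothing.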

    But yes, this can easily lead to faulty regular expressions, because a pattern built on [aA-zZ] accepts those six symbols wherever a letter was intended. So I wanted to post this caution for everyone who is enjoying the (otherwise very nicely documented) pattern that sinithwar has shared.