
Key: BOO-633
Type: Bug
Status: Resolved
Resolution: Fixed
Priority: Minor
Assignee: Doug H
Reporter: Hans-Christian Holm
Votes: 0
Watchers: 0
Boo

Allow Unicode identifiers

Created: 06/Dec/05 07:47 AM   Updated: 08/Feb/06 07:31 PM
Component/s: Parser
Affects Version/s: 0.7.5
Fix Version/s: 0.7.6

Time Tracking:
Not Specified

File Attachments: 1. Babel.cs (2 kb)
2. output.txt (3 kb)



Description
Boo should allow non-ASCII identifiers, in order to comply with the CLS specification (http://msdn.microsoft.com/library/default.asp?url=/library/en-us/cpguide/html/cpconWhatIsCommonLanguageSpecification.asp), which uses the identifier format specified in Unicode Technical Report #15, Annex 7 (http://www.unicode.org/unicode/reports/tr15/tr15-18.html).

The attached C# class (Babel.cs) compiles and runs fine, but it can't be used in Boo code (or ported to Boo). The output.txt file is the output of a client calling each public method in turn.

I guess most programmers avoid non-ASCII identifiers, but that decision should be made by the programmer, not the programming language. I sometimes use the special Norwegian letters (ÆØÅæøå), especially when working with databases, where they are not uncommon.



Comments
Doug H - 06/Dec/05 09:19 AM
Try changing the ID_LETTER rule at the bottom of boo.g and wsaboo.g to this and see how it works for you:

protected
ID_LETTER : ('_' | 'a'..'z' | 'A'..'Z' | {System.Char.IsLetter(LA(1))}? '\u0080'..'\uFFFE');

Your Boo scripts may need to be saved in a Unicode encoding such as UTF-8.
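
As a rough illustration of what the suggested ID_LETTER rule accepts, here is a minimal Python sketch (not Boo or ANTLR code). The function name is_id_letter is hypothetical, and the Unicode general categories Lu, Ll, Lt, Lm and Lo are used as a stand-in for System.Char.IsLetter, so results may differ slightly depending on the Unicode version in use.

```python
import unicodedata


def is_id_letter(ch: str) -> bool:
    """Approximate the suggested ID_LETTER rule for one character.

    Hypothetical helper for illustration only; it is not part of Boo.
    """
    if len(ch) != 1:
        return False
    # ASCII branch of the rule: '_' | 'a'..'z' | 'A'..'Z'
    if ch == "_" or "a" <= ch <= "z" or "A" <= ch <= "Z":
        return True
    # Non-ASCII branch: '\u0080'..'\uFFFE', guarded by a letter check.
    # Assumption: general categories starting with "L" (Lu, Ll, Lt, Lm, Lo)
    # approximate what System.Char.IsLetter reports.
    if "\u0080" <= ch <= "\uFFFE":
        return unicodedata.category(ch).startswith("L")
    return False


def starts_with_id_letter(name: str) -> bool:
    """Check only the first character, which is all ID_LETTER governs here."""
    return bool(name) and is_id_letter(name[0])
```

With this check, the Norwegian letters from the description (ÆØÅæøå) pass, while digits and symbols such as × are rejected.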


Doug H - 12/Dec/05 10:09 AM
It adds about 1000 lines to the lexer, but the C# grammar in the file sharing section at http://www.antlr.org/ shows a way to handle Unicode identifiers that follows the C# standard more closely. It even allows escaped Unicode characters in identifiers, such as \u0000 or \U00000000. The grammar is BSD-licensed.
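
To show what escaped characters in identifiers amount to, here is a small Python sketch that expands \uXXXX and \UXXXXXXXX sequences into the characters they name. It is not taken from the ANTLR C# grammar; the function name decode_identifier_escapes and the regular expression are assumptions made for illustration only.

```python
import re

# \u followed by exactly 4 hex digits, or \U followed by exactly 8 hex digits.
_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})")


def decode_identifier_escapes(text: str) -> str:
    """Replace \\uXXXX and \\UXXXXXXXX escapes with the characters they encode.

    Hypothetical helper for illustration; a real lexer would also need to
    check that each decoded character is valid in an identifier.
    """

    def replace(match: re.Match) -> str:
        hex_digits = match.group(1) or match.group(2)
        # chr() raises ValueError for values above U+10FFFF.
        return chr(int(hex_digits, 16))

    return _ESCAPE.sub(replace, text)
```

For example, the escaped spelling bl\u00E5b\u00E6r decodes to blåbær, so both spellings would name the same identifier.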