Showing posts with label unicode. Show all posts

Tuesday, January 24, 2012

Umlauts, pound signs and more!

Until I learnt this tip, whenever I needed some fancy character (£, °, á, ß, etc.) I went hunting for somewhere to copy and paste it from. Not any more!

I assigned my previously useless Windows key to be my Compose key. To type the above pound sign I pressed the Windows key, then the L key, then the minus key. Scharfes-S (ß) is simply the Windows key, then the S key twice. To type á (a with an acute accent) you press the Windows key, then ' (shift-7), then a.

(Note: until I wrote this post I thought you had to hold the Compose key down while pressing the other keys. That works too, but it is easier to treat it as a modal switch. In other words, you press the Compose key to enter Compose Mode, then the next two characters you type are interpreted together. And if the two characters you type have no special meaning then nothing happens. Either way you exit Compose Mode after your two keypresses.)
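
The modal behaviour described in that note can be sketched as a tiny state machine. This is only an illustration of the idea, not how the X server actually implements it: the `COMPOSE` marker, the `type_keys` function and the three-entry table are made up for the example (the three sequences are the ones used in this post), and real compose tables also contain sequences longer than two keys.

```python
# Hypothetical marker standing in for a press of the Compose key.
COMPOSE = "<Compose>"

# Just the three sequences mentioned in this post; the real table is far larger.
COMPOSE_TABLE = {
    ("L", "-"): "£",
    ("s", "s"): "ß",
    ("'", "a"): "á",
}


def type_keys(keys):
    """Turn a list of key presses into the text that would be typed."""
    out = []
    pending = None  # None = normal mode; a list = keys collected in Compose Mode
    for key in keys:
        if pending is None:
            if key == COMPOSE:
                pending = []  # enter Compose Mode
            else:
                out.append(key)
            continue
        pending.append(key)
        if len(pending) == 2:
            # Unknown pairs produce nothing; either way Compose Mode ends here.
            out.append(COMPOSE_TABLE.get(tuple(pending), ""))
            pending = None
    return "".join(out)


print(type_keys([COMPOSE, "L", "-", "5"]))
print(type_keys(["S", "t", "r", "a", COMPOSE, "s", "s", "e"]))
```

Note that the Compose key is pressed and released, not held, which matches the "modal switch" way of thinking about it.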

See here for the full list of all the compose options

Naturally the above is for Linux. This page answers how to do it on Windows, and here is the same concept on a Mac.



Bonus Tip. Ever wanted to write a Japanese post office sign ( 〒 ) ? Enter your Japanese IME and type yuubin (ゆうびん) (then press space). The ~ symbol can be done by typing kara (から).


˙unɟ ɹoɟ ʇsnɾ ˙uʍop ǝpısdn ʇxǝʇ ɹnoʎ suɹnʇ ʇɐɥʇ ǝʇıs ɐ puıɟ :dıʇ snuoq ɹǝɥʇouɐ
(Yes, this is part of Unicode too.)

Monday, June 2, 2008

Bidirectional Unicode codes and Arabic

I have been importing Arabic data into MLSN (http://dcook.org/mlsn/) (incidentally, almost all data currently comes from the AWN project). We have a CSV list of synonyms, and each synonym optionally has the Arabic root in square brackets. The whole list is an SQL string, surrounded by single quotes.

When I view it in SciTE it looks exactly as I'd expect: the Arabic is right-to-left, the square brackets are part of the flow, and then at the apostrophe we're back to left-to-right. (By the way, it looks the same in vi as in SciTE.)
But in gedit, Firefox and OpenOffice Writer, when the last synonym has a square bracket it gets jumbled up.

Here is how it looks in SciTE:


Here is how it looks in gedit:


So, I tried adding the Unicode RLE character (U+202B, right-to-left embedding) before the opening square bracket and before the closing square bracket. (Incidentally, in the tab-separated file this fixed the problem in all editors.)
In SciTE there was no change, which is good.
In gedit et al. the square brackets are now correctly in the flow, but the Arabic string has been moved to the end of the line and the following SQL clauses are now right-to-left.
Putting a Unicode LRE (U+202A, left-to-right embedding) before the following comma did not help! (More precisely, it moved the ",'awn');" part back into the left-to-right flow, but still left the Arabic string stranded on the right; but anyway the LRE on the comma causes MySQL problems, see below.)
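
For anyone wanting to repeat the experiment, here is a minimal sketch of the insertion step. The function names are made up for this example. The last function uses PDF (U+202C, pop directional formatting), which is the character Unicode defines for ending an embedding; I did not try it in the experiments above, so treat it as an untested alternative rather than a known fix.

```python
RLE = "\u202b"  # RIGHT-TO-LEFT EMBEDDING
LRE = "\u202a"  # LEFT-TO-RIGHT EMBEDDING
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING (ends an RLE/LRE embedding)


def add_rle_before_brackets(text):
    """Insert RLE before every '[' and ']', as described in the post."""
    return text.replace("[", RLE + "[").replace("]", RLE + "]")


def strip_embedding_codes(text):
    """Remove RLE/LRE/PDF again, e.g. to get back to the original data."""
    for code in (RLE, LRE, PDF):
        text = text.replace(code, "")
    return text


def embed_rtl(text):
    """Untested alternative: wrap a whole run in a single RLE ... PDF pair."""
    return RLE + text + PDF


marked = add_rle_before_brackets("synonym [root]")
print([hex(ord(ch)) for ch in marked if ch in (RLE, LRE, PDF)])
```

Because these characters are invisible, printing their code points (as the last line does) is an easy way to check where they ended up.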

Is this a bug in all of gedit, Firefox and OpenOffice? Could it be a Linux or GNOME bug that all those applications happen to share, while SciTE and vi do not? Or am I doing something wrong? Any advice gratefully received!

Here is how it looks in gedit with the explicit RLE codes:



MySQL and Bidirectional Codes

It seems MySQL (after doing "SET NAMES UTF8;" of course) can cope with RLE/LRE inside a quoted string, but they cannot appear in the SQL statement itself, outside the quotes (e.g. next to a comma).
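
Since the codes are invisible, a stray one outside the quotes is hard to spot by eye. Below is a rough sketch of a checker that reports their positions. The function name, the set of characters and the table name in the usage lines are my own choices for illustration, and the quote handling is deliberately simple (single-quoted strings with backslash or doubled-quote escapes), not a real SQL parser.

```python
# Embedding, override and mark characters from the Unicode bidi controls.
BIDI_CONTROLS = {
    "\u202a", "\u202b", "\u202c", "\u202d", "\u202e",  # LRE, RLE, PDF, LRO, RLO
    "\u200e", "\u200f",                                # LRM, RLM
}


def bidi_controls_outside_quotes(sql):
    """Return the indexes of bidi control characters not inside '...' strings."""
    positions = []
    in_quote = False
    i = 0
    while i < len(sql):
        ch = sql[i]
        if in_quote and ch == "\\":
            i += 2  # skip the escaped character, e.g. \'
            continue
        if ch == "'":
            # A doubled '' inside a string toggles twice, so we stay inside it.
            in_quote = not in_quote
        elif ch in BIDI_CONTROLS and not in_quote:
            positions.append(i)
        i += 1
    return positions


ok = "INSERT INTO t VALUES('syn \u202b[root\u202b]','awn');"
bad = "INSERT INTO t VALUES('syn [root]'\u202a,'awn');"
print(bidi_controls_outside_quotes(ok))
print(bidi_controls_outside_quotes(bad))
```

An empty list means every control character sits inside a quoted string, which is the case MySQL accepted for me.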

Firefox, IE6 and Bidirectional Codes

Without the RLE codes Firefox messed up showing the square brackets in both normal display and in an edit form. With the RLE codes, just before each opening and each closing square bracket, it shows them correctly in normal display, but still gets it wrong in the HTML form input box.
(As an aside, IE6 on Windows XP is the opposite of Firefox! The main table is (very) wrong but the edit box is correct!) (And as an aside to my aside, if there is one thing Windows does well it is i18n fonts: the Arabic looks beautiful.)

phpMyAdmin has a textarea that shows it correctly (with Firefox). They use an explicit dir="ltr" (?!). Using that did not work for me.

So, my solution was to dynamically set dir to "rtl" when the edit form is being used for Arabic, and to "ltr" the rest of the time. And (for the IE users) I also set dir="rtl" on the Arabic cells, and dir="ltr" on other cells, in the main table. There is no Arabic UI currently, but when there is, dir="rtl" will be set globally via a style sheet (which is why I set the default dir="ltr" explicitly on data cells for non-Arabic languages).
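
The idea boils down to choosing the dir attribute from the language of the data. The sketch below shows that choice in Python purely as an illustration; it is not the MLSN code, and the function names, the "ar"/"en" language codes and the "synonyms" field name are all assumptions made for the example.

```python
from html import escape

# Hypothetical language codes; only Arabic is right-to-left in this example.
RTL_LANGUAGES = {"ar"}


def direction_for(lang):
    """Return the value for the HTML dir attribute for a data language."""
    return "rtl" if lang in RTL_LANGUAGES else "ltr"


def render_cell(text, lang):
    """A main-table cell: dir is always explicit, even for left-to-right data."""
    return '<td dir="{}">{}</td>'.format(direction_for(lang), escape(text))


def render_edit_box(text, lang):
    """The edit form's input box, with dir set from the language being edited."""
    return '<input type="text" name="synonyms" dir="{}" value="{}">'.format(
        direction_for(lang), escape(text, quote=True)
    )


print(render_cell("car [auto]", "en"))
print(render_edit_box("car [auto]", "en"))
```

Setting dir="ltr" explicitly on the non-Arabic cells means they keep working if a page-wide "rtl" default is added later.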

See it in action by doing a search on MLSN. Here is one example: http://two.dcook.org/software/mlsn/main.php?c=06car0
(Mouse over the table cells; clicking a cell will then make the edit form active. Compare Arabic with the other languages.)

I am open to suggestions for alternative solutions, but I believe this is the "proper" standards-compliant, cross-browser solution.