Post by Vincent van Ravesteijn
Post by Jean-Marc Lasgouttes
OK, I have an idea for having correct selections without losing the
Color_selectiontext enum: we could draw the complete string as
selected and non-selected, but use clipping to make sure that only the
right part of the selection is visible. It will be a bit tricky, but
it is doable.
In LyX 2.0.7, coloring parts of Arabic strings works OK, so I'm not sure
why there is a problem here now. OK, ligatures whose parts should be
colored differently are a bit difficult. My feeling is
that it is OK to split the ligatures in this exceptional case. The
contextual forms in Arabic, though, are not ligatures and can be painted
in different colors without problems.
Actually, in master, the composition of characters is also done by looking
forward and therefore by using characters beyond the ones we are interested
in. However, this is all hardcoded stuff, and I would like instead to rely
on whatever information Qt can give me.
Have you had a look at QFontMetrics::width(QString const & str, int n =
-1)? This function interprets the whole string str, and computes the width
of the string up to the nth character. This gives you the correct positions
for the Arabic contextual forms.
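
To make the suggestion concrete, here is a minimal sketch of how two prefix widths could bound a selection. It does not call Qt: PrefixWidth is a stand-in for a call such as QFontMetrics::width(str, n), and selectionExtent is a hypothetical helper, not something that exists in LyX. The offsets are measured from the edge where the string starts, so for right-to-left text the caller would still have to mirror them.

```cpp
#include <functional>
#include <string>
#include <utility>

// Stand-in for a call such as QFontMetrics::width(str, n): it is expected to
// shape the whole string and return the width of its first n characters.
using PrefixWidth = std::function<int(std::u32string const &, int)>;

// Hypothetical helper: extent of the selection [from, to), expressed as two
// offsets from the edge where the string starts. Both offsets are taken from
// the complete string, so contextual forms are shaped the same way for the
// selected and the non-selected part.
std::pair<int, int> selectionExtent(std::u32string const & str,
                                    int from, int to,
                                    PrefixWidth const & prefixWidth)
{
    int const x1 = prefixWidth(str, from);
    int const x2 = prefixWidth(str, to);
    return {x1, x2};
}
```

The two offsets could then serve as the clip range when the string is drawn a second time in the selection color.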
Post by Vincent van Ravesteijn
Post by Jean-Marc Lasgouttes
The bigger problem will be cursor positioning, but I need more
information from people who understand Arabic writing to progress on
What is the difference between a ligature and a contextual form?
According to: http://en.wikipedia.org/wiki/Arabic_alphabet#Ligatures, there
is only one compulsory ligature (having two forms, see later), and that's
the one I showed in a previous mail.
Contextual forms mean that, in general, each character has four
different presentation forms. These are the Unicode code points in the
arabic_table in src/Encoding.cpp. The code points are located in the
"Arabic Presentation Forms-B" Unicode block.
In the four columns you can see the:
- isolated form;
- end form, when the character is only connected to the character in front;
- initial form, when the character is only connected to the character
behind;
- mid form, when the character is connected to both the character in front
and the character behind.
(I hope you can see the figures.)
An example of ha (0x0647):
Ha has four different representation forms.
Here an example of meem (0x0645):
Although the character looks pretty much the same in the first/second and
third/fourth forms, they are different forms and therefore have different
Unicode code points.
The case is different when considering, for example, waw (0x0648):
This character can only be connected to the character before, but never to
the character behind. This means that the first and third forms have the
same Unicode code point, as do the second and fourth forms. The reader can
confirm in the arabic_table that this is indeed the case.
See also: QChar::joining().
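
The four-column lookup described above can be sketched as follows. This is only an illustration, not the actual arabic_table from src/Encoding.cpp: it has three rows for the characters discussed here, the code points are the ones from Arabic Presentation Forms-B, and the names ArabicForms, kForms and presentationForm are hypothetical.

```cpp
// Column order follows the list above: isolated, end, initial, mid.
struct ArabicForms {
    char32_t base;
    char32_t forms[4];
};

// Illustrative subset only; the code points come from the
// "Arabic Presentation Forms-B" block.
constexpr ArabicForms kForms[] = {
    {0x0645, {0xfee1, 0xfee2, 0xfee3, 0xfee4}}, // meem
    {0x0647, {0xfee9, 0xfeea, 0xfeeb, 0xfeec}}, // ha
    {0x0648, {0xfeed, 0xfeee, 0xfeed, 0xfeee}}, // waw: never joins behind
};

// Hypothetical helper: pick the presentation form of c, given whether it is
// connected to the character in front and to the character behind.
char32_t presentationForm(char32_t c, bool joinsFront, bool joinsBehind)
{
    int column = 0;                      // isolated
    if (joinsFront && joinsBehind)
        column = 3;                      // mid
    else if (joinsBehind)
        column = 2;                      // initial
    else if (joinsFront)
        column = 1;                      // end
    for (auto const & row : kForms)
        if (row.base == c)
            return row.forms[column];
    return c;                            // not in the table: leave unchanged
}
```

Because the waw row repeats its isolated and end forms in the third and fourth columns, asking for its initial or mid form simply falls back to the forms that exist.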
Are there compose characters in Arabic that do not really have their own
width (like accents in Latin scripts), but decorate another character?
The most important compose chars are the "accents" that indicate the vowel
sounds, plus a few more. The range as defined by Encodings::arabicComposeChar
follows exactly what is defined by ISO-8859-6.
I think Qt recognizes these chars by QChar::category() ==
QChar::Mark_NonSpacing.
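
As a rough sketch of what such a range check could look like: ISO-8859-6 places the vowel marks at bytes 0xEB to 0xF2, which correspond to U+064B (fathatan) through U+0652 (sukun). The function names below are hypothetical, and the range is my reading of ISO-8859-6, not a copy of the LyX code.

```cpp
#include <cstddef>
#include <string>

// Assumption: the compose range is U+064B..U+0652, i.e. the marks that
// ISO-8859-6 encodes at 0xEB..0xF2.
bool isComposeMark(char32_t c)
{
    return c >= 0x064b && c <= 0x0652;
}

// Count the characters that take up their own width, skipping the marks
// that only decorate the preceding character.
std::size_t countSpacingChars(std::u32string const & s)
{
    std::size_t n = 0;
    for (char32_t c : s)
        if (!isComposeMark(c))
            ++n;
    return n;
}
```

For meem followed by a fatha and then ha, countSpacingChars reports two characters, since the fatha sits on the meem and has no width of its own.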
I want to get some feeling for how this works. If you have a web page for
newbies describing these features, this would be perfect. Also, which
program is supposed to have a sound implementation of these languages in
terms of behavior? Word? LibreOffice?
I used Wordpad during my learning, but don't know whether it is sound
enough.
I am not sure when I will have time to continue, but I want to understand
all these things.
And first I will probably try to implement your idea of using Qt to place
the cursor.
Ah that answers my first question.
JMarc
Vincent