trinity-devel@lists.pearsoncomputing.net

Month: June 2016

Re: [trinity-devel] Re: Re: My_Documents mangled UTF8

From: Fat-Zer <fatzer2@...>
Date: Wed, 22 Jun 2016 17:26:14 +0300
2016-06-22 1:06 GMT+03:00 deloptes <deloptes@...>:
> Fat-Zer wrote:
>
>> 2016-06-21 9:31 GMT+03:00 deloptes
>> <deloptes@...>:
>>> Fat-Zer wrote:
>>>
>>>> 2016-06-21 1:38 GMT+03:00 Slávek Banko
>>>> <slavek.banko@...>:
>>>>> On Tuesday 21 of June 2016 00:24:56 Fat-Zer wrote:
>>>>>
>>>>> As far as I know, this was fixed sometime in 2011...
>>>>>
>>>>
>>>> Seems somebody missed a spot ;)
>>>>
>>>> Best results I've got:
>>>>
>>>> cat My_Documents | grep '\[b[egn]\]' | iconv -t cp1252 -c
>>>> Name[be]=Т�чка дл� дакументаў
>>>> Name[bg]=Директори� � документи
>>>> Name[bn]=ডক��মেন��ট ফোল��ডার
>>>>
>>>> So they should all be removed or fixed by native speakers (I suppose
>>>> we have a Bulgarian one here).
>>>> For Belarusian I suspect it should be "Точка для дакументаў" (not
>>>> 100% positive)...
>>>> No damn clue how it should look in Bengali...
>>>
>>> Haha, thanks, yes. This is exactly what I mean. In my experience,
>>> after correcting it, it looks fine until it gets reloaded. Something I
>>> noticed about KSaveFile: it does not set the encoding on the stream.
>>> I think the original files should be OK, and when they are read for the
>>> first time they also look OK, but after that they get mangled.
>>> In my case it does not look like latin1 but UTF-8 mangling. This might be
>>> because you ran it through iconv -t cp1252.
>>>
>>> cat /opt/trinity/share/apps/kdesktop/Desktop/Printers | grep '\[b[egn]\]'
>>> Name[be]=Друкаркі
>>> Name[bg]=Принтери
>>> Name[bn]=মদরণ
>>>
>>>
>>> Anyway, thank you for looking into it. As I wrote before, the workaround
>>> for me was to remove the write permissions on the file after changing it.
>>>
>>> regards
>>>
>>
>> Are you positive that the fixes revert on login? That sounds nearly
>> impossible... You most likely just reinstalled the kdesktop
>> package.
>> And what's the correct Bulgarian variant, if you can provide one?
>
> I updated the My_Documents file and saved it. It appeared correct on the
> screen, but after some time or a login (I don't recall exactly) it appeared
> mangled. I checked the file and the mangled text was there, definitely UTF-8.
> I observed the same with KSaveFile, as stated above.
>
> The correct one is
> مستنداتي
> Name[bg]=Документи
>
> Oh, I just noticed this in git:
> tdebase/kdesktop/init/My_Documents
>
> [Desktop Entry]
> Encoding=UTF-8
> Icon=folder_wordprocessing
> Name=My Documents
> Name[af]=Dokument Gids
> Name[ar]=مستنداتي
> Name[be]=ТÑчка Ð´Ð»Ñ Ð´Ð°ÐºÑƒÐ¼ÐµÐ½Ñ‚Ð°Ñž
> Name[bg]=Ð”Ð¸Ñ€ÐµÐºÑ‚Ð¾Ñ€Ð¸Ñ Ñ Ð´Ð¾ÐºÑƒÐ¼ÐµÐ½Ñ‚Ð¸
> Name[bn]=ডকà§à¦®à§‡à¦¨à§à¦Ÿ ফোলà§à¦¡à¦¾à¦°
>
> So it's broken in git. Perhaps one could find the original file and it is
> correct there. What a mess; I always hated this encoding stuff. Every time
> you have to write some code dealing with text... it was such a pain... it
> will take another 10-20 years before it's gone.
>
> regards
>

OK, I've found a way to demangle those locales completely. The reason
iconv failed on some characters is that the encoding is a mix of cp1252
and latin1 (cp1252 has non-letter symbols in place of some of latin1's
control characters in the 0x80-0x9F range).
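A minimal sketch of that idea in Python (an illustration, not the attached patch): map each mangled character back to the single byte it came from, trying cp1252 first and falling back to latin1 for the five bytes cp1252 leaves undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D), then decode the recovered bytes as UTF-8. The helper names demangle() and fix_entry() are hypothetical.

```python
def demangle(text: str) -> str:
    """Reverse UTF-8 text that was mis-decoded byte by byte as cp1252/latin1.

    Raises UnicodeError if `text` does not look like that kind of mojibake.
    """
    raw = bytearray()
    for ch in text:
        try:
            # Most characters map back through cp1252, including symbols like
            # '‚' (0x82) or '”' (0x94) that sit where latin1 has control chars.
            raw += ch.encode("cp1252")
        except UnicodeEncodeError:
            # cp1252 leaves 0x81, 0x8D, 0x8F, 0x90 and 0x9D undefined; those
            # bytes survive as latin1 C1 control characters instead.
            # Characters above U+00FF raise here: text was never mangled so.
            raw += ch.encode("latin-1")
    # The recovered bytes should be the original UTF-8.
    return raw.decode("utf-8")


def fix_entry(line: str) -> str:
    """Demangle the value of a Key=Value desktop-file line, if possible."""
    key, sep, value = line.partition("=")
    if not sep:
        return line  # e.g. the [Desktop Entry] group header
    try:
        return key + sep + demangle(value)
    except UnicodeError:
        return line  # already valid text (e.g. the Arabic entry): keep as-is


if __name__ == "__main__":
    # Reproduce the damage: UTF-8 bytes read back as cp1252.
    mangled = "Name[bg]=" + "Документи".encode("utf-8").decode("cp1252")
    print(fix_entry(mangled))               # Name[bg]=Документи
    print(fix_entry("Name[ar]=مستنداتي"))   # unchanged
```

Correctly encoded entries like Name[ar] cannot be squeezed into single bytes, so they are left alone. A correct entry made only of latin1-range characters could, in rare cases, also form valid UTF-8 once re-encoded, so the output still deserves a look from native speakers.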

Here is the patch.
