help-gnu-emacs

Re: Polish characters in emacs


From: Peter Dyballa
Subject: Re: Polish characters in emacs
Date: Fri, 12 Oct 2007 12:20:22 +0200


On 11.10.2007 at 23:09, Wojtek wrote:

Could someone point me to an explanation of settings so that Polish
characters are displayed correctly in an emacs buffer and whether this
has to do with the environment outside of emacs.

There are three 8 bit ISO Latin encodings that support Polish: ISO 8859-2, ISO 8859-13, and ISO 8859-16, plus the 8 bit MS encoding Code Page 1250, and finally UTF-8. They all have ÓóĄąĆćĘęŁłŃńŚśŹźŻż.
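The claim about these five encodings can be checked outside of GNU Emacs. What follows is only a minimal sketch in Python, assuming its standard codec names (iso8859_2, iso8859_13, iso8859_16, cp1250, utf_8); the helper covers_polish is a made-up name for this illustration. ISO 8859-1 is included as a counter-example.

```python
# The 18 Polish letters that go beyond plain ASCII.
POLISH = "ÓóĄąĆćĘęŁłŃńŚśŹźŻż"

# Python's names for the encodings mentioned above.
CODECS = ["iso8859_2", "iso8859_13", "iso8859_16", "cp1250", "utf_8"]


def covers_polish(codec):
    """Return True if every Polish letter can be encoded with CODEC."""
    try:
        POLISH.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


coverage = {codec: covers_polish(codec) for codec in CODECS}
latin1_ok = covers_polish("iso8859_1")  # Latin 1 lacks e.g. A WITH OGONEK

for codec, ok in coverage.items():
    print(f"{codec:12} {'ok' if ok else 'missing letters'}")
print(f"{'iso8859_1':12} {'ok' if latin1_ok else 'missing letters'}")
```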

When I open up a connection to my account and run emacs from Fedora 7
the characters do not show up when viewing a mail message encoded as
utf-8.

*How* does this happen? Do you only see empty boxes? Then you're using a font that does not have the Polish characters. Change, for example, to Lucida Sans Typewriter from the Java SDK!

However I can toggle the input method to polish-slash and
enter polish characters and they do show up.

This can be something completely different. (I never use an "input method," at least not consciously.) And it proves nothing, except that this GNU Emacs can display the chosen characters properly. So there might be some misunderstanding coming from the eMail client used to retrieve the input data for that buffer.

When connecting to the same account from a Windows machine using Cygwin-X, the characters in
the mail message show up without a problem.

Ahh! So you are writing the whole time about eMails and their textual presentation? Which eMail client do you use to read the eMails? Can you make some of the header lines of the eMails appear in your eMail client, particularly those that describe the way the message was encoded for the transport through the Internet? The eMail client can have its own ideas of representing an eMail's contents ...
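The header lines in question are usually Content-Type (with its charset parameter) and Content-Transfer-Encoding. As a rough sketch of what to look for, here is how Python's standard email package reads them from a raw message. The message text below is invented for this illustration and is not taken from the original mail.

```python
from email import message_from_string

# A made-up message; only the two MIME header lines matter here.
RAW_MESSAGE = """\
From: sender@example.org
Subject: test
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 8bit

Zażółć gęślą jaźń
"""

msg = message_from_string(RAW_MESSAGE)

# The charset the sender declared for the body text.
charset = msg.get_content_charset()
# How the body was packed for transport (7bit, 8bit, quoted-printable, base64).
transfer_encoding = msg["Content-Transfer-Encoding"]

print("charset:", charset)
print("transfer encoding:", transfer_encoding)
```

If the declared charset does not match the bytes that actually arrived, or the client ignores the declaration, the buffer shows the wrong characters even though the font is fine.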

Since the emacs I am running is starting up with the same parameters, what controls the display of characters?

It's definitely the encoding used in the buffer. It's indicated at the beginning of the mode-line (left-most characters).

        -*: for MS CP1250 or CP1252
        -2: for ISO 8859-2   (Latin  2)
        -l: for ISO 8859-13  (Latin  7)
        -r: for ISO 8859-16  (Latin 10)
        -u: for UTF-8

BTW, if you click on that character with the mouse, a *Help* buffer opens with an explanation.


A good method to check where the error can come from is to use a "neutral" simple and pure text file like this one:

;;; -*- mode: Text; coding: iso-8859-2; -*-
;
;       Time-stamp: <2005-05-11 23:52:49 pete>
;
;   Central and Eastern European Glyphs (Latin 2)
;
;   oct   dec   hex    UCS2    UTF-8
;=====================================
  = 240 = 160 = A0 = U+00A0 =    C2 A0 : NO-BREAK SPACE
Ą = 241 = 161 = A1 = U+0104 = C4 84 : LATIN CAPITAL LETTER A WITH OGONEK
˘ = 242 = 162 = A2 = U+02D8 =    CB 98 : BREVE
Ł = 243 = 163 = A3 = U+0141 = C5 81 : LATIN CAPITAL LETTER L WITH STROKE
¤ = 244 = 164 = A4 = U+00A4 =    C2 A4 : CURRENCY SIGN
Ľ = 245 = 165 = A5 = U+013D = C4 BD : LATIN CAPITAL LETTER L WITH CARON
Ś = 246 = 166 = A6 = U+015A = C5 9A : LATIN CAPITAL LETTER S WITH ACUTE
§ = 247 = 167 = A7 = U+00A7 =    C2 A7 : SECTION SIGN
¨ = 250 = 168 = A8 = U+00A8 =    C2 A8 : DIAERESIS
Š = 251 = 169 = A9 = U+0160 = C5 A0 : LATIN CAPITAL LETTER S WITH CARON
Ş = 252 = 170 = AA = U+015E = C5 9E : LATIN CAPITAL LETTER S WITH CEDILLA
Ť = 253 = 171 = AB = U+0164 = C5 A4 : LATIN CAPITAL LETTER T WITH CARON
Ź = 254 = 172 = AC = U+0179 = C5 B9 : LATIN CAPITAL LETTER Z WITH ACUTE
- = 255 = 173 = AD = U+00AD =    C2 AD : SOFT HYPHEN
Ž = 256 = 174 = AE = U+017D = C5 BD : LATIN CAPITAL LETTER Z WITH CARON
Ż = 257 = 175 = AF = U+017B = C5 BB : LATIN CAPITAL LETTER Z WITH DOT ABOVE
° = 260 = 176 = B0 = U+00B0 =    C2 B0 : DEGREE SIGN
ą = 261 = 177 = B1 = U+0105 = C4 85 : LATIN SMALL LETTER A WITH OGONEK
˛ = 262 = 178 = B2 = U+02DB =    CB 9B : OGONEK
ł = 263 = 179 = B3 = U+0142 = C5 82 : LATIN SMALL LETTER L WITH STROKE
´ = 264 = 180 = B4 = U+00B4 =    C2 B4 : ACUTE ACCENT
ľ = 265 = 181 = B5 = U+013E = C4 BE : LATIN SMALL LETTER L WITH CARON
ś = 266 = 182 = B6 = U+015B = C5 9B : LATIN SMALL LETTER S WITH ACUTE
ˇ = 267 = 183 = B7 = U+02C7 =    CB 87 : CARON
¸ = 270 = 184 = B8 = U+00B8 =    C2 B8 : CEDILLA
š = 271 = 185 = B9 = U+0161 = C5 A1 : LATIN SMALL LETTER S WITH CARON
ş = 272 = 186 = BA = U+015F = C5 9F : LATIN SMALL LETTER S WITH CEDILLA
ť = 273 = 187 = BB = U+0165 = C5 A5 : LATIN SMALL LETTER T WITH CARON
ź = 274 = 188 = BC = U+017A = C5 BA : LATIN SMALL LETTER Z WITH ACUTE
˝ = 275 = 189 = BD = U+02DD =    CB 9D : DOUBLE ACUTE ACCENT
ž = 276 = 190 = BE = U+017E = C5 BE : LATIN SMALL LETTER Z WITH CARON
ż = 277 = 191 = BF = U+017C = C5 BC : LATIN SMALL LETTER Z WITH DOT ABOVE
Ŕ = 300 = 192 = C0 = U+0154 = C5 94 : LATIN CAPITAL LETTER R WITH ACUTE
Á = 301 = 193 = C1 = U+00C1 = C3 81 : LATIN CAPITAL LETTER A WITH ACUTE
Â = 302 = 194 = C2 = U+00C2 = C3 82 : LATIN CAPITAL LETTER A WITH CIRCUMFLEX
Ă = 303 = 195 = C3 = U+0102 = C4 82 : LATIN CAPITAL LETTER A WITH BREVE
Ä = 304 = 196 = C4 = U+00C4 = C3 84 : LATIN CAPITAL LETTER A WITH DIAERESIS
Ĺ = 305 = 197 = C5 = U+0139 = C4 B9 : LATIN CAPITAL LETTER L WITH ACUTE
Ć = 306 = 198 = C6 = U+0106 = C4 86 : LATIN CAPITAL LETTER C WITH ACUTE
Ç = 307 = 199 = C7 = U+00C7 = C3 87 : LATIN CAPITAL LETTER C WITH CEDILLA
Č = 310 = 200 = C8 = U+010C = C4 8C : LATIN CAPITAL LETTER C WITH CARON
É = 311 = 201 = C9 = U+00C9 = C3 89 : LATIN CAPITAL LETTER E WITH ACUTE
Ę = 312 = 202 = CA = U+0118 = C4 98 : LATIN CAPITAL LETTER E WITH OGONEK
Ë = 313 = 203 = CB = U+00CB = C3 8B : LATIN CAPITAL LETTER E WITH DIAERESIS
Ě = 314 = 204 = CC = U+011A = C4 9A : LATIN CAPITAL LETTER E WITH CARON
Í = 315 = 205 = CD = U+00CD = C3 8D : LATIN CAPITAL LETTER I WITH ACUTE
Î = 316 = 206 = CE = U+00CE = C3 8E : LATIN CAPITAL LETTER I WITH CIRCUMFLEX
Ď = 317 = 207 = CF = U+010E = C4 8E : LATIN CAPITAL LETTER D WITH CARON
Đ = 320 = 208 = D0 = U+0110 = C4 90 : LATIN CAPITAL LETTER D WITH STROKE
Ń = 321 = 209 = D1 = U+0143 = C5 83 : LATIN CAPITAL LETTER N WITH ACUTE
Ň = 322 = 210 = D2 = U+0147 = C5 87 : LATIN CAPITAL LETTER N WITH CARON
Ó = 323 = 211 = D3 = U+00D3 = C3 93 : LATIN CAPITAL LETTER O WITH ACUTE
Ô = 324 = 212 = D4 = U+00D4 = C3 94 : LATIN CAPITAL LETTER O WITH CIRCUMFLEX
Ő = 325 = 213 = D5 = U+0150 = C5 90 : LATIN CAPITAL LETTER O WITH DOUBLE ACUTE
Ö = 326 = 214 = D6 = U+00D6 = C3 96 : LATIN CAPITAL LETTER O WITH DIAERESIS
× = 327 = 215 = D7 = U+00D7 =    C3 97 : MULTIPLICATION SIGN
Ř = 330 = 216 = D8 = U+0158 = C5 98 : LATIN CAPITAL LETTER R WITH CARON
Ů = 331 = 217 = D9 = U+016E = C5 AE : LATIN CAPITAL LETTER U WITH RING ABOVE
Ú = 332 = 218 = DA = U+00DA = C3 9A : LATIN CAPITAL LETTER U WITH ACUTE
Ű = 333 = 219 = DB = U+0170 = C5 B0 : LATIN CAPITAL LETTER U WITH DOUBLE ACUTE
Ü = 334 = 220 = DC = U+00DC = C3 9C : LATIN CAPITAL LETTER U WITH DIAERESIS
Ý = 335 = 221 = DD = U+00DD = C3 9D : LATIN CAPITAL LETTER Y WITH ACUTE
Ţ = 336 = 222 = DE = U+0162 = C5 A2 : LATIN CAPITAL LETTER T WITH CEDILLA
ß = 337 = 223 = DF = U+00DF =    C3 9F : LATIN SMALL LETTER SHARP S
ŕ = 340 = 224 = E0 = U+0155 = C5 95 : LATIN SMALL LETTER R WITH ACUTE
á = 341 = 225 = E1 = U+00E1 = C3 A1 : LATIN SMALL LETTER A WITH ACUTE
â = 342 = 226 = E2 = U+00E2 = C3 A2 : LATIN SMALL LETTER A WITH CIRCUMFLEX
ă = 343 = 227 = E3 = U+0103 = C4 83 : LATIN SMALL LETTER A WITH BREVE
ä = 344 = 228 = E4 = U+00E4 = C3 A4 : LATIN SMALL LETTER A WITH DIAERESIS
ĺ = 345 = 229 = E5 = U+013A = C4 BA : LATIN SMALL LETTER L WITH ACUTE
ć = 346 = 230 = E6 = U+0107 = C4 87 : LATIN SMALL LETTER C WITH ACUTE
ç = 347 = 231 = E7 = U+00E7 = C3 A7 : LATIN SMALL LETTER C WITH CEDILLA
č = 350 = 232 = E8 = U+010D = C4 8D : LATIN SMALL LETTER C WITH CARON
é = 351 = 233 = E9 = U+00E9 = C3 A9 : LATIN SMALL LETTER E WITH ACUTE
ę = 352 = 234 = EA = U+0119 = C4 99 : LATIN SMALL LETTER E WITH OGONEK
ë = 353 = 235 = EB = U+00EB = C3 AB : LATIN SMALL LETTER E WITH DIAERESIS
ě = 354 = 236 = EC = U+011B = C4 9B : LATIN SMALL LETTER E WITH CARON
í = 355 = 237 = ED = U+00ED = C3 AD : LATIN SMALL LETTER I WITH ACUTE
î = 356 = 238 = EE = U+00EE = C3 AE : LATIN SMALL LETTER I WITH CIRCUMFLEX
ď = 357 = 239 = EF = U+010F = C4 8F : LATIN SMALL LETTER D WITH CARON
đ = 360 = 240 = F0 = U+0111 = C4 91 : LATIN SMALL LETTER D WITH STROKE
ń = 361 = 241 = F1 = U+0144 = C5 84 : LATIN SMALL LETTER N WITH ACUTE
ň = 362 = 242 = F2 = U+0148 = C5 88 : LATIN SMALL LETTER N WITH CARON
ó = 363 = 243 = F3 = U+00F3 = C3 B3 : LATIN SMALL LETTER O WITH ACUTE
ô = 364 = 244 = F4 = U+00F4 = C3 B4 : LATIN SMALL LETTER O WITH CIRCUMFLEX
ő = 365 = 245 = F5 = U+0151 = C5 91 : LATIN SMALL LETTER O WITH DOUBLE ACUTE
ö = 366 = 246 = F6 = U+00F6 = C3 B6 : LATIN SMALL LETTER O WITH DIAERESIS
÷ = 367 = 247 = F7 = U+00F7 =    C3 B7 : DIVISION SIGN
ř = 370 = 248 = F8 = U+0159 = C5 99 : LATIN SMALL LETTER R WITH CARON
ů = 371 = 249 = F9 = U+016F = C5 AF : LATIN SMALL LETTER U WITH RING ABOVE
ú = 372 = 250 = FA = U+00FA = C3 BA : LATIN SMALL LETTER U WITH ACUTE
ű = 373 = 251 = FB = U+0171 = C5 B1 : LATIN SMALL LETTER U WITH DOUBLE ACUTE
ü = 374 = 252 = FC = U+00FC = C3 BC : LATIN SMALL LETTER U WITH DIAERESIS
ý = 375 = 253 = FD = U+00FD = C3 BD : LATIN SMALL LETTER Y WITH ACUTE
ţ = 376 = 254 = FE = U+0163 = C5 A3 : LATIN SMALL LETTER T WITH CEDILLA
˙ = 377 = 255 = FF = U+02D9 =    CB 99 : DOT ABOVE

and open it in both Emacsen. Run them at the same time and compare mode-lines and other details (encodings, fonts used: C-u C-x = on a glyph, ...). In your user init file you can prepare sections that depend on the emacs-major-version or window-system variables.
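In case the test file gets mangled on its way through the mail, its rows can be regenerated. This is only a sketch in Python, not the way the file above was produced; the function name table_row and its default codec argument are assumptions made for this illustration.

```python
import unicodedata


def table_row(byte_value, codec="iso8859_2"):
    """Build one row of the table for a single byte in the given encoding."""
    char = bytes([byte_value]).decode(codec)
    # The UTF-8 form of the same character, as space-separated hex bytes.
    utf8 = " ".join(f"{b:02X}" for b in char.encode("utf-8"))
    return (
        f"{char} = {byte_value:o} = {byte_value} = {byte_value:02X}"
        f" = U+{ord(char):04X} = {utf8} : {unicodedata.name(char)}"
    )


# Rows for the upper half of ISO 8859-2, 0xA0 through 0xFF.
rows = [table_row(b) for b in range(0xA0, 0x100)]
print(table_row(0xA1))
```

Passing another codec name, for instance "iso8859_16", gives the table for that encoding from the very same byte values.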


If you want to have some fun, then change this file's first line from iso-8859-2 to, let's say, iso-8859-16 *outside* of GNU Emacs, for example with cat <file> | sed -e s/iso-8859-2/iso-8859-16/ > <other file>. This changes only the coding declaration (the 2 becomes 16) and leaves every other byte of the file untouched, but many glyphs in the first column will be different in GNU Emacs, and the descriptive text will become untrue for them. Just to learn that there is content somewhere below and you only get some *presentation* of this content. (As in real life you can't see the reality outside your head.)
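The same experiment works without GNU Emacs. A minimal sketch in Python, assuming its iso8859_2 and iso8859_16 codecs: the bytes stay identical and only the decoding changes. The sample string is picked for this illustration.

```python
# A few Polish letters, stored as ISO 8859-2 bytes (one byte per letter).
text = "ĄŁŚąłś"
raw = text.encode("iso8859_2")

# Present the very same bytes in two different ways.
as_latin2 = raw.decode("iso8859_2")
as_latin10 = raw.decode("iso8859_16")

print("bytes:      ", raw.hex(" ").upper())
print("iso-8859-2: ", as_latin2)
print("iso-8859-16:", as_latin10)
```

The first decoding gives back the original letters. The second one shows different characters for some of the bytes, although nothing in the data was touched.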

--
Greetings

  Pete

Time flies like an error -- but fruit flies like a banana!
                             (almost Groucho Marx)