From: Ikumi Keita
Subject: [Bug-AUCTeX] 11.84; Locating image position sometimes fails for shift-jis encoded document.
Date: Wed, 07 Mar 2007 23:29:29 +0900

Remember to cover the basics.  Including a minimal LaTeX example
file exhibiting the problem might help.

When editing a Japanese LaTeX document encoded in shift-jis, preview-latex
sometimes fails to put images in the right place (they appear in the
wrong place or not at all).  Such cases are often accompanied by error
messages like:

    error in process sentinel: Invalid regexp: "Unmatched [ or [^"

and

    error in process sentinel: Invalid regexp: "Trailing backslash"

etc.

How to reproduce:
Standard (La)TeX cannot handle Japanese documents encoded in
shift-jis.  Japanese (La)TeX variants are needed to investigate the
case, so I do not give an example file here.  Instead, I explain the
details below.

Background:
I start with a short summary of the shift-jis encoding (SJIS for
short).  SJIS is one of the major encodings for Japanese text.
Basically, it represents one Japanese character by two bytes.  Examples
of such two-byte sequences are, in hexadecimal form:

8E 82

and

81 5B

While the first byte of the sequence is always 8-bit data (MSB on), the
second byte is not necessarily so.  In the above two examples, the second
byte of the first sequence (82) is 8-bit, but that of the second
sequence (5B) is 7-bit (MSB off).  For historical reasons, SJIS is the
standard encoding on Japanese Windows and Macintosh.
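
The two-byte layout described above can be inspected from Emacs Lisp.
This is only a minimal sketch: the helper names are hypothetical (they
are not part of AUCTeX or preview-latex), and the "\u30fc" string syntax
assumes a more recent Emacs than the 21.4 used in this report.  U+30FC is
the character whose SJIS form is the 81 5B sequence shown above.

```elisp
;; Hypothetical helpers for illustration; not part of preview.el.
(defun sjis-demo-bytes (string)
  "Return the bytes of STRING encoded as shift_jis, as a list of integers."
  ;; encode-coding-string returns a unibyte string, so each element
  ;; of the resulting list is one byte (0..255).
  (append (encode-coding-string string 'shift_jis) nil))

(defun sjis-demo-7bit-trail-p (char)
  "Return non-nil if CHAR is two bytes in SJIS and the second byte is 7-bit."
  (let ((bytes (sjis-demo-bytes (char-to-string char))))
    (and (= (length bytes) 2)
         (< (nth 1 bytes) #x80))))

;; Show the bytes of U+30FC in hexadecimal.
(mapcar (lambda (b) (format "%02X" b)) (sjis-demo-bytes "\u30fc"))
;; => ("81" "5B")

;; The second byte 5B has its MSB off.
(sjis-demo-7bit-trail-p #x30fc)
;; => t
```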

Problem:
The second byte of SJIS text causes a problem in the sequence encoding
-> regexp-quote -> decoding.  The relevant part of preview.el is in the
function preview-error-quote:
----------------------------------------------------------------------
2603    (defun preview-error-quote (string run-coding-system)
2604      "Turn STRING with potential ^^ sequences into a regexp.
2605    To preserve sanity, additional ^ prefixes are matched literally,
2606    so the character represented by ^^^ preceding extended characters
2607    will not get matched, usually."
2608      (let (output case-fold-search)
2609        (when (featurep 'mule)
2610          (setq string (encode-coding-string string run-coding-system)))
2611        (while (string-match "\\^\\{2,\\}\\(\\(address@hidden)\\|[8-9a-f][0-9a-f]\\)"
2612                             string)
2613          (setq output
2614                (concat output
2615                        (regexp-quote (substring string
2616                                                 0
2617                                                 (- (match-beginning 1) 2)))
[...]
2631        (setq output (concat output (regexp-quote string)))
2632        (if (featurep 'mule)
2633            (decode-coding-string output
2634                                  (or (and (boundp 'TeX-japanese-process-output-coding-system)
2635                                           TeX-japanese-process-output-coding-system)
2636                                      buffer-file-coding-system))
2637          output)))
----------------------------------------------------------------------
On Japanese Windows, this function is called with both
run-coding-system and TeX-japanese-process-output-coding-system bound to
'shift_jis-dos, according to the setting in tex-jp.el.

On lines 2609-2610, the multibyte Japanese string is turned into a byte
sequence in SJIS encoding.  On lines 2615-2617 and 2631, this byte
sequence is passed through regexp-quote.  On lines 2633-2636, the
byte sequence is turned back into a multibyte string, assuming the
sequence is encoded in SJIS.

However, the second byte of an SJIS character is sometimes 7-bit data
and, quite unfortunately, sometimes happens to be a regexp metacharacter.
In the above example, the second byte of the sequence 81 5B is actually
the char `['.  regexp-quote inserts a backslash before it, so the
sequence no longer decodes back into the expected multibyte character
and a lone `[' is left behind.  Later, this causes the error 'Invalid
regexp: "Unmatched [ or [^"', and the corresponding image is not shown
in the document buffer.

The second byte of another SJIS character can be the backslash `\',
which sometimes leads to 'Invalid regexp: "Trailing backslash"'.  In
yet another case it is `^', and the image is placed at the beginning of
the line rather than at the right place in the document buffer.

The following example illustrates what is going on:
(let* ((s1 (char-to-string (make-char 'japanese-jisx0208 33 60)))
       ;; s1 is a multibyte Japanese string.
       ;; Encode s1 with SJIS.
       (s2 (encode-coding-string s1 'shift_jis)))
  ;; At this point s2 is equal to "\201[", i.e. the byte sequence 81 5B.
  (setq s2 (regexp-quote s2))
  ;; Now s2 is "\201\\[".
  ;; Then decode back, assuming SJIS encoding.
  (setq s2 (decode-coding-string s2 'shift_jis))
  (string-equal s1 s2))
  => nil ;; s2 no longer equals the original string s1.
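
The other metacharacter cases mentioned above (`\' and `^') can be
checked in the same way.  The following is a rough sketch, not code from
preview.el: the helper name is hypothetical, the three sample characters
(U+30FC, U+30BD, U+30BF) are my own choice of illustrations, and the
hexadecimal character literals assume a recent, Unicode-based Emacs.

```elisp
;; Hypothetical helper for illustration; not part of preview.el.
(defun sjis-demo-trail-char (char)
  "Return the second SJIS byte of CHAR if it is 7-bit, else nil.
Also return nil if CHAR does not encode to exactly two bytes."
  (let ((bytes (append (encode-coding-string (char-to-string char)
                                             'shift_jis)
                       nil)))
    (when (and (= (length bytes) 2)
               (< (nth 1 bytes) #x80))
      (nth 1 bytes))))

;; SJIS 81 5B, 83 5C and 83 5E: the second bytes are `[', `\' and `^',
;; all of which regexp-quote escapes with a backslash.
(mapcar (lambda (c) (char-to-string (sjis-demo-trail-char c)))
        '(#x30fc #x30bd #x30bf))
;; => ("[" "\\" "^")
```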

Summary:
We cannot assume that multibyte characters are always encoded as 8-bit
bytes only.  To cope with encodings like SJIS, whose byte sequences
contain 7-bit bytes, regexp-quote must not be applied to encoded
strings.  It should operate on decoded strings only.
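
The difference between quoting encoded bytes and quoting the decoded
string can be sketched as follows.  This is an illustration under
assumptions, not a proposed patch for preview-error-quote: the function
name is hypothetical, and "\u30fc" and string-match-p assume a more
recent Emacs than 21.4.

```elisp
;; Hypothetical demonstration function; not part of preview.el.
(defun sjis-demo-quote-comparison ()
  "Compare regexp-quote on SJIS-encoded bytes and on the decoded string."
  (let* ((s1 "\u30fc")                   ; U+30FC, SJIS bytes 81 5B
         (encoded (encode-coding-string s1 'shift_jis))
         ;; Quoting the encoded bytes puts a backslash before the 5B byte.
         (quoted-bytes (regexp-quote encoded))
         ;; Quoting the decoded string leaves it unchanged, because the
         ;; character itself is not a regexp metacharacter.
         (quoted-chars (regexp-quote s1)))
    (list (length encoded)
          (length quoted-bytes)
          (string-equal s1 quoted-chars)
          ;; The regexp built from the decoded string still finds s1.
          (string-match-p quoted-chars (concat "abc" s1 "def")))))

(sjis-demo-quote-comparison)
;; => (2 3 t 3)
```

The result shows the two encoded bytes growing to three after quoting,
while the regexp built from the decoded string matches at position 3.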

I confirmed that after removing the encoding and decoding operations in
preview-error-quote, the error does not occur and the images are
displayed in the right place for my test case (with the help of a
Japanese-capable dvips variant and a ghostscript tuned to handle
Japanese postscript files correctly).  Of course I understand that such
a rough modification is not acceptable.  I am just illustrating one
aspect of the problem.

N.B. In the following log, Japanese text is replaced with `*'s.

Emacs  : GNU Emacs 21.4.1 (i386-mingw-nt5.0.2195)
 of 2005-08-28 on CUBE
Package: 11.84

Run buffer contents:

Running `Preview-LaTeX' on `bbb' with ``platex  "\nonstopmode\nofiles\PassOptionsToPackage{active,tightpage,auctex}{preview}\AtBeginDocument{\ifx\ifPreview\undefined\RequirePackage[displaymath,floats,graphics,textmath,sections,footnotes]{preview}[2004/11/05]\fi}" "\input" "bbb.tex"''
This is pTeX, Version 3.141592-p3.1.9 (sjis) (Web2C 7.5.5)
pLaTeX2e <2005/01/04>+0 (based on LaTeX2e <2003/12/01> patch level 0)
Babel <v3.8g> and hyphenation patterns for english, usenglishmax, ukenglish, ba
sque, bulgarian, coptic, welsh, czech, slovak, german, ngerman, danish, spanish
, catalan, estonian, finnish, french, irish, polygreek, monogreek, ancientgreek
, croatian, hungarian, interlingua, ibycus, bahasa, icelandic, italian, latin, 
mongolian, dutch, norsk, polish, portuguese, pinyin, romanian, russian, samin, 
slovene, usorbian, serbian, swedish, turkish, ukrainian, dumylang, nohyphenatio
n, loaded.

No auxiliary output files.

(./bbb.tex (c:/usr/share/texmf/ptex/platex/base/jarticle.cls
Document Class: jarticle 2002/04/09 v1.4 Standard pLaTeX class
(c:/usr/share/texmf/ptex/platex/base/jsize10.clo))
No file bbb.aux.
(c:/usr/share/texmf/tex/latex/preview/preview.sty
(c:/usr/share/texmf/tex/latex/preview/prtightpage.def)
(c:/usr/share/texmf/tex/latex/preview/prauctex.def
No auxiliary output files.


(c:/usr/share/texmf/tex/latex/preview/prauctex.cfg))
(c:/usr/share/texmf/tex/latex/preview/prfootnotes.def)
Preview: Fontsize 10pt
)
! Preview: Snippet 1 started.
<-><->
      
l.3 ************************ \(
                               l\)
Preview: Tightpage -32891 -32891 32891 32891
! Preview: Snippet 1 ended.(455111+0x208442).
<-><->
      
l.3 ************************ \(l\)
                                  
[1] )
(see the transcript file for additional information)
Output written on bbb.dvi (1 page, 1584 bytes).
Transcript written on bbb.log.

Preview-LaTeX exited as expected with code 1 at Wed Mar 07 17:36:09
Running `Preview-DviPS' with ``dvipsk -Pdl "bbb.dvi" -o "bbb.prv/tmp440Xaa"/preview.ps''

Preview-DviPS unknown at Wed Mar 07 17:36:09
LaTeX: Invalid regexp: "Unmatched [ or [^"


current state:
==============

Output from running `GSWIN32C.EXE -h':
GNU Ghostscript 7.07 (2003-05-17)
Copyright (C) 2003 artofcode LLC, Benicia, CA.  All rights reserved.
Usage: gs [switches] [file1.ps file2.ps ...]
Most frequently used switches: (you can use # in place of =)
 -dNOPAUSE           no pause after page   | -q       `quiet', fewer messages
 -g<width>x<height>  page size in pixels   | -r<res>  pixels/inch resolution
 -sDEVICE=<devname>  select device         | -dBATCH  exit after last file
 -sOutputFile=<file> select output file: - for stdout, |command for pipe,
                                         embed %d or %ld for page #
Input formats: PostScript PostScriptLevel1 PostScriptLevel2 PDF
Default output device: display
Available devices:
   bbox bit bitcmyk bitrgb bj10e bj200 bjc600 bjc800 bmp16 bmp16m bmp256
   bmpgray bmpmono cdeskjet cdj550 cdjcolor cdjmono declj250 deskjet display
   djet500 djet500c eps9high eps9mid epson epsonc epswrite ibmpro ijs
   jetp3852 jpeg jpeggray laserjet lbp8 lj250 ljet2p ljet3 ljet3d ljet4
   ljet4d ljetplus m8510 mswindll mswinpr2 necp6 nullpage pbm pbmraw pcx16
   pcx24b pcx256 pcxcmyk pcxgray pcxmono pdfwrite pgm pgmraw pgnm pgnmraw pj
   pjxl pjxl300 png16 png16m png256 pngalpha pnggray pngmono pnm pnmraw ppm
   ppmraw psmono pswrite pxlcolor pxlmono r4081 st800 stcolor t4693d2
   t4693d4 t4693d8 tek4696 tiff12nc tiff24nc tiffcrle tiffg3 tiffg32d tiffg4
   tifflzw tiffpack uniprint
Search path:
   . ; C:\gs\gs7.07\lib ; C:\gs\gs7.07\kanji ; C:\gs\fonts ;
   c:/gs/gs7.07/lib ; c:/gs/gs7.07/kanji ; c:/gs/fonts ; c:/winnt/fonts ;
   c:/usr/sysfonts ; c:/windows/fonts ; c:/winnt35/fonts
For more information, see c:/gs/gs7.07/doc/Use.htm.
Report bugs to address@hidden, using the form in Bug-form.htm.

(setq
 AUC-TeX-version "11.84"
 LaTeX-command-style '(("^j-\\(article\\|report\\|book\\)$" "%(PDF)jlatex %S%(PDFout)")
                       ("^[jt]s?\\(article\\|report\\|book\\)$"
                        "%(PDF)platex %S%(PDFout)")
                       ("" "%(PDF)%(latex) %S%(PDFout)"))
 image-types '(YUV YCbCrA YCbCr Y XWD XV xpm XCF XC xbm X WPG WMZ WMFWIN32 WMF WBMP VST
               VIFF VID VICAR VDA UYVY UIL TXT TTF TTC TIM TILE tiff TIF TGA TEXT SVGZ
               SVG SUN STEGANO SHTML SGI SFW SCT SCR RLE RLA RGBO RGBA RGB RAS R PWP
               PTIF PSD PS3 PS2 postscript PREVIEW PPM PNM PNG8 PNG32 PNG24 png PLASMA
               PJPEG PIX PICT PICON PGX PGM PFB PFA PDF PDB PCX PCT PCL PCDS PCD pbm
               PATTERN PALM PAL P7 OTB O NULL MVG MTV MSL MPG MPEG MPC MONO MNG MIFF
               MATTE MAT MAP M2V M LABEL K JPX JPG jpeg JPC JP2 JNG JBIG JBG INFO ICON
               ICO ICB HTML HTM HISTOGRAM HDF GRAY GRADIENT GIF87 gif G3 G FRACTAL FPX
               FITS FAX EPT3 EPT2 EPT EPSI EPSF EPS3 EPS2 EPS EPI EPDF EMF DPX DPS DOT
               DNG DCX DCM CUT CUR CMYKA CMYK CLIPBOARD CLIP CIP CIN CAPTION CACHE C
               BMP3 BMP2 BMP BIE B AVS AVI ART A bmp)
 preview-image-type 'png
 preview-image-creators '((dvipng (open preview-gs-open preview-dvipng-process-setup)
                           (place preview-gs-place) (close preview-dvipng-close))
                          (png (open preview-gs-open) (place preview-gs-place)
                           (close preview-gs-close))
                          (jpeg (open preview-gs-open) (place preview-gs-place)
                           (close preview-gs-close))
                          (pnm (open preview-gs-open) (place preview-gs-place)
                           (close preview-gs-close))
                          (tiff (open preview-gs-open) (place preview-gs-place)
                           (close preview-gs-close))
                          )
 preview-dvipng-image-type 'png
 preview-dvipng-command "dvipng -picky -noghostscript %d -o \"%m/prev%%03d.png\""
 preview-pdf2dsc-command "pdf2dsc %s.pdf %m/preview.dsc"
 preview-gs-command "GSWIN32C.EXE"
 preview-gs-options '("-q" "-dSAFER" "-dNOPAUSE" "-DNOPLATFONTS" "-dPrinted"
                      "-dTextAlphaBits=4" "-dGraphicsAlphaBits=4" "-dWINKANJI")
 preview-gs-image-type-alist '((png png "-sDEVICE=png16m")
                               (dvipng png "-sDEVICE=png16m")
                               (jpeg jpeg "-sDEVICE=jpeg") (pnm pbm "-sDEVICE=pnmraw")
                               (tiff tiff "-sDEVICE=tiff12nc"))
 preview-fast-conversion t
 preview-prefer-TeX-bb nil
 preview-dvips-command "dvipsk -Pdl -i -E %d -o %m/preview.000"
 preview-fast-dvips-command "dvipsk -Pdl %d -o %m/preview.ps"
 preview-scale-function 'preview-scale-from-face
 preview-LaTeX-command '("%`%l \"\\nonstopmode\\nofiles\\PassOptionsToPackage{"
                         ("," . preview-required-option-list)
                         "}{preview}\\AtBeginDocument{\\ifx\\ifPreview\\undefined"
                         preview-default-preamble "\\fi}\"%' %t")
 preview-required-option-list '("active" "tightpage" "auctex"
                                (preview-preserve-counters "counters"))
 preview-preserve-counters nil
 preview-default-option-list '("displaymath" "floats" "graphics" "textmath" "sections"
                               "footnotes")
 preview-default-preamble '("\\RequirePackage[" ("," . preview-default-option-list)
                            "]{preview}[2004/11/05]")
 preview-LaTeX-command-replacements nil
 preview-dump-replacements '(preview-LaTeX-command-replacements
                             ("\\`\\([^ ]+\\)\\(\\( +-\\([^ \\\\\"]\\|\\\\\\.\\|\"[^\"]*\"\\)*\\)*\\)\\(.*\\)\\'" "\\1 -ini -interaction=nonstopmode \"&\\1\" " preview-format-name ".ini \\5")
                             )
 preview-undump-replacements '(("\\`\\([^ ]+\\) .*? \"\\\\input\" \\(.*\\)\\'"
                                "\\1 -interaction=nonstopmode \"&" preview-format-name
                                "\" \\2")
                               )
 preview-auto-cache-preamble 'ask
 preview-TeX-style-dir nil
 )



