Re: [Nano-devel] searching for certain UTF8 characters also finds ghosts
From: Benno Schulenberg
Subject: Re: [Nano-devel] searching for certain UTF8 characters also finds ghosts
Date: Sat, 21 Mar 2015 21:45:40 +0100

On Sat, Mar 21, 2015, at 20:54, Mark Majeres wrote:
> >> > Strangely this happens only for characters in the range
> >> > U+00A0 to U+00BF, which baffles me. (I've tested it with
> >> > ?, ?, ?, ?, ? -- they all get doubly found -- but also
> >> > with ?, ?, ?, ?, ?, ?, and ? -- they all are found once.)
> >
> > Hmm. Your mail is UTF8-encoded, but these characters haven't
> > come across properly. Did you see them okay?
>
> They came in the first time just like they look now :)

How strange. You are sending now via gmail, but maybe you use
a non-utf8 locale on your machine?

> Yes, shifting one multi-byte char to the left is a little bit of work.
> It's not very pretty.

Well, I have a patch for that. Attached. I can't notice any
performance improvement anywhere, though. It could still
be made quite a bit faster by looking at the bytes themselves,
but that requires some thinking and won't look pretty.
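
To illustrate what "looking at the bytes themselves" could mean, here is a
minimal sketch. It is not the attached patch and not nano's move_mbleft();
the name utf8_step_left() is hypothetical. It assumes the buffer holds valid
UTF-8 and that the offset is greater than zero, and relies only on the fact
that UTF-8 continuation bytes have the bit pattern 10xxxxxx.

```c
#include <stddef.h>

/* Hypothetical helper, not nano's move_mbleft(): return the byte offset
 * at which the character before offset pos starts.
 * Assumptions: buf holds valid UTF-8 and pos > 0. */
size_t utf8_step_left(const char *buf, size_t pos)
{
    size_t steps = 0;

    /* Step back at least one byte, then keep stepping while the current
     * byte is a continuation byte (10xxxxxx).  A UTF-8 character is at
     * most four bytes long, so never step back more than four bytes. */
    do {
        pos--;
        steps++;
    } while (pos > 0 && steps < 4 &&
             ((unsigned char)buf[pos] & 0xC0) == 0x80);

    return pos;
}
```

Such a scan avoids decoding from the start of the line, but it trusts the
input: on malformed data it can stop in the middle of a byte sequence.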

> IMO, strstrwrapper() should return the position to the left of
> haystack if the search is not found. I think the check for
> openfile->current_x == 0 could then be tossed out of your patch.

That's what I tried at first. But move_mbleft() doesn't like to step
over the left edge of fileptr->data: the assert in that function then
fails.
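
The reason the check at column 0 stays can be sketched as follows. This is
an illustration only: step_left() is a hypothetical stand-in that asserts on
its input, try_step_left() is a hypothetical caller, and neither is nano's
actual code. To keep it short, the stand-in handles single-byte characters
only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a step-left routine that refuses to cross
 * the left edge of the buffer.  Single-byte characters only. */
static size_t step_left(const char *buf, size_t x)
{
    assert(buf != NULL && x > 0);   /* stepping left from 0 is an error */
    return x - 1;
}

/* Hypothetical caller: move *x one position to the left if possible.
 * Returns false, leaving *x unchanged, when already at the left edge. */
bool try_step_left(const char *buf, size_t *x)
{
    if (*x == 0)
        return false;               /* guard so the assert never fires */

    *x = step_left(buf, *x);
    return true;
}
```

With the guard in the caller, the stepping routine never sees offset 0, so
no position to the left of the buffer ever has to be represented.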

Benno

Attachment: fasterleft.patch
Description: Text Data