[Pan-devel] Re: I see problem with Pan’s “http url detector”


From: Duncan
Subject: [Pan-devel] Re: I see problem with Pan’s “http url detector”
Date: Mon, 7 Feb 2011 23:06:24 +0000 (UTC)
User-agent: Pan/0.133 (House of Butterflies; GIT 25ed40d branch-testing)

SciFi posted on Mon, 07 Feb 2011 09:24:23 +0000 as excerpted:

> Hi,
> 
> We have been noticing that Pan’s “http detector” has a slight problem.
> 
> When a URL begins with the less–than sign ‘<’, but does *not* end with a
> greater–than sign ‘>’, the entire balance of the message is being
> hilited as if it belongs to the URL — including multiple lines and
> paragraphs etc.
> 
> This works: <http://www.url.com/>
> None of this very text is part of the URL.
> 
> But this causes the bug: <http://www.url.com/
> Note there is no ending greater–than sign here
> and all this text is being “sucked into” the Pan URL detector
> up until we put the actual ‘>’ there.
> See?

Yes.

> I think what we need to do is automatically “end” the URL at
> the same place(s) that a *non*–<> URL string is doing,
> i.e. end it at any white-space, CR, LF, etc, even if the
> balance of the same line contains some “junk” anyway,
> but surely not including the entire multi-line paragraph etc.

Not quite.  The whole idea of enclosing URLs in <> is to protect them from
line-wrap; otherwise those longer than 70-odd characters cause various
wrapping issues, depending on the sending and viewing clients and
whatever wrapping rules they use.

Of course, lines by Internet message (mail/news) spec are terminated by
the CRLF pair (both characters, in that order, not just one or the other),
so at least in theory (I'm not reading pan's code here to see the actual
case), it should be reasonably easy to detect a single CRLF pair inside a
<>-enclosed URL and delete it from the URL string, continuing past it,
while treating other whitespace, including a doubled CRLFCRLF (a blank
line), as a URL terminator.  (Of course, unpaired CRs or LFs, or
reverse-paired LFCRs, aren't RFC-spec line terminators, and shouldn't
occur in normal text, URL or not, MIME or UUE; I don't recall what yEnc's
8-bit-minus encoding does.  I don't believe the RFCs allow them at all and
don't know how pan treats them now, but perhaps that should remain the
same?)
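
To make that rule concrete, here is a minimal sketch in Python.  It is an
illustration only, not pan's code (pan is not written in Python), and the
function name extract_bracketed_url is made up for this example.  It
assumes the message text still has its CRLF line endings, and it treats a
lone CR or LF like any other whitespace, which is a guess rather than
anything the RFCs or pan dictate.

```python
def extract_bracketed_url(text, start):
    """Return the URL that follows the '<' at text[start].

    A single CRLF is taken to be a wrapped line and is skipped.  The URL
    ends at '>', at a blank line (CRLFCRLF), or at any other whitespace.
    """
    if text[start] != "<":
        raise ValueError("text[start] must be '<'")
    url_chars = []
    i = start + 1
    while i < len(text):
        if text[i] == ">":
            break                      # normal, well-formed end of the URL
        if text.startswith("\r\n", i):
            if text.startswith("\r\n\r\n", i):
                break                  # blank line: terminate the URL
            i += 2                     # single CRLF: wrapped URL, skip it
            continue
        if text[i].isspace():
            break                      # space, tab, lone CR or LF: terminate
        url_chars.append(text[i])
        i += 1
    return "".join(url_chars)


wrapped = "<http://example.com/a/very/\r\nlong/path> more text"
unclosed = "<http://www.url.com/\r\n\r\nNext paragraph"
print(extract_bracketed_url(wrapped, 0))
print(extract_bracketed_url(unclosed, 0))
```

Note one consequence of the rule as sketched: with a missing '>' and only
a single CRLF after the URL, the first word of the next line is still
joined onto the URL, but the damage stops at the first space instead of
running to the end of the message.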

Meanwhile, something else URL related that /used/ to annoy me, tho I've
not noticed it recently so maybe it's fixed (?), is an unspaced comma or
the like terminating a URL (that is, punctuation directly following the
URL with no space between).  Here's testing it:

http://example.com, Does the URL include the comma?

http://example.com. What about the terminating dot?

http://example.com? Question mark?

"http://example.com"; Double-quote?

'http://example.com' Single-quote?

http://example.com: Colon?

Those of us using pan to follow this list, thru gmane or whatever, will
see pan's behavior on the above directly.  I guess I'll post a followup
with the results for anyone using a standard mail client.
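
For comparison, one common convention for bare (non-<>) URLs is to end
the URL at whitespace and then drop any trailing punctuation.  The sketch
below shows that convention in Python; it is not a description of what
pan does, and both the function name find_bare_urls and the exact set of
characters in TRAILING are assumptions chosen to cover the test lines
above.

```python
import re

# Punctuation assumed never to be the last character of a URL (a guess).
TRAILING = ".,?!:;'\""

# A bare URL runs from the scheme to the next whitespace or angle bracket.
BARE_URL = re.compile(r"https?://[^\s<>]+")


def find_bare_urls(text):
    """Return bare http(s) URLs in text, minus trailing punctuation."""
    return [m.group(0).rstrip(TRAILING) for m in BARE_URL.finditer(text)]


for line in [
    "http://example.com, Does the URL include the comma?",
    '"http://example.com"; Double-quote?',
    "http://example.com/page?q=1. Dot after a query string?",
]:
    print(find_bare_urls(line))
```

Only trailing punctuation is stripped, so a '?' or '.' in the middle of a
URL, as in a query string or hostname, is left alone.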

Of course it's pointless to test the <> versions ATM since we know those
don't terminate until a '>', but by convention, such punctuation should be
treated as part of the URL if it appears within the <>.  Really, spaces
should be as well, recoded as %-escapes, but as you've observed, that's
probably a convention it's wise to ignore, due to broken clients or manual
errors omitting the URL-closing '>'.

> I also think we need to be able to “turn off” this kind of “detector”
> logic, similar to the smileys & bold/italics/etc
> options we already have in Pan.  I usually leave–off these
> other options, but Pan still shows me clickable URLs anyway.

That makes sense.  However, I'll insist (to the extent that I can as a
non-coder but senior-status list participant, while recognizing that he who
codes ultimately decides) that it should be a /separate/ option, as I
prefer seeing the original/literal "smiley-code" myself, and thus keep
that off, but certainly take advantage of the URL detection.  (It's not so
much the active URLs I like, tho I use them; in any case I could simply
select the URL and KDE's clip-helper, klipper, would pop up, giving me
multiple possible action choices, not just one.  It's the color-coding
that I'd miss.  Once you get used to colored quote levels, URLs, sigs, etc,
you don't want to do without! =:^)

> Can anyone else see what I am seeing, here?

Yes.  It's an overlooked corner case that I'd call a bug, and one that
needs fixing.  I guess we'll see about the punctuation cases above, after
I post this...

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman