Re: Another issue with thingatpt
From: Andreas Roehler
Subject: Re: Another issue with thingatpt
Date: Tue, 02 Jan 2007 14:34:46 +0100
User-agent: Thunderbird 1.5.0.4 (X11/20060516)
Bob Rogers wrote:
> From: Andreas Roehler <address@hidden>
> Date: Sun, 31 Dec 2006 10:25:35 +0100
>
> > > Both interfaces (ffap and thing-at-point) are already customizable,
> > > though in different ways.
>
> > There is no `defcustom' form in thingatpt.el;
> > it's done mostly with `defvar'. I wouldn't consider that
> > customizable.
>
> Not in the sense of defcustom, no. But someone who can't "customize" it
> themselves via setq is probably not going to be able to change these
> hairy regexps and/or char-classes without shooting themselves in the
> foot. It's not just a matter of understanding Emacs regexps, but
> understanding how thing-at-point uses them.
Probably you are right.
> In any case, it seems to me that users shouldn't need to change the
> regexp proper, since that is defined by RFC3986, just the set of
> punctuation characters to drop at the end.
Maybe I'm missing something, but as far as I can see, the regexp in
question is not strictly derived from the RFC. Here is the relevant
description from the RFC:
;;;;;;;;;;;;;;
reserved = gen-delims / sub-delims
gen-delims = ":" / "/" / "?" / "#" / "[" / "]" / "@"
sub-delims = "!" / "$" / "&" / "'" / "(" / ")"
/ "*" / "+" / "," / ";" / "="
...
Characters that are allowed in a URI but do not have a reserved
purpose are called unreserved. These include uppercase and lowercase
letters, decimal digits, hyphen, period, underscore, and tilde.
unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
;;;;;;;;;;;;;;;
That's basically what I find on the matter there.
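To make the comparison concrete, the sets above can be checked against the first bracket expression of thing-at-point-url-path-regexp. This is only a Python sketch for illustration (thingatpt.el itself is Emacs Lisp); the names GEN_DELIMS, SUB_DELIMS, UNRESERVED, THINGATPT_EXCLUDED and allowed_but_excluded are made up here, and the bracket expression is assumed to act as a plain set of literal characters.

```python
import string

# Character sets transcribed from the RFC 3986 grammar quoted above.
GEN_DELIMS = set(":/?#[]@")
SUB_DELIMS = set("!$&'()*+,;=")
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")
RFC_ALLOWED = GEN_DELIMS | SUB_DELIMS | UNRESERVED

# Characters excluded by the first (negated) bracket expression of
# thing-at-point-url-path-regexp:  [^]\t\n \"'()<>[^`{}]
# (hypothetical name; it simply lists the characters after "[^").
THINGATPT_EXCLUDED = set("]\t\n \"'()<>[^`{}")


def allowed_but_excluded():
    """RFC 3986 characters that the thingatpt class refuses to match."""
    return sorted(RFC_ALLOWED & THINGATPT_EXCLUDED)


print(allowed_but_excluded())  # prints ["'", '(', ')', '[', ']']
```

Under these assumptions the mismatch is the brackets, the parens and the apostrophe: characters the RFC permits but the regexp never matches. Conversely, the negated class accepts many characters the RFC does not list at all, such as `|' or `\'.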
> The only thing that needs to
> be customized is just the "lose the punctuation" heuristic, IMHO. And
> the definition of "punctuation" should be enlarged so that it addresses
> Slawomir's issue with parens, which are not even allowed internally.
> > The problem mentioned originally, however, shouldn't occur, as
> > ,----
> > | (defvar thing-at-point-url-path-regexp
> > |   "[^]\t\n \"'()<>[^`{}]*[^]\t\n \"'()<>[^`{}.,;]+"
> > |   "A regular expression probably matching the host and filename or
> > | e-mail part of a URL.")
> > `----
> > includes that character. The error must reside elsewhere.
> > Regards,
> > Andreas Roehler
> It does include a ";" in the second character class, but both are
> inverted. The second set is the same as the first set with the addition
> of ".,;", which is why it refuses to match any of these characters at
> the end of the URL. This would be easier to see if the regexp were
> written this way:
>
>   (defvar thing-at-point-url-path-regexp
>     (concat "[^]\t\n \"'()<>[^`{}]*"
>             "[^]\t\n \"'()<>[^`{}.,;]+")
>     "A regular expression probably matching the host and filename or e-mail
>   part of a URL.")
>
> -- Bob
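Bob's point can be checked with a small Python sketch. It copies the two string literals verbatim (Python's \t, \n and \" escapes agree with Emacs Lisp here) and assumes that, for these literal characters, Python's re bracket expressions behave like the Emacs ones. The helper url_path_from is hypothetical and simply anchors the match at the start of the string, roughly as if point were on the first character of the URL; it does not model how thingatpt.el actually locates the start of a match.

```python
import re

# The two halves of thing-at-point-url-path-regexp.  Both classes are
# negated; the second additionally excludes ".,;".  A "]" right after
# "[^" is read literally by both Emacs and Python.
BODY = "[^]\t\n \"'()<>[^`{}]*"
TAIL = "[^]\t\n \"'()<>[^`{}.,;]+"
URL_PATH_RE = re.compile(BODY + TAIL)


def url_path_from(text):
    """Match the path regexp at the start of TEXT (hypothetical helper)."""
    m = URL_PATH_RE.match(text)
    return m.group(0) if m else None


# An internal ";" survives, but a trailing ";" (or "." or ",") is dropped,
# because the match must end with at least one character from TAIL.
print(url_path_from("gnu.org/a;b; more"))  # prints gnu.org/a;b
print(url_path_from("gnu.org/emacs."))     # prints gnu.org/emacs
```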
Now I see it, thanks a lot.
BTW: what about dropping the `;' from the regexp's trailing character
class? Perhaps together with the comma, since the RFC lists that
character as a sub-delimiter too.
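The suggestion amounts to shrinking the extra set in the trailing class, which is also the "lose the punctuation" knob Bob mentions. A Python sketch under the same transliteration assumption as above; make_url_path_re and its trailing_punct parameter are hypothetical, not an existing thingatpt.el variable.

```python
import re

# Characters that neither half of thing-at-point-url-path-regexp matches.
# "]" must stay first so that "[^" + BASE_EXCLUDED reads it literally.
BASE_EXCLUDED = "]\t\n \"'()<>[^`{}"


def make_url_path_re(trailing_punct):
    """Build the two-part path regexp with a configurable trailing set.

    TRAILING_PUNCT (hypothetical) is spliced into a bracket expression
    unescaped, so it must not contain "]", "\\" or "-".
    """
    body = "[^" + BASE_EXCLUDED + "]*"
    tail = "[^" + BASE_EXCLUDED + trailing_punct + "]+"
    return re.compile(body + tail)


current = make_url_path_re(".,;")  # equivalent to the current defvar
proposed = make_url_path_re(".")   # ";" and "," may now end a match

text = "example.org/a;b;"
print(current.match(text).group(0))   # prints example.org/a;b
print(proposed.match(text).group(0))  # prints example.org/a;b;
```

The flip side: with the smaller set, a URL followed by a comma or semicolon in running prose would pick up that punctuation, which is presumably why the heuristic drops them today.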
Other problems:
- The character ' (39, #o47, #x27) currently seems to be excluded, although
  the RFC lists it as a sub-delimiter too.
- `thing-at-point-short-url-regexp', defined as
    (concat "[-A-Za-z0-9.]+" thing-at-point-url-path-regexp)
  lacks the underscore in its bracket expression, although the RFC lists
  it as unreserved.
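The host bracket expression can be compared with the RFC's unreserved set in the same way. A Python sketch; PROPOSED_HOST is a hypothetical variant, not something in thingatpt.el, and it keeps "-" first so that it is read literally rather than as a range.

```python
import re
import string

# RFC 3986: unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")

# Host part of thing-at-point-short-url-regexp, as quoted above.
CURRENT_HOST = "[-A-Za-z0-9.]+"
# Hypothetical variant admitting every unreserved character.
PROPOSED_HOST = "[-A-Za-z0-9._~]+"


def unreserved_missing_from(class_chars):
    """Return the RFC 3986 unreserved characters absent from CLASS_CHARS."""
    return sorted(UNRESERVED - set(class_chars))


def whole_host_matches(pattern, host):
    """True if PATTERN matches all of HOST."""
    return re.fullmatch(pattern, host) is not None


print(unreserved_missing_from(string.ascii_letters + string.digits + "-."))
# prints ['_', '~']
print(whole_host_matches(CURRENT_HOST, "my_host.example.org"))   # False
print(whole_host_matches(PROPOSED_HOST, "my_host.example.org"))  # True
```

Note that the tilde is missing from the host class as well.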
Andreas