Re: Smart quotes via finite state machine
From: onf
Subject: Re: Smart quotes via finite state machine
Date: Tue, 24 Sep 2024 00:37:42 +0200
On Mon Sep 23, 2024 at 4:54 PM CEST, Douglas McIlroy wrote:
> Because the state diagram is complex, it may be hard for a user to see
> what's up when it does make a mistake.
>
> I infer that open-quot-space has two purposes: (1) to allow a single space
> to be quoted, as might happen in a language manual, and (2) to cater for
> the inch sign attached to a number. Back-transitions to this state seem to
> let in some strange behaviors.
Thank you for your input, Doug.
The rationale behind splitting open_quot into open_quot_no_space and
open_quot_space was that it's hard to decide in cases like this:

    He said "It should be no more than 17" wide"

which would indeed require an AI. So instead the state machine
throws an error, requiring the user to decide himself, hence
the error_ambiguous state.[1]
The idea was that such ambiguity happens whenever a quotation is open
and a digit occurs before a quotation mark, UNLESS there haven't been
any spaces yet:

    It simply said "20" without any hints as to what that might mean.

The effort to account for this edge case and to not fail silently makes
the diagram much more complex than would otherwise be required. However,
I notice now that the condition is in fact incorrect:

    The sign said "20" max" without any hints as to what it might mean.

I also hadn't considered the possibility of a single space within
quotation marks, so if you study the diagram more closely, you will
notice that the input:

    Something " "

would make the state machine go like so:

    Something  : no_quot
    space      : maybe_open1
    "          : maybe_open2
    space      : no_quot
    "          : no_quot

which is clearly wrong. I also notice that it doesn't work correctly
with empty quotes ("") or repeated quotes ("""Something""").
As a result I have greatly simplified the state machine so that it
embodies only the following rules:
(1) a quotation mark must be preceded by whitespace or be the first
    character on a line to be considered the beginning of a quotation
(2) it is an error to follow an opening quotation mark by another
(3) it is an error if the closing quotation mark follows a digit
This should hopefully ensure that on any sane text it either produces
correct results, or reports an error which the user will be required
to fix. (I initially intended to include some sort of error recovery
so that it would produce a list of errors rather than just the first,
but it seems to me that simply running:
    $ grep -Ev '^\.' FILE.tr | grep -En '[0-9]"[^]]'
is more likely to help find the issue(s).)
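
For illustration, here is a minimal Python sketch of the same check. It is
not part of the original proposal; the name suspicious_lines is hypothetical.
It skips lines that begin with a dot (assumed here to be troff requests) and
reports lines where a digit is followed by a quotation mark and then any
character other than "]". Because it walks the original text, the reported
numbers are line numbers in the file itself, whereas the second grep in a
pipeline numbers the lines of the already filtered stream.

```python
import re

# Digit, straight double quote, then any character other than "]".
CANDIDATE = re.compile(r'[0-9]"[^\]]')


def suspicious_lines(text):
    """Return (line_number, line) pairs that may hold an ambiguous quote."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if line.startswith("."):
            # Assumption: a leading dot marks a troff request line.
            continue
        if CANDIDATE.search(line):
            hits.append((number, line))
    return hits
```

Calling suspicious_lines(open("FILE.tr").read()) would list the candidate
lines for manual review.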
A diagram of the reworked state machine is attached.
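
As a rough companion to the diagram, the three rules can be sketched in
Python as follows. This is only an illustration under stated assumptions,
not the actual implementation: the state names, the function smarten and
the exception QuoteError are hypothetical, Unicode curly quotes stand in
for whatever groff character would really be emitted, and a quotation mark
that is neither preceded by whitespace nor closing an open quotation is
treated as an error, which the rules above do not spell out.

```python
OUTSIDE, JUST_OPENED, INSIDE = "outside", "just_opened", "inside"
OPEN_QUOTE, CLOSE_QUOTE = "\u201c", "\u201d"


class QuoteError(ValueError):
    """Raised when a quotation mark cannot be classified by the rules."""


def smarten(text):
    """Replace straight double quotes with curly ones, or raise QuoteError."""
    out = []
    state = OUTSIDE
    prev = None  # previous input character; None at the start of the text
    for pos, ch in enumerate(text):
        if ch != '"':
            out.append(ch)
            if state == JUST_OPENED:
                state = INSIDE
        elif state == OUTSIDE:
            # Rule (1): needs start of input or whitespace (incl. newline).
            if prev is not None and not prev.isspace():
                # Assumption: report this case instead of guessing.
                raise QuoteError(f"offset {pos}: quote not preceded by whitespace")
            out.append(OPEN_QUOTE)
            state = JUST_OPENED
        elif state == JUST_OPENED:
            # Rule (2): a quote directly after an opening quote is an error.
            raise QuoteError(f"offset {pos}: quote follows an opening quote")
        else:
            # Rule (3): a closing quote directly after a digit is an error.
            if prev in "0123456789":
                raise QuoteError(f"offset {pos}: closing quote follows a digit")
            out.append(CLOSE_QUOTE)
            state = OUTSIDE
        prev = ch
    return "".join(out)
```

With this sketch, the input Something " " yields an opening and a closing
quote around the space, while "20" and "" are both reported as errors.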
> A word of caution. AI is a dangerous tool. "Smart" features abound in
> Microsoft Word and its imitators. They are neat when they work, but can be
> extremely frustrating when they go wrong. Whenever one bites, you have to
> cook up a special trick--or a whole new layout--to get past it.
>
> A common example is automatic capitalization of the first letter of a
> "paragraph" (i.e. text that follows a newline character). Mere backspacing
> and retyping to get a lowercase letter doesn't solve the problem.
Yeah, I have lots of experience with such "smart features"... in fact,
the common attitude of such programs that they are smarter than me is
part of the reason why I switched to troff in the first place.
A register to toggle this feature on and off (set to off by default)
would solve this, in my opinion.
~ onf
[1] This would be fixed by replacing the quotation mark which isn't in
fact a quotation mark by the correct (groff or Unicode) character.
Attachment: quot.png (PNG image)