groff
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v2] src/: ceil_prime(): Add function to get the lowest prime


From: G. Branden Robinson
Subject: Re: [PATCH v2] src/: ceil_prime(): Add function to get the lowest prime not smaller than n
Date: Wed, 13 Mar 2024 06:52:10 -0500

Hi Alex,

At 2024-03-13T12:01:32+0100, Alejandro Colomar wrote:
> I didn't write an exploit, as that would be a little bit more work
> (including learning what is indxbib at all).

Go ahead!  This, I want to see!  3:)

> But by reading the code, I'd say yes.
> 
> $ git show master:src/utils/indxbib/indxbib.cpp \
> | grepc main \
> | grep -C5 is_prime;
>       int requested_hash_table_size;
>       check_integer_arg('h', optarg, 1, &requested_hash_table_size);
>       hash_table_size = requested_hash_table_size;
>       if ((hash_table_size > 2) && (hash_table_size % 2) == 0)
>               hash_table_size++;
>       while (!is_prime(hash_table_size))
>         hash_table_size += 2;
>       if (hash_table_size != requested_hash_table_size)
>         warning("requested hash table size %1 is not prime: using %2"
>                 " instead", optarg, hash_table_size);
>       }
> 
> Let's imagine 'requested_hash_table_size' is 1.

You missed the change I made, and mentioned at the end of
<https://lists.gnu.org/archive/html/groff/2024-03/msg00060.html>.

> You had is_prime() with the same exact number of call sites: 2.

Well, "you" is kind of "the vague collective of groff developers" here.

> If you want to misbehave for an input of 0 from a library call, by
> adding an assertion, well that's not too bad, since you're the only
> user.  In fact, is_prime() has such an assertion, even being a library
> call.

I do.  I love assertions.  I believe that misbehaving programmers who
slouch their way through software engineering without checking their
assumptions should be punished at every occasion.

Because eventually, those assumptions _will_ be tested in the field.

Possibly in flight or while reactor core temperature is rising rapidly.

> *Mis*quoting someone (was it Torvalds who wrote the Linux coding
> style?  I guess he was):
> 
>       First off, I'd suggest printing out a copy of the _C++_
>       standard, and NOT read it.  Burn it, it's a great symbolic
>       gesture.

Linus's fans and C++ deserve each other.  :-P

> I don't like idiomatic C++.  In fact, the only thing I like from C++
> is that I can write idiomatic C in it (or mostly; some things don't
> compile, like using VLA notation in function parameters that are
> pointers).

C++ does have a problem in that its idioms change--with greater velocity
than many software practitioners would like.

That doesn't change the fact that groff is mostly implemented in it, and
the C++98 standardized version of the language is more widely understood
than the circa-1990 Annotated Reference Manual version of it.

The advantage to switching to STL containers for these hash tables would
be to get groff out of the business of maintaining its own
implementations in favor of those that have been more rigorously tested
and have a chance of being considerably more performant (because
optimization-happy lunatics like those in Boost have been chasing the
bleeding edge for decades, whereas groff's containers have only
stability to recommend them).

Obviously I'm somewhat lukewarm on that point.  The real advantage to me
is the prospect of deleting a lot of code that doesn't directly serve a
typesetting objective, making swaths of code Someone Else's Problem.

C has gone through a similar thing.  Slowly, slowly, its practitioners
have decided that, huh, maybe every project shouldn't carry around its
own implementations of elementary data structures.  Hence BSD's
sys/queue.h and GNU's glib.

It seems that, eventually, kids in industry got tired of doing on their
employer's dime the Data Structures and Algorithms homework assignments
they plagiarized in their sophomore year.  Or their managers finally
cottoned on to what was happening and learned what "software modularity"
was, apart from a buzzword.

(I remember those days.  What actually happened is that they and their
managers took turns counting their unvested stock options and taking
long hits from a huge crack pipe labelled "J2EE".)

Anyway, my point is that I'm not willing to do violence to groff's
codebase as a vehicle for expressing my dissatisfaction with C++,[1] and
I don't think anyone serving as its maintainer or lead developer should
allow that, either.  People are still getting work done with groff.

At 2024-03-13T12:05:37+0100, Alejandro Colomar wrote:
> On Wed, Mar 13, 2024 at 05:46:38AM -0500, G. Branden Robinson wrote:
> > Forgot to attach my working copy, of course.
> 
> You could have attached/inlined a diff; it would be smaller.  :)

Sorry.  I was in haste.

> This is a breaking change.

I disagree.  It's tighter validation, now rejecting an input value that
it should have been rejecting all along.

> My suggestion keeps accepting 1, with a warning --as before--, and
> translates it to 2 (instead of 3, which was being done previously).
> 
> So, I don't like this change, I think, unless you *want* to break 1,
> which I'm neutral to, since I don't know what this program does at
> all.

A hash table (or map) of size 1 is not a hash table, it's a list.

Creating a hashed index of bibliographic references that doesn't speed
lookups by being O(n) instead of O(1) in a linear search, but is instead
O(n) itself, complete with the overhead of additional storage (the
database file on disk) and I/O operations, is not an optimization, but a
pessimization.  Recalling indxbib's purpose, one might as well let
refer(1) read the text files in the first place, instead of diverting
its attention to binary database files that will be unhelpful.

Maurice J. Bach's _The Design of the Unix Operating System_ (1986), the
first book-length exposition of the Unix kernel that AT&T tolerated
sale of on the open market, contains a practical-minded exploration of
this subject in its discussion of the System V kernel's "buffer heads".

That's where I first learned about hash maps, because I tend to come at
things the wrong way around.

The BSD and Linux kernels do similar things.

Regards,
Branden

[1] Outside of source code comments, where it's fine to be opinionated.
    I don't regard those as "violence to the code base"; the code will
    work or it won't, irrespective of the intemperance of code comments.

Attachment: signature.asc
Description: PGP signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]