Re: Bytewise u??_conv_from_encoding
From: Marc Nieper-Wißkirchen
Subject: Re: Bytewise u??_conv_from_encoding
Date: Thu, 6 Jan 2022 10:30:36 +0100
On Wed, 5 Jan 2022 at 21:59, Bruno Haible <bruno@clisp.org> wrote:
>
> Hello Marc,
>
[...]
> > If I understand your classification correctly, I meant something more
> > like (E) than (D), I think. As an interface, I would propose
> > something along the following lines:
> >
> > decoder_t d = decoder_create (iconveh_t *cd);
> > switch (decoder_push (d, byte))
> > {
> > case DECODER_BYTE_READ:
> > char *res = decoder_result (d);
> > size_t len = decoder_length (d);
> > ...
>
> What does the programmer do here with res and len? This is where things
> get complex.
RES will point to an array of bytes of length LEN that holds a decoded
string (of multibyte characters). The consumer could then process the
result and call `decoder_push` again when the previous result has been
fully processed.
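
To make the intended control flow concrete, here is a minimal sketch of such a byte-wise decoder built on plain POSIX iconv(). Everything named `decoder_*` or `DECODER_*` is hypothetical and only mirrors the proposal above; it uses an `iconv_t` directly instead of gnulib's `iconveh_t`, omits `DECODER_EOF`, and the buffer sizes are arbitrary assumptions.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical status codes, modelled on the proposal in this thread. */
enum decoder_status { DECODER_BYTE_READ, DECODER_INCOMPLETE, DECODER_ERROR };

/* Hypothetical decoder state; only the iconv_* calls are real API. */
typedef struct
{
  iconv_t cd;
  char pending[16];  /* input bytes of a not yet complete sequence */
  size_t npending;
  char result[64];   /* output produced by the last successful push */
  size_t length;     /* number of valid bytes in result */
} decoder_t;

/* Returns 0 on success, -1 if the conversion is not supported. */
static int
decoder_init (decoder_t *d, const char *from, const char *to)
{
  assert (d != NULL);
  d->npending = 0;
  d->length = 0;
  d->cd = iconv_open (to, from);
  return d->cd == (iconv_t) -1 ? -1 : 0;
}

static enum decoder_status
decoder_push (decoder_t *d, unsigned char byte)
{
  if (d->npending == sizeof d->pending)
    return DECODER_ERROR;
  d->pending[d->npending++] = (char) byte;

  char *in = d->pending;
  size_t inleft = d->npending;
  char *out = d->result;
  size_t outleft = sizeof d->result;
  size_t r = iconv (d->cd, &in, &inleft, &out, &outleft);
  d->length = sizeof d->result - outleft;

  if (r == (size_t) -1 && errno != EINVAL)
    {
      /* EILSEQ or E2BIG: drop the input and reset the conversion state. */
      d->npending = 0;
      d->length = 0;
      iconv (d->cd, NULL, NULL, NULL, NULL);
      return DECODER_ERROR;
    }

  /* Keep whatever iconv did not consume for the next push. */
  memmove (d->pending, in, inleft);
  d->npending = inleft;
  /* EINVAL means the input ends in the middle of a multibyte sequence. */
  return r == (size_t) -1 ? DECODER_INCOMPLETE : DECODER_BYTE_READ;
}
```

With this sketch, pushing the UTF-8 bytes 0xC3 and 0xA4 into a decoder opened from "UTF-8" to "UTF-16LE" should report `DECODER_INCOMPLETE` for the first byte and `DECODER_BYTE_READ` for the second, after which `result` and `length` describe the decoded code unit. The caller releases the descriptor with `iconv_close (d.cd)`.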
>
> > case DECODER_EOF:
> > ...
> > case DECODER_INCOMPLETE:
> > ...
> > case DECODER_ERROR:
> > ...
> > }
> > ...
> > decoder_destroy (d);
>
> What you describe here is (D), in my view.
>
> (E) would look like this:
>
> extern decoder_t create_decoder_context (void);
> extern void push_bytes_into_decoder (const char *p, size_t n, decoder_t);
> extern void free_decoder_context (decoder_t);
Isn't this my solution from above, except that in my interface
`decoder_push` takes only a single byte (which could be changed, of
course)? How would one get the decoded result in your API?
> > > (B) means to use a different programming language. I can't recommend C++
> > > [1].
> >
> > The main problem I see with C++'s coroutines is that they are
> > stackless coroutines; their expressiveness is tiny compared to
> > languages with full coroutine support, to say nothing of programming
> > languages like Scheme with its first-class continuations.
>
> It doesn't surprise me. 'constexpr', another new addition to C++, similarly
> does only a fraction of what would be useful.
As a side note, constexpr may also become part of C23, but I haven't
looked into the details. Could you briefly explain, for someone who
doesn't know much about constexpr, what the shortcomings of C++'s
constexpr are?
> > > (C) is possible, but complex. See e.g. gnulib's pipe-filter-ii.c or
> > > pipe-filter-gi.c. Generally, threads are overkill when all you need are
> > > coroutines.
> >
> > I agree. Unfortunately, Posix's response to dropping makecontext and
> > friends seems to be to use threads. It would be great if C had a
> > lightweight context-swapping mechanism.
>
> Maybe. I think setcontext() has a severe problem; see
> <https://www.gnu.org/software/gnulib/manual/html_node/setcontext.html>.
makecontext also has the problem that it only takes arguments of type
int. This wasn't a problem on platforms where the size of an int is
the size of a pointer, but it is on typical 64-bit platforms, where a
pointer no longer fits into an int.
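
For illustration, the usual workaround is to split the pointer into two int-sized halves and reassemble it in the entry function. The following is a sketch only, assuming a glibc-like platform that still provides <ucontext.h>, an int of at least 32 bits, and pointers of at most 64 bits; `entry` and `roundtrip_pointer` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <ucontext.h>

static ucontext_t main_ctx, co_ctx;
static void *received;  /* pointer as reconstructed inside the coroutine */

/* makecontext only passes int-sized arguments, so the pointer arrives
   as two halves.  Shifting twice by 16 avoids an undefined shift by 32
   where uintptr_t is only 32 bits wide.  */
static void
entry (unsigned int hi, unsigned int lo)
{
  uintptr_t bits = ((uintptr_t) hi << 16 << 16) | lo;
  received = (void *) bits;
  /* Returning from here resumes the context named by uc_link.  */
}

/* Runs entry on its own stack, handing it P, and returns what it saw.  */
static void *
roundtrip_pointer (void *p)
{
  static char stack[64 * 1024];
  uintptr_t bits = (uintptr_t) p;

  assert (sizeof (uintptr_t) <= 8);
  if (getcontext (&co_ctx) != 0)
    return NULL;
  co_ctx.uc_stack.ss_sp = stack;
  co_ctx.uc_stack.ss_size = sizeof stack;
  co_ctx.uc_link = &main_ctx;
  /* The function pointer cast is the customary makecontext idiom.  */
  makecontext (&co_ctx, (void (*) (void)) entry, 2,
               (unsigned int) (bits >> 16 >> 16), (unsigned int) bits);

  received = NULL;
  if (swapcontext (&main_ctx, &co_ctx) != 0)
    return NULL;
  return received;
}
```

Calling `roundtrip_pointer (&x)` for some object `x` should give back `&x`, at the price of exactly the kind of boilerplate that a pointer-taking interface would make unnecessary.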
Another problem is that the context switching operations are not
necessarily faster than the threading API because they also have to go
through kernel code as the signal mask is part of the context. This
makes them unusable for most coroutines when they are supposed to be
fast.
That's why I have been advocating a *lightweight* context-swapping
mechanism that just saves and restores the general registers
(including the stack pointer).
> > By the way, libunistring's u??_conv_from_encoding does not seem to be
> > adapted to consuming buffers. The problem is that one doesn't know in
> > advance where boundaries of multi-byte sequences are so
> > u??_conv_from_encoding will likely signal a decoding error.
>
> Yes, u??_conv_from_encoding is made for converting entire strings.
> If you want to restart conversion after some bytes that are part of
> a multibyte character, you need the low-level iconv().
Okay, thanks.
Marc