help-gnu-emacs
Re: how to calculate the size of string in bytes?


From: tomas
Subject: Re: how to calculate the size of string in bytes?
Date: Tue, 18 Aug 2015 12:13:52 +0200
User-agent: Mutt/1.5.21 (2010-09-15)

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Tue, Aug 18, 2015 at 02:11:54AM -0700, Sam Halliday wrote:
> Hi all,
> 
> We've had to change the ENSIME protocol to be more friendly to other editors 
> and this has meant changing how we frame TCP messages.
> 
> We used to have a 6 character hex number at the start of each message that 
> counted the number of multibyte characters, but we'd like to change it to be 
> the number of bytes in the message.
> 
> We're sending the string to `process-send-string' and `read'ing from the 
> associated network buffer. But when calculating the outgoing length of the 
> string that we want to send, we use `length' --- but we need this to be 
> `length-in-bytes' not the number of multibyte chars. Is there a built in 
> function to do this or am I going to have to iterate the string and count the 
> byte size of each character?
> 
> A quick test shows that
> 
>   (length (encode-coding-string "EURO" 'raw-text))
> 
> seems to give the correct result (1 for ASCII, 2 for Pound Sterling, 3 for 
> Euro), but I am not 100% sure if this is correct.

`raw-text' is, afaik, close to Emacs's internal representation rather than
a real wire encoding. You don't want traces of it in the network :-)

I'd expect you to use whichever coding system the network protocol prescribes
(these days it'd be UTF-8 by default). Things will (mostly) work for raw-text
since it's nearly UTF-8.
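
To make the difference concrete, here is a minimal sketch of how the byte
count depends on the coding system. The helper name `my-byte-length' is
made up for illustration (it is not part of Emacs or ENSIME); only
`encode-coding-string' and `length' are real built-ins.

```elisp
;; Hypothetical helper, not part of Emacs or ENSIME.
(defun my-byte-length (string coding-system)
  "Return the number of bytes STRING occupies when encoded with CODING-SYSTEM."
  ;; `encode-coding-string' returns a unibyte string, so `length' counts bytes.
  (length (encode-coding-string string coding-system)))

(my-byte-length "a" 'utf-8)            ; 1
(my-byte-length "\u00a3" 'utf-8)       ; 2, POUND SIGN
(my-byte-length "\u20ac" 'utf-8)       ; 3, EURO SIGN
(my-byte-length "\u00a3" 'iso-8859-1)  ; 1, same character, different wire size
(length "\u20ac")                      ; 1, counts characters, not bytes
```

The same character can take a different number of bytes on the wire, which
is why the length has to be computed with the coding system actually used
for sending.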

The really correct way to do this (AFAICS) would be to find out which encoding
process-send-string is going to use (via process-coding-system) and use *that*
in the length calculation -- this way you won't lie :-)

So I'd try this (slightly reordering the let*)

  (let* ((msg (concat (ensime-prin1-to-string sexp) "\n"))
         (coding-system (cdr (process-coding-system proc)))
         (string (concat (ensime-net-encode-length
                          (length (encode-coding-string msg coding-system)))
                         msg)))
    ...


It seems somewhat wasteful to encode msg (to find its length) just
to let process-send-string encode again -- perhaps there's a better
idiom around for that. The use case seems common enough. Anyone?
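
One candidate, offered as a sketch rather than a tested ENSIME change:
encode the message once, count the bytes of the result, and send the
already-encoded (unibyte) string. The name `my-frame-message' is
hypothetical, and the six-hex-digit header is an assumption taken from
Sam's description; the real `ensime-net-encode-length' may format it
differently.

```elisp
;; Hypothetical helper: the name and the header format are assumptions,
;; based only on the "6 character hex number" described in the question.
(defun my-frame-message (msg coding-system)
  "Return MSG encoded with CODING-SYSTEM, prefixed by its byte count.
The prefix is the payload size in bytes, written as six hex digits."
  (let ((payload (encode-coding-string msg coding-system)))
    ;; PAYLOAD is unibyte, so `length' counts bytes here.
    (concat (format "%06x" (length payload)) payload)))

;; Usage sketch; PROC is assumed to be a live network process and the
;; payload is a made-up example.  Setting the encoding side to `binary'
;; is meant to keep `process-send-string' from converting the bytes again.
;; (set-process-coding-system proc (car (process-coding-system proc)) 'binary)
;; (process-send-string proc (my-frame-message "(hello)\n" 'utf-8-unix))
```

Using `utf-8-unix' pins the line ending to LF, so the byte count does not
depend on any end-of-line conversion.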

regards
- -- tomás
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)

iEYEARECAAYFAlXTBWAACgkQBcgs9XrR2kYjzACfVd/+R0wNKqWVt5sXxX/9WVj2
OjQAnRRuUdorjnIjd+tpL4z7frx1JGYZ
=yjMt
-----END PGP SIGNATURE-----


