Re: [Lynx-dev] Ot: how email size is established?
From: Mouse
Subject: Re: [Lynx-dev] Ot: how email size is established?
Date: Mon, 8 May 2023 23:17:37 -0400 (EDT)
This isn't really directly related to lynx, but many of the encoding
mechanisms can be (though in my experience usually aren't) used for Web
content.
> I am sending a file that is slightly larger than 7 meg.
> However I am getting a users.shellworld.net error claiming that the
> file exceeds the slightly above 10 meg size restriction.
If it said the _file_ size exceeded a 10M limit, it was badly worded.
More correctly, it would say the _message_ size exceeded a 10M limit,
or perhaps the _encoded_ file size did.
The 7M is probably the size on disk; what gets checked against the 10M
limit is probably the size of the email, including the _encoded_ file.
All ways of sending
non-plain-text files by email involve some kind of encoding. The
commonest is probably what is called `base64', which enlarges the file
by a factor of approximately four to three (ie, every three bytes of
file are four bytes of email - "approximately" because there is slight
additional overhead, roughly three percent, for line breaks).
7 megs times 4/3 turns into 4*(7/3) or some 9.333 megs. As for where
the rest comes from, the most likely thing that occurs to me is that
your 7 megs is binary megs, 1048576 (2^20) bytes each, but the 10
megs is decimal megs, 1000000 bytes each. 7 binary megs, times 4/3,
plus the end-of-line overhead, is about half a percent over 10 decimal
megs: 7*1048576 is 7340032 bytes; times 4/3 gives 9786709.333... bytes,
times 74/72 (the approximate end-of-line overhead: one CRLF inserted
every 72 octets) is 10058562.37+ octets.
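To make that arithmetic concrete, here is a minimal Python sketch. The function name and its defaults (72 characters per line, a 2-octet CRLF) are assumptions of mine taken from the estimate above; many mailers wrap at 76 characters instead, so treat the result as approximate.

```python
import base64

def estimated_base64_size(raw_bytes, line_len=72, eol_len=2):
    """Estimate the size in octets of raw_bytes of data after base64 encoding."""
    # Every 3 input bytes (rounding up for the padded final group) become 4 characters.
    encoded = 4 * ((raw_bytes + 2) // 3)
    # One line ending is added per line of line_len encoded characters.
    lines = (encoded + line_len - 1) // line_len
    return encoded + lines * eol_len

# Sanity check of the 3-to-4 ratio against the standard library encoder.
sample_len = len(base64.b64encode(bytes(300)))   # 300 bytes in, 400 characters out

file_size = 7 * 2**20                  # 7 binary megs = 7340032 bytes
email_size = estimated_base64_size(file_size)
over_limit = email_size > 10 * 10**6   # compare against 10 decimal megs

print(file_size, email_size, over_limit)
```

The estimate lands a few octets above the figure worked out by hand, because it rounds partial groups and partial lines up rather than using exact fractions. Either way it is just over 10 decimal megs.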
The size of a megabyte is a contentious issue. As the Jargon File
notes, when counting things (like bytes on disk) that naturally occur
in units of powers of two, prefixes like "kilo" and "mega" naturally
attract power-of-two meanings (1024, 1048576, etc). But in datacomm,
they traditionally use power-of-ten meanings (1000, 1000000, etc).
Compounding the confusion, back in the...late '80s, I think it was?,
disk makers decided they were going to use the decimal meanings when
labeling their disks, because it lets them label disks with artificially
inflated numbers without quite lying enough to get slapped with
misleading-advertising charges. (Personally, I think they still
should be; advertising, and in some cases devices, often say, in tiny
print, things like "based on 1GB = 1 billion bytes", which seems to me
like a clear admission that they _know_ they're being misleading.)
There even were, I'm told (I was just getting into geekdom at the
time; I wasn't buying disks), cases where the exact same device with
the exact same capacity was still sold - relabeled with the bigger
number.
Then, further compounding the confusion, someone came up with the
idiotic idea of sticking a marker into the SI prefixes to indicate
binary meanings, leading to things like KiB, MiB, etc. I have no idea
where this came from, unless it was initiated by disk-industry shills,
since as far as I can tell nobody else seriously used decimal units to
label storage. (Not even other parts of the storage industry do that;
you even now see things like a "4G" stick of memory, not a "4.29G"
stick of memory.)
One infamous fail here is the "1.44M" floppy, which is nothing of the
sort by either definition. It is 1474560 bytes, or 1.44 * 1024 * 1000;
the "M" is formed by multiplying one decimal K by one binary K, leading
to a unit nobody uses for anything else as far as I can tell. Perhaps
fortunately, those floppies are pretty much dead anyway by now.
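For the curious, the floppy arithmetic can be checked in a few lines of Python. This is only an illustrative sketch; the variable names are mine, and the byte count is the one given above.

```python
FLOPPY_BYTES = 1474560          # capacity quoted above

decimal_mega = 1000 * 1000      # power-of-ten "M"
binary_mega = 1024 * 1024       # power-of-two "M"
mixed_mega = 1024 * 1000        # the hybrid unit behind "1.44M"

in_decimal = round(FLOPPY_BYTES / decimal_mega, 5)   # 1.47456
in_binary = round(FLOPPY_BYTES / binary_mega, 5)     # 1.40625
in_mixed = round(FLOPPY_BYTES / mixed_mega, 5)       # 1.44

print(in_decimal, in_binary, in_mixed)
```

Only the mixed unit yields the advertised 1.44; neither consistent definition of "M" does.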
> Odd thing is that I have sent the same file previously, with no error
> at all.
Either it wasn't quite the same file (a smaller version of the same
thing, maybe) or the limit has been lowered in the meantime.
> what I am seeking as a short term solution is a way to understand how
> programs decide email size...
Well, for the full details, read the MIME RFCs. But, most briefly, the
message is assembled from various parts, such as the text you give it
and any attachments. Each part is, potentially, encoded, though
normally only attachments will be encoded in ways that enlarge them
significantly. There is also some overhead, delimiting the parts and
declaring their types and encodings and such, but that is small enough
that, unless you have a lot of tiny attachments or the like, you can
usually ignore it.
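If you want to see this for yourself, here is a rough Python sketch using the standard library's email package. It is not what any particular mail program does; the helper name, subject line, filename and the 300000-byte stand-in attachment are all made up for illustration. It builds a message with a text part and one binary attachment and reports the size of the result as it would go over the wire.

```python
from email.message import EmailMessage
from email.policy import SMTP

def message_size(body_text, attachment, filename="blob.bin"):
    """Build a two-part message and return its size in octets as sent over SMTP."""
    msg = EmailMessage()
    msg["Subject"] = "size test"
    msg.set_content(body_text)                 # plain-text part, barely enlarged
    msg.add_attachment(attachment,             # bytes attachments default to base64
                       maintype="application",
                       subtype="octet-stream",
                       filename=filename)
    return len(msg.as_bytes(policy=SMTP))      # SMTP policy uses CRLF line endings

attachment = bytes(300_000)                    # 300000 zero bytes as a stand-in file
total = message_size("See attached.\n", attachment)
print(len(attachment), total, round(total / len(attachment), 3))
```

The ratio printed at the end should come out a little above 4/3, with the excess being the line breaks plus the headers and part delimiters.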
/~\ The ASCII                             Mouse
\ / Ribbon Campaign
 X  Against HTML                mouse@rodents-montreal.org
/ \ Email!           7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B