Re: [Lynx-dev] lynx -dump -width caps out at 1000 characters
From: Jay Hacker
Subject: Re: [Lynx-dev] lynx -dump -width caps out at 1000 characters
Date: Thu, 29 Sep 2011 13:45:27 -0400

They are normal <p> paragraphs, not under my control, and I probably
don't have a million characters, but I might have 100k, and I don't
know a priori what the maximum might be. I don't need them to
actually render on the screen (which is why I'm using -dump); I wanted
to use Lynx as an HTML-to-text converter in an application where it's
important that paragraphs stay together. Due to Lynx's limitations, I
decided on BoilerPipe, which seems to do fine. Thank you for your
help.
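
For the use case described above (HTML-to-text conversion where each
paragraph must stay on one line, whatever its length), a minimal sketch
using only Python's standard library might look like the following. It
is an illustration under stated assumptions, not how Lynx or BoilerPipe
work: the names ParagraphExtractor and html_to_paragraphs are
hypothetical, only text inside <p> elements is kept, and nested block
elements, scripts, and styles get no special handling.

```python
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collect the text of each <p> element as one unwrapped line.

    Hypothetical helper for illustration; text outside <p> is ignored.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.paragraphs = []   # finished paragraphs, one string each
        self._chunks = None    # text pieces of the open <p>, or None

    def _flush(self):
        # Finish the current paragraph, collapsing every whitespace run
        # (including newlines in the source) into a single space.
        if self._chunks is not None:
            text = " ".join("".join(self._chunks).split())
            if text:
                self.paragraphs.append(text)
            self._chunks = None

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._flush()      # an unclosed <p> ends where the next begins
            self._chunks = []

    def handle_endtag(self, tag):
        if tag == "p":
            self._flush()

    def handle_data(self, data):
        if self._chunks is not None:
            self._chunks.append(data)

    def close(self):
        super().close()        # lets the parser deliver any buffered text
        self._flush()


def html_to_paragraphs(html):
    """Return a list with one string per <p>, with no line wrapping."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return parser.paragraphs
```

Printing the result with "\n\n".join(html_to_paragraphs(page)) gives one
line per paragraph separated by blank lines, so paragraph boundaries
survive regardless of length. There is no width setting here because
nothing is ever wrapped.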
On Tue, Sep 27, 2011 at 5:21 PM, Thomas Dickey <address@hidden> wrote:
> On Tue, 27 Sep 2011, Stefan Caunter wrote:
>
>> On Tue, Sep 27, 2011 at 5:15 PM, Thomas Dickey <address@hidden> wrote:
>>>
>>> On Tue, 27 Sep 2011, Jay Hacker wrote:
>>>
>>>> I'm trying to dump some web pages with very long paragraphs, and it's
>>>> important to maintain correct paragraph boundaries. I tried:
>>>>
>>>> $ lynx -dump -width 1000000 mypage.html
>>>>
>>>> but it seems paragraphs still get wrapped at about 1000 characters.
>>>> Is there a hard limit on the maximum paragraph wrap width? Can it be
>>>> increased?
>
> ...
>>
>> 1000000 as a width seems high, and unlikely to render on any screen
>> I've ever seen. Are the pages on a webserver you control, or are you
>> -dumping out someone else's pages?
>
> It sounded as if he's dumping <pre> text that is using lynx to
> do wrapping. For that sort of case, he's limited by lynx's maximum
> line-length.
>
> --
> Thomas E. Dickey
> http://invisible-island.net
> ftp://invisible-island.net