Re: [Bug-wget] [RFC] Extend concurrency support
From: Tim Ruehsen
Subject: Re: [Bug-wget] [RFC] Extend concurrency support
Date: Tue, 20 May 2014 16:10:49 +0200
User-agent: KMail/4.12.4 (Linux/3.14-1-amd64; KDE/4.13.1; x86_64; ; )
On Tuesday 20 May 2014 12:56:48 Giuseppe Scrivano wrote:
> Tim Ruehsen <address@hidden> writes:
> > most of this is already solved in https://github.com/rockdaboot/mget, which
> > was originally intended as a 'modern' Wget. I would like to see Mget and
> > Wget merge into something like 'Wget2'. At least, feel free to move code
> > from Mget into Wget as you wish (I am the author and copyright holder of
> > Mget, and both projects have the same license).
>
> I'm afraid that Jure can't copy any existing code for his Summer of Code
> project, but has to reinvent the wheel if needed...
I was already afraid of that.
> > History...
> > I was at the same point as you some years ago. After looking at
> > Wget, I found that its code had to be redesigned. I had two choices:
> > struggle with the grown code or restart from scratch. I chose the second
> > because I didn't see a chance of getting huge code changes into Wget. Either
> > you have to discuss every little change or you end up with your own code
> > branch, which might get integrated into master sometime in the next few
> > years.
> >
> > It has been asked many times and I'll ask again: shouldn't we start
> > Wget2 development, maybe with Jure as "project leader" (if you want)? I
> > made a start with Mget (e.g. consistently putting reusable code into a
> > library)... and I would spend some time helping to merge Mget and Wget.
> > Due to the library-based character of Mget, it shouldn't be too hard.
>
> ...but in the long term we can avoid that task and reuse existing
> wheels. Not sure what other people think about it, but I think wget2,
> whatever it will be, should be based on libcurl and focus wget
> development on what wget does better, e.g. recursive downloads.
Libcurl is one option (and not the worst one). It would replace the HTTP
and FTP send/receive code (plus the underlying TCP network handling; what
about DNS caching?). But that is only a small part of Wget's code.
You still need (just to name a few):
- basic algorithms and data structures like hashmaps (e.g. stringmaps),
vectors, lists/queues, buffers/growables, etc.
- an HTML/XML scanner for HTML, Metalink, sitemaps, Atom/RSS feeds
- a CSS scanner
- file hashing
- locale encoding/decoding for filenames (IDNA)
- HSTS functions
- cookie logic (incl. public suffix handling)
- robots.txt handling
- a threading abstraction API
- a threading model incl. communication
Mget/libmget has all of these, including tests, and its functionality is
strongly geared toward what a tool like Wget needs. Libmget is modular enough
that the HTTP/HTTPS and networking code could easily be replaced with
libcurl.
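To illustrate the kind of modularity meant here, a minimal sketch, assuming a design in which everything above the network layer (recursion, HTML/CSS scanning, cookies, robots.txt) calls the transport only through a table of function pointers. All names below (`transport_backend`, `backend_by_name`, `download`, `native_fetch`, `curl_fetch`) are hypothetical and are not libmget's actual API; both backends are stubs that do no real networking.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical transport interface: callers only see this table, so the
 * HTTP/HTTPS and networking layer can be swapped without touching them. */
typedef struct {
	const char *name;
	/* Fetch url into buf (at most len bytes incl. NUL); return 0 on success. */
	int (*fetch)(const char *url, char *buf, size_t len);
} transport_backend;

/* Stub standing in for a native HTTP client (no real networking). */
static int native_fetch(const char *url, char *buf, size_t len)
{
	int n = snprintf(buf, len, "native:%s", url);
	return (n < 0 || (size_t) n >= len) ? -1 : 0;
}

/* Stub standing in for a libcurl-based client; a real one would set up an
 * easy handle and call curl_easy_perform(). */
static int curl_fetch(const char *url, char *buf, size_t len)
{
	int n = snprintf(buf, len, "curl:%s", url);
	return (n < 0 || (size_t) n >= len) ? -1 : 0;
}

static const transport_backend backends[] = {
	{ "native", native_fetch },
	{ "curl", curl_fetch },
};

/* Look up a backend by name; returns NULL if the name is unknown. */
static const transport_backend *backend_by_name(const char *name)
{
	for (size_t i = 0; i < sizeof(backends) / sizeof(backends[0]); i++)
		if (strcmp(backends[i].name, name) == 0)
			return &backends[i];
	return NULL;
}

/* The only entry point higher layers (e.g. a recursive downloader) use. */
static int download(const transport_backend *backend, const char *url,
                    char *buf, size_t len)
{
	assert(backend != NULL);
	return backend->fetch(url, buf, len);
}
```

With this layout, switching from the native client to libcurl would be a matter of choosing `backend_by_name("curl")` instead of `backend_by_name("native")`; no caller above `download()` changes.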
I assume that one of Wget2's most interesting features will be a library that
gives third parties easy access to Wget's functionality. So why not merge the
existing Wget code into libmget? Most of what would need merging is Windows
and VMS compatibility code.
> Daniel will probably agree with me :-)
Definitely he will.
Regards, Tim