bug-wget
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Bug-wget] Submitting a patch fixing the --content-disposition optio


From: Vladimír Pýcha
Subject: Re: [Bug-wget] Submitting a patch fixing the --content-disposition option of wget
Date: Fri, 14 Feb 2014 15:59:13 +0100

Ok, attached is the new patch file with all edits mentioned in ChangeLog.

Cheers,
Vlad


On Fri, Feb 14, 2014 at 1:12 PM, Darshit Shah <address@hidden> wrote:

> That seems like a much cleaner patch thanks!!
>
> We usually mention every edit in the ChangeLog. Those missing are trivial.
> It would be great if you could those too. Specifically, cookies.c, and
> other files which you have changed, even of to change the function header.
>
> Else, Giuseppe while applying it may do it.
>
>
> On Fri, Feb 14, 2014 at 12:59 PM, Vladimír Pýcha <address@hidden> wrote:
>
>> Hi Darshit,
>> Thanks for your suggestions, I followed them. Please find attached the
>> improved patch, created with `git format-patch -1`.
>>
>> Cheers,
>> Vlad
>>
>>
>>
>> On Wed, Feb 12, 2014 at 4:33 PM, Darshit Shah <address@hidden> wrote:
>>
>>>
>>> On Wed, Feb 12, 2014 at 9:48 AM, Vladimír Pýcha <address@hidden>wrote:
>>>
>>>> Hello,
>>>> I have created a small patch for wget 1.15. It is related to the
>>>> experimental --content-disposition option. The patch is attached.
>>>>
>>>> This patch ensures that if the filename parameter of Content-Disposition
>>>> header is url-encoded as described in RFC 2231, then wget decodes it
>>>> (the
>>>> character set is ignored).
>>>>
>>>> Besides using the --content-disposition option, there may be also
>>>> needed to
>>>> use the --restrict-file-names=nocontrol option to avoid wget from
>>>> escaping
>>>> some characters of the filename.
>>>>
>>>> I have created a testing URL:
>>>> http://www.vladpride.cz/res/content-disposition-with-unicode.php
>>>>
>>>> The source code of the PHP script at that URL is as follows:
>>>>
>>>> <?php
>>>> $str = 'aácč dďsš zžrř iínň';
>>>> header('Content-Type: text/plain; charset=UTF-8');
>>>> header("Content-Disposition: attachment;
>>>> filename*=UTF-8''".rawurlencode($str.'.txt'));
>>>> echo $str;
>>>> ?>
>>>>
>>>> You can use the following command to test it:
>>>>
>>>> wget --content-disposition --restrict-file-names=nocontrol "
>>>> http://www.vladpride.cz/res/content-disposition-with-unicode.php";
>>>>
>>>> Using the above command, my patched version of wget correctly saves the
>>>> file as:
>>>>
>>>> aácč dďsš zžrř iínň.txt
>>>>
>>>> Using the same command with unpatched wget, the file is incorrectly
>>>> saved
>>>> as:
>>>>
>>>> a%C3%A1c%C4%8D%20d%C4%8Fs%C5%A1%20z%C5%BEr%C5%99%20i%C3%ADn%C5%88.txt
>>>>
>>>> It would be great if my patch could be incorporated into the next
>>>> release
>>>> of wget.
>>>>
>>>> I have created the patch file with the following command:
>>>>
>>>> diff -rupN wget-1.15/src/ wget-1.15-custom/src/ >
>>>> content-disposition.patch
>>>>
>>>> You can apply the patch with the following command, while in the
>>>> directory
>>>> where the source code tarball was extracted to:
>>>>
>>>> patch -p1 < ../content-disposition.patch
>>>>
>>>> Then the output will be like this:
>>>>
>>>> patching file src/http.c
>>>> patching file src/http.h
>>>> patching file src/url.c
>>>> patching file src/url.h
>>>>
>>>> Note that in my system, Ubuntu 12.04, I had to install package
>>>> libgnutls-dev to be able to compile wget.
>>>>
>>>> Cheers,
>>>> Vlad
>>>>
>>>
>>> Hi,
>>>
>>> Thanks for your patch! However, it would be great if you could add an
>>> entry to the ChangeLog in the git tree and recreate the patch file. Also,
>>> it would be easier to simply create the patch using `git format-patch -1`
>>>
>>> You can find the git repository for Wget at:
>>> http://git.savannah.gnu.org/cgit/wget.git
>>>
>>> Some comments about the code inline. (Gmail sometimes mangles lines. So
>>> I'm sorry if this looks ugly)
>>>
>>> iff -rupN wget-1.15/src/http.c wget-1.15-custom/src/http.c
>>> --- wget-1.15/src/http.c        2014-01-07 15:58:49.000000000 +0100
>>> +++ wget-1.15-custom/src/http.c 2014-02-12 08:27:28.713919909 +0100
>>> @@ -1066,7 +1066,20 @@ bool
>>>  extract_param (const char **source, param_token *name, param_token
>>> *value,
>>>                 char separator)
>>>  {
>>> +  return extract_param_new (source, name, value, separator, (bool *) 0);
>>> +}
>>> +
>>> +/* Like extract_param, but with the addition of parameter
>>> is_url_encoded that is set to true if the value is url-encoded (see RFC
>>> 2231 for details). */
>>>
>>> Your comment exceeds the 80 characters per line convention. It would be
>>> great if you could fix this.
>>>
>>> extract_param_new is *not* like extract_param. Because extract_param
>>> simply calls extract_param_new. It is like the old version of
>>> extract_param, but a new reader does not know that. Hence, simply move the
>>> old comment to your new function.
>>>
>>> +
>>> +bool
>>> +extract_param_new (const char **source, param_token *name, param_token
>>> *value,
>>> +               char separator, bool *is_url_encoded)
>>>
>>> I am not sure why you need to define extract_param_new. Why is not
>>> editing the former extract_param() method a better idea? Sure, you would be
>>> required to edit all invokations of the method in the source code, but such
>>> refactoring isn't difficult.
>>>
>>> +{
>>>    const char *p = *source;
>>> +  if (is_url_encoded)
>>> +    {
>>> +      *is_url_encoded = false;
>>> +    }
>>>
>>> Am I correct in interpreting that you are simply forcing the value of
>>> is_url_encoded to be false? Why have a parameter in that case? I might be
>>> wrong, but please explain this.
>>>
>>>    while (c_isspace (*p)) ++p;
>>>    if (!*p)
>>> @@ -1125,6 +1138,9 @@ extract_param (const char **source, para
>>>    int param_type = modify_param_name(name);
>>>    if (NOT_RFC2231 != param_type)
>>>      {
>>> +      if (RFC2231_ENCODING == param_type && is_url_encoded) {
>>> +        *is_url_encoded = true;
>>> +      }
>>>        modify_param_value(value, param_type);
>>>      }
>>>    return true;
>>> @@ -1137,13 +1153,17 @@ extract_param (const char **source, para
>>>  /* Appends the string represented by VALUE to FILENAME */
>>>
>>>  static void
>>> -append_value_to_filename (char **filename, param_token const * const
>>> value)
>>> +append_value_to_filename (char **filename, param_token const * const
>>> value, bool is_url_encoded)
>>>  {
>>>    int original_length = strlen(*filename);
>>>    int new_length = strlen(*filename) + (value->e - value->b);
>>>    *filename = xrealloc (*filename, new_length+1);
>>>    memcpy (*filename + original_length, value->b, (value->e -
>>> value->b));
>>>    (*filename)[new_length] = '\0';
>>> +  if (is_url_encoded)
>>> +    {
>>> +      url_unescape(*filename + original_length);
>>> +    }
>>>  }
>>>
>>>  #undef MAX
>>> @@ -1176,7 +1196,8 @@ parse_content_disposition (const char *h
>>>  {
>>>    param_token name, value;
>>>    *filename = NULL;
>>> -  while (extract_param (&hdr, &name, &value, ';'))
>>> +  bool is_url_encoded;
>>> +  for (is_url_encoded = false; extract_param_new (&hdr, &name, &value,
>>> ';', &is_url_encoded); is_url_encoded = false)
>>>      {
>>>        int isFilename = BOUNDED_EQUAL_NO_CASE ( name.b, name.e,
>>> "filename" );
>>>        if ( isFilename && value.b != NULL)
>>> @@ -1192,9 +1213,15 @@ parse_content_disposition (const char *h
>>>              continue;
>>>
>>>            if (*filename)
>>> -            append_value_to_filename (filename, &value);
>>> +            append_value_to_filename (filename, &value, is_url_encoded);
>>>            else
>>> -            *filename = strdupdelim (value.b, value.e);
>>> +            {
>>> +              *filename = strdupdelim (value.b, value.e);
>>> +              if (is_url_encoded)
>>> +                {
>>> +                  url_unescape (*filename);
>>> +                }
>>> +            }
>>>          }
>>>      }
>>>
>>> diff -rupN wget-1.15/src/http.h wget-1.15-custom/src/http.h
>>> --- wget-1.15/src/http.h        2014-01-04 13:49:47.000000000 +0100
>>> +++ wget-1.15-custom/src/http.h 2014-02-12 08:27:28.725919927 +0100
>>> @@ -44,6 +44,7 @@ typedef struct {
>>>    const char *b, *e;
>>>  } param_token;
>>>  bool extract_param (const char **, param_token *, param_token *, char);
>>> +bool extract_param_new (const char **, param_token *, param_token *,
>>> char, bool *);
>>>
>>>
>>>  #endif /* HTTP_H */
>>> diff -rupN wget-1.15/src/url.c wget-1.15-custom/src/url.c
>>> --- wget-1.15/src/url.c 2014-01-04 13:49:47.000000000 +0100
>>> +++ wget-1.15-custom/src/url.c  2014-02-12 08:27:28.725919927 +0100
>>> @@ -169,7 +169,7 @@ static const unsigned char urlchr_table[
>>>     The transformation is done in place.  If you need the original
>>>     string intact, make a copy before calling this function.  */
>>>
>>> -static void
>>> +void
>>>  url_unescape (char *s)
>>>  {
>>>    char *t = s;                  /* t - tortoise */
>>> diff -rupN wget-1.15/src/url.h wget-1.15-custom/src/url.h
>>> --- wget-1.15/src/url.h 2013-10-21 16:50:12.000000000 +0200
>>> +++ wget-1.15-custom/src/url.h  2014-02-12 08:27:28.725919927 +0100
>>> @@ -100,6 +100,7 @@ struct url
>>>  /* Function declarations */
>>>
>>>  char *url_escape (const char *);
>>> +void url_unescape (char *);
>>>  char *url_escape_unsafe_and_reserved (const char *);
>>>
>>>  struct url *url_parse (const char *, int *, struct iri *iri, bool
>>> percent_encode);
>>>
>>>
>>>
>>> --
>>> Thanking You,
>>> Darshit Shah
>>>
>>>
>>
>
>
> --
> Thanking You,
> Darshit Shah
>
>

Attachment: 0001-URL-decode-the-filename-parameter-of-Content-Disposi.patch
Description: Text Data


reply via email to

[Prev in Thread] Current Thread [Next in Thread]