libmicrohttpd

Re: [libmicrohttpd] Post Processing With Spaces


From: Kenneth Mastro
Subject: Re: [libmicrohttpd] Post Processing With Spaces
Date: Wed, 17 Sep 2014 09:18:38 -0400

I did some research on encodings as a follow-up.  It seems that both '+' and '%20' are considered valid encodings for spaces in form data.  There are several sources for this information; here are a few:

http://www.w3schools.com/tags/ref_urlencode.asp
http://stackoverflow.com/questions/1634271/url-encoding-the-space-character-or-20
http://en.wikipedia.org/wiki/Percent-encoding

Given this information, I modified MHD's 'MHD_http_unescape' function to accept the '+' sign as a space, and it worked as expected.  It was just an additional 'case' at the top of the switch (see below).  5 lines of code.  It could be shortened a bit if the 'default' clause is changed (so the wpos++ and rpos++ were outside the switch), but I didn't want to be presumptuous.

In 'internal.c':
----------------------------------
size_t
MHD_http_unescape (void *cls,
                   struct MHD_Connection *connection,
                   char *val)
{
  char *rpos = val;
  char *wpos = val;
  char *end;
  unsigned int num;
  char buf3[3];

  while ('\0' != *rpos)
    {
      switch (*rpos)
        {
        case '+':
          *wpos = ' ';
          wpos++;
          rpos++;
          break;
        case '%':
          if ( ('\0' == rpos[1]) ||
               ('\0' == rpos[2]) )
          {
            *wpos = '\0';
            return wpos - val;
          }
          buf3[0] = rpos[1];
          ....
----------------------------------


In URL encoding, a literal '+' is encoded as "%2B", so this solution really should just work all the time (i.e., it's not going to inadvertently remove a '+').
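
To make the '+' versus "%2B" point concrete, here is a minimal standalone sketch of the same idea. It is not MHD's code: the names form_unescape and hex_value are hypothetical, and copying a malformed '%' sequence through literally is an assumption of this sketch (the MHD snippet above truncates the string instead).

```c
#include <stddef.h>

/* Hypothetical helper (not part of MHD): value of one hex digit, or -1. */
static int
hex_value (char c)
{
  if (c >= '0' && c <= '9')
    return c - '0';
  if (c >= 'a' && c <= 'f')
    return c - 'a' + 10;
  if (c >= 'A' && c <= 'F')
    return c - 'A' + 10;
  return -1;
}

/* Hypothetical sketch: decode application/x-www-form-urlencoded text
   in place and return the decoded length. */
size_t
form_unescape (char *val)
{
  char *rpos = val;
  char *wpos = val;

  while ('\0' != *rpos)
    {
      if ('+' == *rpos)
        {
          /* A raw '+' always means a space in form data. */
          *wpos++ = ' ';
          rpos++;
        }
      else if ('%' == *rpos)
        {
          /* rpos[2] is only read when rpos[1] is a hex digit,
             so we never read past the terminating '\0'. */
          int hi = hex_value (rpos[1]);
          int lo = (hi >= 0) ? hex_value (rpos[2]) : -1;

          if (lo >= 0)
            {
              /* "%2B" becomes a literal '+' here and is not
                 looked at again, so it never turns into a space. */
              *wpos++ = (char) ((hi << 4) | lo);
              rpos += 3;
            }
          else
            {
              /* Assumption: keep a malformed '%' as-is. */
              *wpos++ = *rpos++;
            }
        }
      else
        {
          *wpos++ = *rpos++;
        }
    }
  *wpos = '\0';
  return (size_t) (wpos - val);
}
```

Because each input character is consumed exactly once, "a+b" decodes to "a b" while "a%2Bb" decodes to "a+b", which is why the extra case cannot swallow a real plus sign.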

That said, I'm not sure this is the correct solution.  Thoughts/comments?  Worthwhile addition to MHD, or is this wrong for some reason?

I can't think of why this would be a bad thing to include, but I'm certainly open to other ideas and/or just not using MHD's post processor at all.


Ken



On Wed, Sep 17, 2014 at 8:44 AM, Kenneth Mastro <address@hidden> wrote:
All,

I'm using MHD's post-processor to process form data and several AJAX requests.  I have noticed that when the encoding is 'application/x-www-form-urlencoded', strings with spaces contain a '+' sign instead of the spaces.

For form data, if I explicitly set the encoding to 'multipart/form-data', the strings are parsed properly and there are no '+'s, which is how I've been getting around the problem (I assumed I was doing something wrong and haven't had time to dig into it).  However, this isn't working for my AJAX requests - setting the encoding to 'multipart/form-data' breaks things in ways I haven't fully investigated yet.  I consider that a hack anyway, so I don't really want to pursue it.  I need to figure out why 'application/x-www-form-urlencoded' isn't working for me.

In looking at the 'Content-Type' the server is receiving for the AJAX requests, it is 'application/x-www-form-urlencoded; charset=UTF-8'.  I thought the charset might be causing an issue, but I'm having trouble getting jQuery to not use UTF-8.  From the jQuery ajax page: "The W3C XMLHttpRequest specification dictates that the charset is always UTF-8; specifying another charset will not force the browser to change the encoding."  I.e., I'm stuck with UTF-8 because it's the standard, which I'm fine with.  Regardless, MHD successfully creates the post processor, so it's seeing the actual base encoding (this works because it only compares the first chunk of chars of the content type - essentially ignoring the charset part).
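
As an illustration of the prefix comparison described above, here is a minimal sketch. It is not MHD's actual code; the function names has_prefix_caseless and is_form_urlencoded are hypothetical, and it only assumes that the check looks at the leading characters of the header value.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: 1 if 's' starts with 'prefix', ignoring ASCII case. */
static int
has_prefix_caseless (const char *s, const char *prefix)
{
  for (; '\0' != *prefix; s++, prefix++)
    {
      /* If 's' ends early, '\0' differs from the prefix character. */
      if (tolower ((unsigned char) *s) != tolower ((unsigned char) *prefix))
        return 0;
    }
  return 1;
}

/* Hypothetical check: anything after the base type, such as
   "; charset=UTF-8", is simply never examined. */
int
is_form_urlencoded (const char *content_type)
{
  return has_prefix_caseless (content_type,
                              "application/x-www-form-urlencoded");
}
```

With a check of this shape, 'application/x-www-form-urlencoded; charset=UTF-8' is accepted just like the bare type, which matches the behaviour described here.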

MHD does not seem to provide an option for REPLACING a header (i.e., using MHD_set_connection_value only ADDS a header - it won't replace the existing Content-Type header), so even if I actually could be sure the data was ASCII, I can't fix this in the server without doing my own POST processing.  I doubt that would work anyway unless I could get the web page / browser to not do UTF-8 somehow.  (Although I think ASCII is a subset of UTF-8, maybe there are differences even in those low-numbered characters I'm not aware of?)

Anyway - in short - my question is: Is the MHD post processor just failing on 'application/x-www-form-urlencoded' data?  I.e., is it not parsing out the +'s when it should?  Or does MHD not work with UTF-8 encoded data (despite all the characters being in the ASCII range), so that I need to do my own POST processing?  Or does this actually work and I'm just doing something wrong?


Thanks much,
Ken


