guile-commits
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Guile-commits] GNU Guile branch, stable-2.0, updated. v2.0.7-73-g419c87


From: Andy Wingo
Subject: [Guile-commits] GNU Guile branch, stable-2.0, updated. v2.0.7-73-g419c873
Date: Wed, 30 Jan 2013 14:30:57 +0000

This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "GNU Guile".

http://git.savannah.gnu.org/cgit/guile.git/commit/?id=419c8736243e78a60020f5215ab223af6e9b7bb1

The branch, stable-2.0 has been updated
       via  419c8736243e78a60020f5215ab223af6e9b7bb1 (commit)
       via  e10c250928bc6c4116d6344616d39f3c52edc36b (commit)
      from  1260fd0b2c4ce1d0d7e7b17df924c245f67f9058 (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
commit 419c8736243e78a60020f5215ab223af6e9b7bb1
Author: Andy Wingo <address@hidden>
Date:   Wed Jan 30 15:30:31 2013 +0100

    Revert "detect and consume byte-order marks for textual ports"
    
    This reverts commit b2cb557d75e4daf8c7c8cd43313f4cc51d9a3f1b, which was
    pushed accidentally.

commit e10c250928bc6c4116d6344616d39f3c52edc36b
Author: Andy Wingo <address@hidden>
Date:   Wed Jan 30 15:29:18 2013 +0100

    add #:doctype-handler to xml->sxml
    
    * module/sxml/simple.scm (read-internal-doctype-as-string): New helper.
      (xml->sxml): Add #:doctype-handler argument.
    
    * doc/ref/sxml.texi (Reading and Writing XML): Document
      #:doctype-handler.  Fix some other examples, and fix the default value
      of #:declare-namespaces?.
    
    * test-suite/tests/sxml.simple.test: Add all tests from the manual
      here.

-----------------------------------------------------------------------

Summary of changes:
 doc/ref/sxml.texi                 |   52 +++++++++++++++++++----
 libguile/fports.c                 |   35 +++++++---------
 libguile/load.c                   |    3 -
 libguile/ports.c                  |   85 +------------------------------------
 libguile/ports.h                  |    3 +-
 libguile/read.c                   |   14 +++++-
 module/sxml/simple.scm            |   56 +++++++++++++++++++------
 test-suite/tests/filesys.test     |   59 +-------------------------
 test-suite/tests/sxml.simple.test |   85 ++++++++++++++++++++++++++++++++++++-
 9 files changed, 200 insertions(+), 192 deletions(-)

diff --git a/doc/ref/sxml.texi b/doc/ref/sxml.texi
index 66584bf..6dc261f 100644
--- a/doc/ref/sxml.texi
+++ b/doc/ref/sxml.texi
@@ -57,7 +57,8 @@ to text.
 
 @deffn {Scheme Procedure} xml->sxml [string-or-port] [#:namespaces='()] @
        [#:declare-namespaces?=#t] [#:trim-whitespace?=#f] @
-       [#:entities='()] [#:default-entity-handler=#f]
+       [#:entities='()] [#:default-entity-handler=#f] @
+       [#:doctype-handler=#f]
 Use SSAX to parse an XML document into SXML. Takes one optional
 argument, @var{string-or-port}, which defaults to the current input
 port.  Returns the resulting SXML document.  If @var{string-or-port} is
@@ -99,18 +100,19 @@ for certain namespaces with the @code{#:namespaces} 
keyword argument to
 @result{} (*TOP* (foo (ns2:baz)))
 @end example
 
-Passing a true @code{#:declare-namespaces?} argument will cause the
-user-given @code{#:namespaces} to be treated as if they were declared on
-the root element.
+By default, namespaces passed to @code{xml->sxml} are treated as if they
+were declared on the root element.  Passing a false
address@hidden:declare-namespaces?} argument will disable this behavior,
+requiring in-document declarations of namespaces before use..
 
 @example
 (xml->sxml "<foo><ns2:baz/></foo>"
            #:namespaces '((ns2 . "http://example.org/ns2";)))
address@hidden error: undeclared namespace: `bar'
address@hidden (*TOP* (foo (ns2:baz)))
 (xml->sxml "<foo><ns2:baz/></foo>"
            #:namespaces '((ns2 . "http://example.org/ns2";))
-           #:declare-namespaces? #t)
address@hidden (*TOP* (foo (ns2:baz)))
+           #:declare-namespaces? #f)
address@hidden error: undeclared namespace: `bar'
 @end example
 
 By default, all whitespace in XML is significant.  Passing the
@@ -120,10 +122,10 @@ whitespace in front, behind and between elements, 
treating it as
 
 @example
 (xml->sxml "<foo>\n<bar> Alfie the parrot! </bar>\n</foo>")
address@hidden (*TOP* (foo "\n" (bar " Alfie the parrot! ") "\n")
address@hidden (*TOP* (foo "\n" (bar " Alfie the parrot! ") "\n"))
 (xml->sxml "<foo>\n<bar> Alfie the parrot! </bar>\n</foo>"
            #:trim-whitespace? #t)
address@hidden (*TOP* (foo (bar " Alfie the parrot! "))
address@hidden (*TOP* (foo (bar " Alfie the parrot! ")))
 @end example
 
 Parsed entities may be declared with the @code{#:entities} keyword
@@ -159,6 +161,38 @@ numeric character entities.
 @result{} (*TOP* (foo "\xa0 foo"))
 @end example
 
+By default, @code{xml->sxml} skips over the @code{<!DOCTYPE>}
+declaration, if any.  This behavior can be overridden with the
address@hidden:doctype-handler} argument, which should be a procedure of three
+arguments: the @dfn{docname} (a symbol), @dfn{systemid} (a string), and
+the internal doctype subset (as a string or @code{#f} if not present).
+
+The handler should return keyword arguments as multiple values, as if it
+were calling its continuation with keyword arguments.  The continuation
+accepts the @code{#:entities} and @code{#:namespaces} keyword arguments,
+in the same format that @code{xml->sxml} itself takes.  These entities
+and namespaces will be prepended to those given to the @code{xml->sxml}
+invocation.
+
address@hidden
+(define (handle-foo docname systemid internal-subset)
+  (case docname
+    ((foo)
+     (values #:entities '((greets . "<i>Hello, world!</i>"))))
+    (else
+     (values))))
+
+(xml->sxml "<!DOCTYPE foo><p>&greets;</p>"
+           #:doctype-handler handle-foo)
address@hidden (*TOP* (p (i "Hello, world!")))
address@hidden example
+
+If the document has no doctype declaration, the @var{doctype-handler} is
+invoked with @code{#f} for the three arguments.
+
+In the future, the continuation may accept other keyword arguments, for
+example to validate the parsed SXML against the doctype.
+
 @deffn {Scheme Procedure} sxml->xml tree [port]
 Serialize the SXML tree @var{tree} as XML. The output will be written to
 the current output port, unless the optional argument @var{port} is
diff --git a/libguile/fports.c b/libguile/fports.c
index fbc0530..10cf671 100644
--- a/libguile/fports.c
+++ b/libguile/fports.c
@@ -1,5 +1,5 @@
 /* Copyright (C) 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003,
- *   2004, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013 Free Software 
Foundation, Inc.
+ *   2004, 2006, 2007, 2008, 2009, 2010, 2011, 2012 Free Software Foundation, 
Inc.
  *
  * This library is free software; you can redistribute it and/or
  * modify it under the terms of the GNU Lesser General Public License
@@ -399,7 +399,7 @@ SCM_DEFINE (scm_open_file, "open-file", 2, 0, 0,
 #define FUNC_NAME s_scm_open_file
 {
   SCM port;
-  int fdes, flags = 0, scan_for_encoding = 0, consume_bom = 0, binary = 0;
+  int fdes, flags = 0, use_encoding = 1;
   unsigned int retries;
   char *file, *md, *ptr;
 
@@ -415,8 +415,6 @@ SCM_DEFINE (scm_open_file, "open-file", 2, 0, 0,
     {
     case 'r':
       flags |= O_RDONLY;
-      consume_bom = 1;
-      scan_for_encoding = 1;
       break;
     case 'w':
       flags |= O_WRONLY | O_CREAT | O_TRUNC;
@@ -434,12 +432,9 @@ SCM_DEFINE (scm_open_file, "open-file", 2, 0, 0,
        {
        case '+':
          flags = (flags & ~(O_RDONLY | O_WRONLY)) | O_RDWR;
-          consume_bom = 0;
          break;
        case 'b':
-         scan_for_encoding = 0;
-          consume_bom = 0;
-          binary = 1;
+         use_encoding = 0;
 #if defined (O_BINARY)
          flags |= O_BINARY;
 #endif
@@ -478,21 +473,21 @@ SCM_DEFINE (scm_open_file, "open-file", 2, 0, 0,
   port = scm_i_fdes_to_port (fdes, scm_i_mode_bits (mode),
                              fport_canonicalize_filename (filename));
 
-  if (consume_bom) 
-    scm_consume_byte_order_mark (port);
-
-  if (binary)
+  if (use_encoding)
+    {
+      /* If this file has a coding declaration, use that as the port
+        encoding.  */
+      if (SCM_INPUT_PORT_P (port))
+       {
+         char *enc = scm_i_scan_for_encoding (port);
+         if (enc != NULL)
+           scm_i_set_port_encoding_x (port, enc);
+       }
+    }
+  else
     /* If this is a binary file, use the binary-friendly ISO-8859-1
        encoding.  */
     scm_i_set_port_encoding_x (port, NULL);
-  else if (scan_for_encoding)
-    /* If this is an input port and the file has a coding declaration,
-       use that as the port encoding.  */
-    {
-      char *enc = scm_i_scan_for_encoding (port);
-      if (enc != NULL)
-        scm_i_set_port_encoding_x (port, enc);
-    }
 
   scm_dynwind_end ();
 
diff --git a/libguile/load.c b/libguile/load.c
index 476461c..84b6705 100644
--- a/libguile/load.c
+++ b/libguile/load.c
@@ -106,9 +106,6 @@ SCM_DEFINE (scm_primitive_load, "primitive-load", 1, 0, 0,
     scm_dynwind_begin (SCM_F_DYNWIND_REWINDABLE);
     scm_i_dynwind_current_load_port (port);
 
-    /* FIXME: For better or for worse, scm_open_file already scans the
-       file for an encoding.  This scans again; necessary for this
-       logic, but unnecessary overall.  */
     encoding = scm_i_scan_for_encoding (port);
     if (encoding)
       scm_i_set_port_encoding_x (port, encoding);
diff --git a/libguile/ports.c b/libguile/ports.c
index 9b1be9b..55808e2 100644
--- a/libguile/ports.c
+++ b/libguile/ports.c
@@ -1,5 +1,5 @@
 /* Copyright (C) 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2003, 2004,
- *   2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013 Free Software Foundation, 
Inc.
+ *   2006, 2007, 2008, 2009, 2010, 2011, 2012 Free Software Foundation, Inc.
  *
  * This library is free software; you can redistribute it and/or
  * modify it under the terms of the GNU Lesser General Public License
@@ -2153,89 +2153,6 @@ SCM_DEFINE (scm_set_port_filename_x, 
"set-port-filename!", 2, 0, 0,
 }
 #undef FUNC_NAME
 
-SCM_DEFINE (scm_consume_byte_order_mark, "consume-byte-order-mark", 1, 0, 0,
-            (SCM port),
-            "Peek ahead in @var{port} for a byte-order mark (\\uFEFF) 
encoded\n"
-            "in UTF-8 or in UTF-16.  If found, consume the byte-order mark\n"
-            "and set the port to the indicated encoding.\n"
-            "\n"
-            "As a special case, if the port's encoding is already UTF-16LE\n"
-            "or UTF-16BE (as opposed to UTF-16), we consider that the user\n"
-            "has already asked for an explicit byte order.  In this case no\n"
-            "scan is performed, and the byte-order mark (if any) is left in\n"
-            "the port.\n"
-            "\n"
-            "Return @code{#t} if a byte-order mark was consumed, and\n"
-            "@code{#f} otherwise.")
-#define FUNC_NAME s_scm_consume_byte_order_mark
-{
-  scm_t_port *pt;
-  const char *enc;
-
-  SCM_VALIDATE_PORT (1, port);
-
-  pt = SCM_PTAB_ENTRY (port);
-  enc = pt->encoding;
-
-  if (enc && strcasecmp (enc, "UTF-16BE") == 0)
-    return SCM_BOOL_F;
-
-  if (enc && strcasecmp (enc, "UTF-16LE") == 0)
-    return SCM_BOOL_F;
-
-  switch (scm_peek_byte_or_eof (port))
-    {
-    case 0xEF:
-      scm_get_byte_or_eof (port);
-      switch (scm_peek_byte_or_eof (port))
-        {
-        case 0xBB:
-          scm_get_byte_or_eof (port);
-          switch (scm_peek_byte_or_eof (port))
-            {
-            case 0xBF:
-              scm_get_byte_or_eof (port);
-              scm_i_set_port_encoding_x (port, "UTF-8");
-              return SCM_BOOL_T;
-            default:
-              scm_unget_byte (0xBB, port);
-              scm_unget_byte (0xEF, port);
-              return SCM_BOOL_F;
-            }
-        default:
-          scm_unget_byte (0xEF, port);
-          return SCM_BOOL_F;
-        }
-    case 0xFE:
-      scm_get_byte_or_eof (port);
-      switch (scm_peek_byte_or_eof (port))
-        {
-        case 0xFF:
-          scm_get_byte_or_eof (port);
-          scm_i_set_port_encoding_x (port, "UTF-16BE");
-          return SCM_BOOL_T;
-        default:
-          scm_unget_byte (0xFE, port);
-          return SCM_BOOL_F;
-        }
-    case 0xFF:
-      scm_get_byte_or_eof (port);
-      switch (scm_peek_byte_or_eof (port))
-        {
-        case 0xFE:
-          scm_get_byte_or_eof (port);
-          scm_i_set_port_encoding_x (port, "UTF-16LE");
-          return SCM_BOOL_T;
-        default:
-          scm_unget_byte (0xFF, port);
-          return SCM_BOOL_F;
-        }
-    default:
-      return SCM_BOOL_F;
-    }
-}
-#undef FUNC_NAME
-
 /* A fluid specifying the default encoding for newly created ports.  If it is
    a string, that is the encoding.  If it is #f, it is in the "native"
    (Latin-1) encoding.  */
diff --git a/libguile/ports.h b/libguile/ports.h
index 2f32345..d4d59b7 100644
--- a/libguile/ports.h
+++ b/libguile/ports.h
@@ -4,7 +4,7 @@
 #define SCM_PORTS_H
 
 /* Copyright (C) 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2003, 2004,
- *   2006, 2008, 2009, 2010, 2011, 2012, 2013 Free Software Foundation, Inc.
+ *   2006, 2008, 2009, 2010, 2011, 2012 Free Software Foundation, Inc.
  *
  * This library is free software; you can redistribute it and/or
  * modify it under the terms of the GNU Lesser General Public License
@@ -303,7 +303,6 @@ SCM_API SCM scm_port_column (SCM port);
 SCM_API SCM scm_set_port_column_x (SCM port, SCM line);
 SCM_API SCM scm_port_filename (SCM port);
 SCM_API SCM scm_set_port_filename_x (SCM port, SCM filename);
-SCM_API SCM scm_consume_byte_order_mark (SCM port);
 SCM_INTERNAL const char *scm_i_default_port_encoding (void);
 SCM_INTERNAL void scm_i_set_default_port_encoding (const char *);
 SCM_INTERNAL void scm_i_set_port_encoding_x (SCM port, const char *str);
diff --git a/libguile/read.c b/libguile/read.c
index a8f7744..222891b 100644
--- a/libguile/read.c
+++ b/libguile/read.c
@@ -1,5 +1,5 @@
 /* Copyright (C) 1995, 1996, 1997, 1999, 2000, 2001, 2003, 2004, 2006,
- *   2007, 2008, 2009, 2010, 2011, 2012, 2013 Free Software Foundation, Inc.
+ *   2007, 2008, 2009, 2010, 2011, 2012 Free Software Foundation, Inc.
  * 
  * This library is free software; you can redistribute it and/or
  * modify it under the terms of the GNU Lesser General Public License
@@ -1985,6 +1985,7 @@ scm_i_scan_for_encoding (SCM port)
   char header[SCM_ENCODING_SEARCH_SIZE+1];
   size_t bytes_read, encoding_length, i;
   char *encoding = NULL;
+  int utf8_bom = 0;
   char *pos, *encoding_start;
   int in_comment;
 
@@ -2026,9 +2027,13 @@ scm_i_scan_for_encoding (SCM port)
 
       bytes_read = scm_c_read (port, header, SCM_ENCODING_SEARCH_SIZE);
       header[bytes_read] = '\0';
-      scm_seek (port, scm_from_int (-bytes_read), scm_from_int (SEEK_CUR));
+      scm_seek (port, scm_from_int (0), scm_from_int (SEEK_SET));
     }
 
+  if (bytes_read > 3 
+      && header[0] == '\xef' && header[1] == '\xbb' && header[2] == '\xbf')
+    utf8_bom = 1;
+
   /* search past "coding[:=]" */
   pos = header;
   while (1)
@@ -2097,6 +2102,11 @@ scm_i_scan_for_encoding (SCM port)
     /* This wasn't in a comment */
     return NULL;
 
+  if (utf8_bom && strcmp(encoding, "UTF-8"))
+    scm_misc_error (NULL,
+                   "the port input declares the encoding ~s but is encoded as 
UTF-8",
+                   scm_list_1 (scm_from_locale_string (encoding)));
+
   return encoding;
 }
 
diff --git a/module/sxml/simple.scm b/module/sxml/simple.scm
index 606975d..703ad91 100644
--- a/module/sxml/simple.scm
+++ b/module/sxml/simple.scm
@@ -26,6 +26,7 @@
 ;;; Code:
 
 (define-module (sxml simple)
+  #:use-module (sxml ssax input-parse)
   #:use-module (sxml ssax)
   #:use-module (sxml transform)
   #:use-module (ice-9 match)
@@ -35,10 +36,6 @@
 ;; Helpers from upstream/SSAX.scm.
 ;;
 
-(define (ssax:warn port msg . args)
-  (format (current-ssax-error-port)
-          ";;; SSAX warning: ~a ~a\n" msg args))
-
 ;     ssax:reverse-collect-str LIST-OF-FRAGS -> LIST-OF-FRAGS
 ; given the list of fragments (some of which are text strings)
 ; reverse the list and concatenate adjacent text strings.
@@ -65,6 +62,17 @@
                  (cons (string-concatenate/shared strs) result)))
              '())))))))
 
+(define (read-internal-doctype-as-string port)
+  (string-concatenate/shared
+    (let loop ()
+      (let ((fragment
+            (next-token '() '(#\]) "reading internal DOCTYPE" port)))
+       (if (eqv? #\> (peek-next-char port))
+           (begin
+             (read-char port)
+             (cons fragment '()))
+           (cons* fragment "]" (loop)))))))
+
 ;; Ideas for the future for this interface:
 ;;
 ;;  * Allow doctypes to provide parsed entities
@@ -81,7 +89,8 @@
                     (declare-namespaces? #t)
                     (trim-whitespace? #f)
                     (entities '())
-                    (default-entity-handler #f))
+                    (default-entity-handler #f)
+                    (doctype-handler #f))
   "Use SSAX to parse an XML document into SXML. Takes one optional
 argument, @var{string-or-port}, which defaults to the current input
 port."
@@ -96,7 +105,7 @@ port."
   ;; NAMESPACES: list of (DOC-PREFIX . (USER-PREFIX . URI)).
   ;; A DOC-PREFIX of #f indicates that it comes from the user.
   ;; Otherwise, prefixes are symbols.
-  (define (user-namespaces)
+  (define (munge-namespaces namespaces)
     (map (lambda (el)
            (match el
              ((prefix . uri-string)
@@ -105,6 +114,9 @@ port."
                      (ssax:uri-string->symbol uri-string)))))
          namespaces))
 
+  (define (user-namespaces)
+    (munge-namespaces namespaces))
+
   (define (user-entities)
     (if (and default-entity-handler
              (not (assq '*DEFAULT* entities)))
@@ -117,6 +129,13 @@ port."
        (symbol-append prefix (string->symbol ":") local-part))
       (_ name)))
 
+  (define (doctype-continuation seed)
+    (lambda* (#:key (entities '()) (namespaces '()))
+      (values #f
+              (append entities (user-entities))
+              (append (munge-namespaces namespaces) (user-namespaces))
+              seed)))
+
   ;; The SEED in this parser is the SXML: initialized to '() at each new
   ;; level by the fdown handlers; built in reverse by the fhere parsers;
   ;; and reverse-collected by the fup handlers.
@@ -159,18 +178,29 @@ port."
      ;;
      ;; SEED builds up the content.
      (lambda (port docname systemid internal-subset? seed)
-       (when internal-subset?
-         (ssax:warn port "Internal DTD subset is not currently handled ")
-         (ssax:skip-internal-dtd port))
-       (ssax:warn port "DOCTYPE DECL " docname " "
-                  systemid " found and skipped")
-       (values #f (user-entities) (user-namespaces) seed))
+       (call-with-values
+           (lambda ()
+             (cond
+              (doctype-handler
+               (doctype-handler docname systemid
+                                (and internal-subset?
+                                     (read-internal-doctype-as-string port))))
+              (else
+               (when internal-subset?
+                 (ssax:skip-internal-dtd port))
+               (values))))
+         (doctype-continuation seed)))
 
      UNDECL-ROOT
      ;; This is like the DOCTYPE handler, but for documents that do not
      ;; have a <!DOCTYPE!> entry.
      (lambda (elem-gi seed)
-       (values #f (user-entities) (user-namespaces) seed))
+       (call-with-values
+           (lambda ()
+             (if doctype-handler
+                 (doctype-handler #f #f #f)
+                 (values)))
+        (doctype-continuation seed)))
 
      PI
      ((*DEFAULT*
diff --git a/test-suite/tests/filesys.test b/test-suite/tests/filesys.test
index 8bd974d..a6bfb6e 100644
--- a/test-suite/tests/filesys.test
+++ b/test-suite/tests/filesys.test
@@ -1,6 +1,6 @@
 ;;;; filesys.test --- test file system functions -*- scheme -*-
 ;;;; 
-;;;; Copyright (C) 2004, 2006, 2013 Free Software Foundation, Inc.
+;;;; Copyright (C) 2004, 2006 Free Software Foundation, Inc.
 ;;;; 
 ;;;; This library is free software; you can redistribute it and/or
 ;;;; modify it under the terms of the GNU Lesser General Public
@@ -17,8 +17,6 @@
 ;;;; Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 
USA
 
 (define-module (test-suite test-filesys)
-  #:use-module (ice-9 rdelim)
-  #:use-module (ice-9 binary-ports)
   #:use-module (test-suite lib)
   #:use-module (test-suite guile-test))
 
@@ -129,58 +127,3 @@
 
 (delete-file (test-file))
 (delete-file (test-symlink))
-
-(let ((s "\ufeffHello, world!"))
-  (define* (test-encoding encoding #:optional (ambient "ISO-8859-1"))
-    (with-fluids ((%default-port-encoding ambient))
-      (let* ((bytes (catch 'misc-error
-                      (lambda ()
-                        (call-with-values open-bytevector-output-port
-                          (lambda (port get-bytevector)
-                            (set-port-encoding! port encoding)
-                            (display s port)
-                            (get-bytevector))))
-                      (lambda args
-                        (throw 'unresolved))))
-             (name (string-copy "myfile-XXXXXX"))
-             (port (mkstemp! name)))
-        (put-bytevector port bytes)
-        (close-port port)
-        (let ((contents (call-with-input-file name read-string)))
-          (delete-file name)
-          contents))))
-
-  (pass-if "UTF-8"
-    (equal? (test-encoding "UTF-8")
-            "Hello, world!"))
-
-  (pass-if "UTF-16BE"
-    (equal? (test-encoding "UTF-16BE")
-            "Hello, world!"))
-
-  (pass-if "UTF-16LE"
-    (equal? (test-encoding "UTF-16LE")
-            "Hello, world!"))
-
-  (pass-if "UTF-8 (ambient)"
-    (equal? (test-encoding "UTF-8" "UTF-8")
-            "Hello, world!"))
-
-  (pass-if "UTF-8 (UTF-16 ambient)"
-    (equal? (test-encoding "UTF-8" "UTF-16")
-            "Hello, world!"))
-
-  ;; Unicode 6.2 section 16.8:
-  ;;
-  ;; For compatibility with versions of the Unicode Standard prior to
-  ;; Version 3.2, the code point U+FEFF has the word-joining semantics
-  ;; of zero width no-break space when it is not used as a BOM.  [...]
-  ;;
-  ;; Where the byte order is explicitly specified, such as in UTF-16BE
-  ;; or UTF-16LE, then all U+FEFF characters -- even at the very
-  ;; beginning of the text -- are to be interpreted as zero width
-  ;; no-break spaces.
-  ;;
-  (pass-if "UTF-16LE (ambient)"
-    (equal? (test-encoding "UTF-16LE" "UTF-16LE")
-            "\ufeffHello, world!")))
diff --git a/test-suite/tests/sxml.simple.test 
b/test-suite/tests/sxml.simple.test
index 623f13e..e52ba31 100644
--- a/test-suite/tests/sxml.simple.test
+++ b/test-suite/tests/sxml.simple.test
@@ -1,6 +1,6 @@
 ;;;; sxml.simple.test --- (sxml simple)  -*- mode: scheme; coding: utf-8; -*-
 ;;;;
-;;;;   Copyright (C) 2010 Free Software Foundation, Inc.
+;;;;   Copyright (C) 2010, 2013 Free Software Foundation, Inc.
 ;;;;
 ;;;; This library is free software; you can redistribute it and/or
 ;;;; modify it under the terms of the GNU Lesser General Public
@@ -20,6 +20,8 @@
   #:use-module (test-suite lib)
   #:use-module (sxml simple))
 
+(define parser-error '(parser-error . ""))
+
 (define %xml-sample
   ;; An XML sample without any space in between tags, to make it easier.
   (string-append "<?xml version='1.0' encoding='utf-8'?>"
@@ -50,3 +52,84 @@
                 (lambda ()
                   (sxml->xml
                    (xml->sxml (open-input-string %xml-sample))))))))))
+
+(with-test-prefix "namespaces"
+  (pass-if-equal
+      (xml->sxml "<foo xmlns=\"http://example.org/ns1\";>text</foo>")
+      '(*TOP* (http://example.org/ns1:foo "text")))
+
+  (pass-if-equal
+      (xml->sxml "<foo xmlns=\"http://example.org/ns1\";>text</foo>"
+                 #:namespaces '((ns1 . "http://example.org/ns1";)))
+      '(*TOP* (ns1:foo "text")))
+
+  (pass-if-equal
+      (xml->sxml "<foo xmlns:bar=\"http://example.org/ns2\";><bar:baz/></foo>"
+                 #:namespaces '((ns2 . "http://example.org/ns2";)))
+      '(*TOP* (foo (ns2:baz))))
+
+  (pass-if-equal
+      (xml->sxml "<foo><ns2:baz/></foo>"
+                 #:namespaces '((ns2 . "http://example.org/ns2";)))
+      '(*TOP* (foo (ns2:baz))))
+
+  (pass-if-exception "namespace undeclared" parser-error
+    (xml->sxml "<foo><ns2:baz/></foo>"
+               #:namespaces '((ns2 . "http://example.org/ns2";))
+               #:declare-namespaces? #f)))
+
+(with-test-prefix "whitespace"
+  (pass-if-equal
+      (xml->sxml "<foo>\n<bar> Alfie the parrot! </bar>\n</foo>")
+      '(*TOP* (foo "\n" (bar " Alfie the parrot! ") "\n")))
+
+  (pass-if-equal
+      (xml->sxml "<foo>\n<bar> Alfie the parrot! </bar>\n</foo>"
+                 #:trim-whitespace? #t)
+      '(*TOP* (foo (bar " Alfie the parrot! ")))))
+
+(with-test-prefix "parsed entities"
+  (pass-if-equal
+      '(*TOP* (foo "&"))
+      (xml->sxml "<foo>&amp;</foo>"))
+
+  (pass-if-exception "nbsp undefined" parser-error
+    (xml->sxml "<foo>&nbsp;</foo>"))
+
+  (pass-if-equal
+      '(*TOP* (foo "\xA0"))
+      (xml->sxml "<foo>&nbsp;</foo>"
+                 #:entities '((nbsp . "\xA0"))))
+
+  (pass-if-equal
+      '(*TOP* (foo "\xA0"))
+      (xml->sxml "<foo>&#xA0;</foo>"))
+
+  (let ((ents '()))
+    (pass-if-equal
+        (xml->sxml "<foo>&nbsp; &foo;</foo>"
+                   #:default-entity-handler
+                   (lambda (port name)
+                     (case name
+                       ((nbsp) "\xa0")
+                       (else
+                        (set! ents (cons name ents))
+                        "qux"))))
+        '(*TOP* (foo "\xa0 qux")))
+
+    (pass-if-equal
+        ents
+        '(foo))))
+
+(with-test-prefix "doctype handlers"
+  (define (handle-foo docname systemid internal-subset)
+    (case docname
+      ((foo)
+       (values #:entities '((greets . "<i>Hello, world!</i>"))))
+      (else
+       (values))))
+
+  (pass-if-equal
+      (xml->sxml "<!DOCTYPE foo><p>&greets;</p>"
+                 #:doctype-handler handle-foo)
+      '(*TOP* (p (i "Hello, world!")))))


hooks/post-receive
-- 
GNU Guile



reply via email to

[Prev in Thread] Current Thread [Next in Thread]