[Guile-commits] GNU Guile branch, master, updated. release_1-9-14-40-g4a


From: Andy Wingo
Subject: [Guile-commits] GNU Guile branch, master, updated. release_1-9-14-40-g4a655e5
Date: Fri, 07 Jan 2011 17:16:34 +0000

This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "GNU Guile".

http://git.savannah.gnu.org/cgit/guile.git/commit/?id=4a655e50a3890da9cd453a0b2c83dacc2cfcc34e

The branch, master has been updated
       via  4a655e50a3890da9cd453a0b2c83dacc2cfcc34e (commit)
       via  30c282bf93d7162fd8898a035ab60a3a556c5199 (commit)
       via  17072fd2c6acad1d4f2b5c98eb0d62911ea07406 (commit)
       via  6efbc280c56a1c318717723423e2b4e2654c353a (commit)
       via  f0554ee74a7ae2114408abb16c65b886f1ca73d3 (commit)
       via  328255e4f47e73b73a10937db8250d8b2357aa67 (commit)
       via  e0c83bf5009480ea3e872465b669d26f6d6871ef (commit)
       via  622415380cb4b24d1d12641781f705f99b9e720e (commit)
       via  f756cd30764c76d270cc34969d7156ebcfcfb214 (commit)
       via  247a56fa5a00e3e2c373ab30762bd119fc250a07 (commit)
       via  ad5cbc470f2cc29d63a735871883a9436e0915d5 (commit)
       via  d40e1ca893149e9781bad54ac1e39d03e7be988f (commit)
       via  929ccf48fc4bada585b29b3887f295bfcc1dcdaa (commit)
       via  569269b4b23f48c0490bea5538207970598b10dd (commit)
       via  91b320fe1643d80bdd1997a8575db5d540fcdabb (commit)
       via  7d6b8b75fc5b7b5ef1901220687b36d407636877 (commit)
       via  8745c33afbb7b103e9aa27adbd55884e6bbd4b51 (commit)
       via  b3f9444892224e1ee35645681cb20fe8e8ec2ff8 (commit)
       via  d75a81b1286fc0144274ebb628088d02683a40c7 (commit)
      from  8a41c56af1d155d1987c8eeeac324871efd9131b (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
commit 4a655e50a3890da9cd453a0b2c83dacc2cfcc34e
Author: Andy Wingo <address@hidden>
Date:   Fri Jan 7 09:08:58 2011 -0800

    use scm_from_latin1_symboln for string literals and load-symbol
    
    * libguile/bytevectors.c:
    * libguile/eval.c:
    * libguile/goops.c:
    * libguile/i18n.c:
    * libguile/load.c:
    * libguile/memoize.c:
    * libguile/modules.c:
    * libguile/ports.c:
    * libguile/print.c:
    * libguile/procs.c:
    * libguile/programs.c:
    * libguile/read.c:
    * libguile/script.c:
    * libguile/srfi-14.c:
    * libguile/stacks.c:
    * libguile/strings.c:
    * libguile/throw.c:
    * libguile/vm.c: Use scm_from_latin1_symboln to make symbols from string
      literals, because they aren't in the user's locale -- they are in
      ASCII, and we can optimize this case.
    
    * libguile/vm-i-loader.c: Also use scm_from_latin1_symboln when loading
      narrow symbols.

commit 30c282bf93d7162fd8898a035ab60a3a556c5199
Author: Andy Wingo <address@hidden>
Date:   Fri Jan 7 09:03:12 2011 -0800

    optimize scm_from_latin1_symboln
    
    * libguile/symbols.c (lookup_interned_latin1_symbol): New helper.
      (scm_from_latin1_symboln): Use lookup_interned_latin1_symbol, so we
      avoid allocating a string when the symbol is already interned.
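The following is a minimal sketch of why this avoids allocation. It assumes a toy symbol table that stores each symbol's characters as 32-bit code points. The names (`toy_symbol`, `intern_latin1`) and the FNV-style hash are hypothetical stand-ins, not libguile's. The hash is computed directly from the Latin-1 bytes, candidates in the bucket are compared byte-to-code-point, and a new object is allocated only on a miss.

```c
#include <stdint.h>
#include <stdlib.h>

#define N_BUCKETS 64

/* Hypothetical interned symbol; characters are stored as code points. */
typedef struct toy_symbol {
  uint32_t *chars;
  size_t len;
  unsigned long hash;
  struct toy_symbol *next;      /* next entry in the same bucket */
} toy_symbol;

static toy_symbol *buckets[N_BUCKETS];

/* FNV-style hash.  A Latin-1 byte *is* its code point, so hashing the
   raw bytes gives the same value as hashing the stored characters. */
static unsigned long hash_latin1 (const unsigned char *s, size_t len)
{
  unsigned long h = 2166136261UL;
  for (size_t i = 0; i < len; i++)
    h = (h ^ s[i]) * 16777619UL;
  return h;
}

/* Compare raw Latin-1 bytes with a stored symbol, without converting. */
static int latin1_equal (const toy_symbol *sym, const unsigned char *s,
                         size_t len)
{
  if (sym->len != len)
    return 0;
  for (size_t i = 0; i < len; i++)
    if (sym->chars[i] != s[i])
      return 0;
  return 1;
}

/* Look up by a precomputed hash: walks one bucket, allocates nothing. */
static toy_symbol *lookup_latin1 (const unsigned char *s, size_t len,
                                  unsigned long h)
{
  for (toy_symbol *p = buckets[h % N_BUCKETS]; p != NULL; p = p->next)
    if (p->hash == h && latin1_equal (p, s, len))
      return p;
  return NULL;
}

/* Intern STR: allocate and widen the bytes only on a miss. */
toy_symbol *intern_latin1 (const char *str, size_t len)
{
  const unsigned char *s = (const unsigned char *) str;
  unsigned long h = hash_latin1 (s, len);
  toy_symbol *sym = lookup_latin1 (s, len, h);

  if (sym != NULL)
    return sym;                 /* hit: no allocation at all */

  sym = malloc (sizeof *sym);
  if (sym == NULL)
    abort ();
  sym->chars = malloc ((len ? len : 1) * sizeof (uint32_t));
  if (sym->chars == NULL)
    abort ();
  for (size_t i = 0; i < len; i++)
    sym->chars[i] = s[i];       /* Latin-1 byte == Unicode code point */
  sym->len = len;
  sym->hash = h;
  sym->next = buckets[h % N_BUCKETS];
  buckets[h % N_BUCKETS] = sym;
  return sym;
}
```

Interning the same bytes twice returns the same pointer, and the second call never reaches the allocator. A real symbol table must also handle garbage collection of unused symbols, which this sketch ignores.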

commit 17072fd2c6acad1d4f2b5c98eb0d62911ea07406
Author: Andy Wingo <address@hidden>
Date:   Fri Jan 7 08:42:15 2011 -0800

    lookup_interned_symbol uses get_handle_by_hash
    
    * libguile/symbols.c (lookup_interned_symbol): Change to use
      scm_hash_fn_get_handle_by_hash.

commit 6efbc280c56a1c318717723423e2b4e2654c353a
Author: Andy Wingo <address@hidden>
Date:   Fri Jan 7 08:36:39 2011 -0800

    add scm_hash_fn_get_handle_by_hash
    
    * libguile/hashtab.h:
    * libguile/hashtab.c (scm_hash_fn_get_handle_by_hash): New internal
      procedure, which should make symbol table lookup faster.

commit f0554ee74a7ae2114408abb16c65b886f1ca73d3
Author: Andy Wingo <address@hidden>
Date:   Fri Jan 7 07:44:27 2011 -0800

    remove vector hash table code
    
    * libguile/hashtab.c: Remove deprecated hash-tables-as-vectors support
      code.

commit 328255e4f47e73b73a10937db8250d8b2357aa67
Author: Andy Wingo <address@hidden>
Date:   Thu Jan 6 16:20:54 2011 -0800

    hashtab cleanups
    
    * libguile/hashtab.c: Update comments.
      (hashtable_size): Allow bigger vectors on 64-bit machines.

commit e0c83bf5009480ea3e872465b669d26f6d6871ef
Author: Andy Wingo <address@hidden>
Date:   Wed Jan 5 20:15:11 2011 -0800

    fix symbol garbage collection
    
    * libguile/symbols.c (lookup_interned_symbol, intern_symbol): Refactor
      to use hashtab.[ch] interfaces.

commit 622415380cb4b24d1d12641781f705f99b9e720e
Author: Andy Wingo <address@hidden>
Date:   Wed Jan 5 18:43:28 2011 -0800

    add hash functions for locale, latin1, and utf8 strings
    
    * libguile/hash.c (scm_i_locale_string_hash)
      (scm_i_latin1_string_hash, scm_i_utf8_string_hash): New functions.
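If symbols are to be found from C strings in several encodings, every variant of the hash must hash the decoded code points rather than the raw bytes. That way the same text hashes identically regardless of its source encoding. The sketch below illustrates this property. The function names and the FNV-style mixing are illustrative, not libguile's, and the UTF-8 decoder assumes well-formed input.

```c
#include <stddef.h>
#include <stdint.h>

#define HASH_INIT 2166136261UL

/* Mix one code point into the running hash (FNV-style, illustrative). */
static unsigned long mix (unsigned long h, uint32_t c)
{
  return (h ^ c) * 16777619UL;
}

/* Latin-1: every byte is already a code point in 0..255. */
unsigned long latin1_string_hash (const char *str, size_t len)
{
  const unsigned char *s = (const unsigned char *) str;
  unsigned long h = HASH_INIT;
  for (size_t i = 0; i < len; i++)
    h = mix (h, s[i]);
  return h;
}

/* UTF-8: decode each sequence to its code point before mixing.
   Assumes well-formed UTF-8; no validation is done here. */
unsigned long utf8_string_hash (const char *str, size_t len)
{
  const unsigned char *s = (const unsigned char *) str;
  unsigned long h = HASH_INIT;
  size_t i = 0;
  while (i < len)
    {
      uint32_t c;
      if (s[i] < 0x80)
        c = s[i++];
      else if ((s[i] & 0xe0) == 0xc0)
        {
          c = ((uint32_t) (s[i] & 0x1f) << 6) | (s[i + 1] & 0x3f);
          i += 2;
        }
      else if ((s[i] & 0xf0) == 0xe0)
        {
          c = ((uint32_t) (s[i] & 0x0f) << 12)
            | ((uint32_t) (s[i + 1] & 0x3f) << 6) | (s[i + 2] & 0x3f);
          i += 3;
        }
      else
        {
          c = ((uint32_t) (s[i] & 0x07) << 18)
            | ((uint32_t) (s[i + 1] & 0x3f) << 12)
            | ((uint32_t) (s[i + 2] & 0x3f) << 6) | (s[i + 3] & 0x3f);
          i += 4;
        }
      h = mix (h, c);
    }
  return h;
}
```

For example, `latin1_string_hash ("caf\xe9", 4)` and `utf8_string_hash ("caf\xc3\xa9", 5)` return the same value, because both hash the code points c, a, f, U+00E9.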

commit f756cd30764c76d270cc34969d7156ebcfcfb214
Author: Andy Wingo <address@hidden>
Date:   Wed Jan 5 16:40:22 2011 -0800

    multibyte regex error handling fix
    
    * libguile/regex-posix.c (fixup_multibyte_match): Fix mbrlen error
      handling.
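The return convention of mbrlen is easy to mishandle. It returns `(size_t) -1` for an invalid sequence, `(size_t) -2` for an incomplete one, and 0 when the next character is the null character. Because the return type is unsigned, a simple `< 0` check never fires. The sketch below shows a helper that counts the characters in a byte prefix and handles all three cases. The name `count_chars` is hypothetical, and this is not the code in regex-posix.c.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Count the characters in the first NBYTES bytes of STR, using the
   current locale.  Returns (size_t) -1 if an invalid or incomplete
   multibyte sequence is found.  Hypothetical helper. */
size_t count_chars (const char *str, size_t nbytes)
{
  mbstate_t state;
  size_t chars = 0, i = 0;

  memset (&state, 0, sizeof state);
  while (i < nbytes)
    {
      size_t n = mbrlen (str + i, nbytes - i, &state);
      if (n == (size_t) -1 || n == (size_t) -2)
        return (size_t) -1;     /* invalid or truncated sequence */
      if (n == 0)
        n = 1;                  /* an embedded NUL occupies one byte */
      i += n;
      chars++;
    }
  return chars;
}
```

A caller that maps regex byte offsets back to character offsets would treat `(size_t) -1` as an error rather than as a very large count.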

commit 247a56fa5a00e3e2c373ab30762bd119fc250a07
Author: Andy Wingo <address@hidden>
Date:   Wed Jan 5 16:32:48 2011 -0800

    hash.c cleanup
    
    * libguile/hash.c (scm_hasher): Remove needless remember_upto_here.

commit ad5cbc470f2cc29d63a735871883a9436e0915d5
Author: Andy Wingo <address@hidden>
Date:   Wed Jan 5 18:24:32 2011 -0600

    add scm_from_{latin1,utf8}_symbol{n,}
    
    * libguile/symbols.c (scm_from_latin1_symbol, scm_from_latin1_symboln)
      (scm_from_utf8_symbol, scm_from_utf8_symboln): New functions.

commit d40e1ca893149e9781bad54ac1e39d03e7be988f
Author: Andy Wingo <address@hidden>
Date:   Wed Jan 5 18:21:54 2011 -0600

    add scm_{to,from}_{utf8,latin1}_string{n,}
    
    * libguile/strings.h:
    * libguile/strings.c (scm_from_latin1_string, scm_to_latin1_string): New
      functions, in terms of the latin1_stringn variants.
      (scm_from_utf8_string, scm_from_utf8_stringn)
      (scm_to_utf8_string, scm_to_utf8_stringn): New functions.
      (scm_i_from_utf8_string, scm_i_to_utf8_string): Removed these internal
      functions.
      (scm_from_stringn): Handle -1 as a length. Unlike the previous
      behavior of scm_from_locale_string (NULL), which returned the empty
      string, we now raise an error.  The null pointer is not the same as
      the empty string.
    
    * libguile/stime.c (scm_strftime, scm_strptime): Adapt to publishing of
      utf8 functions.
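The length convention for scm_from_stringn can be illustrated in plain C. A length of `(size_t) -1` means the input is NUL-terminated, so `strlen` supplies the length. A null pointer is rejected instead of being read as the empty string. The function below is a hypothetical stand-in that widens Latin-1 bytes to code points. It reports errors by returning NULL with `errno` set, whereas Guile raises a Scheme error.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical: build a code-point array from Latin-1 bytes.
   LEN == (size_t) -1 means STR is NUL-terminated. */
uint32_t *from_latin1_stringn (const char *str, size_t len, size_t *out_len)
{
  if (str == NULL)
    {
      errno = EINVAL;           /* NULL is not the empty string */
      return NULL;
    }
  if (len == (size_t) -1)
    len = strlen (str);

  uint32_t *chars = malloc ((len ? len : 1) * sizeof *chars);
  if (chars == NULL)
    return NULL;                /* on POSIX, malloc sets errno */
  for (size_t i = 0; i < len; i++)
    chars[i] = (unsigned char) str[i];
  *out_len = len;
  return chars;
}
```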

commit 929ccf48fc4bada585b29b3887f295bfcc1dcdaa
Author: Andy Wingo <address@hidden>
Date:   Sun Jan 2 12:42:25 2011 -0500

    read-header returns EOF at end, update (web http) docs
    
    * doc/ref/web.texi (HTTP): Add an example for declaring a header, and
      adapt to read-header change.
    
    * module/web/http.scm (read-header): Return EOF for both values if there
      are no more headers, instead of #f.
      (read-headers): Adapt.

commit 569269b4b23f48c0490bea5538207970598b10dd
Author: Andy Wingo <address@hidden>
Date:   Sun Jan 2 10:16:57 2011 -0500

    update URI documentation
    
    * doc/ref/web.texi (Types and the Web): Fix spacing.
      (URIs): Give default values, and clarify a number of procedure docs.
      Update "encoding" name, and string->uri name.

commit 91b320fe1643d80bdd1997a8575db5d540fcdabb
Author: Andy Wingo <address@hidden>
Date:   Sun Jan 2 10:07:54 2011 -0500

    uri-encode fast path
    
    * module/web/uri.scm (uri-encode): Add a fast-path for the common case
      in which the string does not contain any reserved characters.
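The fast path comes down to a single scan for bytes that need escaping, which lets the encoder skip the rebuild when there are none. Below is a hypothetical C rendering. It operates on a byte string assumed to be already encoded (for instance in UTF-8) and uses the default unescaped set documented for uri-encode: ASCII letters and digits plus `-`, `.`, `_`, `~`. The uppercase hex digits are a choice made for this sketch.

```c
#include <stdlib.h>
#include <string.h>

/* Default unescaped set: ASCII letters, digits, and - . _ ~
   (isalnum is avoided because it is locale-dependent). */
static int unreserved (unsigned char c)
{
  return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
    || (c >= '0' && c <= '9')
    || c == '-' || c == '.' || c == '_' || c == '~';
}

/* Return a freshly allocated, NUL-terminated encoding of S. */
char *uri_encode (const char *s)
{
  const unsigned char *u = (const unsigned char *) s;
  size_t len = strlen (s), extra = 0;

  for (size_t i = 0; i < len; i++)
    if (!unreserved (u[i]))
      extra += 2;               /* each escaped byte becomes %HH */

  char *out = malloc (len + extra + 1);
  if (out == NULL)
    return NULL;

  if (extra == 0)               /* fast path: nothing to escape */
    return memcpy (out, s, len + 1);

  static const char hex[] = "0123456789ABCDEF";
  char *p = out;
  for (size_t i = 0; i < len; i++)
    {
      if (unreserved (u[i]))
        *p++ = (char) u[i];
      else
        {
          *p++ = '%';
          *p++ = hex[u[i] >> 4];
          *p++ = hex[u[i] & 0xf];
        }
    }
  *p = '\0';
  return out;
}
```

A Scheme implementation can simply return the input string on the fast path. This C version still copies so that the caller always owns the result.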

commit 7d6b8b75fc5b7b5ef1901220687b36d407636877
Author: Andy Wingo <address@hidden>
Date:   Sun Jan 2 09:41:14 2011 -0500

    uri-decode #:encoding, not #:charset
    
    * module/web/uri.scm (call-with-encoded-output-string, encode-string)
      (decode-string, uri-decode, uri-encode): Change all instances of
      "charset" to "encoding", as variables and arguments.
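Decoding is a two-step process, which is why uri-decode needs to be told an encoding at all. First, `%HH` escapes are turned back into bytes. Only then are those bytes read as characters in the named encoding. Below is a hypothetical C sketch of the first step, which produces raw bytes and rejects malformed escapes.

```c
#include <stddef.h>
#include <string.h>

/* Value of one hex digit, or -1 if C is not a hex digit. */
static int hexval (char c)
{
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  return -1;
}

/* Decode the NUL-terminated string S into OUT, which must have room
   for strlen (S) bytes.  Returns the number of bytes written, or -1 on
   a malformed escape.  The result is raw bytes, not text. */
long percent_decode (const char *s, unsigned char *out)
{
  size_t len = strlen (s), n = 0;
  for (size_t i = 0; i < len; i++)
    {
      if (s[i] != '%')
        out[n++] = (unsigned char) s[i];
      else
        {
          if (len - i < 3)
            return -1;          /* truncated escape such as "%2" */
          int hi = hexval (s[i + 1]), lo = hexval (s[i + 2]);
          if (hi < 0 || lo < 0)
            return -1;          /* not a hex escape */
          out[n++] = (unsigned char) (hi << 4 | lo);
          i += 2;
        }
    }
  return (long) n;
}
```

The bytes 0xC3 0xA9 produced from `caf%C3%A9` represent é only if they are read as UTF-8. Read as Latin-1, they are two characters, Ã and ©.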

commit 8745c33afbb7b103e9aa27adbd55884e6bbd4b51
Author: Andy Wingo <address@hidden>
Date:   Fri Dec 31 12:44:11 2010 -0500

    rename string->uri and uri->string.
    
    * module/web/uri.scm (string->uri, uri->string): Rename from parse-uri
      and unparse-uri.
    
    * test-suite/tests/web-uri.test:
    * module/web/http.scm: All callers changed.

commit b3f9444892224e1ee35645681cb20fe8e8ec2ff8
Author: Andy Wingo <address@hidden>
Date:   Fri Dec 31 11:53:14 2010 -0500

    clarify uri fragment discussion
    
    * doc/ref/web.texi (URIs): Clarify the discussion of URI fragments.

commit d75a81b1286fc0144274ebb628088d02683a40c7
Author: Andy Wingo <address@hidden>
Date:   Fri Dec 31 11:12:07 2010 -0500

    rewrite web.texi intro
    
    * doc/ref/web.texi (Web): Rewrite the intro.
      (Types and the Web): New subsection, a mini-rant.

-----------------------------------------------------------------------

Summary of changes:
 doc/ref/web.texi              |  548 ++++++++++++++++++++++++++++++-----------
 libguile/bytevectors.c        |    6 +-
 libguile/eval.c               |   10 +-
 libguile/goops.c              |   38 ++--
 libguile/hash.c               |   82 ++++++-
 libguile/hash.h               |    9 +-
 libguile/hashtab.c            |  222 ++++++++++-------
 libguile/hashtab.h            |    8 +
 libguile/i18n.c               |   14 +-
 libguile/load.c               |    4 +-
 libguile/memoize.c            |    8 +-
 libguile/modules.c            |    4 +-
 libguile/ports.c              |   12 +-
 libguile/print.c              |    8 +-
 libguile/procs.c              |    4 +-
 libguile/programs.c           |    4 +-
 libguile/read.c               |    4 +-
 libguile/regex-posix.c        |   10 +-
 libguile/script.c             |    8 +-
 libguile/srfi-14.c            |    8 +-
 libguile/stacks.c             |    4 +-
 libguile/stime.c              |   31 +--
 libguile/strings.c            |  129 ++++++-----
 libguile/strings.h            |   23 ++-
 libguile/symbols.c            |  262 ++++++++++++--------
 libguile/symbols.h            |   16 ++
 libguile/throw.c              |    8 +-
 libguile/vm-i-loader.c        |    4 +-
 libguile/vm.c                 |   12 +-
 module/web/http.scm           |   20 +-
 module/web/uri.scm            |   72 +++---
 test-suite/tests/web-uri.test |   36 ++--
 32 files changed, 1059 insertions(+), 569 deletions(-)

diff --git a/doc/ref/web.texi b/doc/ref/web.texi
index ea5cd46..0fab8d3 100644
--- a/doc/ref/web.texi
+++ b/doc/ref/web.texi
@@ -9,28 +9,31 @@
 @cindex WWW
 @cindex HTTP
 
-When Guile started back in the mid-nineties, the GNU system was still
-focused on producing a good POSIX implementation.  This is why Guile's
-POSIX support is good, and has been so for a while.
-
-But times change, and in a way these days the web is the new POSIX: a
-standard and a motley set of implementations on which much computing is
-done.  So today's Guile also supports the web at the programming
-language level, by defining common data types and operations for the
-technologies underpinning the web: URIs, HTTP, and XML.
-
-It is particularly important to define native web data types.  Though
-the web is text in motion, programming the web in text is like
-programming with @code{goto}: muddy, and error-prone.  Most current
-security problems on the web are due to treating the web as text instead
-of as instances of the proper data types.
-
-In addition, common web data types help programmers to share code.
-
-Well.  That's all very nice and opinionated and such, but how do I use
-the thing?  Read on!
+It has always been possible to connect computers together and share
+information between them, but the rise of the World-Wide Web over the
+last couple of decades has made it much easier to do so.  The result is
+a richly connected network of computation, in which Guile forms a part.
+
+By ``the web'', we mean the HTTP protocol@footnote{Yes, the P is for
+protocol, but this phrase appears repeatedly in RFC 2616.} as handled by
+servers, clients, proxies, caches, and the various kinds of messages and
+message components that can be sent and received by that protocol,
+notably HTML.
+
+On one level, the web is text in motion: the protocols themselves are
+textual (though the payload may be binary), and it's possible to create
+a socket and speak text to the web.  But such an approach is obviously
+primitive.  This section details the higher-level data types and
+operations provided by Guile: URIs, HTTP request and response records,
+and a conventional web server implementation.
+
+The material in this section is arranged in ascending order, in which
+later concepts build on previous ones.  If you prefer to start with the
+highest-level perspective, @pxref{Web Examples}, and work your way
+back.
 
 @menu
+* Types and the Web::           Types prevent bugs and security problems.
 * URIs::                        Universal Resource Identifiers.
 * HTTP::                        The Hyper-Text Transfer Protocol.
 * HTTP Headers::                How Guile represents specific header values.
@@ -40,6 +43,125 @@ the thing?  Read on!
 * Web Examples::                How to use this thing.
 @end menu
 
+@node Types and the Web
+@subsection Types and the Web
+
+It is a truth universally acknowledged, that a program with good use of
+data types, will be free from many common bugs.  Unfortunately, the
+common practice in web programming seems to ignore this maxim.  This
+subsection makes the case for expressive data types in web programming.
+
+By ``expressive data types'', we mean that the data types @emph{say}
+something about how a program solves a problem.  For example, if we
+choose to represent dates using SRFI 19 date records (@pxref{SRFI-19}),
+this indicates that there is a part of the program that will always have
+valid dates.  Error handling for a number of basic cases, like invalid
+dates, occurs on the boundary in which we produce a SRFI 19 date record
+from other types, like strings.
+
+With regard to the web, data types help in the two broad phases of
+HTTP messages: parsing and generation.
+
+Consider a server, which has to parse a request, and produce a response.
+Guile will parse the request into an HTTP request object
+(@pxref{Requests}), with each header parsed into an appropriate Scheme
+data type.  This transition from an incoming stream of characters to
+typed data is a state change in a program---the strings might parse, or
+they might not, and something has to happen if they do not.  (Guile
+throws an error in this case.)  But after you have the parsed request,
+``client'' code (code built on top of the Guile web framework) will not
+have to check for syntactic validity.  The types already make this
+information manifest.
+
+This state change on the parsing boundary makes programs more robust,
+as they themselves are freed from the need to do a number of common
+error checks, and they can use normal Scheme procedures to handle a
+request instead of ad-hoc string parsers.
+
+The need for types on the response generation side (in a server) is more
+subtle, though not less important.  Consider the example of a POST
+handler, which prints out the text that a user submits from a form.
+Such a handler might include a procedure like this:
+
+@example
+;; First, a helper procedure
+(define (para . contents)
+  (string-append "<p>" (string-concatenate contents) "</p>"))
+
+;; Now the meat of our simple web application
+(define (you-said text)
+  (para "You said: " text))
+
+(display (you-said "Hi!"))
+@print{} <p>You said: Hi!</p>
+@end example
+
+This is a perfectly valid implementation, provided that the incoming
+text does not contain the special HTML characters @samp{<}, @samp{>}, or
address@hidden&}.  But this provision of a restricted character set is not
+reflected anywhere in the program itself: we must @emph{assume} that the
+programmer understands this, and performs the check elsewhere.
+
+Unfortunately, the short history of the practice of programming does not
+bear out this assumption.  A @dfn{cross-site scripting} (@acronym{XSS})
+vulnerability is just such a common error in which unfiltered user input
+is allowed into the output.  A user could submit a crafted comment to
+your web site which results in visitors running malicious Javascript,
+within the security context of your domain:
+
+@example
+(display (you-said "<script src=\"http://bad.com/nasty.js\" />"))
+@print{} <p>You said: <script src="http://bad.com/nasty.js" /></p>
+@end example
+
+The fundamental problem here is that both user data and the program
+template are represented using strings.  This identity means that types
+can't help the programmer to make a distinction between these two, so
+they get confused.
+
+There are a number of possible solutions, but perhaps the best is to
+treat HTML not as strings, but as native s-expressions: as SXML.  The
+basic idea is that HTML is either text, represented by a string, or an
+element, represented as a tagged list.  So @samp{foo} becomes
+@samp{"foo"}, and @samp{<b>foo</b>} becomes @samp{(b "foo")}.
+Attributes, if present, go in a tagged list headed by @samp{@@}, like
+@samp{(img (@@ (src "http://example.com/foo.png")))}.  @xref{sxml
+simple}, for more information.
+
+The good thing about SXML is that HTML elements cannot be confused with
+text.  Let's make a new definition of @code{para}:
+
+@example
+(define (para . contents)
+  `(p ,@@contents))
+
+(use-modules (sxml simple))
+(sxml->xml (you-said "Hi!"))
+@print{} <p>You said: Hi!</p>
+
+(sxml->xml (you-said "<i>Rats, foiled again!</i>"))
+@print{} <p>You said: &lt;i&gt;Rats, foiled again!&lt;/i&gt;</p>
+@end example
+
+So we see in the second example that HTML elements cannot be unwittingly
+introduced into the output.  However, it is now perfectly acceptable to
+pass SXML to @code{you-said}; in fact, that is the big advantage of SXML
+over everything-as-a-string.
+
+@example
+(sxml->xml (you-said (you-said "<Hi!>")))
+@print{} <p>You said: <p>You said: &lt;Hi!&gt;</p></p>
+@end example
+
+The SXML types allow procedures to @emph{compose}.  The types make
+manifest which parts are HTML elements, and which are text.  So you
+needn't worry about escaping user input; the type transition back to a
+string handles that for you.  @acronym{XSS} vulnerabilities are a thing
+of the past.
+
+Well.  That's all very nice and opinionated and such, but how do I use
+the thing?  Read on!
+
 @node URIs
 @subsection Universal Resource Identifiers
 
@@ -53,23 +175,26 @@ URI := scheme ":" ["//" [userinfo "@@"] host [":" port]] path \
        [ "?" query ] [ "#" fragment ]
 @end example
 
-So, all URIs have a scheme and a path. Some URIs have a host, and some
-of those have ports and userinfo. Any URI might have a query part or a
-fragment.
+For example, in the URI, @indicateurl{http://www.gnu.org/help/}, the
+scheme is @code{http}, the host is @code{www.gnu.org}, the path is
+@code{/help/}, and there is no userinfo, port, query, or fragment.  All URIs
+have a scheme and a path (though the path might be empty).  Some URIs
+have a host, and some of those have ports and userinfo.  Any URI might
+have a query part or a fragment.
 
 Userinfo is something of an abstraction, as some legacy URI schemes
-allowed userinfo of the form @code{@var{username}:@var{passwd}}.
-Passwords don't belong in URIs, so the RFC does not want to condone
-this, but neither can it say that what is before the @code{@@} sign is
-just a username, so the RFC punts on the issue and calls it
+allowed userinfo of the form @code{@var{username}:@var{passwd}}.  But
+since passwords do not belong in URIs, the RFC does not want to condone
+this practice, so it calls anything before the @code{@@} sign
 @dfn{userinfo}.
 
-Also, strictly speaking, a URI with a fragment is a @dfn{URI
-reference}.  A fragment is typically not serialized when sending a URI
-over the wire; that is, it is not part of the identifier of a resource.
-It only identifies a part of a given resource.  But it's useful to have
-a field for it in the URI record itself, so we hope you will forgive the
-inconsistency.
+Properly speaking, a fragment is not part of a URI.  For example, when a
+web browser follows a link to @indicateurl{http://example.com/#foo}, it
+sends a request for @indicateurl{http://example.com/}, then looks in the
+resulting page for the fragment identified by @code{foo}.  A
+fragment identifies a part of a resource, not the resource itself.  But
+it is useful to have a fragment field in the URI record itself, so we
+hope you will forgive the inconsistency.
 
 @example
 (use-modules (web uri))
@@ -79,9 +204,13 @@ The following procedures can be found in the @code{(web uri)}
 module. Load it into your Guile, using a form like the above, to have
 access to them.
 
-@defun build-uri scheme [#:userinfo] [#:host] [#:port] [#:path] [#:query] [#:fragment] [#:validate?]
-Construct a URI object. If @var{validate?} is true, also run some
-consistency checks to make sure that the constructed URI is valid.
+@defun build-uri scheme [#:userinfo=@code{#f}] [#:host=@code{#f}] @
+       [#:port=@code{#f}] [#:path=@code{""}] [#:query=@code{#f}] @
+       [#:fragment=@code{#f}] [#:validate?=@code{#t}]
+Construct a URI object.  @var{scheme} should be a symbol, and the rest
+of the fields are either strings or @code{#f}.  If @var{validate?} is
+true, also run some consistency checks to make sure that the constructed
+URI is valid.
 @end defun
 
 @defun uri? x
@@ -92,45 +221,58 @@ consistency checks to make sure that the constructed URI is valid.
 @defunx uri-path uri
 @defunx uri-query uri
 @defunx uri-fragment uri
-A predicate and field accessors for the URI record type.
+A predicate and field accessors for the URI record type.  The URI scheme
+will be a symbol, and the rest either strings or @code{#f} if not
+present.
 @end defun
 
-@defun declare-default-port! scheme port
-Declare a default port for the given URI scheme.
-
-Default ports are for printing URI objects: a default port is not
-printed.
+@defun string->uri string
+Parse @var{string} into a URI object.  Return @code{#f} if the string
+could not be parsed.
 @end defun
 
-@defun parse-uri string
-Parse @var{string} into a URI object. Returns @code{#f} if the string
-could not be parsed.
+@defun uri->string uri
+Serialize @var{uri} to a string.  If the URI has a port that is the
+default port for its scheme, the port is not included in the
+serialization.
 @end defun
 
-@defun unparse-uri uri
-Serialize @var{uri} to a string.
+@defun declare-default-port! scheme port
+Declare a default port for the given URI scheme.
 @end defun
 
-@defun uri-decode str [#:charset]
-Percent-decode the given @var{str}, according to @var{charset}.
+@defun uri-decode str [#:encoding=@code{"utf-8"}]
+Percent-decode the given @var{str}, according to @var{encoding}, which
+should be the name of a character encoding.
 
 Note that this function should not generally be applied to a full URI
 string. For paths, use split-and-decode-uri-path instead. For query
 strings, split the query on @code{&} and @code{=} boundaries, and decode
 the components separately.
 
-Note that percent-encoded strings encode @emph{bytes}, not characters.
-There is no guarantee that a given byte sequence is a valid string
-encoding. Therefore this routine may signal an error if the decoded
-bytes are not valid for the given encoding. Pass @code{#f} for
-@var{charset} if you want decoded bytes as a bytevector directly.
+Note also that percent-encoded strings encode @emph{bytes}, not
+characters.  There is no guarantee that a given byte sequence is a valid
+string encoding. Therefore this routine may signal an error if the
+decoded bytes are not valid for the given encoding. Pass @code{#f} for
+@var{encoding} if you want decoded bytes as a bytevector directly.
+@xref{Ports, @code{set-port-encoding!}}, for more information on
+character encodings.
+
+Returns a string of the decoded characters, or a bytevector if
+@var{encoding} was @code{#f}.
 @end defun
 
-@defun uri-encode str [#:charset] [#:unescaped-chars]
-Percent-encode any character not in @var{unescaped-chars}.
+Fixme: clarify return type. indicate default values. type of
+unescaped-chars.
 
-Percent-encoding first writes out the given character to a bytevector
-within the given @var{charset}, then encodes each byte as
+@defun uri-encode str [#:encoding=@code{"utf-8"}] [#:unescaped-chars]
+Percent-encode any character not in the character set,
+@var{unescaped-chars}.
+
+The default character set includes alphanumerics from ASCII, as well as
+the special characters @samp{-}, @samp{.}, @samp{_}, and @samp{~}.  Any
+other character will be percent-encoded, by writing out the character to
+a bytevector within the given @var{encoding}, then encoding each byte as
 @code{%@var{HH}}, where @var{HH} is the hexadecimal representation of
 the byte.
 @end defun
@@ -139,13 +281,16 @@ the byte.
 Split @var{path} into its components, and decode each component,
 removing empty components.
 
-For example, @code{"/foo/bar/"} decodes to the two-element list,
-@code{("foo" "bar")}.
+For example, @code{"/foo/bar%20baz/"} decodes to the two-element list,
+@code{("foo" "bar baz")}.
 @end defun
 
 @defun encode-and-join-uri-path parts
 URI-encode each element of @var{parts}, which should be a list of
 strings, and join the parts together with @code{/} as a delimiter.
+
+For example, the list @code{("scrambled eggs" "biscuits&gravy")} encodes
+as @code{"scrambled%20eggs/biscuits%26gravy"}.
 @end defun
 
 @node HTTP
@@ -207,11 +352,36 @@ A writer, which writes a value to the port given in the second argument.
 @end table
 @end defun
 
-@defun declare-header! sym name [#:multiple?] [#:parser] [#:validator] [#:writer]
+@defun declare-header! sym name [#:multiple?=@code{#f}] [#:parser] [#:validator] [#:writer]
 Make a header declaration, as above, and register it by symbol and by
-name.
+name. The @var{parser}, @var{validator}, and @var{writer} arguments are
+all mandatory.
 @end defun
 
+For example, let's say you are running a web server behind some sort of
+proxy, and your proxy adds an @code{X-Client-Address} header, indicating
+the IPv4 address of the original client.  You would like for the HTTP
+request record to parse out this header to a Scheme value, instead of
+leaving it as a string.  You could register this header with Guile's
+HTTP stack like this:
+
+@example
+(use-modules (web http))
+
+(define (parse-ip str)
+  (inet-aton str))
+(define (validate-ip ip)
+  (and (integer? ip) (exact? ip) (<= 0 ip 4294967295)))
+(define (write-ip ip port)
+  (display (inet-ntoa ip) port))
+
+(declare-header! 'x-client-address
+  "X-Client-Address"
+  #:parser    parse-ip
+  #:validator validate-ip
+  #:writer    write-ip)
+@end example
+
 @defun lookup-header-decl name
 Return the @var{header-decl} object registered for the given @var{name}.
 
@@ -220,27 +390,27 @@ a case-insensitive fashion.
 @end defun
 
 @defun valid-header? sym val
-Returns a true value iff @var{val} is a valid Scheme value for the
-header with name @var{sym}.
+Return a true value iff @var{val} is a valid Scheme value for the header
+with name @var{sym}.
 @end defun
 
 Now that we have a generic interface for reading and writing headers, we
 do just that.
 
 @defun read-header port
-Reads one HTTP header from @var{port}. Returns two values: the header
+Read one HTTP header from @var{port}. Return two values: the header
 name and the parsed Scheme value. May raise an exception if the header
 was known but the value was invalid.
 
-Returns @var{#f} for both values if the end of the message body was
-reached (i.e., a blank line).
+Return the end-of-file object for both values if the end of the headers
+was reached (i.e., a blank line).
 @end defun
 
 @defun parse-header name val
 Parse @var{val}, a string, with the parser for the header named
 @var{name}.
 
-Returns two values, the header name and parsed value. If a parser was
+Return two values, the header name and parsed value. If a parser was
 found, the header name will be returned as a symbol. If a parser was not
 found, both the header name and the value are returned as strings.
 @end defun
@@ -252,8 +422,8 @@ value is written using @var{display}.
 @end defun
 
 @defun read-headers port
-Read an HTTP message from @var{port}, returning the headers as an
-ordered alist.
+Read the headers of an HTTP message from @var{port}, returning the
+headers as an ordered alist.
 @end defun
 
 @defun write-headers headers port
@@ -302,14 +472,49 @@ Write the first line of an HTTP response to @var{port}.
 @node HTTP Headers
 @subsection HTTP Headers
 
-The @code{(web http)} module defines parsers and unparsers for all
-headers defined in the HTTP/1.1 standard.  This section describes the
+In addition to defining the infrastructure to parse headers, the
+@code{(web http)} module defines specific parsers and unparsers for all
+headers defined in the HTTP/1.1 standard.
+
+For example, if you receive a header named @samp{Accept-Language} with a
+value @samp{en, es;q=0.8}, Guile parses it as follows:
+
+@example
+(parse-header "Accept-Language" "en, es;q=0.8")
+@result{} accept-language
+@result{} ((1000 . "en") (800 . "es"))
+@end example
+
+There are two results, because @code{parse-header} returns two
+values.  The first value is a symbol, because the @code{accept-language}
+header is known to Guile and has a parser registered.  The format of the
+value for @code{accept-language} headers is defined below, along with
+all other headers defined in the HTTP standard.  (If the header were not
+recognized, it and the value would be returned as strings.)
+
+For brevity, the header definitions below are given in the form,
+@var{type} @code{@var{name}}, indicating that values for the header
+@code{@var{name}} will be of the given @var{type}.  A short description
+of each header's purpose and an example follow.  For full details on
+the meanings of all of these headers, see the HTTP 1.1 standard, RFC
+2616.
+
+@subsubsection HTTP Header Types
+@deftp {HTTP Header Type} Date
+foo
+@end deftp
+
+So for example if you are implementing a
+
+This section describes the
 parsed format of the various headers.
 
 We cannot describe the function of all of these headers, however, in
 sufficient detail.  The interested reader would do well to download a
 copy of RFC 2616 and have it on hand.
 
+example? and examples in each, and brief meaning description.
+
 To begin with, we should make a few definitions:
 
 @table @dfn
@@ -321,11 +526,6 @@ which is the symbol or string key, and the cdr is the parsed value.
 Parsed values for known keys have key-dependent formats.  Parsed values
 for unknown keys are strings.
 
-@item param list
-A param list is a list of key-value lists.  When serialized to a string,
-items in the inner lists are separated by semicolons.  Again, known keys
-are parsed to symbols.
-
 @item quality
 A number of headers have quality values in them, which are decimal
 fractions between zero and one indicating a preference for various kinds
@@ -345,8 +545,7 @@ true iff the entity tag is a ``strong'' entity tag.
 
 @subsubsection General Headers
 
-@table @code
-@item cache-control
+@deftypevr {HTTP Header} KVList cache-control
 A key-value list of cache-control directives. Known keys are
 @code{max-age}, @code{max-stale}, @code{min-fresh},
 @code{must-revalidate}, @code{no-cache}, @code{no-store},
@@ -360,68 +559,82 @@ integers.
 If present, parameters to @code{private} and @code{no-cache} are parsed
 as lists of header names, represented as symbols if they are known
 headers or strings otherwise.
+@end deftypevr
 
-@item connection
+@deftypevr {HTTP Header} @i{List of Strings} connection
 A list of connection tokens.  A connection token is a string.
+@end deftypevr
 
-@item date
+@deftypevr {HTTP Header} {date} date
 A SRFI-19 date record.
+@end deftypevr
 
-@item pragma
+@deftypevr {HTTP Header} {Key-Value List} pragma
 A key-value list of pragma directives.  @code{no-cache} is the only
 known key.
+@end deftypevr
 
-@item trailer
+@deftypevr {HTTP Header} {tp} trailer
 A list of header names.  Known header names are parsed to symbols,
 otherwise they are left as strings.
+@end deftypevr
 
-@item transfer-encoding
+@deftypevr {HTTP Header} {tp} transfer-encoding
 A param list of transfer codings.  @code{chunked} is the only known key.
+@end deftypevr
 
-@item upgrade
+@deftypevr {HTTP Header} {tp} upgrade
 A list of strings.
+@end deftypevr
 
-@item via
+@deftypevr {HTTP Header} {tp} via
 A list of strings.  There may be multiple @code{via} headers in one
 message.
+@end deftypevr
 
-@item warning
+@deftypevr {HTTP Header} {tp} warning
 A list of warnings.  Each warning is itself a list of four elements: a
 code, as an exact integer between 0 and 1000, a host as a string, the
 warning text as a string, and either @code{#f} or a SRFI-19 date.
 
 There may be multiple @code{warning} headers in one message.
-@end table
+@end deftypevr
 
 
 @subsubsection Entity Headers
 
-@table @code
-@item allow
+@deftypevr {HTTP Header} {tp} allow
 A list of methods, as strings.  Methods are parsed as strings instead of
 @code{parse-http-method} so as to allow for new methods.
+@end deftypevr
 
-@item content-encoding
+@deftypevr {HTTP Header} {tp} content-encoding
 A list of content codings, as strings.
+@end deftypevr
 
-@item content-language
+@deftypevr {HTTP Header} {tp} content-language
 A list of language tags, as strings.
+@end deftypevr
 
-@item content-length
+@deftypevr {HTTP Header} {tp} content-length
 An exact, non-negative integer.
+@end deftypevr
 
-@item content-location
+@deftypevr {HTTP Header} {tp} content-location
 A URI record.
+@end deftypevr
 
-@item content-md5
+@deftypevr {HTTP Header} {tp} content-md5
 A string.
+@end deftypevr
 
-@item content-range
+@deftypevr {HTTP Header} {tp} content-range
 A list of three elements: the symbol @code{bytes}, either the symbol
 @code{*} or a pair of integers, indicating the byte range, and either
 @code{*} or an integer, for the instance length.
+@end deftypevr
 
-@item content-type
+@deftypevr {HTTP Header} {tp} content-type
 A pair, the car of which is the media type as a string, and the cdr is
 an alist of parameters, with strings as keys and values.
 
@@ -429,116 +642,144 @@ For example, @code{"text/plain"} parses as @code{("text/plain")}, and
 @code{"text/plain;charset=utf-8"} parses as @code{("text/plain"
 ("charset" . "utf-8"))}.
 
-@item expires
-A SRFI-19 date.
+note charset and encoding
+@end deftypevr
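As a rough illustration of the content-type shape described above (a media type, then key=value parameters separated by semicolons), here is a minimal C sketch. It is not Guile's parser, which lives in the Scheme module (web http); the struct and function names are hypothetical, and the sketch reads only the first parameter, with no quoted-string or whitespace handling.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result type: media type plus the first parameter only. */
struct content_type_sketch
{
  char media_type[64];
  char param_key[32];
  char param_value[32];
};

/* Copy LEN bytes of SRC into DST as a NUL-terminated string,
   truncating instead of overflowing DST (DST_SIZE must be > 0). */
static void
copy_span (char *dst, size_t dst_size, const char *src, size_t len)
{
  if (len >= dst_size)
    len = dst_size - 1;
  memcpy (dst, src, len);
  dst[len] = '\0';
}

/* Split "type/subtype[;key=value]".  Return 1 on success, 0 if a
   parameter is present but lacks an '='. */
static int
parse_content_type_sketch (const char *value, struct content_type_sketch *out)
{
  const char *semi, *eq;

  assert (value != NULL && out != NULL);
  out->param_key[0] = out->param_value[0] = '\0';

  semi = strchr (value, ';');
  if (semi == NULL)
    {
      /* No parameters: the whole value is the media type. */
      copy_span (out->media_type, sizeof out->media_type, value, strlen (value));
      return 1;
    }
  copy_span (out->media_type, sizeof out->media_type, value,
             (size_t) (semi - value));

  eq = strchr (semi + 1, '=');
  if (eq == NULL)
    return 0;
  copy_span (out->param_key, sizeof out->param_key, semi + 1,
             (size_t) (eq - semi - 1));
  copy_span (out->param_value, sizeof out->param_value, eq + 1, strlen (eq + 1));
  return 1;
}
```

For "text/plain;charset=utf-8" the sketch yields media type "text/plain" with the pair charset/utf-8, which corresponds to the Scheme value ("text/plain" ("charset" . "utf-8")).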
 
-@item last-modified
+@deftypevr {HTTP Header} {tp} expires
 A SRFI-19 date.
+@end deftypevr
 
-@end table
+@deftypevr {HTTP Header} {tp} last-modified
+A SRFI-19 date.
+@end deftypevr
 
 
 @subsubsection Request Headers
 
-@table @code
-@item accept
+@deftypevr {HTTP Header} {tp} accept
 A param list.  Each element in the list indicates one media-range
 with accept-params.  The only known key is @code{q}, whose value is
 parsed as a quality value.
+@end deftypevr
 
-@item accept-charset
+@deftypevr {HTTP Header} {tp} accept-charset
 A quality-list of charsets, as strings.
 
-@item accept-encoding
+charset and encoding
+@end deftypevr
+@deftypevr {HTTP Header} {tp} accept-encoding
 A quality-list of content codings, as strings.
+@end deftypevr
 
-@item accept-language
+@deftypevr {HTTP Header} {tp} accept-language
 A quality-list of languages, as strings.
+@end deftypevr
 
-@item authorization
+@deftypevr {HTTP Header} {tp} authorization
 A string.
+@end deftypevr
 
-@item expect
+@deftypevr {HTTP Header} {tp} expect
 A param list of expectations.  The only known key is
 @code{100-continue}.
+@end deftypevr
 
-@item from
+@deftypevr {HTTP Header} {tp} from
 A string.
+@end deftypevr
 
-@item host
+@deftypevr {HTTP Header} {tp} host
 A pair of the host, as a string, and the port, as an integer. If no port
 is given, port is @code{#f}.
+@end deftypevr
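The host value is described as a pair of a host string and a port integer, with @code{#f} when no port is given. The following C sketch shows that split; it is only an illustration (Guile's own parsing is done in Scheme), the function name is hypothetical, -1 stands in for @code{#f}, and bracketed IPv6 literals are not given any special treatment.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch: split "name[:port]" into HOST and *PORT.
   *PORT is set to -1 when no port is present (standing in for #f).
   Return 1 on success, 0 on a malformed port or a too-small buffer. */
static int
parse_host_sketch (const char *value, char *host, size_t host_size, long *port)
{
  const char *colon;
  size_t len;

  assert (value != NULL && host != NULL && port != NULL && host_size > 0);
  colon = strrchr (value, ':');              /* last colon, if any */
  len = colon ? (size_t) (colon - value) : strlen (value);
  if (len >= host_size)
    return 0;
  memcpy (host, value, len);
  host[len] = '\0';

  *port = -1;
  if (colon != NULL)
    {
      char *end;
      *port = strtol (colon + 1, &end, 10);
      if (end == colon + 1 || *end != '\0')  /* empty or trailing junk */
        return 0;
    }
  return 1;
}
```

So "example.com:8080" gives the host "example.com" and port 8080, while "example.com" alone gives the port -1, matching the documented (host . #f) case.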
 
-@item if-match
+@deftypevr {HTTP Header} {tp} if-match
 Either the symbol @code{*}, or a list of entity tags (see above).
+@end deftypevr
 
-@item if-modified-since
+@deftypevr {HTTP Header} {tp} if-modified-since
 A SRFI-19 date.
+@end deftypevr
 
-@item if-none-match
+@deftypevr {HTTP Header} {tp} if-none-match
 Either the symbol @code{*}, or a list of entity tags (see above).
+@end deftypevr
 
-@item if-range
+@deftypevr {HTTP Header} {tp} if-range
 Either an entity tag, or a SRFI-19 date.
+@end deftypevr
 
-@item if-unmodified-since
+@deftypevr {HTTP Header} {tp} if-unmodified-since
 A SRFI-19 date.
+@end deftypevr
 
-@item max-forwards
+@deftypevr {HTTP Header} {tp} max-forwards
 An exact non-negative integer.
+@end deftypevr
 
-@item proxy-authorization
+@deftypevr {HTTP Header} {tp} proxy-authorization
 A string.
+@end deftypevr
 
-@item range
+@deftypevr {HTTP Header} {tp} range
 A pair whose car is the symbol @code{bytes}, and whose cdr is a list of
 pairs. Each element of the cdr indicates a range; the car is the first
 byte position and the cdr is the last byte position, as integers, or
 @code{#f} if not given.
+@end deftypevr
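To make the range representation concrete (the @code{bytes} unit plus first and last byte positions, either of which may be missing), here is a small C sketch that reads a single range. It is an illustration only, not Guile's Scheme parser: the names are hypothetical, -1 stands in for @code{#f}, and only the first range of a comma-separated list is read.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical result: one byte range; -1 means "not given" (#f). */
struct byte_range_sketch
{
  long first;
  long last;
};

/* Parse the first range of "bytes=FIRST-LAST", "bytes=FIRST-" or
   "bytes=-LAST".  Return 1 on success, 0 otherwise. */
static int
parse_range_sketch (const char *value, struct byte_range_sketch *out)
{
  const char *p;
  char *end;

  assert (value != NULL && out != NULL);
  if (strncmp (value, "bytes=", 6) != 0)
    return 0;                       /* only the bytes unit is handled */
  p = value + 6;

  out->first = -1;
  out->last = -1;

  if (*p != '-')
    {
      out->first = strtol (p, &end, 10);
      if (end == p)
        return 0;
      p = end;
    }
  if (*p != '-')
    return 0;                       /* the dash is mandatory */
  p++;

  if (*p != '\0' && *p != ',')
    {
      out->last = strtol (p, &end, 10);
      if (end == p)
        return 0;
    }
  return 1;
}
```

Under these assumptions "bytes=0-499" gives (0, 499), "bytes=500-" gives (500, -1), and the suffix form "bytes=-500" gives (-1, 500), mirroring the documented use of @code{#f} for a missing position.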
 
-@item referer
+@deftypevr {HTTP Header} {tp} referer
 A URI.
+@end deftypevr
 
-@item te
+@deftypevr {HTTP Header} {tp} te
 A param list of transfer-codings.  The only known key is
 @code{trailers}.
+@end deftypevr
 
-@item user-agent
+@deftypevr {HTTP Header} {tp} user-agent
 A string.
-@end table
+@end deftypevr
 
 
 @subsubsection Response Headers
 
-@table @code
-@item accept-ranges
+@deftypevr {HTTP Header} {tp} accept-ranges
 A list of strings.
+@end deftypevr
 
-@item age
+@deftypevr {HTTP Header} {tp} age
 An exact, non-negative integer.
+@end deftypevr
 
-@item etag
+@deftypevr {HTTP Header} {tp} etag
 An entity tag.
+@end deftypevr
 
-@item location
+@deftypevr {HTTP Header} {tp} location
 A URI.
+@end deftypevr
 
-@item proxy-authenticate
+@deftypevr {HTTP Header} {tp} proxy-authenticate
 A string.
+@end deftypevr
 
-@item retry-after
+@deftypevr {HTTP Header} {tp} retry-after
 Either an exact, non-negative integer, or a SRFI-19 date.
+@end deftypevr
 
-@item server
+@deftypevr {HTTP Header} {tp} server
 A string.
+@end deftypevr
 
-@item vary
+@deftypevr {HTTP Header} {tp} vary
 Either the symbol @code{*}, or a list of headers, with known headers
 parsed to symbols.
+@end deftypevr
 
-@item www-authenticate
-A string.
-@end table
+@deftypevr {HTTP Header} {tp} www-authenticate
+A string. (FIXME)
+@end deftypevr
 
 
 @node Requests
@@ -553,6 +794,8 @@ the body is not part of the request, but the port is.  Once you have
 read a request, you may read the body separately, and likewise for
 writing requests.
 
+discussion of charsets and bytes and stuff.
+
 @defun build-request [#:method] [#:uri] [#:version] [#:headers] [#:port] [#:meta] [#:validate-headers?]
 Construct an HTTP request object. If @var{validate-headers?} is true,
 the headers are each run through their respective validators.
@@ -595,10 +838,12 @@ discussion of character sets in "HTTP Requests" in the manual, for more
 information.
 @end defun
 
+Fixme^
+
 @defun write-request r port
 Write the given HTTP request to @var{port}.
 
-Returns a new request, whose @code{request-port} will continue writing
+Return a new request, whose @code{request-port} will continue writing
 on @var{port}, perhaps using some transfer encoding.
 @end defun
 
@@ -607,7 +852,7 @@ Reads the request body from @var{r}, as a string.
 
 Assumes that the request port has ISO-8859-1 encoding, so that the
 number of characters to read is the same as the
-@code{request-content-length}. Returns @code{#f} if there was no request
+@code{request-content-length}.  Return @code{#f} if there was no request
 body.
 @end defun
 
@@ -617,7 +862,7 @@ corresponding to the HTTP request @var{r}.
 @end defun
 
 @defun read-request-body/bytevector r
-Reads the request body from @var{r}, as a bytevector. Returns @code{#f}
+Read the request body from @var{r}, as a bytevector.  Return @code{#f}
 if there was no request body.
 @end defun
 
@@ -726,13 +971,14 @@ Construct an HTTP response object. If @var{validate-headers?} is true,
 the headers are each run through their respective validators.
 @end defun
 
+FIXME
 @defun extend-response r k v . additional
 Extend an HTTP response by setting additional HTTP headers @var{k},
-@var{v}. Returns a new HTTP response.
+@var{v}.  Return a new HTTP response.
 @end defun
 
 @defun adapt-response-version response version
-Adapt the given response to a different HTTP version. Returns a new HTTP
+Adapt the given response to a different HTTP version.  Return a new HTTP
 response.
 
 The idea is that many applications might just build a response for the
@@ -745,7 +991,7 @@ the version field.
 @defun write-response r port
 Write the given HTTP response to @var{port}.
 
-Returns a new response, whose @code{response-port} will continue writing
+Return a new response, whose @code{response-port} will continue writing
 on @var{port}, perhaps using some transfer encoding.
 @end defun
 
@@ -754,7 +1000,7 @@ Reads the response body from @var{r}, as a string.
 
 Assumes that the response port has ISO-8859-1 encoding, so that the
 number of characters to read is the same as the
-@code{response-content-length}. Returns @code{#f} if there was no
+@code{response-content-length}. Return @code{#f} if there was no
 response body.
 @end defun
 
@@ -764,7 +1010,7 @@ corresponding to the HTTP response @var{r}.
 @end defun
 
 @defun read-response-body/bytevector r
-Reads the response body from @var{r}, as a bytevector. Returns @code{#f}
+Read the response body from @var{r}, as a bytevector.  Return @code{#f}
 if there was no response body.
 @end defun
 
@@ -858,7 +1104,7 @@ A user-provided handler procedure is called, with the request
 and body as its arguments.  The handler should return two
 values: the response, as a @code{<response>} record from @code{(web
 response)}, and the response body as a string, bytevector, or
-@code{#f} if not present.  We also allow the reponse to be simply an
+@code{#f} if not present.  We also allow the response to be simply an
 alist of headers, in which case a default response object is
 constructed with those headers.
 
@@ -901,16 +1147,16 @@ that we don't expose the accessors for the various fields of a
 any access to the impl objects.
 
 @defun open-server impl open-params
-Open a server for the given implementation. Returns one value, the new
+Open a server for the given implementation.  Return one value, the new
 server object. The implementation's @code{open} procedure is applied to
 @var{open-params}, which should be a list.
 @end defun
 
 @defun read-client impl server
 Read a new client from @var{server}, by applying the implementation's
-@code{read} procedure to the server. If successful, returns three
+@code{read} procedure to the server.  If successful, return three
 values: an object corresponding to the client, a request object, and the
-request body. If any exception occurs, returns @code{#f} for all three
+request body. If any exception occurs, return @code{#f} for all three
 values.
 @end defun
 
@@ -962,7 +1208,7 @@ Given the procedures above, it is a small matter to make a web server:
 
 @defun serve-one-client handler impl server state
 Read one request from @var{server}, call @var{handler} on the request
-and body, and write the response to the client. Returns the new state
+and body, and write the response to the client.  Return the new state
 produced by the handler procedure.
 @end defun
 
@@ -990,6 +1236,8 @@ Additional return values are accumulated into a new @var{state}, which
 will be used for subsequent requests. In this way a handler can
 explicitly manage its state.
 
+FIXME: elide?
+
 The default server implementation is @code{http}, which accepts
 @var{open-params} like @code{(#:port 8081)}, among others. See "Web
 Server" in the manual, for more information.
@@ -1165,9 +1413,9 @@ Here we see the power of keyword arguments with default initializers. By
 the time the arguments are fully parsed, the @code{sxml} local variable
 will hold the templated SXML, ready for sending out to the client.
 
-Instead of returning the body as a string, here we give a procedure,
-which will be called by the web server to write out the response to the
-client.
+Also, instead of returning the body as a string, @code{respond} gives a
+procedure, which will be called by the web server to write out the
+response to the client.
 
 Now, a simple example using this responder, which lays out the incoming
 headers in an HTML table.
diff --git a/libguile/bytevectors.c b/libguile/bytevectors.c
index 30adbff..f014697 100644
--- a/libguile/bytevectors.c
+++ b/libguile/bytevectors.c
@@ -1,4 +1,4 @@
-/* Copyright (C) 2009, 2010 Free Software Foundation, Inc.
+/* Copyright (C) 2009, 2010, 2011 Free Software Foundation, Inc.
  *
  * This library is free software; you can redistribute it and/or
  * modify it under the terms of the GNU Lesser General Public License
@@ -2221,9 +2221,9 @@ scm_bootstrap_bytevectors (void)
   scm_null_bytevector = make_bytevector (0, SCM_ARRAY_ELEMENT_TYPE_VU8);
 
 #ifdef WORDS_BIGENDIAN
-  scm_i_native_endianness = scm_from_locale_symbol ("big");
+  scm_i_native_endianness = scm_from_latin1_symbol ("big");
 #else
-  scm_i_native_endianness = scm_from_locale_symbol ("little");
+  scm_i_native_endianness = scm_from_latin1_symbol ("little");
 #endif
 
   scm_c_register_extension ("libguile-" SCM_EFFECTIVE_VERSION,
diff --git a/libguile/eval.c b/libguile/eval.c
index 414645f..7852178 100644
--- a/libguile/eval.c
+++ b/libguile/eval.c
@@ -1,4 +1,4 @@
-/* Copyright (C) 1995,1996,1997,1998,1999,2000,2001,2002,2003,2004,2005,2006,2007,2008,2009,2010
+/* Copyright (C) 1995,1996,1997,1998,1999,2000,2001,2002,2003,2004,2005,2006,2007,2008,2009,2010,2011
  * Free Software Foundation, Inc.
  * 
  * This library is free software; you can redistribute it and/or
@@ -162,14 +162,14 @@ static void error_used_before_defined (void)
 
 static void error_invalid_keyword (SCM proc)
 {
-  scm_error_scm (scm_from_locale_symbol ("keyword-argument-error"), proc,
+  scm_error_scm (scm_from_latin1_symbol ("keyword-argument-error"), proc,
                  scm_from_locale_string ("Invalid keyword"), SCM_EOL,
                  SCM_BOOL_F);
 }
 
 static void error_unrecognized_keyword (SCM proc)
 {
-  scm_error_scm (scm_from_locale_symbol ("keyword-argument-error"), proc,
+  scm_error_scm (scm_from_latin1_symbol ("keyword-argument-error"), proc,
                  scm_from_locale_string ("Unrecognized keyword"), SCM_EOL,
                  SCM_BOOL_F);
 }
@@ -1012,9 +1012,9 @@ boot_closure_print (SCM closure, SCM port, scm_print_state *pstate)
   scm_uintprint ((scm_t_bits)SCM2PTR (closure), 16, port);
   scm_putc (' ', port);
   args = scm_make_list (scm_from_int (BOOT_CLOSURE_NUM_REQUIRED_ARGS (closure)),
-                        scm_from_locale_symbol ("_"));
+                        scm_from_latin1_symbol ("_"));
   if (!BOOT_CLOSURE_IS_FIXED (closure) && BOOT_CLOSURE_HAS_REST_ARGS (closure))
-    args = scm_cons_star (scm_from_locale_symbol ("_"), args);
+    args = scm_cons_star (scm_from_latin1_symbol ("_"), args);
   /* FIXME: optionals and rests */
   scm_display (args, port);
   scm_putc ('>', port);
diff --git a/libguile/goops.c b/libguile/goops.c
index bc250fc..c597044 100644
--- a/libguile/goops.c
+++ b/libguile/goops.c
@@ -1,4 +1,4 @@
-/* Copyright (C) 1998,1999,2000,2001,2002,2003,2004,2008,2009,2010
+/* Copyright (C) 1998,1999,2000,2001,2002,2003,2004,2008,2009,2010,2011
  * Free Software Foundation, Inc.
  *
  * This library is free software; you can redistribute it and/or
@@ -896,7 +896,7 @@ create_basic_classes (void)
 
   /**** <class> ****/
   SCM cs = scm_from_locale_string (SCM_CLASS_CLASS_LAYOUT);
-  SCM name = scm_from_locale_symbol ("<class>");
+  SCM name = scm_from_latin1_symbol ("<class>");
   scm_class_class = scm_make_vtable_vtable (cs, SCM_INUM0, SCM_EOL);
   SCM_SET_CLASS_FLAGS (scm_class_class, (SCM_CLASSF_GOOPS_OR_VALID
                                         | SCM_CLASSF_METACLASS));
@@ -918,14 +918,14 @@ create_basic_classes (void)
   DEFVAR(name, scm_class_class);
 
   /**** <top> ****/
-  name = scm_from_locale_symbol ("<top>");
+  name = scm_from_latin1_symbol ("<top>");
   scm_class_top = scm_basic_make_class (scm_class_class, name,
                                         SCM_EOL, SCM_EOL);
 
   DEFVAR(name, scm_class_top);
 
   /**** <object> ****/
-  name  = scm_from_locale_symbol ("<object>");
+  name  = scm_from_latin1_symbol ("<object>");
   scm_class_object = scm_basic_make_class (scm_class_class, name,
                                            scm_list_1 (scm_class_top), SCM_EOL);
 
@@ -1089,7 +1089,7 @@ SCM_DEFINE (scm_method_generic_function, "method-generic-function", 1, 0, 0,
 #define FUNC_NAME s_scm_method_generic_function
 {
   SCM_VALIDATE_METHOD (1, obj);
-  return scm_slot_ref (obj, scm_from_locale_symbol ("generic-function"));
+  return scm_slot_ref (obj, scm_from_latin1_symbol ("generic-function"));
 }
 #undef FUNC_NAME
 
@@ -1099,7 +1099,7 @@ SCM_DEFINE (scm_method_specializers, "method-specializers", 1, 0, 0,
 #define FUNC_NAME s_scm_method_specializers
 {
   SCM_VALIDATE_METHOD (1, obj);
-  return scm_slot_ref (obj, scm_from_locale_symbol ("specializers"));
+  return scm_slot_ref (obj, scm_from_latin1_symbol ("specializers"));
 }
 #undef FUNC_NAME
 
@@ -2158,7 +2158,7 @@ SCM_DEFINE (scm_make, "make",  0, 0, 1,
            scm_i_get_keyword (k_name,
                               args,
                               len - 1,
-                              scm_from_locale_symbol ("???"),
+                              scm_from_latin1_symbol ("???"),
                               FUNC_NAME));
          SCM_SET_SLOT (z, scm_si_direct_supers,
            scm_i_get_keyword (k_dsupers,
@@ -2286,26 +2286,26 @@ static void
 create_standard_classes (void)
 {
   SCM slots;
-  SCM method_slots = scm_list_n (scm_from_locale_symbol ("generic-function"),
-                                scm_from_locale_symbol ("specializers"),
+  SCM method_slots = scm_list_n (scm_from_latin1_symbol ("generic-function"),
+                                scm_from_latin1_symbol ("specializers"),
                                 sym_procedure,
-                                scm_from_locale_symbol ("formals"),
-                                scm_from_locale_symbol ("body"),
-                                scm_from_locale_symbol ("make-procedure"),
+                                scm_from_latin1_symbol ("formals"),
+                                scm_from_latin1_symbol ("body"),
+                                scm_from_latin1_symbol ("make-procedure"),
                                  SCM_UNDEFINED);
-  SCM amethod_slots = scm_list_1 (scm_list_3 (scm_from_locale_symbol ("slot-definition"),
+  SCM amethod_slots = scm_list_1 (scm_list_3 (scm_from_latin1_symbol ("slot-definition"),
                                              k_init_keyword,
                                              k_slot_definition));
-  SCM gf_slots = scm_list_4 (scm_from_locale_symbol ("methods"),
-                            scm_list_3 (scm_from_locale_symbol ("n-specialized"),
+  SCM gf_slots = scm_list_4 (scm_from_latin1_symbol ("methods"),
+                            scm_list_3 (scm_from_latin1_symbol ("n-specialized"),
                                         k_init_value,
                                         SCM_INUM0),
-                            scm_list_3 (scm_from_locale_symbol ("extended-by"),
+                            scm_list_3 (scm_from_latin1_symbol ("extended-by"),
                                         k_init_value,
                                         SCM_EOL),
-                             scm_from_locale_symbol ("effective-methods"));
+                             scm_from_latin1_symbol ("effective-methods"));
   SCM setter_slots = scm_list_1 (sym_setter);
-  SCM egf_slots = scm_list_1 (scm_list_3 (scm_from_locale_symbol ("extends"),
+  SCM egf_slots = scm_list_1 (scm_list_3 (scm_from_latin1_symbol ("extends"),
                                          k_init_value,
                                          SCM_EOL));
   /* Foreign class slot classes */
@@ -2745,7 +2745,7 @@ scm_init_goops_builtins (void)
   create_port_classes ();
 
   {
-    SCM name = scm_from_locale_symbol ("no-applicable-method");
+    SCM name = scm_from_latin1_symbol ("no-applicable-method");
     scm_no_applicable_method =
       scm_make (scm_list_3 (scm_class_generic, k_name, name));
     DEFVAR (name, scm_no_applicable_method);
diff --git a/libguile/hash.c b/libguile/hash.c
index 78a84a4..0dcd1c2 100644
--- a/libguile/hash.c
+++ b/libguile/hash.c
@@ -1,4 +1,4 @@
-/*     Copyright (C) 1995,1996,1997, 2000, 2001, 2003, 2004, 2006, 2008, 2009, 2010 Free Software Foundation, Inc.
+/*     Copyright (C) 1995,1996,1997, 2000, 2001, 2003, 2004, 2006, 2008, 2009, 2010, 2011 Free Software Foundation, Inc.
  * 
  * This library is free software; you can redistribute it and/or
  * modify it under the terms of the GNU Lesser General Public License
@@ -22,6 +22,12 @@
 # include <config.h>
 #endif
 
+#ifdef HAVE_WCHAR_H
+#include <wchar.h>
+#endif
+
+#include <unistr.h>
+
 #include "libguile/_scm.h"
 #include "libguile/chars.h"
 #include "libguile/ports.h"
@@ -64,6 +70,79 @@ scm_i_string_hash (SCM str)
   return h;
 }
 
+unsigned long 
+scm_i_locale_string_hash (const char *str, size_t len)
+{
+#ifdef HAVE_WCHAR_H
+  mbstate_t state;
+  wchar_t c;
+  size_t byte_idx = 0, nbytes;
+  unsigned long h = 0;
+
+  if (len == (size_t) -1)
+    len = strlen (str);
+
+  /* mbrtowc needs an initialized conversion state.  */
+  memset (&state, 0, sizeof (state));
+
+  while (byte_idx < len)
+    {
+      nbytes = mbrtowc (&c, str + byte_idx, len - byte_idx, &state);
+      if (nbytes == 0)
+        /* Embedded NUL; stop, as before.  */
+        break;
+      else if (nbytes >= (size_t) -2)
+        /* Invalid or incomplete input; punt.  */
+        return scm_i_string_hash (scm_from_locale_stringn (str, len));
+
+      h = (unsigned long) c + h * 37;
+      byte_idx += nbytes;
+    }
+
+  return h;
+#else
+  return scm_i_string_hash (scm_from_locale_stringn (str, len));
+#endif
+}
+
+unsigned long 
+scm_i_latin1_string_hash (const char *str, size_t len)
+{
+  const scm_t_uint8 *ustr = (const scm_t_uint8 *) str;
+  size_t i = 0;
+  unsigned long h = 0;
+  
+  if (len == (size_t) -1)
+    len = strlen (str);
+
+  for (; i < len; i++)
+    h = (unsigned long) ustr[i] + h * 37;
+
+  return h;
+}
+
+unsigned long 
+scm_i_utf8_string_hash (const char *str, size_t len)
+{
+  const scm_t_uint8 *ustr = (const scm_t_uint8 *) str;
+  size_t byte_idx = 0;
+  unsigned long h = 0;
+  
+  if (len == (size_t) -1)
+    len = strlen (str);
+
+  while (byte_idx < len)
+    {
+      ucs4_t c;
+      int nbytes;
+
+      nbytes = u8_mbtouc (&c, ustr + byte_idx, len - byte_idx);
+      if (nbytes == 0)
+        break;
+      else if (nbytes < 0)
+        /* Bad UTF-8; punt.  */
+        return scm_i_string_hash (scm_from_utf8_stringn (str, len));
+
+      h = (unsigned long) c + h * 37;
+      byte_idx += nbytes;
+    }
+
+  return h;
+}
+
 
 /* Dirk:FIXME:: why downcase for characters? (2x: scm_hasher, scm_ihashv) */
 /* Dirk:FIXME:: scm_hasher could be made static. */
@@ -130,7 +209,6 @@ scm_hasher(SCM obj, unsigned long n, size_t d)
       {
        unsigned long hash =
          scm_i_string_hash (obj) % n;
-       scm_remember_upto_here_1 (obj);
        return hash;
       }
     case scm_tc7_symbol:
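The new scm_i_latin1_string_hash above folds each byte into the hash as h = c + h * 37, with a length of (size_t) -1 meaning "NUL-terminated". The standalone C copy below exists only to make that arithmetic easy to check by hand; its name is hypothetical and it is not the libguile function itself.

```c
#include <assert.h>
#include <string.h>

/* Standalone copy of the Latin-1 fold h = c + h * 37.  Each ISO-8859-1
   byte is already a code point, so no decoding step is needed. */
static unsigned long
latin1_hash_sketch (const char *str, size_t len)
{
  const unsigned char *ustr = (const unsigned char *) str;
  unsigned long h = 0;
  size_t i;

  assert (str != NULL);
  if (len == (size_t) -1)      /* same convention: -1 means NUL-terminated */
    len = strlen (str);

  for (i = 0; i < len; i++)
    h = (unsigned long) ustr[i] + h * 37;
  return h;
}
```

For "ab" the fold gives 98 + 97 * 37 = 3687, and passing an explicit length lets a caller hash a prefix of a longer buffer without copying it.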
diff --git a/libguile/hash.h b/libguile/hash.h
index 2ebc053..3077486 100644
--- a/libguile/hash.h
+++ b/libguile/hash.h
@@ -3,7 +3,7 @@
 #ifndef SCM_HASH_H
 #define SCM_HASH_H
 
-/* Copyright (C) 1995,1996,2000, 2006, 2008 Free Software Foundation, Inc.
+/* Copyright (C) 1995,1996,2000, 2006, 2008, 2011 Free Software Foundation, Inc.
  *
  * This library is free software; you can redistribute it and/or
  * modify it under the terms of the GNU Lesser General Public License
@@ -28,6 +28,13 @@
 
 
 SCM_API unsigned long scm_string_hash (const unsigned char *str, size_t len);
+SCM_INTERNAL unsigned long scm_i_locale_string_hash (const char *str,
+                                                     size_t len);
+SCM_INTERNAL unsigned long scm_i_latin1_string_hash (const char *str,
+                                                     size_t len);
+SCM_INTERNAL unsigned long scm_i_utf8_string_hash (const char *str,
+                                                   size_t len);
+
 SCM_INTERNAL unsigned long scm_i_string_hash (SCM str);
 SCM_API unsigned long scm_hasher (SCM obj, unsigned long n, size_t d);
 SCM_API unsigned long scm_ihashq (SCM obj, unsigned long n);
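The Latin-1 and UTF-8 variants declared above both fold decoded code points with the same h = c + h * 37 step, so the same characters hash identically whichever of the two encodings a C string uses. That is presumably what lets a lookup keyed on a raw C string match an existing string without converting it first. The sketch below checks this for "café"; its small UTF-8 decoder is a hypothetical, simplified stand-in for the libunistring call used in the diff (it checks lead and continuation bytes only, and does not reject overlong forms).

```c
#include <assert.h>
#include <stddef.h>

/* Decode one UTF-8 sequence at S (N bytes available) into *CP.
   Return its length, or -1 if malformed or truncated. */
static int
utf8_decode_sketch (const unsigned char *s, size_t n, unsigned long *cp)
{
  int len, i;

  if (n == 0)
    return -1;
  if (s[0] < 0x80)
    { *cp = s[0]; len = 1; }
  else if ((s[0] & 0xE0) == 0xC0)
    { *cp = s[0] & 0x1F; len = 2; }
  else if ((s[0] & 0xF0) == 0xE0)
    { *cp = s[0] & 0x0F; len = 3; }
  else if ((s[0] & 0xF8) == 0xF0)
    { *cp = s[0] & 0x07; len = 4; }
  else
    return -1;

  if ((size_t) len > n)
    return -1;
  for (i = 1; i < len; i++)
    {
      if ((s[i] & 0xC0) != 0x80)        /* not a continuation byte */
        return -1;
      *cp = (*cp << 6) | (s[i] & 0x3F);
    }
  return len;
}

/* Fold the code points of a UTF-8 string.  Return 1 and store the hash,
   or 0 on malformed input (libguile falls back to a slower path there). */
static int
utf8_codepoint_hash (const char *str, size_t len, unsigned long *hash)
{
  const unsigned char *ustr = (const unsigned char *) str;
  unsigned long h = 0;
  size_t i = 0;

  assert (str != NULL && hash != NULL);
  while (i < len)
    {
      unsigned long c;
      int nbytes = utf8_decode_sketch (ustr + i, len - i, &c);
      if (nbytes < 0)
        return 0;
      h = c + h * 37;
      i += (size_t) nbytes;
    }
  *hash = h;
  return 1;
}

/* The same fold over Latin-1 bytes, where each byte is a code point. */
static unsigned long
latin1_codepoint_hash (const char *str, size_t len)
{
  const unsigned char *ustr = (const unsigned char *) str;
  unsigned long h = 0;
  size_t i;

  for (i = 0; i < len; i++)
    h = ustr[i] + h * 37;
  return h;
}
```

The UTF-8 bytes C3 A9 decode to U+00E9, the same value as the single Latin-1 byte E9, so both spellings of "café" produce one hash.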
diff --git a/libguile/hashtab.c b/libguile/hashtab.c
index b7cc72b..ce155e9 100644
--- a/libguile/hashtab.c
+++ b/libguile/hashtab.c
@@ -1,4 +1,4 @@
-/* Copyright (C) 1995,1996,1998,1999,2000,2001, 2003, 2004, 2006, 2008, 2009, 2010 Free Software Foundation, Inc.
+/* Copyright (C) 1995,1996,1998,1999,2000,2001, 2003, 2004, 2006, 2008, 2009, 2010, 2011 Free Software Foundation, Inc.
  * 
  * This library is free software; you can redistribute it and/or
  * modify it under the terms of the GNU Lesser General Public License
@@ -40,17 +40,6 @@
 
 
 
-/* NOTES
- *
- * 1. The current hash table implementation uses weak alist vectors
- *    (implementation in weaks.c) internally, but we do the scanning
- *    ourselves (in scan_weak_hashtables) because we need to update the
- *    hash table structure when items are dropped during GC.
- *
- * 2. All hash table operations still work on alist vectors.
- *
- */
-
 /* A hash table is a cell containing a vector of association lists.
  *
  * Growing or shrinking, with following rehashing, is triggered when
@@ -63,6 +52,9 @@
  * The implementation stores the upper and lower number of items which
  * trigger a resize in the hashtable object.
  *
+ * Weak hash tables use weak pairs in the bucket lists rather than
+ * normal pairs.
+ *
  * Possible hash table sizes (primes) are stored in the array
  * hashtable_size.
  */
@@ -70,10 +62,11 @@
 static unsigned long hashtable_size[] = {
   31, 61, 113, 223, 443, 883, 1759, 3517, 7027, 14051, 28099, 56197, 112363,
   224717, 449419, 898823, 1797641, 3595271, 7190537, 14381041
-#if 0
-  /* vectors are currently restricted to 2^24-1 = 16777215 elements. */
-  28762081, 57524111, 115048217, 230096423, 460192829
-  /* larger values can't be represented as INUMs */
+#if SIZEOF_SCM_T_BITS > 4
+  /* vector lengths are stored in the first word of vectors, shifted by
+     8 bits for the tc8, so for 32-bit we only get 2^24-1 = 16777215
+     elements.  But we allow a few more sizes for 64-bit. */
+  , 28762081, 57524111, 115048217, 230096423, 460192829
 #endif
 };
 
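The new comment on hashtable_size says that a vector's length shares its first word with an 8-bit type code, leaving 24 bits on a 32-bit word, hence a cap of 2^24 - 1 = 16777215 elements. The C check below copies the table's values to confirm that every unconditional size fits under that cap and that the 64-bit-only sizes do not; the helper name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* 24 bits of length left after an 8-bit type code in a 32-bit word. */
#define MAX_LEN_32BIT ((1UL << 24) - 1)   /* 16777215 */

/* Sizes copied from hashtable_size[]: always available... */
static const unsigned long sizes_32bit[] = {
  31, 61, 113, 223, 443, 883, 1759, 3517, 7027, 14051, 28099, 56197, 112363,
  224717, 449419, 898823, 1797641, 3595271, 7190537, 14381041
};
/* ...and only when SIZEOF_SCM_T_BITS > 4. */
static const unsigned long sizes_64bit_extra[] = {
  28762081, 57524111, 115048217, 230096423, 460192829
};

/* Return 1 if every one of the N entries in SIZES is <= LIMIT. */
static int
all_fit (const unsigned long *sizes, size_t n, unsigned long limit)
{
  size_t i;

  assert (sizes != NULL || n == 0);
  for (i = 0; i < n; i++)
    if (sizes[i] > limit)
      return 0;
  return 1;
}
```

The largest unconditional entry, 14381041, stays below 16777215, while the first 64-bit-only entry, 28762081, already exceeds it, which is why those sizes sit behind the #if.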
@@ -128,16 +121,6 @@ scm_fixup_weak_alist (SCM alist, size_t *removed_items)
 }
 
 
-/* Return true if OBJ is either a weak hash table or a weak alist vector (as
-   defined in `weaks.[ch]').
-   FIXME: We should eventually keep only weah hash tables.  Actually, the
-   procs in `weaks.c' already no longer return vectors.  */
-/* XXX: We assume that if OBJ is a vector, then it's a _weak_ alist vector.  */
-#define IS_WEAK_THING(_obj)                                    \
-  ((SCM_HASHTABLE_P (table) && (SCM_HASHTABLE_WEAK_P (table))) \
-   || (SCM_I_IS_VECTOR (table)))
-
-
 /* Packed arguments for `do_weak_bucket_fixup'.  */
 struct t_fixup_args
 {
@@ -212,7 +195,7 @@ weak_bucket_assoc (SCM table, SCM buckets, size_t bucket_index,
 
   scm_remember_upto_here_1 (strong_refs);
 
-  if (args.removed_items > 0 && SCM_HASHTABLE_P (table))
+  if (args.removed_items > 0)
     {
       /* Update TABLE's item count and optionally trigger a rehash.  */
       size_t remaining;
@@ -230,6 +213,37 @@ weak_bucket_assoc (SCM table, SCM buckets, size_t bucket_index,
 }
 
 
+/* Packed arguments for `weak_bucket_assoc_by_hash'.  */
+struct assoc_by_hash_data
+{
+  SCM alist;
+  SCM ret;
+  scm_t_hash_predicate_fn predicate;
+  void *closure;
+};
+
+/* See scm_hash_fn_get_handle_by_hash below.  */
+static void*
+weak_bucket_assoc_by_hash (void *args)
+{
+  struct assoc_by_hash_data *data = args;
+  SCM alist = data->alist;
+
+  for (; scm_is_pair (alist); alist = SCM_CDR (alist))
+    {
+      SCM pair = SCM_CAR (alist);
+      
+      if (!SCM_WEAK_PAIR_DELETED_P (pair)
+          && data->predicate (SCM_CAR (pair), data->closure))
+        {
+          data->ret = pair;
+          break;
+        }
+    }
+  return args;
+}
+        
+
 
 static SCM
 make_hash_table (int flags, unsigned long k, const char *func_name) 
@@ -494,21 +508,16 @@ scm_hash_fn_get_handle (SCM table, SCM obj,
   unsigned long k;
   SCM buckets, h;
 
-  if (SCM_HASHTABLE_P (table))
-    buckets = SCM_HASHTABLE_VECTOR (table);
-  else
-    {
-      SCM_VALIDATE_VECTOR (1, table);
-      buckets = table;
-    }
+  SCM_VALIDATE_HASHTABLE (SCM_ARG1, table);
+  buckets = SCM_HASHTABLE_VECTOR (table);
 
   if (SCM_SIMPLE_VECTOR_LENGTH (buckets) == 0)
     return SCM_BOOL_F;
   k = hash_fn (obj, SCM_SIMPLE_VECTOR_LENGTH (buckets), closure);
   if (k >= SCM_SIMPLE_VECTOR_LENGTH (buckets))
-    scm_out_of_range ("hash_fn_get_handle", scm_from_ulong (k));
+    scm_out_of_range (FUNC_NAME, scm_from_ulong (k));
 
-  if (IS_WEAK_THING (table))
+  if (SCM_HASHTABLE_WEAK_P (table))
     h = weak_bucket_assoc (table, buckets, k, hash_fn,
                           assoc_fn, obj, closure);
   else
@@ -519,6 +528,64 @@ scm_hash_fn_get_handle (SCM table, SCM obj,
 #undef FUNC_NAME
 
 
+/* This procedure implements three optimizations, with respect to the
+   raw get_handle():
+
+   1. For weak tables, it's assumed that calling the predicate in the
+      allocation lock is safe. In practice this means that the predicate
+      cannot call arbitrary scheme functions. 
+
+   2. We don't check for overflow / underflow and rehash.
+
+   3. We don't actually have to allocate a key -- instead we get the
+      hash value directly. This is useful for, for example, looking up
+      strings in the symbol table.
+ */
+SCM
+scm_hash_fn_get_handle_by_hash (SCM table, unsigned long raw_hash,
+                                scm_t_hash_predicate_fn predicate_fn,
+                                void *closure)
+#define FUNC_NAME "scm_hash_fn_get_handle_by_hash"
+{
+  unsigned long k;
+  SCM buckets, alist, h = SCM_BOOL_F;
+
+  SCM_VALIDATE_HASHTABLE (SCM_ARG1, table);
+  buckets = SCM_HASHTABLE_VECTOR (table);
+
+  if (SCM_SIMPLE_VECTOR_LENGTH (buckets) == 0)
+    return SCM_BOOL_F;
+
+  k = raw_hash % SCM_SIMPLE_VECTOR_LENGTH (buckets);
+  alist = SCM_SIMPLE_VECTOR_REF (buckets, k);
+
+  if (SCM_HASHTABLE_WEAK_P (table))
+    {
+      struct assoc_by_hash_data args;
+
+      args.alist = alist;
+      args.ret = SCM_BOOL_F;
+      args.predicate = predicate_fn;
+      args.closure = closure;
+      GC_call_with_alloc_lock (weak_bucket_assoc_by_hash, &args);
+      h = args.ret;
+    }
+  else
+    for (; scm_is_pair (alist); alist = SCM_CDR (alist))
+      {
+        SCM pair = SCM_CAR (alist);
+        if (predicate_fn (SCM_CAR (pair), closure))
+          {
+            h = pair;
+            break;
+          }
+      }
+
+  return h;
+}
+#undef FUNC_NAME
+
+
 SCM
 scm_hash_fn_create_handle_x (SCM table, SCM obj, SCM init,
                             scm_t_hash_fn hash_fn, scm_t_assoc_fn assoc_fn,
@@ -528,14 +595,9 @@ scm_hash_fn_create_handle_x (SCM table, SCM obj, SCM init,
   unsigned long k;
   SCM buckets, it;
 
-  if (SCM_HASHTABLE_P (table))
-    buckets = SCM_HASHTABLE_VECTOR (table);
-  else
-    {
-      SCM_ASSERT (scm_is_simple_vector (table),
-                 table, SCM_ARG1, "hash_fn_create_handle_x");
-      buckets = table;
-    }
+  SCM_VALIDATE_HASHTABLE (SCM_ARG1, table);
+  buckets = SCM_HASHTABLE_VECTOR (table);
+
   if (SCM_SIMPLE_VECTOR_LENGTH (buckets) == 0)
     SCM_MISC_ERROR ("void hashtable", SCM_EOL);
 
@@ -543,7 +605,7 @@ scm_hash_fn_create_handle_x (SCM table, SCM obj, SCM init,
   if (k >= SCM_SIMPLE_VECTOR_LENGTH (buckets))
     scm_out_of_range ("hash_fn_create_handle_x", scm_from_ulong (k));
 
-  if (IS_WEAK_THING (table))
+  if (SCM_HASHTABLE_WEAK_P (table))
     it = weak_bucket_assoc (table, buckets, k, hash_fn,
                            assoc_fn, obj, closure);
   else
@@ -563,7 +625,7 @@ scm_hash_fn_create_handle_x (SCM table, SCM obj, SCM init,
       */
       SCM handle, new_bucket;
 
-      if ((SCM_HASHTABLE_P (table)) && (SCM_HASHTABLE_WEAK_P (table)))
+      if (SCM_HASHTABLE_WEAK_P (table))
        {
          /* FIXME: We don't support weak alist vectors.  */
          /* Use a weak cell.  */
@@ -580,8 +642,7 @@ scm_hash_fn_create_handle_x (SCM table, SCM obj, SCM init,
 
       new_bucket = scm_cons (handle, SCM_EOL);
 
-      if (!scm_is_eq (table, buckets)
-         && !scm_is_eq (SCM_HASHTABLE_VECTOR (table), buckets))
+      if (!scm_is_eq (SCM_HASHTABLE_VECTOR (table), buckets))
        {
          buckets = SCM_HASHTABLE_VECTOR (table);
          k = hash_fn (obj, SCM_SIMPLE_VECTOR_LENGTH (buckets), closure);
@@ -590,18 +651,15 @@ scm_hash_fn_create_handle_x (SCM table, SCM obj, SCM init,
        }
       SCM_SETCDR (new_bucket, SCM_SIMPLE_VECTOR_REF (buckets, k));
       SCM_SIMPLE_VECTOR_SET (buckets, k, new_bucket);
-      if (!scm_is_eq (table, buckets))
-       {
-         /* Update element count and maybe rehash the table.  The
-            table might have too few entries here since weak hash
-            tables used with the hashx_* functions can not be
-            rehashed after GC.
-         */
-         SCM_HASHTABLE_INCREMENT (table);
-         if (SCM_HASHTABLE_N_ITEMS (table) < SCM_HASHTABLE_LOWER (table)
-             || SCM_HASHTABLE_N_ITEMS (table) > SCM_HASHTABLE_UPPER (table))
-           scm_i_rehash (table, hash_fn, closure, FUNC_NAME);
-       }
+      /* Update element count and maybe rehash the table.  The
+         table might have too few entries here since weak hash
+         tables used with the hashx_* functions can not be
+         rehashed after GC.
+      */
+      SCM_HASHTABLE_INCREMENT (table);
+      if (SCM_HASHTABLE_N_ITEMS (table) < SCM_HASHTABLE_LOWER (table)
+          || SCM_HASHTABLE_N_ITEMS (table) > SCM_HASHTABLE_UPPER (table))
+        scm_i_rehash (table, hash_fn, closure, FUNC_NAME);
       return SCM_CAR (new_bucket);
     }
 }
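The count-and-rehash step in the hunk above keeps the item count inside a window between a lower and an upper bound. The table is resized whenever the count leaves that window. The following is a minimal, self-contained sketch of that policy. The struct, its field names, and the grow-above-0.9 / shrink-below-0.25 rule are hypothetical illustrations, not the sizing logic of libguile's `scm_i_rehash`.

```c
#include <stddef.h>

/* Hypothetical table header; libguile keeps equivalent fields behind the
   SCM_HASHTABLE_N_ITEMS / _LOWER / _UPPER accessors. */
struct toy_table {
  size_t n_items;     /* live entries */
  size_t n_buckets;   /* current bucket-vector length */
  size_t lower;       /* rehash when n_items drops below this */
  size_t upper;       /* rehash when n_items exceeds this */
};

/* Assumed policy: grow above 0.9 * buckets, shrink below 0.25 * buckets. */
static void
toy_set_bounds (struct toy_table *t)
{
  t->upper = t->n_buckets * 9 / 10;
  t->lower = t->n_buckets / 4;
}

/* Stand-in for scm_i_rehash: resizes only; moving entries is omitted. */
static void
toy_rehash (struct toy_table *t)
{
  if (t->n_items > t->upper)
    t->n_buckets *= 2;
  else if (t->n_items < t->lower && t->n_buckets > 1)
    t->n_buckets /= 2;
  toy_set_bounds (t);
}

/* Insertion path: bump the count, then check both bounds. */
static void
toy_note_insert (struct toy_table *t)
{
  t->n_items++;
  if (t->n_items < t->lower || t->n_items > t->upper)
    toy_rehash (t);
}

/* Removal path: like scm_hash_fn_remove_x, only the lower bound is checked. */
static void
toy_note_remove (struct toy_table *t)
{
  t->n_items--;
  if (t->n_items < t->lower)
    toy_rehash (t);
}
```

Checking both bounds on insertion matters for the case the patch comment mentions. A table can hold fewer entries than its lower bound (for example, a weak table that could not be rehashed after GC). In that case, even an insertion can trigger a shrink.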
@@ -633,8 +691,7 @@ scm_hash_fn_set_x (SCM table, SCM obj, SCM val,
   it = scm_hash_fn_create_handle_x (table, obj, SCM_BOOL_F, hash_fn, assoc_fn, closure);
   SCM_SETCDR (it, val);
 
-  if (SCM_HASHTABLE_P (table) && SCM_HASHTABLE_WEAK_VALUE_P (table)
-      && SCM_NIMP (val))
+  if (SCM_HASHTABLE_WEAK_VALUE_P (table) && SCM_NIMP (val))
     /* IT is a weak-cdr pair.  Register a disappearing link from IT's
        cdr to VAL like `scm_weak_cdr_pair' does.  */
     SCM_I_REGISTER_DISAPPEARING_LINK ((void *) SCM_CDRLOC (it), SCM2PTR (val));
@@ -648,26 +705,23 @@ scm_hash_fn_remove_x (SCM table, SCM obj,
                      scm_t_hash_fn hash_fn,
                      scm_t_assoc_fn assoc_fn,
                       void *closure)
+#define FUNC_NAME "hash_fn_remove_x"
 {
   unsigned long k;
   SCM buckets, h;
 
-  if (SCM_HASHTABLE_P (table))
-    buckets = SCM_HASHTABLE_VECTOR (table);
-  else
-    {
-      SCM_ASSERT (scm_is_simple_vector (table), table,
-                 SCM_ARG1, "hash_fn_remove_x");
-      buckets = table;
-    }
+  SCM_VALIDATE_HASHTABLE (SCM_ARG1, table);
+
+  buckets = SCM_HASHTABLE_VECTOR (table);
+
   if (SCM_SIMPLE_VECTOR_LENGTH (buckets) == 0)
     return SCM_EOL;
 
   k = hash_fn (obj, SCM_SIMPLE_VECTOR_LENGTH (buckets), closure);
   if (k >= SCM_SIMPLE_VECTOR_LENGTH (buckets))
-    scm_out_of_range ("hash_fn_remove_x", scm_from_ulong (k));
+    scm_out_of_range (FUNC_NAME, scm_from_ulong (k));
 
-  if (IS_WEAK_THING (table))
+  if (SCM_HASHTABLE_WEAK_P (table))
     h = weak_bucket_assoc (table, buckets, k, hash_fn,
                           assoc_fn, obj, closure);
   else
@@ -677,28 +731,24 @@ scm_hash_fn_remove_x (SCM table, SCM obj,
     {
       SCM_SIMPLE_VECTOR_SET 
        (buckets, k, scm_delq_x (h, SCM_SIMPLE_VECTOR_REF (buckets, k)));
-      if (!scm_is_eq (table, buckets))
-       {
-         SCM_HASHTABLE_DECREMENT (table);
-         if (SCM_HASHTABLE_N_ITEMS (table) < SCM_HASHTABLE_LOWER (table))
-           scm_i_rehash (table, hash_fn, closure, "scm_hash_fn_remove_x");
-       }
+      SCM_HASHTABLE_DECREMENT (table);
+      if (SCM_HASHTABLE_N_ITEMS (table) < SCM_HASHTABLE_LOWER (table))
+        scm_i_rehash (table, hash_fn, closure, FUNC_NAME);
     }
   return h;
 }
+#undef FUNC_NAME
 
 SCM_DEFINE (scm_hash_clear_x, "hash-clear!", 1, 0, 0,
            (SCM table),
            "Remove all items from @var{table} (without triggering a resize).")
 #define FUNC_NAME s_scm_hash_clear_x
 {
-  if (SCM_HASHTABLE_P (table))
-    {
-      scm_vector_fill_x (SCM_HASHTABLE_VECTOR (table), SCM_EOL);
-      SCM_SET_HASHTABLE_N_ITEMS (table, 0);
-    }
-  else
-    scm_vector_fill_x (table, SCM_EOL);
+  SCM_VALIDATE_HASHTABLE (SCM_ARG1, table);
+
+  scm_vector_fill_x (SCM_HASHTABLE_VECTOR (table), SCM_EOL);
+  SCM_SET_HASHTABLE_N_ITEMS (table, 0);
+
   return SCM_UNSPECIFIED;
 }
 #undef FUNC_NAME
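Several hunks above replace hard-coded function-name strings with `FUNC_NAME`. Each function is bracketed with `#define FUNC_NAME ...` / `#undef FUNC_NAME`, so that error reports name the procedure that raised them. The following is a standalone sketch of the idiom. `report_out_of_range` and `toy_remove` are hypothetical stand-ins for `scm_out_of_range` and `scm_hash_fn_remove_x`.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical error reporter standing in for scm_out_of_range; it
   records which function complained so a caller can inspect it. */
static char last_error_subr[64];

static void
report_out_of_range (const char *subr, unsigned long k)
{
  snprintf (last_error_subr, sizeof last_error_subr, "%s", subr);
  fprintf (stderr, "%s: index %lu out of range\n", subr, k);
}

/* The idiom: define FUNC_NAME between the signature and the body, use it
   in every error report, and undefine it afterwards so the next function
   can supply its own name. */
static int
toy_remove (unsigned long k, unsigned long n_buckets)
#define FUNC_NAME "toy_remove"
{
  if (k >= n_buckets)
    {
      report_out_of_range (FUNC_NAME, k);
      return -1;
    }
  return 0;
}
#undef FUNC_NAME
```

Because the name lives in one `#define` next to the signature, it cannot drift out of sync with scattered string literals. Such drift is visible in the pre-patch code, which mixed `"hash_fn_remove_x"` and `"scm_hash_fn_remove_x"` within a single function.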
@@ -1202,7 +1252,7 @@ scm_internal_hash_fold (scm_t_hash_fold_fn fn, void *closure,
          if (!scm_is_pair (handle))
            scm_wrong_type_arg (s_scm_hash_fold, SCM_ARG3, buckets);
 
-         if (IS_WEAK_THING (table))
+         if (SCM_HASHTABLE_WEAK_P (table))
            {
              if (SCM_WEAK_PAIR_DELETED_P (handle))
                {
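`scm_hash_fn_get_handle_by_hash`, added in `hashtab.c` above, reduces a caller-supplied raw hash modulo the bucket count. It then walks that bucket's alist, asking a predicate (plus an opaque closure) whether each key is the one wanted. The caller therefore never has to allocate a key object. The sketch below shows the same lookup in plain C, under these assumptions:

- A hypothetical chained table of C strings stands in for SCM buckets.
- The weak-table branch that runs under `GC_call_with_alloc_lock` is omitted.
- The djb2 hash is chosen only for illustration, not taken from Guile.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical bucket entry and a predicate type shaped like
   scm_t_hash_predicate_fn. */
struct entry {
  const char *key;
  int value;
  struct entry *next;
};

typedef int (*predicate_fn) (const char *key, void *closure);

/* Closure describing a (pointer, length) slice, so a caller can look up
   a substring without first building a NUL-terminated key. */
struct slice { const char *chars; size_t len; };

static int
slice_matches (const char *key, void *closure)
{
  struct slice *s = closure;
  return strlen (key) == s->len && memcmp (key, s->chars, s->len) == 0;
}

/* Reduce the raw hash to a bucket index and return the first entry the
   predicate accepts, or NULL if none does (or the table is empty). */
static struct entry *
lookup_by_hash (struct entry **buckets, size_t n_buckets,
                unsigned long raw_hash, predicate_fn pred, void *closure)
{
  struct entry *e;
  if (n_buckets == 0)
    return NULL;
  for (e = buckets[raw_hash % n_buckets]; e != NULL; e = e->next)
    if (pred (e->key, closure))
      return e;
  return NULL;
}

/* djb2 string hash; insertion and lookup must agree on this value. */
static unsigned long
hash_chars (const char *chars, size_t len)
{
  unsigned long h = 5381;
  size_t i;
  for (i = 0; i < len; i++)
    h = h * 33 + (unsigned char) chars[i];
  return h;
}
```

The symbol-table use case mentioned in the patch comment corresponds to `slice_matches`. The caller hashes characters it already has and lets the predicate compare them in place.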
diff --git a/libguile/hashtab.h b/libguile/hashtab.h
index 75b60e9..3149946 100644
--- a/libguile/hashtab.h
+++ b/libguile/hashtab.h
@@ -70,6 +70,10 @@ typedef unsigned long (*scm_t_hash_fn) (SCM obj, unsigned long max,
    some equality predicate.  */
 typedef SCM (*scm_t_assoc_fn) (SCM obj, SCM alist, void *closure);
 
+/* Function that returns true if the given object is the one we are
+   looking for, for scm_hash_fn_ref_by_hash.  */
+typedef int (*scm_t_hash_predicate_fn) (SCM obj, void *closure);
+
 /* Function to fold over the entries of a hash table.  */
 typedef SCM (*scm_t_hash_fold_fn) (void *closure, SCM key, SCM value,
                                   SCM result);
@@ -110,6 +114,10 @@ SCM_API SCM scm_hash_fn_get_handle (SCM table, SCM obj,
                                    scm_t_hash_fn hash_fn,
                                    scm_t_assoc_fn assoc_fn,
                                    void *closure);
+SCM_INTERNAL
+SCM scm_hash_fn_get_handle_by_hash (SCM table, unsigned long raw_hash,
+                                    scm_t_hash_predicate_fn predicate_fn,
+                                    void *closure);
 SCM_API SCM scm_hash_fn_create_handle_x (SCM table, SCM obj, SCM init,
                                         scm_t_hash_fn hash_fn,
                                         scm_t_assoc_fn assoc_fn,
diff --git a/libguile/i18n.c b/libguile/i18n.c
index b091b68..14dc9b9 100644
--- a/libguile/i18n.c
+++ b/libguile/i18n.c
@@ -1,4 +1,4 @@
-/* Copyright (C) 2006, 2007, 2008, 2009, 2010 Free Software Foundation, Inc.
+/* Copyright (C) 2006, 2007, 2008, 2009, 2010, 2011 Free Software Foundation, Inc.
  *
  * This library is free software; you can redistribute it and/or
  * modify it under the terms of the GNU Lesser General Public License
@@ -1629,27 +1629,27 @@ SCM_DEFINE (scm_nl_langinfo, "nl-langinfo", 1, 1, 0,
          switch (*c_result)
            {
            case 0:
-             result = scm_from_locale_symbol ("parenthesize");
+             result = scm_from_latin1_symbol ("parenthesize");
              break;
 
            case 1:
-             result = scm_from_locale_symbol ("sign-before");
+             result = scm_from_latin1_symbol ("sign-before");
              break;
 
            case 2:
-             result = scm_from_locale_symbol ("sign-after");
+             result = scm_from_latin1_symbol ("sign-after");
              break;
 
            case 3:
-             result = scm_from_locale_symbol ("sign-before-currency-symbol");
+             result = scm_from_latin1_symbol ("sign-before-currency-symbol");
              break;
 
            case 4:
-             result = scm_from_locale_symbol ("sign-after-currency-symbol");
+             result = scm_from_latin1_symbol ("sign-after-currency-symbol");
              break;
 
            default:
-             result = scm_from_locale_symbol ("unspecified");
+             result = scm_from_latin1_symbol ("unspecified");
            }
          break;
 #endif
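Most of the remaining hunks replace `scm_from_locale_symbol` with `scm_from_latin1_symbol` when building symbols from C string literals. The likely motivation is that Latin-1 decoding needs no locale. Each byte is the Unicode code point of the same value, so an ASCII literal such as `"sign-before"` means the same thing under any locale. The following is a minimal sketch of that decoding step, converting Latin-1 bytes to UTF-8. The function name is hypothetical and not part of libguile.

```c
#include <stddef.h>

/* Encode Latin-1 bytes as UTF-8 into OUT, which must hold at least
   2 * LEN bytes.  Each input byte already is a Unicode code point
   (U+0000..U+00FF), so no locale or conversion table is consulted.
   Returns the number of bytes written. */
static size_t
latin1_to_utf8 (const unsigned char *in, size_t len, unsigned char *out)
{
  size_t i, n = 0;
  for (i = 0; i < len; i++)
    {
      unsigned char c = in[i];
      if (c < 0x80)
        out[n++] = c;                    /* ASCII: one byte, unchanged */
      else
        {
          out[n++] = 0xC0 | (c >> 6);    /* lead byte: top 2 bits */
          out[n++] = 0x80 | (c & 0x3F);  /* continuation: low 6 bits */
        }
    }
  return n;
}
```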
diff --git a/libguile/load.c b/libguile/load.c
index 9c12a60..cbf9dc0 100644
--- a/libguile/load.c
+++ b/libguile/load.c
@@ -1,4 +1,4 @@
-/* Copyright (C) 1995,1996,1998,1999,2000,2001, 2004, 2006, 2008, 2009, 2010 Free Software Foundation, Inc.
+/* Copyright (C) 1995,1996,1998,1999,2000,2001, 2004, 2006, 2008, 2009, 2010, 2011 Free Software Foundation, Inc.
  * 
  * This library is free software; you can redistribute it and/or
  * modify it under the terms of the GNU Lesser General Public License
@@ -680,7 +680,7 @@ do_try_autocompile (void *data)
 
   comp_mod = scm_c_resolve_module ("system base compile");
   compile_file = scm_module_variable
-    (comp_mod, scm_from_locale_symbol ("compile-file"));
+    (comp_mod, scm_from_latin1_symbol ("compile-file"));
 
   if (scm_is_true (compile_file))
     {
diff --git a/libguile/memoize.c b/libguile/memoize.c
index b841f24..49d2948 100644
--- a/libguile/memoize.c
+++ b/libguile/memoize.c
@@ -1,4 +1,4 @@
-/* Copyright (C) 1995,1996,1997,1998,1999,2000,2001,2002,2003,2004,2005,2006,2007,2008,2009,2010
+/* Copyright (C) 1995,1996,1997,1998,1999,2000,2001,2002,2003,2004,2005,2006,2007,2008,2009,2010,2011
  * Free Software Foundation, Inc.
  * 
  * This library is free software; you can redistribute it and/or
@@ -496,8 +496,8 @@ static SCM m_apply (SCM proc, SCM arg, SCM rest)
       
       while (scm_is_pair (rest))
         {
-          tail = MAKMEMO_CALL (MAKMEMO_MOD_REF (scm_list_1 (scm_from_locale_symbol ("guile")),
-                                                scm_from_locale_symbol ("cons"),
+          tail = MAKMEMO_CALL (MAKMEMO_MOD_REF (scm_list_1 (scm_from_latin1_symbol ("guile")),
+                                                scm_from_latin1_symbol ("cons"),
                                                 SCM_BOOL_F),
                                2,
                                scm_list_2 (scm_car (rest), tail));
@@ -868,7 +868,7 @@ scm_init_memoize ()
 
 #include "libguile/memoize.x"
 
-  list_of_guile = scm_list_1 (scm_from_locale_symbol ("guile"));
+  list_of_guile = scm_list_1 (scm_from_latin1_symbol ("guile"));
 }
 
 /*
diff --git a/libguile/modules.c b/libguile/modules.c
index 43e2373..c4e08e5 100644
--- a/libguile/modules.c
+++ b/libguile/modules.c
@@ -1,4 +1,4 @@
-/* Copyright (C) 1998,2000,2001,2002,2003,2004,2006,2007,2008,2009,2010 Free Software Foundation, Inc.
+/* Copyright (C) 1998,2000,2001,2002,2003,2004,2006,2007,2008,2009,2010,2011 Free Software Foundation, Inc.
  *
  * This library is free software; you can redistribute it and/or
  * modify it under the terms of the GNU Lesser General Public License
@@ -59,7 +59,7 @@ static SCM default_duplicate_binding_procedures_var;
 
 static SCM unbound_variable (const char *func, SCM sym)
 {
-  scm_error (scm_from_locale_symbol ("unbound-variable"), func,
+  scm_error (scm_from_latin1_symbol ("unbound-variable"), func,
              "Unbound variable: ~S", scm_list_1 (sym), SCM_BOOL_F);
 }
 
diff --git a/libguile/ports.c b/libguile/ports.c
index a9ba08e..5983ff2 100644
--- a/libguile/ports.c
+++ b/libguile/ports.c
@@ -2233,11 +2233,11 @@ SCM_DEFINE (scm_port_conversion_strategy, "port-conversion-strategy",
 
   h = scm_i_get_conversion_strategy (port);
   if (h == SCM_FAILED_CONVERSION_ERROR)
-    return scm_from_locale_symbol ("error");
+    return scm_from_latin1_symbol ("error");
   else if (h == SCM_FAILED_CONVERSION_QUESTION_MARK)
-    return scm_from_locale_symbol ("substitute");
+    return scm_from_latin1_symbol ("substitute");
   else if (h == SCM_FAILED_CONVERSION_ESCAPE_SEQUENCE)
-    return scm_from_locale_symbol ("escape");
+    return scm_from_latin1_symbol ("escape");
   else
     abort ();
 
@@ -2275,14 +2275,14 @@ SCM_DEFINE (scm_set_port_conversion_strategy_x, "set-port-conversion-strategy!",
       SCM_VALIDATE_OPPORT (1, port);
     }
 
-  err = scm_from_locale_symbol ("error");
+  err = scm_from_latin1_symbol ("error");
   if (scm_is_true (scm_eqv_p (sym, err)))
     {
       scm_i_set_conversion_strategy_x (port, SCM_FAILED_CONVERSION_ERROR);
       return SCM_UNSPECIFIED;
     }
 
-  qm = scm_from_locale_symbol ("substitute");
+  qm = scm_from_latin1_symbol ("substitute");
   if (scm_is_true (scm_eqv_p (sym, qm)))
     {
       scm_i_set_conversion_strategy_x (port, 
@@ -2290,7 +2290,7 @@ SCM_DEFINE (scm_set_port_conversion_strategy_x, "set-port-conversion-strategy!",
       return SCM_UNSPECIFIED;
     }
 
-  esc = scm_from_locale_symbol ("escape");
+  esc = scm_from_latin1_symbol ("escape");
   if (scm_is_true (scm_eqv_p (sym, esc)))
     {
       scm_i_set_conversion_strategy_x (port,
diff --git a/libguile/print.c b/libguile/print.c
index f62378f..352ca94 100644
--- a/libguile/print.c
+++ b/libguile/print.c
@@ -1,4 +1,4 @@
-/* Copyright (C) 1995-1999,2000,2001, 2002, 2003, 2004, 2006, 2008, 2009, 2010 Free Software Foundation, Inc.
+/* Copyright (C) 1995-1999,2000,2001, 2002, 2003, 2004, 2006, 2008, 2009, 2010, 2011 Free Software Foundation, Inc.
  * 
  * This library is free software; you can redistribute it and/or
  * modify it under the terms of the GNU Lesser General Public License
@@ -1363,9 +1363,9 @@ scm_init_print ()
 
   scm_init_opts (scm_print_options, scm_print_opts);
 
-  scm_print_options (scm_list_4 (scm_from_locale_symbol ("highlight-prefix"),
+  scm_print_options (scm_list_4 (scm_from_latin1_symbol ("highlight-prefix"),
                                 scm_from_locale_string ("{"),
-                                scm_from_locale_symbol ("highlight-suffix"),
+                                scm_from_latin1_symbol ("highlight-suffix"),
                                 scm_from_locale_string ("}")));
 
   scm_gc_register_root (&print_state_pool);
@@ -1374,7 +1374,7 @@ scm_init_print ()
   layout =
     scm_make_struct_layout (scm_from_locale_string (SCM_PRINT_STATE_LAYOUT));
   type = scm_make_struct (vtable, SCM_INUM0, scm_list_1 (layout));
-  scm_set_struct_vtable_name_x (type, scm_from_locale_symbol ("print-state"));
+  scm_set_struct_vtable_name_x (type, scm_from_latin1_symbol ("print-state"));
   scm_print_state_vtable = type;
 
   /* Don't want to bind a wrapper class in GOOPS, so pass 0 as arg1. */
diff --git a/libguile/procs.c b/libguile/procs.c
index c6fab72..dc5a320 100644
--- a/libguile/procs.c
+++ b/libguile/procs.c
@@ -1,4 +1,4 @@
-/* Copyright (C) 1995,1996,1997,1999,2000,2001, 2006, 2008, 2009, 2010 Free Software Foundation, Inc.
+/* Copyright (C) 1995,1996,1997,1999,2000,2001, 2006, 2008, 2009, 2010, 2011 Free Software Foundation, Inc.
  * 
  * This library is free software; you can redistribute it and/or
  * modify it under the terms of the GNU Lesser General Public License
@@ -165,7 +165,7 @@ scm_init_procs ()
     scm_c_make_struct (scm_applicable_struct_with_setter_vtable_vtable,
                        0,
                        1,
-                       SCM_UNPACK (scm_from_locale_symbol ("pwpw")));
+                       SCM_UNPACK (scm_from_latin1_symbol ("pwpw")));
 
 #include "libguile/procs.x"
 }
diff --git a/libguile/programs.c b/libguile/programs.c
index 8b769a5..b84f84b 100644
--- a/libguile/programs.c
+++ b/libguile/programs.c
@@ -1,4 +1,4 @@
-/* Copyright (C) 2001, 2009, 2010 Free Software Foundation, Inc.
+/* Copyright (C) 2001, 2009, 2010, 2011 Free Software Foundation, Inc.
  * 
  * This library is free software; you can redistribute it and/or
  * modify it under the terms of the GNU Lesser General Public License
@@ -77,7 +77,7 @@ scm_i_program_print (SCM program, SCM port, scm_print_state *pstate)
   if (scm_is_false (write_program) && scm_module_system_booted_p)
     write_program = scm_module_local_variable
       (scm_c_resolve_module ("system vm program"),
-       scm_from_locale_symbol ("write-program"));
+       scm_from_latin1_symbol ("write-program"));
   
   if (SCM_PROGRAM_IS_CONTINUATION (program))
     {
diff --git a/libguile/read.c b/libguile/read.c
index 4a9b5ea..54384fa 100644
--- a/libguile/read.c
+++ b/libguile/read.c
@@ -1,4 +1,4 @@
-/* Copyright (C) 1995,1996,1997,1999,2000,2001,2003, 2004, 2006, 2007, 2008, 2009, 2010 Free Software
+/* Copyright (C) 1995,1996,1997,1999,2000,2001,2003, 2004, 2006, 2007, 2008, 2009, 2010, 2011 Free Software
  * Foundation, Inc.
  * 
  * This library is free software; you can redistribute it and/or
@@ -111,7 +111,7 @@ scm_i_input_error (char const *function,
     
   string = scm_get_output_string (string_port);
   scm_close_output_port (string_port);
-  scm_error_scm (scm_from_locale_symbol ("read-error"),
+  scm_error_scm (scm_from_latin1_symbol ("read-error"),
                 function? scm_from_locale_string (function) : SCM_BOOL_F,
                 string,
                 arg,
diff --git a/libguile/regex-posix.c b/libguile/regex-posix.c
index 4c03577..3423099 100644
--- a/libguile/regex-posix.c
+++ b/libguile/regex-posix.c
@@ -1,4 +1,4 @@
-/*     Copyright (C) 1997, 1998, 1999, 2000, 2001, 2004, 2006, 2007, 2010 Free Software Foundation, Inc.
+/*     Copyright (C) 1997, 1998, 1999, 2000, 2001, 2004, 2006, 2007, 2010, 2011 Free Software Foundation, Inc.
  *
  * This library is free software; you can redistribute it and/or
  * modify it under the terms of the GNU Lesser General Public License
@@ -228,12 +228,12 @@ fixup_multibyte_match (regmatch_t *matches, int nmatches, char *str)
         }
 
       nbytes = mbrlen (str + byte_idx, MB_LEN_MAX, &state);
+      if (nbytes == (size_t) -2 || nbytes == (size_t) -1)
+        /* Something is wrong. Shouldn't be possible, as the regex match
+           succeeded.  */
+        abort ();
     }
 
-  if (nbytes >= (size_t) -2)
-    /* Something is wrong. Shouldn't be possible, as the regex match
-       succeeded.  */
-    abort ();
 }
 #endif
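The `regex-posix.c` hunk moves the `mbrlen` error check inside the loop. It also tests explicitly for the two error returns, instead of using `>=` once after the loop:

- `(size_t) -1` signals an invalid sequence.
- `(size_t) -2` signals an incomplete sequence.

The following is a standalone sketch of a byte-offset-to-character-index walk written in that style. The function name is hypothetical, and it returns -1 where the Guile code calls `abort`.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Count the multibyte characters in the first BYTE_OFFSET bytes of STR
   (in the current locale), checking every mbrlen result as it is
   produced.  Returns -1 on an invalid or truncated sequence. */
static long
chars_before (const char *str, size_t byte_offset)
{
  mbstate_t state;
  size_t byte_idx = 0;
  long char_idx = 0;

  memset (&state, 0, sizeof state);
  while (byte_idx < byte_offset)
    {
      size_t nbytes = mbrlen (str + byte_idx, MB_LEN_MAX, &state);
      if (nbytes == (size_t) -2 || nbytes == (size_t) -1)
        return -1;
      if (nbytes == 0)          /* reached the terminating NUL */
        break;
      byte_idx += nbytes;
      char_idx++;
    }
  return char_idx;
}
```

Checking each result immediately also keeps an error value from being added to `byte_idx` on a later iteration.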
 
diff --git a/libguile/script.c b/libguile/script.c
index d61d8ca..b4dcd7b 100644
--- a/libguile/script.c
+++ b/libguile/script.c
@@ -1,4 +1,4 @@
-/* Copyright (C) 1994, 1995, 1996, 1997, 1998, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010 Free Software Foundation, Inc.
+/* Copyright (C) 1994, 1995, 1996, 1997, 1998, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011 Free Software Foundation, Inc.
  * This library is free software; you can redistribute it and/or
  * modify it under the terms of the GNU Lesser General Public License
  * as published by the Free Software Foundation; either version 3 of
@@ -803,9 +803,9 @@ scm_compile_shell_switches (int argc, char **argv)
 
     /* Wrap the expression in a prompt. */
     val = scm_list_2 (scm_list_3 (scm_sym_at,
-                                      scm_list_2 (scm_from_locale_symbol ("ice-9"),
-                                                  scm_from_locale_symbol ("control")),
-                                      scm_from_locale_symbol ("%")),
+                                      scm_list_2 (scm_from_latin1_symbol ("ice-9"),
+                                                  scm_from_latin1_symbol ("control")),
+                                      scm_from_latin1_symbol ("%")),
                       val);
 
 #if 0
diff --git a/libguile/srfi-14.c b/libguile/srfi-14.c
index 09fe90c..af106ed 100644
--- a/libguile/srfi-14.c
+++ b/libguile/srfi-14.c
@@ -1,6 +1,6 @@
 /* srfi-14.c --- SRFI-14 procedures for Guile
  *
- * Copyright (C) 2001, 2004, 2006, 2007, 2009 Free Software Foundation, Inc.
+ * Copyright (C) 2001, 2004, 2006, 2007, 2009, 2011 Free Software Foundation, Inc.
  *
  * This library is free software; you can redistribute it and/or
  * modify it under the terms of the GNU Lesser General Public License
@@ -2036,9 +2036,9 @@ SCM_DEFINE (scm_sys_char_set_dump, "%char-set-dump", 1, 0, 0, (SCM charset),
   SCM_VALIDATE_SMOB (1, charset, charset);
   cs = SCM_CHARSET_DATA (charset);
 
-  e1 = scm_cons (scm_from_locale_symbol ("char-set"),
+  e1 = scm_cons (scm_from_latin1_symbol ("char-set"),
                  charset);
-  e2 = scm_cons (scm_from_locale_symbol ("n"),
+  e2 = scm_cons (scm_from_latin1_symbol ("n"),
                  scm_from_size_t (cs->len));
 
   for (i = 0; i < cs->len; i++)
@@ -2059,7 +2059,7 @@ SCM_DEFINE (scm_sys_char_set_dump, "%char-set-dump", 1, 0, 0, (SCM charset),
       ranges = scm_append (scm_list_2 (ranges,
                                        scm_list_1 (elt)));
     }
-  e3 = scm_cons (scm_from_locale_symbol ("ranges"),
+  e3 = scm_cons (scm_from_latin1_symbol ("ranges"),
                  ranges);
 
   return scm_list_3 (e1, e2, e3);
diff --git a/libguile/stacks.c b/libguile/stacks.c
index a7ebda0..267b3c4 100644
--- a/libguile/stacks.c
+++ b/libguile/stacks.c
@@ -1,5 +1,5 @@
 /* A stack holds a frame chain
- * Copyright (C) 1996,1997,2000,2001, 2006, 2007, 2008, 2009, 2010 Free Software Foundation
+ * Copyright (C) 1996,1997,2000,2001, 2006, 2007, 2008, 2009, 2010, 2011 Free Software Foundation
  *
  * This library is free software; you can redistribute it and/or
  * modify it under the terms of the GNU Lesser General Public License
@@ -393,7 +393,7 @@ scm_init_stacks ()
   scm_stack_type = scm_make_vtable (scm_from_locale_string (SCM_STACK_LAYOUT),
                                     SCM_UNDEFINED);
   scm_set_struct_vtable_name_x (scm_stack_type,
-                               scm_from_locale_symbol ("stack"));
+                               scm_from_latin1_symbol ("stack"));
 #include "libguile/stacks.x"
 }
 
diff --git a/libguile/stime.c b/libguile/stime.c
index 07dedf3..78aa673 100644
--- a/libguile/stime.c
+++ b/libguile/stime.c
@@ -1,4 +1,4 @@
-/* Copyright (C) 1995,1996,1997,1998,1999,2000,2001, 2003, 2004, 2005, 2006, 2007, 2008, 2009 Free Software Foundation, Inc.
+/* Copyright (C) 1995,1996,1997,1998,1999,2000,2001, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2011 Free Software Foundation, Inc.
  *
  * This library is free software; you can redistribute it and/or
  * modify it under the terms of the GNU Lesser General Public License
@@ -625,11 +625,11 @@ SCM_DEFINE (scm_strftime, "strftime", 2, 0, 0,
 {
   struct tm t;
 
-  scm_t_uint8 *tbuf;
+  char *tbuf;
   int size = 50;
-  scm_t_uint8 *fmt;
-  scm_t_uint8 *myfmt;
-  int len;
+  char *fmt;
+  char *myfmt;
+  size_t len;
   SCM result;
 
   SCM_VALIDATE_STRING (1, format);
@@ -637,8 +637,7 @@ SCM_DEFINE (scm_strftime, "strftime", 2, 0, 0,
 
   /* Convert string to UTF-8 so that non-ASCII characters in the
      format are passed through unchanged.  */
-  fmt = scm_i_to_utf8_string (format);
-  len = strlen ((const char *) fmt);
+  fmt = scm_to_utf8_stringn (format, &len);
 
   /* Ugly hack: strftime can return 0 if its buffer is too small,
      but some valid time strings (e.g. "%p") can sometimes produce
@@ -647,7 +646,7 @@ SCM_DEFINE (scm_strftime, "strftime", 2, 0, 0,
      nonzero. */
   myfmt = scm_malloc (len+2);
   *myfmt = (scm_t_uint8) 'x';
-  strncpy ((char *) myfmt + 1, (const char *) fmt, len);
+  strncpy (myfmt + 1, fmt, len);
   myfmt[len + 1] = 0;
   scm_remember_upto_here_1 (format);
   free (fmt);
@@ -685,8 +684,7 @@ SCM_DEFINE (scm_strftime, "strftime", 2, 0, 0,
 
     /* Use `nstrftime ()' from Gnulib, which supports all GNU extensions
        supported by glibc.  */
-    while ((len = nstrftime ((char *) tbuf, size, 
-                            (const char *) myfmt, &t, 0, 0)) == 0)
+    while ((len = nstrftime (tbuf, size, myfmt, &t, 0, 0)) == 0)
       {
        free (tbuf);
        size *= 2;
@@ -702,7 +700,7 @@ SCM_DEFINE (scm_strftime, "strftime", 2, 0, 0,
 #endif
     }
 
-  result = scm_i_from_utf8_string ((const scm_t_uint8 *) tbuf + 1);
+  result = scm_from_utf8_string (tbuf + 1);
   free (tbuf);
   free (myfmt);
 #if HAVE_STRUCT_TM_TM_ZONE
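As the comment in `scm_strftime` explains, `strftime` returns 0 in two cases: when the buffer is too small, and when the output is legitimately empty (for example `"%p"` in some locales). Prepending a junk `'x'` to the format means a successful call always returns a nonzero length. A 0 return can then only mean "grow the buffer". The patch uses Gnulib's `nstrftime`. The sketch below substitutes standard C `strftime`, and its helper name is hypothetical.

```c
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Format T with FMT into a freshly malloc'd string, or return NULL on
   allocation failure.  A junk 'x' is prepended to the format so that a
   successful strftime call always returns a nonzero length; a 0 return
   then unambiguously means "buffer too small", and the buffer doubles. */
static char *
format_time (const char *fmt, const struct tm *t)
{
  size_t len = strlen (fmt);
  size_t size = 50;
  char *myfmt, *tbuf, *result;

  myfmt = malloc (len + 2);
  if (myfmt == NULL)
    return NULL;
  myfmt[0] = 'x';
  memcpy (myfmt + 1, fmt, len + 1);    /* also copies the terminating NUL */

  for (;;)
    {
      tbuf = malloc (size);
      if (tbuf == NULL)
        {
          free (myfmt);
          return NULL;
        }
      if (strftime (tbuf, size, myfmt, t) != 0)
        break;
      free (tbuf);
      size *= 2;
    }
  free (myfmt);

  /* Drop the junk prefix. */
  result = malloc (strlen (tbuf + 1) + 1);
  if (result != NULL)
    strcpy (result, tbuf + 1);
  free (tbuf);
  return result;
}
```

Guile avoids the final copy by building its result with `scm_from_utf8_string (tbuf + 1)`. The sketch copies into a new buffer only so that it can return a plain C string.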
@@ -728,7 +726,7 @@ SCM_DEFINE (scm_strptime, "strptime", 2, 0, 0,
 #define FUNC_NAME s_scm_strptime
 {
   struct tm t;
-  scm_t_uint8 *fmt, *str, *rest;
+  char *fmt, *str, *rest;
   size_t used_len;
   long zoff;
 
@@ -737,8 +735,8 @@ SCM_DEFINE (scm_strptime, "strptime", 2, 0, 0,
 
   /* Convert strings to UTF-8 so that non-ASCII characters are passed
      through unchanged.  */
-  fmt = scm_i_to_utf8_string (format);
-  str = scm_i_to_utf8_string (string);
+  fmt = scm_to_utf8_string (format);
+  str = scm_to_utf8_string (string);
 
   /* initialize the struct tm */
 #define tm_init(field) t.field = 0
@@ -760,8 +758,7 @@ SCM_DEFINE (scm_strptime, "strptime", 2, 0, 0,
      fields, hence the use of SCM_CRITICAL_SECTION_START.  */
   t.tm_isdst = -1;
   SCM_CRITICAL_SECTION_START;
-  rest = (scm_t_uint8 *) strptime ((const char *) str, 
-                                   (const char *) fmt, &t);
+  rest = strptime (str, fmt, &t);
   SCM_CRITICAL_SECTION_END;
   if (rest == NULL)
     {
@@ -784,7 +781,7 @@ SCM_DEFINE (scm_strptime, "strptime", 2, 0, 0,
 #endif
 
   /* Compute the number of UTF-8 characters.  */
-  used_len = u8_strnlen (str, rest-str);
+  used_len = u8_strnlen ((scm_t_uint8*) str, rest-str);
   scm_remember_upto_here_2 (format, string);
   free (str);
   free (fmt);
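In `scm_strptime`, `strptime` returns a pointer just past the consumed input, and `rest - str` is the number of UTF-8 bytes used. The adjacent comment says the goal is a character count. GNU libunistring documents `u8_strnlen` as returning a count of units (bytes, for UTF-8) capped at its second argument. The function that counts characters is `u8_mbsnlen`. The following is a dependency-free sketch of the character count that the comment describes. It assumes well-formed UTF-8, as produced by `scm_to_utf8_string`, and the helper name is hypothetical.

```c
#include <stddef.h>

/* Count the characters in the first N bytes of the well-formed UTF-8
   string S by counting every byte that is not a continuation byte
   (bit pattern 10xxxxxx).  Stops early at a NUL byte.  No validation
   is performed. */
static size_t
utf8_chars_in_prefix (const unsigned char *s, size_t n)
{
  size_t i, count = 0;
  for (i = 0; i < n && s[i] != 0; i++)
    if ((s[i] & 0xC0) != 0x80)
      count++;
  return count;
}
```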
diff --git a/libguile/strings.c b/libguile/strings.c
index 71f0b52..4760f33 100644
--- a/libguile/strings.c
+++ b/libguile/strings.c
@@ -1,4 +1,4 @@
-/* Copyright (C) 1995,1996,1998,2000,2001, 2004, 2006, 2008, 2009, 2010 Free Software Foundation, Inc.
+/* Copyright (C) 1995,1996,1998,2000,2001, 2004, 2006, 2008, 2009, 2010, 2011 Free Software Foundation, Inc.
  * 
  * This library is free software; you can redistribute it and/or
  * modify it under the terms of the GNU Lesser General Public License
@@ -857,31 +857,31 @@ SCM_DEFINE (scm_sys_string_dump, "%string-dump", 1, 0, 0, (SCM str),
   SCM_VALIDATE_STRING (1, str);
 
   /* String info */
-  e1 = scm_cons (scm_from_locale_symbol ("string"),
+  e1 = scm_cons (scm_from_latin1_symbol ("string"),
                  str);
-  e2 = scm_cons (scm_from_locale_symbol ("start"),
+  e2 = scm_cons (scm_from_latin1_symbol ("start"),
                  scm_from_size_t (STRING_START (str)));
-  e3 = scm_cons (scm_from_locale_symbol ("length"),
+  e3 = scm_cons (scm_from_latin1_symbol ("length"),
                  scm_from_size_t (STRING_LENGTH (str)));
 
   if (IS_SH_STRING (str))
     {
-      e4 = scm_cons (scm_from_locale_symbol ("shared"),
+      e4 = scm_cons (scm_from_latin1_symbol ("shared"),
                      SH_STRING_STRING (str));
       buf = STRING_STRINGBUF (SH_STRING_STRING (str));
     }
   else
     {
-      e4 = scm_cons (scm_from_locale_symbol ("shared"),
+      e4 = scm_cons (scm_from_latin1_symbol ("shared"),
                      SCM_BOOL_F);
       buf = STRING_STRINGBUF (str);
     }
 
   if (IS_RO_STRING (str))
-    e5 = scm_cons (scm_from_locale_symbol ("read-only"),
+    e5 = scm_cons (scm_from_latin1_symbol ("read-only"),
                    SCM_BOOL_T);
   else
-    e5 = scm_cons (scm_from_locale_symbol ("read-only"),
+    e5 = scm_cons (scm_from_latin1_symbol ("read-only"),
                    SCM_BOOL_F);
 
   /* Stringbuf info */
@@ -891,7 +891,7 @@ SCM_DEFINE (scm_sys_string_dump, "%string-dump", 1, 0, 0, (SCM str),
       char *cbuf;
       SCM sbc = scm_i_make_string (len, &cbuf);
       memcpy (cbuf, STRINGBUF_CHARS (buf), len);
-      e6 = scm_cons (scm_from_locale_symbol ("stringbuf-chars"),
+      e6 = scm_cons (scm_from_latin1_symbol ("stringbuf-chars"),
                      sbc);
     }
   else
@@ -901,22 +901,22 @@ SCM_DEFINE (scm_sys_string_dump, "%string-dump", 1, 0, 0, (SCM str),
       SCM sbc = scm_i_make_wide_string (len, &cbuf);
       u32_cpy ((scm_t_uint32 *) cbuf, 
                (scm_t_uint32 *) STRINGBUF_WIDE_CHARS (buf), len);
-      e6 = scm_cons (scm_from_locale_symbol ("stringbuf-chars"),
+      e6 = scm_cons (scm_from_latin1_symbol ("stringbuf-chars"),
                      sbc);
     }
-  e7 = scm_cons (scm_from_locale_symbol ("stringbuf-length"), 
+  e7 = scm_cons (scm_from_latin1_symbol ("stringbuf-length"), 
                  scm_from_size_t (STRINGBUF_LENGTH (buf)));
   if (STRINGBUF_SHARED (buf))
-    e8 = scm_cons (scm_from_locale_symbol ("stringbuf-shared"), 
+    e8 = scm_cons (scm_from_latin1_symbol ("stringbuf-shared"), 
                    SCM_BOOL_T);
   else
-    e8 = scm_cons (scm_from_locale_symbol ("stringbuf-shared"), 
+    e8 = scm_cons (scm_from_latin1_symbol ("stringbuf-shared"), 
                    SCM_BOOL_F);
   if (STRINGBUF_WIDE (buf))
-    e9 = scm_cons (scm_from_locale_symbol ("stringbuf-wide"),
+    e9 = scm_cons (scm_from_latin1_symbol ("stringbuf-wide"),
                   SCM_BOOL_T);
   else
-    e9 = scm_cons (scm_from_locale_symbol ("stringbuf-wide"),
+    e9 = scm_cons (scm_from_latin1_symbol ("stringbuf-wide"),
                   SCM_BOOL_F);
 
   return scm_list_n (e1, e2, e3, e4, e5, e6, e7, e8, e9, SCM_UNDEFINED);
@@ -949,11 +949,11 @@ SCM_DEFINE (scm_sys_symbol_dump, "%symbol-dump", 1, 0, 0, (SCM sym),
   SCM e1, e2, e3, e4, e5, e6, e7;
   SCM buf;
   SCM_VALIDATE_SYMBOL (1, sym);
-  e1 = scm_cons (scm_from_locale_symbol ("symbol"),
+  e1 = scm_cons (scm_from_latin1_symbol ("symbol"),
                  sym);
-  e2 = scm_cons (scm_from_locale_symbol ("hash"),
+  e2 = scm_cons (scm_from_latin1_symbol ("hash"),
                  scm_from_ulong (scm_i_symbol_hash (sym)));
-  e3 = scm_cons (scm_from_locale_symbol ("interned"),
+  e3 = scm_cons (scm_from_latin1_symbol ("interned"),
                  scm_symbol_interned_p (sym));
   buf = SYMBOL_STRINGBUF (sym);
 
@@ -964,7 +964,7 @@ SCM_DEFINE (scm_sys_symbol_dump, "%symbol-dump", 1, 0, 0, (SCM sym),
       char *cbuf;
       SCM sbc = scm_i_make_string (len, &cbuf);
       memcpy (cbuf, STRINGBUF_CHARS (buf), len);
-      e4 = scm_cons (scm_from_locale_symbol ("stringbuf-chars"),
+      e4 = scm_cons (scm_from_latin1_symbol ("stringbuf-chars"),
                      sbc);
     }
   else
@@ -974,22 +974,22 @@ SCM_DEFINE (scm_sys_symbol_dump, "%symbol-dump", 1, 0, 0, (SCM sym),
       SCM sbc = scm_i_make_wide_string (len, &cbuf);
       u32_cpy ((scm_t_uint32 *) cbuf, 
                (scm_t_uint32 *) STRINGBUF_WIDE_CHARS (buf), len);
-      e4 = scm_cons (scm_from_locale_symbol ("stringbuf-chars"),
+      e4 = scm_cons (scm_from_latin1_symbol ("stringbuf-chars"),
                      sbc);
     }
-  e5 = scm_cons (scm_from_locale_symbol ("stringbuf-length"), 
+  e5 = scm_cons (scm_from_latin1_symbol ("stringbuf-length"), 
                  scm_from_size_t (STRINGBUF_LENGTH (buf)));
   if (STRINGBUF_SHARED (buf))
-    e6 = scm_cons (scm_from_locale_symbol ("stringbuf-shared"), 
+    e6 = scm_cons (scm_from_latin1_symbol ("stringbuf-shared"), 
                    SCM_BOOL_T);
   else
-    e6 = scm_cons (scm_from_locale_symbol ("stringbuf-shared"), 
+    e6 = scm_cons (scm_from_latin1_symbol ("stringbuf-shared"), 
                    SCM_BOOL_F);
   if (STRINGBUF_WIDE (buf))
-    e7 = scm_cons (scm_from_locale_symbol ("stringbuf-wide"),
+    e7 = scm_cons (scm_from_latin1_symbol ("stringbuf-wide"),
                     SCM_BOOL_T);
   else
-    e7 = scm_cons (scm_from_locale_symbol ("stringbuf-wide"),
+    e7 = scm_cons (scm_from_latin1_symbol ("stringbuf-wide"),
                     SCM_BOOL_F);
   return scm_list_n (e1, e2, e3, e4, e5, e6, e7, SCM_UNDEFINED);
 
@@ -1437,8 +1437,13 @@ scm_from_stringn (const char *str, size_t len, const char *encoding,
   int wide = 0;
   SCM res;
 
+  /* The order of these checks is important. */
   if (len == 0)
     return scm_nullstr;
+  if (!str)
+    scm_misc_error ("scm_from_stringn", "NULL string pointer", SCM_EOL);
+  if (len == (size_t) -1)
+    len = strlen (str);
 
   if (encoding == NULL)
     {
@@ -1502,9 +1507,9 @@ scm_from_stringn (const char *str, size_t len, const char *encoding,
 }
 
 SCM
-scm_from_latin1_stringn (const char *str, size_t len)
+scm_from_locale_string (const char *str)
 {
-  return scm_from_stringn (str, len, NULL, SCM_FAILED_CONVERSION_ERROR);
+  return scm_from_locale_stringn (str, -1);
 }
 
 SCM
@@ -1515,11 +1520,6 @@ scm_from_locale_stringn (const char *str, size_t len)
   SCM inport;
   scm_t_port *pt;
 
-  if (len == (size_t) -1)
-    len = strlen (str);
-  if (len == 0)
-    return scm_nullstr;
-
   inport = scm_current_input_port ();
   if (!SCM_UNBNDP (inport) && SCM_OPINPORTP (inport))
     {
@@ -1537,20 +1537,27 @@ scm_from_locale_stringn (const char *str, size_t len)
 }
 
 SCM
-scm_from_locale_string (const char *str)
+scm_from_latin1_string (const char *str)
 {
-  if (str == NULL)
-    return scm_nullstr;
+  return scm_from_latin1_stringn (str, -1);
+}
 
-  return scm_from_locale_stringn (str, -1);
+SCM
+scm_from_latin1_stringn (const char *str, size_t len)
+{
+  return scm_from_stringn (str, len, NULL, SCM_FAILED_CONVERSION_ERROR);
 }
 
 SCM
-scm_i_from_utf8_string (const scm_t_uint8 *str)
+scm_from_utf8_string (const char *str)
 {
-  return scm_from_stringn ((const char *) str,
-                           strlen ((char *) str), "UTF-8",
-                           SCM_FAILED_CONVERSION_ERROR);
+  return scm_from_utf8_stringn (str, -1);
+}
+
+SCM
+scm_from_utf8_stringn (const char *str, size_t len)
+{
+  return scm_from_stringn (str, len, "UTF-8", SCM_FAILED_CONVERSION_ERROR);
 }
 
 /* Create a new scheme string from the C string STR.  The memory of
@@ -1707,9 +1714,9 @@ scm_i_unistring_escapes_to_r6rs_escapes (char *buf, size_t *lenp)
 }
 
 char *
-scm_to_latin1_stringn (SCM str, size_t *lenp)
+scm_to_locale_string (SCM str)
 {
-  return scm_to_stringn (str, lenp, NULL, SCM_FAILED_CONVERSION_ERROR);
+  return scm_to_locale_stringn (str, NULL);
 }
 
 char *
@@ -1733,6 +1740,30 @@ scm_to_locale_stringn (SCM str, size_t *lenp)
                          scm_i_get_conversion_strategy (SCM_BOOL_F));
 }
 
+char *
+scm_to_latin1_string (SCM str)
+{
+  return scm_to_latin1_stringn (str, NULL);
+}
+
+char *
+scm_to_latin1_stringn (SCM str, size_t *lenp)
+{
+  return scm_to_stringn (str, lenp, NULL, SCM_FAILED_CONVERSION_ERROR);
+}
+
+char *
+scm_to_utf8_string (SCM str)
+{
+  return scm_to_utf8_stringn (str, NULL);
+}
+
+char *
+scm_to_utf8_stringn (SCM str, size_t *lenp)
+{
+  return scm_to_stringn (str, lenp, "UTF-8", SCM_FAILED_CONVERSION_ERROR);
+}
+
 /* Return a malloc(3)-allocated buffer containing the contents of STR encoded
    according to ENCODING.  If LENP is non-NULL, set it to the size in bytes of
    the returned buffer.  If the conversion to ENCODING fails, apply the strategy
@@ -1845,20 +1876,6 @@ scm_to_stringn (SCM str, size_t *lenp, const char *encoding,
   return buf;
 }
 
-char *
-scm_to_locale_string (SCM str)
-{
-  return scm_to_locale_stringn (str, NULL);
-}
-
-scm_t_uint8 *
-scm_i_to_utf8_string (SCM str)
-{
-  char *u8str;
-  u8str = scm_to_stringn (str, NULL, "UTF-8", SCM_FAILED_CONVERSION_ERROR);
-  return (scm_t_uint8 *) u8str;
-}
-
 size_t
 scm_to_locale_stringbuf (SCM str, char *buf, size_t max_len)
 {
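The comment added to scm_from_stringn says that the order of its checks is important. A zero length is tested before the NULL pointer, so a (NULL, 0) pair still produces the empty string, and strlen runs only once the pointer is known to be non-NULL, when the length is the (size_t) -1 "NUL-terminated" marker that wrappers such as scm_from_latin1_string and scm_from_utf8_string pass down. The standalone C sketch below mirrors that ordering; the enum and function are hypothetical, and where libguile raises an error with scm_misc_error the sketch returns a status code instead.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical status codes; libguile signals an error through
   scm_misc_error rather than returning a code.  */
enum conv_status { CONV_OK, CONV_EMPTY, CONV_NULL_POINTER };

/* Resolve the (STR, *LEN) pair in the same order as the patched
   scm_from_stringn.  */
static enum conv_status
resolve_length (const char *str, size_t *len)
{
  if (*len == 0)
    return CONV_EMPTY;          /* First: STR may legitimately be NULL. */
  if (str == NULL)
    return CONV_NULL_POINTER;   /* Before strlen, which would crash. */
  if (*len == (size_t) -1)
    *len = strlen (str);        /* (size_t) -1 means NUL-terminated. */
  return CONV_OK;
}
```

Swapping the first two tests would turn the (NULL, 0) case into an error, and moving the strlen call ahead of the NULL test would dereference a null pointer.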
diff --git a/libguile/strings.h b/libguile/strings.h
index 00bc224..1a8ff7c 100644
--- a/libguile/strings.h
+++ b/libguile/strings.h
@@ -3,7 +3,7 @@
 #ifndef SCM_STRINGS_H
 #define SCM_STRINGS_H
 
-/* Copyright (C) 1995,1996,1997,1998,2000,2001, 2004, 2005, 2006, 2008, 2009, 2010 Free Software Foundation, Inc.
+/* Copyright (C) 1995,1996,1997,1998,2000,2001, 2004, 2005, 2006, 2008, 2009, 2010, 2011 Free Software Foundation, Inc.
  *
  * This library is free software; you can redistribute it and/or
  * modify it under the terms of the GNU Lesser General Public License
@@ -125,18 +125,31 @@ SCM_API SCM scm_c_substring_read_only (SCM str, size_t start, size_t end);
 SCM_API SCM scm_c_substring_shared (SCM str, size_t start, size_t end);
 SCM_API SCM scm_c_substring_copy (SCM str, size_t start, size_t end);
 
-SCM_API SCM scm_from_latin1_stringn (const char *str, size_t len);
+/* Use locale encoding for user input, user output, or interacting with
+   the C library.  Use latin1 for ASCII, and for literals in source
+   code.  Use utf8 for interaction with modern libraries which deal in
+   UTF-8.  Otherwise use scm_to_stringn or scm_from_stringn with a
+   specific encoding. */
+
 SCM_API SCM scm_from_locale_string (const char *str);
 SCM_API SCM scm_from_locale_stringn (const char *str, size_t len);
-SCM_INTERNAL SCM scm_i_from_utf8_string (const scm_t_uint8 *str);
 SCM_API SCM scm_take_locale_string (char *str);
 SCM_API SCM scm_take_locale_stringn (char *str, size_t len);
-SCM_API char *scm_to_latin1_stringn (SCM str, size_t *lenp);
 SCM_API char *scm_to_locale_string (SCM str);
 SCM_API char *scm_to_locale_stringn (SCM str, size_t *lenp);
+
+SCM_API SCM scm_from_latin1_string (const char *str);
+SCM_API SCM scm_from_latin1_stringn (const char *str, size_t len);
+SCM_API char *scm_to_latin1_string (SCM str);
+SCM_API char *scm_to_latin1_stringn (SCM str, size_t *lenp);
+
+SCM_API char *scm_to_utf8_string (SCM str);
+SCM_API char *scm_to_utf8_stringn (SCM str, size_t *lenp);
+SCM_API SCM scm_from_utf8_string (const char *str);
+SCM_API SCM scm_from_utf8_stringn (const char *str, size_t len);
+
 SCM_API char *scm_to_stringn (SCM str, size_t *lenp, const char *encoding,
                               scm_t_string_failed_conversion_handler handler);
-SCM_INTERNAL scm_t_uint8 *scm_i_to_utf8_string (SCM str);
 SCM_API size_t scm_to_locale_stringbuf (SCM str, char *buf, size_t max_len);
 
 SCM_API SCM scm_string_normalize_nfd (SCM str);
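The comment added to strings.h recommends Latin-1 for ASCII and for literals in source code. One property that makes this convenient is that Latin-1 needs no decoding tables or locale state: every byte is exactly one code point in the range U+0000 to U+00FF, and ASCII bytes mean the same thing in Latin-1 and UTF-8. As a standalone illustration (not libguile's implementation; the function name is hypothetical), converting Latin-1 bytes to UTF-8 takes only a few lines:

```c
#include <assert.h>
#include <stddef.h>

/* Encode LEN Latin-1 bytes from IN as UTF-8 into OUT, which must hold
   at least 2 * LEN bytes.  Returns the number of bytes written.  Each
   input byte is taken directly as a code point (U+0000..U+00FF).  */
static size_t
latin1_to_utf8 (const unsigned char *in, size_t len, unsigned char *out)
{
  size_t o = 0;

  for (size_t i = 0; i < len; i++)
    {
      unsigned char c = in[i];

      if (c < 0x80)
        out[o++] = c;                                   /* ASCII: unchanged. */
      else
        {
          out[o++] = (unsigned char) (0xC0 | (c >> 6));   /* 110000xx */
          out[o++] = (unsigned char) (0x80 | (c & 0x3F)); /* 10xxxxxx */
        }
    }
  return o;
}
```

For example, the Latin-1 byte 0xE9 (é) becomes the two UTF-8 bytes 0xC3 0xA9, while 'h' and 'i' are copied as-is.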
diff --git a/libguile/symbols.c b/libguile/symbols.c
index c77749f..b9d41b0 100644
--- a/libguile/symbols.c
+++ b/libguile/symbols.c
@@ -68,128 +68,141 @@ SCM_DEFINE (scm_sys_symbols, "%symbols", 0, 0, 0,
 /* {Symbols}
  */
 
-/* In order to optimize reading speed, this function breaks part of
- * the hashtable abstraction.  The optimizations are:
- *
- * 1. The argument string can be compared directly to symbol objects
- *    without first creating an SCM string object.  (This would have
- *    been necessary if we had used the hashtable API in hashtab.h.)
- *
- * 2. We can use the raw hash value stored in scm_i_symbol_hash (sym)
- *    to speed up lookup.
- *
- * Both optimizations might be possible without breaking the
- * abstraction if the API in hashtab.c is improved.
- */
-
 unsigned long
 scm_i_hash_symbol (SCM obj, unsigned long n, void *closure)
 {
   return scm_i_symbol_hash (obj) % n;
 }
 
+struct string_lookup_data
+{
+  SCM string;
+  unsigned long string_hash;
+};
+
+static int
+string_lookup_predicate_fn (SCM sym, void *closure)
+{
+  struct string_lookup_data *data = closure;
+
+  if (scm_i_symbol_hash (sym) == data->string_hash
+      && scm_i_symbol_length (sym) == scm_i_string_length (data->string))
+    {
+      size_t n = scm_i_symbol_length (sym);
+      while (n--)
+        if (scm_i_symbol_ref (sym, n) != scm_i_string_ref (data->string, n))
+          return 0;
+      return 1;
+    }
+  else
+    return 0;
+}
+
 static SCM
 lookup_interned_symbol (SCM name, unsigned long raw_hash)
 {
-  /* Try to find the symbol in the symbols table */
-  SCM result = SCM_BOOL_F;
-  SCM bucket, elt, previous_elt;
+  struct string_lookup_data data;
+  SCM handle;
+
+  data.string = name;
+  data.string_hash = raw_hash;
+  
+  /* Strictly speaking, we should take a lock here.  But instead we rely
+     on the fact that if this fails, we do take the lock on the
+     intern_symbol path; and since nothing deletes from the hash table
+     except GC, we should be OK.  */
+  handle = scm_hash_fn_get_handle_by_hash (symbols, raw_hash,
+                                           string_lookup_predicate_fn,
+                                           &data);  
+
+  if (scm_is_true (handle))
+    return SCM_CAR (handle);
+  else
+    return SCM_BOOL_F;
+}
+
+struct latin1_lookup_data
+{
+  const char *str;
   size_t len;
-  unsigned long hash = raw_hash % SCM_HASHTABLE_N_BUCKETS (symbols);
+  unsigned long string_hash;
+};
+
+static int
+latin1_lookup_predicate_fn (SCM sym, void *closure)
+{
+  struct latin1_lookup_data *data = closure;
 
-  len = scm_i_string_length (name);
-  bucket = SCM_HASHTABLE_BUCKET (symbols, hash);
+  return scm_i_symbol_hash (sym) == data->string_hash
+    && scm_i_is_narrow_symbol (sym)
+    && scm_i_symbol_length (sym) == data->len
+    && strncmp (scm_i_symbol_chars (sym), data->str, data->len) == 0;
+}
+
+static SCM
+lookup_interned_latin1_symbol (const char *str, size_t len,
+                               unsigned long raw_hash)
+{
+  struct latin1_lookup_data data;
+  SCM handle;
 
-  for (elt = bucket, previous_elt = SCM_BOOL_F;
-       !scm_is_null (elt);
-       previous_elt = elt, elt = SCM_CDR (elt))
+  data.str = str;
+  data.len = len;
+  data.string_hash = raw_hash;
+  
+  /* Strictly speaking, we should take a lock here.  But instead we rely
+     on the fact that if this fails, we do take the lock on the
+     intern_symbol path; and since nothing deletes from the hash table
+     except GC, we should be OK.  */
+  handle = scm_hash_fn_get_handle_by_hash (symbols, raw_hash,
+                                           latin1_lookup_predicate_fn,
+                                           &data);  
+
+  if (scm_is_true (handle))
+    return SCM_CAR (handle);
+  else
+    return SCM_BOOL_F;
+}
+
+static unsigned long
+symbol_lookup_hash_fn (SCM obj, unsigned long max, void *closure)
+{
+  return scm_i_symbol_hash (obj) % max;
+}
+
+static SCM
+symbol_lookup_assoc_fn (SCM obj, SCM alist, void *closure)
+{
+  for (; !scm_is_null (alist); alist = SCM_CDR (alist))
     {
-      SCM pair, sym;
-
-      pair = SCM_CAR (elt);
-      if (!scm_is_pair (pair))
-       abort ();
-
-      if (SCM_WEAK_PAIR_CAR_DELETED_P (pair))
-       {
-         /* PAIR is a weak pair whose key got nullified: remove it from
-            BUCKET.  */
-         /* FIXME: Since this is done lazily, i.e., only when a new symbol
-            is to be inserted in a bucket containing deleted symbols, the
-            number of items in the hash table may remain erroneous for some
-            time, thus precluding proper rehashing.  */
-         if (previous_elt != SCM_BOOL_F)
-           SCM_SETCDR (previous_elt, SCM_CDR (elt));
-         else
-           bucket = SCM_CDR (elt);
-
-         SCM_HASHTABLE_DECREMENT (symbols);
-         continue;
-       }
-
-      sym = SCM_CAR (pair);
-
-      if (scm_i_symbol_hash (sym) == raw_hash
-         && scm_i_symbol_length (sym) == len)
-       {
-          size_t i = len;
-
-          /* Slightly faster path for comparing narrow to narrow.  */
-          if (scm_i_is_narrow_string (name) && scm_i_is_narrow_symbol (sym))
-            {
-              const char *chrs = scm_i_symbol_chars (sym);
-              const char *str = scm_i_string_chars (name);
-
-              while (i != 0)
-                {
-                  --i;
-                  if (str[i] != chrs[i])
-                    goto next_symbol;
-                }
-            }
-          else
-            {
-              /* Somewhat slower path for comparing narrow to wide or
-                 wide to wide.  */
-              while (i != 0)
-                {
-                  --i;
-                  if (scm_i_string_ref (name, i) != scm_i_symbol_ref (sym, i))
-                    goto next_symbol;
-                }
-            }
-
-         /* We found it.  */
-         result = sym;
-         break;
-       }
-    next_symbol:
-      ;
-    }
+      SCM sym = SCM_CAAR (alist);
 
-  if (SCM_HASHTABLE_N_ITEMS (symbols) < SCM_HASHTABLE_LOWER (symbols))
-    /* We removed many symbols in this pass so trigger a rehashing.  */
-    scm_i_rehash (symbols, scm_i_hash_symbol, 0, "lookup_interned_symbol");
+      if (scm_i_symbol_hash (sym) == scm_i_symbol_hash (obj)
+          && scm_is_true (scm_string_equal_p (scm_symbol_to_string (sym),
+                                              scm_symbol_to_string (obj))))
+        return SCM_CAR (alist);
+    }
 
-  return result;
+  return SCM_BOOL_F;
 }
 
-/* Intern SYMBOL, an uninterned symbol.  */
-static void
+static scm_i_pthread_mutex_t intern_lock = SCM_I_PTHREAD_MUTEX_INITIALIZER;
+
+/* Intern SYMBOL, an uninterned symbol.  Might return a different
+   symbol, if another one was interned at the same time.  */
+static SCM
 intern_symbol (SCM symbol)
 {
-  SCM slot, cell;
-  unsigned long hash;
+  SCM handle;
 
-  hash = scm_i_symbol_hash (symbol) % SCM_HASHTABLE_N_BUCKETS (symbols);
-  slot = SCM_HASHTABLE_BUCKET (symbols, hash);
-  cell = scm_cons (symbol, SCM_UNDEFINED);
+  scm_i_pthread_mutex_lock (&intern_lock);
+  handle = scm_hash_fn_create_handle_x (symbols, symbol, SCM_UNDEFINED,
+                                        symbol_lookup_hash_fn,
+                                        symbol_lookup_assoc_fn,
+                                        NULL);
+  scm_i_pthread_mutex_unlock (&intern_lock);
 
-  SCM_SET_HASHTABLE_BUCKET (symbols, hash, scm_cons (cell, slot));
-  SCM_HASHTABLE_INCREMENT (symbols);
-
-  if (SCM_HASHTABLE_N_ITEMS (symbols) > SCM_HASHTABLE_UPPER (symbols))
-    scm_i_rehash (symbols, scm_i_hash_symbol, 0, "intern_symbol");
+  return SCM_CAR (handle);
 }
 
 static SCM
@@ -199,15 +212,15 @@ scm_i_str2symbol (SCM str)
   size_t raw_hash = scm_i_string_hash (str);
 
   symbol = lookup_interned_symbol (str, raw_hash);
-  if (scm_is_false (symbol))
+  if (scm_is_true (symbol))
+    return symbol;
+  else
     {
       /* The symbol was not found, create it.  */
       symbol = scm_i_make_symbol (str, 0, raw_hash,
                                  scm_cons (SCM_BOOL_F, SCM_EOL));
-      intern_symbol (symbol);
+      return intern_symbol (symbol);
     }
-
-  return symbol;
 }
 
 
@@ -443,6 +456,45 @@ scm_take_locale_symbol (char *sym)
   return scm_take_locale_symboln (sym, (size_t)-1);
 }
 
+SCM
+scm_from_latin1_symbol (const char *sym)
+{
+  return scm_from_latin1_symboln (sym, -1);
+}
+
+SCM
+scm_from_latin1_symboln (const char *sym, size_t len)
+{
+  unsigned long hash;
+  SCM ret;
+
+  if (len == (size_t) -1)
+    len = strlen (sym);
+  hash = scm_i_latin1_string_hash (sym, len);
+
+  ret = lookup_interned_latin1_symbol (sym, len, hash);
+  if (scm_is_false (ret))
+    {
+      SCM str = scm_from_latin1_stringn (sym, len);
+      ret = scm_i_str2symbol (str);
+    }
+
+  return ret;
+}
+
+SCM
+scm_from_utf8_symbol (const char *sym)
+{
+  return scm_from_utf8_symboln (sym, -1);
+}
+
+SCM
+scm_from_utf8_symboln (const char *sym, size_t len)
+{
+  SCM str = scm_from_utf8_stringn (sym, len);
+  return scm_i_str2symbol (str);
+}
+
 void
 scm_symbols_prehistory ()
 {
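The symbols.c changes replace the hand-written bucket walk with scm_hash_fn_get_handle_by_hash plus a predicate (compare the cached hash, then the length, then the characters), keep the lookup itself unlocked, and take intern_lock only when inserting, returning whichever symbol ends up in the table. The sketch below reproduces that shape for a toy table of C strings using POSIX threads. Every name in it is hypothetical, it uses FNV-1a rather than Guile's string hash, and, as the comment in lookup_interned_symbol acknowledges for the real code, the unlocked read is not strictly race-free; a production version would need atomic loads and stores or a lock on the read path.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

#define N_BUCKETS 64

/* One interned name in a toy chained hash table (hypothetical).  */
struct sym
{
  unsigned long hash;   /* Cached full hash, compared before any bytes. */
  size_t len;
  char *name;           /* NUL-terminated copy of the name. */
  struct sym *next;
};

static struct sym *buckets[N_BUCKETS];
static pthread_mutex_t intern_lock = PTHREAD_MUTEX_INITIALIZER;

/* FNV-1a over LEN bytes; a stand-in for Guile's own string hash.  */
static unsigned long
hash_bytes (const char *s, size_t len)
{
  unsigned long h = 2166136261UL;
  for (size_t i = 0; i < len; i++)
    h = (h ^ (unsigned char) s[i]) * 16777619UL;
  return h;
}

/* Predicate-style lookup on raw bytes: check the cached hash and the
   length first, and compare characters only when both match, so no
   temporary string object has to be built.  */
static struct sym *
lookup (const char *s, size_t len, unsigned long h)
{
  for (struct sym *e = buckets[h % N_BUCKETS]; e != NULL; e = e->next)
    if (e->hash == h && e->len == len && memcmp (e->name, s, len) == 0)
      return e;
  return NULL;
}

/* Return the unique entry for the first LEN bytes of S, creating it if
   needed.  The first lookup runs without the lock (not strictly
   race-free without atomics); the second runs under the lock, so two
   threads interning the same name end up with the same entry.  */
static struct sym *
intern (const char *s, size_t len)
{
  unsigned long h = hash_bytes (s, len);
  struct sym *e = lookup (s, len, h);

  if (e != NULL)
    return e;

  pthread_mutex_lock (&intern_lock);
  e = lookup (s, len, h);       /* Another thread may have inserted it. */
  if (e == NULL)
    {
      e = malloc (sizeof *e);
      if (e == NULL || (e->name = malloc (len + 1)) == NULL)
        abort ();
      memcpy (e->name, s, len);
      e->name[len] = '\0';
      e->hash = h;
      e->len = len;
      e->next = buckets[h % N_BUCKETS];
      buckets[h % N_BUCKETS] = e;
    }
  pthread_mutex_unlock (&intern_lock);
  return e;
}
```

Because each name maps to a single entry, interned names can be compared by pointer, which is the property that lets throw.c compare a tag against scm_from_latin1_symbol ("quit") with scm_is_eq.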
diff --git a/libguile/symbols.h b/libguile/symbols.h
index 8f96d65..6106f9e 100644
--- a/libguile/symbols.h
+++ b/libguile/symbols.h
@@ -67,11 +67,27 @@ SCM_API SCM scm_symbol_pset_x (SCM s, SCM val);
 SCM_API SCM scm_symbol_hash (SCM s);
 SCM_API SCM scm_gensym (SCM prefix);
 
+/* Use locale encoding for user input, user output, or interacting with
+   the C library.  Use latin-1 for ASCII, and for literals in source
+   code.  Use UTF-8 for interaction with modern libraries which deal in
+   UTF-8.  Otherwise use scm_to_stringn or scm_from_stringn, and
+   convert.  */
+
 SCM_API SCM scm_from_locale_symbol (const char *str);
 SCM_API SCM scm_from_locale_symboln (const char *str, size_t len);
 SCM_API SCM scm_take_locale_symbol (char *sym);
 SCM_API SCM scm_take_locale_symboln (char *sym, size_t len);
 
+SCM_API SCM scm_from_latin1_symbol (const char *str);
+SCM_API SCM scm_from_latin1_symboln (const char *str, size_t len);
+SCM_API SCM scm_take_latin1_symbol (char *sym);
+SCM_API SCM scm_take_latin1_symboln (char *sym, size_t len);
+
+SCM_API SCM scm_from_utf8_symbol (const char *str);
+SCM_API SCM scm_from_utf8_symboln (const char *str, size_t len);
+SCM_API SCM scm_take_utf8_symbol (char *sym);
+SCM_API SCM scm_take_utf8_symboln (char *sym, size_t len);
+
 /* internal functions. */
 
 SCM_INTERNAL unsigned long scm_i_hash_symbol (SCM obj, unsigned long n,
diff --git a/libguile/throw.c b/libguile/throw.c
index a0cb106..d2277a6 100644
--- a/libguile/throw.c
+++ b/libguile/throw.c
@@ -1,4 +1,4 @@
-/* Copyright (C) 1995,1996,1997,1998,2000,2001, 2003, 2004, 2006, 2008, 2009, 2010 Free Software Foundation, Inc.
+/* Copyright (C) 1995,1996,1997,1998,2000,2001, 2003, 2004, 2006, 2008, 2009, 2010, 2011 Free Software Foundation, Inc.
  * 
  * This library is free software; you can redistribute it and/or
  * modify it under the terms of the GNU Lesser General Public License
@@ -341,7 +341,7 @@ handler_message (void *handler_data, SCM tag, SCM args)
   char *prog_name = (char *) handler_data;
   SCM p = scm_current_error_port ();
 
-  if (scm_is_eq (tag, scm_from_locale_symbol ("syntax-error"))
+  if (scm_is_eq (tag, scm_from_latin1_symbol ("syntax-error"))
       && scm_ilength (args) >= 5)
     {
       SCM who = SCM_CAR (args);
@@ -465,7 +465,7 @@ handler_message (void *handler_data, SCM tag, SCM args)
 SCM
 scm_handle_by_message (void *handler_data, SCM tag, SCM args)
 {
-  if (scm_is_true (scm_eq_p (tag, scm_from_locale_symbol ("quit"))))
+  if (scm_is_true (scm_eq_p (tag, scm_from_latin1_symbol ("quit"))))
     exit (scm_exit_status (args));
 
   handler_message (handler_data, tag, args);
@@ -485,7 +485,7 @@ scm_handle_by_message (void *handler_data, SCM tag, SCM args)
 SCM
 scm_handle_by_message_noexit (void *handler_data, SCM tag, SCM args)
 {
-  if (scm_is_true (scm_eq_p (tag, scm_from_locale_symbol ("quit"))))
+  if (scm_is_true (scm_eq_p (tag, scm_from_latin1_symbol ("quit"))))
     exit (scm_exit_status (args));
 
   handler_message (handler_data, tag, args);
diff --git a/libguile/vm-i-loader.c b/libguile/vm-i-loader.c
index a9326c9..fae39fb 100644
--- a/libguile/vm-i-loader.c
+++ b/libguile/vm-i-loader.c
@@ -1,4 +1,4 @@
-/* Copyright (C) 2001,2008,2009,2010 Free Software Foundation, Inc.
+/* Copyright (C) 2001,2008,2009,2010,2011 Free Software Foundation, Inc.
  * 
  * This library is free software; you can redistribute it and/or
  * modify it under the terms of the GNU Lesser General Public License
@@ -52,7 +52,7 @@ VM_DEFINE_LOADER (103, load_symbol, "load-symbol")
   FETCH_LENGTH (len);
   SYNC_REGISTER ();
   /* FIXME: should be scm_from_latin1_symboln */
-  PUSH (scm_from_locale_symboln ((const char*)ip, len));
+  PUSH (scm_from_latin1_symboln ((const char*)ip, len));
   ip += len;
   NEXT;
 }
diff --git a/libguile/vm.c b/libguile/vm.c
index c08b084..e8f8ddf 100644
--- a/libguile/vm.c
+++ b/libguile/vm.c
@@ -1,4 +1,4 @@
-/* Copyright (C) 2001, 2009, 2010 Free Software Foundation, Inc.
+/* Copyright (C) 2001, 2009, 2010, 2011 Free Software Foundation, Inc.
  * 
  * This library is free software; you can redistribute it and/or
  * modify it under the terms of the GNU Lesser General Public License
@@ -883,11 +883,11 @@ scm_bootstrap_vm (void)
                             "scm_init_vm",
                             (scm_t_extension_init_func)scm_init_vm, NULL);
 
-  sym_vm_run = scm_from_locale_symbol ("vm-run");
-  sym_vm_error = scm_from_locale_symbol ("vm-error");
-  sym_keyword_argument_error = scm_from_locale_symbol ("keyword-argument-error");
-  sym_regular = scm_from_locale_symbol ("regular");
-  sym_debug = scm_from_locale_symbol ("debug");
+  sym_vm_run = scm_from_latin1_symbol ("vm-run");
+  sym_vm_error = scm_from_latin1_symbol ("vm-error");
+  sym_keyword_argument_error = scm_from_latin1_symbol ("keyword-argument-error");
+  sym_regular = scm_from_latin1_symbol ("regular");
+  sym_debug = scm_from_latin1_symbol ("debug");
 
 #ifdef VM_ENABLE_PRECISE_STACK_GC_SCAN
   vm_stack_gc_kind =
diff --git a/module/web/http.scm b/module/web/http.scm
index f2f0866..422669a 100644
--- a/module/web/http.scm
+++ b/module/web/http.scm
@@ -130,17 +130,19 @@ port, and writes the value to the port."
                                                (read-line* port))))
       val))
 
+(define *eof* (call-with-input-string "" read))
+
 (define (read-header port)
   "Reads one HTTP header from @var{port}. Returns two values: the header
 name and the parsed Scheme value. May raise an exception if the header
 was known but the value was invalid.
 
-Returns @var{#f} for both values if the end of the message body was
-reached (i.e., a blank line)."
+Returns the end-of-file object for both values if the end of the message
+body was reached (i.e., a blank line)."
   (let ((line (read-line* port)))
     (if (or (string-null? line)
             (string=? line "\r"))
-        (values #f #f)
+        (values *eof* *eof*)
         (let ((delim (or (string-index line #\:)
                          (bad-header '%read line))))
           (parse-header
@@ -205,9 +207,9 @@ ordered alist."
   (let lp ((headers '()))
     (call-with-values (lambda () (read-header port))
       (lambda (k v)
-        (if k
-            (lp (acons k v headers))
-            (reverse! headers))))))
+        (if (eof-object? k)
+            (reverse! headers)
+            (lp (acons k v headers)))))))
 
 (define (write-headers headers port)
   "Write the given header alist to @var{port}.  Doesn't write the final
@@ -628,7 +630,7 @@ ordered alist."
   (display (date->string date "~a, ~d ~b ~Y ~H:~M:~S GMT") port))
 
 (define (write-uri uri port)
-  (display (unparse-uri uri) port))
+  (display (uri->string uri) port))
 
 (define (parse-entity-tag val)
   (if (string-prefix? "W/" val)
@@ -751,7 +753,7 @@ not have to have a scheme or host name.  The result is a URI object."
                  #:query (and q (substring str (1+ q) (or f end)))
                  #:fragment (and f (substring str (1+ f) end)))))
    (else
-    (or (parse-uri (substring str start end))
+    (or (string->uri (substring str start end))
         (bad-request "Invalid URI: ~a" (substring str start end))))))
 
 (define (read-request-line port)
@@ -890,7 +892,7 @@ phrase\"."
     ((_ sym name)
      (declare-header sym
        name
-       (lambda (str) (or (parse-uri str) (bad-header-component 'uri str)))
+       (lambda (str) (or (string->uri str) (bad-header-component 'uri str)))
        uri?
        write-uri))))
 
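With this change, read-header signals the blank line that ends a header block by returning the end-of-file object for both values, so read-headers tests eof-object? instead of treating a false name as the terminator. A rough C analogue uses a dedicated result code for the same purpose. The names are hypothetical, header values are returned as pointers into the input buffer, and only leading whitespace is trimmed from the value.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result codes; END_OF_HEADERS plays the role of the
   end-of-file object, distinct from any header name.  */
enum header_result { HEADER, END_OF_HEADERS, BAD_HEADER };

/* Read one "Name: value" line starting at *POS in the NUL-terminated
   BUF and advance *POS past it.  On HEADER, NAME and VALUE point into
   BUF with their lengths set.  A line that is empty or just "\r" (the
   blank line that ends the header block) yields END_OF_HEADERS.  */
static enum header_result
read_header (const char *buf, size_t *pos,
             const char **name, size_t *name_len,
             const char **value, size_t *value_len)
{
  const char *line = buf + *pos;
  const char *nl = strchr (line, '\n');
  size_t len = nl ? (size_t) (nl - line) : strlen (line);
  const char *colon;

  *pos += nl ? len + 1 : len;
  if (len > 0 && line[len - 1] == '\r')
    len--;                              /* Strip the CR of CRLF. */
  if (len == 0)
    return END_OF_HEADERS;

  colon = memchr (line, ':', len);
  if (colon == NULL)
    return BAD_HEADER;

  *name = line;
  *name_len = (size_t) (colon - line);
  *value = colon + 1;
  *value_len = len - (*name_len + 1);
  while (*value_len > 0 && (**value == ' ' || **value == '\t'))
    {
      (*value)++;                       /* Skip leading whitespace. */
      (*value_len)--;
    }
  return HEADER;
}
```

As with the end-of-file object, a caller's loop needs a single test to decide whether to stop, and no valid header can be mistaken for the terminator.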
diff --git a/module/web/uri.scm b/module/web/uri.scm
index a7c4f2e..2361d87 100644
--- a/module/web/uri.scm
+++ b/module/web/uri.scm
@@ -37,7 +37,7 @@
 
             build-uri
             declare-default-port!
-            parse-uri unparse-uri
+            string->uri uri->string
             uri-decode uri-encode
             split-and-decode-uri-path
             encode-and-join-uri-path))
@@ -160,7 +160,7 @@ consistency checks to make sure that the constructed URI is valid."
 (define uri-regexp
   (make-regexp uri-pat))
 
-(define (parse-uri string)
+(define (string->uri string)
   "Parse @var{string} into a URI object. Returns @code{#f} if the string
 could not be parsed."
   (% (let ((m (regexp-exec uri-regexp string)))
@@ -197,7 +197,7 @@ printed."
 (declare-default-port! 'http 80)
 (declare-default-port! 'https 443)
 
-(define (unparse-uri uri)
+(define (uri->string uri)
   "Serialize @var{uri} to a string."
   (let* ((scheme-str (string-append
                       (symbol->string (uri-scheme uri)) ":"))
@@ -227,29 +227,29 @@ printed."
          ""))))
 
 
-(define (call-with-encoded-output-string charset proc)
-  (if (string-ci=? charset "utf-8")
+(define (call-with-encoded-output-string encoding proc)
+  (if (string-ci=? encoding "utf-8")
       (string->utf8 (call-with-output-string proc))
       (call-with-values
           (lambda ()
             (open-bytevector-output-port))
         (lambda (port get-bytevector)
-          (set-port-encoding! port charset)
+          (set-port-encoding! port encoding)
           (proc port)
           (get-bytevector)))))
 
-(define (encode-string str charset)
-  (if (string-ci=? charset "utf-8")
+(define (encode-string str encoding)
+  (if (string-ci=? encoding "utf-8")
       (string->utf8 str)
-      (call-with-encoded-output-string charset
+      (call-with-encoded-output-string encoding
                                        (lambda (port)
                                          (display str port)))))
 
-(define (decode-string bv charset)
-  (if (string-ci=? charset "utf-8")
+(define (decode-string bv encoding)
+  (if (string-ci=? encoding "utf-8")
       (utf8->string bv)
       (let ((p (open-bytevector-input-port bv)))
-        (set-port-encoding! p charset)
+        (set-port-encoding! p encoding)
         (read-delimited "" p))))
 
 
@@ -266,8 +266,8 @@ printed."
 (define hex-chars
   (string->char-set "0123456789abcdefABCDEF"))
 
-(define* (uri-decode str #:key (charset "utf-8"))
-  "Percent-decode the given @var{str}, according to @var{charset}.
+(define* (uri-decode str #:key (encoding "utf-8"))
+  "Percent-decode the given @var{str}, according to @var{encoding}.
 
 Note that this function should not generally be applied to a full URI
 string. For paths, use split-and-decode-uri-path instead. For query
@@ -278,14 +278,14 @@ Note that percent-encoded strings encode @emph{bytes}, not characters.
 There is no guarantee that a given byte sequence is a valid string
 encoding. Therefore this routine may signal an error if the decoded
 bytes are not valid for the given encoding. Pass @code{#f} for
-@var{charset} if you want decoded bytes as a bytevector directly."
+@var{encoding} if you want decoded bytes as a bytevector directly."
   (let ((len (string-length str)))
     (call-with-values open-bytevector-output-port
       (lambda (port get-bytevector)
         (let lp ((i 0))
           (if (= i len)
-              (if charset
-                  (decode-string (get-bytevector) charset)
+              (if encoding
+                  (decode-string (get-bytevector) encoding)
                   (get-bytevector)) ; raw bytevector
               (let ((ch (string-ref str i)))
                 (cond
@@ -328,29 +328,31 @@ bytes are not valid for the given encoding. Pass @code{#f} for
 ;; Return a new string made from uri-encoding @var{str}, unconditionally
 ;; transforming any characters not in @var{unescaped-chars}.
 ;;
-(define* (uri-encode str #:key (charset "utf-8")
+(define* (uri-encode str #:key (encoding "utf-8")
                      (unescaped-chars unreserved-chars))
-  "Percent-encode any character not in @var{unescaped-chars}.
+  "Percent-encode any character not in the character set, @var{unescaped-chars}.
 
 Percent-encoding first writes out the given character to a bytevector
-within the given @var{charset}, then encodes each byte as
+within the given @var{encoding}, then encodes each byte as
 @code{%@var{HH}}, where @var{HH} is the hexadecimal representation of
 the byte."
-  (call-with-output-string
-   (lambda (port)
-     (string-for-each
-      (lambda (ch)
-        (if (char-set-contains? unescaped-chars ch)
-            (display ch port)
-            (let* ((bv (encode-string (string ch) charset))
-                   (len (bytevector-length bv)))
-              (let lp ((i 0))
-                (if (< i len)
-                    (let ((byte (bytevector-u8-ref bv i)))
-                      (display #\% port)
-                      (display (number->string byte 16) port)
-                      (lp (1+ i))))))))
-      str))))
+  (if (string-index str unescaped-chars)
+      (call-with-output-string
+       (lambda (port)
+         (string-for-each
+          (lambda (ch)
+            (if (char-set-contains? unescaped-chars ch)
+                (display ch port)
+                (let* ((bv (encode-string (string ch) encoding))
+                       (len (bytevector-length bv)))
+                  (let lp ((i 0))
+                    (if (< i len)
+                        (let ((byte (bytevector-u8-ref bv i)))
+                          (display #\% port)
+                          (display (number->string byte 16) port)
+                          (lp (1+ i))))))))
+          str)))
+      str))
 
 (define (split-and-decode-uri-path path)
   "Split @var{path} into its components, and decode each
diff --git a/test-suite/tests/web-uri.test b/test-suite/tests/web-uri.test
index 832bccf..534380a 100644
--- a/test-suite/tests/web-uri.test
+++ b/test-suite/tests/web-uri.test
@@ -107,25 +107,25 @@
                          (build-uri 'http #:userinfo "foo")))
 
 
-(with-test-prefix "parse-uri"
+(with-test-prefix "string->uri"
   (pass-if "ftp:"
-    (uri=? (parse-uri "ftp:")
+    (uri=? (string->uri "ftp:")
            #:scheme 'ftp
            #:path ""))
   
   (pass-if "ftp:foo"
-    (uri=? (parse-uri "ftp:foo")
+    (uri=? (string->uri "ftp:foo")
            #:scheme 'ftp
            #:path "foo"))
   
   (pass-if "ftp://foo/bar"
-    (uri=? (parse-uri "ftp://foo/bar")
+    (uri=? (string->uri "ftp://foo/bar")
            #:scheme 'ftp
            #:host "foo"
            #:path "/bar"))
   
   (pass-if "ftp://foo@bar:22/baz"
-    (uri=? (parse-uri "ftp://foo@bar:22/baz")
+    (uri=? (string->uri "ftp://foo@bar:22/baz")
            #:scheme 'ftp
            #:userinfo "foo"
            #:host "bar"
@@ -133,49 +133,49 @@
            #:path "/baz"))
 
   (pass-if "http://bad.host.1"
-    (not (parse-uri "http://bad.host.1")))
+    (not (string->uri "http://bad.host.1")))
 
   (pass-if "http://foo:"
-    (uri=? (parse-uri "http://foo:")
+    (uri=? (string->uri "http://foo:")
            #:scheme 'http #:host "foo" #:path ""))
 
   (pass-if "http://foo:/"
-    (uri=? (parse-uri "http://foo:/")
+    (uri=? (string->uri "http://foo:/")
            #:scheme 'http #:host "foo" #:path "/"))
 
   (pass-if "http://foo:not-a-port"
-    (not (parse-uri "http://foo:not-a-port")))
+    (not (string->uri "http://foo:not-a-port")))
   
   (pass-if "http://:10"
-    (not (parse-uri "http://:10")))
+    (not (string->uri "http://:10")))
 
   (pass-if "http://foo@"
-    (not (parse-uri "http://foo@"))))
+    (not (string->uri "http://foo@"))))
 
-(with-test-prefix "unparse-uri"
+(with-test-prefix "uri->string"
   (pass-if "ftp:"
     (equal? "ftp:"
-            (unparse-uri (parse-uri "ftp:"))))
+            (uri->string (string->uri "ftp:"))))
   
   (pass-if "ftp:foo"
     (equal? "ftp:foo"
-            (unparse-uri (parse-uri "ftp:foo"))))
+            (uri->string (string->uri "ftp:foo"))))
   
   (pass-if "ftp://foo/bar"
     (equal? "ftp://foo/bar"
-            (unparse-uri (parse-uri "ftp://foo/bar"))))
+            (uri->string (string->uri "ftp://foo/bar"))))
   
   (pass-if "ftp://foo@bar:22/baz"
     (equal? "ftp://foo@bar:22/baz"
-            (unparse-uri (parse-uri "ftp://foo@bar:22/baz"))))
+            (uri->string (string->uri "ftp://foo@bar:22/baz"))))
   
   (pass-if "http://foo:"
     (equal? "http://foo"
-            (unparse-uri (parse-uri "http://foo:"))))
+            (uri->string (string->uri "http://foo:"))))
   
   (pass-if "http://foo:/"
     (equal? "http://foo/"
-            (unparse-uri (parse-uri "http://foo:/")))))
+            (uri->string (string->uri "http://foo:/")))))
 
 (with-test-prefix "decode"
   (pass-if (equal? "foo bar" (uri-decode "foo%20bar"))))
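In the uri.scm hunks, uri-encode gains a fast path that returns str unchanged when nothing needs escaping, and the test suite checks that (uri-decode "foo%20bar") yields "foo bar". Note that, as written, (string-index str unescaped-chars) returns the index of the first character that is in unescaped-chars, so a string made only of characters that need escaping (for example " ") would take the fast path and come back unescaped; the test presumably wants a character outside the set. Also, (number->string byte 16) yields a single digit for bytes below 16. The C sketch below is a byte-level analogue of encoding and decoding, not Guile's code: the function names are hypothetical, the unreserved set is taken from RFC 3986, input is assumed to be in the target encoding already (such as UTF-8), and rejecting malformed escapes in the decoder is a choice made here. The decoder returns raw bytes, roughly what uri-decode returns when #f is passed for the encoding.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* RFC 3986 unreserved characters: ALPHA / DIGIT / "-" / "." / "_" / "~". */
static int
is_unreserved (unsigned char c)
{
  return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
         || (c >= '0' && c <= '9')
         || c == '-' || c == '.' || c == '_' || c == '~';
}

/* Percent-encode the bytes of S.  Returns a malloc'd string or NULL.  */
static char *
uri_encode (const char *s)
{
  static const char hex[] = "0123456789ABCDEF";
  size_t len = strlen (s), i, o = 0;
  char *out;

  /* Fast path: look for a byte that is NOT unreserved.  */
  for (i = 0; i < len && is_unreserved ((unsigned char) s[i]); i++)
    ;
  if (i == len)                 /* Nothing to escape: plain copy. */
    {
      out = malloc (len + 1);
      return out ? memcpy (out, s, len + 1) : NULL;
    }

  out = malloc (3 * len + 1);   /* Worst case: every byte escaped. */
  if (out == NULL)
    return NULL;
  for (i = 0; i < len; i++)
    {
      unsigned char c = (unsigned char) s[i];
      if (is_unreserved (c))
        out[o++] = (char) c;
      else
        {
          out[o++] = '%';              /* Always two hex digits, */
          out[o++] = hex[c >> 4];      /* so 0x0A is "%0A", not "%A". */
          out[o++] = hex[c & 0x0F];
        }
    }
  out[o] = '\0';
  return out;
}

static int
hex_value (unsigned char c)
{
  if (c >= '0' && c <= '9')
    return c - '0';
  if (c >= 'a' && c <= 'f')
    return c - 'a' + 10;
  if (c >= 'A' && c <= 'F')
    return c - 'A' + 10;
  return -1;
}

/* Percent-decode S into the raw bytes OUT (room for strlen (S) + 1).
   Returns the byte count, or -1 for a truncated or non-hex escape.  */
static long
uri_decode (const char *s, unsigned char *out)
{
  size_t len = strlen (s), i = 0, o = 0;
  int hi, lo;

  while (i < len)
    {
      if (s[i] != '%')
        {
          out[o++] = (unsigned char) s[i++];
          continue;
        }
      if (i + 2 >= len
          || (hi = hex_value ((unsigned char) s[i + 1])) < 0
          || (lo = hex_value ((unsigned char) s[i + 2])) < 0)
        return -1;
      out[o++] = (unsigned char) (hi * 16 + lo);
      i += 3;
    }
  out[o] = '\0';
  return (long) o;
}
```

In the C version the fast path returns a fresh copy rather than the input itself, so that callers can always free the result; uppercase hex digits follow the RFC 3986 recommendation for percent-encodings.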


hooks/post-receive
-- 
GNU Guile


