emacs-elpa-diffs
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[nongnu] elpa/hyperdrive e103791cbc 20/26: Merge: Fix text encoding issu


From: ELPA Syncer
Subject: [nongnu] elpa/hyperdrive e103791cbc 20/26: Merge: Fix text encoding issues
Date: Fri, 30 Aug 2024 16:00:12 -0400 (EDT)

branch: elpa/hyperdrive
commit e103791cbcc455aece720d9b09b3eee0e62da2f2
Merge: 56c201e497 0702f40645
Author: Joseph Turner <joseph@ushin.org>
Commit: Joseph Turner <joseph@ushin.org>

    Merge: Fix text encoding issues
---
 CHANGELOG.org            |  3 +++
 doc/hyperdrive.org       |  2 ++
 doc/hyperdrive.texi      |  2 ++
 hyperdrive-dir.el        |  2 ++
 hyperdrive-lib.el        | 40 ++++++++++++++++++++++++++++++-------
 hyperdrive.el            | 52 ++++++++++++++++++++++++++++++++----------------
 tests/org links.org      |  4 ++--
 tests/test-hyperdrive.el |  2 +-
 8 files changed, 80 insertions(+), 27 deletions(-)

diff --git a/CHANGELOG.org b/CHANGELOG.org
index b28eb92cd3..4144df07bf 100644
--- a/CHANGELOG.org
+++ b/CHANGELOG.org
@@ -65,6 +65,7 @@ space by "forgetting" your copy of a file 
(~hyperdrive-forget-file~)!
 
 ** Fixed
 
+- Fix text encoding when reading and writing hyperdrive files.
 - Fixed link targets/fragments for non-Org documents, like HTML.
 - More reliably kill intermediate buffers when generating a diff.
 - Avoid unnecessarily updating ~hyperdrive-version-ranges~.
@@ -78,12 +79,14 @@ space by "forgetting" your copy of a file 
(~hyperdrive-forget-file~)!
 - Improved reading of hyperdrive URLs.
 - Improved visible buttons in ~hyperdrive-menu-bar-mode~.
 - Avoid unnecessary prompt to save hyperdrive EWW buffers.
+- More accurately calculate file size when writing a hyperdrive file.
 
 ** Internal
 
 - Refactor to accommodate 
[[https://ushin.org/hyperdrive/hyperdrive-manual.html#Org_002dtransclusion-integration][hyperdrive-org-transclusion]].
 - History-related functions now accept ~hyperdrive-entry~ structs.
 - Update hyperdrive ~latest-version~ metadata when deleting directories.
+- Reduce memory usage in ~hyperdrive-write-buffer~.
 - Added Org 9.7.6 as a dependency.
 - Bumped package dependencies.
 
diff --git a/doc/hyperdrive.org b/doc/hyperdrive.org
index 5c0f92a716..7986be2752 100644
--- a/doc/hyperdrive.org
+++ b/doc/hyperdrive.org
@@ -1323,6 +1323,8 @@ world of p2p as well as the development of 
~hyper-gateway~.
 
 [[https://protesilaos.com][Protesilaos Stavrou]] for design input and 
user-testing ~hyperdrive.el~.
 
+[[https://savannah.gnu.org/users/eliz][Eli Zaretskii]] for guidance about text 
encoding systems.
+
 [[https://emacs.ch/@yantar92][Ihor Radchenko]] for guidance about security and 
Org mode integration.
 
 [[https://karl-voit.at/][Karl Voit]] for his feedback which inspired the 
design of ~hyperdrive-mirror~.
diff --git a/doc/hyperdrive.texi b/doc/hyperdrive.texi
index 546bc64297..a153101b45 100644
--- a/doc/hyperdrive.texi
+++ b/doc/hyperdrive.texi
@@ -1797,6 +1797,8 @@ world of p2p as well as the development of 
@code{hyper-gateway}.
 
 @uref{https://protesilaos.com, Protesilaos Stavrou} for design input and 
user-testing @code{hyperdrive.el}.
 
+@uref{https://savannah.gnu.org/users/eliz, Eli Zaretskii} for guidance about 
text encoding systems.
+
 @uref{https://emacs.ch/@@yantar92, Ihor Radchenko} for guidance about security 
and Org mode integration.
 
 @uref{https://karl-voit.at/, Karl Voit} for his feedback which inspired the 
design of @code{hyperdrive-mirror}.
diff --git a/hyperdrive-dir.el b/hyperdrive-dir.el
index f11458e722..99620803ca 100644
--- a/hyperdrive-dir.el
+++ b/hyperdrive-dir.el
@@ -388,8 +388,10 @@ see Info node `(elisp)Yanking Media'."
     ;; TODO: Extend this to other media types?
     (cl-assert (and h/current-entry
                     (h//entry-directory-p h/current-entry)))
+    (cl-assert (not (multibyte-string-p image)))
     (let ((entry (h/read-entry (h//context-hyperdrive :predicate #'h/writablep)
                                :latest-version t)))
+      (setf (he/size entry) (string-bytes image))
       (he/api 'put entry :body image
         ;; TODO: Pass MIME type in a header? hyper-gateway detects it for us.
         :then (lambda (_res) (h/open entry))
diff --git a/hyperdrive-lib.el b/hyperdrive-lib.el
index caa9a0f7ea..da9d318b29 100644
--- a/hyperdrive-lib.el
+++ b/hyperdrive-lib.el
@@ -64,8 +64,8 @@ Passes ARGS to `format-message'."
   ;; the name, we store the name as-is because, for one thing, the name
   ;; could theoretically contain a slash, and `file-name-nondirectory'
   ;; would return the wrong value in that case.
-  (name nil :documentation "Decoded filename of entry (excluding leading 
slash).")
-  (path nil :documentation "Decoded path (including leading slash).")
+  (name nil :documentation "URL-decoded filename of entry without leading 
slash.")
+  (path nil :documentation "URL-decoded path with leading slash.")
   (headers nil :documentation "HTTP headers from request.")
   (mtime nil :documentation "Last modified time.")
   (size nil :documentation "Size of file.")
@@ -238,7 +238,18 @@ metadata from the response."
 Sets ENTRY's hyperdrive to the persisted version of the drive if
 it exists.  Persists ENTRY's hyperdrive.  Invalidates ENTRY display."
   (pcase-let*
-      (((map link allow content-length content-type last-modified x-drive-size
+      ((encoding
+        ;; TODO: After the resolution of <https://todo.sr.ht/~ushin/ushin/211>,
+        ;; use the encoding specified in the 'Content-Type' header.  For now, 
we
+        ;; rely on the guesswork of `detect-coding-region'.
+        (if-let ((filename-encoding (auto-coding-alist-lookup (he/name 
entry))))
+            filename-encoding
+          (pcase (detect-coding-string (plz-response-body response) t)
+            ((or 'undecided 'undecided-dos 'undecided-mac 'undecided-unix)
+             ;; Default to UTF-8
+             'utf-8)
+            (detected-encoding detected-encoding))))
+       ((map link allow content-length content-type last-modified x-drive-size
              x-drive-version x-file-block-length 
x-file-block-length-downloaded)
         (plz-response-headers response))
        ;; RESPONSE is guaranteed to have a "Link" header with the public key,
@@ -247,6 +258,16 @@ it exists.  Persists ENTRY's hyperdrive.  Invalidates 
ENTRY display."
                           (match-string 1 link)))
        ;; NOTE: Don't destructure `persisted-hyperdrive' with `pcase' here 
since it may be nil.
        (persisted-hyperdrive (gethash public-key h/hyperdrives)))
+    ;; Decode response body.
+    (unless (eq 'no-conversion encoding)
+      (cl-callf decode-coding-string (plz-response-body response) encoding))
+    ;; TODO: Once we can get hyperdrive file contents as a buffer
+    ;; (<https://github.com/alphapapa/plz.el/issues/61>), we should use
+    ;; `decode-coding-region' instead of `decode-coding-string'.
+    ;; `decode-coding-region' will set the buffer-local value of
+    ;; `buffer-file-coding-system' to the correct encoding.  Currently,
+    ;; hyperdrive file buffers always have `buffer-file-coding-system' to the
+    ;; global default, `utf-8' on my machine.
 
     (when persisted-hyperdrive
       ;; ENTRY's hyperdrive already persisted: merge domains into persisted
@@ -281,6 +302,10 @@ it exists.  Persists ENTRY's hyperdrive.  Invalidates 
ENTRY display."
       (setf (he/size entry)
             (ignore-errors (cl-parse-integer content-length))))
     (when content-type
+      ;; FIXME: `content-type' for 'text/plain' always has 'charset=utf-8',
+      ;; which may not be correct.  Since charset in `hyperdrive-entry-type' is
+      ;; not used anywhere, this should not result in any bugs.  This FIXME can
+      ;; be removed when <https://todo.sr.ht/~ushin/ushin/211> is resolved.
       (setf (he/type entry) content-type))
     (when last-modified
       (setf (he/mtime entry) (encode-time (parse-time-string last-modified))))
@@ -1021,11 +1046,10 @@ Call ELSE if request fails."
 
 (cl-defun h/write (entry &key body then else queue)
   "Write BODY to hyperdrive ENTRY's URL.
-THEN and ELSE are passed to `hyperdrive-entry-api', which see."
+BODY should be an encoded string or buffer.  THEN and ELSE are
+passed to `hyperdrive-entry-api', which see."
   (declare (indent defun))
   (he/api 'put entry
-    ;; TODO: plz accepts buffer as a body, we should refactor calls to h/write
-    ;; to pass in a buffer instead of a buffer-string.
     :body body :then then :else else :queue queue))
 
 (cl-defun h//format-entry-url
@@ -1327,7 +1351,9 @@ DEFAULT and INITIAL-INPUT are passed to `read-string' 
as-is."
   (declare (indent defun))
   (let ((entry (he/create :hyperdrive hyperdrive
                           :path "/.well-known/host-meta.json")))
-    (h/write entry :body (json-encode (h/metadata hyperdrive))
+    ;; TODO: Is it okay to always encode the JSON object as UTF-8?
+    (h/write entry :body (encode-coding-string
+                          (json-encode (h/metadata hyperdrive)) 'utf-8)
       :then then)
     hyperdrive))
 
diff --git a/hyperdrive.el b/hyperdrive.el
index f05cf54108..76552b2d45 100644
--- a/hyperdrive.el
+++ b/hyperdrive.el
@@ -490,8 +490,9 @@ overwrite.
 With universal prefix argument \\[universal-argument], overwrite
 without prompting.
 
-This function is for interactive use only; for non-interactive
-use, see `hyperdrive-write'."
+This function is for interactive use only since it calls
+`select-safe-coding-system', which may prompt for input.
+For non-interactive use, see `hyperdrive-write'."
   (interactive (list (h/read-entry (h/read-hyperdrive #'h/writablep)
                                    :default-path
                                    (or (and (buffer-file-name)
@@ -501,10 +502,22 @@ use, see `hyperdrive-write'."
                                             (he/path h/current-entry)))
                                    :latest-version t)
                      current-prefix-arg))
-  (pcase-let (((cl-struct hyperdrive-entry hyperdrive name) entry)
-              (url (he/url entry))
-              (buffer (current-buffer))
-              (buffer-visiting-entry (h//find-buffer-visiting entry)))
+  (pcase-let* (((cl-struct hyperdrive-entry hyperdrive name) entry)
+               (url (he/url entry))
+               (orig-buffer (current-buffer))
+               (coding-system
+                (with-current-buffer orig-buffer
+                  ;; Detect coding in orig buffer so 
`buffer-file-coding-system'
+                  ;; is prioritized.
+                  (select-safe-coding-system (point-min) (point-max))))
+               (encoded-buffer
+                (with-current-buffer
+                    (generate-new-buffer
+                     (format " *hyperdrive-encoded %s*" orig-buffer) t)
+                  (insert-buffer-substring orig-buffer)
+                  (encode-coding-region (point-min) (point-max) coding-system)
+                  (current-buffer)))
+               (buffer-visiting-entry (h//find-buffer-visiting entry)))
     (unless (or overwritep (not (he/at nil entry)))
       (unless (y-or-n-p
               (format "File %s exists; overwrite?" (h//format-entry entry)))
@@ -514,11 +527,10 @@ use, see `hyperdrive-write'."
                                   (h//format-entry entry)))
           (h/user-error "Aborted"))))
     (h/write entry
-      :body (without-restriction
-              (buffer-substring-no-properties (point-min) (point-max)))
+      :body encoded-buffer
       :then (lambda (response)
-              (when (buffer-live-p buffer)
-                (with-current-buffer buffer
+              (when (buffer-live-p orig-buffer)
+                (with-current-buffer orig-buffer
                   (unless h/mode
                     (h//clean-buffer)
                     (when (eq 'unknown (h/safe-p hyperdrive))
@@ -531,15 +543,16 @@ use, see `hyperdrive-write'."
                   (he//fill entry (plz-response-headers response))
                   ;; PUT responses only include ETag and Last-Modified
                   ;; headers, so we need to set other entry metadata manually.
-                  ;; FIXME: For large buffers, `buffer-size' returns a 
different
-                  ;; value than the gateway's Content-Length header.
-                  (setf (he/size entry) (buffer-size))
+                  (setf (he/size entry) (with-current-buffer encoded-buffer
+                                          (bufferpos-to-filepos (point-max))))
                   ;; FIXME: Will entry type ever be anything besides 
text/plain?
                   ;;        /.well-known/host-meta.json ?
-                  (setf (he/type entry) "text/plain; charset=utf-8")
+                  (setf (he/type entry)
+                        (format "text/plain; charset=%s"
+                                (coding-system-base coding-system)))
                   (setq-local h/current-entry entry)
                   (setf buffer-file-name nil)
-                  (unless (eq buffer buffer-visiting-entry)
+                  (unless (eq orig-buffer buffer-visiting-entry)
                     (when (buffer-live-p buffer-visiting-entry)
                       (kill-buffer buffer-visiting-entry))
                     (rename-buffer (h//generate-new-buffer-name entry)))
@@ -550,9 +563,13 @@ use, see `hyperdrive-write'."
                   ;; and lets us avoid making another request for
                   ;; metadata.
                   (set-visited-file-modtime (current-time))))
-              (h/message "Wrote: %S to \"%s\"" name url))
+              (h/message "Wrote: %S to \"%s\"" name url)
+              (when (buffer-live-p encoded-buffer)
+                (kill-buffer encoded-buffer)))
       :else (lambda (plz-error)
-              (h/message "Unable to write: %S: %S" name plz-error)))
+              (h/message "Unable to write: %S: %S" name plz-error)
+              (when (buffer-live-p encoded-buffer)
+                (kill-buffer encoded-buffer))))
     (h/message "Saving to \"%s\"..." url)
     ;; TODO: Reload relevant hyperdrive-dir buffers after writing buffer (if 
ewoc buffers display version, then possibly all ewoc buffers for a given 
hyperdrive should be reloaded)
     ))
@@ -723,6 +740,7 @@ After successful upload, call THEN.  When QUEUE, use it."
                                              ;; "%a, %-d %b %Y %T %Z"
                                              (file-attribute-modification-time
                                               (file-attributes filename)) t))))
+    (setf (he/size entry) (file-attribute-size (file-attributes filename)))
     (he/api 'put entry :queue queue
       :body `(file ,filename)
       :headers `(("Last-Modified" . ,last-modified))
diff --git a/tests/org links.org b/tests/org links.org
index ce606a60e9..10d8bddb41 100644
--- a/tests/org links.org       
+++ b/tests/org links.org       
@@ -9,7 +9,7 @@
 
 + *Notes*
   + This file should be loaded from a hyperdrive in order to verify that these 
links work correctly for an Org file loaded from a hyperdrive.
-  + To decode these links with Org, use ~(org-element-context)~ with point on 
the link.  The resulting element/plist is used in function 
~hyperdrive--org-open-at-point~.
+  + To URL-decode these links with Org, use ~(org-element-context)~ with point 
on the link.  The resulting element/plist is used in function 
~hyperdrive--org-open-at-point~.
 
 ** Link types
 
@@ -54,7 +54,7 @@ Here are various link types which we want to test.
 
 + *Notes:*
 
-  + Spaces and such are required to NOT be encoded :: Spaces in bracketed, 
Org-style links must /not/ have spaces, etc. URL-encoded.
+  + Spaces and such are required to NOT be URL-encoded :: Spaces in bracketed, 
Org-style links must /not/ have spaces, etc. URL-encoded.
     + We do this so that existing repositories of Org files can be mirrored to 
a hyperdrive as-is and the relative links between them will still work.
 
 **** Links without targets
diff --git a/tests/test-hyperdrive.el b/tests/test-hyperdrive.el
index 77e1d96c79..c055683f12 100644
--- a/tests/test-hyperdrive.el
+++ b/tests/test-hyperdrive.el
@@ -25,7 +25,7 @@
 ;; TODO: Consider having a "round-trip" test that makes a new
 ;; hyperdrive, writes files of various names, then reads the
 ;; hyperdrive back and ensures that the entries names and paths are
-;; correctly encoded/decoded.
+;; correctly URL-encoded/URL-decoded.
 
 ;;; Code:
 



reply via email to

[Prev in Thread] Current Thread [Next in Thread]