emacs-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [ELPA] New package: dired-duplicates


From: Philip Kaludercic
Subject: Re: [ELPA] New package: dired-duplicates
Date: Wed, 01 Nov 2023 21:29:29 +0000

Harald Judt <h.judt@gmx.at> writes:

> Hi,
>
> On 2023-11-01 12:32, Philip Kaludercic wrote:
>
>> You forgot to CC me in the response, I would have almost missed your 
>> response.
>
> I am very sorry for this.

No worries, happens to everyone :)

>> Harald Judt <h.judt@gmx.at> writes:
>> 
>>> On 2023-10-31 13:21, Philip Kaludercic wrote:
>>>
>>> [...]
>>>
>>>> @@ -225,7 +220,7 @@ This is the same as `dired-do-flagged-delete', but 
>>>> calls
>>>>      (defvar dired-duplicates-map
>>>>      (let ((map (make-sparse-keymap)))
>>>> -    ;; workaround for Emacs bug #57565
>>>> +    ;; workaround for Emacs bug#57565
>>>>        (when (< emacs-major-version 29)
>>>>          (define-key map (kbd "x") 'dired-duplicates--do-flagged-delete)
>>>>          (define-key map (kbd "D") 'dired-duplicates--do-delete))
>>>
>>> BTW: What's the reasoning behind this suggested change?
>> I thought that this would be necessary for bug-reference-mode to
>> work,
>> but apparently it can detect both of these references.
>
> Yes, whether space or not, it doesn't matter, at least in
> Emacs-29.1. But that aside, bug reference mode wants to take me to a
> page on my project (https://codeberg.org/bug/issues/57565) which does
> not exist because it is an *Emacs* bug that is referenced in the
> comment, not a bug in dired-duplicates.

I am afraid I don't understand.  Did you modify `bug-reference-url-format'?

>> Eli Zaretskii <eliz@gnu.org> writes:
>> 
>>>> From: Philip Kaludercic <philipk@posteo.net>
>>>> Cc: emacs-devel@gnu.org
>>>> Date: Tue, 31 Oct 2023 12:21:51 +0000
>>>>
>>>> In the meantime, here is some quick and superficial feedback on the code:
>>>
>>> What I would like to ask is whether Harald tried to calculate SHA256
>>> using Emacs's own primitives, instead of relying on an external
>>> program, which may or may not be installed.
>> That is a good point, but I can imagine that one reason for using an
>> external tool is that if the user has a specific program they wish to
>> use that is not available as an Emacs primitive?
>
> I have not tried to calculate SHA256 using Emacs's own primitives, but
> not because I did not want to. The reasoning behind my decision is the
> following:
>
> dired-duplicates does support local files and *remote* files - via
> TRAMP. For local files using Emacs's own primitives could be OK
> performance-wise, I have not tested this. But if one starts to use
> them for fetching files via TRAMP which means over network, then
> things could slow down to a crawl pretty fast (it even takes quite a
> while collecting size stats over the network). So it is not Emacs's
> SHA256 calculation that would be a problem here, but instead slow
> transport over network. If the host does the calculation, this should
> be a lot faster (though that of course depends on whether the files
> are local files on the host of course and not some mounted network
> resource). I think it is reasonable to assume that one would *not*
> want to download large video files over the network to check if they
> are duplicates.
>
> The second thing is customizability of course, and probably necessary
> because of my reasoning explained in the previous paragraph. The
> remote host might have md5sum installed, but not sha256sum. So I
> wanted to provide a way for the user to configure the checksum
> command.
>
> I do not object against implementing a fallback as Eli suggested; I
> simply have not implemented it because I did not know about Emacs's
> primitives, and at that same time my reasoning about remote files
> started and so that is the way it is now.
>
> Regards,
> Harald

Harald Judt <h.judt@gmx.at> writes:

> Hi,
>
> On 2023-11-01 20:24, Eli Zaretskii wrote:
>>> From: Philip Kaludercic <philipk@posteo.net>
>>> Cc: Visuwesh <visuweshm@gmail.com>,  h.judt@gmx.at,  emacs-devel@gnu.org
>>> Date: Wed, 01 Nov 2023 17:57:40 +0000
>>>
>>> Eli Zaretskii <eliz@gnu.org> writes:
>>>
>>>> I'd like to see the numbers which led to the conclusion that
>>>> performance was prohibitive.
>>>>
>>>> And even if the performance is indeed much worse, it could be a
>>>> fallback in case the program is not available -- which would IMO be
>>>> much better than simply failing to provide the functionality in that
>>>> case.
>>>
>>> This is a cheap test on a 1.4GB ISO I had lying around:
>> Thanks.  But this is just a single large file, not a frequent use
>> case
>> for this package.
>> 
>>> --8<---------------cut here---------------start------------->8---
>>> (benchmark-run 1
>>>    (with-temp-buffer
>>>      (insert-file-contents-literally 
>>> "~/Downloads/haiku-r1beta4-x86_64-anyboot.iso")
>>>      (secure-hash 'sha512 (current-buffer))))
>>> ;; (44.389091035 1 1.5836082630000021)
>>>
>>> (benchmark-run 1
>>>    (with-temp-buffer
>>>      (call-process "sha512sum" nil t nil (expand-file-name 
>>> "~/Downloads/haiku-r1beta4-x86_64-anyboot.iso"))
>>>      (goto-char (point-min))
>>>      (and (looking-at (rx bos (+ alnum)))
>>>      (match-string 0))))
>>> ;; (5.155846791 0 0.0)
>>> --8<---------------cut here---------------end--------------->8---
>> And this is not the package doing its job, this is just a single
>> task
>> the package does when looking for duplicates.  The numbers when
>> running the package with call-process replaced by secure-hash will
>> probably be different.
>
> Thanks for the numbers. What troubles me a bit more than the speed is
> which effects inserting such a large file into a buffer has
> memory-wise?

It appears to depend on the size of the file.  I haven't determined the
exact tipping point where loading a file into memory becomes more
expensive than spawning a process, but it seems to be relatively low.

Another advantage of using a subprocess, is that the files can be
processed in parallel, assuming you start the processes asynchronously. 

> Regards,
> Harald

Harald Judt <h.judt@gmx.at> writes:

> Hi Philip,
>
> Another more organisational question regarding your review: If I apply
> your patch and create commits from it in the project, should I then
> use your name and email address as author of the commit (I guess I can
> assume you also assigned copyright to the FSF ;-))?

You can attribute it to me, but I don't really care much either way.  I
didn't really send you a patch to apply, but just the output of C-x v =
after leaving a few comments in the file.

> Harald





reply via email to

[Prev in Thread] Current Thread [Next in Thread]