emacs-devel

Re: Some hard numbers on licenses used by elisp packages


From: Jonas Bernoulli
Subject: Re: Some hard numbers on licenses used by elisp packages
Date: Wed, 12 Jul 2017 14:49:52 +0200
User-agent: mu4e 0.9.19; emacs 25.2.1

Richard has asked me privately (by accident, I suspect) for some
clarifications.  Many of his questions were already addressed by the
page I linked to, and most of the others were answered by the code
that page in turn links to.

I have now improved the introductory text on the linked page and I am
including that text here for your convenience:

> This page contains statistics about the licenses used by known Emacs
> packages.  *These statistics are not legal advice.  They are
> distributed in the hope that they will be useful, but WITHOUT ANY
> WARRANTY; without even the implied warranty of MERCHANTABILITY or
> FITNESS FOR A PARTICULAR PURPOSE.*
>
> The information used here is available from the Emacsmirror database
> (also known as the Epkg database).  For more information about the
> Emacsmirror see these
> [[https://emacsair.me/2016/04/16/re-introducing-the-emacsmirror][blog]]
> [[https://emacsair.me/2016/05/17/assimilate-emacs-packages-as-git-submodules][posts]].
>
> I have created this page to accompany
> [[http://lists.gnu.org/archive/html/emacs-devel/2017-07/msg00341.html][this]]
> conversation on ~emacs-devel~.
>
> I will periodically update these statistics.  If you want to do so
> yourself, then read the relevant documentation.  You may also ask me
> for guidance.
>
> This information is extracted using the function ~elx-license~, which is
> provided by my package [[https://github.com/tarsius/elx][elx]]
> (~git clone https://github.com/tarsius/elx.git~).
>
> The license is determined from the contents of the "main library" of
> the package alone (the library whose name matches the name of the
> package).  First this function looks for a permission statement for a
> license published by the Free Software Foundation, if any.  If that
> fails, then the value of the "License" header keyword is considered.
> Finally it searches for brief, and potentially ambiguous, permission
> statements for non-FSF licenses.  For FSF licenses a "+" is appended
> if the text "or (at your option) any later version", or similar, is
> found.  An effort is made to normalize the returned value.  This
> function also accounts for some commonly used variations in wording,
> typos, and other complications.
>
> However, the returned value is sometimes wrong or ambiguous.  In
> particular note that if a license is "unknown", then that merely means
> that it is /not known/ what license applies.  This may be because the
> library lacks a permission statement altogether (possibly because an
> accompanying ~LICENSE~ file is considered sufficient by the upstream),
> but it may also be because ~elx-license~ does not attempt to detect the
> non-standard and/or non-FSF permission statement that is used, or
> because of typos in the statement, or for a number of other reasons.
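
One of the "complications" mentioned above is that a permission statement
is usually wrapped across several comment lines, so searching for the
sentence literally would fail.  Here is a minimal sketch of how that can be
handled, using a deliberately shortened pattern.  The names
`my-statement-regexp' and `my-statement-version' are made up for this
illustration and are not part of elx:

```elisp
;; Hypothetical, simplified illustration; not the regexp elx uses.
;; Every space in the template is replaced by a character class that
;; also matches tabs, newlines and the ";" comment starters, so the
;; statement may be wrapped across comment lines.
(defconst my-statement-regexp
  (replace-regexp-in-string
   " " "[ \t\n;]+"
   "GNU General Public License[.,:;]? version \\([0-9.]*[0-9]\\)"
   t t))

(defun my-statement-version (text)
  "Return the GPL version mentioned in TEXT, or nil."
  (let ((case-fold-search t))
    (and (string-match my-statement-regexp text)
         (match-string 1 text))))

(my-statement-version
 ";; under the terms of the GNU General\n;; Public License, version 3")
;; => "3"
```

The newline, the two semicolons and the space between "General" and
"Public" are all consumed by a single expanded space.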

I have also improved the code used to extract this information and made
a new `elx' release.  This is the relevant code, including doc-strings:

> (defconst elx-gnu-permission-statement-regexp
>   (replace-regexp-in-string
>    "\s" "[\s\t\n;]+"
>    ;; is free software[.,:;]? \
>    ;; you can redistribute it and/or modify it under the terms of the \
>    "\
> GNU \\(?1:Lesser \\|Library \\|Affero \\|Free \\)?\
> General Public Licen[sc]e[.,:;]? \
> \\(?:as published by the \\(?:Free Software Foundation\\|FSF\\)[.,:;]? \\)?\
> \\(?:either \\)?\
> \\(?:GPL \\)?\
> version \\(?2:[0-9.]*[0-9]\\)[.,:;]?\
> \\(?: of the Licen[sc]e[.,:;]?\\)?\
> \\(?3: or \\(?:(at your option) \\)?any later version\\)?"))
>
> (defconst elx-gnu-license-keyword-regexp "\
> \\(?:GNU \\(?1:Lesser \\|Library \\|Affero \\|Free \\)?General Public Licen[sc]e\
> \\|\\(?4:[laf]?gpl\\)[- ]?\
> \\)\
> \\(?:\\(?:v\\|version \\)?\\(?2:[0-9.]*[0-9]\\)\\)?\
> \\(?3: or \\(?:(at your option) \\)?\\(?:any \\)?later\\(?: version\\)?\\)?")
>
> (defconst elx-non-gnu-license-keyword-alist
>   '(("Apache-2.0"    .  "apache-2\\.0")
>     ("MIT"           .  "mit")
>     ("as-is"         .  "as-?is")
>     ("public-domain" . "public[- ]domain")))
>
> (defconst elx-non-gnu-license-keyword-regexp "\
> \\`\\(?4:[a-z]+\\)\\(?:\\(?:v\\|version \\)?\\(?2:[0-9.]*[0-9]\\)\\)?\\'")
>
> (defconst elx-non-gnu-permission-statement-alist
>   `(("Apache-2.0"    . "^;.* Apache License, Version 2\\.0")
>     ("MIT"           . "^;.* mit license")
>     ("public-domain" . "^;.*in\\(to\\)? the public[- ]domain")
>     ("public-domain" . "^;+ +Public domain\\.")
>     ("as-is"         . "^;.* \\(provided\\|distributed\\) \
> \\(by the author \\)?[\"`']\\{0,2\\}as[- ]is[\"`']\\{0,2\\}")))
>
> (defun elx-license (&optional file)
>   "Attempt to return the license used for the file FILE.
> Or the license used for the file that is being visited in the
> current buffer if FILE is nil.
>
> *** A value is returned in the hope that it will be useful, but
> *** WITHOUT ANY WARRANTY; without even the implied warranty of
> *** MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
>
> This function completely ignores any \"LICENSE\" or similar file
> in the proximity of FILE.  The returned value is solely based on
> the contents of FILE itself.
>
> The license is determined from the permission statement, if any.
> Otherwise the value of the \"License\" header keyword is
> considered.  An effort is made to normalize the returned value.
>
> *** However this function does not always return the correct
> *** value and the returned value is not legal advice.
>
> Note in particular that if this function returns nil, then that
> merely means that it is not known what license applies.
> This may be because the library lacks a permission statement
> altogether (possibly because an accompanying \"LICENSE\" file
> is considered sufficient by the upstream), but it may also be
> because this function does not attempt to detect the used
> non-standard and/or non-FSF permission statement, or because
> of typos in the statement, or for a number of other reasons."
>   (lm-with-file file
>     (cl-flet ((format-gnu-abbrev
>                (&optional object)
>                (let ((abbrev  (match-string 1 object))
>                      (version (match-string 2 object))
>                      (later   (match-string 3 object))
>                      (prefix  (match-string 4 object)))
>                  (concat (if prefix
>                              (upcase prefix)
>                            (pcase abbrev
>                              ("Lesser "  "LGPL")
>                              ("Library " "LGPL")
>                              ("Affero "  "AGPL")
>                              ("Free "    "FDL")
>                              (`nil       "GPL")))
>                          (and version (concat "-" version))
>                          (and later "+")))))
>       (let ((bound (lm-code-start))
>             (case-fold-search t))
>         (or (and (re-search-forward elx-gnu-permission-statement-regexp bound t)
>                  (format-gnu-abbrev))
>             (-when-let (license (lm-header "Licen[sc]e"))
>               (or (and (string-match elx-gnu-license-keyword-regexp license)
>                        (format-gnu-abbrev license))
>                   (car (cl-find-if (pcase-lambda (`(,_ . ,re))
>                                      (string-match re license))
>                                    elx-non-gnu-license-keyword-alist))
>                   (and (string-match elx-non-gnu-license-keyword-regexp license)
>                        (format-gnu-abbrev license))))
>             (and (re-search-forward
>                   "^;\\{1,4\\} Licensed under the same terms as Emacs" bound t)
>                  "GPL-3+")
>             (and ;; Some libraries are released "under the *GPL and
>                  ;; <other license>", while the GPL is mentioned in
>                  ;; a way the above code does not recognize.  Return
>                  ;; nil instead of "<other license>" in such cases.
>                  (not (re-search-forward elx-gnu-license-keyword-regexp bound t))
>                  (car (cl-find-if (pcase-lambda (`(,_ . ,re))
>                                     (re-search-forward re bound t))
>                                   elx-non-gnu-permission-statement-alist))))))))

Note that this function now returns e.g. "GPL-3+" if the "or (at your
option) any later version" pattern was detected.  I also made some other
changes to avoid false positives, at the cost of no longer matching
some patterns that were previously matched correctly.

I can provide lists of packages that fall into a particular "category".
These lists can contain the names and email addresses of the maintainers,
links to the homepage and repository and many other things you might
find useful.

I would also be willing to contribute this code to the `lisp-mnt.el'
library, which is part of Emacs.  It certainly could still be improved
a lot, but it is a start.

Oh, and I almost forgot - here is an updated table:

| License       | Count | Percent |
|---------------+-------+---------|
| GPL-3+        |  2230 |      61 |
| GPL-2+        |   611 |      17 |
| (unknown)     |   511 |      14 |
| as-is         |    91 |       2 |
| MIT           |    70 |       2 |
| public-domain |    52 |       1 |
| GPL-3         |    41 |       1 |
| GPL-2         |    31 |       1 |
| Apache-2.0    |    18 |       0 |
| GPL-1+        |     4 |       0 |
| BSD           |     3 |       0 |
| GPL           |     2 |       0 |
| LGPL          |     2 |       0 |
| AGPL-3        |     1 |       0 |
| AGPL-3+       |     1 |       0 |
| BSD-3         |     1 |       0 |
| EPL           |     1 |       0 |
| LGPL-3+       |     1 |       0 |
| LGPL-3.0      |     1 |       0 |
|---------------+-------+---------|
| total GNU     |  2925 |      80 |
|---------------+-------+---------|
| total         |  3672 |     100 |
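
For anyone who wants to check the arithmetic, here is a small sketch.  It
assumes that the "total GNU" row is the sum of all GPL, LGPL and AGPL rows
and that percentages are rounded to the nearest integer (which would
explain why rows below half a percent show 0).  Both assumptions reproduce
the numbers in the table; the names are made up for this example:

```elisp
;; Counts copied from the table above; the names are hypothetical.
(defconst my-gnu-counts
  '(("GPL-3+" . 2230) ("GPL-2+" . 611) ("GPL-3" . 41) ("GPL-2" . 31)
    ("GPL-1+" . 4) ("GPL" . 2) ("LGPL" . 2) ("AGPL-3" . 1)
    ("AGPL-3+" . 1) ("LGPL-3+" . 1) ("LGPL-3.0" . 1)))

(defun my-percent (count total)
  "Return COUNT as a percentage of TOTAL, rounded to the nearest integer."
  (round (/ (* 100.0 count) total)))

(let ((gnu (apply #'+ (mapcar #'cdr my-gnu-counts))))
  (list gnu (my-percent gnu 3672) (my-percent 2230 3672)))
;; => (2925 80 61)
```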

And to briefly answer the posted questions:

>   > | (unknown)     |   509 |      14 |
>
> Could you explain what "unknown" means?  If a program
> does not explicitly state a license, it is proprietary.

Either no license was specified, or the code was unable to find a
permission statement that is actually present.

>   > | as-is         |   117 |       3 |
>
> Could you tell me what "as-is" means, here?  Is "as-is" meant to
> identify a specific license?  If so, could you please show it to me?  I
> need to determine whether it is a free license and GPL-compatible.

Essentially the string "as-is" was found in the header.  I do agree
that this is ambiguous and problematic, but I decided to provide
this information anyway, because it is at least less ambiguous than
"unknown".

>   > | MIT           |    45 |       1 |
>
> "MIT" as the name of a license is ambiguous; see

Merely reporting that the string "MIT license" was found.
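
To make concrete what "found" means for these two categories, here is a
rough sketch that applies two of the patterns from
`elx-non-gnu-permission-statement-alist' (copied from the code quoted
above) to single comment lines.  The names `my-statement-alist' and
`my-statement-label' are made up for this example, and the real function
searches the whole header rather than one line:

```elisp
(require 'cl-lib)

;; Two entries copied from `elx-non-gnu-permission-statement-alist'
;; as quoted earlier in this message; the names here are hypothetical.
(defconst my-statement-alist
  '(("MIT"   . "^;.* mit license")
    ("as-is" . "^;.* \\(provided\\|distributed\\) \
\\(by the author \\)?[\"`']\\{0,2\\}as[- ]is[\"`']\\{0,2\\}")))

(defun my-statement-label (line)
  "Return the label whose pattern matches LINE, or nil."
  (let ((case-fold-search t))
    (car (cl-find-if (lambda (entry) (string-match-p (cdr entry) line))
                     my-statement-alist))))

(my-statement-label ";; Released under the MIT license.")
;; => "MIT"
(my-statement-label ";; This is provided \"as is\", no warranty.")
;; => "as-is"
(my-statement-label ";; Use it as-is or modify it.")
;; => nil
```

The last call returns nil because the "as-is" pattern only matches after
"provided" or "distributed".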

>   > | GPL           |    29 |       1 |
>
> What does that mean, concretely?
> Do these packages say, "any version of the GNU GPL"?
> That would be peculiar but not a substantive problem.
>
>   > | GPL-1         |     4 |       0 |
>
> Do these packages carry "GPL version 1 only"
> or "GPL version 1 or later"?

This has been improved now:

* "GPL"     => the GPL was mentioned, no version was mentioned
               (or possibly was just not detected)
* "GPL-N"   => the GPL and version N were mentioned
* "GPL-N+"  => ... additionally "or (at your option) any later version"
               was found (or a variation thereof).
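
These three forms can be sketched with a cut-down version of the keyword
matching.  This is only an illustration under simplifying assumptions: it
handles just the abbreviated "GPL"/"LGPL"/"AGPL" spellings, and
`my-gpl-regexp' and `my-gpl-abbrev' are hypothetical names, not part of
elx:

```elisp
;; Hypothetical sketch; a simplified stand-in for the real regexps.
;; Group 1 is the abbreviation, group 2 the version, and group 3 the
;; "or later" clause.
(defconst my-gpl-regexp
  (concat "\\(?1:[laf]?gpl\\)[- ]?"
          "\\(?:\\(?:v\\|version \\)?\\(?2:[0-9.]*[0-9]\\)\\)?"
          "\\(?3: or \\(?:(at your option) \\)?"
          "\\(?:any \\)?later\\(?: version\\)?\\)?"))

(defun my-gpl-abbrev (string)
  "Return a normalized abbreviation for the GPL variant named in STRING.
Return nil if STRING does not mention one."
  (let ((case-fold-search t))
    (when (string-match my-gpl-regexp string)
      (let ((name    (match-string 1 string))
            (version (match-string 2 string))
            (later   (match-string 3 string)))
        (concat (upcase name)
                (and version (concat "-" version))
                (and later "+"))))))

(my-gpl-abbrev "GPL")                    ; => "GPL"
(my-gpl-abbrev "GPLv3")                  ; => "GPL-3"
(my-gpl-abbrev "GPL version 2 or later") ; => "GPL-2+"
```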

>   > | EPL           |     1 |       0 |
>
> Does that mean the Eclipse Public License?

My guess is as good as yours; the string ";; License: EPL" was found.

  Best regards,
  Jonas


