From: Jelle Licht
Subject: Re: License auditing
Date: Wed, 3 Aug 2016 20:00:38 +0200
On Wed, 3 Aug 2016 18:28:38 +0200
David Craven <address@hidden> wrote:
> How can I tell the difference between a lgpl2.1 and lgpl2.1+ license?
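Look for "or later": the 2.1+ notice reads "either version 2.1 of the License, or (at your option) any later version", while a plain 2.1 notice lacks that clause.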
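A minimal Python sketch of that check, assuming the notice has already been identified as LGPL 2.1 and uses the usual FSF wording. The helper name `lgpl21_variant` and its labels are hypothetical. It only searches for "any later version" or "or later", so a file that mentions some other license "or later" elsewhere would be misclassified.

```python
import re

# Hypothetical helper: given a notice already known to be LGPL 2.1,
# return "lgpl2.1+" if it grants "any later version"/"or later", else "lgpl2.1".
# Leading comment markers ("#", ";", "*", "//") are stripped and whitespace is
# collapsed, so notices wrapped across comment lines still match.
_COMMENT = re.compile(r"^[ \t]*(?:#|;|\*|//)*", re.MULTILINE)
_OR_LATER = re.compile(r"\bany later version\b|\bor later\b")


def lgpl21_variant(notice: str) -> str:
    text = " ".join(_COMMENT.sub("", notice).split()).lower()
    return "lgpl2.1+" if _OR_LATER.search(text) else "lgpl2.1"
```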
> Is this a job that an automated tool could do? Detecting licenses
> included in a tarball?
I also wonder about that. Usually the license text is just copied and pasted anyway, so it should be quite regular.
If no such tool exists, I could write one that would, for each source file:
- try to find an SPDX identifier; if there is none:
- ignore newlines and any leading "#", ";", "*" or "//" on each line
- lex the rest into words, where a "word" is either [a-zA-Z0-9-]+ or [.,;]
- try to match the result 1:1 against all the license texts, tokenized the same way
- if that fails, look for signal words, guess the license, and print a short summary of the differences.
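The per-file procedure in the list above could look roughly like this Python sketch. Everything beyond the list is an assumption for illustration: the names `tokenize` and `detect`, lowercasing the words, reading "1:1 match" as "the whole license text appears as a contiguous run of tokens in the file", and using "the reference sharing the most distinct words" as a crude stand-in for real signal words. `references` is meant to map license names to the official texts (e.g. taken from the SPDX license list).

```python
import difflib
import re

# Sketch of the per-file steps; names and the fallback heuristic are assumptions.

# "SPDX-License-Identifier: <expr>", dropping a trailing "*/" if present.
SPDX_RE = re.compile(r"SPDX-License-Identifier:\s*(.+?)\s*(?:\*/)?\s*$", re.MULTILINE)
# Leading comment markers, possibly repeated (";;", "##", "//").
COMMENT_RE = re.compile(r"^[ \t]*(?:#|;|\*|//)*", re.MULTILINE)
# A "word" is [a-zA-Z0-9-]+ or one of . , ;  (everything else is skipped).
WORD_RE = re.compile(r"[a-zA-Z0-9-]+|[.,;]")


def tokenize(text: str) -> list[str]:
    """Strip leading comment markers, then lex into lowercased words."""
    stripped = COMMENT_RE.sub("", text)
    return [w.lower() for w in WORD_RE.findall(stripped)]


def detect(source: str, references: dict[str, str]) -> tuple[str, str]:
    """Return (method, result) for one source file."""
    # 1. An explicit SPDX tag wins.
    m = SPDX_RE.search(source)
    if m:
        return "spdx", m.group(1)

    file_tokens = tokenize(source)
    padded_file = " " + " ".join(file_tokens) + " "
    ref_tokens = {name: tokenize(text) for name, text in references.items()}
    if not ref_tokens:
        return "unknown", ""

    # 2. 1:1 match: the whole reference text occurs as a contiguous token run
    #    (spaces around both sides prevent matching partial words).
    for name, toks in ref_tokens.items():
        if " " + " ".join(toks) + " " in padded_file:
            return "exact", name

    # 3. Fallback: guess the reference sharing the most distinct words,
    #    then summarize where its text was changed or dropped.
    file_vocab = set(file_tokens)
    guess = max(ref_tokens, key=lambda n: len(set(ref_tokens[n]) & file_vocab))
    ref = ref_tokens[guess]
    sm = difflib.SequenceMatcher(None, ref, file_tokens, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        # Pure insertions are skipped: in a source file they are mostly code.
        if tag in ("replace", "delete"):
            old = " ".join(ref[i1:i2])
            new = " ".join(file_tokens[j1:j2]) if tag == "replace" else "<missing>"
            changes.append(f"[{old}] -> [{new}]")
    return "guess", f"{guess}: " + "; ".join(changes[:5])
```

Skipping pure insertions keeps the summary short, but it also means a word added inside the license text goes unreported, which is one more reason the guessed cases still need a human to look at them.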
I could write that program in maybe 2 hours, and find and extract all the official license texts in a few more hours. But does such a thing already exist? [It seems like an obvious thing to have, and I'm already writing many other things.]
A human would still have to review the non-1:1 matches (there could always be strange exceptions in the README or elsewhere), but the majority of cases should work just fine.
See also <https://spdx.org/licenses/> (especially <https://github.com/triplecheck/>) and <http://www.sciencedirect.com/science/article/pii/S0164121216300905>, which also lists several license checkers; FOSSology seems to be a whole web service that does this.