Re: License auditing
From: Danny Milosavljevic
Subject: Re: License auditing
Date: Wed, 3 Aug 2016 19:55:11 +0200
On Wed, 3 Aug 2016 18:28:38 +0200
David Craven <address@hidden> wrote:
> How can I tell the difference between a lgpl2.1 and lgpl2.1+ license?
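Look for "or later". In the standard notice it reads "either version 2.1 of the License, or (at your option) any later version".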
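A rough sketch of that check in Python. It assumes the notice text has already been pulled out of the file, and the phrase list is an illustrative assumption, not an exhaustive one. (For reference, the SPDX license list currently spells the two cases LGPL-2.1-only and LGPL-2.1-or-later; older versions of the list used LGPL-2.1 and LGPL-2.1+.)

```python
import re

# Phrases assumed to mark an "or later" grant. The first follows the
# wording of the standard FSF notice; the others are common shorthand.
# Illustrative only, not exhaustive.
OR_LATER_PATTERNS = [
    r"\bor\s+\(at\s+your\s+option\)\s+any\s+later\s+version\b",
    r"\bor\s+any\s+later\s+version\b",
    r"\bor\s+later\b",
]


def is_or_later(notice: str) -> bool:
    """Return True if a license notice appears to grant "or later" terms."""
    # Collapse all whitespace (including newlines inside a comment block)
    # so that a phrase broken across lines still matches.
    text = " ".join(notice.lower().split())
    return any(re.search(p, text) for p in OR_LATER_PATTERNS)
```

A notice that merely says "version 2.1 of the License" would come back False, which is the plain lgpl2.1 case. The word boundaries keep something like "for later" from counting.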
> Is this a job that an automated tool could do? Detecting licenses
> included in a tarball?
I also wonder about that. Usually the license text is just copied and pasted
anyway, so it should be quite regular.
If there isn't one, I could write one that would basically, for each source file:
- try to find an SPDX identifier; if that doesn't work:
- ignore newlines and any "#", ";", "*" or "//" at the beginning of a line
- lex the rest into words, where a "word" is either [a-zA-Z0-9-]+ or one of [.,;]
- try to match that 1:1 against all the license texts, mapped the same way
- if that didn't work, try to find signal words, guess the license and print
the difference in a short form.
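The first four steps might look roughly like the Python sketch below. It is not an existing tool. The function names (find_spdx, normalize, identify) and the `licenses` dict mapping a license name to its official text are hypothetical, and lower-casing before lexing is an extra assumption that is not part of the plan above.

```python
import re
from typing import Dict, List, Optional

# Matches a tag like "SPDX-License-Identifier: LGPL-2.1-or-later".
# "." does not cross newlines, so only the rest of that line is captured.
SPDX_RE = re.compile(r"SPDX-License-Identifier:\s*(.+)")

# Comment markers ignored at the start of a line: #, ;, * or //.
COMMENT_RE = re.compile(r"^\s*(?:#+|;+|\*+|//+)")

# A "word" is a run of [a-zA-Z0-9-] or a single '.', ',' or ';'.
# Text is lower-cased first (an assumption), hence [a-z0-9-].
WORD_RE = re.compile(r"[a-z0-9-]+|[.,;]")


def find_spdx(source: str) -> Optional[str]:
    m = SPDX_RE.search(source)
    if m is None:
        return None
    # Drop a trailing block-comment terminator such as "*/".
    return m.group(1).replace("*/", "").strip()


def normalize(text: str) -> List[str]:
    # Strip leading comment markers, join lines (ignoring newlines), lex.
    joined = " ".join(COMMENT_RE.sub("", line) for line in text.splitlines())
    return WORD_RE.findall(joined.lower())


def contains(haystack: List[str], needle: List[str]) -> bool:
    # Contiguous 1:1 token match. Tokens never contain spaces, so padding
    # the joined strings with spaces rules out partial-token hits.
    return bool(needle) and f" {' '.join(needle)} " in f" {' '.join(haystack)} "


def identify(source: str, licenses: Dict[str, str]) -> Optional[str]:
    spdx = find_spdx(source)
    if spdx:
        return spdx
    tokens = normalize(source)
    for name, text in licenses.items():
        # License texts are mapped exactly like the source file.
        if contains(tokens, normalize(text)):
            return name
    return None
```

The `licenses` dict would be filled from the official license texts. A None result is what gets handed to the fallback step (signal words and a short diff).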
I could write that program in maybe two hours, and find and extract all the
official license texts in a few more hours. But does such a thing already
exist? [It seems like an obvious thing to have, and I'm already writing many
other things.]
A human would still have to review the cases that don't match 1:1 (there could
always be strange exceptions in the README or elsewhere), but the majority of
cases should work just fine.
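For the cases that don't match 1:1, the last step could look something like this. It is a sketch only. The signal-word table passed in is hypothetical and would have to be curated per license. Using Python's standard difflib to produce the short difference is my own choice, not something the plan above prescribes. It also assumes the comment text has already been extracted.

```python
import difflib
import re
from typing import Dict, List, Optional


def tokenize(text: str) -> List[str]:
    # Same lexing as the exact-match step: [a-z0-9-]+ or one of . , ;
    return re.findall(r"[a-z0-9-]+|[.,;]", text.lower())


def guess_license(text: str, signal_words: Dict[str, List[str]]) -> Optional[str]:
    """Pick the license whose (hypothetical) signal words occur most often."""
    words = set(tokenize(text))
    best, best_score = None, 0
    for name, signals in signal_words.items():
        score = sum(1 for w in signals if w in words)
        if score > best_score:
            best, best_score = name, score
    return best


def short_diff(reference: str, candidate: str) -> List[str]:
    """List only the differing token runs between two texts."""
    ref, got = tokenize(reference), tokenize(candidate)
    matcher = difflib.SequenceMatcher(a=ref, b=got, autojunk=False)
    out = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        out.append(f"{tag}: {' '.join(ref[i1:i2])!r} -> {' '.join(got[j1:j2])!r}")
    return out
```

The output of short_diff is meant for the human reviewer. Only the differing stretches are listed, so a dropped "or (at your option) any later version" shows up as a single delete line.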
See also <https://spdx.org/licenses/> (especially
<https://github.com/triplecheck/>),
<http://www.sciencedirect.com/science/article/pii/S0164121216300905> (which also
lists several license checkers; Fossology seems to be a whole web service that
does this).