guix-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[PATCH] Add CRAN importer.


From: Ricardo Wurmus
Subject: [PATCH] Add CRAN importer.
Date: Fri, 24 Jul 2015 17:07:12 +0200

Hi Guix,

attached is a patch to add a new importer for R packages published on
CRAN.  (I still have to add tests for this.)

The importer fetches the HTML description of an R package from
http://cran.r-project.org, converts it to SXML, and then tries to
extract the relevant information from the SXML expression.

When I finished I thought that maybe this could be done with less effort
by downloading the tarball from CRAN, extracting the plain text
DESCRIPTION file containing the same information and parse that instead.
But in order to get the tarball I’d have to parse the HTML anyway, so
maybe that’s not so much better after all.

This generates package expressions using the r-build-system — this build
system does not yet exist, but I’ll prepare one soon, not least to test
the importer.

Here’s an example:

$ ./pre-inst-env guix import cran ggplot2
following redirection to `http://cran.r-project.org/web/packages/ggplot2/'...
starting download of `/tmp/guix-file.QGL35J' from 
`http://cran.rstudio.com/src/contrib/ggplot2_1.0.1.tar.gz'...
http://cran.rstudio.com/.../ggplot2_1.0.1.tar.gz        100.0% of 2296.1 KiB 
(1203. KiB/s)
(package
  (name "r-ggplot2")
  (version "1.0.1")
  (source
    (origin
      (method url-fetch)
      (uri (string-append
             "mirror://cran/src/contrib/ggplot2_"
             version
             ".tar.gz"))
      (sha256
        (base32
          "0794kjqi3lrxb33lr1mykd58959hlgkhdn259vj8fxrh65mqw920"))))
  (build-system r-build-system)
  (inputs
    `(("r-digest" ,r-digest)
      ("r-gtable" ,r-gtable)
      ("r-mass" ,r-mass)
      ("r-plyr" ,r-plyr)
      ("r-proto" ,r-proto)
      ("r-reshape2" ,r-reshape2)
      ("r-scales" ,r-scales)))
  (home-page "http://ggplot2.org";)
  (synopsis
    "An Implementation of the Grammar of Graphics")
  (description
    "An implementation of the grammar of graphics
in R.  It combines the advantages of both base and
lattice graphics: conditioning and shared axes are
handled automatically, and you can still build up a
plot step by step from multiple data sources.  It also
implements a sophisticated multidimensional
conditioning system and a consistent interface to map
data to aesthetic attributes.  See http://ggplot2.org
for more information, documentation and examples.")
  (license gpl2))

Without further ado: the patch.

>From 8cb9622785feb79fbbe593099105160617ff6acb Mon Sep 17 00:00:00 2001
From: Ricardo Wurmus <address@hidden>
Date: Fri, 24 Jul 2015 16:49:57 +0200
Subject: [PATCH] import: Add 'cran' importer.

* guix/import/cran.scm: New file.
* guix/scripts/import.scm: Add "cran" to 'importers'.
* guix/scripts/import/cran.scm: New file.
* Makefile.am (MODULES): Add 'guix/import/cran.scm' and
  'guix/scripts/import/cran.scm'.
* doc/guix.texi (Invoking guix import): Document it.
* po/guix/POTFILES.in: Add 'guix/scripts/import/cran.scm'.
---
 Makefile.am                  |   2 +
 doc/guix.texi                |  12 +++
 guix/import/cran.scm         | 186 +++++++++++++++++++++++++++++++++++++++++++
 guix/scripts/import.scm      |   2 +-
 guix/scripts/import/cran.scm |  92 +++++++++++++++++++++
 5 files changed, 293 insertions(+), 1 deletion(-)
 create mode 100644 guix/import/cran.scm
 create mode 100644 guix/scripts/import/cran.scm

diff --git a/Makefile.am b/Makefile.am
index ada4cbe..b397962 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -98,6 +98,7 @@ MODULES =                                     \
   guix/import/gnu.scm                          \
   guix/import/snix.scm                         \
   guix/import/cabal.scm                                \
+  guix/import/cran.scm                         \
   guix/import/hackage.scm                      \
   guix/import/elpa.scm                         \
   guix/scripts/download.scm                    \
@@ -113,6 +114,7 @@ MODULES =                                   \
   guix/scripts/refresh.scm                     \
   guix/scripts/system.scm                      \
   guix/scripts/lint.scm                                \
+  guix/scripts/import/cran.scm                 \
   guix/scripts/import/gnu.scm                  \
   guix/scripts/import/nix.scm                  \
   guix/scripts/import/hackage.scm              \
diff --git a/doc/guix.texi b/doc/guix.texi
index 822aefa..c3e1bf0 100644
--- a/doc/guix.texi
+++ b/doc/guix.texi
@@ -3855,6 +3855,18 @@ Perl module:
 guix import cpan Acme::Boolean
 @end example
 
address@hidden cran
address@hidden CRAN
+Import meta-data from @uref{http://cran.r-project.org/, CRAN}.
+Information is extracted from the HTML package description.
+
+The command command below imports meta-data for the @code{Cairo}
+R package:
+
address@hidden
+guix import cran Cairo
address@hidden example
+
 @item nix
 Import meta-data from a local copy of the source of the
 @uref{http://nixos.org/nixpkgs/, Nixpkgs address@hidden
diff --git a/guix/import/cran.scm b/guix/import/cran.scm
new file mode 100644
index 0000000..6587960
--- /dev/null
+++ b/guix/import/cran.scm
@@ -0,0 +1,186 @@
+;;; GNU Guix --- Functional package management for GNU
+;;; Copyright © 2015 Ricardo Wurmus <address@hidden>
+;;;
+;;; This file is part of GNU Guix.
+;;;
+;;; GNU Guix is free software; you can redistribute it and/or modify it
+;;; under the terms of the GNU General Public License as published by
+;;; the Free Software Foundation; either version 3 of the License, or (at
+;;; your option) any later version.
+;;;
+;;; GNU Guix is distributed in the hope that it will be useful, but
+;;; WITHOUT ANY WARRANTY; without even the implied warranty of
+;;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+;;; GNU General Public License for more details.
+;;;
+;;; You should have received a copy of the GNU General Public License
+;;; along with GNU Guix.  If not, see <http://www.gnu.org/licenses/>.
+
+(define-module (guix import cran)
+  #:use-module (ice-9 match)
+  #:use-module (ice-9 regex)
+  #:use-module (srfi srfi-1)
+  #:use-module (sxml simple)
+  #:use-module (sxml match)
+  #:use-module (sxml xpath)
+  #:use-module (guix http-client)
+  #:use-module (guix hash)
+  #:use-module (guix store)
+  #:use-module (guix base32)
+  #:use-module ((guix download) #:select (download-to-store))
+  #:use-module (guix import utils)
+  #:export (cran->guix-package))
+
+;;; Commentary:
+;;;
+;;; Generate a package declaration template for the latest version of an R
+;;; package on CRAN, using the HTML description downloaded from
+;;; cran.r-project.org.
+;;;
+;;; Code:
+
+(define string->license
+  (match-lambda
+   ("AGPL-3" 'agpl3)
+   ("Artistic-2.0" 'artistic2.0)
+   ("Apache License 2.0" 'asl2.0)
+   ("BSD_2_clause" 'bsd-2)
+   ("BSD_3_clause" 'bsd-3)
+   ("GPL-2" 'gpl2)
+   ("GPL-3" 'GPL3)
+   ("LGPL-2" 'lgpl2.0)
+   ("LGPL-2.1" 'lgpl2.1)
+   ("LGPL-3" 'lgpl3)
+   ("MIT" 'x11)
+   ((x) (string->license x))
+   ((lst ...) `(list ,@(map string->license lst)))
+   (_ #f)))
+
+(define (format-inputs names)
+  "Generate a sorted list of package inputs from a list of package NAMES."
+  (sort
+    (map (lambda (name)
+           (list name (list 'unquote (string->symbol name))))
+         names)
+    (lambda args
+      (match args
+        (((a _ ...) (b _ ...))
+         (string-ci<? a b))))))
+
+(define (maybe-inputs package-inputs)
+  "Given a list of PACKAGE-INPUTS, tries to generate the 'inputs' field of a
+package definition."
+  (match package-inputs
+    (()
+     '())
+    ((package-inputs ...)
+     `((inputs (,'quasiquote ,(format-inputs package-inputs)))))))
+
+(define (cran-fetch name)
+  "Return an sxml representation of the CRAN page for the R package NAME,
+or #f on failure.  NAME is case-sensitive."
+  ;; This API always returns the latest release of the module.
+  (let ((cran-url (string-append "http://cran.r-project.org/web/packages/"; 
name)))
+    (false-if-exception
+     (xml->sxml (http-fetch cran-url)
+                #:trim-whitespace? #t
+                #:namespaces '((xhtml . "http://www.w3.org/1999/xhtml";))
+                #:default-entity-handler
+                (lambda (port name)
+                  (case name
+                    ((nbsp) " ")
+                    ((ge) ">=")
+                    ((gt) ">")
+                    ((lt) "<")
+                    (else
+                     (format (current-warning-port)
+                             "~a:~a:~a: undefined entitity: ~a\n"
+                             cran-url (port-line port) (port-column port)
+                             name)
+                     (symbol->string name))))))))
+
+(define (cran-sxml->sexp sxml)
+  "Return the `package' s-expression for a CRAN package from the SXML
+representation of the package page."
+  (define (nodes->text nodeset)
+    (string-join ((sxpath '(// *text*)) nodeset) " "))
+
+  ;; Extract the datum node next to a LABEL in the sxml table TREE.
+  (define (table-datum tree label)
+    (let ((label-node ((sxpath `(xhtml:tr (equal? (xhtml:td ,label)))) tree)))
+      (if (null? label-node)
+          '()
+          ((node-pos 1)
+           ((take-after (node-eq? (car label-node)))
+            ((node-join
+              (node-parent tree)
+              (select-kids (node-typeof? '*)))
+             label-node))))))
+
+  (define (guix-name name)
+    (if (string-prefix? "r-" name)
+        (string-downcase name)
+        (string-append "r-" (string-downcase name))))
+
+  (sxml-match-let*
+   (((xhtml:html
+      ,head
+      (xhtml:body
+       (xhtml:h2 ,name-and-synopsis)
+       (xhtml:p ,description)
+       ,summary
+       (xhtml:h4 "Downloads:") ,downloads
+       . ,rest))
+     (cadr sxml)))
+   (let* ((name       (match:prefix (string-match ": " name-and-synopsis)))
+          (synopsis   (match:suffix (string-match ": " name-and-synopsis)))
+          (version    (nodes->text (table-datum summary "Version:")))
+          (license    ((compose string->license nodes->text)
+                       (table-datum summary "License:")))
+          (home-page  (nodes->text ((sxpath '((xhtml:a 1)))
+                                    (table-datum summary "URL:"))))
+          (source-url (string-append "mirror://cran/"
+                                     ;; Remove double dots, because we want an
+                                     ;; absolute path.
+                                     (regexp-substitute/global
+                                      #f "\\.\\./"
+                                      (string-join
+                                       ((sxpath '((xhtml:a 1) @ href *text*))
+                                        (table-datum downloads " Package 
source: ")))
+                                      'pre 'post)))
+          (tarball    (with-store store (download-to-store store source-url)))
+          (sysdepends (map match:substring
+                           (list-matches
+                            "[^ ]+"
+                            ;; Strip off comma and parenthetical
+                            ;; expressions.
+                            (regexp-substitute/global
+                             #f "(,|\\([^\\)]+\\))"
+                             (nodes->text (table-datum summary 
"SystemRequirements:"))
+                             'pre 'post))))
+          (imports    (map guix-name
+                           ((sxpath '(// xhtml:a *text*))
+                            (table-datum summary "Imports:")))))
+     `(package
+        (name ,(guix-name name))
+        (version ,version)
+        (source (origin
+                  (method url-fetch)
+                  (uri (string-append ,@(factorize-uri source-url version)))
+                  (sha256
+                   (base32
+                    ,(bytevector->nix-base32-string (file-sha256 tarball))))))
+        (build-system r-build-system)
+        ,@(maybe-inputs (append sysdepends imports))
+        (home-page ,home-page)
+        (synopsis ,synopsis)
+        ;; Use double spacing
+        (description ,(regexp-substitute/global #f "\\. \\b" description
+                                                'pre ".  " 'post))
+        (license ,license)))))
+
+(define (cran->guix-package package-name)
+  "Fetch the metadata for PACKAGE-NAME from cran.r-project.org, and return the
+`package' s-expression corresponding to that package, or #f on failure."
+  (let ((module-meta (cran-fetch package-name)))
+    (and=> module-meta cran-sxml->sexp)))
diff --git a/guix/scripts/import.scm b/guix/scripts/import.scm
index d0bdec1..9d8e5cb 100644
--- a/guix/scripts/import.scm
+++ b/guix/scripts/import.scm
@@ -73,7 +73,7 @@ rather than \\n."
 ;;; Entry point.
 ;;;
 
-(define importers '("gnu" "nix" "pypi" "cpan" "hackage" "elpa"))
+(define importers '("gnu" "nix" "pypi" "cpan" "hackage" "elpa" "cran"))
 
 (define (resolve-importer name)
   (let ((module (resolve-interface
diff --git a/guix/scripts/import/cran.scm b/guix/scripts/import/cran.scm
new file mode 100644
index 0000000..f11fa10
--- /dev/null
+++ b/guix/scripts/import/cran.scm
@@ -0,0 +1,92 @@
+;;; GNU Guix --- Functional package management for GNU
+;;; Copyright © 2014 Eric Bavier <address@hidden>
+;;; Copyright © 2015 Ricardo Wurmus <address@hidden>
+;;;
+;;; This file is part of GNU Guix.
+;;;
+;;; GNU Guix is free software; you can redistribute it and/or modify it
+;;; under the terms of the GNU General Public License as published by
+;;; the Free Software Foundation; either version 3 of the License, or (at
+;;; your option) any later version.
+;;;
+;;; GNU Guix is distributed in the hope that it will be useful, but
+;;; WITHOUT ANY WARRANTY; without even the implied warranty of
+;;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+;;; GNU General Public License for more details.
+;;;
+;;; You should have received a copy of the GNU General Public License
+;;; along with GNU Guix.  If not, see <http://www.gnu.org/licenses/>.
+
+(define-module (guix scripts import cran)
+  #:use-module (guix ui)
+  #:use-module (guix utils)
+  #:use-module (guix import cran)
+  #:use-module (guix scripts import)
+  #:use-module (srfi srfi-1)
+  #:use-module (srfi srfi-11)
+  #:use-module (srfi srfi-37)
+  #:use-module (ice-9 match)
+  #:use-module (ice-9 format)
+  #:export (guix-import-cran))
+
+
+;;;
+;;; Command-line options.
+;;;
+
+(define %default-options
+  '())
+
+(define (show-help)
+  (display (_ "Usage: guix import cran PACKAGE-NAME
+Import and convert the CRAN package for PACKAGE-NAME.\n"))
+  (display (_ "
+  -h, --help             display this help and exit"))
+  (display (_ "
+  -V, --version          display version information and exit"))
+  (newline)
+  (show-bug-report-information))
+
+(define %options
+  ;; Specification of the command-line options.
+  (cons* (option '(#\h "help") #f #f
+                 (lambda args
+                   (show-help)
+                   (exit 0)))
+         (option '(#\V "version") #f #f
+                 (lambda args
+                   (show-version-and-exit "guix import cran")))
+         %standard-import-options))
+
+
+;;;
+;;; Entry point.
+;;;
+
+(define (guix-import-cran . args)
+  (define (parse-options)
+    ;; Return the alist of option values.
+    (args-fold* args %options
+                (lambda (opt name arg result)
+                  (leave (_ "~A: unrecognized option~%") name))
+                (lambda (arg result)
+                  (alist-cons 'argument arg result))
+                %default-options))
+
+  (let* ((opts (parse-options))
+         (args (filter-map (match-lambda
+                            (('argument . value)
+                             value)
+                            (_ #f))
+                           (reverse opts))))
+    (match args
+      ((package-name)
+       (let ((sexp (cran->guix-package package-name)))
+         (unless sexp
+           (leave (_ "failed to download description for package '~a'~%")
+                  package-name))
+         sexp))
+      (()
+       (leave (_ "too few arguments~%")))
+      ((many ...)
+       (leave (_ "too many arguments~%"))))))
-- 
2.1.0

~~ Ricardo

reply via email to

[Prev in Thread] Current Thread [Next in Thread]