Re: hackage importer

guix-devel

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: hackage importer

From:	Federico Beffa
Subject:	Re: hackage importer
Date:	Sun, 22 Mar 2015 21:12:44 +0100

address@hidden (Ludovic Courtès) writes:

> I would use ‘ghc-’ right from the start, since that’s really what it
> is.  WDYT?

Agreed!

>> Obviously the tests in part 5 were used along the way and will be
>> removed.
>
> More precisely, they’ll have to be turned into tests/hackage.scm
> (similar to tests/snix.scm and tests/pypi.scm.)  :-)

I've created a test for hackage.  In doing so I've had an issue with the
locale and would like an advice: When I run "guix import hackage
package-name" from the shell, or when I invoke a REPL with Geiser
everything works fine.  When I initially did run the test-suite with

make check TESTS=tests/hackage.scm

the conditionals conversion looked wrong.  I've found that this is
related to the choice of locale.  To make the test work as desired I
added '(setlocale LC_ALL "en_US.UTF-8")' to the file declaring the
tests.

I suppose that running the 'guix import ...' command does get the locale
from the shell.  Should I somehow force the locale within the '(guix
import hackage)' module?

>> +;; List of libraries distributed with ghc (7.8.4).
>> +(define ghc-standard-libraries
>> +  '("haskell98"        ; 2.0.0.3
>> +    "hoopl"            ; 3.10.0.1
>
> Maybe the version numbers in comments can be omitted since they’ll
> become outdated eventually?

OK.

>> +(define guix-name-prefix "haskell-")
>
> s/guix-name-prefix/package-name-prefix/ and rather “ghc-” IMO.

Agreed.

>
>> +;; Regular expression matching "key: value"
>> +(define key-value-rx
>> +  "([a-zA-Z0-9-]+): *(\\w?.*)$")
>> +
>> +;; Regular expression matching a section "head sub-head ..."
>> +(define sections-rx
>> +  "([a-zA-Z0-9\\(\\)-]+)")
>> +
>> +;; Cabal comment.
>> +(define comment-rx
>> +  "^ *--")
>
> Use (make-regexp ...) directly, and then ‘regexp-exec’ instead of
> ‘string-match’.

I already had in mind to change these before sending the e-mail, but I
forgot.  It's now done.

>
>> +;; Check if the current line includes a key
>> +(define (has-key? line)
>> +  (string-match key-value-rx line))
>> +
>> +(define (comment-line? line)
>> +  (string-match comment-rx line))
>> +
>> +;; returns the number of indentation spaces and the rest of the line.
>> +(define (line-indentation+rest line)
>
> Please turn all the comments above procedures this into docstrings.

Done.

>
>> +;; Part 1 main function: read a cabal fila and filter empty lines and 
>> comments.
>> +;; Returns a list composed by the pre-processed lines of the file.
>> +(define (read-cabal port)
>
> s/fila/file/
> s/Returns/Return/
>
> I would expect ‘read-cabal’ to return a <cabal> record, say, that can be
> directly manipulated (just like ‘read’ returns a Scheme object.)  But
> here it seems to return an intermediate parsing result, right?

OK. I've renamed some functions and the new name should hopefully be a
better choice. What was 'read-cabal' has become 'strip-cabal' (which only
strips comments and empty lines).

>> +;; Parses a cabal file in the form of a list of lines as produced by
>> +;; READ-CABAL and returns its content in the following form:
>> +;;
>> +;; (((head1 sub-head1 ... key1) (value))
>> +;;  ((head2 sub-head2 ... key2) (value2))
>> +;;  ...).
>> +;;
>> +;; where all elements are strings.
>> +;;
>> +;; We try do deduce the format from the following document:
>> +;; https://www.haskell.org/cabal/users-guide/developing-packages.html
>> +;;
>> +;; Key values are case-insensitive. We therefore lowercase them. Values are
>> +;; case-sensitive.
>> +;;
>> +;; Currently only only layout structured files are parsed.  Braces {}
>
> “only indentation-structured files”

OK

>
>> +;; structured files are not handled.
>> +(define (cabal->key-values lines)
>
> I think this is the one I would call ‘read-cabal’.
>

... and I've followed you suggestion and renamed this function to 'read-cabal'.

>> +;; Find if a string represent a conditional
>> +(define condition-rx
>> +  (make-regexp "^if +(.*)$"))
>
> The comment should rather be “Regexp for conditionals.” and be placed
> below ‘define’.

OK

> I would expect the conversion of conditional expressions to sexps to be
> done in the parsing phase above, such that ‘read-cabal’ returns an
> object with some sort of an AST for those conditionals.
>
> Then this part would focus on the evaluation of those conditionals,
> like:
>
>   ;; Evaluate the conditionals in CABAL according to FLAGS.  Return an
>   ;; evaluated Cabal object.
>   (eval-cabal cabal flags)
>
> WDYT?

I think it's better to keep the parsing phase to a bare minimum and work
with layers to keep each function as manageable and simple as possible.

The way it works is as follows:

1. 'strip-cabal' reads the file, strips comment and empty lines and
return a list.  Each element of the list is a line from the Cabal file.

2. 'read-cabal' takes the above list and does the parsing required to
read the indentation based structure of the file. It returns a list
composed of list pairs. The pair is composed by a list of keys and a
list of values. For example, the following snippet

name: foo
version: 1.0
...

is returned as

((("name")    ("foo"))
 (("version") ("1.0"))
 ...)

This is enough for all the information that we need to build a Guix
package, but dependencies.

Dependencies are listed in 'Library' and 'Executable cabal' sections of
the Cabal file as in the following example snippet:

executable cabal
  build-depends:
    HTTP       >= 4000.2.5 && < 4000.3
...

and may include conditionals as in (continuing from above with the
proper indentation)

    if os(windows)
      build-depends: Win32 >= 2 && < 3
    else
      build-depends: unix >= 2.0 && < 2.8
...

Now, to make sense of the indentation based structure I need to keep a
state indicating how many indentation levels we currently have and the
name of the various sub-sections. For this reason I keep a one to one
correspondence between indentation levels and section titles. That means
that the above snipped is encoded as follows in my Cabal object:

((("executable cabal" "build-depends:")                  ("HTTP..."))
 (("executable cabal" "if os(windows)" "build-depends:") ("Win32 ..."))
 (("executable cabal" "else"           "build-depends:") ("unix ..."))
 ...)

If I split 'if' from the predicate 'os(windows)' then the level of
indentation and the corresponding section header loose synchronization
(different length).

For this reason I only convert the predicates from Cabal syntax to
Scheme syntax when I take the list of dependencies and convert the list
in the form expected in a guix package.

I hope to have clarified the strategy.

>> +(define (guix-name name)
>
> Rather ‘hackage-name->package-name’?

OK

>> +;; Split the comma separated list of dependencies coming from the cabal file
>> +;; and return a list with inputs suitable for the GUIX package.  Currently 
>> the
>> +;; version information is discarded.
>
> s/GUIX/Guix/

OK

>> +(define (split-dependencies ls)
>> +  (define (split-at-comma d)
>> +    (map
>> +     (lambda (m)
>> +       (let ((name (guix-name (match:substring m 1))))
>> +         (list name (list 'unquote (string->symbol name)))))
>> +     (list-matches dependencies-rx d)))
>
> I think it could use simply:
>
>   (map string-trim-both
>        (string-tokenize d (char-set-complement (char-set #\,))))

Actually, this wouldn't do.  On top of splitting at commas, the function also
translates the name of the package from hackage-name to guix-name.

>> +;; Genrate an S-expression containing the list of dependencies.  The 
>> generated
>
> “Generate”

Ouch... soo many typos :-(

>> +;; S-expressions may include conditionals as defined in the cabal file.
>> +;; During this process we discard the version information of the packages.
>> +(define (dependencies-cond->sexp meta)
>
> [...]
>
>> +                  (match (match:substring rx-result 1)
>> +                    ((? (cut member <>
>> +                             ;; GUIX names are all lower-case.
>> +                             (map (cut string-downcase <>)
>> +                                  ghc-standard-libraries)))
>
> s/GUIX/Guix/
>
> I find it hard to read.  Typically, I would introduce:
>
>  (define (standard-library? name)
>    (member name ghc-standard-libraries))
>
> and use it here (with the assumption that ‘ghc-standard-libraries’ is
> already lowercase.)

OK, I've followed your advice.  I initially kept ghc-standard-libraries
capitalized as the original name, but it doesn't really make any sense.

>> +;; Run some tests
>> +
>> +;; (display (cabal->key-values
>> +;;           (call-with-input-file "mtl.cabal" read-cabal)))
>> +;; (display (cabal->key-values
>> +;;           (call-with-input-file "/home/beffa/tmp/cabal-install.cabal" 
>> read-cabal)))
>> +;; (display (get-flags (pre-process-entries-keys (cabal->key-values 
>> test-5))))
>> +;; (newline)
>> +;; (display (conditional->sexp-like test-cond-2))
>> +;; (newline)
>> +;; (display
>> +;;  (eval-flags (conditional->sexp-like test-cond-6)
>> +;;              (get-flags (pre-process-entries-keys (cabal->key-values 
>> test-6)))))
>> +;; (newline)
>> +;; (key->values (cabal->key-values test-1) "name")
>> +;; (newline)
>> +;; (key-start-end->entries (cabal->key-values test-4) "Library" 
>> "CC-Options")
>> +;; (newline)
>> +;; (eval-tests (conditional->sexp-like test-cond-6))
>
> This should definitely go to tests/hackage.scm.

As mentioned above, I've created a test-suite for the hackage and added
a couple of thests.

>
>> +  (display (_ "Usage: guix import hackage PACKAGE-NAME
>> +Import and convert the Hackage package for PACKAGE-NAME.  If PACKAGE-NAME
>> +includes a suffix constituted by a dash followed by a numerical version (as
>> +used with GUIX packages), then a definition for the specified version of the
>
> s/GUIX/Guix/

OK

> Also, guix/scripts/import.scm must be added to po/guix/POTFILES.in, for
> i18n.

OK, I've added it.

Thanks for the review!
Fede

0001-import-Add-hackage-importer.patch
Description: Text Data

0002-import-Add-hackage-importer.patch
Description: Text Data

0003-import-Add-hackage-importer.patch
Description: Text Data

0004-import-Add-to-list-of-files-with-translatable-string.patch
Description: Text Data

0005-tests-Add-tests-for-the-hackage-importer.patch
Description: Text Data

[Prev in Thread]

Current Thread

[Next in Thread]

hackage importer, Federico Beffa, 2015/03/13
- Re: hackage importer, Ludovic Courtès, 2015/03/15
  - Re: hackage importer, Federico Beffa, 2015/03/15
    - Re: hackage importer, Federico Beffa <=
    - Re: hackage importer, Ludovic Courtès, 2015/03/26
    - Re: hackage importer, Federico Beffa, 2015/03/28
    - Re: hackage importer, Ludovic Courtès, 2015/03/29
    - Re: hackage importer, Federico Beffa, 2015/03/29
    - Re: hackage importer, Ludovic Courtès, 2015/03/31

Prev by Date: Re: [PATCH] gnu: curl: Update to 7.41.0. Fix #20121.
Next by Date: Re: [PATCH 3/3] gnu: Add ghc.
Previous by thread: Re: hackage importer
Next by thread: Re: hackage importer
Index(es):
- Date
- Thread