[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: hackage importer
From: |
Ludovic Courtès |
Subject: |
Re: hackage importer |
Date: |
Thu, 26 Mar 2015 14:09:55 +0100 |
User-agent: |
Gnus/5.13 (Gnus v5.13) Emacs/24.4 (gnu/linux) |
Federico Beffa <address@hidden> skribis:
> I've created a test for hackage. In doing so I've had an issue with the
> locale and would like an advice: When I run "guix import hackage
> package-name" from the shell, or when I invoke a REPL with Geiser
> everything works fine. When I initially did run the test-suite with
>
> make check TESTS=tests/hackage.scm
>
> the conditionals conversion looked wrong. I've found that this is
> related to the choice of locale. To make the test work as desired I
> added '(setlocale LC_ALL "en_US.UTF-8")' to the file declaring the
> tests.
What’s the exact error you were getting? Character classes in regexps
are locale sensitive, I think, which could be one reason things behave
differently. There’s also the problem that, when running in the C
locale, ‘regexp-exec’ (and other C-implemented procedures) cannot be
passed any string that is not strictly ASCII.
Could you post the actual backtrace you get (?) when running the program
with LC_ALL=C?
[...]
>> I would expect the conversion of conditional expressions to sexps to be
>> done in the parsing phase above, such that ‘read-cabal’ returns an
>> object with some sort of an AST for those conditionals.
>>
>> Then this part would focus on the evaluation of those conditionals,
>> like:
>>
>> ;; Evaluate the conditionals in CABAL according to FLAGS. Return an
>> ;; evaluated Cabal object.
>> (eval-cabal cabal flags)
>>
>> WDYT?
>
> I think it's better to keep the parsing phase to a bare minimum and work
> with layers to keep each function as manageable and simple as possible.
>
> The way it works is as follows:
>
> 1. 'strip-cabal' reads the file, strips comment and empty lines and
> return a list. Each element of the list is a line from the Cabal file.
>
> 2. 'read-cabal' takes the above list and does the parsing required to
> read the indentation based structure of the file. It returns a list
> composed of list pairs. The pair is composed by a list of keys and a
> list of values. For example, the following snippet
>
> name: foo
> version: 1.0
> ...
>
> is returned as
>
> ((("name") ("foo"))
> (("version") ("1.0"))
> ...)
OK. I would rather have ‘read-cabal’ take an input port (like Scheme’s
‘read’) and return the list above; this would be the least surprising,
more idiomatic approach. ‘strip-cabal’ (or
‘strip-insignificant-lines’?) would be an internal procedure used only
by ‘read-cabal’.
WDYT?
> This is enough for all the information that we need to build a Guix
> package, but dependencies.
>
> Dependencies are listed in 'Library' and 'Executable cabal' sections of
> the Cabal file as in the following example snippet:
>
> executable cabal
> build-depends:
> HTTP >= 4000.2.5 && < 4000.3
> ...
>
> and may include conditionals as in (continuing from above with the
> proper indentation)
>
> if os(windows)
> build-depends: Win32 >= 2 && < 3
> else
> build-depends: unix >= 2.0 && < 2.8
> ...
>
> Now, to make sense of the indentation based structure I need to keep a
> state indicating how many indentation levels we currently have and the
> name of the various sub-sections. For this reason I keep a one to one
> correspondence between indentation levels and section titles. That means
> that the above snipped is encoded as follows in my Cabal object:
>
> ((("executable cabal" "build-depends:") ("HTTP..."))
> (("executable cabal" "if os(windows)" "build-depends:") ("Win32 ..."))
> (("executable cabal" "else" "build-depends:") ("unix ..."))
> ...)
>
> If I split 'if' from the predicate 'os(windows)' then the level of
> indentation and the corresponding section header loose synchronization
> (different length).
Would it be possible for ‘read-cabal’ to instead return a
tree-structured sexp like:
(if (os windows)
(build-depends (Win32 >= 2 && < 3))
(build-depends (unix >= 2.0 && < 2.8)))
That would use a variant of ‘conditional->sexp-like’, essentially.
(Of course the achieve that the parser must keep track of indentation
levels and everything, as you already implemented; I’m just commenting
on the interface here.)
Then, if I could imagine:
(eval-cabal '(("name" "foo")
("version" "1.0"
("executable cabal" (if (os windows) ...)))
=> #<cabal-package name: "foo" dependencies: '(unix)>
This way the structure of the Cabal file would be preserved, only
converted to sexp form, which is easier to work with.
Does that make sense?
> For this reason I only convert the predicates from Cabal syntax to
> Scheme syntax when I take the list of dependencies and convert the list
> in the form expected in a guix package.
>
> I hope to have clarified the strategy.
>
>>> +(define (guix-name name)
>>
>> Rather ‘hackage-name->package-name’?
>
> OK
>
>>> +;; Split the comma separated list of dependencies coming from the cabal
>>> file
>>> +;; and return a list with inputs suitable for the GUIX package. Currently
>>> the
>>> +;; version information is discarded.
>>
>> s/GUIX/Guix/
>
> OK
>
>>> +(define (split-dependencies ls)
>>> + (define (split-at-comma d)
>>> + (map
>>> + (lambda (m)
>>> + (let ((name (guix-name (match:substring m 1))))
>>> + (list name (list 'unquote (string->symbol name)))))
>>> + (list-matches dependencies-rx d)))
>>
>> I think it could use simply:
>>
>> (map string-trim-both
>> (string-tokenize d (char-set-complement (char-set #\,))))
>
> Actually, this wouldn't do. On top of splitting at commas, the function also
> translates the name of the package from hackage-name to guix-name.
Right, but then it’s best to have two separate procedures and two
compose them:
(map (compose hackage-name->package-name string-trim-both)
(string-tokenize d (char-set-complement (char-set #\,))))
> From 231ea64519505f84e252a4b3ab14d3857a9374c2 Mon Sep 17 00:00:00 2001
> From: Federico Beffa <address@hidden>
> Date: Sat, 7 Mar 2015 17:23:14 +0100
> Subject: [PATCH 1/5] import: Add hackage importer.
>
> * guix/scripts/import.scm (importers): Add hackage.
[...]
> From ad79b3b0bc8b08ec98f49b665b32ce9259516713 Mon Sep 17 00:00:00 2001
> From: Federico Beffa <address@hidden>
> Date: Sat, 7 Mar 2015 17:30:17 +0100
> Subject: [PATCH 2/5] import: Add hackage importer.
>
> * guix/scripts/import/hackage.scm: New file.
These two should be one patch, along with the modification of
POTFILES.in, because these 3 things correspond to a single logical
change–the addition of a sub-command and associated infrastructure. A
fourth part is also needed: the addition of a couple of paragraphs in
guix.texi.
> From 447c6b8f1ee6cc1d4edba44cabef1b28b75c7608 Mon Sep 17 00:00:00 2001
> From: Federico Beffa <address@hidden>
> Date: Sun, 8 Mar 2015 07:48:38 +0100
> Subject: [PATCH 3/5] import: Add hackage importer.
>
> * guix/import/hackage.scm: New file.
Please add tests/hackage.test as part of the same patch.
> +;; (use-modules (ice-9 match))
> +;; (use-modules (ice-9 regex))
> +;; (use-modules (ice-9 rdelim))
> +;; (use-modules (ice-9 receive))
> +;; (use-modules (ice-9 pretty-print))
> +;; (use-modules (srfi srfi-26))
> +;; (use-modules (srfi srfi-11))
> +;; (use-modules (srfi srfi-1))
Debugging leftover?
Nothing to add to what I wrote above.
[...]
> +(define test-cabal-1
> + "name: foo
> +version: 1.0.0
> +homepage: http://test.org
> +synopsis: synopsis
> +description: description
> +license: BSD3
> +executable cabal
> + build-depends:
> + HTTP >= 4000.2.5 && < 4000.3,
> + mtl >= 2.0 && < 3
> +")
> +
> +(define test-cond-1
> + "(os(darwin) || !(flag(debug))) && flag(cips)")
> +
> +(define read-cabal
> + (@@ (guix import hackage) read-cabal))
> +
> +(define strip-cabal
> + (@@ (guix import hackage) strip-cabal))
> +
> +(define eval-tests
> + (@@ (guix import hackage) eval-tests))
> +
> +(define eval-impl
> + (@@ (guix import hackage) eval-impl))
> +
> +(define eval-flags
> + (@@ (guix import hackage) eval-flags))
> +
> +(define conditional->sexp-like
> + (@@ (guix import hackage) conditional->sexp-like))
I think it would be fine to export ‘read-cabal’ and ‘eval-cabal’ once
these two have the desired interface.
> +(test-begin "hackage")
> +
> +(test-assert "hackage->guix-package"
> + ;; Replace network resources with sample data.
> + (mock
> + ((guix import hackage) hackage-fetch
> + (lambda (name-version)
> + (read-cabal
> + (call-with-input-string test-cabal-1
> + strip-cabal))))
> + (match (hackage->guix-package "foo")
> + (('package
> + ('name "ghc-foo")
> + ('version "1.0.0")
> + ('source
> + ('origin
> + ('method 'url-fetch)
> + ('uri ('string-append
> + "http://hackage.haskell.org/package/foo/foo-"
> + 'version
> + ".tar.gz"))
> + ('sha256
> + ('base32
> + (? string? hash)))))
> + ('build-system 'haskell-build-system)
> + ('inputs
> + ('quasiquote
> + (("ghc-http" ('unquote 'ghc-http))
> + ("ghc-mtl" ('unquote 'ghc-mtl)))))
> + ('home-page "http://test.org")
> + ('synopsis (? string?))
> + ('description (? string?))
> + ('license 'bsd-3))
> + #t)
Nice.
With the adjusted semantics, there could be lower-level tests:
(test-equal "read-cabal"
'(("name" "foo") ...)
(call-with-input-string test-cabal-1 read-cabal))
and:
(test-assert "eval-cabal, conditions"
(let ((p (eval-cabal '(("name" "foo")
("executable cabal" (if (os windows) ...))))))
(and (cabal-package? p)
(string=? "foo" (cabal-package-name p))
(equal? '(unix) (cabal-package-dependencies p)))))
Looks like we’re almost there. I hope the above makes sense!
Thank you,
Ludo’.
- hackage importer, Federico Beffa, 2015/03/13
- Re: hackage importer, Ludovic Courtès, 2015/03/15
- Re: hackage importer, Federico Beffa, 2015/03/15
- Re: hackage importer, Federico Beffa, 2015/03/22
- Re: hackage importer,
Ludovic Courtès <=
- Re: hackage importer, Federico Beffa, 2015/03/28
- Re: hackage importer, Ludovic Courtès, 2015/03/29
- Re: hackage importer, Federico Beffa, 2015/03/29
- Re: hackage importer, Ludovic Courtès, 2015/03/31