[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: hackage importer
From: |
Ludovic Courtès |
Subject: |
Re: hackage importer |
Date: |
Sat, 02 May 2015 14:48:10 +0200 |
User-agent: |
Gnus/5.13 (Gnus v5.13) Emacs/24.5 (gnu/linux) |
Federico Beffa <address@hidden> skribis:
> Please find attached a patch reorganizing the code as you suggest.
Woow, neat! Impressive work. I think this is a great improvement.
I have a bunch of stylistic comments below, and some open questions as
well.
> From bc8cdab1e322a25002a3d9cf33eddd856c8a81d8 Mon Sep 17 00:00:00 2001
> From: Federico Beffa <address@hidden>
> Date: Sun, 26 Apr 2015 11:22:29 +0200
> Subject: [PATCH] import: hackage: Refactor parsing code and add new option.
>
> * guix/import/cabal.scm: New file.
>
> * guix/import/hackage.scm: Update to use the new Cabal parsing module.
>
> * tests/hackage.scm: Update tests for private functions.
>
> * guix/scripts/import/hackage.scm: Add new '--cabal-environment' option.
>
> * doc/guix.texi: ... and document it.
>
> * Makefile.am (MODULES): Add 'guix/import/cabal.scm',
> 'guix/import/hackage.scm' and 'guix/scripts/import/hackage.scm'.
> (SCM_TESTS): Add 'tests/hackage.scm'.
No newlines between entries.
[...]
> +;; This record stores the state information needed during parsing of Cabal
> +;; files.
> +(define-record-type <cabal-parse-state>
> + (make-cabal-parse-state lines minimum-indent indents conditionals
> + true-group? true-group false-group
> + true-group?-stack true-group-stack
> false-group-stack)
[...]
> + (make-cabal-parse-state lines -1 '() '() #t '() '() '() '()
> '())))
I’m not claiming this must done now, but it may improve readability to
use ‘define-record-type*’. That way, with appropriate field default
values, one could write something like:
(cabal-parse-state
(lines lines))
That would also allow the use of ‘inherit’, which is slightly less
verbose than ‘set-fields’. WDYT?
Besides, could you add comments to explain the meaning of the various
fields? I’m particularly curious about ‘true-group?’ & co. ;-)
> +(define (parse-cabal result)
> + "Parse a Cabal file and append its content to RESULT (a list). Return the
> +updated result as a monadic value in the state monad."
> + (mlet* %state-monad ((state (current-state)))
> + (match state
> + (($ <cabal-parse-state> lines minimum-indent indents conditionals
> + true-group? true-group false-group
> + true-group?-stack true-group-stack
> + false-group-stack)
> + (let*-values
> + (((current-indent line)
> + (if (null? lines)
> + (values 0 "")
> + (line-indentation+rest (first lines))))
> + ((next-line-indent next-line)
> + (if (or (null? lines) (null? (cdr lines)))
> + (values 0 "")
> + (line-indentation+rest (second lines))))
> + ((key-value-rx-result) (has-key? line))
> + ((end-of-file?) (null? lines))
> + ((is-simple-key-value?) (and (= next-line-indent current-indent)
> + key-value-rx-result))
> + ((is-multi-line-key-value?) (and (> next-line-indent
> current-indent)
> + key-value-rx-result))
> + ((key) (and=> key-value-rx-result
> + (lambda (rx-res)
> + (string-downcase (match:substring rx-res 1)))))
> + ((value) (and=> key-value-rx-result (cut match:substring <> 2))))
> + (cond
> + (end-of-file? (return (reverse result)))
> + (is-simple-key-value?
> + (>>= (state-add-entry (list key `(,value)) result (cdr lines))
> + parse-cabal))
> + (is-multi-line-key-value?
> + (let*-values
> + (((value-lst lines)
> + (multi-line-value (cdr lines)
> + (if (string-null? value) '() `(,value)))))
> + (>>= (state-add-entry (list key value-lst) result lines)
> + parse-cabal)))
> + (else ; it's a section
> + (let* ((section-header (string-tokenize (string-downcase line)))
> + (section-type (string->symbol (first section-header)))
> + (section-name (if (> (length section-header) 1)
> + (second section-header)
> + "")))
> + (mbegin %current-monad
> + (set-current-state
> + (set-fields state
> + ((cabal-parse-state-minimum-indent)
> current-indent)
> + ((cabal-parse-state-lines) (cdr lines))))
> + (>>=
> + (>>= (parse-cabal-section '())
> + (lambda (section-contents)
> + (mlet* %state-monad ((state (current-state)))
> + (mbegin %current-monad
> + (set-current-state
> + (set-fields state
> + ((cabal-parse-state-minimum-indent)
> -1)))
> + (return
> + (cons (append
> + (if (string-null? section-name)
> + (list 'section section-type)
> + (list 'section section-type
> section-name))
> + (list section-contents))
> + result))))))
> + parse-cabal))))))))))
This procedure is intimidating. I think this is partly due to its
length, to the big let-values, the long identifiers, the many local
variables, nested binds, etc.
Would it be possible to create auxiliary procedures that would help?
I’m thinking of procedures that take a <cabal-parse-state> and return
the necessary info, like ‘cabal-parse-state-indentation’,
‘cabal-parse-state-key’, ‘cabal-parse-state-multiline?’,
‘cabal-parse-state-eof?’, etc. WDYT?
Also, please try hard to avoid car and cdr and use ‘match’ instead.
(BTW it’s a good idea to use the state monad here!)
> +(define (parse-cabal-section result)
> + "Parse a section of a cabal file and append its content to RESULT (a list).
> +Return the updated result as a value in the state monad."
> + (mlet* %state-monad ((state (current-state)))
> + (match state
> + (($ <cabal-parse-state> lines minimum-indent indents conditionals
> + true-group? true-group false-group
> + true-group?-stack true-group-stack
> + false-group-stack)
> + (let*-values
> + (((current-indent line)
> + (if (null? lines)
> + (values 0 "")
> + (line-indentation+rest (first lines))))
> + ((next-line-indent next-line)
> + (if (or (null? lines) (null? (cdr lines)))
> + (values 0 "")
> + (line-indentation+rest (second lines))))
> + ((key-value-rx-result) (has-key? line))
> + ((end-of-section?) (or (<= current-indent minimum-indent)
> + (null? lines)))
> + ;; If this is the last line of the section, then it can't be the
> + ;; start of a conditional or an 'else'.
> + ((last-line-of-section?) (<= next-line-indent minimum-indent))
> + ((is-simple-key-value?) (or (and (= next-line-indent
> current-indent)
> + key-value-rx-result)
> + (and (pair? conditionals)
> + (= next-line-indent (first
> indents))
> + (string-prefix? "else"
> next-line))))
> + ((is-multi-line-key-value?) (and (> next-line-indent
> current-indent)
> + key-value-rx-result))
> + ((end-of-cond?)
> + (and (pair? conditionals)
> + (or (and (= next-line-indent (first indents))
> + (not (string-prefix? "else" next-line)))
> + (< next-line-indent (first indents)))))
> + ((is-else?) (and (pair? conditionals)
> + (= current-indent (first indents))
> + (string-prefix? "else" line)))
> + ((condition) (cabal-conditional-line->sexp line))
> + ((key) (and=> key-value-rx-result
> + (lambda (rx-res)
> + (string-downcase (match:substring rx-res 1)))))
> + ((value) (and=> key-value-rx-result
> + (cut match:substring <> 2))))
> + (cond
> + (end-of-section?
> + (if (pair? indents)
> + (state-reduce-indentation (1- (length indents)) #f result
> lines)
> + (return result)))
> + (last-line-of-section?
> + (if (pair? indents)
> + (state-reduce-indentation
> + (1- (length indents)) (list key `(,value)) result (cdr
> lines))
> + (mbegin %current-monad
> + (set-current-state
> + (set-fields state ((cabal-parse-state-lines) (cdr lines))))
> + (return (cons (list key `(,value)) result)))))
> + (is-simple-key-value?
> + (>>= (state-add-entry (list key `(,value)) result (cdr lines))
> + parse-cabal-section))
> + (is-multi-line-key-value?
> + (let*-values
> + ;; VALUE-LST is the full multi-line value and LINES are the
> + ;; remaining lines to be parsed (from the line following the
> + ;; multi-line value). We need to check if we are at the end
> of
> + ;; a conditional or at the end of the section.
> + (((value-lst lines)
> + (multi-line-value (cdr lines)
> + (if (string-null? value) '() `(,value))))
> + ((ind line) (if (null? lines)
> + (values 0 "")
> + (line-indentation+rest (first lines))))
> + ((end-of-cond?) (and (pair? conditionals)
> + (or (and (= ind (first indents))
> + (not (string-prefix? "else"
> line)))
> + (< ind (first indents)))))
> + ;; If IND is not in INDENTS, assume that we are at the end of
> + ;; the section.
> + ((idx) (or (and=>
> + (list-index (cut = ind <>) indents)
> + (cut + <> (if (string-prefix? "else" line) -1
> 0)))
> + (1- (length indents)))))
> + (if end-of-cond?
> + (>>= (state-reduce-indentation idx (list key value-lst)
> + result lines)
> + parse-cabal-section)
> + (>>= (state-add-entry (list key value-lst) result lines)
> + parse-cabal-section))))
> + (end-of-cond?
> + (let ((idx (+ (list-index (cut = next-line-indent <>) indents)
> + (if (string-prefix? "else" next-line) -1 0))))
> + (>>= (state-reduce-indentation idx (list key `(,value)) result
> + (if (pair? lines) (cdr lines) '()))
> + parse-cabal-section)))
> + (is-else?
> + (mbegin %current-monad
> + (set-current-state
> + (set-fields state
> + ((cabal-parse-state-lines) (cdr lines))
> + ((cabal-parse-state-true-group?) #f)))
> + (parse-cabal-section result)))
> + (condition
> + (mbegin %current-monad
> + (state-add-conditional condition current-indent)
> + (parse-cabal-section result)))))))))
This one is also very intimidating and it seems to duplicate some of the
code of the previous one, so maybe the propose <cabal-state> procedures
will help here as well.
> +(define (state-reduce-indentation index entry result lines)
s/reduce/decrease/
> + "Given RESULT, if ENTRY is not #f, add it as appropriate and return the
> +updated result as a value in the state monad. Update the state according to
> +the reduction of the indentation level specified by INDEX, an index of an
s/reduction/decrease/
> +entry in the 'indentations' field of the state.
Could you explain what RESULT and ENTRY are? Also, it seems to do two
different things: compute a value function of RESULT and ENTRY, and
update the current indentation. Should it be two separate procedures?
> As an example, if there are
> +two nested conditional levels, the first starting at indentation 2 and the
> +second at indentation 4, then the 'indentations' field of state is '(4 2) and
> +an INDEX value of 0 means that the second conditional is finished. Set the
> +remaining lines to be parsed to LINES."
> + (lambda (state)
> + (match state
> + (($ <cabal-parse-state> _ minimum-indent indents conditionals
> + true-group? true-group false-group
> + true-group?-stack true-group-stack
> + false-group-stack)
> + ;; The suffix '-d' stays for 'drop'.
> + (let*-values (((inds-d inds) (split-at indents (1+ index)))
> + ((conds-d conds) (split-at conditionals (1+ index)))
> + ((t-g?-s-d t-g?-s)
> + (if (> (length true-group?-stack) index)
> + (split-at true-group?-stack (1+ index))
> + (values true-group?-stack '())))
> + ((t-g-s-d t-g-s)
> + (if (> (length true-group-stack) index)
> + (split-at true-group-stack (1+ index))
> + (values true-group-stack '())))
> + ((f-g-s-d f-g-s)
> + (if (> (length false-group-stack) index)
> + (split-at false-group-stack (1+ index))
> + (values false-group-stack '())))
> + ((t-g?)
> + (if (> (length true-group?-stack) index)
> + (last t-g?-s-d) #t))
> + ((t-g) (if (and true-group? entry)
> + (cons entry true-group)
> + true-group))
> + ((f-g) (if (or true-group? (not entry))
> + false-group
> + (cons entry false-group)))
> + ((res) result))
> + (let reduce-by-one ((conds-d conds-d) (t-g t-g) (f-g f-g) (res res)
> + (t-g?-s-d t-g?-s-d) (t-g-s-d t-g-s-d)
> + (f-g-s-d f-g-s-d))
This is somewhat scary ;-) but I’m not sure how to improve it.
> + (values '*unspecified*
Did you mean *unspecified*, without the quote, which evaluates to *the*
unspecified value?
> +(define-record-type <cabal-package>
> + (make-cabal-package name version license home-page source-repository
> + synopsis description
> + executables lib test-suites
> + flags eval-environment)
> + cabal-package?
> + (name cabal-package-name)
> + (version cabal-package-version)
> + (license cabal-package-license)
> + (home-page cabal-package-home-page)
> + (source-repository cabal-package-source-repository)
> + (synopsis cabal-package-synopsis)
> + (description cabal-package-description)
> + (executables cabal-package-executables)
> + (lib cabal-package-library) ; 'library' is a Scheme keyword
There are no keyboards in Scheme. :-)
[...]
> + (define (impl haskell)
> + (let* ((haskell-implementation (or (assoc-ref env "impl") "ghc"))
> + (impl-rx-result-with-version
> + (string-match "([a-zA-Z0-9_]+)-([0-9.]+)"
> haskell-implementation))
> + (impl-name (or (and=> impl-rx-result-with-version
> + (cut match:substring <> 1))
> + haskell-implementation))
> + (impl-version (and=> impl-rx-result-with-version
> + (cut match:substring <> 2)))
> + (cabal-rx-result-with-version
> + (string-match "([a-zA-Z0-9_-]+) *([<>=]+) *([0-9.]+) *" haskell))
> + (cabal-rx-result-without-version
> + (string-match "([a-zA-Z0-9_-]+)" haskell))
> + (cabal-impl-name (or (and=> cabal-rx-result-with-version
> + (cut match:substring <> 1))
> + (match:substring
> + cabal-rx-result-without-version 1)))
> + (cabal-impl-version (and=> cabal-rx-result-with-version
> + (cut match:substring <> 3)))
> + (cabal-impl-operator (and=> cabal-rx-result-with-version
> + (cut match:substring <> 2)))
> + (comparison (and=> cabal-impl-operator
> + (cut string-append "string" <>))))
Again I feel we need one or more auxiliary procedures and/or data types
here to simplify this part (fewer local variables), as well as shorter
identifiers. WDYT?
> --- a/tests/hackage.scm
> +++ b/tests/hackage.scm
> @@ -63,16 +63,13 @@ executable cabal
> ")
>
The existing tests here are fine, but they are more like integration
tests (they test the whole pipeline.) Maybe it would be nice to
directly exercise ‘read-cabal’ and ‘eval-cabal’ individually?
Thanks for all of this!
Ludo’.
[Prev in Thread] |
Current Thread |
[Next in Thread] |
- Re: hackage importer,
Ludovic Courtès <=