[Bibulus-dev] Formatting bibliography entries
From: Thomas Widmann
Subject: [Bibulus-dev] Formatting bibliography entries
Date: Thu, 06 May 2004 20:14:59 +0100
User-agent: Gnus/5.1006 (Gnus v5.10.6) Emacs/21.3 (gnu/linux)
Hello, everybody,

Although what follows is a bit long, I hope you will all read it and
comment on it. It concerns one of the most difficult areas to get
right.
BibTeX styles tend to be organised along the following lines: There is
a function for each entry type, and each function contains a sequence
of formatting instructions. An example from plain.bst in a syntax
that should make it easier to understand:

    FUNCTION {article}
    {
      bibitem
      authors
      new_block
      title
      new_block
      if missing(crossref)
      {
        emphasize(journal)
        vol_num_pages
        date_as_year
      }
      else
      {
        article_crossref
        pages
      }
      new_block
      note
      fin_entry
    }

I see two main problems with this approach:
1) Many entry types are very similar, but that is not expressed at all
(except for certain functions like vol_num_pages that are used by
more than one entry-formatting function).
2) In the example above, there is one *if* clause. However, if all
the bells and whistles of Custom-bib are implemented, the basic
structure is totally obscured by nested conditionals.
All of this means it becomes very difficult to implement and maintain.
I learnt this the hard way half a year ago when I tried to make one
massive formatting function. Basically I ended up with nearly a
thousand lines of stuff like this:

    if ($self->{STYLE}{datepos} ne 'afterauthor'
        and $self->{STYLE}{datepos} ne 'afternotes'
        and $self->{STYLE}{datepos} ne 'endbutjournal'
        and $self->{STYLE}{yearafternumber} ne 'space'
        and $self->{STYLE}{yearafternumber} ne 'comma'
        and $self->{STYLE}{yearaftervolume} ne 'spaceparentheses'
        and $self->{STYLE}{yearaftervolume} ne 'parentheses') {
        $self->formatdateasyear;
    }

I'm therefore increasingly of the opinion that we have to come up with
a new formatting model. What do you all think about the following?
Basically, every element that is output has three associated
functions: location, punctuation and formatting. That is, to deal
with the <url> element, three functions would be involved:
sub l_url { ... }
The location function should determine where within the reference
this element should be placed (if output at all). I'm not sure how
this should work -- I presume the end result should be a list, e.g.,
['author', 'year', 'title', 'journal', 'pages', 'note'], but I'm not
at all sure what the best way to arrive at this would be.
I can think of at least two approaches:
- Each l_ function could return relative elements, such as 'before
author', 'after journal' or 'at end', and we would then have to
write a function to make sense of this. However, this makes it
very important to call the l_ functions in the right order (to
take a simple case, just imagine if they all want to go 'at
end').
- Each l_ function could return a number, and afterwards we would
just sort on this field. That is, l_author might return 10,
l_title 500, and l_year 250 or 850 depending on the formatting
style. In this case, we get problems if two l_ functions return
the same number, but that need never happen. I tend to favour
this approach, not least because it makes it very easy to add
additional fields.
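
A minimal sketch of the numeric approach, under assumptions of my own:
the l_ functions take a style hash, the 'datepos' option is borrowed
from the conditional quoted earlier, and the numbers for journal, pages
and note are invented for illustration. Ties are broken by element name
here, which is only one possible policy.

```perl
use strict;
use warnings;

# Hypothetical location functions: each returns a sort key.
sub l_author  { 10 }
sub l_title   { 500 }
sub l_journal { 600 }
sub l_pages   { 700 }
sub l_note    { 900 }
sub l_year {
    my ($style) = @_;
    # The year moves depending on the formatting style.
    return $style->{datepos} eq 'afterauthor' ? 250 : 850;
}

my %location = (
    author => \&l_author, title => \&l_title, journal => \&l_journal,
    pages  => \&l_pages,  note  => \&l_note,  year    => \&l_year,
);

sub element_order {
    my ($style) = @_;
    my %key;
    for my $name (keys %location) {
        my $k = $location{$name}->($style);
        $key{$name} = $k if defined $k;    # undef means "do not output"
    }
    # Numeric sort on the keys; equal keys fall back to the element name
    # so that the result is at least deterministic.
    return sort { $key{$a} <=> $key{$b} or $a cmp $b } keys %key;
}

print join(', ', element_order({ datepos => 'afterauthor' })), "\n";
# prints: author, year, title, journal, pages, note
```

Adding a field then only means adding one more l_ function with a
number that falls between its intended neighbours.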
sub p_url { ... }
This function should basically return the desired punctuation before
and after this element (by punctuation I mean 'new block', 'new
sentence', 'comma' and 'space' and such things). There would be a
hierarchy, so that if the element to the left asks for a new block
while the one to the right just wants a comma, the new block would
win.
I'm not sure whether punctuation really needs separate functions, but
it doesn't seem to fit in well with the location functions.
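
The hierarchy could be sketched roughly as follows. The ranking of the
four kinds of punctuation, the (before, after) return convention and
the p_title and p_journal examples are all assumptions made for this
sketch; the only rule taken from the description above is that a new
block beats a comma.

```perl
use strict;
use warnings;

# Assumed ranking, weakest to strongest.
my %rank = (space => 1, comma => 2, new_sentence => 3, new_block => 4);

# Given what the left element wants after itself and what the right
# element wants before itself, keep the stronger of the two requests.
sub resolve_punct {
    my ($after_left, $before_right) = @_;
    return $rank{$after_left} >= $rank{$before_right}
        ? $after_left
        : $before_right;
}

# Hypothetical p_ functions returning (before, after).
sub p_title   { ('new_block', 'new_block') }
sub p_journal { ('comma',     'comma') }

my (undef, $after_title) = p_title();
my ($before_journal)     = p_journal();

# The title asks for a new block, the journal only for a comma.
print resolve_punct($after_title, $before_journal), "\n";
# prints: new_block
```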
sub f_url { ... }
The formatting function would normally be defined by the output
modules. For instance, Bibulus::LaTeX would probably output the
contents wrapped in \url{...} or perhaps \texttt{...}, while
Bibulus::HTML would wrap it in an <a href="..."></a>.
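
To illustrate, here is a sketch with two stand-in packages. The names
Sketch::LaTeX and Sketch::HTML, the class-method calling convention and
the escape_html helper are hypothetical and do not describe the actual
Bibulus::LaTeX or Bibulus::HTML interfaces.

```perl
use strict;
use warnings;

# Stand-in for a LaTeX output module.
package Sketch::LaTeX;

sub f_url {
    my ($class, $url) = @_;
    return '\url{' . $url . '}';
}

# Stand-in for an HTML output module.
package Sketch::HTML;

# Escape the characters that are special in HTML text and attributes.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

sub f_url {
    my ($class, $url) = @_;
    my $safe = escape_html($url);
    return qq{<a href="$safe">$safe</a>};
}

package main;

# The same element is formatted differently by each backend.
for my $backend ('Sketch::LaTeX', 'Sketch::HTML') {
    print $backend->f_url('http://www.nongnu.org/bibulus/'), "\n";
}
```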
To sum up, there would be three passes when formatting each reference:
1) The l_ functions would be called in arbitrary order, followed by a
sort on their results. The result would be an ordered list of elements.
2) The p_ functions would be called on each element of the list,
inserting punctuation elements into it.
3) The f_ functions would be called in turn on each element, actually
outputting things.
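
Put together, the three passes might look something like the sketch
below. Everything in it is an assumption: the functions are looked up
by name with can(), the entry is a plain hash rather than an XML tree,
the sample data is made up, and every element that has an l_ function
is assumed to have p_ and f_ functions as well.

```perl
use strict;
use warnings;

my %rank = (space => 1, comma => 2, new_block => 3);
my %text = (space => ' ', comma => ', ', new_block => '. ');

# Hypothetical functions for three elements.
sub l_author { 10 }
sub l_year   { 250 }
sub l_title  { 500 }

sub p_author { ('space', 'space') }            # (before, after)
sub p_year   { ('space', 'comma') }
sub p_title  { ('new_block', 'new_block') }

sub f_author { $_[0] }
sub f_year   { '(' . $_[0] . ')' }
sub f_title  { $_[0] }

sub format_entry {
    my ($entry) = @_;

    # Pass 1: collect locations in whatever order the keys come, then sort.
    my %loc;
    for my $name (keys %$entry) {
        my $l = main->can('l_' . $name) or next;   # no l_ function: skip
        my $pos = $l->($entry);
        $loc{$name} = $pos if defined $pos;
    }
    my @order = sort { $loc{$a} <=> $loc{$b} } keys %loc;

    # Pass 2: between neighbours, insert the stronger punctuation request.
    my @tokens;
    for my $i (0 .. $#order) {
        if ($i > 0) {
            my (undef, $after) = main->can('p_' . $order[$i - 1])->();
            my ($before)       = main->can('p_' . $order[$i])->();
            my $punct = $rank{$after} >= $rank{$before} ? $after : $before;
            push @tokens, ['punct', $punct];
        }
        push @tokens, ['element', $order[$i]];
    }

    # Pass 3: format each element and render each punctuation token.
    my $out = '';
    for my $token (@tokens) {
        my ($type, $value) = @$token;
        $out .= $type eq 'punct'
            ? $text{$value}
            : main->can('f_' . $value)->($entry->{$value});
    }
    return $out;
}

print format_entry({ author => 'A. Author', title => 'A Title', year => '2004' }), "\n";
# prints: A. Author (2004). A Title
```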
Taken together, this would mean that if a user wants to output the url
before everything else, they would just write the following (if using
the numerical location option described above):

    sub l_url {
        return 1;
    }

Furthermore, it makes it very easy to extend the system. Let's say
that a user wants an extra field containing the library in which the
item is found. All they would have to do would be to extend the DTD
with <library>, add the relevant data to their XML files, and then
provide the functions l_library, p_library and f_library.
The only major problem I can think of is that it might be difficult to
group and split things -- that is, if two fields should be output
together, or if a field should be output twice. I'm not sure whether
this would be a huge problem, though. Hmmm, thinking about it again,
this might not even be a problem since the grouping and splitting can
be done by manipulating the XML tree before calling the formatting
functions.
I'm looking forward to hearing from you!
/Thomas
--
Thomas Widmann Bye-bye to BibTeX: join the Bibulus project now!
address@hidden <http://www.nongnu.org/bibulus/>
Glasgow, Scotland, EU <http://savannah.nongnu.org/projects/bibulus/>