lout2bibtex -- just to make it certainly available
From: Matej Cepl
Subject: lout2bibtex -- just to make it certainly available
Date: Mon, 4 Sep 2000 09:30:46 +0400 (MSD)
Hi,
just when I was looking for a lout2bibtex script, I couldn't find anything
simple enough for a lawyer. Later on I found one in awk (to the best of
my knowledge not available on the Web). For future unfortunates, I would
like to send it to this list to make it easily available to anybody (and
to myself, if I again forget where I put my scripts :-).
Have a nice day
Matej
#!/bin/gawk -f
# Convert lout entries into bibtex style entries, taking care of some
# lout -> LaTeX/TeX text changes
#
# David Middleton (address@hidden)
# 10 Dec 1996
#
# Based on bibtex2refer by
# Doug Arnold (address@hidden)
# 10 May 1995
#
# who says (quote):
#
# Inspired by bibtex2ref by Bernd Fritzkes
# (address@hidden)
#
#
# This file can be freely copied, and changed
# so long as you preserve my name on it
# and indicate any changes you make.
# !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
# No guarantee of correctness is given and no 'support' is provided
# !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
#
# unquote. Too many changes to annotate sensibly. Diffs available on request.
#
#
# usage: lout2bibtex file.ld > file.bib
#
# for testing use:
#
# lout2bibtex test=1 file.ld > file.bib
#
#
# If lout2bibtex encounters unknown field names, it will ignore them but store
# their names in a list, which can be displayed after the conversion process
# (cf. testing).
#
# With this list the program can easily be extended by adding entries to the
# associative array "bibtex".
#
#
#################################################################################
#
# The basic procedure is to read through the lout file, taking entries to begin
# with a line like { @Reference... and to end with a blank line.
# The entry is read into an array indexed by the lout field name (e.g. @Author).
#
# When an entry is complete (i.e. when the next one starts or the file ends),
# the individual field values are assembled and translated into TeX (taking
# care of fonts, etc.), and the array is printed out using BibTeX fields.
#
# Notes:
# This is basically a hack. It works with some example files I have, but
# probably won't work with all. It does not check for matching {} etc.
#
# Not all of the original bibtex2refer code has been translated yet, so some
# bits simply remain commented out. Special characters are not handled yet.
#
#######################################################################
# ----------------- function definitions -----------------------------#
#######################################################################
# edit (i.e. format) and output an entry
function output_entry(entry,    f){
    if (entry["@Tag"]){                     # ignore empty ref generated by main prog
        doneentries++
        printf "\n"
        gsub(/\{/, "", entry["@Type"])      # lout citation types are single words,
        gsub(/\}/, "", entry["@Type"])      # so remove any braces
        gsub(/ /, "", entry["@Type"])       # or spaces
        printf "%s{", citetype[entry["@Type"]]
        printf "%s,\n", bibtexify(entry["@Tag"])
        for ( f in entry ){
            if (!(unknown[f])){
                # sub(/, *$/ , "", entry[f])   # deleting commas at end of line
                # cash out abbreviations:
                # for (a in abbrev){
                #     if ( entry[f] == a ){ entry[f] = abbrev[a] }
                # }
                # NOTE: f is a lout field name such as "@Author", so the three
                # tests below never match and authors, editors and pages are
                # printed by the generic branch (fix_authors and fix_editors
                # still emit refer-style %A/%E lines).
                if ( f == "author" ){ fix_authors( entry[f] ) }    # fix and print authors
                if ( f == "editor" ){ fix_editors( entry[f] ) }    # fix and print editors
                if ( f == "pages" ){ sub("--", "-", entry[f]) }    # is this necessary ??
                if (! ( f == "author" || f == "editor" )){
                    printf "    %s = {%s},\n", bibtex[f], bibtexify(entry[f])
                }
            }
            delete entry[f]
        }
        printf "}\n"
    }
}
# bibtexify:
# turn the text in entries into bibtex
function bibtexify( ent ) {
    sub(/\}[ ]*$/, "", ent)      # deleting the brace closing the field value
    sub(/^[ ]*\{/, "", ent)      # deleting the brace opening the field value
    # fix TeX accents (no braces):
    # ent = tex_accent( ent )
    # fix accents, font changes, etc. from LaTeX:
    ent = fix_fonts( ent )
    # the non-breaking space character ~ is also TeX's tie, so it is
    # passed through unchanged (bibtex2refer turned it into nroff's \0,
    # which TeX does not understand)
    sub(/"\) *$/, "", ent)       # deleting possible record final ")"
    sub(/" *$/, "", ent)         # deleting " at end of line
    sub(/^[ ]*"/, "", ent)       # deleting " at start of line
    sub(/^[ ]*/, "", ent)        # delete leading spaces
    sub(/[ ]*$/, "", ent)        # delete trailing spaces
    # this does not look very good, so don't do it:
    # gsub(/--/, "\\(em", ent)   # dashes, etc.
    # gsub(/--/, "-", ent)       # might be better?
    gsub(/\\-/, "-", ent)        # hyphenation commands
    gsub(/\\\//, "", ent)        # the \/ construct
    return( ent )
}
# fix_fonts:
# translate the lout font changes @I, @F and @B (with or without spaces
# before the brace) into the TeX groups {\it ...}, {\tt ...} and {\bf ...}.
# The commented-out code below comes from fix_curlies, which dealt with
# stuff set off by non-intersecting curly brackets, such as font changes
# and LaTeX accents: it found a minimal string of the form {....} (the
# "target"), extracted it, edited it, and spliced it back in.
function fix_fonts( ln, pre, post, target ){
    gsub(/@I[ ]*\{/, "{\\it ", ln)     # italic
    gsub(/@F[ ]*\{/, "{\\tt ", ln)     # typewriter
    gsub(/@B[ ]*\{/, "{\\bf ", ln)     # bold
    # if ( sub(/{\\sc[ ]*/, "\\s-2", target ) ){
    #     sub(/}/, "\\s+2", target) }
    # sub(/\\i/, "i", target)          # \i construct
    # sub(/\\o/, "o", target)          # norwegian "o"
    # b. LaTeX accents: accents are marked in the "pre" part,
    # and must be transposed to "post" part
    # if ( sub(/\\'$/, "", pre) ){ post = "\\\*'" post }
    # if ( sub(/\\`$/, "", pre) ){ post = "\\\*`" post }
    # if ( sub(/\\\^/, "", pre) ){ post = "\\\*\^" post }
    # if ( sub(/\\~$/, "", pre) ){ post = "\\\*~" post }
    # if ( sub(/\\"$/, "", pre) ){ post = "\\\*:" post }
    # if ( sub(/\\c$/, "", pre) ){ post = "\\\*," post }
    return(ln)
}
# tex_accent:
# handle TeX accents, e.g. \'e, \~n, etc. (i.e. no braces)
function tex_accent( ln,    pre, post, accent, letter ){
    sub(/\\i/, "i", ln)                            # the \i construct
    while ( match(ln, /\\['`^~"c][A-Za-z]/) ){
        pre = substr(ln, 1, RSTART-1)              # before accent+letter
        post = substr(ln, RSTART+RLENGTH)          # after accent+letter
        accent = substr(ln, RSTART, RLENGTH-1)     # accent itself
        sub(/\\/, "&*", accent)                    # insert "*"
        sub(/c/, ",", accent)                      # cedilla c -> ","
        sub(/"/, ":", accent)                      # umlaut
        letter = substr(ln, RSTART+RLENGTH-1, 1)   # accented letter
        ln = pre letter accent post
    }
    return(ln)
}
# handle multiple editors, and deal with editors of collections
# when the whole book is being referenced (i.e. they are the
# "authors" as far as refer is concerned)
function fix_editors(ent,    eds, ed, i){
    eds = split(ent, ed, "[ ]+and[ ]+")
    if ( entrytype == "@book" ){        # add eds./ed. at end and treat as authors
        if ( eds > 1 ){ sub(/"$/, ", eds.\"", ent) }
        else { sub(/"$/, ", ed.\"", ent) }
        fix_authors( ent )
    }   # otherwise, keep them on one line
        # but add commas and "and" as necessary
    else {
        printf "%%E %s", bibtexify( ed[1] )         # first (or single) ed.
        if ( eds >= 2 ){                             # more than one ed
            for ( i = 2 ; i < eds ; i++ ){
                printf ", %s", bibtexify( ed[i] )    # up to penult ed
            }
            printf " and %s", bibtexify( ed[eds] )   # last ed
        }
        printf "\n"
    }
}
# fix_authors:
# deal with all the complexities of how BibTeX represents
# multiple authors by separating them with "and"
# (still prints refer-style "%A" lines, one per author)
function fix_authors(ent,    author, authmax, a){
    # split the authors as author[1], author[2], etc.
    authmax = split(ent, author, "[ ]+and[ ]+")
    for ( a = 1 ; a <= authmax ; a++ ){
        printf "%%A %s\n", fix_name( bibtexify( author[a] ))
    }
}
# The complexities of authors' names:
# deal with the complexities of each author, assumed to be
# a piece of TeX text by now; it consists of "components",
# separated by commas (e.g. "Andrews, III, Avery")
# these components may need to be reordered
# (this will go wrong with "J. Smith, Jr. III", no doubt)
function fix_name( author,    last, auth, name, n ){
    last = split(author, name, ", *")      # split into component names
    if ( last == 1 ){ return(author) }     # no commas, no problems
    # similarly if the last component is Jr. or something
    if ( name[last] in Jrs ){ return(author) }
    auth = name[last]                       # final component first
    for ( n = 1 ; n < last ; n++ ){
        if ( name[n] in Jrs ){ auth = auth ", " name[n] }
        else { auth = auth " " name[n] }
    }
    return(auth)
}
#--------------Program Begins----------------------------------------
BEGIN {
    FS = " "
    # # citekey is not really a bibtex field, I use it
    # # to store the "key" or \cite of the record
    # bibtex["@Tag"] = "citekey"
    # lout keywords and associated bibtex fields
    bibtex["@Address"] = "address"
    bibtex["@Author"] = "author"
    bibtex["@InAuthor"] = "editor"
    bibtex["@InTitle"] = "booktitle"
    bibtex["@Institution"] = "institution"
    bibtex["@Journal"] = "journal"
    bibtex["@Keywords"] = "keywords"
    bibtex["@Note"] = "note"
    bibtex["@Number"] = "number"
    bibtex["@Organization"] = "organization"
    bibtex["@Pages"] = "pages"
    bibtex["@Publisher"] = "publisher"
    bibtex["@Title"] = "title"
    bibtex["@TitleNote"] = "series"
    bibtex["@Volume"] = "volume"
    bibtex["@Year"] = "year"
    #
    # Citation types - not all the mappings chosen may be the most appropriate
    citetype["Article"] = "@article"
    citetype["PhDThesis"] = "@phdthesis"
    citetype["MastersThesis"] = "@mastersthesis"
    citetype["Book"] = "@book"
    citetype["Proceedings"] = "@proceedings"
    citetype["TechReport"] = "@techreport"
    citetype["Misc"] = "@misc"
    citetype["InBook"] = "@inbook"
    citetype["InProceedings"] = "@inproceedings"
    #
    # words to be treated specially in names:
    Jrs["ed."]
    Jrs["eds."]
    Jrs["III"]
    Jrs["II"]
    Jrs["Jr"]
    Jrs["Jr."]
}
# lines that start with comments are just copied
# (after changing the comment character):
# *Note*, other kinds of comment are not handled!
/^[\t ]*#/{ sub(/^[\t ]*#/, "%"); print; next }
# handle abbreviations (string definitions)
# *note* abbreviations are Case Sensitive!
/^[ address@hidden/{
    sub(/^[ address@hidden/, "")     # discard "@string" and left brace
    sub(/}[ ]*$/, "")           # discard right brace and trailing spaces
    split($0, st, "[ ]*=[ ]*")  # split line
    abbrev[st[1]] = st[2]
    next
}
# the start of a record triggers output of the previous record
/^[ address@hidden/{ output_entry(entry) }
# so does end of file
END { output_entry(entry) }
# lout entries begin with "{ @Reference", and end with a blank line
/^[ address@hidden/ , /^$/{
    # if ( match($0, /[ address@hidden/) == 1 ){
    #     # establish type of entry:
    #     split($1, x, "{|," )
    #     entrytype=x[1]
    #     # get the citation label:
    #     entry["citekey"]=x[2]
    #     next
    # }
    sub(/^[ ]*/, "")                # delete leading spaces
    sub(/[ ]*$/, "")                # delete trailing spaces
    # split field at first "{" (index() returns 0 if there is none)
    pos = index($0, "{")
    # if there is a brace, the text before it names a new Field
    # and the rest is its value
    if ( pos > 0 ){
        CF = substr($0, 1, pos-1)
        sub(/[ ]*$/, "", CF)        # delete trailing spaces
        # if ( entry[CF] ) { multdefs[entry["citekey"]] = CF }
        # if ( ! bibtex[CF] ){ unknown[CF] = unknown[CF] " " entry["citekey"] }
        if ( ! bibtex[CF] ){ unknown[CF] = unknown[CF] " " }
        entry[CF] = substr($0, pos)
    }
    # otherwise, we continue the current Field (skipping the blank
    # line and the "}" that close the record)
    else if ( $0 != "" && $0 != "}" ){ entry[CF] = entry[CF] " " $0 }
}
END {
    printf "\n\n"
    # do diagnostics
    if ( test ){
        printf "%d Records Processed.\n", doneentries
        printf "Potential Problems:\n"
        for ( f in multdefs ){
            printf "--- %-16s :\tduplicate keyword: \t%s\n", f, multdefs[f]
        }
        for ( f in unknown ){
            if ( unknown[f] ){
                printf "--- unknown keyword:\t%s in: %s\n", f, unknown[f]
            }
        }
    }
}
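The field handling in the script's main loop (split each line of an entry at its first "{", treat a line that does not start a new field as the continuation of the current one) can be tried in isolation. The following is a minimal sketch, assuming a POSIX awk called from sh; the function name lout_fields and its tab-separated output are made up for illustration and are not part of lout2bibtex.

```shell
# Hypothetical helper (not part of lout2bibtex): read one lout
# @Reference entry on stdin and print "FieldName<TAB>value" lines.
lout_fields() {
    awk '
    {
        sub(/^[ \t]+/, ""); sub(/[ \t]+$/, "")   # trim the line
        if ($0 == "" || $0 == "}") next          # blank line or record end
        pos = index($0, "{")
        if (pos > 0 && substr($0, 1, 1) == "@") {
            # "@Field { value" starts a new field
            cf = substr($0, 1, pos - 1)
            sub(/ +$/, "", cf)
            order[++n] = cf
            val[cf] = substr($0, pos)
        } else if (n > 0) {
            # a line without a field name continues the current field
            val[cf] = val[cf] " " $0
        }
    }
    END {
        for (i = 1; i <= n; i++) {
            v = val[order[i]]
            sub(/^[{] */, "", v)                 # strip the opening brace
            sub(/ *[}]$/, "", v)                 # and the closing one
            printf "%s\t%s\n", order[i], v
        }
    }'
}
```

Fed the lines `{ @Reference`, `@Tag { knuth84 }`, `@Title { The` and `TeXbook }`, it prints the two fields @Tag and @Title, with the title joined into "The TeXbook". Like the script, it would mistake a continuation line that itself begins with a lout symbol such as `@I {` for a new field.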
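The reordering done by fix_name (move the final comma-separated component to the front, keep suffixes such as "Jr." attached with a comma) can be sketched on its own as below. This assumes a POSIX awk; reorder_name is a hypothetical name, and its suffix table is only a subset of the script's Jrs array.

```shell
# Hypothetical helper: reorder the comma-separated components of one
# name the way fix_name does ("Knuth, Donald E." -> "Donald E. Knuth").
reorder_name() {
    printf '%s\n' "$1" | awk '
    BEGIN {
        # suffixes that stay attached with a comma (a subset of Jrs)
        jrs["Jr"] = 1; jrs["Jr."] = 1; jrs["II"] = 1; jrs["III"] = 1
    }
    {
        last = split($0, name, ", *")
        # no commas, or a trailing suffix: leave the name alone
        if (last == 1 || (name[last] in jrs)) { print; next }
        auth = name[last]                        # final component first
        for (n = 1; n < last; n++) {
            if (name[n] in jrs) auth = auth ", " name[n]
            else auth = auth " " name[n]
        }
        print auth
    }'
}
```

For example, `reorder_name 'Andrews, III, Avery'` prints `Avery Andrews, III`. For .bib output the reordering is optional, since BibTeX itself parses the "Last, First" and "Last, Jr, First" forms.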
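fix_fonts turns the lout font changes @I, @B and @F into the TeX groups {\it ...}, {\bf ...} and {\tt ...}. A minimal sketch of the same translation, assuming a POSIX awk (lout_fonts is a hypothetical name), splices the replacement together with substr() instead of using gsub():

```shell
# Hypothetical helper: translate lout @I, @B and @F font changes into
# TeX groups, in the spirit of fix_fonts.
lout_fonts() {
    printf '%s\n' "$1" | awk '
    BEGIN { font["I"] = "it"; font["B"] = "bf"; font["F"] = "tt"; bs = "\\" }
    {
        out = ""; rest = $0
        # each match is @I, @B or @F, optional spaces, "{", optional spaces
        while (match(rest, /@[IBF] *[{] */)) {
            cmd = substr(rest, RSTART + 1, 1)    # I, B or F
            out = out substr(rest, 1, RSTART - 1) "{" bs font[cmd] " "
            rest = substr(rest, RSTART + RLENGTH)
        }
        print out rest
    }'
}
```

For example, `lout_fonts 'Using @B{bold} and @F {code}'` prints `Using {\bf bold} and {\tt code}`. Building the output by hand sidesteps the differences between awk implementations in how a backslash in gsub() replacement text is treated.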
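output_entry strips braces and spaces from the lout @Type value and looks the result up in the citetype table, so an unknown type currently produces an entry that starts with a bare "{". A sketch of the same lookup with a fallback follows; the helper name bibtex_type and the choice of @misc as the fallback are assumptions, not part of the script.

```shell
# Hypothetical helper: normalise a lout @Type value and map it to a
# BibTeX entry type, mirroring output_entry and the citetype table.
bibtex_type() {
    printf '%s\n' "$1" | awk '
    BEGIN {
        split("Article PhDThesis MastersThesis Book Proceedings TechReport Misc InBook InProceedings", t, " ")
        for (i in t) citetype[t[i]] = "@" tolower(t[i])
    }
    {
        gsub(/[{} ]/, "")                      # drop braces and spaces
        if ($0 in citetype) print citetype[$0]
        else print "@misc"                     # assumed fallback type
    }'
}
```

For example, `bibtex_type '{ Book }'` prints `@book`. The standard BibTeX styles define @misc, so falling back to it keeps the output parseable when a lout type has no mapping.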