lout2bibtex -- just to make it certainly available

lout-users

From:	Matej Cepl
Subject:	lout2bibtex -- just to make it certainly available
Date:	Mon, 4 Sep 2000 09:30:46 +0400 (MSD)

Hi,

When I was looking for a lout2bibtex script, I couldn't find anything simple
enough for a lawyer. Later on I found one written in awk (to the best of
my knowledge it is not available on the Web). For future unfortunates, I
am sending it to this list to make it easily available to anybody (and to
myself, if I again forget where I put my scripts :-).

                                Have a nice day

                                                                        Matej

#!/bin/gawk -f 
# Convert lout entries into bibtex style entries, taking care of some
# lout -> LaTeX/TeX text changes
#
# David Middleton (address@hidden)
# 10 Dec 1996
#
# Based on bibtex2refer by
# Doug Arnold (address@hidden)
# 10 May 1995
#
# who says (quote):
#
# Inspired by bibtex2ref by Bernd Fritzkes
# (address@hidden)
# 
#
# This file can be freely copied, and changed
# so long as you preserve my name on it
# and indicate any changes you make.
# !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
# No guarantees for the correctness is given and no 'support' is provided 
# !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
#
# unquote.  Too many changes to annotate sensibly.  Diffs available on request.
#
#
# usage: lout2bibtex file.ld > file.bib
#
# for testing use:
#
# lout2bibtex test=1 file.ld > file.bib 
#
#
# If lout2bibtex encounters unknown field names, it will ignore them but store
# their names in a list, which can be displayed after the conversion process
# (cf. testing).
#
# With this list the program can easily be extended by adding entries to the 
# associative array "bibtex".
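#
# For instance, assuming your database uses a Lout @Edition field (this
# mapping is a hypothetical addition, not part of the table in BEGIN below),
# adding the line
#
#     bibtex["@Edition"] = "edition"
#
# to the BEGIN block would carry it over as a BibTeX "edition" field.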
#
#
#################################################################################
#
# The basic procedure is to read through the lout file, taking entries to begin
# with a line like { @Reference... and to end with a blank line.
# The entry is read into an array indexed by the Lout field name (e.g. @Author).
#
# When the entry is complete (that is, when the next entry starts or the file
# ends), the individual field values are translated into TeX (taking care of
# fonts, etc.) and printed out under the corresponding BibTeX field names.
# 
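# As an illustration (a made-up entry, not one of the author's files), the
# Lout database entry
#
#     { @Reference
#         @Tag { knuth84 }
#         @Type { Book }
#         @Author { Donald E. Knuth }
#         @Title { The TeXbook }
#         @Publisher { Addison-Wesley }
#         @Year { 1984 }
#     }
#
# is meant to come out roughly as follows (the order of the fields depends
# on awk's "for (f in entry)" loop and is not guaranteed):
#
#     @book{knuth84,
#             author = {Donald E. Knuth},
#             title = {The TeXbook},
#             publisher = {Addison-Wesley},
#             year = {1984},
#     }
#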
# Notes:
# This is basically a hack.  It works with some example files I have, but
# probably won't work with all.  It does not check for matching {} etc.
#
# Not all of the original bibtex2refer code has yet been translated, so some bits
# simply remain commented out.  Special characters not handled yet.
#
#######################################################################
# ----------------- function definitions -----------------------------#
#######################################################################

        # edit (i.e. format) and output an entry
function output_entry(entry){
    if (entry["@Tag"]){                 # ignore empty ref generated by main prog
        doneentries++
        printf "\n"
        gsub(/\{/,"",entry["@Type"])    # lout citation types are single words, so remove
        gsub(/\}/,"",entry["@Type"])    # any braces
        gsub(/ /,"",entry["@Type"])             # or spaces
        printf "%s{", citetype[entry["@Type"]]
        printf "%s,\n", bibtexify(entry["@Tag"])
        for ( f in entry ){
        if (!(unknown[f])){
#           sub(/, *$/      , "", entry[f])    # deleting commas at end of line
            # cash out abbreviations:
#           for (a in abbrev){
#                 if ( entry[f] == a ){ entry[f] = abbrev[a] }
#                          }
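            # f holds Lout names such as "@Author", so the three tests below
            # (left over from bibtex2refer) never match: authors and editors
            # are printed like any other field, and "--" in pages is kept.
            if ( f == "author" ){ fix_authors( entry[f]  ) }   # fix and print authors
            if ( f == "editor" ){ fix_editors( entry[f]  ) }   # fix and print editors
            if ( f == "pages"  ){ sub("--", "-", entry[f]) }   # is this necessary ??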
            if (! ( f == "author" || f == "editor" )){ 
                        printf "        %s ",  bibtex[f] 
                        printf "= {" 
                        printf "%s", bibtexify(entry[f])
                        printf "},\n" 
                        }
                }
                delete entry[f]
        }
        printf "}\n"
    }
}


        # bibtexify:
        # turn the text in entries into bibtex
function bibtexify( ent ) {
    sub(/\}[ \t]*$/,"",ent)     # deleting the brace marking end of field
    sub(/^[ \t]*\{/,"",ent)     # deleting the brace marking start of field
        # fix TeX accents (no braces):
#    ent = tex_accent( ent ) 
        # fix accents, font changes, etc. from LaTeX:
    ent = fix_fonts( ent )      
            # "~" is passed through unchanged: TeX, and hence BibTeX, already
            # reads it as a non-breaking space (the ~ -> \0 conversion that
            # stood here came from bibtex2refer and was meant for nroff)
    sub(/"\) *$/    , "", ent)          # deleting possible record final ")" 
    sub(/" *$/      , "", ent)          # deleting " at end of line
    sub(/^[     ]*"/, "", ent)          # deleting " at start of line
    sub(/^[ \t]*/,"",ent)               # delete leading spaces
    sub(/[ \t]*$/,"",ent)               # delete trailing spaces
    # this does not look very good, so don't do it:
    # gsub(/--/,       "\\(em", ent)    # dashes, etc.
    # gsub(/--/,       "-",     ent)    # might be better?
    gsub(/\\-/,      "-",     ent)    # hyphenation commands
    gsub(/\\\//,     "",      ent)    # the \/ construct
    return( ent )
}

        # fix_fonts:
        # translate the Lout font commands @I{...}, @B{...} and @F{...}
        # (optionally with a space before the brace) into the TeX groups
        # {\it ...}, {\bf ...} and {\tt ...}.  The LaTeX accent handling
        # inherited from bibtex2refer remains commented out below.
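        #
        # e.g. fix_fonts("see @I{The TeXbook}") returns "see {\it The TeXbook}"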

function fix_fonts( ln,    pre, post, target ){
        gsub(/@I[ \t]*\{/,  "{\\it ", ln )     # italic
        gsub(/@B[ \t]*\{/,  "{\\bf ", ln )     # bold
        gsub(/@F[ \t]*\{/,  "{\\tt ", ln )     # fixed-width (typewriter)

#        if ( sub(/{\\sc[       ]*/,  "\\s-2", target ) ){
#                       sub(/}/, "\\s+2", target) }
#        sub(/\\i/,      "i", target)  # \i construct
#        sub(/\\o/,      "o", target)  # norwegian "o"
         # b. LaTeX accents: accents are marked in the "pre" part,
         #    and must be transposed to "post" part
#       if ( sub(/\\'$/, "", pre) ){ post = "\\\*'" post }
#       if ( sub(/\\`$/, "", pre) ){ post = "\\\*`" post }
#       if ( sub(/\\\^/, "", pre) ){ post = "\\\*\^" post }
#       if ( sub(/\\~$/, "", pre) ){ post = "\\\*~" post }
#       if ( sub(/\\"$/, "", pre) ){ post = "\\\*:" post }
#       if ( sub(/\\c$/, "", pre) ){ post = "\\\*," post }
   return(ln)
  }

        # tex_accent:
        # handle TeX accents, e.g. \'e, \~n, etc. (i.e. no braces)
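        # e.g. in  caf\'e  the accent+letter pair \'e is rewritten as the troff
        # string e\*', giving  cafe\*'  (this function is currently unused:
        # its call in bibtexify is commented out)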
function tex_accent( ln ,       pre, post, accent, letter){
   sub(/\\i/,      "i", ln)  # the \i construct
   while ( match(ln, /\\['`^~"c][A-Za-z]/ ) ){
        pre  = substr(ln,1,RSTART-1)                 # before accent+letter
        post = substr(ln,RSTART+RLENGTH,length(ln))  # after accent+letter
        accent=substr(ln,RSTART,RLENGTH-1)           # accent itself
        sub(/\\/, "&\*", accent)                     # insert "*"
        sub(/c/ , ","  , accent)                     # cedilla c -> ","
        sub(/"/ , ":"  , accent)                     # umlaut
        letter=substr(ln,RSTART+RLENGTH-1,1)         # accented letter
        ln = pre letter accent post
                        }
   return(ln)
        }


        # handle multiple editors, and deal with editors of collections
        # when the whole book is being referenced (i.e. they are the 
        # "authors" as far as refer is concerned)
function fix_editors(ent,       eds, ed){
  eds = split(ent, ed, "[        ]+and[  ]+")
  if ( entrytype == "@book" ){      # add eds./ed. at end and treat as authors
        if ( eds > 1 ){ sub(/"$/, ", eds.\"", ent) }
        else { sub(/"$/, ", ed.\"", ent) }
        fix_authors( ent )
                        }           # otherwise, keep them on one line
                                    # but add commas and "and" as necessary
  else{ 
        printf "%%E %s", bibtexify( ed[1] )                     # single ed.
        if ( eds >= 2 ){                                        # more than one ed
                for ( i=2 ; i < eds ; i++ ){
                        printf ", %s", bibtexify( ed[i] )        # up to penult ed
                                        }
                printf " and %s", bibtexify( ed[eds] )           # last ed
                }
        printf "\n"
           }
        }

        # fix_authors:
        # deal with all the complexities of how BibTeX represents
        # multiple authors by separating them with "and"
function fix_authors(ent,       author){
                # split the authors as author[1], author[2], etc.
        authmax = split(ent, author, "[  ]+and[  ]+")
        for ( a=1 ; a <= authmax ;  a++ ){ 
                printf "%%A %s\n", fix_name( bibtexify( author[a] ))
                        }
                }

        # The complexities of authors names:
        # deal with the complexities of each author, assumed to be
        # a piece of nroff text now; it consists of "components",
        # separated by commas (e.g. "Andrews, III, Avery" )
        # these components may need to be reordered
        # (this will go wrong with "J. Smith, Jr. III", no doubt)
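        # e.g. fix_name("Knuth, Donald E.")    returns "Donald E. Knuth",
        #      fix_name("Andrews, III, Avery") returns "Avery Andrews, III",
        #      fix_name("Smith, Jr.")          returns "Smith, Jr." unchanged.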
function fix_name( author,      last, auth){
        last = split(author, name, ", *")   # split into name components
        if ( last == 1 ){ return(author) }  # no commas, no problems
                                            # similarly if the last component 
                                            # is Jr. or something
        else {  if ( name[last] in Jrs ){ return(author) }
                else {  auth = name[last]       # final component first
                        for (n = 1 ; n < last ; n++ ){ 
                                if ( name[n] in Jrs ){
                                        auth = auth ", " name[n] }
                                else { auth = auth " " name[n] }
                                                    }
                        return (auth) 
                        }
               }
        }

#--------------Program Begins----------------------------------------

BEGIN {
    FS = " "
#       # citekey is not really a bibtex field, I use it 
#       # to store the "key" or \cite of the record
#    bibtex["@Tag"] = "citekey"
        # lout keywords and associated bibtex fields
    bibtex["@Address"] = "address"
    bibtex["@Author"] = "author"
    bibtex["@InAuthor"] = "editor"
    bibtex["@InTitle"] = "booktitle"
    bibtex["@Institution"] = "institution"
    bibtex["@Journal"] = "journal"
    bibtex["@Keywords"] = "keywords"
    bibtex["@Note"] = "note"
    bibtex["@Number"] = "number"
    bibtex["@Organization"] = "organization"
    bibtex["@Pages"] = "pages"
    bibtex["@Publisher"] = "publisher"
    bibtex["@Title"] = "title"
    bibtex["@TitleNote"] = "series"
    bibtex["@Volume"] = "volume"
    bibtex["@Year"] = "year"
        #
        # Citation types - not all the mappings chosen may be the most appropriate
    citetype["Article"] = "@article"
    citetype["PhDThesis"] = "@phdthesis"
    citetype["MastersThesis"] = "@mastersthesis"
    citetype["Book"] = "@book"
    citetype["Proceedings"] = "@proceedings"
    citetype["TechReport"] = "@techreport"
    citetype["Misc"] = "@misc"
    citetype["InBook"] = "@inbook"
    citetype["InProceedings"] = "@inproceedings"
        #       
        # words to be treated specially in names:
    Jrs["ed."]
    Jrs["eds."]
    Jrs["III"]
    Jrs["II"]
    Jrs["Jr"]
    Jrs["Jr."]
}

        # lines that start with comments are just copied 
        # (after changing the comment character):
        # *Note*, other kinds of comment are not handled!
/^[\t ]*#/{ sub(/#/, "%" ); print ; next }

        # handle abbreviations (string definitions)
        # *note* abbreviations are Case Sensitive!
/^[      address@hidden/{ 
        sub(/^[  address@hidden/, "") # discard "@string" and left brace
        sub(/}[  ]*$/,        "") # discard right brace and trailing spaces
        split($0, st, "[ \t]*=[ \t]*")  # split line
        abbrev[st[1]]=st[2]
        next } 
        

        # the start of a record triggers output of the previous record
/^[      address@hidden/{       output_entry(entry)     }
        # so does end of file
END             {       output_entry(entry)     }


        # lout entries begin with "{ @Reference", and end with a blank line
/^[      address@hidden/ , /^$/{
#       if ( match($0, /[        address@hidden/) == 1 ){
#                               # establish type of entry:
#               split($1, x,  "{|," )
#               entrytype=x[1]
#                               # get the citation label:
#               entry["citekey"]=x[2]
#               next
#               }
        sub(/^[  ]*/        ,  "" )  # delete leading spaces
        sub(/[   ]*$/       ,  "" )  # delete trailing spaces
                # a field line looks like "@Name { value }": split it at the first "{"
        brace = index($0, "{")
        if ( $0 ~ /^@[A-Za-z]+[ \t]*\{/ ){
                CF = substr($0, 1, brace-1)     # the Lout field name, e.g. @Title
                sub(/[ \t]*$/       ,  "", CF )  # delete trailing spaces
#               if ( entry[CF] )  { multdefs[entry["citekey"]] = CF }
#               if ( ! bibtex[CF] ){ unknown[CF] = unknown[CF] " " entry["citekey"] }
                if ( ! bibtex[CF] ){ unknown[CF] = unknown[CF] " "}
                entry[CF] = substr($0, brace)   # the value, braces included
              }
                # any other line, apart from the record's own "{ @Reference"
                # and "}" lines, continues the current field
        else if ( $0 !~ /^[{}]/ ){ entry[CF] = entry[CF] " " $0 }
            }

END {   printf "\n\n"
                # do diagnostics
        if ( test ){
        printf "%s Records Processed.\n", doneentries
        printf "Potential Problems:\n"
        for ( f in multdefs ){ 
                printf "--- %-16s :\tduplicate keyword:   \t%s\n", f, multdefs[f]
                        }
        for ( f in unknown ){ 
            if ( unknown[f] ){
                printf "--- unknown keyword:\t%s in: %s\n", f, unknown[f] 
                             }
                           }
                    }
   }
