From: Klaus Weide
Subject: lynx-dev cernrules.txt
Date: Fri, 5 Feb 1999 08:07:57 -0600 (CST)

Below is a file suggested for the samples directory.

Not everything is completely tested - please help test whether it
actually works. :)  Feedback appreciated.


# This file contains examples and an explanation for the RULESFILE / RULE
# feature.
#
# Rules for Lynx are experimental.  They provide a rudimentary capability
# for URL rejection and substitution based on string matching.
# Most users and most installations will not need this feature, it is here
# in case you find it useful.  Note that this may change or go away in
# future releases of Lynx; if you find it useful, consider describing your
# use of it in a message to <address@hidden>.
#
# Syntax:
# =======
# As you may have guessed, comments are introduced by a '#' character.
# Rules have the general form
#   Operator  Operand1  [Operand2]
# with words separated by whitespace.
#
# Recognized operators are
#
#   Fail  URL1
# Reject access to this URL, stop processing further rules.
#
#   Map   URL1  URL2
# Change the URL to URL2, then continue processing.
#
#   Pass  URL1  [URL2]
# Accept this URL and stop processing further rules; if URL2
# is given, apply this as the last mapping.
#
# Rules are processed sequentially from first to last.  A rule applies
# if the current URL (for the resource the user is trying to access)
# matches URL1.  Case-sensitive (!) string comparison is used; in addition,
# URL1 can contain one '*' which is interpreted as a wildcard matching
# 0 or more characters.  So if, for example,
# "http://example.com/dir/doc.html" is requested, it would match any of
# the following:
#   Pass  http:*
#   Pass  http://example.com/*.html
#   Pass  http://example.com/*
#   Pass  http://example*
#   Pass  http://*/doc.html
# but not:
#   Pass  http://example/*
#   Pass  http://Example.COM/dir/doc.html
#   Pass  http://Example.COM/*
#
# If a URL2 is given and also contains a '*', that character will be
# replaced by whatever matched in URL1.  Processing stops with the
# first matching "Fail" or "Pass" or when the end of the rules is reached.
# If the end is reached without a "Fail" or "Pass", the URL is allowed
# (equivalent to a final "Pass *").
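#
# A short worked example (the hostnames are made up for illustration):
#   Fail  gopher:*
#   Map   http://old.example.org/*   http://new.example.org/*
#   Pass  http://new.example.org/*
# With these rules, any gopher: URL is rejected by the first rule.  A
# request for "http://old.example.org/a/b.html" matches the second rule;
# the part matched by the '*' ("a/b.html") is substituted into URL2,
# giving "http://new.example.org/a/b.html", which then matches the "Pass"
# rule and is accepted.  Any other URL falls through to the end of the
# rules and is allowed.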
#
# The requested URL will have been transformed to Lynx's normal
# representation.  This means that local file resources should be
# expected in the form "file://localhost/<path using slash separators>",
# not in the machine's native representation for filenames.
#
# Anyone with experience configuring the venerable CERN httpd server will
# recognize the syntax - in fact, the code implementing rules goes back
# to a common ancestor.  But note the differences: all URLs and URL-
# patterns here have to be given as absolute URLs, even for local files.
# (Absolute URLs don't imply proxying - you cannot control that from here.)
#
# CAVEAT
# ======
# First, to squash any false expectations, an example of what NOT TO DO.
# It might be expected that a rule like
#   Fail  file://localhost/etc/passwd           # <- DON'T RELY ON THIS
# could be used to prevent access to the file "/etc/passwd".  This might
# fool a naive user, but the more sophisticated user could still gain
# access, by experimenting with other forms like (@@@ untested)
# "file://<machine's domain name>/etc/passwd" or "/etc//passwd"
# or "/etc/p%61sswd" or "/etc/passwd?" or "/etc/passwd#X" and so on.
# There are many URL forms for accessing the same resource, and Lynx
# just doesn't guarantee that URLs for the same resource will look the
# same way.
#
# The same reservation applies to any attempts to block access to unwanted
# sites and so on.  This isn't the right place to implement such blocking.
# (Lynx has a number of mechanisms documented elsewhere to restrict access,
# see the INSTALLATION file, lynx.cfg, lynx -help, lynx -restrictions.)
#
# Some more useful applications:
#
# 1. Disabling URLs by access scheme
# ----------------------------------
#   Fail  gopher:*
#   Fail  finger:*
#   Fail  lynxcgi:*
#   Fail  LYNXIMGMAP:*
# This should work (but no guarantees) because Lynx canonicalizes
# the case of recognized access schemes and does not interpret
# %-escaping in the scheme part (@@@ always?)
#
# Note that for many access schemes Lynx already has mechanisms to
# restrict access (see lynx.cfg, -help, -restrictions, etc.), and others
# have to be specifically enabled.  Those mechanisms should be used
# in preference.
# Note especially Limitation 1 below.
# Rules can be used for the remaining cases, or in addition by the
# more paranoid.  Note that disabling "file:*" will also make many
# of the special pages generated by Lynx as temporary files (INFO,
# history, ...) inaccessible; on the other hand, it doesn't prevent
# _writing_ of various temp files - probably not what you want.
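#
# If you still want to disable "file:*" without losing the pages Lynx
# generates as temporary files, a "Pass" exception placed before the
# "Fail" may help.  An untested sketch - the directory below is only a
# guess, adjust it to wherever your Lynx actually writes its temporary
# files:
#   Pass  file://localhost/tmp/*
#   Fail  file:*
# Remember the CAVEAT above: this is a convenience, not a security measure.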
#
# You could also direct access for a scheme to a brief text explaining
# why it's not available:
#   Map news:*   http://localhost/texts/newsserver-is-broken.html
# (That text shouldn't contain any relative links; they would be
# broken.)
#
# 2. Preventing accidental access
# -------------------------------
# If there is a page or site you don't want to access for whatever
# reason (say there's a link to it that crashes Lynx [don't forget to
# report a bug], or one that starts sending you a 5 Mb file you don't
# want, or you just don't like the people...), you can prevent yourself
# from accidentally accessing it:
#    Fail  http://bad.site.com/*
#
# 3. Compressed files
# -------------------
# You have downloaded a bunch of HTML documents, and compressed them
# to save space.  Then you discover that links between the files don't
# work, because they all use the names of the uncompressed files.  The
# following kind of rule will allow you to navigate, invisibly accessing
# the compressed files:
#   Map file://localhost/somedir/*.html file://localhost/somedir/*.html.gz
#
# 4. Use local copies
# -------------------
# You have downloaded a tree of HTML documents, but there are many links
# between them that still point to the remote location.  You want to access
# the local copies instead, after all that's why you downloaded them.  You
# could start editing the HTML, but the following might be simpler:
#  Map http://remote.com/docs/*.html file://localhost/home/me/docs/*.html
# Or even combine this with compressing the files:
#  Map http://remote.com/docs/*.html file://localhost/home/me/docs/*.html.gz
#
# 5. Broken links etc.
# --------------------
# A user has moved from http://www.siteA.com/~jdoe to http://siteB.org/john,
# or http://www.provider.com/company/ has moved to their own server
# http://www.company.com, but there are still links to the old location
# all over the place; they now are broken or lead to a stupid "this page
# has moved, please update your bookmarks. Refresh in 5 seconds" page
# which you're tired of seeing.  This will not fix your bookmarks, and
# it will let you see the outdated URLs for longer (Limitation 3 below),
# but for a quick fix:
#   Map   http://www.siteA.com/~jdoe/*      http://siteB.org/john/*
#   Map   http://www.provider.com/company/* http://www.company.com/*
# But note that you are likely to create invalid links if not all documents
# from a site are mapped (Limitation 3).
#
# 6. DNS troubles
# ---------------
# A special case of broken links.  If a site is inaccessible because the
# name cannot be resolved (your or their name server is broken, or the
# name registry once again made a mistake, or they really didn't pay in
# time...) but you still somehow know the address; or if name lookups are
# just too slow:
#   Map   http://www.somesite.com/*  http://10.1.2.3/*
# (You could do the equivalent more cleanly by adding an entry to the hosts
# file, if you have access to it.)
#
# Or, if a name resolves to several addresses of which one is down, and the
# DNS hasn't caught up:
#   Map   http://www.w3.org/*    http://www12.w3.org/*
#
# Note that this can break access to some name-based virtually hosted sites.


# Limitations
# ===========
# First, see CAVEAT above.  There are other limitations:
#
# 1. Applicable URL schemes
# -------------------------
# Rules processing does not apply to all URL schemes.  Some are
# handled separately from the generic access code; therefore rules
# for such URLs will never be "seen".  This limitation applies at
# least to lynxexec:, lynxprog:, mailto:, and LYNXHIST: URLs.
#
# Also, a scheme has to be known to Lynx in order to get as far as
# applying rules - you cannot just define your own new foobar: scheme
# and then map it to something here.
#
# 2. No re-checking
# -----------------
# When a URL is mapped to a different one, the new URL is not checked
# again for compliance with most restrictions established by -anonymous,
# -restrictions, lynx.cfg and so on.  This can be regarded as a feature:
# it allows specific exceptions.  Of course it means that users for
# whom any restrictions must be enforced cannot have write access to a
# personal rules file, but that should be obvious anyway!
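#
# As an illustration of such an exception (the names and paths here are
# hypothetical): a rule like
#   Map  http://helpdesk.example.com/*  file://localhost/usr/local/lib/helpdesk/*
# could give access to those local files even in a setup where direct
# file: URLs are otherwise not permitted, because the mapped file: URL
# is not re-checked (the earlier checks only see the original http: URL).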
#
# 3. Mappings are invisible
# -------------------------
# Changing the URL with "Map" or "Pass" rules will in general not be
# visible to the user, because it happens at a late stage of processing
# a request (similar to directing a request through a proxy).  One
# can think of two kinds of URL for every resource: a "Document URL" as
# the user sees it (on INFO page, history list, status line, etc.), and
# a "physical URL" used for the actual access.  Rules change only the
# physical URL.  This is different from the effect of HTTP redirection.
# Often this is bad; sometimes it may be desirable.
#
# Changing the URL can create broken links if a document has relative URLs,
# since they are taken to be relative to the "Document URL" (if no BASE tag
# is present) when the HTML is parsed.
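#
# To illustrate with the rule from example 4 (the filenames are made up):
#  Map http://remote.com/docs/*.html file://localhost/home/me/docs/*.html
# a page is read from the local copy, but its Document URL is still
# "http://remote.com/docs/page.html".  A relative link in it such as
# "images/pic.gif" therefore resolves to
# "http://remote.com/docs/images/pic.gif", and since no rule maps that
# URL, it is fetched from the remote site - or is broken if the site is
# gone.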
#
# 4. Interaction with proxying
# ----------------------------
# Rules processing is done after most other access checks, but before
# proxy (and gateway) settings are examined.  A "Fail" rule works
# as expected, but when the URL has been mapped to a different one,
# the subsequent proxy checking can get confused.  If it decides that
# access is through a proxy or gateway, it will generally use the
# original URL to construct the "physical" URL, effectively overriding
# the mapping rules.  If the mapping is to a different access scheme
# or hostname, proxy checking could also be fooled into using a proxy when
# it shouldn't, not using one when it should, or (if different proxies
# are used for different schemes) using the wrong proxy.  So "just
# don't do that"; in some cases setting the no_proxy variable will help.
# Example 3 happens to work nicely if there is an http_proxy but no
# ftp_proxy.
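#
# For instance (untested): if example 4 above is used while the http_proxy
# environment variable is set, the proxy check may see the original
# "http://remote.com/..." URL and send the request through the proxy,
# bypassing the mapping; adding "remote.com" to the no_proxy environment
# variable before starting Lynx should let the mapping take effect.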

