Spellchecking the Classpath API documentation
From: Julian Scheid
Subject: Spellchecking the Classpath API documentation
Date: Thu, 16 Dec 2004 22:10:39 +0100
User-agent: Mozilla Thunderbird 0.8 (X11/20040913)
I've written a doclet which runs all source code comments through Ispell
(default US dictionary) and accumulates the results in a text file. The
current results for Classpath CVS can be found here:
http://cpx.sektor37.de/classpath-ispell.txt (230 K)
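The post does not say how the doclet talks to Ispell. A minimal sketch, assuming it feeds words to "ispell -a" (pipe mode) and reads one result line per word. The class IspellLine and its members are hypothetical; the sketch relies only on the pipe-mode line prefixes: "*" correct, "+" found via affix removal, "-" compound, "&" near misses, "?" guesses, "#" no suggestions. The caller is assumed to skip the version banner Ispell prints first and the blank line it emits after each input line.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical parser for a single result line of "ispell -a". */
final class IspellLine {
    enum Status { OK, ROOT, COMPOUND, MISS, GUESS, NONE }

    final Status status;
    final String word;              // the questioned word; null for OK, ROOT, COMPOUND
    final List<String> suggestions; // near misses or guesses; empty if none

    private IspellLine(Status status, String word, List<String> suggestions) {
        this.status = status;
        this.word = word;
        this.suggestions = suggestions;
    }

    /** True if Ispell reported the word as not found in its dictionary. */
    boolean isMisspelled() {
        return status == Status.MISS || status == Status.GUESS || status == Status.NONE;
    }

    static IspellLine parse(String line) {
        if (line.isEmpty()) {
            throw new IllegalArgumentException("empty line");
        }
        switch (line.charAt(0)) {
            case '*': return new IspellLine(Status.OK, null, Collections.emptyList());
            case '+': return new IspellLine(Status.ROOT, null, Collections.emptyList());
            case '-': return new IspellLine(Status.COMPOUND, null, Collections.emptyList());
            case '#': {
                // Format: "# original offset"
                String[] parts = line.split(" ");
                return new IspellLine(Status.NONE, parts[1], Collections.emptyList());
            }
            case '&':
            case '?': {
                // Format: "& original count offset: s1, s2, ..." (same for "?")
                int colon = line.indexOf(':');
                String[] head = line.substring(0, colon).split(" ");
                List<String> sugg = new ArrayList<>();
                for (String s : line.substring(colon + 1).split(",")) {
                    if (!s.trim().isEmpty()) {
                        sugg.add(s.trim());
                    }
                }
                Status st = line.charAt(0) == '&' ? Status.MISS : Status.GUESS;
                return new IspellLine(st, head[1], sugg);
            }
            default:
                throw new IllegalArgumentException("unrecognized line: " + line);
        }
    }
}
```

Only lines for which isMisspelled() returns true would need to end up in the accumulated results file.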
To produce the above results, many words were ignored in order to reduce
false positives:
- words consisting only of uppercase letters and underscores
- words in CamelCase
- words with length <= 2
- all class, method, field, and parameter names, as well as package name
components
- all words enclosed in <pre>...</pre>
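The ignore rules listed above could be expressed roughly as follows. This is a sketch, not the doclet's actual code: WordFilter, shouldIgnore and stripPre are hypothetical names, "CamelCase" is assumed to mean a lowercase letter followed somewhere later by an uppercase one, and the identifier set is assumed to be collected beforehand from class, method, field, parameter and package names.

```java
import java.util.Set;
import java.util.regex.Pattern;

/** Hypothetical filter implementing the ignore rules described above. */
final class WordFilter {
    // Only uppercase letters and underscores, e.g. "VK_ENTER" or "URL".
    private static final Pattern CONSTANT = Pattern.compile("[A-Z_]+");
    // Assumed CamelCase test: a lowercase letter later followed by an
    // uppercase letter, e.g. "getName" or "ClassLoader".
    private static final Pattern CAMEL = Pattern.compile(".*[a-z].*[A-Z].*");

    /** Returns true if the word should not be passed to the spellchecker. */
    static boolean shouldIgnore(String word, Set<String> identifiers) {
        if (word.length() <= 2) return true;
        if (CONSTANT.matcher(word).matches()) return true;
        if (CAMEL.matcher(word).matches()) return true;
        return identifiers.contains(word);
    }

    /** Blanks out everything between <pre> and </pre>, case-insensitively. */
    static String stripPre(String comment) {
        // (?is): case-insensitive, and "." also matches line breaks.
        return comment.replaceAll("(?is)<pre>.*?</pre>", " ");
    }
}
```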
The doclet also uses a naive algorithm to ignore plurals of class names
(...y -> ...ies, ...s -> ...ses, ... -> ...s).
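The three plural rules above, sketched in Java. Plurals and namesWithPlurals are hypothetical names; the result would simply be added to the set of ignored identifiers.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper applying the naive plural rules for class names. */
final class Plurals {
    static String plural(String name) {
        if (name.endsWith("y")) {
            // ...y -> ...ies
            return name.substring(0, name.length() - 1) + "ies";
        }
        if (name.endsWith("s")) {
            // ...s -> ...ses
            return name + "es";
        }
        // ... -> ...s
        return name + "s";
    }

    /** Returns every name together with its naive plural. */
    static Set<String> namesWithPlurals(Collection<String> names) {
        Set<String> out = new HashSet<>();
        for (String n : names) {
            out.add(n);
            out.add(plural(n));
        }
        return out;
    }
}
```

Being naive, the rules turn a class named "Key" into "Keies"; such cases would still show up as false positives or need a dictionary entry.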
In addition, I've manually put together a Classpath-specific dictionary.
You can view the current version here:
http://cpx.sektor37.de/ignore.txt (2 K)
This dictionary is debatable; take it for what it is: a rough draft
written by a non-native speaker. Comments and corrections are welcome.
From the above results, I extracted a list of obvious typos (and UK->US
English replacements) and put together a tool which corrects these in
all Classpath sources, observing comment boundaries of course, so no
code should be modified.
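One way to restrict replacements to comments is a small scanner that copies string and character literals verbatim and rewrites only // and /* */ comments. This is an assumed approach, not necessarily how the tool above works; CommentFixer is a hypothetical name, and the sketch ignores Unicode escapes and newer syntax such as text blocks.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical fixer that applies word replacements inside comments only. */
final class CommentFixer {
    static String fix(String src, Map<String, String> replacements) {
        StringBuilder out = new StringBuilder(src.length());
        int i = 0;
        int n = src.length();
        while (i < n) {
            char c = src.charAt(i);
            char next = i + 1 < n ? src.charAt(i + 1) : '\0';
            if (c == '/' && next == '/') {
                // Line comment: runs up to (not including) the newline.
                int end = src.indexOf('\n', i);
                if (end < 0) end = n;
                out.append(replaceWords(src.substring(i, end), replacements));
                i = end;
            } else if (c == '/' && next == '*') {
                // Block or doc comment: runs up to and including "*/".
                int end = src.indexOf("*/", i + 2);
                end = end < 0 ? n : end + 2;
                out.append(replaceWords(src.substring(i, end), replacements));
                i = end;
            } else if (c == '"' || c == '\'') {
                // String or char literal: copy verbatim, skipping escaped chars.
                int j = i + 1;
                while (j < n && src.charAt(j) != c) {
                    if (src.charAt(j) == '\\') j++;
                    j++;
                }
                j = Math.min(j + 1, n);
                out.append(src, i, j);
                i = j;
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    /** Replaces whole words only, using \b word boundaries. */
    private static String replaceWords(String text, Map<String, String> replacements) {
        for (Map.Entry<String, String> e : replacements.entrySet()) {
            Pattern p = Pattern.compile("\\b" + Pattern.quote(e.getKey()) + "\\b");
            text = p.matcher(text).replaceAll(Matcher.quoteReplacement(e.getValue()));
        }
        return text;
    }
}
```

Because matching is whole-word, inflected forms such as "colours" need their own entries in the replacement list.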
The replacement list is here:
http://cpx.sektor37.de/replacements.txt (7 K)
The resulting patch for all classes in java.* and javax.* is here:
http://cpx.sektor37.de/classpath-typos.patch.txt (312 K)
Apart from the obvious UK/US English question, a couple of the
replacements may be debatable, namely:
onscreen => on-screen
offscreen => off-screen
threadsafe => thread-safe
hightech => high-tech
systemwide => system-wide
I refrained from adding similar replacements for Java lingo like
"subclassing" (cf. ignore.txt), but I felt that at least the above are
more correct with a hyphen.
If you have objections to any of these replacements, let me know and I
can prepare an updated patch.
A next step might be to pick up a suggestion from Thomas Zanders and
build a simple interactive tool similar to a spellchecker in a text
editor which, for each questionable word, shows the context in which it
appears and asks for an action (the usual spellchecker approach: replace
with suggestion a, b or c, replace with another string, ignore for now,
always ignore, add to dictionary).
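Since the interactive tool is only a proposal, the following is purely a sketch of its core prompt, using assumed command letters: a number picks a suggestion, "r <text>" replaces with another string, "i" ignores for now, "a" always ignores and "d" adds the word to the dictionary. InteractiveChecker and its nested types are hypothetical. Input is read from a Scanner so the same code can be driven from a terminal or from a test string.

```java
import java.io.PrintStream;
import java.util.List;
import java.util.Scanner;

/** Hypothetical prompt for one questionable word. */
final class InteractiveChecker {
    enum Action { REPLACE, IGNORE_ONCE, IGNORE_ALWAYS, ADD_TO_DICTIONARY }

    static final class Decision {
        final Action action;
        final String replacement; // non-null only for REPLACE

        Decision(Action action, String replacement) {
            this.action = action;
            this.replacement = replacement;
        }
    }

    /** Shows the word in context and reads commands until one is valid. */
    static Decision ask(String word, String context, List<String> suggestions,
                        Scanner in, PrintStream out) {
        out.println("Context: " + context);
        out.println("Questionable word: " + word);
        for (int k = 0; k < suggestions.size(); k++) {
            out.println("  " + k + ") " + suggestions.get(k));
        }
        out.println("number = suggestion, r <text> = replace, i = ignore,"
                + " a = always ignore, d = add to dictionary");
        while (in.hasNextLine()) {
            String line = in.nextLine().trim();
            if (line.equals("i")) return new Decision(Action.IGNORE_ONCE, null);
            if (line.equals("a")) return new Decision(Action.IGNORE_ALWAYS, null);
            if (line.equals("d")) return new Decision(Action.ADD_TO_DICTIONARY, null);
            if (line.startsWith("r ")) {
                return new Decision(Action.REPLACE, line.substring(2).trim());
            }
            if (line.matches("\\d{1,4}")) {
                int k = Integer.parseInt(line);
                if (k < suggestions.size()) {
                    return new Decision(Action.REPLACE, suggestions.get(k));
                }
            }
            out.println("Unrecognized command, try again.");
        }
        // End of input: behave like "ignore for now" so nothing is changed.
        return new Decision(Action.IGNORE_ONCE, null);
    }
}
```

Calling it with new Scanner(System.in) and System.out gives the terminal behavior; the caller would then apply each Decision, for example by remembering IGNORE_ALWAYS words for the rest of the session.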
I'd be happy to write such a tool, but I would probably not be the right
person to operate it, not being a native speaker and all. Any volunteers?
-Julian
PS: In case the misspelling "mispelled" in the description of
KeyEvent.VK_SEPERATER was intended as a pun, I apologize for correcting it.