Spellchecking the Classpath API documentation


From: Julian Scheid
Subject: Spellchecking the Classpath API documentation
Date: Thu, 16 Dec 2004 22:10:39 +0100
User-agent: Mozilla Thunderbird 0.8 (X11/20040913)

I've written a doclet which runs all source code comments through Ispell (default US dictionary) and accumulates the results in a text file. The current results for Classpath CVS can be found here:
http://cpx.sektor37.de/classpath-ispell.txt (230 K)

To produce the above results, many words were ignored in order to reduce false positives:

- words consisting only of uppercase letters and underscores
- words in CamelCase
- words with length <= 2
- all class, method, field, and parameter names, as well as package name components
- all words enclosed in <pre>...</pre>
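To make the rules above concrete, here is a minimal sketch of such a filter. It is not the doclet's actual code: the class name WordFilter, the method isIgnored, and the knownNames set (standing in for all class, method, field, parameter, and package names) are hypothetical, and "CamelCase" is assumed to mean a lowercase letter directly followed by an uppercase one. The <pre>...</pre> rule is left out, since it depends on how the comment text is tokenized.

```java
import java.util.Set;
import java.util.regex.Pattern;

public class WordFilter {
    // Only uppercase letters and underscores, e.g. VK_ENTER.
    private static final Pattern UPPER = Pattern.compile("[A-Z_]+");
    // Assumed CamelCase test: a lowercase letter directly followed by an
    // uppercase letter somewhere in the word, e.g. getName or KeyEvent.
    private static final Pattern CAMEL = Pattern.compile(".*[a-z][A-Z].*");

    /** Returns true if the word should be skipped instead of being spellchecked. */
    public static boolean isIgnored(String word, Set<String> knownNames) {
        if (word.length() <= 2) {
            return true;                       // words with length <= 2
        }
        if (UPPER.matcher(word).matches()) {
            return true;                       // uppercase letters and underscores only
        }
        if (CAMEL.matcher(word).matches()) {
            return true;                       // CamelCase
        }
        return knownNames.contains(word);      // class, method, field, ... names
    }
}
```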

The doclet also uses a naive algorithm to ignore plurals of class names (...y -> ...ies, ...s -> ...ses, ... -> ...s).
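The three plural rules might be checked along these lines. This is only a sketch under assumptions: PluralMatcher and isNameOrPlural are hypothetical names, matching is case-sensitive, and the real doclet may order or restrict the rules differently.

```java
import java.util.Set;

public class PluralMatcher {
    /** Returns true if word is a known class name or a naive plural of one. */
    public static boolean isNameOrPlural(String word, Set<String> names) {
        if (names.contains(word)) {
            return true;
        }
        int len = word.length();
        // ...y -> ...ies  (Entry -> Entries)
        if (word.endsWith("ies") && names.contains(word.substring(0, len - 3) + "y")) {
            return true;
        }
        // ...s -> ...ses  (Class -> Classes)
        if (word.endsWith("ses") && names.contains(word.substring(0, len - 2))) {
            return true;
        }
        // ... -> ...s  (String -> Strings)
        return word.endsWith("s") && names.contains(word.substring(0, len - 1));
    }
}
```

Being naive, the last rule also accepts forms such as "Entrys"; for the purpose of suppressing false positives that seems an acceptable trade-off.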

In addition, I've manually put together a Classpath-specific dictionary. You can view the current version here:
http://cpx.sektor37.de/ignore.txt (2 K)

This dictionary is debatable; take it for what it is: a rough draft made up by a non-native speaker. Comments and corrections are welcome.

From the above results, I filtered a list of obvious typos (and UK->US English replacements) and put together a tool which corrects these in all Classpath sources, observing comment boundaries of course, so no code should be modified.
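For illustration, one way to restrict replacements to comments is sketched below. This is an assumption about the general technique, not the actual tool: CommentFixer and its methods are hypothetical names. A single regular expression matches string literals, character literals, and comments, so that a "//" inside a string literal is not mistaken for a comment; only the comment matches are rewritten.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CommentFixer {
    // Alternatives are tried at each position: string literal, char literal,
    // then (captured as group 1) a block comment or a line comment.
    private static final Pattern TOKEN = Pattern.compile(
        "\"(?:\\\\.|[^\"\\\\\\n])*\""
        + "|'(?:\\\\.|[^'\\\\\\n])*'"
        + "|(/\\*.*?\\*/|//[^\\n]*)",
        Pattern.DOTALL);

    /** Applies whole-word replacements inside comments only. */
    public static String fix(String source, Map<String, String> replacements) {
        StringBuilder out = new StringBuilder();
        Matcher m = TOKEN.matcher(source);
        int last = 0;
        while (m.find()) {
            out.append(source, last, m.start());      // code between tokens, unchanged
            String token = m.group();
            // Group 1 is set only when the token is a comment.
            out.append(m.group(1) != null ? fixWords(token, replacements) : token);
            last = m.end();
        }
        out.append(source, last, source.length());    // trailing code, unchanged
        return out.toString();
    }

    private static String fixWords(String comment, Map<String, String> replacements) {
        for (Map.Entry<String, String> e : replacements.entrySet()) {
            // Case-sensitive, whole words only.
            comment = comment.replaceAll(
                "\\b" + Pattern.quote(e.getKey()) + "\\b",
                Matcher.quoteReplacement(e.getValue()));
        }
        return comment;
    }
}
```

With the entries threadsafe => thread-safe and onscreen => on-screen, a variable named onscreen or a string literal containing "threadsafe" is left alone, while the same words inside // or /* */ comments are corrected.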

The replacement list is here:
http://cpx.sektor37.de/replacements.txt (7 K)

The resulting patch for all classes in java.* and javax.* is here:
http://cpx.sektor37.de/classpath-typos.patch.txt (312 K)

Apart from the obvious UK/US English question, a couple of the replacements may be debatable, namely:
onscreen => on-screen
offscreen => off-screen
threadsafe => thread-safe
hightech => high-tech
systemwide => system-wide

I refrained from adding similar replacements for Java lingo like "subclassing" (cf. ignore.txt), but I felt that at least the above are more correct with a hyphen.

If you have objections to any of these replacements, let me know and I can prepare an updated patch.

A next step might be to pick up a suggestion from Thomas Zanders and build a simple interactive tool, similar to a spellchecker in a text editor, which shows the context in which each questionable word appears and asks for an action. The usual spellchecker choices would apply: replace with suggestion a, b, or c; replace with another string; ignore for now; always ignore; add to dictionary.

I'd be happy to write such a tool, but I would probably not be the right person to operate it, not being a native speaker and all. Any volunteers?

-Julian

PS: In case the misspelling "mispelled" in the description of KeyEvent.VK_SEPERATER was intended as a pun, I apologize for correcting it.



