I have a medical search engine project at Vanderbilt. We
use the vocabulary/ontology from the NIH Unified Medical Language System
(UMLS). We use the UMLS's Specialist Lexicon as a list of terms (200k
terms, most of them medical, though it also includes general English terms) and
the UMLS Metathesaurus (a list of 800k unique single- and multiword concepts; when
broken down into words, I think there are about 600k unique words, many of them
genes, chemical names, etc.). See http://www.nlm.nih.gov/research/umls/.
Licenses are available free of charge for research purposes.