Hi,
I partly agree with your points.
libmagic and file are general-purpose tools for detecting file types, and they are weak at detecting source file types.
But we can improve that by updating the magic database or by creating a customized magic database for this purpose.
On the other hand, the current gtags/htags code bears the heavy burden of deciding which language the file being processed is written in, which makes the code very complex and hard to patch.
For example, for my patch I had to trace around 10 functions in 4 files to find what to modify.
My suggestion is to trim down the code of gtags and htags and make light versions of them:
gtags-lite and htags-lite read a file list from stdin or from a file, in the format:
index filepath lang
gtags-lite will then create/update the GTAGS database, and htags-lite will produce the rendered HTML files.
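
To make the proposed input format concrete, here is a rough sketch of how a lite tool could read it. It is only an illustration in Python, not code from gtags. It assumes the three fields are separated by whitespace, that the index is an integer, and that the language name contains no spaces; Entry and read_file_list are made-up names.

```python
from typing import Iterator, NamedTuple, TextIO


class Entry(NamedTuple):
    index: int   # running number of the file in the list
    path: str    # path of the source file
    lang: str    # language already decided by an external tool


def read_file_list(stream: TextIO) -> Iterator[Entry]:
    """Parse 'index filepath lang' lines (assumed whitespace-separated)."""
    for lineno, raw in enumerate(stream, 1):
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines
        try:
            # Split from both ends so a path containing spaces survives.
            index_text, rest = line.split(None, 1)
            path, lang = rest.rsplit(None, 1)
            index = int(index_text)
        except ValueError:
            raise ValueError(
                f"line {lineno}: expected 'index filepath lang', got {line!r}"
            ) from None
        yield Entry(index, path, lang)
```

Calling read_file_list(sys.stdin) would cover the stdin case, and passing an opened file would cover the file case. The point is that the lite tool never has to guess a language itself.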
Functionality such as guessing and deciding the language can be moved into a separate tool, which can stay compatible with the current gtags.rc file and be integrated with find to output a file list that gtags-lite can consume.
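
A minimal sketch of what such a separate tool might look like, again in Python and only as an illustration. It takes paths as produced by find (one per line) and emits "index filepath lang" lines. The extension table is a hypothetical stand-in for whatever mapping the gtags.rc file defines; a real tool would read that file instead, and could fall back to libmagic for files it cannot classify by name.

```python
import os
from typing import Dict, Iterable, Iterator, Optional

# Hypothetical table; a real tool would build this from the gtags.rc file.
LANG_BY_EXT: Dict[str, str] = {
    ".c": "c",
    ".h": "c",
    ".java": "java",
    ".php": "php",
}


def guess_language(path: str) -> Optional[str]:
    """Return the language for a path, or None if it is not recognised."""
    ext = os.path.splitext(path)[1]
    return LANG_BY_EXT.get(ext)


def make_file_list(paths: Iterable[str]) -> Iterator[str]:
    """Turn paths (e.g. the output of find) into 'index filepath lang' lines."""
    index = 0
    for raw in paths:
        path = raw.rstrip("\n")
        lang = guess_language(path)
        if lang is None:
            continue  # unknown files are left out of the list
        index += 1
        yield f"{index} {path} {lang}"
```

Feeding it sys.stdin and printing each result would let it sit in a pipeline between find and gtags-lite, so all language decisions live in one small program.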