From: | Shigio YAMAGUCHI |
Subject: | Re: gtags bug report: issue with S-JIS encoding files |
Date: | Fri, 17 Nov 2023 16:38:33 +0900 |
Hi,

I found that if a file contains a specific CJK character sequence, the parser seems to fail to continue parsing the file. See the following example source file, say `test.c`, encoded in Shift-JIS (cp932).
extern void printf(char * msg, ...);
void Foo() {
    char msg[] = "機能";
    printf(msg);
}
void Hello() {
    return;
}
(In case the Kanji appear as mojibake because of encoding issues, screenshots are also provided below.)
- What occurred? (as is)
If you run the `gtags` command in the same folder, followed by `global -f test.c`, you get only one tag, `Foo`, although `Hello` should also be found.
- What did you expect from it?
If I modify the source slightly, the tag `Hello` is found. See the variations I tried in the table below.
Cases Table

Bad Case
  Source code screenshot: (Encoding is cp932, or shift-jis)
  global -f test.c:
    Foo 4 test.cpp void Foo() {

Good Cases
  Source code screenshots: <image001.png> (Encoding is utf8); (Encoding is cp932, or shift-jis); (Encoding is cp932, or shift-jis)
  global -f test.c:
    Foo 4 test.cpp void Foo() {
    Hello 9 test.cpp void Hello() {
My environment
OS: Windows 11 Enterprise 22H2 64bit Build 22621.2428
Output of `gtags --version`:
gtags (Global) 6.6.9
Powered by Berkeley DB 1.85.
Copyright (c) 1996-2022 Tama Communications Corporation
License GPLv3+: GNU GPL version 3 or later http://www.gnu.org/licenses/gpl.html
This is free software; you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Possible Solutions
- Add a command-line encoding option to read the file properly.
- Find out why such a file cannot be fully parsed, ignore this particular error, and continue parsing.
Also, if this happens, at least print an error message to inform the user that some files were not fully parsed.
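
One plausible explanation, not confirmed anywhere in this report: in Shift-JIS (cp932) the character 能 is encoded as the bytes 0x94 0x5C, and 0x5C is also the ASCII backslash. A parser that reads the file byte by byte may therefore take the closing quote of "機能" for an escaped quote and never see the end of the string. The sketch below is only an illustration of how such files could be detected. It assumes the usual cp932 lead-byte ranges (0x81-0x9F and 0xE0-0xFC), and the function names are hypothetical, not part of GLOBAL.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if b starts a double-byte character in Shift-JIS (cp932).
 * Assumption: lead bytes are 0x81-0x9F and 0xE0-0xFC. */
static int sjis_is_lead_byte(unsigned char b)
{
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

/* Counts double-byte characters whose second (trail) byte is 0x5C,
 * the same value as the ASCII backslash. A byte-oriented C parser
 * could misread such a trail byte as the start of an escape sequence. */
static size_t sjis_count_backslash_trail(const unsigned char *buf, size_t len)
{
    size_t i = 0;
    size_t count = 0;

    assert(buf != NULL || len == 0);

    while (i < len) {
        if (sjis_is_lead_byte(buf[i]) && i + 1 < len) {
            if (buf[i + 1] == 0x5C)
                count++;
            i += 2;     /* skip lead byte and trail byte together */
        } else {
            i++;        /* ASCII or single-byte half-width katakana */
        }
    }
    return count;
}
```

For the cp932 bytes of the string literal in `test.c` (0x22 0x8B 0x40 0x94 0x5C 0x22), `sjis_count_backslash_trail` returns 1, because the trail byte of 能 is 0x5C. A nonzero count could be used to print a warning for the affected file.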
Johnny Cheng