Re: Issue 905: Gracefully ignore UTF-8 BOM in the middle of a file (issu

lilypond-devel

From:	reinhold . kainhofer
Subject:	Re: Issue 905: Gracefully ignore UTF-8 BOM in the middle of a file (issue 4908043)
Date:	Mon, 15 Aug 2011 18:36:53 +0000

Reviewers: lemzwerg,

Message:
On 2011/08/15 18:14:21, lemzwerg wrote:

Could you please tell me what this patch is good for?  A BOM not at the
beginning of a file is no longer a BOM...

I'm not opposed to emitting a warning if U+FEFF is encountered, and we
subsequently ignore it (since its use as zero width no-break space is
deprecated), but only within strings...

What am I missing?


RFC 3629 says that U+FEFF is a zero-width no-break space that is also
used as a BOM. It also says:
  "This character can be used as a genuine "ZERO WIDTH NO-BREAK SPACE"
   within text,"
...
  "It is important to understand that the character U+FEFF appearing at
   any position other than the beginning of a stream MUST be interpreted
   with the semantics for the zero-width non-breaking space, and MUST
   NOT be interpreted as a signature."

Our LilyPond files are text as well, so I read this as saying that we
should treat a U+FEFF inside the file contents as normal whitespace.
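
To make the RFC 3629 rule quoted above concrete, here is a minimal C++ sketch of how a U+FEFF could be classified by its position in a UTF-8 byte stream. The names `FeffRole`, `classify_feff` and `find_feff` are hypothetical and exist only for this illustration; they are not part of LilyPond or of the patch.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// UTF-8 encoding of U+FEFF, written in octal like BOM_UTF8 in lexer.ll.
const std::string kFeffUtf8 = "\357\273\277";

// Hypothetical names for this sketch only.
enum class FeffRole { Signature, ZeroWidthNoBreakSpace };

// RFC 3629, section 6: only a U+FEFF at the very beginning of the stream
// may act as a signature; at any other position it is a
// ZERO WIDTH NO-BREAK SPACE.
FeffRole
classify_feff (std::size_t byte_offset)
{
  return byte_offset == 0 ? FeffRole::Signature
                          : FeffRole::ZeroWidthNoBreakSpace;
}

// Collect the byte offset of every U+FEFF in TEXT (assumed valid UTF-8).
std::vector<std::size_t>
find_feff (const std::string &text)
{
  std::vector<std::size_t> offsets;
  for (std::size_t pos = text.find (kFeffUtf8); pos != std::string::npos;
       pos = text.find (kFeffUtf8, pos + kFeffUtf8.size ()))
    offsets.push_back (pos);
  return offsets;
}
```

For an input that starts with one space followed by U+FEFF, `find_feff` reports offset 1 and `classify_feff (1)` yields `ZeroWidthNoBreakSpace`, which is the case the new regression file is meant to exercise.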


Description:
Issue 905: Gracefully ignore UTF-8 BOM in the middle of a file

Please review this at http://codereview.appspot.com/4908043/

Affected files:
  A input/regression/bom-mark.ly
  M lily/include/lily-lexer.hh
  M lily/lexer.ll
  M lily/lily-lexer.cc


Index: input/regression/bom-mark.ly
diff --git a/input/regression/bom-mark.ly b/input/regression/bom-mark.ly
new file mode 100644
index 0000000000000000000000000000000000000000..19895a5af8151d00f7656ea5e51df0d214cd5b5d
--- /dev/null
+++ b/input/regression/bom-mark.ly
@@ -0,0 +1,11 @@
+ \version "2.15.9"
+
+#(ly:set-option 'warning-as-error #f)
+
+\header {
+  texidoc = "This input file contains a UTF-8 BOM not at the very beginning,
+  but on the first line after the first byte. LilyPond should gracefully
+  ignore this BOM as specified in RFC 3629, but print a warning."
+}
+
+{ c }
Index: lily/include/lily-lexer.hh
diff --git a/lily/include/lily-lexer.hh b/lily/include/lily-lexer.hh
index 72391a087748cdd676739a8ed2b3646547f077c7..9729ca701664d8cbaa28277408e62c6cc1e434aa 100644
--- a/lily/include/lily-lexer.hh
+++ b/lily/include/lily-lexer.hh
@@ -110,6 +110,7 @@ public:
   void push_note_state (SCM tab);
   void pop_state ();
   void LexerError (char const *);
+  void LexerWarning (char const *);
   void set_identifier (SCM path, SCM val);
   int get_state () const;
   bool is_note_state () const;
Index: lily/lexer.ll
diff --git a/lily/lexer.ll b/lily/lexer.ll
index 7cda144e263c9720868330a988904f7fd45dee89..9cb706ebdcaf2f04f4ef32526779aa636d597da1 100644
--- a/lily/lexer.ll
+++ b/lily/lexer.ll
@@ -189,8 +189,8 @@ BOM_UTF8    \357\273\277
 <INITIAL,chords,lyrics,figures,notes>{BOM_UTF8}/.* {
   if (this->lexloc_->line_number () != 1 || this->lexloc_->column_number() != 0)
     {
-      LexerError (_ ("stray UTF-8 BOM encountered").c_str ());
-      exit (1);
+      LexerWarning (_ ("stray UTF-8 BOM encountered").c_str ());
+      // exit (1);
     }
   debug_output (_ ("Skipping UTF-8 BOM"));
 }
Index: lily/lily-lexer.cc
diff --git a/lily/lily-lexer.cc b/lily/lily-lexer.cc
index 5d87c83872d25052496f800de539760a71264c69..ba6429c3ea2798344702178363f200071c0f73cc 100644
--- a/lily/lily-lexer.cc
+++ b/lily/lily-lexer.cc
@@ -310,7 +310,7 @@ void
 Lily_lexer::LexerError (char const *s)
 {
   if (include_stack_.empty ())
-    message (_f ("error at EOF: %s", s) + "\n");
+    non_fatal_error (s, _f ("%s:EOF", s));
   else
     {
       error_level_ |= 1;
@@ -319,6 +319,18 @@ Lily_lexer::LexerError (char const *s)
     }
 }

+void
+Lily_lexer::LexerWarning (char const *s)
+{
+  if (include_stack_.empty ())
+    warning (s, _f ("%s:EOF", s));
+  else
+    {
+      Input spot (*lexloc_);
+      spot.warning (s);
+    }
+}
+
 char
 Lily_lexer::escaped_char (char c) const
 {
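
As a rough, standalone illustration of the behaviour the patched lexer rule aims for: every UTF-8 BOM is skipped, and a warning is counted for each one that is not at the very start of the input. This sketch makes several assumptions. `BomScan` and `skip_boms` are hypothetical names, it uses byte offset 0 in place of the line 1 / column 0 test, and it ignores the lexer start states, so it would also strip a BOM inside a string.

```cpp
#include <cstddef>
#include <string>

// Hypothetical result type for this sketch; not part of LilyPond.
struct BomScan
{
  std::string text;  // input with every UTF-8 BOM removed
  int warnings = 0;  // number of stray (non-leading) BOMs seen
};

// Skip every UTF-8 BOM in INPUT.  A BOM at offset 0 is dropped silently;
// any later one is dropped too, but counted as a warning instead of
// aborting, mirroring the switch from LexerError to LexerWarning.
BomScan
skip_boms (const std::string &input)
{
  const std::string bom = "\357\273\277";
  BomScan result;
  std::size_t pos = 0;
  while (pos < input.size ())
    {
      if (input.compare (pos, bom.size (), bom) == 0)
        {
          if (pos != 0)
            ++result.warnings;
          pos += bom.size ();
        }
      else
        result.text += input[pos++];
    }
  return result;
}
```

With a leading BOM the result has no warnings; with a BOM after the first byte, as in bom-mark.ly, the text is still cleaned up but `warnings` is 1.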
