Re: [xliff-comment] XLIFF vs. PO vs. Trolltech
From: Oswald Buddenhagen
Subject: Re: [xliff-comment] XLIFF vs. PO vs. Trolltech
Date: Mon, 19 May 2008 13:44:48 +0200
User-agent: KMail/1.9.9
Hello Asgeir, *,
thanks for the replies.
On Saturday 17 May 2008 06:39:32 Asgeir Frimannsson wrote:
> On Saturday 17 May 2008 03:05:11 am Oswald Buddenhagen wrote:
> > Trolltech is looking into implementing/improving XLIFF support in Qt's
> > Linguist tool chain. Interoperability with PO files is an item, too.
> > This is what I've come up with. Please sanity-check it, so we don't set
> > a faulty de-facto standard in case we go for it. ;)
>
> In terms of PO interoperability and the representation of TS in PO, it
> would probably be wise to discuss this on the GNU gettext mailing list
> (address@hidden); see http://savannah.gnu.org/projects/gettext/ .
>
OK, on CC now.
> Also, the Translate Toolkit (translate.sf.net) has some existing ts<->po
> converters, but I'm not sure what the status of these is.
>
Somewhat rudimentary, it seems after a quick test.
> > - The PO representation guide says that everything should be put into one
> > <file> element and PO references should be represented as <context
> > context-type="sourcefile">. This is in accordance with the XLIFF spec
> > (see "sourcefile" value doc). However, that means that if I create an
> > .xlf file directly from sources I get a different representation than if
> > I create a .po file and convert it to .xlf later. I find this
> > inconsistency not justified, so I think I would opt for the "native"
> > representation with multiple <file> elements. Only if the PO message has
> > additional references to other files, sourcefile contexts would be used.
>
> The main issue with representing this as multiple <file> elements is that
> in XLIFF, there is no concept of meta-data above the <file> level.
>
Right. I just mapped the .po file header to a message with an empty source
coming from a file with no name, i.e., basically doing what .po does.
This is sort of hacky, but OTOH it requires no special support from tools, so I
expect less trouble from this approach than some more or less arbitrary other
mapping.
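
A minimal sketch of that mapping in Python, to make the idea concrete. The function name build_xliff, the datatype value "po" and the default source language "en" are assumptions made for illustration; this is not what lupdate or any existing converter does.

```python
import xml.etree.ElementTree as ET


def build_xliff(po_header, messages_by_file, source_language="en"):
    """Build an XLIFF tree with one <file> per source file.

    po_header        -- text of the PO header entry (its msgstr)
    messages_by_file -- dict: source file name -> list of source strings
    """
    xliff = ET.Element("xliff", {"version": "1.2"})
    unit_id = 0
    # The PO header becomes a unit with an empty source inside a <file>
    # that has no name, mirroring the empty msgid of the PO header entry.
    sections = [("", [("", po_header)])]
    sections += [(name, [(src, "") for src in sources])
                 for name, sources in messages_by_file.items()]
    for name, units in sections:
        file_el = ET.SubElement(xliff, "file", {
            "original": name,
            "source-language": source_language,
            "datatype": "po",  # assumed value
        })
        body = ET.SubElement(file_el, "body")
        for source, target in units:
            unit_id += 1
            unit = ET.SubElement(body, "trans-unit", {"id": str(unit_id)})
            ET.SubElement(unit, "source").text = source
            if target:
                ET.SubElement(unit, "target").text = target
    return xliff
```

A tool that knows nothing about this convention just sees one more untranslatable-looking unit, which is the point of not introducing a special mapping.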
> We used a single <file> element for representing a PO, as a PO is a single
> file. If e.g. gettext implemented support natively for XLIFF, the data model
> would be very different, as the source would be a set of source-files with
> extracted translatable text, rather than a single resource file.
>
This is basically what I proposed, right?
> (this might be a bit Qt/Trolltech specific from here:)
>
> From what I understand from your mail you are trying to accomplish
> something like
>
> # generates a single .xlf for the project with multiple <file> elements
> lupdate -xlf myproject.pro
>
> # generates a single .po for the project
> lupdate -po myproject.pro
>
> # generates a single .ts for the project
> lupdate -ts myproject.pro
>
> So you are saying that if you take the PO generated above and create an
> XLIFF from it using the representation guide, it will be different from the
> XLIFF created by lupdate directly?
>
Yes.
> If so, I don't see anything wrong with
> that, as they are technically representing two rather different
> data-models.
>
Yes ... however, one of our aims is having lossless conversion between the
formats (*) for smooth integration into existing systems (and to simplify
internal testing :). This should happen as naturally as possible, without
introducing magic meta data unless unavoidable.
(*) OK, so converting from XLIFF to something else and back to XLIFF is not
going to work losslessly, but you get the idea. :)
> As a side-note: In some of my work, I've found it more beneficial to
> represent PO files as a hierarchy of <group> elements based on the PO
> references rather than the flat structure we have defined in the PO
> representation guide. This structure gives a much better contextual
> hierarchy for both translators and processing tools. This approach takes
> more processing though, as you have inter-trans-unit references, and the PO
> would have to be fully read before starting to write the XLIFF file.
> However, you might find this representation closer to what you're trying
> to accomplish,
>
Yes.
> although I'm not sure how it matches with the ts <context> element.
>
That's fine - .ts contexts are basically nested into files (well, actually, it
is not unlikely to have the same context both in a .ui file and in the
associated .cpp file, but that's not really a tragedy).
> PO:
> #: src/MyDialog.cpp:23 src/MyOtherDialog.cpp:12
> msgid "Hello World"
> msgstr ""
>
> XLIFF representation:
> <group restype='x-directory' resname='src'>
>   <group restype='x-file' resname='MyDialog.cpp'>
>     <trans-unit id='1'>
>       <source>Hello World</source>
>     </trans-unit>
>   </group>
>   <group restype='x-file' resname='MyOtherDialog.cpp'>
>     <trans-unit id='2' translate='no'>
>       <source><ph id='x' xid='1'/></source>
>     </trans-unit>
>   </group>
> </group>
>
Hmm, this approach didn't occur to me, as it basically contradicts the expected
usage of <file> elements, no? Something to change for XLIFF 2.0?
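
For reference, a rough Python sketch of how a converter could build the group hierarchy quoted above from the "#:" references of one PO entry. The helper names (_child_group, add_message) are hypothetical. The first reference carries the real unit and every further reference gets a translate='no' unit pointing back at it via <ph xid=...>, as in the example.

```python
import xml.etree.ElementTree as ET


def _child_group(parent, restype, resname):
    """Return the matching child <group>, creating it if necessary."""
    for group in parent.findall("group"):
        if group.get("restype") == restype and group.get("resname") == resname:
            return group
    return ET.SubElement(parent, "group",
                         {"restype": restype, "resname": resname})


def add_message(body, msgid, references, next_id):
    """Add one PO message under body; returns the next free unit id.

    references -- the text after "#:", e.g. "src/a.cpp:23 src/b.cpp:12"
    """
    first_id = None
    for ref in references.split():
        path = ref.rsplit(":", 1)[0]          # strip the line number
        *dirs, filename = path.split("/")
        node = body
        for directory in dirs:
            node = _child_group(node, "x-directory", directory)
        node = _child_group(node, "x-file", filename)
        if first_id is None:
            # First reference: the unit that actually gets translated.
            first_id = str(next_id)
            unit = ET.SubElement(node, "trans-unit", {"id": first_id})
            ET.SubElement(unit, "source").text = msgid
        else:
            # Further references: placeholder unit referring to the first.
            unit = ET.SubElement(node, "trans-unit",
                                 {"id": str(next_id), "translate": "no"})
            source = ET.SubElement(unit, "source")
            ET.SubElement(source, "ph", {"id": "x", "xid": first_id})
        next_id += 1
    return next_id
```

Called once per message this writes into a tree held in memory, which matches the remark that the PO has to be read completely before the XLIFF can be written out.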
> > - Gettext's new msgctxt keyword was brought up before. Incidentally, the
> > <comment> element in Qt's own .ts files maps pretty well to it. There
> > is no standardized mapping for .xlf yet, though. I would pick up a
> > previously suggested approach and do it like that:
> >
> > <trans-unit>
> >   <source>foobar</source>
> >   <target>irgendwas</target>
> >   <context-group purpose="match information">
> >     <context context-type="x-gettext-msgctxt"
> >              match-mandatory="yes">some context info</context>
> >   </context-group>
> > </trans-unit>
> >
> > For plural forms, the context would be attached to the plural group.
> > The exact value for purpose= is not clear to me - the values suggested
> > seem to refer to TM only. I think I would simply skip the purpose ...
>
> Translator editors can e.g. display the context to the translator only
> if 'purpose' is set to 'information', and hide it otherwise.
>
Oh, right - I misread the spec. So "information" is definitely correct.
> Similarly, a TM processor can choose to perform additional 'context
> matching' based on the 'match' purpose-value. This would e.g. be useful if
> you had two identical translation units, but with different contexts, and
> the TM processor could automatically match better based on these.
>
Yes, except that I need it to apply not only to the TM processor, but also to
the tool that generates the output for the translator library in the program. I
suppose it won't hurt if I slightly stretch the definition for the Linguist
tools, but it seems to me that something formally approved would be cleaner.
> > - .ts files know a <context> element. I consider it stronger than
> > msgctxt: it is not optional; every message is in a context. Therefore I
> > would map it to nested groups:
> >
> > <group restype="x-trolltech-ts-context">
> >   <context-group purpose="match information">
> >     <context context-type="x-trolltech-ts-context"
> >              match-mandatory="yes">the context</context>
> >   </context-group>
> >   <trans-unit .../>
> > </group>
> >
> > FWIW, the mapping to PO would be via a magic extracted comment:
> > #. ts:context <the context>
>
> This sounds sensible to me.
>
Good.
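
The PO side of this would then be trivial. A small Python sketch, assuming the context fits on a single comment line; the function names are made up for illustration:

```python
PREFIX = "#. ts:context "


def context_to_po_comment(context):
    """Encode a .ts context name as a magic extracted comment."""
    return PREFIX + context


def context_from_po_comments(comment_lines):
    """Return the .ts context found in an entry's comment lines, or None."""
    for line in comment_lines:
        if line.startswith(PREFIX):
            return line[len(PREFIX):]
    return None
```

Ordinary extracted comments ("#. some note") do not start with the prefix, so they pass through untouched.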
> > - As the repr. guide says, .po files do not encode the (target) language.
> > Therefore I would add an X-Language: header to the initial msgstr. It
> > would be implanted and extracted during conversion. When converting from
> > an .xlf file which does not have a first message that seems to be a .po
> > file header, a message would be generated and marked with
> > X-Virgin-Header:; if this header is found on converting back, the message
> > would be zapped.
>
> Not sure I understand the use-case for this.
>
That's again for the lossless conversion. Simply because .ts needs the target
language for the same purpose that .po uses the "Plural-Forms:" header -
unfortunately, no unambiguous reverse mapping is possible.
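
To illustrate the intended round trip, here is a Python sketch of the header handling on both sides. Only the two header names come from the proposal; the function names and the exact line format (an empty value after "X-Virgin-Header:") are assumptions.

```python
def to_po_header(xliff_header, target_language):
    """Produce the PO header text when converting from .xlf.

    xliff_header -- header text found in the .xlf, or None if the .xlf
                    has no first message that looks like a PO header
    """
    if xliff_header is None:
        # No header in the source: generate one and mark it as synthetic.
        lines = ["X-Virgin-Header:"]
    else:
        lines = [line for line in xliff_header.splitlines()
                 if not line.startswith("X-Language:")]
    lines.append("X-Language: " + target_language)
    return "".join(line + "\n" for line in lines)


def from_po_header(po_header):
    """Return (header text or None, target language or None)."""
    language = None
    virgin = False
    kept = []
    for line in po_header.splitlines():
        if line.startswith("X-Language:"):
            language = line.split(":", 1)[1].strip()
        elif line.startswith("X-Virgin-Header:"):
            virgin = True
        else:
            kept.append(line)
    # A synthetic header is zapped again on the way back.
    header = None if virgin else "".join(line + "\n" for line in kept)
    return header, language
```

Converting .xlf to .po and back then yields the same header (or the same absence of one) plus the target language that .ts needs.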
> > - Gettext's #| msgid (previous source in fuzzy translation) would be
> > mapped to <alt-trans> elements as suggested on this list before: Each
> > previous source is tacked onto a current source. If more previous sources
> > than current sources exist (plural to singular "downgrade"), the source
> > gets two alt-trans elements, the second one with an empty target marked
> > with restype="x-dummy".
> > - Gettext's #| msgctxt would get mapped just like msgctxt, only that the
> > context-type would be x-gettext-previous-msgctxt.
> > - Contrary to the guide, I would store obsolete messages, marking the
> > <trans-unit> or, for plurals, the containing <group> with translate="no".
> > I see no harm in doing this and it yields a more faithful conversion.
> > The messages would go into a <file> with the imaginary original name
> > Obsolete_PO_entries.
>
> I'm not sure if we really need to go to this extent. I guess it's more a
> design-question if XLIFF was really meant to be a replacement for all
> features that a format supports, rather than an extraction-format. E.g.
> obsolete entries in PO are a way of storing translations that were used in
> previous versions of the project, but are no longer used (however they may
> pop up in later versions of the project, that's why they are stored). XLIFF
> was not intended to be a storage container for these (I guess TMs replace
> this functionality), and I'm not sure if trying to mold XLIFF into such a
> storage container would break processing tools etc (wrong statistics, word
> counts, file counts etc).
>
Good point. But we need it for the lossless roundtrips again. :)
Luckily, lupdate has an option -noobsolete already - I guess adding that to the
anticipated lconvert would not be exceedingly hard. :-)
> > - The guide does not specify how to map fuzzy plurals. I guess one should
> > require approval of all <trans-unit>s in the <group> for non-fuzziness.
>
> Yes, this is a design-limitation of the current XLIFF specification. This
> approach sounds reasonable to me.
>
OK
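
Spelled out as a Python sketch, assuming fuzziness is carried by the approved attribute of each <trans-unit> (absent means "no", per the XLIFF 1.2 default) and that the plural forms are the direct children of the <group>; the function name is hypothetical:

```python
import xml.etree.ElementTree as ET


def plural_group_is_fuzzy(group):
    """A plural <group> is fuzzy unless every <trans-unit> is approved."""
    units = group.findall("trans-unit")
    return not all(unit.get("approved") == "yes" for unit in units)
```

Going from PO to XLIFF, a fuzzy plural entry would accordingly leave all units of the group unapproved.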
Regards,
--
Oswald Buddenhagen
Trolltech GmbH
Rudower Chaussee 13
12489 Berlin
Germany
Fon: +49 (030) 6392 3255
Fax: +49 (030) 6392 3256