
[Phpgroupware-developers] Re: utf-8 vs iso-8859-1


From: Dave Hall
Subject: [Phpgroupware-developers] Re: utf-8 vs iso-8859-1
Date: Fri, 24 Feb 2006 00:08:00 +1100

On Thu, 2006-02-23 at 12:50 +0100, Sigurd Nes wrote:
> > On Thu, 2006-02-23 at 09:08 +0100, Sigurd Nes wrote:
> > > > 
> > > > The conversion to utf-8 is giving me problems.
> > > > I have a database with more than 5000 dwellings, 35000 workorders ...
> > > > The language is Norwegian - and I really would like to keep the
> > > > character set (at least for Norwegian) - this way I can use whatever
> > > > query tool (such as M$ Access) I like to do analysis without the need
> > > > for postprocessing.
> > > > Please enlighten me if I am missing something.
> > > > 
> > 
> > There are several reasons for the switch to utf-8.  The main one is that
> > from db to the user interface we can know that we are always dealing
> > with utf-8.  We can then remove things like lang('charset').
> > 
> > Unicode also means we can have multi lingual installs.  For example, if a
> > company has operations across Europe they can not use a single phpgw
> > install, as we currently use at least 3 different charsets for
> > translations.  I would also like to hardcode utf-8 into stuff instead of
> > having to keep track of charsets, which potentially causes problems.  It
> > is also easier if everyone knows to use utf-8 compliant tools.
> > 
> > I haven't used M$ Access since O2k days, but I know that OO.o2 Base
> > allows you to specify the charset for the database connection.  Maybe M$
> > Access has the same option tucked away somewhere.
> > 
> > What are the problems you have?  I am happy to see if we can find a way
> > of fixing the problems instead of switching back to encoding soup :)
> > 
> > Cheers
> > 
> > Dave
> > 
> 
> I'm not sure I grasp all the consequences - this is from some testing:
> 
> It seems that postgres has a unicode odbc-driver so that "should" be
> ok - but it doesn't seem to work (if there are any converted characters
> - I got 'ODBC -- called failed').
> 

I am not sure what the issue is here.  Is it when the db contains
unicode chars or iso-8859-1 ?
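
One way to answer that question for a given value is to test whether its raw bytes are valid utf-8. The following is only a minimal sketch: the function name `guess_encoding` is hypothetical, and "not-utf-8" merely suggests a legacy 8-bit charset such as iso-8859-1, it does not prove which one.

```python
def guess_encoding(raw: bytes) -> str:
    """Classify raw bytes as 'ascii', 'utf-8' or 'not-utf-8'.

    'not-utf-8' only means the bytes are not valid utf-8; for this
    database that would most likely be iso-8859-1, but that is an
    assumption, not something the check can confirm.
    """
    try:
        raw.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "not-utf-8"


if __name__ == "__main__":
    for sample in ("bolig", "håndverker"):
        for codec in ("iso-8859-1", "utf-8"):
            print(sample, codec, guess_encoding(sample.encode(codec)))
```

Pure ASCII values look the same in both encodings, so only values with characters such as æ, ø and å tell the two apart.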

> I will need to convert all the characters in the database to unicode -
> I figure I can dump the database, convert the characters (there is a
> tool ?) and reload the data into an empty database. At this point I
> will most certainly run into problems - 'cause the fields will be too
> short in many cases.
> 

Check out iconv.  That is what I used to convert the lang files.  It is
pretty simple. You should be able to convert a full db dump on the
command line, then reimport it.  On average, how many non-ASCII
characters do you have in a field?  How much slack do your fields have?
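
For illustration, the same kind of conversion can be sketched in Python, along with a helper for the "how much slack" question. This is a rough sketch, not the procedure used for the lang files: the function names and file paths are hypothetical, and it assumes the dump really is iso-8859-1 throughout. Every non-ASCII iso-8859-1 character takes two bytes in utf-8, so the growth in bytes equals the number of such characters in a value.

```python
def latin1_to_utf8(raw: bytes) -> bytes:
    """Re-encode iso-8859-1 bytes as utf-8."""
    return raw.decode("iso-8859-1").encode("utf-8")


def extra_bytes(value: str) -> int:
    """How many more bytes a value needs in utf-8 than in iso-8859-1."""
    return len(value.encode("utf-8")) - len(value.encode("iso-8859-1"))


def convert_dump(src_path: str, dst_path: str) -> None:
    """Rewrite a text dump from iso-8859-1 to utf-8, line by line.

    newline="" keeps the original line endings untouched.
    """
    with open(src_path, "r", encoding="iso-8859-1", newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        for line in src:
            dst.write(line)


if __name__ == "__main__":
    for value in ("Storgata 1", "Bjørnsons gate", "blåbærsyltetøy"):
        print(value, extra_bytes(value))
```

Whether the extra bytes matter depends on whether the database counts column lengths in characters or in bytes, which is worth checking before widening any fields.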

> Writing lang-files will be somewhat more difficult ?
> When saving a file with gedit as unicode it is ok when reopened in
> gedit and TexPad (my favorite) but not in emacs.
> 

What? You don't use vi? ;)  Someone suggested trying "C-x RET f utf-8
RET" in emacs, but I have no idea when it comes to emacs.

> When inserting new values into the database - do I need to filter the
> values through a converter?
> I certainly cannot edit records with webmin.
> 

You mean manual inserts?  For that I use phpmyadmin or mysql query
browser as I use mysql not pgsql.  Does webmin set the charset based on
a language?

> I thought that the lang-table combined with the users preferences took
> care of multilanguage issue.
> 

Not completely.  AFAIK, unless we use unicode we can't mix different
charsets in one install.  For example, with unicode we can list each
language under its own native name, written in its own script.

> If there are special functions in the api that require unicode - I'm
> more than willing to convert the input to that function to unicode on
> demand.
> 
> All in all - as I see it - there are a number of limitations compared
> to allowing iso-8859-1 for the xsl:stylesheet
> 

What limitations are there for the stylesheets?  From what I understand
it is best to use utf-8 for xml.
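
As a small illustration of why the declared encoding matters for xml, here is a sketch using Python's standard `xml.etree.ElementTree` parser (not the XSLT processor phpgw uses; the `<item>` element and helper names are made up for the example). A document parses to the same text under either encoding as long as its bytes match its declaration, and fails when iso-8859-1 bytes are declared as utf-8.

```python
import xml.etree.ElementTree as ET

TEXT = "Blåbærsyltetøy"


def make_doc(declared: str, actual: str) -> bytes:
    """Build a tiny XML document declaring one encoding and using another."""
    doc = '<?xml version="1.0" encoding="%s"?><item>%s</item>' % (declared, TEXT)
    return doc.encode(actual)


def parses_ok(doc: bytes) -> bool:
    """True if the parser accepts the document and recovers TEXT intact."""
    try:
        return ET.fromstring(doc).text == TEXT
    except ET.ParseError:
        return False


if __name__ == "__main__":
    print(parses_ok(make_doc("utf-8", "utf-8")))            # matching
    print(parses_ok(make_doc("iso-8859-1", "iso-8859-1")))  # matching
    print(parses_ok(make_doc("utf-8", "iso-8859-1")))       # mismatched
```

Sticking to utf-8 everywhere removes the chance of a stylesheet and its data disagreeing about the encoding.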

Cheers

Dave




