phpgroupware-developers
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Phpgroupware-developers] SQL garbage in contacts backend


From: Alex Borges
Subject: Re: [Phpgroupware-developers] SQL garbage in contacts backend
Date: Wed, 25 Feb 2004 15:35:10 -0600

I have a horrible tendency for large emails....now please, read it
all....i really set my mind and soul into them...

El mié, 25-02-2004 a las 13:26, Markus Kaemmerer escribió:
> Alex, 
> 
> thank you for the quick response. 
> 
> >Indexes shouldve been created upon setup....where yout testing from a
> >clean vanilla install or what kind of upgrade path did it follow?
> 
> This was a clean installation, but some days old. I do not know, how
> old. I recreated my databases yet and found, that there are some
> indicis for some tables. I found that phpgw_contact_person has no
> index at all, but I can not decide if e.g. the main key person_id
> should have an index. This should be carfully verified for all tables.

person_id is a primary key so the index should be implicitly created (or
thats the rationale, if this is not the case, then we should change this
to reflect the right thing).

> 
> >Anyhow, this is fixed by adding indexes.
> 
> This should not be a big problem and adds overall speed.
> 
> >> The first 5 SQL statements are fetching the WHOLE table and the last
> >> 3500 bytes SQL statement is nealy self describing :-).  In this
> >> statement there are things like: "(CASE WHEN ' ' IS NULL THEN '' ELSE
> >> ' ' END)". The SQL builder should suppress such things.
> >
> >1.- 
> >ITs the way its handling the empty strings and all.... whomever you
> >appoint to this part of your project should get in touch with me so we
> >can work out a solution for sql builder working well for NULLS in
> >multiple databases (thats what its trying to automatically fix in that
> >case).
> 
> I think that this part of the SQL string does not make any sense. This
> "(CASE WHEN last_name IS NULL THEN '' ELSE last_name
> END))" should work. 
Yes, you are right.... i got mislead myself. This is actually for the
full name implementation since the table has no full name field. 

> 
> Is there a reason why not add a "nullable => false and default => ''
> in database description? Here in my SQL book this definition seems to
> be a standard SQL command and should be supported from every database
> (we can verify this). 

Most is handled the right way, i dont have the details right now.

> 
> When the table definition is updates this way, it should not be needed
> anymore to test the field against NULL. This is no big perfomance
> problem, but reduces complexity quite much.
> 

> >I disagreee that this query is unacceptably slow. I would call it
> >acceptably unreadable, but not necesarily inneficient (it has to run in
> >all databases).

> I'll do some additional tests, how slow this statement is. But I need
> more test data for this. We will do a profile on sync in the next days
> or weeks.

Ok. Let me know if i can help with that.

> 
> >Communication types and contact types are catalogs (really small ones,
> >tops 50 entries). In the actual addressbook code, the whole contents of
> >them are cached. So, this is just a 'first hit' performance hit
> 
> Because our synchronisition works with XML-RPC calls, this 'first hit'
> hits us every call. I do not see a reason, why this tables should be
> cached. In addition I think this can lead to unconsistency of the
> cached values and the database values.

No, they hold no real values. They are catalogs per design. Anyhow, this
first hit SHOULD (im not shure if under xml-rpc is, but normally it is),
be session wide....that means only the first time contacts_sql is
instanced should this stuff be brought in from the database.

> 
> >Catalog, should be small, executed upon contacts_sql instantiation. You
> >can also tell it to lazyly not-cache it, but that would be stupid.
> 
> is there a switch or something like this we can set if we get our data
> trough IPC and XML-RPC?

Yes. Upon instantiation of sql_contacts you can tell it not to prefetch
the catalogs. Although im not shure thats the behaviour you want.
Catalogs are cached in the session cache. Those queries should be
executed only once per whole phpgw session. Actually, their prefetching
could be done in the login hook.

> 
> >About the CASE and all... first understand why they are there: 
> >- We dont have a full name field, the concat and case gets the full name
> >if youre asking for it. By the way, if youre trying to sync the full
> >name back to phpgw, then you have a problem....:). 
> 
> Dirk is working on that, he is fixing the vCard interface at the
> moment. Maybe he can say more about this topic.

ITs like this. We have no full_name in the person table. Just a
first_name, last name, middle name...etc. This query concatenates them
when you ask for the full_name field in any of the functions wanting to
get anything from contacts


> 
> >Recomendation Marcus:
> >
> >Get your facts straight before just calling other peoples work
> >'garbage'. Ill be the first to admit the design has problems and that
> >the way we tackled those problems was not through a KISS design. 
> 
> It was not my intention to attac your person or such things. I know,
> this is not a simple topic, but we have to do much work to fix the
> remaining problems.
> 

Yes, i agree. Got a bit carried away myself too. Anyhow, let me tell you
some performance notes ive been gathering while deploying this thing for
some clients worth about 500 concurrent workstations in a 100mps
network:

1.- Php is just a language, and apache is just a web server. Neither are
an application server or has any natural provisions for thread creation
or subprocess delegation. Jakarta/tomcat (for example) does have this
kind of provisions thus making it a DIFFERENT (not better or worse)
scaling model.

2.- With this kind of setup, you have to think about performance as a
functionality <-> performance tradeoff. All queries in phpgw have to run
in databases that do not support subselects (otherwise, we would reduce
the hit rate to the databases by a huge factor). But still, we want to
have a really robust framework for managing all the basic groupware
features:
-we want big contacts, not just one table...and it still is missing some
functionality) 
-we want a good calendar with a great interface for searching, setting
appointments, alarms, connecting data to appointments...etc.
-we want a good acl system to control access to everything in our
framework
-we want a generic preferences system that can link to all apps,
provides an easy add/delete/edit interface for programmers...

Well, we want groupware. And all that, makes for a big ass database, and
potentially, a need to manage large ammounts of memory in the php app
(look at acl fetching when you login). 

Where am i driving at... well, since we have no secondary application
server to offload processes to (like in jakarta/tomcat) or an inhouse,
inmemory database/webserver (like zope/gadfly), the only way to make
phpgw scale is to the sides. THat is, to buy more servers and cluster it
with load balancing.

That wont make it faster, it will only make it scale better. 
If you want to make it faster, you need to look at the model of php apps
and ask yourself what is taking the most resources. In my experience:

Problem                                 Solution
- Loading of php code and execution     Turck,apc or other cache
- Big apache processes due to           Lingerd/khttpd -> apache
php extensions
- Exponential use of db connections     Cluster db
                                        Use non-persistant connections

If phpgroupware is to provide real groupware services, so it can do
tasks that commercial  groupware suites can (contact/communication
center, CRM, document management, and basic pim), it needs big
databases. And the bigger the buisness the bigger the database.



> >Thx again though for pointing out the indexes problem (chances are thats
> >90% of your problem). 
> 
> I added some (quick choosen) indici manually with phpMyAdmin, this
> gave me a 30% performance boost but does not solved all my problemes.

Well thats a lot! It shows that not all the indexes that we should be
creating are created. Gather a list and lets put them in!

Another tip....use php4 sessions, not database sessions. Db sessions
ammount to a, potentially large,  extra hit to the database.

> The sync is working fine, but really slow. There performance problems
> are on many parts of phpGW and we have to find a way to strip them
> down as much as possible, because I think no user has time to wait 30
> minutes before his 300 adresses are synchronized :)
> 
I think that part of this is that its never been performance tunned for
xml-rpc..... thats the bottleneck. It makes no sense to think that the
bottleneck is the addressbook since ive just tested the vcard list
import and it took 30 seconds to add 147 contacts (say 3 minutes to do
all the 300?)

So, it makes no sense to think that the addressbook isnt importing fast
enough, only that the xml-rpc model for addition will make us revise the
way we are writing the import code.

> Markus
> 
> --
> Markus Kämmerer         Team Software Solutions
> pro|business AG, EXPO Plaza 1, 30539 Hannover
> E-Mail: address@hidden,  Phone.: 0511/60066-0
> WWW: http://www.probusiness.de/,    Mobile: 0177/5990932
> 
> **********************************************
>            address@hidden 2004
> 
> *     Halle 1, Stand 3k4 (Magirus Deutschland GmbH)
> *     Halle 1, Stand 7f2 (EMC Deutschland GmbH)
> *     Halle 6, Stand D46 (Land Niedersachsen)
> **********************************************
> 
> 
> _______________________________________________
> Phpgroupware-developers mailing list
> address@hidden
> http://mail.gnu.org/mailman/listinfo/phpgroupware-developers
> 





reply via email to

[Prev in Thread] Current Thread [Next in Thread]