pan-users

Re: [Pan-users] Sorting issues - example


From: Chris Petersen
Subject: Re: [Pan-users] Sorting issues - example
Date: 12 Dec 2002 13:37:00 -0800

> It's damn near impossible to get perfect.  With 200+ post series, reposts,
> half-finished series, renumbered and restarted series, headers beginning 
> with 1-100 vs 001-100, series that stop and start on different days with
> different titling, people posting pars with headers that start with
> "oh, yeha, forgot da pras!", or, even better:

This kind of thing is a little extreme, but basic stuff like grabbing
article counts wouldn't be hard.  I can do it in Perl with something
simple like this (OK, so I used to write regexes for a living):

($seq, $tot) =
  $line =~ /[\[\(\{][\s0]*(\d+?)[a-z_\-\s\/]+0*(\d+?)\s*[\]\)\}]/i;
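
For anyone who wants to poke at it without Perl handy, here is a rough
Python port of the same pattern (the syntax carries over unchanged to
Python's re module).  The part_counts helper and the subject lines are
made up for illustration; treat it as a sketch, not a tested extractor.

```python
import re

# Same pattern as the Perl snippet above, compiled once.
PART_RE = re.compile(
    r"[\[\(\{][\s0]*(\d+?)[a-z_\-\s\/]+0*(\d+?)\s*[\]\)\}]",
    re.IGNORECASE,
)

def part_counts(subject):
    """Return (seq, tot) as ints, or None if no part counter is found."""
    m = PART_RE.search(subject)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))

# Hypothetical subject lines, not taken from any real group.
print(part_counts("holiday pics [001/100] - img001.jpg"))  # (1, 100)
print(part_counts("some file (12 of 45)"))                 # (12, 45)
print(part_counts("oh, yeha, forgot da pras!"))            # None
```

The leading-zero handling means 001/100 and 1/100 come out as the same
pair of integers, which is the property a sort routine would care about.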

It's just a matter of HOW smart you want to be when you extract things.
Detecting par files is a bit overkill, but detecting sequence counts and
useful text should be doable.  The problem is that it starts to
seriously bog down a machine when you want to do this kind of sorting on
100,000 articles.
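
One way to keep the cost down might be to run the pattern once per
article and sort on the stored result, rather than re-matching inside
every comparison.  A minimal Python sketch of that idea follows; the
sort_key and sort_subjects names and the sample subjects are
hypothetical, and nothing here reflects how Pan actually sorts.

```python
import re

PART_RE = re.compile(
    r"[\[\(\{][\s0]*(\d+?)[a-z_\-\s\/]+0*(\d+?)\s*[\]\)\}]",
    re.IGNORECASE,
)

def sort_key(subject):
    """Build a key of (text outside the counter, total, sequence number)."""
    m = PART_RE.search(subject)
    if m is None:
        # No counter found: fall back to the subject text alone.
        return (subject.lower(), 0, 0)
    # Cut the counter out so every part of a series shares the same text.
    text = subject[:m.start()] + subject[m.end():]
    return (text.lower(), int(m.group(2)), int(m.group(1)))

def sort_subjects(subjects):
    # sorted() computes sort_key once per element, not once per comparison,
    # so the regex runs N times rather than on the order of N log N times.
    return sorted(subjects, key=sort_key)

# Hypothetical subjects mixing 1-100 and 001-100 style numbering.
subjects = [
    "bigfile.rar [10/12]",
    "bigfile.rar [2/12]",
    "another.rar (1 of 2)",
    "bigfile.rar [001/12]",
]
for s in sort_subjects(subjects):
    print(s)
```

A plain string sort would put [10/12] ahead of [2/12]; comparing the
extracted integers avoids that.  Whether one regex pass per article is
cheap enough at 100,000 articles is something that would need measuring.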

I'd probably have started working on this sorting stuff on my own, but I
haven't touched C code in like 6 years, and I was just learning it back
then.

I'm more than happy to devise patterns if Charles or someone else wants
to put some effort into designing a regex-based sort routine, but that's
obviously up to them....

-Chris



