Re: [Pan-users] Sorting issues - example
From: Chris Petersen
Subject: Re: [Pan-users] Sorting issues - example
Date: 12 Dec 2002 13:37:00 -0800
> It's damn near impossible to get perfect. With 200+ post series, reposts,
> half-finished series, renumbered and restarted series, headers beginning
> with 1-100 vs 001-100, series that stop and start on different days with
> different titling, people posting pars with headers that start with
> "oh, yeha, forgot da pras!", or, even better:
This kind of thing is a little extreme, but basic stuff like grabbing
article counts wouldn't be hard. I can do it in Perl with something
simple like this (OK, so I used to write regexes for a living):
# e.g. "[001/100]" or "(1 of 100)" gives $seq = 1, $tot = 100
my ($seq, $tot) =
    $line =~ /[\[\(\{][\s0]*(\d+?)[a-z_\-\s\/]+0*(\d+?)\s*[\]\)\}]/i;
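To make that concrete, here is a minimal sketch of the same pattern wrapped in a sub and run against a few subject lines. The sub name part_counter and the sample subjects are made up for illustration; they don't come from Pan or from any real group.

```perl
use strict;
use warnings;

# Hypothetical helper: pull the "part N of M" counter out of a subject line.
# Returns (seq, tot) on a match, or an empty list when no counter is found.
sub part_counter {
    my ($line) = @_;
    my ($seq, $tot) =
        $line =~ /[\[\(\{][\s0]*(\d+?)[a-z_\-\s\/]+0*(\d+?)\s*[\]\)\}]/i;
    return defined $seq ? ($seq, $tot) : ();
}

# Made-up subject lines, for illustration only.
for my $subject ('holiday pics [001/100] - beach.jpg',
                 'holiday pics (1 of 100) - beach.jpg',
                 'oh, yeah, forgot the pars!') {
    my ($seq, $tot) = part_counter($subject);
    print defined $seq ? "$seq of $tot" : 'no counter', ": $subject\n";
}
```

The leading-zero handling sits in the pattern itself ([\s0]* and 0*), so "001" and "1" come back as the same number without any extra cleanup.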
It's just a matter of HOW smart you want to be when you extract things.
Detecting par files is a bit overkill, but detecting sequence counts and
useful text should be doable. The problem is that it starts to
seriously bog down a machine when you want to do this kind of sorting on
100,000 articles.
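One way to keep that cost down, at least in Perl, is to run the regex once per article and sort on the cached results, rather than re-matching inside every comparison. The sketch below uses the usual decorate-sort-undecorate idiom; it assumes all the subjects fit in memory, and the sub name sort_by_part and the sample subjects are hypothetical. It is not how Pan sorts anything.

```perl
use strict;
use warnings;

# Hypothetical sort: order by the subject text with the counter stripped,
# then by part number. Subjects without a counter get part number 0.
sub sort_by_part {
    my @subjects = @_;
    my $counter = qr/[\[\(\{][\s0]*(\d+?)[a-z_\-\s\/]+0*(\d+?)\s*[\]\)\}]/i;

    return
        map  { $_->[0] }                                     # undecorate
        sort { $a->[1] cmp $b->[1] or $a->[2] <=> $b->[2] }  # compare cached keys only
        map  {
            my ($seq) = $_ =~ $counter;        # extract the part number once
            (my $text = $_) =~ s/$counter//;   # subject minus the counter
            [ $_, lc $text, $seq // 0 ];       # [original, text key, numeric key]
        } @subjects;
}

# Made-up subject lines, for illustration only.
print "$_\n" for sort_by_part(
    'trip photos [10/12]',
    'trip photos [002/12]',
    'trip photos (1 of 12)',
);
```

With this shape the pattern is applied a fixed number of times per subject, while the sort itself only does string and integer comparisons on the precomputed keys.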
I'd probably have started working on this sorting stuff on my own, but I
haven't touched C code in like 6 years, and I was just learning it back
then.
I'm more than happy to devise patterns if Charles or someone else wants
to put some effort into designing a regex-based sort routine, but that's
obviously up to them....
-Chris