chicken-users
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Chicken-users] hash table questions


From: Kon Lovett
Subject: Re: [Chicken-users] hash table questions
Date: Fri, 23 Nov 2012 14:19:10 -0800


On Nov 23, 2012, at 1:58 PM, Kevin Wortman <address@hidden> wrote:

Hi,

This may or may not be helpful, but I'll add that, if it were me, I'd represent an IP address as a list of four fixnums rather than a u8vector. Then the default hash function will work out of the box, and many of your functions can be written more concisely. E.g.

(define IPv4-addr= equal?)

(define (IPv4-address->string a)
  (string-intersperse (map number->string a) "."))

It's true that a u8vector will be a bit more space-efficient, but IMO it's wise to refrain from that level of optimization unless and until it's the only way to deal with an observed bottleneck.

Normally I wouldn't disagree with the basic idea here, although I would advocate a vector rather than list.

However the existing srfi-69 implementation doesn't follow a pair's references. (Also I think there is a bug where "list?" beats "pair?" in the type discrimination.)

But vectors (structures are treated as vectors) and vector-like objects do recurse into the contents, at least for 4 levels.


Regards,
Kevin Wortman



On Thu, Nov 22, 2012 at 4:56 PM, Claude Marinier <address@hidden> wrote:
Greetings fellow Schemers,

Having established in a previous post that using a u8vector as a key for a hash table is expected to work, I have some specific questions.

I am still learning how to post questions properly. I hope this is clear and contains enough information.

By the way, how can I make the included code display properly? In Mozilla Firefox, it uses a proportional font.


1) Is using a hash function other than the default worth the trouble?

I define the hash table this way.

(define IPv4-table
  (make-hash-table IPv4-addr= (make-fixnum-bounded-hash FNVAHash)))

(declare (inline IPv4-addr=))
(define IPv4-addr=
  (lambda (addr1 addr2)
    (and (= (u8vector-ref addr1 0) (u8vector-ref addr2 0))
         (= (u8vector-ref addr1 1) (u8vector-ref addr2 1))
         (= (u8vector-ref addr1 2) (u8vector-ref addr2 2))
         (= (u8vector-ref addr1 3) (u8vector-ref addr2 3)))))

Would it be simpler, as effective, and almost as efficient to use the default?

P.S. Is this the correct way to specific an alternate hash function?


2) Is this the correct way to add new records to the table and update existing records?

(define update-IPv4-counters
  (lambda (addr direction proto tokens)
    (let ((IP-record (hash-table-ref/default IPv4-table addr #f)))
      (cond
        ((not (vector? IP-record))
          ; addr is not in the table, this is a new entry
          (hash-table-set! IPv4-table addr
            (update-IP-record (make-IP-record) direction proto tokens)))
        (else
          ; addr is in the table, increment packets and add bytes
          (hash-table-set! IPv4-table addr
            (update-IP-record IP-record direction proto tokens)))))))

Note that an IP record is a seven elements vector (i.e. created with make-vector).

If I understand SRFI-69 correctly, hash-table-set! will replace a record with the same key. Is this correct? There cannot be duplicate keys, right?


3) This is how I dump the hash table.

(define dump-data
  (lambda ()

    (let*  ((outport (open-output-file (make-dump-file-name)))
            (dump-IPv4-record (lambda (address IP-record)
                (write-with-indent (IPv4-address->string address) 1 outport)
                (dump-IP-protocols IP-record 1 outport))))

      (write-with-indent "IPv4" 0 outport)
      (hash-table-for-each IPv4-table dump-IPv4-record)

      ...

(define IPv4-address->string
  (lambda (addr)
    (string-append
      (number->string (u8vector-ref addr 0)) "."
      (number->string (u8vector-ref addr 1)) "."
      (number->string (u8vector-ref addr 2)) "."
      (number->string (u8vector-ref addr 3))))

This code displays two records with the same key and different data. This has happened more than once. I have kept the output files.

If I call hash-table-set! with the same address as another record but in a different format (say a vector instead of a u8vector), would it choke or would it just create a new record without complaining? This would look like duplicate keys but they would be different.

I assume that hash-table-for-each will present each key / record pair once and do so in some arbitrary order.

Can I assume that dump-IPv4-record, when it calls IPv4-address->string, will choke on a key which is not a u8vector?



I am looking for the mistake I made which is producing the illusion of a duplicate key in the hash table. I believe that confirming my assumptions is a good starting point.

I hope this sort of question is not abusing the list members.

Thank you for your patience.

--
Claude Marinier


_______________________________________________
Chicken-users mailing list
address@hidden
https://lists.nongnu.org/mailman/listinfo/chicken-users


_______________________________________________
Chicken-users mailing list
address@hidden
https://lists.nongnu.org/mailman/listinfo/chicken-users


reply via email to

[Prev in Thread] Current Thread [Next in Thread]