[flow-tools] flow-capture reports PDUs out-of-sequence w/Juniper

Mark Fullmer maf@eng.oar.net
Wed, 12 Sep 2001 18:45:30 -0400


On Wed, Sep 12, 2001 at 04:09:45PM -0500, Dave Plonka wrote:
> ~800 flow PDUs in one second.  Each PDU except the last is usually 1416
> bytes, which works out to about 1MB of data on the wire in 1 second.
> Perhaps the socket receive buffer is being over-run?

Probably.  With FreeBSD this is easy to measure: netstat -s will
report UDP datagrams dropped due to full socket buffers.

> I've been examining the flow-capture code again and comparing it with
> cflowd's method of dynamically setting the SO_RCVBUF, since the two
> receivers sometimes show different behavior.

bigsockbuf() was added after 0.53; it attempts to get the
largest socket buffer available.  Prior to 0.53 it was a hardcoded
minimum value that would work on various Solaris, Linux, and FreeBSD
configurations.  Unfortunately there's no portable way to query
the kernel for the limits, so bigsockbuf() requests a large value and
retries, decrementing the request by 512 each time, until the
setsockopt() succeeds.
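
The try-and-decrement approach can be sketched roughly as follows.
This is an illustration in Python, not the flow-tools C code; the
function name big_rcvbuf, the 4MB starting request, and the 4096-byte
floor are assumptions made for the example.

```python
import socket


def big_rcvbuf(sock, start=4 * 1024 * 1024, step=512, floor=4096):
    """Request the largest SO_RCVBUF the kernel accepts.

    Starts at `start` bytes and lowers the request by `step` after each
    failed setsockopt().  Returns (requested, granted), where `granted`
    is what getsockopt() reports afterwards.
    """
    size = start
    while size >= floor:
        try:
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, size)
        except OSError:
            # Request refused by the kernel: ask for a bit less.
            size -= step
            continue
        # Some kernels (Linux, for one) do not fail on an oversized
        # request but silently clamp it, so read the value back.
        granted = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
        return size, granted
    raise OSError("no acceptable SO_RCVBUF size found")


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    print(big_rcvbuf(s))
    s.close()
```

Reading the value back with getsockopt() matters because a request
that does not fail is not necessarily a request that was honored in
full.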

> This is quite weird, esp. since I'm seeing exactly the opposite when
> collecting flows from just one Juniper.  When I use flow-capture I get
> more data than when I use cflowd.

A few ideas:

  o Turn off compression on flow-capture.

  o Try using multiple copies of flow-capture; there may be
    issues with many bursty sources trying to use one socket
    buffer.

  o Run flow-capture at a high priority.  Under FreeBSD they
    have something called rtprio (pseudo realtime process
    scheduling).  This helped a lot on older Pentium 166
    collectors when we were running reports on the same 
    box as flow-capture.

  o Instrument bigsockbuf() to log the value it acquires.  It
    may be that the other processes running on the machine
    that use UDP are using up the buffer space.
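
For the last idea, the instrumentation could look something like the
sketch below.  It is Python rather than the flow-tools C code, and the
helper name log_rcvbuf and the logger name are made up for the example.
The 1416-byte PDU size is the figure quoted earlier in this thread.
The PDU count is only an upper bound, since kernels also charge
per-packet overhead against the buffer.

```python
import logging
import socket

log = logging.getLogger("flow-capture-sketch")


def log_rcvbuf(sock, pdu_bytes=1416):
    """Log the receive buffer size the kernel reports for `sock`.

    Returns (granted, pdus), where `pdus` is how many PDUs of
    `pdu_bytes` would fit if the buffer held nothing but payload.
    """
    granted = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    pdus = granted // pdu_bytes
    log.info("SO_RCVBUF=%d bytes, room for at most ~%d PDUs of %d bytes",
             granted, pdus, pdu_bytes)
    return granted, pdus


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    log_rcvbuf(s)
    s.close()
```

Comparing the logged figure with a burst of ~800 PDUs gives a quick
sense of whether the buffer can absorb one second of exports.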

> > Dealing with
> > out of order exports isn't that much work to fix...
> 
> Not quite sure what you mean.  Isn't flow-capture already doing the
> Right Thing(tm) with respect to out of order PDUs?  From my perusal of
> the code it seems to collect and record the flows regardless of the
> sequence number, right?  From my reading, it appears to just decide
> whether or not to _count_ them as out-of-sequence based on the
> magnitude of the delta value betw. the expected and received sequence
> number.  Did I miss something?

Currently an out-of-order packet will emit two log messages, both
erroneously indicating a lost PDU.  Adding a buffer in front of
the sequence number detector could eliminate the messages and
provide more accurate stats on lost or out-of-order PDUs.
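
A rough sketch of the idea, in Python rather than the flow-tools C
code.  The class name, the depth parameter, and the counters are all
made up for illustration.  It assumes the sequence number advances by
the number of flows in each PDU (as in NetFlow v5) and ignores 32-bit
wraparound.

```python
import heapq


class ReorderingSequenceChecker:
    """Hold back a few PDUs so the gap detector sees them in order."""

    def __init__(self, depth=4):
        self.depth = depth        # PDUs held back before checking
        self.heap = []            # min-heap of (seq, count)
        self.expected = None      # next sequence number expected
        self.max_seq_seen = None  # highest seq seen on arrival
        self.lost_flows = 0       # flows missing after reordering
        self.reordered = 0        # PDUs that arrived behind a later one
        self.late = 0             # PDUs that arrived too late to reorder

    def receive(self, seq, count):
        # Count arrivals that are behind something already received.
        if self.max_seq_seen is not None and seq < self.max_seq_seen:
            self.reordered += 1
        else:
            self.max_seq_seen = seq
        heapq.heappush(self.heap, (seq, count))
        # Release the oldest PDUs once the buffer is over its depth.
        while len(self.heap) > self.depth:
            self._check(*heapq.heappop(self.heap))

    def flush(self):
        # Drain the buffer, e.g. at shutdown.
        while self.heap:
            self._check(*heapq.heappop(self.heap))

    def _check(self, seq, count):
        if self.expected is None:
            self.expected = seq + count
        elif seq > self.expected:
            # A gap that the buffer could not fill: count it as loss.
            self.lost_flows += seq - self.expected
            self.expected = seq + count
        elif seq == self.expected:
            self.expected = seq + count
        else:
            # Arrived after its gap was already counted as lost.
            self.late += 1
```

With a depth of 2, arrivals with sequence numbers 0, 60, 30, 90 (30
flows each) come out as one reordered PDU and no loss, while 0, 60, 90
comes out as 30 lost flows.  A PDU that arrives later than the buffer
depth allows is still counted as lost first, so the depth trades
accuracy against delay.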

mark