[flow-tools] flow-capture reports PDUs out-of-sequence w/Juniper

Dave Plonka plonka@doit.wisc.edu
Thu, 13 Sep 2001 12:31:49 -0500


On Wed, Sep 12, 2001 at 06:45:30PM -0400, Mark Fullmer wrote:
> On Wed, Sep 12, 2001 at 04:09:45PM -0500, Dave Plonka wrote:
> > ~800 flow PDUs in one second.  Each PDU except the last is usually 1416
> > bytes, which works out to about 1MB of data on the wire in 1 second.
> > Perhaps the socket receive buffer is being over-run?
> 
> Probably.  With FreeBSD this is easy to measure, netstat -s will
> report dropped UDP datagrams due to full socket buffers.

Ya, Linux netstat now has that too (it didn't use to have "-s"), but
there is no statistic like Solaris's "udpInOverflows".

> > I've been examining the flow-capture code again and comparing it with
> > cflowd's method of dynamically setting the SO_RCVBUF, since the two
> > receivers sometimes show different behavior.
> 
> bigsockbuf() was added after 0.53, which will attempt to get the
> largest socket buffer available.  Prior to 0.53 it was a hardcoded
> minimum value that would work on various Solaris, Linux, and FreeBSD
> configurations.  Unfortunately there's no portable way to inquire
> the limits from the kernel so bigsockbuf() tries a large value and 
> continues, decrementing the request by 512, until the setsockopt()
> no longer fails.

Yes, I saw this all in the code.  Similar to what cflowd does.
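
For anyone following along, the probing loop described above can be sketched
roughly as follows. This is not the flow-tools code, just a Python
illustration of the same idea; the function name big_rcvbuf and the starting
value of 224*1024 (taken from the figure mentioned later in this thread) are
assumptions.

```python
import socket


def big_rcvbuf(sock, start=224 * 1024, step=512):
    """Ask for a large SO_RCVBUF and back off by `step` until the kernel accepts.

    Returns (requested, reported): the size that setsockopt() accepted and the
    size getsockopt() reports afterwards.  `start` and `step` are illustrative
    defaults, not values read from any particular flow-tools release.
    """
    size = start
    while size > 0:
        try:
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, size)
        except OSError:
            # The kernel refused this size; try a slightly smaller request.
            size -= step
            continue
        reported = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
        return size, reported
    raise OSError("kernel rejected every SO_RCVBUF request")
```

Note that some kernels do not fail an oversized request at all. Linux
typically clamps it silently to its configured maximum, so comparing the
requested size with what getsockopt() reports afterwards is the only way to
see what was really granted.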

> > This is quite weird, esp. since I'm seeing exactly the opposite when
> > collecting flows from just one Juniper.  When I use flow-capture I get
> > more data than when I use cflowd.
> 
> A few ideas:

Thanks for the suggestions.

>   o Turn off compression on flow-capture.

This may be it.  I use "-z0" on the collector where I haven't seen the
problem, and WiscNet, which did see the problem, was using the default
compression.

>   o Try using multiple copies of flow-capture, there may be
>     issues with many bursty sources trying to use one socket
>     buffer.

Well, the goal is to write them all to one file (for FlowScan) so I'll
exhaust other options first.

>   o Run flow-capture at a high priority.  Under FreeBSD they
>     have something called rtprio (pseudo realtime process
>     scheduling).  This helped a lot on older Pentium 166
>     collectors when we were running reports on the same 
>     box as flow-capture.

I've not used any real-time priority stuff under Linux, but I have had
experience with the 'pset' patches (as in 'p'rocessor 'set') that allow
you to assign a process to a processor on SMP Linux machines.  That's
how I got FlowScan to run as fast as possible - by giving it its own
processor.

>   o Instrument bigsockbuf() to log the value it acquires.  It
>     may be that the other processes running on the machine
>     that use UDP are using up the buffer space.

I just used strace(1)/truss(1) to see what value it got.  When using
flow-tools 0.55 it did get 229376 (as you coded it 224*1024).
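
To put that buffer size next to the burst described earlier in the thread,
here is the arithmetic as a small Python sketch. It only uses the numbers
quoted above (about 800 PDUs of 1416 bytes in one second, and a 229376-byte
buffer) and ignores any per-datagram overhead the kernel charges against the
buffer, so the real capacity is probably somewhat lower.

```python
PDU_BYTES = 1416            # typical PDU size quoted in this thread
PDUS_PER_BURST = 800        # approximate PDUs seen in one second
RCVBUF_BYTES = 224 * 1024   # 229376, the value flow-tools 0.55 obtained

burst_bytes = PDU_BYTES * PDUS_PER_BURST       # 1132800 bytes in the burst
pdus_that_fit = RCVBUF_BYTES // PDU_BYTES      # 161 whole PDUs fit in the buffer
fraction = RCVBUF_BYTES / burst_bytes          # share of the burst the buffer holds

print(f"burst: {burst_bytes} bytes")
print(f"buffer holds at most {pdus_that_fit} PDUs ({fraction:.0%} of the burst)")
```

So if the process is not scheduled for a good part of that second, the buffer
can hold only about a fifth of the burst before datagrams start getting
dropped.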

> > > Dealing with
> > > out of order exports isn't that much work to fix...
> > 
> > Not quite sure what you mean.  Isn't flow-capture already doing the
> > Right Thing(tm) with respect to out of order PDUs?  From my perusal of
> > the code it seems to collect and record the flows regardless of the
> > sequence number, right?  From my reading, it appears to just decide
> > whether or not to _count_ them as out-of-sequence based on the
> > magnitude of the delta value betw. the expected and received sequence
> > number.  Did I miss something?
> 
> Currently an out of order packet will emit two log messages, both
> erroneously indicating a lost PDU.  Adding a buffer in front of
> the sequence number detector could eliminate the messages and
> provide more accurate stats on lost or out of order PDU's.

Still not following you exactly, but no matter...  Maybe you mean for
it to remember recently received sequence numbers so that it can detect
that they were out of order rather than lost, and log an appropriate
message?  I didn't want to get into a mess of waiting for the right
sequence number before writing it to a file, esp. since I switch files
fairly often (every five minutes).
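
Something like the following Python sketch is what I have in mind by
"remember recently received sequence numbers". It is only an illustration,
not anything in flow-capture: the class and method names are made up, it
assumes the v5-style convention that the sequence number counts flows (so the
next expected value is seq + count), and it ignores 32-bit wraparound.
Nothing is held back from being written; the tracker only decides how to
count each PDU.

```python
class SequenceTracker:
    """Classify PDUs as in order, after a gap, or late (reordered)."""

    def __init__(self, max_gaps=32):
        self.expected = None      # next sequence number we expect to see
        self.gaps = []            # [start, end) flow ranges not yet received
        self.max_gaps = max_gaps  # how many open gaps to remember
        self.lost_flows = 0       # flows in gaps we gave up waiting for

    def observe(self, seq, count):
        """Record one PDU carrying `count` flows starting at sequence `seq`."""
        if self.expected is None or seq == self.expected:
            self.expected = seq + count
            return "in_order"
        if seq > self.expected:
            # Jumped ahead: remember the missing range instead of calling it lost.
            self.gaps.append((self.expected, seq))
            self.expected = seq + count
            self._expire_old_gaps()
            return "gap"
        # seq < expected: see whether this PDU fills part of a remembered gap.
        end = seq + count
        for i, (g_start, g_end) in enumerate(self.gaps):
            if g_start <= seq and end <= g_end:
                remaining = [(g_start, seq), (end, g_end)]
                self.gaps[i:i + 1] = [r for r in remaining if r[0] < r[1]]
                return "late"
        return "duplicate_or_stale"

    def _expire_old_gaps(self):
        # Gaps that fall off the end of the memory are finally counted as lost.
        while len(self.gaps) > self.max_gaps:
            start, end = self.gaps.pop(0)
            self.lost_flows += end - start

    def outstanding_flows(self):
        """Flows currently missing but not yet declared lost."""
        return sum(end - start for start, end in self.gaps)
```

With this, a PDU that shows up one position late would be reported once as
"gap" and once as "late", and would only be counted as lost if its gap were
never filled.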

Dave

-- 
plonka@doit.wisc.edu  http://net.doit.wisc.edu/~plonka  ARS:N9HZF  Madison, WI