Version 1.3
-
Last Modified
on: Thu Nov 11 18:18:19 PST 1999
-
The master copy
of this FAQ is currently kept at
-
http://www.whitefang.com/rin/
-
The webpage also
contains material that supplements this FAQ, along with a very spiffy
html version.
-
If you wish to
mirror it officially, please contact me for details.
Copyright
I, Thamer Al-Herbish
reserve a collective copyright on this FAQ. Individual contributions
made to this FAQ are the intellectual property of the contributor.
I am responsible
for the validity of all information found in this FAQ.
This FAQ may contain
errors, or inaccurate material. Use it at your own risk. Although an
effort is made to keep all the material presented here accurate, the
contributors and maintainer of this FAQ will not be held responsible
for any damage -- direct or indirect -- which may result from inaccuracies.
You may redistribute
this document as long as you keep it in its current form, without any
modifications. Please keep it updated if you decide to place it on a
publicly accessible server.
Introduction
The following FAQ
attempts to answer questions regarding raw IP or low level IP networking,
including raw sockets, and network monitoring APIs such as BPF and DLPI.
Additions and
Contributions
If you find anything
you can add, have some corrections for me or would like a question answered,
please send email to:
Thamer Al-Herbish
<shadows@whitefang.com>
Please remember
to include whether or not you want your email address reproduced on
the FAQ (if you're contributing). Also remember that you may want to
post your question to Usenet, instead of sending it to me. If you get
a response which is not found on this FAQ, and you feel is relevant,
mail me both copies and I'll attempt to include it.
Also, a word on raw
socket bugs. I get a couple of emails a month about them,
and sometimes I just can't verify whether the bug exists on a given system.
Before mailing in a report, double check with my example source code.
If it looks like a definite bug, then mail it in.
Special thanks to
John W. Temples <john@whitefang.com>
for his constant healthy criticism and editing of the FAQ.
Credit is given
to the contributor as his/her contribution appears in the FAQ, along
with a list of all contributors at the end of this document.
A final note: a
Raw IP Networking mailing list is up. You can join by sending an empty
message to rawip-subscribe@whitefang.com
Caveat
This FAQ covers
only information relevant to the UNIX environment.
-
-
Depending
on your operating system, the following is an incomplete list
of available tools:
| tcpdump | Found out-of-the-box on most BSD variants, and also available separately from ftp://ftp.ee.lbl.gov/tcpdump.tar.Z along with libpcap (see below) and various other tools. This tool, in particular, has been ported to multiple platforms thanks to libpcap. |
| ipgrab | Compatible with many systems. ipgrab verbosely displays link level, network level, and transport level information about captured packets. Available at: http://www.xnet.com/~cathmike/MSB/Software/ |
| Ethereal | (GUI) A network packet analyzer (uses GTK+). Supports many systems. Available at: http://ethereal.zing.org/ |
| tcptrace | Not an actual sniffer, but can read the logs produced by many other well known sniffers and produce output in different formats and at adjustable levels of detail (includes diagnostics). Available at: http://jarok.cs.ohiou.edu/software/tcptrace/tcptrace.html |
| tcpflow | Captures data transmitted as part of TCP connections (flows), and stores the data in a way that is convenient for protocol analysis or debugging. Available at: http://www.circlemud.org/~jelson/software/tcpflow/ |
| snoop | Solaris, IRIX. |
| etherfind | SunOS. |
| Packetman | SunOS, DEC-MIPS, SGI, DEC-Alpha, and Solaris. Available at ftp://ftp.cs.curtin.edu.au:/pub/netman/ |
| nettl/ntfmt | HP-UX. |
-
Depending
on your operating system (different versions may vary):
| BPF | Berkeley Packet Filter. Commonly found on BSD variants. |
| DLPI | Data Link Provider Interface. Solaris, HP-UX, SCO Openserver. |
| NIT | Network Interface Tap. SunOS 3. |
| SNOOP | (???). IRIX. |
| SNIT | STREAMS Network Interface Tap. SunOS 4. |
| SOCK_PACKET | Linux. |
| LSF | Linux Socket Filter. Is available on Linux 2.1.75 onwards. |
| drain | Used to snoop packets dropped by the OS. IRIX. |
-
Yes. libpcap
from ftp://ftp.ee.lbl.gov/libpcap.tar.Z
attempts to provide a single API that interfaces with different
OS-dependent packet capturing APIs. It's always best, of course,
to learn the underlying APIs in case this library might hide
some interesting features. It's important to warn the reader
that I have seen different versions of libpcap break backward
compatibility.
-
The exact
details are dependent on the operating system. However, the
following will attempt to illustrate the usual technique used
in various implementations:
The user
process opens a device or issues a system call which gives it
a descriptor with which it can read packets off the wire. The
kernel then passes the packets straight to the process.
However,
this wouldn't work too well on a busy network or a slow machine.
The user process has to read the packets as fast as they appear
on the network. That's where buffering and packet filtering
come in.
The kernel
will buffer up to X bytes of packet data, and pass the packets
one by one at the user's request. If the amount exceeds a certain
limit (resources are finite), the packets are dropped and are
not placed in the buffer.
Packet
filters allow a process to dictate which packets it's interested
in. The usual way is to have a set of opcodes for routines to
perform on the packet, reading values off it, and deciding whether
or not it's wanted. These opcodes usually perform very simple
operations, allowing powerful filters to be constructed.
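To make the opcode idea concrete, here is a minimal sketch of a toy filter machine. The opcode names, the struct layout and the toy_filter() function are all hypothetical and invented for illustration; real facilities such as BPF define their own, richer instruction sets. The sample program assumes an Ethernet frame with a 20-byte IP header.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcodes for a toy filter machine (not the BPF set). */
enum toy_op {
    TOY_LD_BYTE,     /* acc = pkt[k]                            */
    TOY_LD_HALF,     /* acc = big-endian 16-bit value at pkt[k] */
    TOY_REQUIRE_EQ,  /* reject the packet unless acc == k       */
    TOY_ACCEPT       /* accept the packet                       */
};

struct toy_insn {
    enum toy_op op;
    uint32_t k;
};

/* Run the program over one packet.  Returns 1 to accept, 0 to reject.
 * A load that would read past the end of the packet rejects it. */
int toy_filter(const struct toy_insn *prog, size_t ninsn,
               const uint8_t *pkt, size_t len)
{
    uint32_t acc = 0;
    size_t i;

    for (i = 0; i < ninsn; i++) {
        switch (prog[i].op) {
        case TOY_LD_BYTE:
            if (prog[i].k >= len)
                return 0;
            acc = pkt[prog[i].k];
            break;
        case TOY_LD_HALF:
            if (len < 2 || prog[i].k > len - 2)
                return 0;
            acc = ((uint32_t)pkt[prog[i].k] << 8) | pkt[prog[i].k + 1];
            break;
        case TOY_REQUIRE_EQ:
            if (acc != prog[i].k)
                return 0;
            break;
        case TOY_ACCEPT:
            return 1;
        }
    }
    return 0;   /* fell off the end of the program: reject */
}

/* Sample program: accept Ethernet frames carrying IPv4 (ethertype 0x0800
 * at offset 12) whose IP protocol byte (offset 14 + 9 = 23) is ICMP (1). */
const struct toy_insn toy_icmp_only[] = {
    { TOY_LD_HALF, 12 }, { TOY_REQUIRE_EQ, 0x0800 },
    { TOY_LD_BYTE, 23 }, { TOY_REQUIRE_EQ, 1 },
    { TOY_ACCEPT, 0 }
};
```

Each instruction is trivial on its own, yet a short sequence is enough to pick out one protocol. That is the property the real filter machines rely on.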
BPF filters
and then buffers; this is optimal since the buffer only contains
packets that are interesting to the process. The hope is that
the filter cuts down the number of packets buffered enough to keep
the buffer from overflowing, which leads to packet loss.
NIT, unfortunately,
does not do this; it applies the filter after buffering, when
the user process starts to read from the buffered data.
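The difference between the two orderings can be seen in a small simulation. This is only a sketch under simplifying assumptions: a burst of n packets arrives before the reader gets to run, the buffer holds cap packets rather than bytes, and the function names are made up for this example.

```c
#include <stddef.h>

/* wanted[i] is nonzero if the filter would accept packet i.
 * Both functions return how many wanted packets are lost. */

/* Filter first, then buffer (the BPF ordering). */
size_t lost_filter_first(const int *wanted, size_t n, size_t cap)
{
    size_t used = 0, lost = 0, i;

    for (i = 0; i < n; i++) {
        if (!wanted[i])
            continue;            /* rejected before it can take a slot */
        if (used < cap)
            used++;
        else
            lost++;
    }
    return lost;
}

/* Buffer first, filter when the reader drains (the NIT ordering). */
size_t lost_buffer_first(const int *wanted, size_t n, size_t cap)
{
    size_t used = 0, lost = 0, i;

    for (i = 0; i < n; i++) {
        if (used < cap)
            used++;              /* every packet takes a slot */
        else if (wanted[i])
            lost++;              /* dropped before the filter ever sees it */
    }
    return lost;
}
```

With a burst of eight packets of which only the fourth, seventh and eighth are wanted, and room for three packets, the first ordering loses nothing while the second loses all three wanted packets.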
According
to route <route@infonexus.com>,
Linux's SOCK_PACKET does not do any buffering and has no
kernel filtering.
Your mileage
may vary with other packet capturing facilities.
-
If you're
experiencing a lot of packet loss, you may want to limit the
scope of the packets read by using filters. This will only work
if the filtering is done before any buffering. If this still
doesn't work because your packet capturing facility is broken
like NIT, you'll have to read the packets faster in a user process
and send them to another process -- basically attempt to do
additional buffering in user space.
Another
way of improving performance is to use a larger buffer. On
IRIX using SNOOP, the man page recommends using SO_RCVBUF. On
BSD with BPF one can use the BIOCSBLEN ioctl call to increase
the buffer size. On Solaris bufmod and pfmod can be used for
altering buffer size and filters respectively.
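For the socket-based facilities, a minimal sketch of the SO_RCVBUF approach follows. The helper name grow_rcvbuf() is made up for this example, and it covers only the socket option; BIOCSBLEN and the Solaris STREAMS modules are set through their own interfaces and are not shown.

```c
#include <sys/types.h>
#include <sys/socket.h>

/* Ask the kernel for a receive buffer of `bytes` bytes on socket `fd`.
 * Returns the size the kernel reports afterwards, or -1 on error.
 * The kernel is free to round or cap the value, so the result may
 * differ from what was requested. */
int grow_rcvbuf(int fd, int bytes)
{
    int actual = 0;
    socklen_t len = sizeof(actual);

    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes)) < 0)
        return -1;
    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &actual, &len) < 0)
        return -1;
    return actual;
}
```

Reading the value back is worthwhile because a request larger than the system limit may be silently reduced.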
Remember,
the longer your process is busy and not attending the incoming
packets, the quicker they'll be dropped by the kernel.
-
(Question
suggested by Michael T. Stolarchuk
<mts@rare.net> along with some suggestions for the
answer.)
-
Network
diagnostics, such as verifying a network's setup. Examples
are tools like arp, which report the ARP messages
sent from hosts.
-
Reconstruction
of end-to-end sessions. tcpshow attempts to do this, but
more sophisticated examples are the array of security tools
which try to keep tabs on network connections.
-
Monitoring
network load. This is probably one of the most practical uses;
many commercial products use specialized hardware
to accomplish this.
-
No, the
packet capturing facilities mentioned make copies of the packets,
and do not remove them from the system's TCP/IP stack. If you
wish to prevent packets from reaching the TCP/IP stack you need
to use a firewall (which should be able to do packet filtering).
Don't confuse the packet filtering done by packet capturing
facilities with that done by firewalls. They serve different
purposes.
-
Yes, route
<route@infonexus.com>
maintains Libnet, a library that provides an API for low
level packet writing and handling. It serves as a good complement
to libpcap, if you wish to read and write packets. The project's
webpage can be found at:
http://www.packetfactory.net/libnet/
-
A Perl module
that gives access to raw sockets is available at:
http://quake.skif.net/RawIP/
A Python
library "py-libpcap" can be found at:
ftp://ftp.python.org/pub/python/contrib/Network/
-
-
The BSD
socket API allows one to open a raw socket and bypass layers
in the TCP/IP stack. Be warned that if an OS doesn't support
correct BSD semantics (correct is used loosely here), you're
going to have a hard time making it work. Below, an attempt
is made to address some of the bugs or surprises that are in store
for you. On almost all sane systems only root (superuser) can open
a raw socket.
-
-
Depending
on what you want to send, you initially open a socket and
give it its type.
sockd = socket(AF_INET,SOCK_RAW,<protocol>);
You
can choose from any protocol including IPPROTO_RAW. The
protocol number goes into the IP header verbatim. IPPROTO_RAW
places 0 in the IP header.
Most
systems have a socket option IP_HDRINCL which allows you
to include your own IP header along with the rest of the
packet. If your system doesn't have this option, you may
or may not be able to include your own IP header. If it
is available, you should use it as such:
int on = 1;
setsockopt(sockd,IPPROTO_IP,IP_HDRINCL,&on,sizeof(on));
Of
course, if you don't want to include an IP header, you can
always specify a protocol in the creation of the socket
and slip your transport level header under it.
You
then build the packet and use a normal sendto().
-
Examples
can be found at
http://www.whitefang.com/rin/ which attempt to illustrate
the details involved. They also illustrate some of the bugs
mentioned below.
Briefly,
you need to actually write the packet out in memory and
hand it over to the socket where it will hopefully fire
it away and await more packets.
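As a rough sketch of what "writing the packet out in memory" means, the following builds a 20-byte IP header followed by an 8-byte ICMP echo request byte by byte, then hands it to a raw socket. The function names, the TTL of 64 and the choice of IPPROTO_RAW are assumptions made for this example, not taken from the example source on the webpage. The header is written through byte offsets rather than struct ip so that the byte order of every field is explicit.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Internet checksum (RFC 1071) over len bytes, read as big-endian words. */
uint16_t in_cksum(const uint8_t *p, size_t len)
{
    uint32_t sum = 0;

    while (len > 1) {
        sum += ((uint32_t)p[0] << 8) | p[1];
        p += 2;
        len -= 2;
    }
    if (len == 1)
        sum += (uint32_t)p[0] << 8;
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Store a 16-bit value in network byte order. */
static void put16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)(v & 0xff);
}

/* Build IP header + ICMP echo request in buf.  src and dst are already in
 * network byte order.  Returns the packet length (28), or 0 if buf is too
 * small.  Note: some 4.4BSD-derived systems want the total length in host
 * byte order; see the bug discussion elsewhere in this FAQ. */
size_t build_echo(uint8_t *buf, size_t buflen, uint32_t src, uint32_t dst,
                  uint16_t id, uint16_t seq)
{
    const size_t total = 28;
    uint8_t *icmp = buf + 20;

    if (buflen < total)
        return 0;
    memset(buf, 0, total);
    buf[0] = 0x45;                      /* version 4, header length 5 words */
    put16(buf + 2, (uint16_t)total);    /* total length */
    put16(buf + 4, id);                 /* identification */
    buf[8] = 64;                        /* time to live */
    buf[9] = IPPROTO_ICMP;              /* protocol */
    memcpy(buf + 12, &src, 4);
    memcpy(buf + 16, &dst, 4);
    put16(buf + 10, in_cksum(buf, 20)); /* header checksum */

    icmp[0] = 8;                        /* type: echo request, code 0 */
    put16(icmp + 4, id);
    put16(icmp + 6, seq);
    put16(icmp + 2, in_cksum(icmp, 8)); /* ICMP checksum */
    return total;
}

/* Send a prebuilt packet with IP_HDRINCL.  Needs superuser privileges.
 * Returns 0 on success, -1 on failure. */
int send_raw(const uint8_t *pkt, size_t len, uint32_t dst)
{
    struct sockaddr_in to;
    int on = 1, rc = -1;
    int sockd = socket(AF_INET, SOCK_RAW, IPPROTO_RAW);

    if (sockd < 0)
        return -1;
    memset(&to, 0, sizeof(to));
    to.sin_family = AF_INET;
    to.sin_addr.s_addr = dst;
    if (setsockopt(sockd, IPPROTO_IP, IP_HDRINCL, &on, sizeof(on)) == 0 &&
        sendto(sockd, pkt, len, 0, (struct sockaddr *)&to,
               sizeof(to)) == (ssize_t)len)
        rc = 0;
    close(sockd);
    return rc;
}
```

build_echo() only fills memory, so it can be tried without privileges. A quick sanity check is that in_cksum() over a finished header returns 0.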
-
Traditionally
the BSD socket API did not allow you to listen to just any
incoming packet via a raw socket. Linux (2.0.30
was the last version I had a look at) did allow this, but that
has to do with its own implementation of the TCP/IP stack.
Correct BSD semantics allow you to get some packets which
match a certain category (see below).
There's
a logical reason behind this; for example TCP packets are
always handled by the kernel. If the port is open, it sends
a SYN-ACK and establishes the connection; otherwise it sends back a RST.
On the other hand, there are some types of ICMP (I compiled a small
list below) that the kernel can't handle. An ICMP echo
reply, for example, is passed to a matching raw socket, since it was
meant for a user program to receive.
The
solution is to firewall that particular port if it was a
UDP or TCP packet, and sniff it with a packet capturing
API (a list is mentioned above). This prevents the TCP/IP
stack from handling the packet, thus it will be ignored
and you can handle it yourself without intervention.
If
you don't firewall it, and reply yourself, you'll wind up
having additional responses from your operating system!
Here's
a concise explanation of the semantics of a raw BSD socket,
taken from a Usenet post by W. Richard Stevens
<rstevens@kohala.com> (Sun Jul 6 12:07:07 1997):
"The
semantics of BSD raw sockets are:
-
TCP and UDP: no one other than the kernel gets these.
-
ICMP: a copy of each ICMP gets passed to each matching
raw socket, except for a few that the kernel generates
the reply for: ICMP echo request, timestamp request,
and mask request.
-
IGMP: all of these get passed to all matching raw sockets.
-
all other protocols that the kernel doesn't deal with
(OSPF, etc.): these all get passed to all matching raw
sockets."
After
looking at the icmp_input() routine from 4.4BSD's TCP/IP
stack, it seems the following ICMP types will be passed
to matching raw sockets:
-
Echo Reply (0)
-
Router Advertisement (9)
-
Time Stamp Reply (14)
-
Mask Reply (18)
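The rule in the Stevens quote can be written down as a small predicate. This is a sketch of the quoted semantics only, not of any particular kernel; the enum and function names are invented here, and the type numbers are the standard ICMP assignments (echo request 8, timestamp request 13, timestamp reply 14, mask request 17).

```c
/* Standard ICMP type numbers (names are local to this example). */
enum {
    ICMP_T_ECHO_REPLY     = 0,
    ICMP_T_ECHO_REQUEST   = 8,
    ICMP_T_ROUTER_ADVERT  = 9,
    ICMP_T_TSTAMP_REQUEST = 13,
    ICMP_T_TSTAMP_REPLY   = 14,
    ICMP_T_MASK_REQUEST   = 17,
    ICMP_T_MASK_REPLY     = 18
};

/* Returns 1 if, under the quoted BSD semantics, a copy of an ICMP message
 * of this type is passed to matching raw sockets, and 0 if the kernel
 * generates the reply itself and keeps the message. */
int icmp_seen_by_raw_socket(int type)
{
    switch (type) {
    case ICMP_T_ECHO_REQUEST:
    case ICMP_T_TSTAMP_REQUEST:
    case ICMP_T_MASK_REQUEST:
        return 0;
    default:
        return 1;
    }
}
```

A ping-style program therefore sees the echo replies it is waiting for, but a raw socket is of no use for answering echo requests yourself.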
-
-
Systems
derived from 4.4BSD have a bug in which the ip_len and ip_off
members of the IP header have to be set in host byte order
rather than network byte order. Some systems may have fixed
this. I've confirmed this bug has been fixed on OpenBSD
2.1.
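One way to cope with this is to isolate the byte order decision in a single place. The sketch below assumes a build-time macro, RAWIP_HOST_ORDER_LEN_OFF, which is a hypothetical name invented for this example; you would define it yourself when compiling for a system that shows the bug. The two fields are written at their offsets in the header (bytes 2-3 for the total length, bytes 6-7 for the fragment offset).

```c
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

/* Hypothetical switch: define RAWIP_HOST_ORDER_LEN_OFF on systems that
 * want ip_len and ip_off in host byte order. */
#ifdef RAWIP_HOST_ORDER_LEN_OFF
#define RAWIP_HDR16(x) ((uint16_t)(x))
#else
#define RAWIP_HDR16(x) htons((uint16_t)(x))
#endif

/* Store the total length and the flags/fragment offset word into a raw
 * IP header starting at hdr, in whichever byte order the target wants. */
void set_len_off(uint8_t *hdr, uint16_t total_len, uint16_t frag_off)
{
    uint16_t len = RAWIP_HDR16(total_len);
    uint16_t off = RAWIP_HDR16(frag_off);

    memcpy(hdr + 2, &len, sizeof(len));
    memcpy(hdr + 6, &off, sizeof(off));
}
```

All other header fields stay in network byte order either way, so only these two stores need the conditional.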
-
Thanks
to Michael Masino <mmasino@mitre.org>
, Lamont Granquist
<lamontg@hitl.washington.edu> , and route
<route@infonexus.com> for the submission of bug
reports.
Some
systems will process some of the fields in the IP and transport
headers. I've attempted to verify the reports I've received;
here's what I can verify for sure.
Solaris
(at least 2.5/2.6) changes the IP ID field, and adds
a Do Not Fragment flag to the IP header (IP_DF). It also
expects the checksum to contain the length of the transport
level header, and the data.
Further
reports which I cannot verify (can't reproduce) consist
of claims that Solaris 2.x and IRIX 6.x will change the
sequence and acknowledgment numbers. IRIX 6.x is also believed
to have the problem mentioned in the previous paragraph.
If you experience these problems, double check with the
example source code.
You'll
save yourself a lot of trouble by just getting Libnet from
http://www.packetfactory.net/libnet/
-
Various
UNIX utilities use raw sockets, among them traceroute,
ping, and arp. Also, a lot of Internet security tools make use of
raw sockets. However, in the long run, raw sockets have proven
bug-ridden, unportable, and limited in use.
-
-
libpcap
was written so that applications could do packet capturing portably.
Since it's system independent and supports numerous operating
systems, your packet capturing application becomes more portable
to various other systems.
-
Yes, libpcap
will only use in-kernel packet filtering when using BPF, which
is found on BSD derived systems. This means any packet filters
used on other operating systems which don't use BPF will be
done in user space, thus losing out on a lot of speed and efficiency.
This is not what you want, because packet loss can increase
when sniffing a busy network.
DEC OSF/1
has an API which has been extended to support BPF-style filters;
libpcap does utilize this.
In the
future, libpcap may translate BPF style filters to other packet
capturing facilities, but this has not been implemented yet
as of version 0.3.
Refer to
question 1.4 to see how packet filters help in reliably monitoring
your network.
-
A lot of
the source code found at LBNL's ftp archive
ftp://ftp.ee.lbl.gov/ uses libpcap. More specifically,
ftp://ftp.ee.lbl.gov/tcpdump.tar.Z
probably demonstrates libpcap to a large extent.
-
- Thamer Al-Herbish <shadows@whitefang.com>
- W. Richard Stevens <rstevens@kohala.com>
- John W. Temples (III) <john@whitefang.com>
- Michael Masino <mmasino@mitre.org>
- Lamont Granquist <lamontg@hitl.washington.edu>
- Michael T. Stolarchuk <mts@rare.net>
- Mike Borella <Mike_Borella@mw.3com.com>
- route <route@infonexus.com>
- Derrick J Brashear <shadow@dementia.org>