Hi,
- Trying to nsis-ping through my test network (from host1 to host3,
see http://users.skynet.be/ilove/lab.png), I got an "invalid ping exception" on the node "router1"... I don't understand what that means. Can someone help me? (I attached the trace from router1.)
Can you try using the diagnostics NSLP instead of nsis-ping? See below why ping is failing.
- I wonder how nsis "learns" the routing table. In my test network, I
use zebra as a routing daemon and then I run nsis (automatically, using netkit). But nsis doesn't work out of the box: I have to kill nsis and run it again to get a result. Otherwise I get:
*** GistException: findOutgoingLocalAddress failed: No local address was found for this destination address (100.0.0.1). Please check your configuration *
As soon as I have killed and restarted nsis, I can ping the "next" node...
I know nothing about netkit, but nsis learns the routing table by reading the same file that "route" reads. Maybe you have a race between starting nsis and setting up your routing?
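If you want to test the race theory, you could check whether a route for the destination is already visible before nsis is started. Below is a minimal sketch, assuming Linux, where the file that "route" reads is /proc/net/route. The function names route_line_covers and route_exists are made up for this example and are not part of nsis.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper, not part of nsis.
 * Returns 1 if one line of /proc/net/route covers dest_ip, 0 if it does not,
 * and -1 if the line or the address cannot be parsed (e.g. the header line). */
int route_line_covers(const char *line, const char *dest_ip)
{
    char iface[16];
    unsigned int dest, gateway, flags, mask;
    struct in_addr addr;

    assert(line != NULL && dest_ip != NULL);

    /* Columns: Iface Destination Gateway Flags RefCnt Use Metric Mask ... */
    if (sscanf(line, "%15s %x %x %x %*d %*d %*d %x",
               iface, &dest, &gateway, &flags, &mask) != 5)
        return -1;
    if (inet_pton(AF_INET, dest_ip, &addr) != 1)
        return -1;

    /* The kernel prints the raw 32-bit address values, so they can be
     * compared directly with s_addr without any byte swapping. */
    return ((unsigned int)addr.s_addr & mask) == dest;
}

/* Hypothetical helper: returns 1 if any route listed in the file at path
 * covers dest_ip, 0 otherwise (including when the file cannot be opened). */
int route_exists(const char *path, const char *dest_ip)
{
    char line[256];
    int found = 0;
    FILE *fp = fopen(path, "r");

    if (fp == NULL)
        return 0;
    while (!found && fgets(line, sizeof line, fp) != NULL)
        found = (route_line_covers(line, dest_ip) == 1);
    fclose(fp);
    return found;
}
```

A start script could call route_exists("/proc/net/route", "100.0.0.1") in a loop and only launch nsis once it returns 1. Note that a default route (destination and mask both 0) counts as covering every address in this sketch.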
- On the node router3 or router5, depending on how the routing is
set up, I sometimes get an error saying that I'm using a "legacy NAT"... Where does this come from?
Check the GIST spec. There are some checks to detect a legacy NAT, i.e. a NAT that is not GIST-aware. They depend on checking IP addresses. See section 7.2.1 of http://www.ietf.org/internet-drafts/draft-ietf-nsis-ntlp-15.txt "If S=1, the receiver MUST check the source address in the IP header against the interface-address in the NLI, and if these addresses do not match this indicates that a legacy NAT has been found."
I recommend using our wireshark/ethereal dissector to capture the packets and take a closer look at them. The legacy NAT detection is not well tested yet, so you may be hitting a bug. If you can identify who is complaining about which packet, I would like to take a look at this packet. You can also take a look at the debug output right before the error message. Copy/pasted from your mail:
(warn) *** Dmode message from : 100.0.2.2:4> ***
(info) GIST message:
  Version = 1
  Hop Count = 4
  NSLP ID = 3
  Message length = 112
(info) Type = RESPONSE
(info) S flag = 0
(info) R flag = 0
(info) E flag = 0
(info) GistMriPathCoupled: length = 56
(info) Direction = Upstream
(info) Source IP = 100.0.0.1
(info) Dest IP = 100.0.9.2
(info) GistSessionId: length = 20
(info) 0a000000 0a000000 0a000000 0a000000
(info) GistNetworkLayerInfo: length = 28
(info) IntAddr = 100.0.2.2
(info) RouteStateValidityTime = 100000
(info) GistQueryCookie: length = 36
Here you can see what the source IP address is (first line), what the S flag is set to (8th line), and what the IP address in the NLI (GistNetworkLayerInfo) is. Look at these values in the packet that triggers the legacy NAT error message and you might find out what is going on.
Just to make sure: You have no NAT in your setup, right?
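To make the rule concrete, here is a rough sketch of only the part of the check quoted above from section 7.2.1. It is not our actual implementation, and the function name legacy_nat_suspected is invented for this example. The three inputs are the values you can read off the debug output: the S flag, the source address of the IP packet, and the IntAddr in the NLI.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not taken from the nsis sources.
 * Returns 1 if the quoted rule would report a legacy NAT, 0 if it would not,
 * and -1 if one of the address strings is not a valid IPv4 address. */
int legacy_nat_suspected(int s_flag, const char *ip_src, const char *nli_addr)
{
    struct in_addr src, nli;

    assert(ip_src != NULL && nli_addr != NULL);

    if (inet_pton(AF_INET, ip_src, &src) != 1 ||
        inet_pton(AF_INET, nli_addr, &nli) != 1)
        return -1;

    /* The quoted sentence only asks for the comparison when S=1. */
    if (!s_flag)
        return 0;

    /* S=1: a mismatch between IP source and NLI interface-address
     * indicates that something rewrote the address on the way. */
    return src.s_addr != nli.s_addr;
}
```

For the message pasted above (S flag = 0, packet from 100.0.2.2, IntAddr = 100.0.2.2) this returns 0, so that particular message should not be the one triggering the error.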
DataSize: 268, HeaderLength: 12, Calc: 268
(crit) *** GistException: Invalid ping message. Length invalid ***
The code checks whether all three of these values are equal, i.e. DataSize == HeaderLength == Calc. That is clearly not the case in this message. *sigh* I just figured out what the problem is: 268 = 256 + 12. HeaderLength is defined as 8 bits, so it wraps around at 256, hence the value of 12. If I am reading my code correctly, the diagnostics NSLP will have the same problem. Seems like we never tried with so many nodes.
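The wrap-around is easy to reproduce in isolation. The following is just an illustration with made-up function names, not code from nslp/ping: it stores 268 in an 8-bit and in a 16-bit unsigned field and then applies the same three-way equality test.

```c
#include <assert.h>
#include <stdint.h>

/* Value that ends up in an 8-bit length field: reduced modulo 256. */
uint8_t store_in_8_bits(unsigned int length)
{
    return (uint8_t)length;
}

/* Value that ends up in a 16-bit length field: reduced modulo 65536. */
uint16_t store_in_16_bits(unsigned int length)
{
    return (uint16_t)length;
}

/* Illustrative version of the length check. In C the chain
 * "a == b == c" does not test three-way equality, so it is
 * spelled out as two comparisons. */
int lengths_consistent(unsigned int data_size,
                       unsigned int header_length,
                       unsigned int calc)
{
    return data_size == header_length && header_length == calc;
}
```

With these, store_in_8_bits(268) gives 12, so lengths_consistent(268, 12, 268) fails exactly like in your trace, while store_in_16_bits(268) keeps 268 and the check passes.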
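Are you familiar with C? If so, you can go into nslp/ping/PingMessage.h and modify struct ping_header_t so that the length field is wider than 8 bits. I suggest something like:

    struct ping_header_t {
        unsigned char  version : 8;
        unsigned char  reserved : 8;  /* placeholder name for the second 8-bit field; keep whatever it is called in your copy */
        unsigned short length;        /* widened so that 268 no longer wraps to 12 */
        unsigned int   seq;
    };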
That should help.
Christian
nsis_imp@informatik.uni-goettingen.de