Note: This is a long post, but I expect it to bring in some questions for many Linux people; I advise you to read this when you have enough time.
After the last events related to connectivity at home which have been lasting since Thursday evening and the weird fact that it seems that NAT doesn't work at home, but does at work for the same setup, I called the ISP's support and wasted 15 minutes trying to convince them at least to send a technical team with another modem just to test if that is the reason for the breakage.
Of course, talking to the ISP support and trying to convince them that there might be a problem on their side since NAT works with another provider but doesn't with them, was fruitless.
Tonight I was stuck and tried another approach, convinced I would confirm my suspicion that there is something wrong on ISP's side. Still I am not yet sure what to conclude from what happened.
So, to get a better view of what is going on, I'll describe the setup I have and what are its limitations and characteristics. People in a hurry can skip to the paragraph starting with "I tested" and stare in wonder at will.
So the connection I have is done through a DSL modem which gets the MAC of the network card connected to it and exposes that as its own to the ISP's network. This MAC seems to be quite persistent and special measures must be taken in order to be able to use another NIC to connect. The modem (or probably some machine in ISP's; the IP is 10.10.0.1) offers a DHCP address and everything should work fine.
Because of this "your MAC is my MAC" issue, when I connected the first time, I used an USB NIC since a broken router or some temporary failure would have allowed me to use the Internet connection directly from my laptop. I can say this decision has proven in time to be wise.
The router I used until now is a NSLU2 with Debian installed on it. The built in network interface always faced the internal network.
The router (which I call ritter) served as a NAT router for two machines inside the network, my laptop and my apartment mate's laptop. All until last Thursday, after which it never got back properly.
I tested (doing NAT on my laptop; the laptop shouldn't have been affected by the power problems):
- with ritter behind the laptop NAT, at home
- with two different virtual machines as NAT "clients", at home
- with ritter behind the laptop NAT, at work
- with a virtual machine as NAT client, at home
- directly from the laptop
- NAT made through a SNAT rule
- NAT made through a MASQUERADING rule
- with TTL mangled (increased by one, although is was never in the ballpark of a low TTL)
Of course, I have n-checked that /proc/sys/net/ipv4/ip_forward was set to 1, the tables had policy ACCEPT and there were no extra rules, except the basic NAT-ting stuff, the routes were correctly set on both the clients and the machine doing the NAT.
All I could see is that:
- the machine doing the NAT was always working fine
- at work all NAT clients worked fine
- at home any of the NAT "clients" were
- able to resolve addresses while the DNS server was in ISP territory and another LAN
- ping the outside world (if ping was available - in D-I is not)
- hanging when trying to get a http page
- telnet-ting directly to the port 80 was ok (but I didn't try to "GET / HTTP/1.0")
So after all of this, I was thinking of trying to see if a new client (the laptop of my apartment mate, a Windows XP machine) would work using the connection at home. It didn't work.
Then I thought of trying to do "Internet Connection Sharing" as is called in Windows. Of course, there was some pain to find Windows XP drivers for the ASIX AX88172 network card (remember, the modem needed to see the MAC of that NIC), but I managed to find the proper one.
And, almost sure the NAT wouldn't work for this case either, I configured the new connection as a shared one. I didn't even disabled the firewall, as I was thinking I could take those down gradually.
I wasn't expecting this, not even by pure chance, but my laptop which was a NAT "client" now was able to browse, ping and do whatever was normal through NAT, while the Windows machine was doing the "Connection sharing".
I was utterly flabbergasted. And that was just the beginning.
I was expecting that the problem coincidentally went away, but after a minute I was proven otherwise. It still didn't work with Linux as the NAT-ting machine. I connected back the USB NIC to the Windows machine and I saw the same thing. NAT was just working.
At that point I was to observe an even more shocking fact: the IP that the Windows machine got was different from the one the Linux machine received, in spite of the fact that the network card was the same, so it would have made sense to get the same one. More than that, the IP that the Windows machine got was from an entirely different network, although it was a valid IP belonging to my ISP.
I was thinking that one reason why it works with Windows might be that there could be some TCP protocol twist that is differently implemented in Windows and the equipment from my ISP gets along better with the Windows network stack.
As a way to test that, I am thinking of forcing somehow the IP on the Linux machine to see if anything changes. But before doing that, I felt the urge to post these, maybe some kind soul will shed some light on this issue for me or drop a hint.
Another reason might be different DHCP servers answering, but I don't know how I can see in Windows who offered the lease.
If anyone has any clue why these weird things happen, please drop a line. I would greatly appreciate it. TIA.