
help, TCP wizards! 4.2BSD vs Linux congestion control?

stepleton (London, UK)
Greetings all,

I'm trying to copy some files from a 4.2BSD machine (a Whitechapel MG-1 running 42nix 2.5, for what it's worth) over ethernet via good old-fashioned rsh.

I'm encountering a strange problem: the first ten packets seem to transfer fine, but after that each packet arrives later than the one before. The delay between packets grows roughly linearly until it levels off at around 30 seconds. This is, of course, very slow: downloading anything much larger than a few dozen kilobytes takes many minutes.

I suspect the culprit is some form of congestion control on the Linux side. I've tried disabling various options mentioned in the ip-sysctl reference and reverting to the old "reno" congestion control algorithm, but nothing seems to help. The problem also affects programs other than rsh, so it really does appear to be at the TCP level (or maybe lower?). I'm hoping a networking expert might have a suggestion for what to do...

(Linux TCP globals changed, for the record:)
Code:
  net.ipv4.tcp_sack = 0
  net.ipv4.tcp_timestamps = 0
  net.ipv4.tcp_allowed_congestion_control = reno
  net.ipv4.tcp_congestion_control = reno
  net.ipv4.tcp_ecn = 0
  net.ipv4.tcp_slow_start_after_idle = 0
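For anyone trying the same thing, one quick way to confirm that settings like these actually took effect is to read them back from `/proc/sys`, where each dotted sysctl name maps to a file path. This is a minimal sketch assuming a Linux host; the helper names `sysctl_path` and `check_sysctls` are made up for illustration, and the `root` parameter exists only so the check can be pointed at a test directory.

```python
from __future__ import annotations

from pathlib import Path

# The settings from the post above: sysctl name -> desired value.
EXPECTED = {
    "net.ipv4.tcp_sack": "0",
    "net.ipv4.tcp_timestamps": "0",
    "net.ipv4.tcp_allowed_congestion_control": "reno",
    "net.ipv4.tcp_congestion_control": "reno",
    "net.ipv4.tcp_ecn": "0",
    "net.ipv4.tcp_slow_start_after_idle": "0",
}


def sysctl_path(name: str, root: Path = Path("/proc/sys")) -> Path:
    """Map a dotted sysctl name such as net.ipv4.tcp_sack to its /proc/sys file."""
    return root.joinpath(*name.split("."))


def check_sysctls(expected: dict[str, str], root: Path = Path("/proc/sys")) -> dict[str, str | None]:
    """Return {name: actual} for each setting that differs; None means the file is missing."""
    mismatches: dict[str, str | None] = {}
    for name, want in expected.items():
        try:
            have = sysctl_path(name, root).read_text().strip()
        except FileNotFoundError:
            mismatches[name] = None
            continue
        # Compare word by word: some entries hold space-separated lists.
        if have.split() != want.split():
            mismatches[name] = have
    return mismatches


if __name__ == "__main__":
    for name, have in check_sysctls(EXPECTED).items():
        print(f"{name}: wanted {EXPECTED[name]!r}, found {have!r}")
```

Run as a script, it prints one line per setting that differs from the list; no output means all six values are in place.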
 
I've used Linux machines to develop and debug my DOS TCP/IP programs and have not noticed problems like this.

If you run tcpdump on the Linux side do you see anything unusual? Are you on a hub or a switch?

TCP should be measuring the round-trip time and adjusting its retransmission timeout to match. On a local network it's hard to imagine that timeout degrading to 30 seconds. You might check whether all TCP connections behave like this, or only rsh ones.
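To put numbers on the round-trip-time point, here is a sketch of the standard retransmission-timeout estimator from RFC 6298. It illustrates the textbook algorithm, not what 42nix or Linux literally run (4.2BSD predates this RFC, and Linux uses a lower minimum than the RFC's 1 second); the class name and the clock granularity value are assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

ALPHA = 1 / 8   # gain for the smoothed RTT (RFC 6298)
BETA = 1 / 4    # gain for the RTT variation
K = 4           # weight on the variation term
MIN_RTO = 1.0   # seconds; the RFC's recommended floor
MAX_RTO = 60.0  # seconds; the RFC allows any cap of at least 60 s


@dataclass
class RtoEstimator:
    granularity: float = 0.001      # clock granularity G in seconds (assumed)
    srtt: Optional[float] = None    # smoothed round-trip time
    rttvar: Optional[float] = None  # round-trip time variation
    rto: float = 1.0                # initial timeout before any measurement

    def on_rtt_sample(self, r: float) -> float:
        """Fold one round-trip measurement r (seconds) into the estimate."""
        if self.srtt is None or self.rttvar is None:
            self.srtt, self.rttvar = r, r / 2  # first sample
        else:
            # RTTVAR is updated using the old SRTT, then SRTT itself.
            self.rttvar = (1 - BETA) * self.rttvar + BETA * abs(self.srtt - r)
            self.srtt = (1 - ALPHA) * self.srtt + ALPHA * r
        self.rto = self._clamp(self.srtt + max(self.granularity, K * self.rttvar))
        return self.rto

    def on_timeout(self) -> float:
        """Back off: double the timeout each time the retransmit timer fires."""
        self.rto = self._clamp(self.rto * 2)
        return self.rto

    @staticmethod
    def _clamp(rto: float) -> float:
        return min(max(rto, MIN_RTO), MAX_RTO)
```

With a 1 ms LAN round trip, `RtoEstimator().on_rtt_sample(0.001)` sits at the 1 s floor, and it takes five consecutive timeouts (2, 4, 8, 16, 32 s) before a single retransmission wait exceeds 30 seconds. RTT samples alone never get there, so waits that long suggest packets going unacknowledged rather than slow round trips.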
 
I'll have to try the tcpdump investigation tomorrow... For now, `ss -ti` shows no statistic that correlates well with the growing packet delay after the first ten packets.

Other protocols seem to suffer the same problem: FTP also slows down considerably if you try to transfer large files.
 
I have yet to resume my MG-1 experiments and try tcpdump, but I forgot to mention that I'm using a good ethernet cable connected directly between the ethernet port on my laptop and the MG-1.

I don't think it's thrashing: the Linux side still operates fine, and the MG-1 still lets you log in on the console or over rsh. The `uptime` command shows low system load, and the machine itself feels fairly responsive.
 
Not a simple process, but if you can isolate the machines as much as possible and install Wireshark (https://www.wireshark.org/) on a third machine to monitor packet traffic, it might give some insight. I used it for debugging my Apple II IP stack and it was quite helpful.
 
I would definitely recommend taking a capture and looking at it with wireshark.

One thing I'm wondering is whether you have a duplex mismatch between the two ends: the older machine isn't capable of full duplex, and as a result it's dropping ACKs from the Linux machine and falling into a kind of retransmit spiral of death.
 
The solution has been uncovered :D

Although I wasn't clever enough to figure it out on my own, I captured some traffic with tcpdump and shared it with a wiser friend. They spotted Ethernet frames with the strange ethertype 0x1002, which this page identified as part of trailer encapsulation as defined in RFC 893. My Linux laptop (Ubuntu 18.04.4 LTS) doesn't understand trailer encapsulation, apparently not even if you `ifconfig ethwhatever trailers`, so it ignores these frames, and the MG-1 spaces its attempts further and further apart. However, before giving up altogether, the MG-1 tries a regular, non-encapsulated IP frame (ethertype 0x0800), and the packet goes through, very, very delayed.
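As I read RFC 893 and the BSD drivers that implemented it, a trailer frame puts the data first and moves the protocol headers to the end: ethertype 0x1000 + n says the first n × 512 bytes after the Ethernet header are data, followed by the real ethertype, a 16-bit trailer length, and then the IP/TCP headers. So 0x1002 means 1024 bytes of data come before the headers. The sketch below reassembles such a frame into an ordinary packet; the function names are hypothetical and the field layout is my interpretation, so check it against RFC 893 before relying on it.

```python
from __future__ import annotations

import struct

ETHERTYPE_TRAIL = 0x1000  # trailer types are 0x1000 + n, for n = 1..15
TRAILER_BLOCK = 512       # n counts 512-byte blocks of data
ETHER_HEADER_LEN = 14     # dst MAC + src MAC + ethertype


def is_trailer_type(ethertype: int) -> bool:
    """True for ethertypes 0x1001-0x100F (trailer encapsulation)."""
    return ETHERTYPE_TRAIL < ethertype < ETHERTYPE_TRAIL + 16


def untrail(frame: bytes) -> tuple[int, bytes]:
    """Return (real_ethertype, packet) for one Ethernet frame.

    Ordinary frames just lose their 14-byte Ethernet header. Trailer frames
    are reassembled by moving the trailing headers back in front of the data.
    """
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    body = frame[ETHER_HEADER_LEN:]
    if not is_trailer_type(ethertype):
        return ethertype, body
    off = (ethertype - ETHERTYPE_TRAIL) * TRAILER_BLOCK  # bytes of data up front
    # Right after the data: the real ethertype, then the trailer length, which
    # counts these 4 bytes plus the IP/TCP headers that follow them.
    real_type, trailer_len = struct.unpack_from("!HH", body, off)
    if trailer_len < 4 or off + trailer_len > len(body):
        raise ValueError("malformed trailer frame")
    headers = body[off + 4 : off + trailer_len]
    return real_type, headers + body[:off]
```

A receiver that does this kind of reassembly would have accepted the 0x1002 frames on the first try; the Linux stack simply has no handler for those ethertypes.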

Fortunately, on the MG-1, root can say `ifconfig lance0 -trailers` to disable this behaviour, which is enabled by default on the Whitechapel. Everything works just fine after that.
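The sending side explains why only some packets were affected. In the 4.2BSD Ethernet driver code, as far as I can tell, trailers are used only when the data behind the headers is a non-zero exact multiple of 512 bytes, and `-trailers` sets an interface flag that skips that path entirely. Below is a simplified model of the decision, assuming the MG-1's `lance0` driver follows the same rule; the function and parameter names are hypothetical, and the real drivers also check how the headers are laid out in memory.

```python
ETHERTYPE_IP = 0x0800
ETHERTYPE_TRAIL = 0x1000


def choose_ethertype(ip_len: int, header_len: int, trailers: bool) -> int:
    """Simplified model of a 4.2BSD-style driver choosing how to frame one IP packet.

    ip_len:     total length of the IP packet
    header_len: bytes of IP + TCP (or UDP) header in front of the data
    trailers:   False once `ifconfig lance0 -trailers` has been run
    """
    data_len = ip_len - header_len
    # Trailers only when the data is a non-empty, exact multiple of 512 bytes.
    if trailers and data_len > 0 and data_len % 512 == 0:
        return ETHERTYPE_TRAIL + data_len // 512
    return ETHERTYPE_IP
```

Under this model a TCP segment carrying 1024 bytes of data goes out as 0x1002, matching the capture, while a segment whose data is not a multiple of 512 bytes, or any segment once trailers are off, goes out as plain 0x0800.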

So, the next time you see this kind of gradual linear slogging on a 4.2BSD box, consider trailer encapsulation.
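To check a capture for this quickly, the script below scans a classic pcap file (the format `tcpdump -w` writes; a pcapng file from Wireshark would need saving as pcap first) and lists every frame whose ethertype falls in the trailer range 0x1001 to 0x100F. It is a sketch using only the standard library; the function names are made up for this example.

```python
from __future__ import annotations

import struct
import sys
from typing import Iterator

LINKTYPE_ETHERNET = 1
# Magic number bytes -> timestamp ticks per second (micro- or nanosecond files).
LE_MAGICS = {b"\xd4\xc3\xb2\xa1": 1e6, b"\x4d\x3c\xb2\xa1": 1e9}  # little-endian
BE_MAGICS = {b"\xa1\xb2\xc3\xd4": 1e6, b"\xa1\xb2\x3c\x4d": 1e9}  # big-endian


def read_pcap(path: str) -> Iterator[tuple[float, bytes]]:
    """Yield (timestamp_seconds, frame_bytes) from a classic pcap file (not pcapng)."""
    with open(path, "rb") as f:
        header = f.read(24)
        magic = header[:4]
        if magic in LE_MAGICS:
            endian, ticks = "<", LE_MAGICS[magic]
        elif magic in BE_MAGICS:
            endian, ticks = ">", BE_MAGICS[magic]
        else:
            raise ValueError("not a classic pcap file")
        # The link-layer type is the last field of the 24-byte global header.
        (linktype,) = struct.unpack(endian + "I", header[20:24])
        if linktype != LINKTYPE_ETHERNET:
            raise ValueError(f"expected an Ethernet capture, got link type {linktype}")
        while True:
            record = f.read(16)
            if len(record) < 16:
                return  # end of file
            sec, frac, incl_len, _orig_len = struct.unpack(endian + "IIII", record)
            yield sec + frac / ticks, f.read(incl_len)


def trailer_frames(path: str) -> Iterator[tuple[float, int, int]]:
    """Yield (timestamp, ethertype, frame_length) for each trailer-encapsulated frame."""
    for ts, frame in read_pcap(path):
        if len(frame) < 14:
            continue  # too short to hold an Ethernet header
        (ethertype,) = struct.unpack_from("!H", frame, 12)
        if 0x1001 <= ethertype <= 0x100F:
            yield ts, ethertype, len(frame)


if __name__ == "__main__":
    for capture in sys.argv[1:]:
        for ts, ethertype, length in trailer_frames(capture):
            print(f"{ts:.6f}  ethertype 0x{ethertype:04x}  {length} bytes")
```

Pass one or more capture files on the command line; any output at all means the other end is sending trailer frames.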
 