#4434 closed defect (fixed)
e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
Reported by: | Costin Grigoras | Owned by: | |
---|---|---|---|
Component: | network | Version: | VirtualBox 3.0.0 |
Keywords: | | Cc: | |
Guest type: | Linux | Host type: | Linux |
Description
I quite frequently see this kind of message in the guest, with kernels 2.6.28.3, 2.6.29.4 and 2.6.31-rc2 (e1000 ver 7.3.20-k3-NAPI and 7.3.21-k3-NAPI):
e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang Tx Queue <0> TDH <41> TDT <46> next_to_use <46> next_to_clean <41> buffer_info[next_to_clean] time_stamp <ffffc090> next_to_watch <42> jiffies <ffffc120> next_to_watch.status <0>
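To make messages like the one above easier to compare between reports, here is a minimal Python sketch that pulls the `name <value>` pairs out of the line. It assumes every bracketed value is hexadecimal (the `time_stamp` and `jiffies` values suggest so); the function names `parse_tx_hang` and `summarize` are made up for this illustration and are not part of any driver or tool.

```python
import re

HANG_LINE = (
    "e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang Tx Queue <0> "
    "TDH <41> TDT <46> next_to_use <46> next_to_clean <41> "
    "buffer_info[next_to_clean] time_stamp <ffffc090> next_to_watch <42> "
    "jiffies <ffffc120> next_to_watch.status <0>"
)


def parse_tx_hang(line):
    """Return the 'name <value>' fields of a Tx Unit Hang message as integers.

    Assumption: all values are printed in hexadecimal.
    """
    fields = {}
    for name, value in re.findall(r"([\w.]+) <([0-9a-fA-F]+)>", line):
        fields[name] = int(value, 16)
    return fields


def summarize(fields):
    """Return (descriptors between head and tail, jiffies since time_stamp)."""
    # Simple differences only: ring and counter wrap-around are ignored.
    pending = fields["TDT"] - fields["TDH"]
    age = fields["jiffies"] - fields["time_stamp"]
    return pending, age


if __name__ == "__main__":
    print(summarize(parse_tx_hang(HANG_LINE)))
```

For the line in this report the sketch gives 5 descriptors between TDH and TDT and a gap of 0x90 (144) jiffies between `time_stamp` and `jiffies`. Converting that gap to seconds would need the guest kernel's HZ value, which the report does not state.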
Attachments (3)
Change History (18)
comment:2 15 years ago
Might be, though in this particular case the guest doesn't hang. Let's see then in the next version.
comment:3 15 years ago
Status: | closed → reopened |
---|---|
Resolution: | duplicate |
I got the error message, too. But with different values. The network would freeze, but the server would run just fine. Sometimes the network managed to get unstuck.
My setup:
Host: Linux 2.6.30-amd64, 8 cpus. Running virtualbox-ose 3.0.4
Guest: Linux 2.6.30-amd64, 8 cpus. Bridged network.
I can provoke the error by rsyncing a large directory (100 GB) to the guest. This causes sustained inbound traffic of 80 Mbps.
If I run the guest with 8 cpus the error consistently occurs after 1000-2000 seconds. If the guest was run with 1 cpu it took 8500 seconds before it occurred.
When I ran the guest with a NAT'ed network I managed to provoke the error as well, but only after around 6000 seconds on 8 cpus.
I have tested bridging on 8 cpu with virtualbox-3.0_3.0.6_BETA1-51790_Debian_lenny_amd64.deb and get the same problem after 1500 seconds. So the problem is not fixed.
comment:4 15 years ago
I wondered if increasing the number of CPUs past the number of physical CPUs would provoke this problem earlier. I just ran: Host: 8 CPU, virtualbox-3.0_3.0.6_BETA1-51790_Debian_lenny_amd64.deb, Guest: 12 CPU.
This provoked the problem after 130 seconds in first try, and after 1200 seconds in second try.
I have wondered if the problem can be caused by a flaky clock.
[ 0.036000] Spurious LAPIC timer interrupt on cpu 0
[ 0.196001] Measured 15867 cycles TSC warp between CPUs, turning off TSC clock.
[ 0.196001] Marking TSC unstable due to check_tsc_sync_source failed
[ 9.548093] PCSP: Timer resolution is not sufficient (4000250nS)
[ 10.116099] intel8x0_measure_ac97_clock: measured 59999 usecs (11276 samples)
[ 10.120017] intel8x0: measured clock 187936 rejected
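To check other guests for the same clock-related boot messages, a small Python sketch like the following could filter saved `dmesg` output. The marker strings are taken from the log above; the names `CLOCK_MARKERS` and `clock_warnings` are invented for this example, and finding these lines does not by itself show that the clock causes the Tx hang.

```python
# Substrings copied from the boot log quoted in this comment.
CLOCK_MARKERS = ("Spurious LAPIC timer", "TSC warp", "TSC unstable")


def clock_warnings(dmesg_text):
    """Return the dmesg lines that contain one of the clock-related markers."""
    return [
        line.strip()
        for line in dmesg_text.splitlines()
        if any(marker in line for marker in CLOCK_MARKERS)
    ]


SAMPLE = """\
[ 0.036000] Spurious LAPIC timer interrupt on cpu 0
[ 0.196001] Measured 15867 cycles TSC warp between CPUs, turning off TSC clock.
[ 0.196001] Marking TSC unstable due to check_tsc_sync_source failed
[ 9.548093] PCSP: Timer resolution is not sufficient (4000250nS)
"""

if __name__ == "__main__":
    for warning in clock_warnings(SAMPLE):
        print(warning)
```

Feeding it the output of `dmesg` from a guest started with a different CPU count would show whether the TSC warnings appear only in the multi-CPU configurations.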
comment:5 15 years ago
The problem gets more and more peculiar: I have tested this setup:
Host: 8 CPU, virtualbox-3.0_3.0.6_BETA1-51790_Debian_lenny_amd64.deb, Guest: 12 CPU.
After 25 minutes of 80 Mbit/s sustained traffic the network broke. I even got a dump.
But as soon as I pressed enter in the console window the network worked again.
The network broke again after 3 minutes. Pressing space in the console solved it.
The network broke again after 4 minutes. Pressing 'f' in the console solved it.
It seems pressing any key in the console makes the network run again.
I have been able to get a better screenshot of the watchdog error messages from the kernel. These are attached.
comment:6 15 years ago
Because I had to press a key to get the network going again, I got the idea that the fault may be in VirtualBox (the GUI). So I ran VBoxHeadless on the same virtual machine. It has now been running for 5000 seconds with 80 Mbit/s sustained traffic without a hiccup. This leads me to believe the problem is in the interaction with VirtualBox (the GUI).
I have now installed virtualbox-ose 3.0.6 (r52128), and will try to see if VBoxHeadless solves the issue here as well.
costing: You reported this bug. Can you reproduce it today? Is it gone if you run the vm with VBoxHeadless?
comment:7 15 years ago
I was always running under VBoxHeadless. The messages were there in 3.0.4 for sure. Since upgrading to 3.0.6 I haven't seen them any more (yet?). But for me they didn't cause any major problems, the system was still working without intervention. Just that dmesg is full of such errors.
comment:8 15 years ago
virtualbox-ose 3.0.6 (r52128) running VBoxHeadless drops the network connection after 1200 sec. But since it is headless I cannot tell if it was due to the same bug.
comment:9 15 years ago
VBoxHeadless crashed the network, too. I have tested this setup:
Host: 8 CPU, virtualbox-3.0_3.0.6_BETA1-51790_Debian_lenny_amd64.deb, Guest: 12 CPU.
Running under VBoxHeadless the network broke down after 5900 seconds. If I rdesktop'ed into the server after the network had stopped working and pressed 'enter', the network worked again immediately, so it seems the GUI is not the cause of the problem after all.
Will it be helpful if I make a snapshot of the server in the broken down state?
comment:10 15 years ago
No. Btw, you should upgrade to the final 3.0.6 though I don't think this will solve your problem. Do you have more than one guest CPU enabled?
comment:11 15 years ago
Yes: As mentioned I have tried with both 8 CPU and 12 CPU on guest. The 4 extra CPUs did not change anything - neither for good nor bad.
As a workaround is it possible to press 'enter' using a program on the host machine? (I.e. can I write a script that presses 'enter' every minute?).
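One possible way to script this from the host is sketched below in Python. It assumes the installed `VBoxManage` supports `controlvm <vm> keyboardputscancode`, and it sends the PC scancodes `1c` (Enter pressed) and `9c` (Enter released). The VM name and the helper names `enter_command` and `press_enter_forever` are placeholders. Whether a key injected this way unsticks the network like a key typed in the console is untested.

```python
import subprocess
import time

# Scancode set 1: 0x1c is Enter pressed, 0x9c is Enter released.
ENTER_SCANCODES = ["1c", "9c"]


def enter_command(vm_name):
    """Build the VBoxManage command line that taps Enter in the given VM."""
    return ["VBoxManage", "controlvm", vm_name,
            "keyboardputscancode", *ENTER_SCANCODES]


def press_enter_forever(vm_name, interval_seconds=60):
    """Send Enter to the VM once per interval until interrupted."""
    while True:
        # check=False: keep looping even if one invocation fails.
        subprocess.run(enter_command(vm_name), check=False)
        time.sleep(interval_seconds)
```

Calling `press_enter_forever("my-guest")` on the host, with `my-guest` replaced by the real VM name, would tap Enter once a minute until the script is stopped with Ctrl-C.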
comment:12 15 years ago
Also as mentioned in initial report: "If the guest was run with 1 cpu it took 8500 seconds before it occurred." So just using 1 CPU does not solve the issue, but seems to postpone it somewhat.
comment:13 15 years ago
One of the things I seem to have forgotten to mention is that both host and guest are 64-bit.
Most probably a duplicate of #4343.