#4434 closed defect (fixed)

e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang

Reported by:	Costin Grigoras	Owned by:
Component:	network	Version:	VirtualBox 3.0.0
Keywords:		Cc:
Guest type:	Linux	Host type:	Linux

Description

I see this kind of message quite frequently in the guest with kernels 2.6.28.3, 2.6.29.4 and 2.6.31-rc2 (e1000 driver versions 7.3.20-k3-NAPI and 7.3.21-k3-NAPI):

e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
  Tx Queue             <0>
  TDH                  <41>
  TDT                  <46>
  next_to_use          <46>
  next_to_clean        <41>
buffer_info[next_to_clean]
  time_stamp           <ffffc090>
  next_to_watch        <42>
  jiffies              <ffffc120>
  next_to_watch.status <0>
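To make the dump above easier to read, here is a minimal sketch that parses the `name <value>` lines and derives two quantities from them: how many transmit descriptors sit between the device's head (TDH) and the driver's tail (TDT), and how many jiffies have passed since the descriptor at next_to_clean was queued. Assumptions not stated in the ticket: the values are hex, jiffies is a 32-bit counter, the driver flags a hang once more than `tx_timeout_factor * HZ` jiffies have elapsed, and the defaults `hz=100` and `ring_size=256` are illustrative guesses, since both depend on the guest kernel and driver configuration. The helpers `parse_dump` and `analyze_tx_hang` are hypothetical.

```python
import re

# Matches lines like "  TDH                  <41>"; the name may contain spaces.
FIELD_RE = re.compile(r"^\s*(.+?)\s+<([0-9a-fA-F]+)>\s*$")
# Assumption: 32-bit jiffies, so subtraction is masked to stay wrap-safe.
JIFFIES_MASK = 0xFFFFFFFF


def parse_dump(text):
    """Collect 'name <hexvalue>' lines into a dict; other lines are ignored."""
    fields = {}
    for line in text.splitlines():
        match = FIELD_RE.match(line)
        if match:
            fields[match.group(1)] = int(match.group(2), 16)
    return fields


def analyze_tx_hang(fields, hz=100, tx_timeout_factor=1, ring_size=256):
    """Summarize ring occupancy and the age of the oldest uncleaned transmit.

    hz, tx_timeout_factor and ring_size are assumptions, not values from the ticket.
    """
    # Descriptors queued by the driver (tail) that the device has not yet
    # processed (head); indices wrap around at ring_size.
    pending = (fields["TDT"] - fields["TDH"]) % ring_size
    # Ticks between queuing the descriptor at next_to_clean and the report.
    elapsed = (fields["jiffies"] - fields["time_stamp"]) & JIFFIES_MASK
    return {
        "pending_descriptors": pending,
        "elapsed_jiffies": elapsed,
        "elapsed_seconds": elapsed / hz,
        "over_timeout": elapsed > tx_timeout_factor * hz,
    }


# The dump from this report, verbatim.
SAMPLE = """\
e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
  Tx Queue             <0>
  TDH                  <41>
  TDT                  <46>
  next_to_use          <46>
  next_to_clean        <41>
buffer_info[next_to_clean]
  time_stamp           <ffffc090>
  next_to_watch        <42>
  jiffies              <ffffc120>
  next_to_watch.status <0>
"""
```

For this dump, `analyze_tx_hang(parse_dump(SAMPLE))` reports 5 pending descriptors (0x46 - 0x41) and 144 jiffies (0xffffc120 - 0xffffc090) since the descriptor at next_to_clean was queued, which is 1.44 s only under the assumed HZ of 100.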

Attachments (3)

panic.png (36.7 KB) - added by Ole Tange 15 years ago: Screenshot of kernel panic
watchdog1.png (35.5 KB) - added by Ole Tange 15 years ago: Watchdog output part 1
watchdog2.png (35.6 KB) - added by Ole Tange 15 years ago: Watchdog output part 2


Change History (18)

comment:1 Changed 15 years ago by Frank Mehnert

Status:	new → closed
Resolution:	→ duplicate

Most probably a duplicate of #4343.

comment:2 Changed 15 years ago by Costin Grigoras

Might be, though in this particular case the guest doesn't hang. Let's see in the next version then.

comment:3 Changed 15 years ago by Ole Tange

Status:	closed → reopened
Resolution:	duplicate

I got the error message too, but with different values. The network would freeze while the server kept running fine. Sometimes the network managed to get unstuck on its own.

My setup:

Host: Linux 2.6.30-amd64, 8 CPUs, running virtualbox-ose 3.0.4.
Guest: Linux 2.6.30-amd64, 8 CPUs, bridged network.

I can provoke the error by rsyncing a large directory (100 GB) to the guest. This causes sustained inbound traffic of 80 Mbps.

If I run the guest with 8 CPUs, the error consistently occurs after 1000-2000 seconds. With 1 CPU, it took 8500 seconds before it occurred.

With NAT networking I was able to provoke the error as well, but only after around 6000 seconds with 8 CPUs.

I have tested bridging with 8 CPUs using virtualbox-3.0_3.0.6_BETA1-51790_Debian_lenny_amd64.deb and got the same problem after 1500 seconds, so the problem is not fixed.

comment:4 Changed 15 years ago by Ole Tange

I wondered whether increasing the number of guest CPUs beyond the number of physical CPUs would provoke this problem earlier. I just ran with Host: 8 CPUs, virtualbox-3.0_3.0.6_BETA1-51790_Debian_lenny_amd64.deb; Guest: 12 CPUs.

This provoked the problem after 130 seconds on the first try, and after 1200 seconds on the second try.

I wonder whether the problem could be caused by a flaky clock. The kernel log contains:

[    0.036000] Spurious LAPIC timer interrupt on cpu 0
[    0.196001] Measured 15867 cycles TSC warp between CPUs, turning off TSC clock.
[    0.196001] Marking TSC unstable due to check_tsc_sync_source failed
[    9.548093] PCSP: Timer resolution is not sufficient (4000250nS)
[   10.116099] intel8x0_measure_ac97_clock: measured 59999 usecs (11276 samples)
[   10.120017] intel8x0: measured clock 187936 rejected

Changed 15 years ago by Ole Tange

Attachment:	panic.png added

Screenshot of kernel panic

comment:5 Changed 15 years ago by Ole Tange

The problem gets more and more peculiar: I have tested this setup:

Host: 8 CPUs, virtualbox-3.0_3.0.6_BETA1-51790_Debian_lenny_amd64.deb; Guest: 12 CPUs.

After 25 minutes of sustained 80 Mbit/s traffic the network broke. I even got a dump.

But as soon as I pressed enter in the console window the network worked again.

The network broke again after 3 minutes. Pressing space in the console solved it.

The network broke again after 4 minutes. Pressing 'f' in the console solved it.

It seems pressing any key in the console makes the network run again.

I have been able to get a better screenshot of the watchdog error messages from the kernel. These are attached.

Changed 15 years ago by Ole Tange

Attachment:	watchdog1.png added

Watchdog output part 1

Changed 15 years ago by Ole Tange

Attachment:	watchdog2.png added

Watchdog output part 2

comment:6 Changed 15 years ago by Ole Tange

Because I had to press a key to revive the network, I got the idea that the fault may be in VirtualBox (the GUI). So I ran the same virtual machine with VBoxHeadless. It has now been running for 5000 seconds at a sustained 80 Mbit/s without a hiccup. This leads me to believe the problem lies in the interaction with the VirtualBox GUI.

I have now installed virtualbox-ose 3.0.6 (r52128) and will check whether VBoxHeadless solves the issue there as well.

costing: You reported this bug. Can you still reproduce it? Does it go away if you run the VM with VBoxHeadless?

comment:7 Changed 15 years ago by Costin Grigoras

I was always running under VBoxHeadless. The messages were definitely there in 3.0.4. Since upgrading to 3.0.6 I haven't seen them any more (yet?). However, for me they didn't cause any major problems: the system kept working without intervention, it's just that dmesg was full of these errors.

comment:8 Changed 15 years ago by Ole Tange

virtualbox-ose 3.0.6 (r52128) running under VBoxHeadless dropped the network connection after 1200 seconds. Since it is headless, I cannot tell whether it was due to the same bug.

comment:9 Changed 15 years ago by Ole Tange

VBoxHeadless crashed the network, too. I have tested this setup:

Host: 8 CPUs, virtualbox-3.0_3.0.6_BETA1-51790_Debian_lenny_amd64.deb; Guest: 12 CPUs.

Running under VBoxHeadless, the network broke down after 5900 seconds. When I connected to the server with rdesktop after the network had stopped working and pressed 'enter', the network worked again immediately, so it seems the GUI is not the cause of the problem after all.

Would it be helpful if I took a snapshot of the server in the broken-down state?

comment:10 Changed 15 years ago by Frank Mehnert

No. By the way, you should upgrade to the final 3.0.6, though I don't think this will solve your problem. Do you have more than one guest CPU enabled?

comment:11 Changed 15 years ago by Ole Tange

Yes. As mentioned, I have tried both 8 and 12 CPUs in the guest. The 4 extra CPUs did not change anything, for better or worse.

As a workaround, is it possible to press 'enter' from a program on the host machine? (I.e., can I write a script that presses 'enter' every minute?)
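One way such a host-side script might look is sketched below, assuming the installed VBoxManage provides the `controlvm <vm> keyboardputscancode <hex>...` subcommand (check `VBoxManage controlvm` on your VirtualBox version before relying on it). It sends the PC/AT set-1 make and break scancodes for Enter (0x1c and 0x9c) at a fixed interval. The VM name and the helpers `enter_key_command` and `press_enter_periodically` are hypothetical. The keystroke lands on whatever has focus on the guest console, and this only works around the symptom; it does not address the Tx hang itself.

```python
import subprocess
import time

# Set-1 scancodes for the Enter key: make (press) and break (release = make | 0x80).
ENTER_PRESS = "1c"
ENTER_RELEASE = "9c"


def enter_key_command(vm_name):
    """Build the VBoxManage argv that injects one Enter keystroke into vm_name.

    Assumes VBoxManage is on PATH and supports 'controlvm ... keyboardputscancode'.
    """
    return ["VBoxManage", "controlvm", vm_name, "keyboardputscancode",
            ENTER_PRESS, ENTER_RELEASE]


def press_enter_periodically(vm_name, interval_s=60, iterations=None,
                             run=subprocess.run, sleep=time.sleep):
    """Send Enter every interval_s seconds; loop forever if iterations is None.

    run and sleep are injectable so the loop can be exercised without VirtualBox.
    """
    count = 0
    while iterations is None or count < iterations:
        # check=True raises CalledProcessError if VBoxManage reports a failure.
        run(enter_key_command(vm_name), check=True)
        count += 1
        sleep(interval_s)
```

Running `press_enter_periodically("my-guest-vm")` on the host would press Enter once a minute until interrupted; injecting `run` and `sleep` keeps the loop testable and lets the interval be shortened if the network stalls faster than that.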

comment:12 Changed 15 years ago by Ole Tange

Also, as mentioned in my initial report: "If the guest was run with 1 cpu it took 8500 seconds before it occurred." So using just 1 CPU does not solve the issue, but it seems to postpone it somewhat.

comment:13 Changed 15 years ago by Ole Tange

One thing I seem to have forgotten to mention is that both host and guest are 64-bit.

comment:14 Changed 14 years ago by Frank Mehnert

Still relevant with VBox 4.0.6? Perhaps related to #8755?

comment:15 Changed 14 years ago by Frank Mehnert

Status:	reopened → closed
Resolution:	→ fixed

No response, closing.
