Last Update: Mar 27th, 2024

Troubleshooting (v8)

BWMGR Won't Start

If your bwmgr won't start, try to start it manually to get an error code:

# bwmgr start

If you get an "expired" message right after installing a new license, your software may be too old: a new license may not work with software that is more than a year old. There is no reason to install a new license on older software; upgrade first, then install your license.

Other error codes should be reported to Support.

Can't Log In to GUI

The GUI password should be the same as the root password set in bwmgrSetup (or saturn5 on a new appliance). You can set or change the GUI password from the command line:


bwmgr guipassword admin PASSWORD

If you know you're using the correct credentials check for js errors i

File System Full

If you get the message "File System Full", any number of programs can stop running correctly, most notably the MySQL database. If you haven't maintained your system in a long time, you could have large log files, or leftover .img or .tgz files that you downloaded and never removed. To find large files, use the following command:

find / -type f -size +2G -exec ls -lh {} \; | awk '{ print $9 ": " $5 }'

This command will find all files over 2GB on the system. Make sure you only have your primary disk mounted. You'll get output something like this:

/usr/local/images/bigfile.img: 3.7G
/var/log/somelog.log: 3.1G
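If other disks are mounted and you would rather not unmount them, the search can be kept on a single filesystem instead. The following is a minimal sketch, not part of ET/BWMGR; find_large_files is a hypothetical helper name, and the threshold is given in megabytes.

```shell
#!/bin/sh
# Sketch: list regular files larger than a threshold without crossing
# into other mounted filesystems.
# find_large_files is a hypothetical helper name, not an ET/BWMGR command.
find_large_files() {
    dir=$1       # directory to search, e.g. /
    min_mb=$2    # size threshold in megabytes
    # -xdev keeps find on the filesystem that holds "$dir";
    # -size +Nc matches files larger than N bytes.
    find "$dir" -xdev -type f -size +"$((min_mb * 1024 * 1024))"c -print
}
```

For example, "find_large_files / 2048" would list files over 2GB on the root filesystem only, and "find_large_files /var 500" would list files over 500MB under /var.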

Use the "df" command to see the status of your file systems. Note that when you delete a file, the freed space will NOT show up in the df output immediately.

etbwmgr# df
Filesystem    1K-blocks     Used    Avail Capacity  Mounted on
/dev/ada0s1a    2031132   403652  1464992    22%    /
devfs                 1        1        0   100%    /dev
/dev/ada0s1e    2031132   692800  1175844    37%    /var
/dev/ada0s1f  144424956 71843644 61027316    54%    /usr
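To pick out only the file systems that are close to full, the df output can be filtered. This is a rough sketch that assumes the column layout shown above (capacity in the fifth column, mount point in the sixth); full_filesystems is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: print mount points at or above a capacity threshold.
# Reads df output on stdin; full_filesystems is a hypothetical helper name.
full_filesystems() {
    threshold=$1    # percentage, e.g. 90
    awk -v t="$threshold" '
        # Skip the header line and pseudo-filesystems such as devfs.
        NR > 1 && $1 ~ /^\/dev\// {
            pct = $5
            sub(/%/, "", pct)
            if (pct + 0 >= t) print $6 ": " $5
        }'
}
```

Usage would be "df | full_filesystems 90". Mount points that contain spaces are not handled by this sketch.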

For large log files, you should add an entry to the /etc/newsyslog.conf file so that they will be automatically rotated when they get too large or at specified time intervals.

(see "man newsyslog.conf")
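As a rough illustration, an entry for the sample log file shown earlier might look like the following. The path, archive count, and size are placeholder values, not recommendations from ET. The size field is in kilobytes, "*" in the when field means rotate on size only, and the J and C flags ask newsyslog to compress archives with bzip2 and to create the log file if it is missing.

```
# logfilename           mode  count  size   when  flags
/var/log/somelog.log    644   5      10240  *     JC
```

Running "newsyslog -nv" afterwards should print what newsyslog would do without actually rotating anything, which is a reasonable way to check the new entry.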

Rules Missing

If you have rules "missing" from your /etc/rc.bwmgr file (or you see Can't Allocate Stats Structure in your system log), make sure you have the latest code. As things are reported we find some variations of rebuilt startup rules that don't work.

There is a default limit of 8000 named protocols or rules with stats enabled (stats are also maintained for Protocols). If you have over 2000 rules with stats enabled, you'll have to increase the table size. In /boot/loader.conf:

bwmgr.protocol_stats_table=10000

You'll need to add this line, or raise its value if it is already there. The table pre-allocates a chunk of memory, so keep the number only a bit larger than what you'll need.

Missing Some/All Rules After Reboot

To begin, it's important to know how the rules are added at boot time. The startup script, /etc/rc.bwmgr, stores all of your ET/BWMGR rules, groups, and settings. If the rule in question is not in this file, it will not be added at boot time.

Make sure the rule exists

First, check to see that the rule is in the startup file. If it's not in the file, it won't be added at boot. You can either open /etc/rc.bwmgr in an editor (e.g., vi), or you can search for it with the grep utility:

# grep text /etc/rc.bwmgr

where text is unique text within the rule, such as the rule name or index.
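If you check for rules often, the search can be wrapped in a small function. This is only a sketch; rule_in_startup is a hypothetical helper name, and the file defaults to /etc/rc.bwmgr.

```shell
#!/bin/sh
# Sketch: report whether a rule name or index appears in the startup file.
# rule_in_startup is a hypothetical helper name, not an ET/BWMGR command.
rule_in_startup() {
    text=$1
    file=${2:-/etc/rc.bwmgr}
    # -F treats the text as a literal string; -n prints matching line numbers.
    if grep -n -F -- "$text" "$file"; then
        return 0
    fi
    echo "no entry matching '$text' in $file" >&2
    return 1
}
```

Because the text is matched literally, characters such as "." or "*" in a rule name are not interpreted as a pattern.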

If the rule has an entry in the startup file, but doesn't exist in the ruleset following a boot, run the rule manually from the command line to see why. If there's a syntax error or bad option, an error code should be displayed, along with the usage guide for the bwmgr utility.

If the rule is not in the file, it may be because:

  • You didn't manually rebuild your ruleset after adding the rule
  • You don't have auto-rebuild enabled

To enable auto-rebuild, make sure that Auto Rebuild is set to 1 in the Settings tab in the GUI. Note that the ruleset is updated once every 5 minutes, so if you reboot right after adding rules they may not have been added to your ruleset.

If you have lots of rules that aren't being added, you may have general syntax errors. This is only likely to happen when a large ruleset is imported from an older appliance and is not properly updated to match v5 syntax. This situation can be harder to track down, since some of the rules will load and others will not.

To simplify the task of identifying rule(s) that are not loading due to syntax errors, run the startup script with the -x option which will cause each command to be printed as it is run.

# sh -x /etc/rc.bwmgr
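With a large ruleset the trace scrolls by quickly, so it can help to save it to a file and search it afterwards. A minimal sketch follows; trace_startup and the default log path are hypothetical, and running it re-applies the startup script just as the command above does.

```shell
#!/bin/sh
# Sketch: run a startup script with tracing and keep the trace for review.
# trace_startup and the default log path are hypothetical names.
trace_startup() {
    script=${1:-/etc/rc.bwmgr}
    log=${2:-/tmp/rc.bwmgr.trace}
    # sh -x writes each command to stderr prefixed with "+ ";
    # 2>&1 keeps any error message next to the command that produced it.
    sh -x "$script" > "$log" 2>&1
    echo "trace saved to $log"
}
```

After running trace_startup, open the log in a pager and look for error text or the bwmgr usage guide; the "+" line just before it is the command that failed.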

Another check is to use:

bwmgr rebuild
bwmgr rebuild userules

"rebuild" shows the rules in the database. "rebuild userules" shows the rules actually loaded into the system. If a rule is missing, check to see if the rule is or isn't in both of those outputs.
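To compare the two outputs without reading them side by side, you could save each to a file and list the differences. This sketch assumes both commands print one rule per line in the same textual form, which may not hold exactly on your version; compare_rule_dumps is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: show rules present in one dump but missing from the other.
# Assumes one rule per line, written identically in both dumps.
# compare_rule_dumps is a hypothetical helper name.
compare_rule_dumps() {
    db_dump=$1        # saved output of: bwmgr rebuild
    loaded_dump=$2    # saved output of: bwmgr rebuild userules
    a=$(mktemp /tmp/rules_db.XXXXXX) || return 1
    b=$(mktemp /tmp/rules_loaded.XXXXXX) || return 1
    sort "$db_dump" > "$a"
    sort "$loaded_dump" > "$b"
    echo "In database only:"
    comm -23 "$a" "$b"    # lines unique to the database dump
    echo "Loaded only:"
    comm -13 "$a" "$b"    # lines unique to the loaded ruleset
    rm -f "$a" "$b"
}
```

For example: "bwmgr rebuild > /tmp/db.rules", then "bwmgr rebuild userules > /tmp/loaded.rules", then "compare_rule_dumps /tmp/db.rules /tmp/loaded.rules".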

Sometimes, the database can become corrupt or out of sync. To rebuild the database from the rules in your system:

bwmgr flushdb
bwmgr rebuild userules | sh

Note that the rules shown in the GUI are rules in the database.

WARNING: Outside interface not set. Limiting Disabled

The Outside Interface is a setting that tells the ET/BWMGR which of your bridge ports is the "outside," that is, connected to your upstream network. It's a required setting - as the error message indicates, no bandwidth limiting can be performed until the software can identify the outside interface.

If you've used bwmgr_setup to set up networking, you should not see this message; but if you have done the setup manually, or modified it after the fact, you may have inadvertently cleared this setting, in which case you will see this on the console and also in /var/log/messages.

To set the "outside" flag:

  • Go to the "Interfaces" tab in the ET/BWMGR.
  • Select the interface using the check-box to the left of the interface name.
  • Click "Edit".
  • Select the first check-box, "Outside".
  • Click "Save" to apply the setting.

High Latency Through Bridge / Bridge Not Passing Traffic

Check the Hardware

First, check the basic hardware connections. A bad ethernet cable or a loose connection can cause these symptoms. Check the link status at both ends of each side of the bridge (both bypass ports, and the equipment that they are connected to).

Next, check for errors on your bridge interfaces. From the command line, run the following command to report on network interface statistics.

bwmgr# netstat -in

Name   Mtu Network  Address            Ipkts Ierrs Idrop Opkts Oerrs  Coll
igb1  1500 Link#5   00:e0:ed:2b:05:28   3487     0     0     0     0     0
igb2  1500 Link#6   00:e0:ed:2b:05:29      0     0     0  1743     0     0

If you do see input or output errors, check again after a short time and see if the error counters are increasing as more data is passed. If so, look at the link settings on both sides of the connection. Occasionally, N-way/auto negotiation with an individual piece of equipment may fail (one common example is using a crossover cable directly to some Cisco router equipment). If this happens, and the two sides disagree on link speed or duplex, you will see many errors and a significant drop in performance. Manually setting the link speed & duplex on each side of the link may solve the problem. If not, another solution is to put a small, inexpensive switch between the failover port and the problem equipment.
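To see whether the counters are climbing, you can pull just the error columns for one interface and compare two readings. This is a sketch that assumes the column layout in the sample output above (Ierrs in the sixth column, Oerrs in the ninth); other releases may lay the columns out differently. iface_errors is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: extract the input/output error counters for one interface from
# netstat -in output read on stdin.
# Assumes Ierrs is column 6 and Oerrs is column 9, as in the sample output.
# iface_errors is a hypothetical helper name.
iface_errors() {
    iface=$1
    # Only the Link# row carries the per-interface hardware counters.
    awk -v ifc="$iface" '$1 == ifc && $3 ~ /^Link#/ { print "Ierrs=" $6, "Oerrs=" $9 }'
}
```

For example, run "netstat -in | iface_errors igb1", wait a minute while traffic is flowing, and run it again; if either number has grown, the errors are ongoing.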

Manually Setting Link Speed & Duplex

Using the ifconfig command, you can manually set the link options, which disables auto-negotiation. To see the available options for a given port, you can read the man page for the driver, in this case em:

# man em

Examples:

Set the link speed to 100Mb/s, full duplex:

# ifconfig em3 media 100baseTX mediaopt full-duplex

Set the link speed to 1000Mb/s (Gigabit) full duplex:

# ifconfig em3 media 1000baseTX mediaopt full-duplex

Check the Software

If you do not have a hardware issue or a link negotiation conflict, then the only likely cause of high latency or poor performance is the active ET/BWMGR ruleset. In this order, here are 3 steps for identifying whether your rules are the cause of the observed poor throughput.

  • Use the failover bypass feature to bypass the appliance by "Closing" the bypass ports, which takes your ruleset out of the equation. Make sure traffic flows normally, then re-enable your rules by "Opening" the bypass ports.
  • Disable your rules by issuing the command bwmgr stop, and see if traffic flows normally. Re-enable your rules by running sh /etc/rc.bwmgr.
  • Check your rules for problems. A mistake or misunderstanding about the scope of a rule can affect all traffic going through the bridge - for example, neglecting to add the "IP Address" field when adding a bandwidth limit will affect all traffic that hasn't matched a prior rule.

Loop Messages on console and/or in /var/log/messages

A bridge configuration depends on each MAC address on your network being accessible via only one port of the bridge. A loop occurs when any MAC address can be reached on both sides of the bridge. This is not necessarily a problem if you get one or two isolated messages - especially during testing when you may be moving machines around or plugging them into different ports. If you see a screen full of these messages, this means that two or more bridged ports on the appliance are plugged into the same switch or hub. Specifically, the message tells you that the MAC address was received on both of the listed interfaces. Constant looping can either halt your system or make it extremely slow, and must be resolved. It indicates a serious flaw in your network setup.

Make sure your system is set up as described in the Quick-Start Guide.

ET/BWMGR is Limiting Too Much

Adjust your Shaping Settings

Some causes of slower than expected connections:

  • Packet Loss: Check your interface for errors and check for drops on any rule(s) that would match the traffic in question.
  • TCP Window Settings: A combination of your bandwidth setting and the default TCP Window settings will slow down an individual session more than desired in many cases. Try using different settings for tcpwindow to keep the window from being set too low. Try 5000 to start. A setting of 64000 effectively disables window shaping. (There's also a setting in v5 to disable shaping for a rule)

If you have a very high setting, it's possible that the server you're testing with (or some intermediate network) won't give you that much bandwidth. Shaping isn't designed to work with a single connection; it's designed to manage many connections, so one connection may not be able to reach the full amount with normal settings.

Check your Interrupts Per Second

For acceptable latency, you shouldn't have your max_ints setting less than 6000. If your system is loaded, you can see your interrupts in action using:

# systat -vmstat 1

This will show you a lot of statistics with interrupts shown on the right side of the page. If your system isn't under significant load, the numbers won't be significant or useful for tuning purposes. If you think that you need to increase your interrupts, contact ET Support as the procedure is different for different network adapters.

Graph Problems

If you see a broken icon instead of your graph, you can right-click on the icon and "open image in new window". This should print out any error messages. Usually the error is fairly self-explanatory. If you see an error message but don't know how to go about fixing the underlying issue, send the problem report along with the exact error message(s) to our support staff using the ticket system.

If the graphs load properly, but do not show any data points, then the data is not being stored. If this is the case, the first thing to do is check the log file for bwmgrd, the program responsible for storing the stats in the databases. From the console:

# tail -n 25 /var/log/bwmgrd.log

will show the last 25 lines in the log. Each time bwmgrd runs, it prints an entry to this log. Any entry that reads something other than "Running" is noteworthy.
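To skip past the normal entries and see only the unusual ones, the log can be filtered. This sketch assumes each entry occupies one line and that normal entries contain the word "Running"; unusual_bwmgrd_entries is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: show recent bwmgrd log entries that are not the normal "Running" line.
# Assumes one entry per line; unusual_bwmgrd_entries is a hypothetical helper name.
unusual_bwmgrd_entries() {
    log=${1:-/var/log/bwmgrd.log}
    count=${2:-25}
    # grep -v exits non-zero when every line contained "Running";
    # "|| true" treats that clean result as success.
    tail -n "$count" "$log" | grep -v 'Running' || true
}
```

No output means every one of the last 25 entries was "Running". Pass a larger count as the second argument to look further back.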

If you have no data in your graphs

Check Permissions for /usr/local/etc/bwmgr/graphs

You can fix permissions with:

chmod 666 /usr/local/etc/bwmgr/graphs/*

The files should also be owned by daemon:

chown daemon /usr/local/etc/bwmgr/graphs/*
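Before changing anything, you may want to see which files are actually out of line. The following sketch lists graph files whose owner or mode differs from what is described above (owner daemon, mode 666); graph_files_to_fix is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: list graph files whose owner or mode differs from the expected
# owner (daemon) and mode (666).
# graph_files_to_fix is a hypothetical helper name.
graph_files_to_fix() {
    dir=${1:-/usr/local/etc/bwmgr/graphs}
    owner=${2:-daemon}
    # -perm 666 matches the mode exactly; a mismatch on either test lists the file.
    find "$dir" -type f \( ! -user "$owner" -o ! -perm 666 \) -print
}
```

If graph_files_to_fix prints nothing, permissions and ownership are probably not the reason the graphs are empty.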

General Troubleshooting

Disaster Recovery

This section deals with a situation wherein your appliance does not boot, either due to a crash that fsck (the UNIX "chkdsk" or "scandisk" equivalent) cannot deal with gracefully, or a panic during the boot process. In either case, you can either use the USB Demo to fix the problem, or take manual control of the appliance at boot time.

If you do not have a USB boot image, then you must follow the step-by-step instructions below. If you do have a USB Demo or backup, boot from that, and you can check & mount your appliance disk using the following command:

# diskutil mount ada0

If the appliance was panicking at boot, this will enable you to make the necessary changes, since the appliance filesystem will be accessible in the "/ada0" directory.

If you know exactly what is causing the problem, then you can take specific action to fix it. For example, if you suspect a BWMGR rule is causing problems, but don't know which one, then you can bypass starting the ET/BWMGR like this:

# cd /ada0/etc
# mv rc.bwmgr rc.bwmgr.sav
# halt

Remove the USB and then reboot.

If you are unsure of how to proceed, you can open a support ticket and describe the problem.

Manual Instructions:

Power-on the appliance, wait for the boot menu to appear, press "s" to select single-user mode, then hit ENTER to begin booting.

Because you are entering single-user mode, you will be prompted for the shell to use for root. Simply hit ENTER to accept the default of /bin/sh. You should now have a root prompt. First, run fsck to check the drives:

# fsck -y

This command should take a few minutes to complete, after which you can either continue the boot or make appropriate changes to your startup files. If you need to make any changes, you must first enable read/write access to your filesystems:

# mount -a

If you know exactly what is causing the problem, then you can take specific action to fix it. If you suspect a BWMGR rule is causing problems, but don't know which one, then you can bypass starting the ET/BWMGR like this:

# mv /etc/rc.bwmgr /etc/rc.bwmgr.sav
# exit

The boot will now continue.
