The goal of RUMT is to check the memory of a computer over a long period of time and under near-real load conditions, without having to interrupt its services.
RUMT exploits the ability of some Unix kernels to selectively disable
some memory areas while still allowing access to them through the
/dev/mem device. The principle of RUMT is to write
pseudo-random data to these disabled memory areas and check them later.
This principle and the original code for the deterministic pseudo-random
generator are from
This distribution contains another variant on the same theme: URUMT
allocates a large chunk of memory, locks it in memory using the
mlock(2) system call, and scans
/dev/mem to find
where in physical memory the allocated area is. Then it continuously runs
the same tests in that memory.
URUMT cannot be used to test a particular area of memory: the kernel will give it whatever physical memory it feels like. But URUMT can be restarted now and then, hopefully getting different physical memory each time. This is ideal if you suspect you have bad bits but have no idea where they are. Once you have located the bad bits, you can use plain RUMT to test their neighborhood more extensively.
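The write-then-check principle shared by RUMT and URUMT can be illustrated with a minimal sketch. It is not the actual RUMT code: a `bytearray` stands in for the disabled memory area (the real tools go through /dev/mem), Python's `random` module stands in for the deterministic generator, and the function names `pattern`, `prepare` and `check` are made up for this example. Bits are numbered within each byte here, which is also an assumption.

```python
import random

PAGE_SIZE = 4096  # assumed page size for this sketch


def pattern(seed, length):
    """Return `length` deterministic pseudo-random bytes for `seed`."""
    rng = random.Random(seed)
    return bytes(rng.getrandbits(8) for _ in range(length))


def prepare(mem, seed):
    """Fill the buffer with the pattern derived from `seed`."""
    mem[:] = pattern(seed, len(mem))


def check(mem, seed, base=0):
    """Regenerate the pattern and list every bit that differs.

    Each entry is (address, bit, direction). The direction is "0->1"
    for a bit that should be 0 and reads as 1, and "1->0" otherwise.
    """
    bad = []
    for offset, (want, got) in enumerate(zip(pattern(seed, len(mem)), mem)):
        diff = want ^ got
        for bit in range(8):
            if diff & (1 << bit):
                direction = "0->1" if got & (1 << bit) else "1->0"
                bad.append((base + offset, bit, direction))
    return bad


mem = bytearray(2 * PAGE_SIZE)    # stands in for a disabled memory area
prepare(mem, seed=42)
assert check(mem, seed=42) == []  # nothing changed: nothing to report

mem[100] ^= 0x08                  # simulate a single flipped bit
print(check(mem, seed=42, base=0x1000))
```

Because the pattern is regenerated from the seed, nothing but the seed has to be remembered between the write and the check.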
The core of RUMT is
rumt_trymem, which accepts the following options:
-d device: use device instead of
-i input seed: check the memory areas according to input seed.
-o output seed: prepare the memory areas according to output seed.
The remaining arguments are memory areas, with one of the syntax
start-end (where end is excluded).
Values can be suffixed with
for kilobytes, megabytes or pages (usually 4kB, cf.
PAGE_SIZE). All values must be multiples of the page size.
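As an illustration of the start-end form and of the page-size rule, here is a small parsing sketch. It is not taken from rumt_trymem: the suffix letters `k`, `m` and `p` are hypothetical placeholders chosen for this example, the 4096-byte page size is assumed, and accepting `0x` hexadecimal values is also an assumption.

```python
PAGE_SIZE = 4096  # assumed; the real value comes from the system

# Hypothetical suffix letters, chosen for this sketch only.
UNITS = {"k": 1024, "m": 1024 * 1024, "p": PAGE_SIZE}


def parse_value(text):
    """Convert a value such as "16m" or "0x100p" to a byte count."""
    original = text
    unit = 1
    if text and text[-1].lower() in UNITS:
        unit = UNITS[text[-1].lower()]
        text = text[:-1]
    value = int(text, 0) * unit  # base 0: decimal or 0x-prefixed hex
    if value % PAGE_SIZE:
        raise ValueError(f"{original!r} is not a multiple of the page size")
    return value


def parse_area(spec):
    """Parse "start-end" into a (start, end) pair; end is excluded."""
    start_text, end_text = spec.split("-", 1)
    start, end = parse_value(start_text), parse_value(end_text)
    if end <= start:
        raise ValueError(f"empty or reversed area: {spec!r}")
    return start, end


print(parse_area("64m-65m"))  # one megabyte starting at 64 MB
```

Rejecting unaligned values early keeps a typo from silently turning into a test of the wrong pages.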
The normal way to use RUMT is to call
some_seed on the disabled memory areas, wait some time, and then call
rumt_trymem -i with the same seed on the same memory
areas. If nothing has changed,
rumt_trymem will be silent. If
something has changed, the detected bad bits will be printed as
AAAAAAAAAAAA the address,
b the bit, and
± the direction (
a bit that should be 0 and is 1,
rumt_trymem -o on memory areas
used by the system will likely cause crashes or data loss. Triple check your
rumt_daemon is a shell script that calls
rumt_trymem, keeps track of the seed, and maintains a table of
detected bad bits. It can be configured using the
auxiliary script. See the comments in this script for options.
The standard way to run
num_pages, where num_pages is the number of
pages to allocate and test. A typical value may be one eighth or one quarter
of your total physical memory.
urumt will print some diagnostics
and start testing. It records its results in a file called
urumt_stats whose size is eight bytes per page of physical
memory (not tested memory). If bad bits are found, a message will also be
urumt accepts the following options:
-s stats_file: selects the file where statistics are stored.
-m mem_device: selects an alternate path for a
-d delay: selects the time (in microseconds) between series of tests.
-D delay_modulus: selects the number of pages to test between each sleep. Thus, the total time to test all pages will be approximately num_pages×delay/delay_modulus (in microseconds). The default values are 900 for delay (which will probably be rounded up to one time slice) and 1 for delay_modulus.
-b max_bad_bits: if
urumt finds more than max_bad_bits in one page at once, it will print a message and exit. The reason is that memory is normally not that bad, but the internal data structures can themselves land on bad bits and get corrupted; if that happens, you do not want your statistics ruined.
-S: diverts all messages to syslog; the facility is local0.
urumt will not run any test, but print its statistics file: the first column is the page number (in hexadecimal), the second column is the total number of times that page has been tested, and the third column is the total number of errors found in that page.
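Two figures quoted above lend themselves to a quick back-of-the-envelope check: the time for one full pass (num_pages×delay/delay_modulus) and the size of the statistics file (eight bytes per page of physical memory). The sketch below only restates that arithmetic; the function names and the 4096-byte page size are assumptions made for the example, and the real pass time will be longer if each sleep is rounded up to a time slice.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


def full_pass_seconds(num_pages, delay_us=900, delay_modulus=1):
    """Approximate time, in seconds, to test every allocated page once."""
    return num_pages * delay_us / delay_modulus / 1_000_000


def stats_file_bytes(physical_bytes):
    """Size of the statistics file: eight bytes per physical page."""
    return 8 * (physical_bytes // PAGE_SIZE)


num_pages = 32768  # 128 MiB worth of 4 KiB pages

# Default settings: one sleep of 900 microseconds per page tested.
print(full_pass_seconds(num_pages))

# Testing 64 pages between sleeps shortens a pass by a factor of 64.
print(full_pass_seconds(num_pages, delay_modulus=64))

# Statistics file for a machine with 1 GiB of physical memory.
print(stats_file_bytes(1 << 30))
```

With these assumed numbers, a pass over 128 MiB takes roughly half a minute at the default settings, and the statistics file for 1 GiB of physical memory is 2 MiB.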
It should be fine to run two
urumt instances at the same time on the same
statistics file, since they will get distinct pages. If you restart
urumt to change the memory area being tested, it is probably a
good idea to start the new instance before killing the old one, since that
guarantees an entirely new area (of course, if you are trying to check half
your memory at once, this will probably fail).
This is a Perl script used by
rumt_daemon to beautify the list
of bad bits.
On Linux, for those who have more than 960MB of memory and have therefore enabled the
/dev/mem gives access to
only the first 768MB. This kernel module creates
/dev/misc/highmem which does not have this limit.
Beware. This code has been tested on my box without crashing it. It has also
been posted on LKML, but got no answer. I do not know if it is compatible
with the CONFIG_HIGHMEM64G option. Use at your own risk.
Beware (bis). This code has been neither ported to nor tested with 2.6 kernels.
This Perl script parses the boot messages of a Linux kernel to guess its
command-line options and disabled memory areas. It prints one line with the
mem= arguments to the kernel, and one line with the disabled
areas in a format suitable for
You should double-check the former with your GRUB/LILO/whatever
configuration and your
/proc/cmdline before using the
RUMT works for me at home with a 2.4.20 Linux kernel;
rumt_trymem and the shell scripts should be quite portable.
There is no installation procedure: is started from its compilation
directory; anyway, RUMT is not a program that one wants to use now and then.
I do not intend to make RUMT a well-packaged program: I will work on it until I find my bad memory bits, and that's all. I give it to the community as is, and whoever wants to enhance it is welcome.