Changes to this message: Further delay

Update (Oct 31 17:00): Unfortunately, a further delay in restarting operation is unavoidable. We expect to restart batch processing later in the evening of Oct 31, but full user operation can only be restarted on Friday due to the bank holiday on Nov. 1.

-----------------------------------------------------------------------

Update (Oct 30 16:45): Based on a strong recommendation from Mellanox to install a firmware update on the interconnect switches, we have decided to postpone user operation by one day (see the new target date below) rather than schedule a new maintenance. This update should further improve signal quality and thereby, we hope, reduce the intermittent occurrence of shaky nodes that have caused job failures.

-----------------------------------------------------------------------

Between October 29, 8:00 and October 31, 18:00 the system will be unavailable for user operation. During this maintenance, configuration changes and software updates will be installed which should improve the stability of both Intel and IBM MPI operation. The following items are of particular interest:

* IBM PE 1.2 PTF9. This update fixes a memory leak in the IBM MPI library as well as some problems with MPI-IO. Please remove any MP_MPILIB=pempi and/or MP_SHARED_MEMORY=no workaround settings from your LoadLeveler scripts, then recompile, relink and rerun your application and check whether the problems are solved. Please provide feedback if this is not the case.

* An update for the GPFS file system (although this update is not expected to solve all outstanding I/O problems).

* Intel compilers (from 12.1.5 to 12.1.7 for the default release, and 13.0 Update is available as a non-default release)

* Intel MKL update (from 10.3u9 to 10.3u13)

This information is also available on our web server:
http://www.lrz-muenchen.de/services/compute/supermuc/aktuell/ali4438/

Reinhold Bader
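Regarding the IBM PE item above: as an aid for locating the MP_MPILIB=pempi and MP_SHARED_MEMORY=no workaround settings in existing job scripts, the following is a minimal Python sketch, not an official tool. The function name find_workaround_lines and the sample script text are hypothetical and only serve as illustration. The sketch assumes the settings appear literally in the script text, possibly with spaces around the equals sign, and it only reports matching lines without modifying anything.

```python
import re

# The two workaround settings named in the announcement.
# Assumption: they appear literally in the job script, optionally with
# whitespace around "=". Quoted values are not handled by this sketch.
WORKAROUND = re.compile(
    r"\bMP_MPILIB\s*=\s*pempi\b|\bMP_SHARED_MEMORY\s*=\s*no\b"
)


def find_workaround_lines(text):
    """Return (line_number, line) pairs that contain a workaround setting."""
    return [
        (number, line.rstrip())
        for number, line in enumerate(text.splitlines(), start=1)
        if WORKAROUND.search(line)
    ]


# Hypothetical job script content, used only to demonstrate the function.
sample_script = """#!/bin/bash
export MP_MPILIB=pempi
export MP_SHARED_MEMORY=no
export MP_SHARED_MEMORY=yes
./my_application
"""

for number, line in find_workaround_lines(sample_script):
    print(f"line {number}: {line}")
```

To check a real script, read its contents into a string (for example with pathlib.Path(...).read_text()) and pass it to the function. Lines that are reported are candidates for removal before recompiling, relinking and rerunning the application.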