Good news and not so good news. Good news:
We have completed the required maintenance and can see all compute nodes connected in the new fabric. We are essentially ready to restore service.
Not so good news:
Upon restarting the Lustre filesystem (/storage/scratch2), the last step in restoring service, we noticed a communication error preventing the two metadata nodes from completing their simple redundancy confirmations. The error isn't related to the maintenance and is in fact almost trivial. However, it means that if one of the nodes failed, the other wouldn't automatically take over its services. We have already contacted the vendor, DDN, for support. If no resolution is available by early tomorrow, we will reassess coming online without automatic failover.
In any event, this is the last update you will receive besides the announcement that Talon2 is back. As always, we appreciate your patience and patronage of UNT HPC services.