Power fault detection and management
The rectifiers of the Oxide rack are capable of detecting power faults and power supply changes in different parts of the system, proactively charging/discharging power supply rails, and transitioning between power states for safe rack shutdown and cold start. No custom configuration is required and no data loss as a result of power outage is expected.
The remote monitoring unit (RMU) in the Power Shelf Controller (PSC) is designed to monitor rack-level power consumption and PSU state of health, and to expose that information to the control plane. These capabilities will enable more advanced power management and monitoring features in future releases of the Oxide rack.
Understanding status and fault LEDs
All LEDs in the system are monochrome LEDs with three possible modes:
| Mode | Description |
|---|---|
| Solid On | device or component is functioning properly |
| Solid Off | device is not present, incorrectly inserted, or so mechanically broken that it cannot function |
| Blinking | device needs attention or it is being worked on |
A common cause for the “Solid Off” light is that hot-serviceable components such as server sleds, optical transceivers, and U.2/U.3 NVMe devices are not properly inserted and therefore cannot function.
Blinking serves as a combined service and fault indicator that signals which device is to be operated on. Here is an example of how the LED signal works in the case of an SSD replacement:
If the LED is “Solid On” afterwards, the device is operational and the service has been completed.
If the LED remains “Solid Off”, the SSD is either not fully inserted or so badly broken that it cannot be powered on or detected at all.
If a new fault arises, the LED will blink again.
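The three LED modes can be summarized as a small lookup. The shell function below is only an illustrative sketch: `led_meaning` is a hypothetical helper name, not part of the Oxide CLI, and the mode strings are the ones used in the table above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (NOT part of the Oxide CLI): map an observed LED mode
# to the meaning given in the LED mode table.
led_meaning() {
  case "$1" in
    "Solid On")  echo "functioning properly" ;;
    "Solid Off") echo "not present, incorrectly inserted, or mechanically broken" ;;
    "Blinking")  echo "needs attention or is being worked on" ;;
    *)
      # Any other string is not one of the three documented modes.
      echo "unknown LED mode: $1" >&2
      return 1
      ;;
  esac
}
```

For example, `led_meaning "Solid Off"` prints the "Solid Off" description, which after an SSD swap would suggest reseating the drive before assuming it is faulty.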
Hardware replacement
Sleds
Sled removal
If Oxide Support has confirmed that a sled requires replacement, you can follow the steps in this FAQ to prepare the sled for removal. The process of expunging a sled is facilitated by Oxide Support personnel to ensure:
The sled is no longer in use by the Oxide system.
All virtual disk data on the sled is fully replicated to maintain redundancy.
Oxide control plane services are rebalanced as necessary.
The process of replicating disk data may take up to several hours for very large virtual disks. During this time, disk read/write is unaffected for running instances attached to the virtual disks. Instances that are started from a stopped state will, however, wait for the replication to complete before booting up.
Sled addition
Sled addition is self-serviceable by rack operators. Use the steps below to add a new sled — whether replacing a removed sled or populating empty cubbies in a partially filled rack:
1. Insert the new sled securely into an empty rack cubby, following the instructions that come with the sled package. If you are adding multiple sleds, you may insert them all before proceeding to Step 2.
2. Execute this CLI command as a fleet admin user to see the new sled(s):

       oxide system hardware sled list-uninitialized

   Example output:

       [
         {
           "baseboard": {
             "part": "913-0000019",
             "revision": 11,
             "serial": "BRM23230010"
           },
           "cubby": 17,
           "rack_id": "3c71e552-55f8-4d91-baaa-8560593c932e"
         }
       ]

3. Construct a JSON file with the list of sleds to add, in this format:

       {
         "sled_ids": [
           {
             "part_number": "$PART",
             "serial_number": "$SERIAL"
           }
         ]
       }

   With the JSON processor `jq`, you can create the input file with:

       oxide system hardware sled list-uninitialized \
         | jq '{sled_ids: [.[] | {part_number: .baseboard.part, serial_number: .baseboard.serial}]}' \
         > /path/to/new-sleds.json

4. Add the new sleds as members of the rack with:

       oxide system hardware rack membership add \
         --rack-id $RACK_ID \
         --json-body /path/to/new-sleds.json

5. Monitor the progress of sled member addition with:

       oxide system hardware rack membership status --rack-id $RACK_ID

   Once a new sled has been successfully initialized, the membership status should change from `in_progress` to `committed`.
6. Confirm that the new sleds are in the web console or API sled list:

       oxide system hardware sled list
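Rather than re-running the membership status command by hand, the monitoring step can be wrapped in a small polling loop. This is a minimal sketch under stated assumptions: `wait_for_committed` is a hypothetical helper name, the poll count and interval defaults are arbitrary, and the check simply looks for the string `committed` anywhere in the command output, because the exact output shape of the status command is not shown in this guide.

```shell
#!/usr/bin/env bash
# Sketch: poll rack membership status until it reports "committed".
# ASSUMPTION: the status output contains the literal string "committed" once
# the membership change has been applied. Adjust the match if the real
# output differs.
wait_for_committed() {
  local rack_id="$1"
  local max_attempts="${2:-60}"   # hypothetical default: give up after 60 polls
  local interval="${3:-10}"       # hypothetical default: 10 seconds between polls
  local attempt=1
  local output

  while [ "$attempt" -le "$max_attempts" ]; do
    # Command taken from the monitoring step of the sled addition procedure.
    output="$(oxide system hardware rack membership status --rack-id "$rack_id")" || return 1
    case "$output" in
      *committed*)
        echo "membership committed after $attempt poll(s)"
        return 0
        ;;
    esac
    sleep "$interval"
    attempt=$((attempt + 1))
  done

  echo "timed out waiting for membership to commit" >&2
  return 1
}
```

A call such as `wait_for_committed "$RACK_ID" && oxide system hardware sled list` would then list the sleds only after the membership change has been committed.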
Sled storage devices
SSD removal
As with sled removal, the process of expunging an SSD is facilitated by Oxide Support to ensure:
The SSD is no longer in use by the Oxide system.
All virtual disk data on the SSD is fully replicated to maintain redundancy.
Oxide control plane services are rebalanced as necessary.
The process of replicating disk data may take several hours for very large virtual disks. During this time, disk read/write is unaffected for running instances attached to the virtual disks. However, instances that are started from a stopped state will wait for the replication to complete before booting up.
SSD addition
SSD addition is self-serviceable by rack operators. If the SSD is a warranty replacement with a brand new serial number, the disk will be ready for use as soon as it is inserted in place and shows up in the list of disks in the web console or in the API:

    oxide system hardware disk list

If the SSD is an expunged disk returned to service after remediation, you will need to follow the steps below to "re-adopt" it:

1. Confirm with Oxide Support that all existing data on the SSD has been wiped.
2. Execute this CLI command as a fleet admin user to list disks available for re-adoption:

       oxide system hardware disk list-unadopted

   Example output:

       [
         {
           "disk_id": {
             "model": "WUS4C6432DSP3X3",
             "serial": "A084A75A",
             "vendor": "1b96"
           },
           "sled_id": "87c2c4fc-b0c7-4fef-a305-78f0ed265bbc",
           "slot": 6,
           "variant": "u2"
         }
       ]

3. Enable disk adoption using the model, serial, and vendor from the previous step:

       oxide system hardware disk enable-adoption \
         --model $MODEL \
         --serial $SERIAL \
         --vendor $VENDOR

4. Confirm that the new disk shows up in the disk list in the web console or API.
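Copying the model, serial, and vendor by hand is error-prone, so the lookup and the adoption call can be combined. The function below is a sketch, not an official tool: `readopt_disk_by_serial` is a hypothetical name, it assumes `jq` is installed, and it assumes `list-unadopted` prints a JSON array shaped like the example output shown in this section. Only run it after Oxide Support has confirmed that the disk has been wiped.

```shell
#!/usr/bin/env bash
# Sketch: enable adoption for one unadopted disk, selected by serial number.
# ASSUMPTIONS: jq is available, and `disk list-unadopted` emits a JSON array
# whose entries carry disk_id.model, disk_id.serial, and disk_id.vendor.
readopt_disk_by_serial() {
  local wanted_serial="$1"
  local tab fields model serial vendor
  tab="$(printf '\t')"

  # Emit "model<TAB>serial<TAB>vendor" for the matching disk, if any.
  fields="$(oxide system hardware disk list-unadopted \
    | jq -r --arg s "$wanted_serial" \
        '.[] | select(.disk_id.serial == $s) | .disk_id | [.model, .serial, .vendor] | @tsv')" || return 1

  if [ -z "$fields" ]; then
    echo "no unadopted disk with serial $wanted_serial" >&2
    return 1
  fi

  # Split the first matching line on tabs.
  IFS="$tab" read -r model serial vendor <<EOF
$fields
EOF

  # Flags are the ones used in the enable-adoption step of this procedure.
  oxide system hardware disk enable-adoption \
    --model "$model" \
    --serial "$serial" \
    --vendor "$vendor"
}
```

Selecting by serial number keeps the sketch from enabling adoption for every unadopted disk at once; with the example output from this section, `readopt_disk_by_serial A084A75A` would pass that disk's model and vendor to `enable-adoption`.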
Other components
All field-replaceable units (FRUs) are serviced through Oxide technical support engagement. Commonly serviced units are designed to be hot-pluggable, meaning that they can be replaced without taking the encompassing unit offline.
Here is a list of FRUs serviceable by Oxide:
| Encompassing FRU | Component FRU | Hot-pluggable? |
|---|---|---|
| Sled | - | Yes |
| Sled | Sled U.2 or U.3 Drive/Carrier | Yes |
| Sled | Sled Front Panel/Drive Cage | No |
| Sled | Air Shroud | No |
| Sled | DIMMs | No |
| Sled | M.2 Device | No |
| Sled | M.2 Heatsink | No |
| Sled | CPU | No |
| Sled | CPU Heatsink | No |
| Sled | Sled Individual Fans | No |
| Sled | Sled Storage Midplane (“Sharkfin”) | No |
| Sidecar | - | Yes |
| Sidecar | Sidecar Optical Transceiver (Single) | Yes |
| Sidecar | Sidecar Rear Fan | Yes |
| Sidecar | 4:1 Squid Cables | Yes |
| Sidecar | Sidecar to Sled PCIe Cable | Yes |
| Sidecar | PSC to Sidecar Cables | Yes |
| Sidecar | Sidecar to Sidecar Aux Cable | Yes |
| Sidecar | Sidecar Internal Cables | No |
| Power Shelf | - | No |
| Power Shelf | Power Shelf Rectifier | Yes |
| Power Shelf | Power Shelf Controller | Yes |
| Power Shelf | Fiber Optic Cables | Yes |
| Power Shelf | Power Shelf Adapter Kit | No |
| Power Shelf | Power Shelf Bus Bar Connector | No |
| Power Shelf | Power Shelf Whip Adapters | No |
| Power Shelf | Backplane 4:1 Squid Cables | No |
| Power Shelf | Sidecar to Sled PCIe Cable | No |
| Power Shelf | Core Rack | No |
| Power Shelf | Bus Bar | No |
| Power Shelf | Front and Rear Doors | Yes |
| Power Shelf | Side Panels | Yes |
| Power Shelf | Sled Cubby | No |
| Power Shelf | Sled Blank | Yes |
| Seismic Kit | - | Yes |