Important Notes

  1. The Oxide CLI, Go SDK, and Terraform Provider have been updated for API enhancements described under New Features. Please be sure to upgrade.

Installation

Oxide Computer Model 0 must be installed and configured under the guidance of Oxide technicians. This requirement may change in future releases.

Upgrade Compatibility

Upgrade from version 13 is supported. We recommend shutting down all running instances on the rack before the software update commences. Any instances that are not stopped before the update are transitioned to the failed state when the control plane comes up. They can be configured to start automatically through an auto-restart policy, or users can start them manually.

All existing setup and data (e.g., projects, users, instances) remain intact after the software update.

New Features

Anti-affinity groups

You can now use anti-affinity groups to spread instances across sleds to make applications more resilient to sled failure. Anti-affinity groups have a policy field (allow or fail) that determines whether instances are still allowed to start when affinity constraints cannot be satisfied, such as when all available sleds already contain an instance from the group. See Affinity and Anti-Affinity in the Instances guide to learn more.

API changes for anti-affinity groups include:

In this release we only support anti-affinity groups. We are evaluating use cases for affinity groups, which would place instances on the same sled.
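The effect of the policy field can be illustrated with a small simulation. The Python sketch below is not the control plane's placement algorithm and does not call the Oxide API. The Sled class, the place_instance function, and the choice of the first sled as the fallback under the allow policy are assumptions made only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Sled:
    """Hypothetical stand-in for a sled; tracks which groups have a member on it."""
    name: str
    groups: Set[str] = field(default_factory=set)


def place_instance(sleds: List[Sled], group: str, policy: str) -> Optional[str]:
    """Return the name of the sled chosen for a new group member, or None if it may not start."""
    if policy not in ("allow", "fail"):
        raise ValueError(f"unknown policy: {policy!r}")
    if not sleds:
        return None

    # Preferred case: a sled that does not yet host any instance from the group.
    for sled in sleds:
        if group not in sled.groups:
            sled.groups.add(group)
            return sled.name

    # The constraint cannot be satisfied: every sled already hosts a group member.
    if policy == "allow":
        # Start anyway; picking the first sled is an arbitrary, assumed tie-break.
        return sleds[0].name
    return None  # policy == "fail": the instance is not allowed to start


sleds = [Sled("sled-a"), Sled("sled-b")]
first = place_instance(sleds, "web", "fail")
second = place_instance(sleds, "web", "fail")
third_fail = place_instance(sleds, "web", "fail")
third_allow = place_instance(sleds, "web", "allow")
```

With two sleds, the first two group members land on different sleds. A third member is refused under fail (None) and shares a sled with an existing member under allow.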

Anti-affinity group

Instance metrics in the web console

CPU, disk, and network interface metrics are now available on the Metrics tab on the instance page. Users can adjust the date and time range and view the OxQL query powering each chart. Prior to this release, these metrics were recorded in the system but were only accessible to users with view permissions on the fleet, and only through the CLI. Now anyone with permission to view an instance can view its metrics as well.

Instance CPU utilization chart
Instance CPU utilization OxQL query

Web console

In addition to the work on affinity groups and instance metrics highlighted above, we increased the page size for list views to 50, fixed bugs, and polished the UI.

Full console changelog
  • Instance CPU, disk, and network metrics (#2654, #2761, #2762, #2773, #2779, #2801)

  • Manage anti-affinity groups (#2760, #2768, #2767, #2775, #2789, #2790, #2795, #2796)

  • Increase list page size to 50 (#2798)

  • Allow links in row actions and more actions dropdowns (#2751)

  • Fix error display bug in date range picker (#2780)

  • Better titles for silo and project access pages (#2793)

  • Page loading skeleton (#2754)

  • Fix button bg color on disabled combobox (#2791)

  • Improve tooltip contents and spacing (#2770, #2797)

  • Fix sidebar links active highlight on certain pages (#2748)

  • Organize instance details in card blocks (#2741, #2744)

  • Loading bar respects reduced motion (#2720)

  • Update properties table styling on detail pages (#2723)

  • Validate invalid characters in name fields rather than blocking them (#2710)

  • Make form field labels bright white (#2799)

Bug fixes and other enhancements

  • Disallow invalid transit_ips in network interface update API (omicron#7530)

  • Improve error handling when instance is unable to start or stop due to unresponsive disk or guest (propolis#841, omicron#4004)

  • Mitigate VM shutdown/auto-restart race conditions (omicron#7927)

  • Fix orphan volumes left behind by image creation failures (omicron#7765)

  • Improve error handling for instance deletion (omicron#7556)

  • Reduce disk volume memory usage (crucible#1625)

  • Revise storage capacity and usage calculations to account for system overhead (omicron#4234)

  • Display transceiver status in wicket UI (omicron#7562)

  • Improve rack setup configuration validation and error handling (omicron#7653, omicron#7457)

  • Improve time synchronisation in the face of connectivity problems (omicron#7675)

  • Various sled expungement bug fixes and supporting tool improvements

Known Behavior and Limitations

End-user features

| Feature Area | Known Issue/Limitation | Issue Number |
| Disk/image management | Disks in importing_from_bulk_writes state cannot be deleted directly. The procedure for unsticking a canceled disk import can be used as a workaround. | |
| Disk/image management | Image upload sometimes stalls with HTTP/2 on Firefox. | |
| Disk/image management | The ability to modify image metadata is not available at this time. | |
| Instance orchestration | Instances fail to start when one of the switch zones is unavailable. | |
| Instance orchestration | New instances cannot be created when the total number of NAT entries (private-to-external IP mappings) in the system exceeds 1024. | |
| Instance orchestration | An HTTP 400 error is returned when creating an instance with null anti_affinity_groups. To avoid this error, set the value to an empty array or exclude the field from the request body. | |
| Instance performance | The guest treats the tsc clocksource as unreliable and falls back to substantially slower timestamp syscalls. A workaround for this issue can be found in the Troubleshooting Guide. | |
| Instance performance | Linux guests are unable to capture hardware events using perf record. A workaround for this issue can be found in the Troubleshooting Guide. | |
| VPC internet gateway | Changing a silo’s default IP pool causes some instances to lose their outbound internet access. This is due to a mismatch between the pool containing the instances' external IPs (which are allocated from the new default pool) and the pool attached to the system-created internet gateways (which are linked to the old pool at creation time). See the Troubleshooting Guide for some possible options for restoring instance outbound connectivity. | |
| VPC routing | Subnet update clears the custom router ID when the field is left out of the request body. | |
| VPC routing | Network interface update clears transit IPs when the field is left out of the request body. | - |
| Telemetry | VM instance memory utilization and VPC network/firewall metrics are unavailable at this time. | - |
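Three of the end-user limitations above concern fields that are null or missing in a request body: null anti_affinity_groups on instance creation, and the custom router ID and transit IPs being cleared when omitted from an update. A minimal Python sketch of client-side handling follows. It only manipulates dictionaries and makes no API calls. The helper names are hypothetical, and the custom_router key in the usage note is an assumed field name that should be checked against the API reference.

```python
from typing import Any, Dict, Iterable


def clean_instance_create_body(body: Dict[str, Any]) -> Dict[str, Any]:
    """Drop a null anti_affinity_groups field so it is excluded from the request body."""
    cleaned = dict(body)  # copy so the caller's dict is not modified
    if "anti_affinity_groups" in cleaned and cleaned["anti_affinity_groups"] is None:
        del cleaned["anti_affinity_groups"]
    return cleaned


def carry_forward(
    update: Dict[str, Any], current: Dict[str, Any], fields: Iterable[str]
) -> Dict[str, Any]:
    """Copy the listed fields from the current resource into an update body
    when the update omits them, so that omission does not clear them."""
    merged = dict(update)
    for name in fields:
        if name not in merged and name in current:
            merged[name] = current[name]
    return merged


create_body = clean_instance_create_body(
    {"name": "vm-1", "anti_affinity_groups": None}
)

# Current state of a network interface as previously fetched (illustrative values).
current_nic = {"name": "nic0", "transit_ips": ["10.0.0.0/24"]}
nic_update = carry_forward({"description": "primary"}, current_nic, ["transit_ips"])
```

The same carry_forward call could be applied to a subnet update, for example carry_forward(update, current_subnet, ["custom_router"]), assuming that is the name of the field holding the custom router ID. An update that deliberately sets a field, including to an empty list, is left unchanged.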

Operator features

| Feature Area | Known Issue/Limitation | Issue Number |
| Silo management | The ability to modify silo and IDP metadata is not available at this time. | omicron#3400, omicron#3125 |
| System management | Real-time availability status for sleds and physical storage is not yet available in the inventory UI and API. | omicron#2035 |
| System management | The built-in test silo named "default-silo" has resource quotas and should be removed. | omicron#5731 |
| System management | Operator-driven software update is currently unavailable. All updates need to be performed by Oxide technicians. | - |
| System management | Operator-driven instance migration across sleds is currently unavailable. | - |
| User management | Device tokens do not expire. | omicron#2302 |
| User management | User offboarding from the rack is not supported at this time. Apart from updating the identity provider to remove obsolete users from the relevant groups, operators will need to remove any IAM roles granted directly to those users in silos and projects. | omicron#2587 |