Initial DistOS project constitution and council design briefs
12 council design briefs for distributed OS specification project targeting 1024-node Hopper/Grace/Blackwell GPU cluster with Weka parallel filesystem. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
councils/08-api-surface.md
# Council Design Brief: API Surface and Developer Experience

**Council ID:** `council-api`  
**Mission:** Define the complete, coherent, and ergonomic interface between DistOS and its users: operators, application developers, and other systems. This council decides what the operating system looks like from the outside: system calls, SDK bindings, CLI tools, and the conventions that make all of the above consistent and maintainable across language boundaries and API versions.  
**UCXL Base Address:** `ucxl://council-api:*@DistOS:api/*`  
**Agent Count:** 40  
**Status:** Design Brief (Constitution Phase)

---

## 1. Scope and Responsibilities

`council-api` owns the external interface contract of DistOS. Its scope covers:

- Deciding the overall API philosophy: POSIX-compatible extension, clean-slate design, or a layered model that offers both
- Defining GPU-native system calls for kernel launch, memory allocation, device-to-device transfers, stream and graph management, and event synchronisation
- Defining distributed system calls: remote procedure invocation (covering both synchronous RPC and async futures), distributed lock acquisition and release, barriers, and collective operations across node groups
- Designing an async-first API surface that aligns with modern language runtimes (Rust `async`/`await`, Go goroutines, Python `asyncio`)
- Establishing error handling conventions, including integration with UCXL response codes for errors that carry provenance (which node, which operation, at what logical time)
- Designing the SDK for four target languages: C (ABI-stable systems interface), Rust (idiomatic, zero-cost), Go (ergonomic, channel-friendly), and Python (user-friendly, numpy-compatible)
- Designing CLI tooling for cluster management: node status, job submission, resource inspection, log retrieval, and administrative operations
- Defining the API versioning and evolution strategy: how new calls are introduced, how deprecated calls are retired, compatibility guarantees across minor and major versions
- Producing API reference documentation that is precise enough to serve as a normative source alongside the formal spec
- Specifying example applications that exercise non-trivial API paths and serve as integration test targets

Responsibilities this council does **not** own: kernel implementation (owned by subsystem councils); formal verification of API contracts (owned by `council-verify`); security policy enforcement (owned by `council-sec`, though `council-api` designs the authentication and authorisation API surface in coordination with it); monitoring and metering calls (owned by `council-telemetry`, though `council-api` exposes the SDK surface for those).

---

## 2. Research Domains

### 2.1 POSIX Compatibility vs. Clean-Slate Design

POSIX (IEEE 1003.1) defines the canonical Unix system call interface. Its strengths are: near-universal language runtime support, a mature ecosystem of tools, and decades of developer familiarity. Its weaknesses in a GPU-cluster OS context are: blocking I/O semantics that assume CPU-thread models, file-descriptor-centric resource management ill-suited to GPU memory objects, and no native concept of distributed operations or remote memory.

Three design options must be fully researched before the council can decide:

- **POSIX-compatible extension:** Retain the full POSIX interface and extend it with GPU and distributed primitives as optional add-ons. Applications written for Linux run unmodified; GPU-aware applications opt into extensions. This is the approach taken by CUDA (which layers a driver API on top of the OS) and by ROCm/HIP.
- **Clean-slate design:** Design an interface optimal for the DistOS hardware target without backward-compatibility constraints. This allows stronger type safety, async-native semantics, and a capability-based resource model from the first call. Plan 9 (Pike et al.) and Fuchsia (Zircon) are the primary existence proofs.
- **Layered model:** Provide a clean-slate primary API and a POSIX compatibility layer implemented on top of it. This is the option recommended for evaluation. The compatibility layer must operate within a defined cost budget.

Key references:

- The Open Group. *The Single UNIX Specification (SUSv4/POSIX.1-2017)*. The normative POSIX reference.
- Pike, R. et al. "Plan 9 from Bell Labs." *Proceedings of the Summer 1990 UKUUG Conference*. Plan 9's contribution is the 9P protocol: everything is a file, including processes and network connections. The simplicity of the resource model is instructive even if DistOS does not adopt 9P verbatim.
- Pike, R. et al. "The Use of Name Spaces in Plan 9." *EUUG Newsletter* 12(1), 1992.
- Google. *Fuchsia OS: Zircon Kernel Objects*. https://fuchsia.dev/fuchsia-src/concepts/kernel. Zircon uses a capability-based object system with handles as the only way to reference kernel objects. This is the most complete modern clean-slate OS design and must be studied in depth.

### 2.2 GPU-Native System Calls

The CUDA Driver API provides the lowest-level GPU control surface available: `cuInit`, `cuDeviceGet`, `cuCtxCreate`, `cuMemAlloc`, `cuLaunchKernel`, `cuEventRecord`, `cuStreamWaitEvent`. It is the reference for what a GPU system call interface must cover.

Agents must evaluate the tradeoffs between:

- **Driver-level API** (CUDA Driver API / ROCm HIP Low-Level): explicit context management, explicit stream management, maximum control, verbose
- **Runtime API** (CUDA Runtime / ROCm): implicit context, automatic stream assignment, less control, more ergonomic
- **Graph-based execution** (CUDA Graphs / HIP Graphs): capture a sequence of operations as a graph for repeated execution with lower launch overhead. Critical for the 1024-node deployment where kernel launch overhead accumulates.

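
To make the launch-overhead point concrete, the sketch below counts host-side submissions for eager launches versus graph replay. It is a toy model and not CUDA: `Device`, `Graph`, and the kernel names are hypothetical stand-ins, and a real graph launch still has a nonzero per-replay cost that this model ignores. It only shows the capture-once, replay-many shape of the technique.

```python
class Graph:
    """Hypothetical captured graph: an ordered list of kernel names."""

    def __init__(self):
        self.kernels = []

    def add(self, kernel_name):
        self.kernels.append(kernel_name)
        return self  # allow chaining


class Device:
    """Hypothetical device queue that counts host-side submissions."""

    def __init__(self):
        self.submissions = 0
        self.executed = []

    def launch(self, kernel_name):
        # One host-to-device submission per kernel.
        self.submissions += 1
        self.executed.append(kernel_name)

    def launch_graph(self, graph):
        # One submission replays every kernel captured in the graph.
        self.submissions += 1
        self.executed.extend(graph.kernels)


def run_eager(device, kernels, iterations):
    for _ in range(iterations):
        for name in kernels:
            device.launch(name)


def run_graph(device, kernels, iterations):
    graph = Graph()
    for name in kernels:  # capture once
        graph.add(name)
    for _ in range(iterations):  # replay many times
        device.launch_graph(graph)


KERNELS = ["embed", "attention", "mlp"]  # illustrative names only
eager, graphed = Device(), Device()
run_eager(eager, KERNELS, iterations=100)
run_graph(graphed, KERNELS, iterations=100)
print(eager.submissions, graphed.submissions)  # 300 100
```
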
Key references:

- NVIDIA. *CUDA Driver API Reference Manual*. https://docs.nvidia.com/cuda/cuda-driver-api/. Normative reference for GPU system call semantics.
- NVIDIA. *CUDA C Programming Guide* (Chapter 3: Programming Interface). Covers the Runtime API and its relationship to the Driver API.
- NVIDIA. *CUDA Graphs* documentation. https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#cuda-graphs. The graph execution model is essential for understanding low-latency repeated workloads on Hopper and Blackwell.
- Khronos Group. *OpenCL 3.0 Specification*. https://www.khronos.org/opencl/. The vendor-neutral GPU programming API. DistOS must decide whether to support OpenCL alongside CUDA semantics.
- Khronos Group. *SYCL 2020 Specification*. https://www.khronos.org/sycl/. SYCL provides a C++ abstraction over OpenCL and oneAPI targets. Intel's oneAPI unifies GPU programming across vendors and is a candidate for the DistOS higher-level SDK layer.
- Intel. *oneAPI Programming Guide*. https://www.intel.com/content/www/us/en/developer/tools/oneapi/programming-guide.html.
- NVIDIA. *NVLink and NVSwitch Architecture Overview*. https://www.nvidia.com/en-us/data-center/nvlink/. GPU-to-GPU direct access semantics affect memory system call design.

Blackwell-specific: The GB200 NVL72 uses the NVLink Switch System to connect 72 GPUs in a single NVLink domain. Calls such as `cuMemAdvise` and `cuMemPrefetchAsync` take on new semantics in this topology. Agents must review:

- NVIDIA. *NVIDIA Blackwell Architecture Technical Brief*. 2024.

### 2.3 Distributed System Calls

System calls that span nodes are novel: POSIX has no notion of them. The design space covers:

- **Remote procedure invocation:** How does a process on node A invoke a procedure on node B? Synchronous blocking (simple, latency-bound), asynchronous with futures (complex, scalable), or continuation-passing. gRPC is the de facto standard for service-to-service RPC in the cloud but carries HTTP/2 overhead.
- **Distributed locks:** Lease-based locks (Chubby/ZooKeeper model), RDMA-based compare-and-swap (best latency), or consensus-based locks for strong guarantees. Each has different failure semantics.
- **Barriers:** Collective synchronisation across node groups. `MPI_Barrier` semantics are well understood; the question is how to expose this in a general-purpose OS API.
- **Collective operations:** AllReduce, AllGather, Broadcast, Reduce-Scatter. These are first-class operations for distributed ML workloads (the dominant use case on a 1024-node GPU cluster) and must be surfaced as OS-level calls, not just library calls, so the OS can optimise placement and routing.

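
As an illustration of what an AllReduce call has to compute, the sketch below simulates ring all-reduce (a reduce-scatter phase followed by an all-gather phase) over in-memory lists. The ring algorithm is one well-known choice and is used here only as an example; nothing in this brief selects it for DistOS, and the function name `ring_allreduce` is hypothetical. Each simulated node only ever sends to its ring neighbour, which is the property that makes placement and routing matter.

```python
def ring_allreduce(node_data):
    """Simulate a sum all-reduce over a ring of len(node_data) nodes.

    node_data[i] is node i's local vector; all vectors have equal length.
    Returns the per-node vectors after the collective completes.
    """
    n = len(node_data)
    length = len(node_data[0])
    # Chunk k covers indices bounds[k]:bounds[k + 1].
    bounds = [(k * length) // n for k in range(n + 1)]
    data = [list(vec) for vec in node_data]

    def outgoing(chunk_of):
        # Snapshot every node's outgoing chunk so sends in a step are simultaneous.
        sends = []
        for i in range(n):
            c = chunk_of(i)
            sends.append((c, data[i][bounds[c]:bounds[c + 1]]))
        return sends

    # Phase 1: reduce-scatter. After n - 1 steps node i owns the full sum
    # of chunk (i + 1) % n.
    for step in range(n - 1):
        for i, (c, payload) in enumerate(outgoing(lambda i: (i - step) % n)):
            dst = (i + 1) % n
            for k, value in enumerate(payload):
                data[dst][bounds[c] + k] += value

    # Phase 2: all-gather. Fully reduced chunks travel once around the ring.
    for step in range(n - 1):
        for i, (c, payload) in enumerate(outgoing(lambda i: (i + 1 - step) % n)):
            dst = (i + 1) % n
            data[dst][bounds[c]:bounds[c] + len(payload)] = payload

    return data


local = [[1, 2, 3, 4, 5, 6], [10, 20, 30, 40, 50, 60], [100, 200, 300, 400, 500, 600]]
result = ring_allreduce(local)
print(result[0])  # [111, 222, 333, 444, 555, 666]
```
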
Key references:

- Birrell, A. and Nelson, B. "Implementing Remote Procedure Calls." *ACM Transactions on Computer Systems* 2(1), 1984. The foundational RPC paper.
- Google. *gRPC*. https://grpc.io/. The current industry standard for typed RPC. Protocol Buffers schema evolution strategy is directly applicable to DistOS API versioning.
- Burrows, M. "The Chubby Lock Service for Loosely-Coupled Distributed Systems." *OSDI 2006*.
- Hunt, P. et al. "ZooKeeper: Wait-free Coordination for Internet-scale Systems." *USENIX ATC 2010*.
- Message Passing Interface Forum. *MPI: A Message-Passing Interface Standard, Version 4.1*. 2023. The collective operations specification is normative for `council-api`'s collective call design.
- Mellanox/NVIDIA. *RDMA Programming Guide*. The InfiniBand verbs API (`ibv_post_send`, `ibv_post_recv`, `ibv_create_qp`) provides the lowest-latency distributed memory access primitives available on the target cluster.

### 2.4 Async-First API Design

A GPU cluster OS serving AI workloads will have I/O patterns dominated by deep asynchrony: thousands of in-flight kernel launches, streaming data from Weka FS, collective communication across 1024 nodes. A synchronous-only API would be a fundamental design mistake. Agents must research:

- **Rust async/await:** The Rust async model (the `Future` trait, its `Poll` result type, and the executor model) provides zero-cost abstraction over async I/O. The `tokio` runtime is the dominant executor. The DistOS Rust SDK must integrate naturally with tokio.
- **io_uring (Linux 5.1+):** The io_uring interface provides a pair of shared ring buffers between kernel and userspace that greatly reduces per-operation syscall overhead for I/O. Its submission/completion queue model is the reference for how DistOS should design its own async system call interface.
- **Go channels and goroutines:** Go's concurrency model maps well to distributed operations. The DistOS Go SDK must express distributed calls as channels or via the `context.Context` cancellation pattern.
- **Python asyncio:** The Python SDK must be usable from `async def` coroutines. NumPy compatibility for GPU tensor operations should be considered (for example, compatibility with the `__cuda_array_interface__` protocol shared by Numba and CuPy).

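
The submission/completion queue model can be sketched in a few lines. This is a single-process toy under stated assumptions, not io_uring and not a proposed DistOS interface: the `Ring`, `Sqe`, and `Cqe` names, the `"double"` operation, and the `kernel_poll` method (standing in for the kernel side) are all hypothetical. It shows three properties the text relies on: submissions are batched without a call per operation, completions are matched to requests through a caller-chosen `user_data` tag, and a full submission queue pushes back on the caller.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Sqe:
    """Submission queue entry: what the caller wants done."""
    user_data: int  # caller-chosen tag echoed back in the completion
    op: str
    arg: int


@dataclass
class Cqe:
    """Completion queue entry: the outcome of one submission."""
    user_data: int
    result: int


class Ring:
    def __init__(self, depth):
        self.depth = depth
        self.sq = deque()
        self.cq = deque()

    def submit(self, sqe):
        # Back-pressure: refuse new work when the submission queue is full.
        if len(self.sq) >= self.depth:
            return False
        self.sq.append(sqe)
        return True

    def kernel_poll(self):
        # Stand-in for the kernel draining the SQ and posting completions.
        processed = 0
        while self.sq:
            sqe = self.sq.popleft()
            result = sqe.arg * 2 if sqe.op == "double" else -1
            self.cq.append(Cqe(sqe.user_data, result))
            processed += 1
        return processed

    def reap(self):
        # Caller collects every available completion in one pass.
        done = list(self.cq)
        self.cq.clear()
        return done


ring = Ring(depth=2)
accepted = [ring.submit(Sqe(tag, "double", tag * 10)) for tag in (1, 2, 3)]
ring.kernel_poll()
completions = ring.reap()
```
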
Key references:

- Axboe, J. *Efficient IO with io_uring*. https://kernel.dk/io_uring.pdf. 2019. This paper is essential for understanding the state of the art in async syscall design.
- The Rust Async Book. https://rust-lang.github.io/async-book/. Normative reference for Rust async design patterns.
- Grigorik, I. *High Performance Browser Networking*. O'Reilly, 2013. Useful background on latency-sensitive, event-driven I/O design.

### 2.5 Error Handling Conventions

A cluster OS at this scale will produce a high volume of partial failures: a node goes dark, a GPU kernel faults, a network partition isolates a subsystem. The error handling convention must be:

- **Structured:** Every error carries a type, a severity, a source identifier (node, subsystem, call), and a correlation ID that links it to a UCXL-addressed event in the distributed log.
- **Actionable:** The API must distinguish between errors that the caller should retry (transient), errors that require intervention (permanent), and errors that indicate a usage mistake (programmer error).
- **Traceable:** Error correlation IDs must be UCXL-compatible so that an error returned to a Python application can be resolved to the full distributed event chain using the UCXL resolver.

Key references:

- Google. *Google Cloud API Design Guide: Errors*. https://cloud.google.com/apis/design/errors. The most systematic public treatment of structured API error design. The canonical status codes (OK, INVALID_ARGUMENT, NOT_FOUND, UNAVAILABLE, etc.) should be adopted or adapted.
- Klabnik, S. and Nichols, C. *The Rust Programming Language* (Chapter 9: Error Handling). The Rust approach to `Result<T, E>` and the `?` operator represents the state of the art for recoverable errors in a systems language.
- Syme, D. et al. "Exceptional Syntactic Support for Error Handling in F#." *Haskell Symposium 2020*. Relevant to the higher-level SDK error design.

The UCXL response code integration specifically means that API error structs carry a `ucxl_trace` field containing the UCXL address of the distributed event that caused the failure:

```
error.ucxl_trace = "ucxl://council-fault:monitor@DistOS:fault-tolerance/^^/events/node-042-timeout-2026-03-01T14:22:00Z"
```

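
A minimal sketch of how the structured, actionable, and traceable requirements could surface in the Python SDK follows. Everything named here is an assumption for illustration: the `DistOSError` class, the `Category` enum, the `call_with_retry` helper, and the use of `UNAVAILABLE` as a code are not defined by this brief. Only the `ucxl_trace` field name and the three error classes (transient, permanent, programmer error) come from the text above. Backoff between attempts is omitted to keep the example short.

```python
import enum


class Category(enum.Enum):
    TRANSIENT = "transient"    # caller should retry
    PERMANENT = "permanent"    # requires intervention
    PROGRAMMER = "programmer"  # usage mistake


class DistOSError(Exception):
    """Hypothetical structured error carrying provenance."""

    def __init__(self, code, category, source, ucxl_trace):
        super().__init__(f"{code} from {source}")
        self.code = code
        self.category = category
        self.source = source          # node, subsystem, or call identifier
        self.ucxl_trace = ucxl_trace  # UCXL address of the causing event


def call_with_retry(operation, max_attempts=3):
    """Retry only transient failures; re-raise everything else immediately."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except DistOSError as err:
            if err.category is not Category.TRANSIENT or attempt == max_attempts:
                raise


TRACE = ("ucxl://council-fault:monitor@DistOS:fault-tolerance/^^/events/"
         "node-042-timeout-2026-03-01T14:22:00Z")
calls = {"count": 0}


def flaky_read():
    # Fails twice with a transient error, then succeeds.
    calls["count"] += 1
    if calls["count"] < 3:
        raise DistOSError("UNAVAILABLE", Category.TRANSIENT, "node-042", TRACE)
    return "ok"


outcome = call_with_retry(flaky_read)
```
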
### 2.6 SDK Design for Multiple Languages

The SDK must present a coherent surface across four languages with different idioms. The design principles are:

- **C ABI as the foundation:** The canonical system call interface is a C ABI. All other language SDKs are generated or hand-written wrappers over the C ABI. This ensures ABI stability and FFI compatibility with every language.
- **Rust SDK:** Idiomatic, zero-cost wrappers. Use Rust's ownership system to enforce resource lifetimes at compile time (e.g., a `GpuBuffer<T>` type that is `Send` but not `Sync`, reflecting GPU buffer ownership semantics). The Rust SDK should use `#[repr(C)]` structs for ABI compatibility.
- **Go SDK:** Ergonomic wrappers using `cgo` for the C ABI. Expose distributed operations as channel-returning functions. Context-aware: all calls accept `context.Context` for cancellation and timeout propagation.
- **Python SDK:** High-level, NumPy-compatible. Consider auto-generating stub code from a schema. Must be `asyncio`-compatible. Integrate with the Python type system via `Protocol` and `TypedDict`.

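
The Python SDK bullet can be illustrated with a small sketch that combines `asyncio`, `Protocol`, and `TypedDict`. The names `NodeStatus`, `ClusterClient`, `FakeClient`, `node_status`, and `healthy_nodes` are hypothetical and chosen only for this example; a real client would wrap the C ABI rather than answer from memory. The point is that callers depend on a structural interface, so a test double and a production client are interchangeable.

```python
import asyncio
from typing import Protocol, TypedDict


class NodeStatus(TypedDict):
    node_id: str
    healthy: bool


class ClusterClient(Protocol):
    """Structural interface: anything with this coroutine qualifies."""

    async def node_status(self, node_id: str) -> NodeStatus: ...


class FakeClient:
    """In-memory stand-in used here instead of a real C ABI binding."""

    def __init__(self, unhealthy):
        self._unhealthy = set(unhealthy)

    async def node_status(self, node_id: str) -> NodeStatus:
        await asyncio.sleep(0)  # yield to the event loop as real I/O would
        return {"node_id": node_id, "healthy": node_id not in self._unhealthy}


async def healthy_nodes(client: ClusterClient, node_ids: list[str]) -> list[str]:
    # Issue all status queries concurrently; gather preserves input order.
    statuses = await asyncio.gather(*(client.node_status(n) for n in node_ids))
    return [s["node_id"] for s in statuses if s["healthy"]]


found = asyncio.run(
    healthy_nodes(FakeClient({"node-002"}), ["node-001", "node-002", "node-003"])
)
print(found)  # ['node-001', 'node-003']
```
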
Key references:

- Klabnik, S. and Nichols, C. *The Rust Programming Language*. https://doc.rust-lang.org/book/. Idiomatic Rust patterns.
- Go Authors. *Effective Go*. https://go.dev/doc/effective_go. Idiomatic Go patterns.
- Google. *Google Cloud API Design Guide*. https://cloud.google.com/apis/design. The most comprehensive public API design guide, covering resource-oriented design, standard methods, naming conventions, and backwards compatibility.
- Smith, P. *Designing for Compatibility in Evolving APIs*. IEEE Software 39(4), 2022.

### 2.7 CLI Tooling Design

The cluster management CLI (`distos-ctl` or equivalent) must follow modern CLI design principles:

- Machine-readable output (JSON/YAML with `--output json`) for scripting
- Structured logging with log levels
- Human-readable default output with colour and progress indicators
- Completion generation for bash/zsh/fish
- Subcommand structure: `node`, `job`, `gpu`, `net`, `storage`, `secret`, `log`

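
A minimal sketch of the subcommand and output-format principles, using Python's standard `argparse`, might look like this. The `distos-ctl` name, the `--output json` flag, and the seven subcommands come from the list above; the `list` action, the `text` output mode name, and the row fields are assumptions made up for the example, and no real `distos-ctl` exists yet to compare against.

```python
import argparse
import json

SUBCOMMANDS = ("node", "job", "gpu", "net", "storage", "secret", "log")


def build_parser():
    parser = argparse.ArgumentParser(prog="distos-ctl")
    # Global flag: machine-readable output for scripting, text for humans.
    parser.add_argument("--output", choices=("text", "json"), default="text")
    resources = parser.add_subparsers(dest="resource", required=True)
    for name in SUBCOMMANDS:
        sub = resources.add_parser(name)
        # Hypothetical action; the real verb set is still to be designed.
        sub.add_argument("action", choices=("list",))
    return parser


def render(rows, output):
    """Format result rows according to the chosen output mode."""
    if output == "json":
        return json.dumps(rows, sort_keys=True)
    return "\n".join(f"{row['name']}\t{row['state']}" for row in rows)


args = build_parser().parse_args(["--output", "json", "node", "list"])
rows = [{"name": "node-001", "state": "ready"}]  # placeholder data
print(render(rows, args.output))
```

With this layout the global flag precedes the subcommand, as in `distos-ctl --output json node list`.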
Key references:

- Prasad, A. et al. *Command Line Interface Guidelines*. https://clig.dev/. The community-written standard for modern CLI design. Should be treated as the style guide for `distos-ctl`.
- HashiCorp. *Vault CLI design*. The Vault CLI is an exemplar of a well-structured cluster management tool with consistent subcommand and flag conventions.
- Kubernetes. `kubectl` source and documentation. The de facto standard for distributed cluster management CLIs. The DistOS CLI should match `kubectl` conventions where applicable to reduce cognitive load.

### 2.8 API Versioning and Evolution Strategy

A system call interface must be stable. The versioning strategy must address:

- **Compatibility guarantees:** What changes are backwards-compatible (adding optional parameters, adding new calls) vs. breaking (changing parameter semantics, removing calls)?
- **Deprecation lifecycle:** Minimum deprecation notice period, deprecation markers in the SDK, removal schedule.
- **Version negotiation:** How does a client indicate the API version it was compiled against? How does the kernel report available versions?
- **Experimental APIs:** A clearly marked experimental tier for new calls before they enter the stable surface.

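
One possible shape for version negotiation under SemVer rules is sketched below. It assumes plain `MAJOR.MINOR.PATCH` strings with no pre-release tags, and the `negotiate` function and its policy (pick the highest kernel-supported version with the same major number that is at least the version the client was built against) are illustrative assumptions, not a decided DistOS mechanism.

```python
def parse_version(text):
    """Split 'MAJOR.MINOR.PATCH' into a tuple of ints."""
    major, minor, patch = (int(part) for part in text.split("."))
    return major, minor, patch


def negotiate(client_built_against, kernel_supported):
    """Return the version string to use, or None if nothing is compatible.

    A kernel version is treated as compatible when it shares the client's
    major number (no breaking changes) and is not older than the version
    the client was compiled against (all calls the client expects exist).
    """
    want = parse_version(client_built_against)
    compatible = [
        version
        for version in map(parse_version, kernel_supported)
        if version[0] == want[0] and version >= want
    ]
    if not compatible:
        return None
    return ".".join(str(part) for part in max(compatible))


print(negotiate("1.4.0", ["1.2.0", "1.6.1", "2.0.0"]))  # 1.6.1
print(negotiate("2.1.0", ["1.6.1", "2.0.0"]))           # None
```
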
Key references:

- Google. *Google Cloud API Versioning*. https://cloud.google.com/apis/design/versioning. URL-based versioning for REST APIs; the principles apply to system call versioning.
- Turon, A. and Matsakis, N. "Stability as a Deliverable." https://blog.rust-lang.org/2014/10/30/Stability.html. Rust's stability commitment is a model for how a systems project can make and keep compatibility promises.
- Semantic Versioning Specification. https://semver.org/. The DistOS SDK and ABI will follow SemVer 2.0.0.

### 2.9 Plan 9 and Fuchsia Zircon Deep Dive

These two systems represent the clearest non-POSIX OS API designs and must be studied in depth:

- **Plan 9:** The 9P protocol represents all system resources as files served over a file system protocol. Network connections, processes, and graphics are files. The simplicity is extreme. The DistOS clean-slate layer need not adopt 9P but should understand its design philosophy.
  - Pike, R. et al. "The Use of Name Spaces in Plan 9." *EUUG Newsletter* 12(1), 1992.
  - Dorward, S. et al. "The Inferno Operating System." *Bell Labs Technical Journal* 2(1), 1997.
- **Fuchsia / Zircon:** Zircon is a microkernel with capabilities as the security primitive. Every kernel resource is a `zx_handle_t`. Handles are passed between processes explicitly; there is no global namespace for kernel objects. This is the preferred model for DistOS's capability integration with `council-sec`.
  - Google. *Zircon Kernel Concepts*. https://fuchsia.dev/fuchsia-src/concepts/kernel/concepts.
  - Google. *Zircon Syscall Reference*. https://fuchsia.dev/fuchsia-src/reference/syscalls.

---

## 3. Agent Roles

| Role | Count | Responsibilities |
|------|-------|-----------------|
| Lead API Architect | 1 | Decides overall API philosophy; coordinates with all subsystem councils; owns the master API specification document; resolves conflicts between API and subsystem requirements |
| POSIX Compatibility Analysts | 4 | Audit which POSIX calls must be retained; design the compatibility shim layer; document compatibility coverage gaps |
| GPU Syscall Designers | 6 | Design GPU-native system calls for kernel launch, memory, streams, events, graphs; ensure Hopper/Blackwell/Grace specifics are covered |
| Distributed Syscall Designers | 5 | Design RPC, distributed lock, barrier, and collective operation system calls; consult MPI and RDMA references |
| SDK Designers | 8 | Design language-specific SDKs: 2 per language (C, Rust, Go, Python); responsible for ergonomics, idiom conformance, and ABI stability |
| Async API Specialists | 4 | Design the async call model; specify io_uring-style ring buffer interface; ensure Rust/Go/Python async integration |
| CLI Designers | 3 | Design `distos-ctl` command structure, output formats, and completions |
| Error Handling Architects | 3 | Design structured error types, UCXL trace integration, and error propagation conventions across all SDK layers |
| API Versioning Strategists | 2 | Develop the versioning policy, deprecation lifecycle, compatibility matrix, and experimental API tier |
| Developer Experience Reviewers | 4 | Evaluate API usability; write developer-facing documentation and example applications; run internal "dogfooding" walkthroughs |

**Total:** 40 agents

---

## 4. Key Deliverables

All artifacts use the pattern `ucxl://council-api:{role}@DistOS:api/^^/{artifact-type}/{name}`.

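
The address pattern can be exercised with a small parser and formatter. This is a sketch inferred only from the addresses that appear in this brief: the field names (`council`, `role`, `project`, `scope`, `artifact_type`, `name`) are guesses at what each segment means, the `^^` segment is treated as an opaque literal, and the regular expression is not an official UCXL grammar.

```python
import re
from typing import NamedTuple

# Segment names are assumptions; the brief does not define them.
_PATTERN = re.compile(
    r"^ucxl://(?P<council>[^:/@]+):(?P<role>[^:/@]+)"
    r"@(?P<project>[^:/@]+):(?P<scope>[^:/@]+)"
    r"/\^\^/(?P<artifact_type>[^/]+)/(?P<name>.+)$"
)


class UcxlAddress(NamedTuple):
    council: str
    role: str
    project: str
    scope: str
    artifact_type: str
    name: str

    def __str__(self):
        return (
            f"ucxl://{self.council}:{self.role}@{self.project}:{self.scope}"
            f"/^^/{self.artifact_type}/{self.name}"
        )


def parse_ucxl(text):
    """Parse an artifact address; raise ValueError if it does not match."""
    match = _PATTERN.match(text)
    if match is None:
        raise ValueError(f"not a recognised artifact address: {text!r}")
    return UcxlAddress(**match.groupdict())


raw = "ucxl://council-api:lead-api-architect@DistOS:api/^^/decisions/dr-api-01-philosophy.md"
address = parse_ucxl(raw)
print(address.role, address.artifact_type, address.name)
```
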
### 4.1 Master API Philosophy Decision Record

```
ucxl://council-api:lead-api-architect@DistOS:api/^^/decisions/dr-api-01-philosophy.md
```

Covers the layered model decision: clean-slate primary API, POSIX compatibility shim, and the cost budget for the shim.

### 4.2 GPU System Call Specification

```
ucxl://council-api:gpu-syscall-designer@DistOS:api/^^/specs/gpu-syscalls.md
```

Full specification of all GPU-native system calls with parameter types, semantics, error codes, and Hopper/Blackwell/Grace specifics.

### 4.3 Distributed System Call Specification

```
ucxl://council-api:distributed-syscall-designer@DistOS:api/^^/specs/distributed-syscalls.md
```

### 4.4 Async Call Interface Specification

```
ucxl://council-api:async-api-specialist@DistOS:api/^^/specs/async-interface.md
```

Documents the submission/completion ring model, back-pressure semantics, and language runtime integration.

### 4.5 C ABI Reference

```
ucxl://council-api:sdk-designer@DistOS:api/^^/specs/c-abi-reference.h
```

The normative C header file. All other SDKs are derived from this.

### 4.6 Language SDK Specifications

```
ucxl://council-api:sdk-designer@DistOS:api/^^/specs/sdk-rust.md
ucxl://council-api:sdk-designer@DistOS:api/^^/specs/sdk-go.md
ucxl://council-api:sdk-designer@DistOS:api/^^/specs/sdk-python.md
```

### 4.7 Error Type Catalogue

```
ucxl://council-api:error-handling-architect@DistOS:api/^^/specs/error-catalogue.md
```

All structured error types with UCXL trace integration, severity levels, and retry guidance.

### 4.8 CLI Specification

```
ucxl://council-api:cli-designer@DistOS:api/^^/specs/distos-ctl-spec.md
```

Full command reference including all subcommands, flags, output formats, and completion scripts.

### 4.9 API Versioning Policy

```
ucxl://council-api:api-versioning-strategist@DistOS:api/^^/policies/versioning-policy.md
```

### 4.10 POSIX Compatibility Coverage Matrix

```
ucxl://council-api:posix-compatibility-analyst@DistOS:api/^^/specs/posix-compatibility-matrix.md
```

Tabulates every POSIX call: supported natively, supported via shim, not supported (with rationale).

### 4.11 Example Applications

```
ucxl://council-api:developer-experience-reviewer@DistOS:api/^^/examples/hello-distributed-gpu.md
ucxl://council-api:developer-experience-reviewer@DistOS:api/^^/examples/allreduce-collective.md
ucxl://council-api:developer-experience-reviewer@DistOS:api/^^/examples/weka-fs-streaming-io.md
```

---

## 5. Decision Points

All DRs use the address pattern `ucxl://council-api:lead-api-architect@DistOS:api/^^/decisions/{dr-id}.md`.

### DP-A01: POSIX vs. Clean-Slate vs. Layered

The foundational design philosophy choice. The default recommendation is the layered model, but this must be validated against: the cost of maintaining the shim layer, the risk of semantic leakage from POSIX into the clean-slate layer, and the developer familiarity benefit.  
**Deciding parties:** Lead API Architect, POSIX Compatibility Analysts, `council-synth`

### DP-A02: Async System Call Mechanism

Choose between: io_uring-inspired ring buffer (lowest overhead, Linux precedent), a POSIX-extended `aio_*` interface (familiarity, limited expressiveness), or a fully custom completion port model. This decision is tightly coupled to the `council-mem` memory model (the ring buffer requires shared memory between kernel and userspace).  
**Deciding parties:** Async API Specialists, `council-mem`, `council-verify` (for ABI safety check)

### DP-A03: GPU Memory API at the Syscall Layer vs. Library Layer

Should GPU memory allocation (`cuMemAlloc` equivalent) be a kernel-mediated system call (allowing the OS to account for and schedule GPU memory as a first-class resource) or a library call that bypasses the kernel after initial device setup? Kernel mediation adds latency; bypass reduces accounting fidelity.  
**Deciding parties:** GPU Syscall Designers, `council-mem`, `council-telemetry`

### DP-A04: RPC Mechanism for Distributed System Calls

Choose the wire protocol for remote procedure calls: gRPC (typed, HTTP/2, mature), a custom binary protocol over RDMA (lowest latency, highest implementation cost), or a two-tier model (gRPC for control plane, RDMA for data plane). The choice directly affects the latency budget for distributed system calls.  
**Deciding parties:** Distributed Syscall Designers, `council-net`

### DP-A05: SDK Code Generation vs. Hand-Written Wrappers

Decide whether to generate the Rust, Go, and Python SDKs from a schema definition (IDL, such as Protocol Buffers or a custom DSL) or maintain hand-written wrappers. Generated code is more consistent; hand-written code can be more idiomatic. A hybrid (generate the boilerplate, hand-write ergonomic wrappers) is the likely outcome.  
**Deciding parties:** SDK Designers, API Versioning Strategists

### DP-A06: Authentication and Authorisation API

How does a process prove its identity to the kernel and acquire capabilities? Options: token-based (JWT or similar), capability handles (Zircon model), certificate-based (X.509 with a cluster CA), or UCXL-scoped credentials. This decision must be made jointly with `council-sec`.  
**Deciding parties:** Lead API Architect, `council-sec`

---

## 6. Dependencies on Other Councils

`council-api` is the integrating council: every subsystem council produces functionality, and `council-api` exposes that functionality through a coherent surface. It is therefore a downstream consumer of requirements from all councils and an upstream provider to `council-docs` and `council-verify`.

| Council | Relationship | What council-api consumes | What council-api produces |
|---------|-------------|--------------------------|--------------------------|
| `council-sched` | Consuming requirements | Job submission semantics, priority model, queue management APIs | Scheduler-facing system calls in API spec |
| `council-mem` | Bidirectional | Memory model, allocation semantics, consistency guarantees | Memory system call specs; async memory API |
| `council-net` | Bidirectional | Network abstraction primitives, RDMA capabilities | Network system calls; distributed RPC wire protocol choice |
| `council-fault` | Consuming requirements | Failure notification model, recovery primitives | Fault-tolerance-related error codes; node failure event API |
| `council-sec` | Bidirectional | Capability model, identity primitives, isolation guarantees | Authentication/authorisation API surface; capability handle design |
| `council-telemetry` | Consuming requirements | Metering call semantics, SLO query interface | Telemetry-facing SDK surface; metering call specs |
| `council-verify` | Providing for verification | N/A | API interface contracts for formal verification |
| `council-qa` | Providing for test design | N/A | API spec enables QA to design conformance tests |
| `council-synth` | Receiving directives | Cross-council conflict resolutions affecting API design | Updates to API spec when directed by synth |
| `council-docs` | Providing for documentation | N/A | All API specs feed directly into the reference documentation |

**Critical path constraint:** `council-api` cannot finalise the distributed system call interface until `council-net` has committed to its RPC and RDMA model (DP-A04 depends on this). GPU system call design can proceed independently from Day 1.

---

## 7. WHOOSH Configuration

### 7.1 Team Formation

```yaml
council_id: council-api
display_name: "API Surface and Developer Experience Council"
target_size: 40
formation_strategy: competency_weighted
required_roles:
  - role: lead-api-architect
    count: 1
    persona: systems-analyst
    competencies: [api-design, posix, distributed-systems, gpu-programming, developer-experience]
  - role: posix-compatibility-analyst
    count: 4
    persona: technical-specialist
    competencies: [posix, linux-kernel, system-calls, abi-stability]
  - role: gpu-syscall-designer
    count: 6
    persona: technical-specialist
    competencies: [cuda, rocm, gpu-memory, hopper-architecture, blackwell-architecture, nvlink]
  - role: distributed-syscall-designer
    count: 5
    persona: technical-specialist
    competencies: [rpc, rdma, mpi-collectives, distributed-locks, grpc]
  - role: sdk-designer
    count: 8
    persona: technical-specialist
    competencies: [c-abi, rust-async, go-concurrency, python-asyncio, ffi, sdk-design]
  - role: async-api-specialist
    count: 4
    persona: technical-specialist
    competencies: [io-uring, async-io, rust-futures, event-driven-design]
  - role: cli-designer
    count: 3
    persona: technical-specialist
    competencies: [cli-design, ux, kubectl-conventions, shell-completion]
  - role: error-handling-architect
    count: 3
    persona: systems-analyst
    competencies: [error-design, structured-errors, distributed-tracing, ucxl]
  - role: api-versioning-strategist
    count: 2
    persona: systems-analyst
    competencies: [api-versioning, semver, deprecation-policy, compatibility]
  - role: developer-experience-reviewer
    count: 4
    persona: technical-writer
    competencies: [developer-documentation, api-usability, example-applications, dogfooding]
```

### 7.2 Quorum Rules
|
||||
|
||||
```yaml
|
||||
quorum:
|
||||
decision_threshold: 0.65 # 65% of active agents must agree on API design decisions
|
||||
lead_architect_veto: true # Lead API Architect can block any interface decision
|
||||
breaking_change_threshold: 0.85 # Breaking changes require 85% supermajority
|
||||
cross_council_approval:
|
||||
trigger: api_affects_subsystem
|
||||
required: [affected_council_lead, council-synth]
|
||||
response_sla_hours: 6
|
||||
developer_experience_review:
|
||||
trigger: new_public_call
|
||||
required: [developer-experience-reviewer_count >= 2]
|
||||
purpose: "Ensure every new call meets ergonomics standard before it enters the spec"
|
||||
```
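
One way to read these thresholds is as a single pass/fail check. The Python sketch below is illustrative only: the names `QuorumRules` and `decision_passes` are hypothetical, and it assumes that thresholds are inclusive, that the vote share is computed over all active agents (so abstentions count against a proposal), and that a lead veto overrides any vote share. None of these assumptions is fixed by the brief, and the cross-council and developer-experience gates are deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuorumRules:
    """Values copied from the quorum block above."""
    decision_threshold: float = 0.65
    breaking_change_threshold: float = 0.85
    lead_architect_veto: bool = True


def decision_passes(votes_for, active_agents, *, breaking_change=False,
                    lead_vetoed=False, rules=QuorumRules()):
    """Return True if a proposal clears the applicable threshold.

    Assumption: thresholds are inclusive and measured against all
    active agents, not only those who cast a vote.
    """
    if active_agents <= 0:
        raise ValueError("active_agents must be positive")
    if not 0 <= votes_for <= active_agents:
        raise ValueError("votes_for must be between 0 and active_agents")

    # A veto from the Lead API Architect blocks the decision outright.
    if rules.lead_architect_veto and lead_vetoed:
        return False

    # Breaking changes need the higher supermajority threshold.
    threshold = (rules.breaking_change_threshold if breaking_change
                 else rules.decision_threshold)
    return votes_for / active_agents >= threshold
```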

### 7.3 Subchannels

```yaml
subchannels:
  - id: api-posix-compat
    subscribers: [posix-compatibility-analyst, lead-api-architect]
    purpose: "POSIX coverage analysis, shim design, compatibility gap triage"
    ucxl_feed: "ucxl://council-api:posix-compatibility-analyst@DistOS:api/^^/specs/posix-*"

  - id: api-gpu-syscalls
    subscribers: [gpu-syscall-designer, lead-api-architect, async-api-specialist]
    purpose: "GPU-native system call design; Hopper/Blackwell capability integration"
    ucxl_feed: "ucxl://council-api:gpu-syscall-designer@DistOS:api/^^/specs/gpu-*"

  - id: api-distributed-syscalls
    subscribers: [distributed-syscall-designer, lead-api-architect]
    purpose: "Distributed call design; RPC and RDMA protocol negotiation with council-net"
    ucxl_feed: "ucxl://council-api:distributed-syscall-designer@DistOS:api/^^/specs/distributed-*"

  - id: api-sdk-coordination
    subscribers: [sdk-designer, async-api-specialist, developer-experience-reviewer]
    purpose: "Cross-language SDK consistency; ABI stability coordination"
    ucxl_feed: "ucxl://council-api:sdk-designer@DistOS:api/^^/specs/sdk-*"

  - id: api-error-and-versioning
    subscribers: [error-handling-architect, api-versioning-strategist, lead-api-architect]
    purpose: "Error catalogue development; versioning policy; UCXL trace integration"
    ucxl_feed: "ucxl://council-api:error-handling-architect@DistOS:api/^^/specs/error-*"

  - id: api-cross-council-requirements
    subscribers: [lead-api-architect, distributed-syscall-designer, gpu-syscall-designer]
    purpose: "Inbound requirements from all subsystem councils; tracks what each council needs exposed"
    ucxl_feed: "ucxl://council-*:*@DistOS:*/^^/requirements/api-*"

  - id: api-devex-review
    subscribers: [developer-experience-reviewer, lead-api-architect]
    purpose: "Developer experience walkthroughs; example application drafts; usability feedback"
    ucxl_feed: "ucxl://council-api:developer-experience-reviewer@DistOS:api/^^/examples/*"
```
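
To make the routing role of `ucxl_feed` concrete, the following Python sketch maps a published UCXL address to the subchannels whose feed pattern it matches. It is a rough illustration under strong assumptions: `*` is treated as a shell-style wildcard and the `^^` segment as a literal, which may not reflect actual UCXL matching semantics. The function name `subchannels_for`, the agent name `lead`, and the example document names are hypothetical; only three of the seven feeds are reproduced.

```python
from fnmatch import fnmatchcase

# Feed patterns copied from the subchannel definitions above (subset).
SUBCHANNEL_FEEDS = {
    "api-gpu-syscalls":
        "ucxl://council-api:gpu-syscall-designer@DistOS:api/^^/specs/gpu-*",
    "api-error-and-versioning":
        "ucxl://council-api:error-handling-architect@DistOS:api/^^/specs/error-*",
    "api-cross-council-requirements":
        "ucxl://council-*:*@DistOS:*/^^/requirements/api-*",
}


def subchannels_for(address, feeds=SUBCHANNEL_FEEDS):
    """Return the sorted ids of all subchannels whose feed matches `address`.

    Assumption: glob-style matching, case-sensitive, with `^^` as a literal.
    """
    return sorted(
        channel_id
        for channel_id, pattern in feeds.items()
        if fnmatchcase(address, pattern)
    )


# Hypothetical addresses used only to exercise the matcher.
gpu_spec = "ucxl://council-api:gpu-syscall-designer@DistOS:api/^^/specs/gpu-syscalls.md"
net_requirement = "ucxl://council-net:lead@DistOS:net/^^/requirements/api-rdma.md"
```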

---

## 8. Success Criteria

1. **Complete API surface:** The master API specification covers all system calls required by all six core subsystem councils. No subsystem has an unaddressed API requirement at the end of Phase 4.
2. **POSIX coverage documented:** The POSIX compatibility matrix exists and classifies every POSIX.1-2017 system call as supported, shim-supported, or explicitly unsupported with rationale.
3. **GPU system calls complete:** All GPU-native system calls for Hopper, Grace, and Blackwell are specified with parameter types, semantics, and error codes. NVLink/NVSwitch topology-aware calls are included.
4. **Distributed system calls complete:** All distributed calls (RPC, locks, barriers, collectives) are specified with failure semantics and consistency guarantees matching the `council-fault` and `council-net` specs.
5. **Four-language SDK specs complete:** C ABI, Rust, Go, and Python SDK specifications exist and have been reviewed for idiomatic correctness by SDK Designers.
6. **Error handling consistent:** All error types are catalogued and every public API call has a documented error table. Every error carries a UCXL trace field.
7. **Versioning policy ratified:** The versioning policy is agreed with `council-synth` and published. The experimental API tier is defined.
8. **Verification-ready contracts:** All interface contracts have been delivered to `council-verify` in Alloy-compatible form by Day 8.
9. **Developer experience validated:** At least three example applications have been written by Developer Experience Reviewers and cover: a simple GPU computation, a distributed collective operation, and a Weka FS streaming I/O pattern.
10. **CLI specification complete:** `distos-ctl` subcommand structure and all primary flags are specified.
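
Criterion 6 can be pictured with a small sketch of an error type that always carries a UCXL trace and is checked against a per-call error table. Everything named here is hypothetical: `UcxlTrace`, `DistOSError`, `make_error`, the call name `gpu_mem_alloc`, and the error codes are placeholders, not part of the specification. The three trace fields follow the provenance described in Section 1 (which node, which operation, at what logical time); their concrete types are an assumption.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class UcxlTrace:
    """Provenance for an error: node, operation, and logical time."""
    node: str
    operation: str
    logical_time: int


class DistOSError(Exception):
    """Structured error that cannot be constructed without a trace."""

    def __init__(self, code, message, trace):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message
        self.trace = trace

    def to_dict(self):
        """Serialise the error, including its UCXL trace field."""
        return {
            "code": self.code,
            "message": self.message,
            "ucxl_trace": asdict(self.trace),
        }


# Hypothetical per-call error table; the real catalogue is a council deliverable.
ERROR_TABLE = {
    "gpu_mem_alloc": {"E_NOMEM", "E_BAD_DEVICE"},
}


def make_error(call, code, message, trace):
    """Build an error only if `code` is documented for `call`."""
    if code not in ERROR_TABLE.get(call, set()):
        raise LookupError(f"{code} is not in the documented error table for {call}")
    return DistOSError(code, message, trace)
```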

---

## 9. Timeline

### Phase 1: Research (Days 1–3)

- POSIX Compatibility Analysts catalogue POSIX.1-2017 system calls and assess coverage feasibility
- GPU Syscall Designers survey the CUDA Driver API, CUDA Graphs, Hopper/Blackwell architecture documentation, and NVLink topology implications
- Distributed Syscall Designers survey MPI collectives, gRPC, RDMA verbs, and ZooKeeper/Chubby lock models
- SDK Designers survey language ecosystems: Rust async patterns, Go `cgo` patterns, and Python asyncio/CuPy
- Async API Specialists study the `io_uring` interface in depth
- Lead API Architect drafts the API philosophy options paper for DP-A01
- Deliverable: `ucxl://council-api:lead-api-architect@DistOS:api/^^/research/api-philosophy-options.md`

### Phase 2: Architecture (Days 3–6)

- Resolve DP-A01 (philosophy), DP-A02 (async mechanism), DP-A04 (RPC wire protocol), and DP-A06 (auth/authz), all in consultation with the relevant councils
- Lead API Architect drafts the call taxonomy: which calls belong in which layer (kernel/shim/library)
- GPU Syscall Designers draft the GPU system call prototype spec for Hopper and Blackwell
- Distributed Syscall Designers draft the distributed call prototype spec, contingent on DP-A04 resolution
- Error Handling Architects draft the error type taxonomy and UCXL trace integration
- Deliverable: `ucxl://council-api:lead-api-architect@DistOS:api/^^/research/call-taxonomy.md`

### Phase 3: Formal Specification (Days 6–10)

- Full API spec written: GPU syscalls, distributed syscalls, async interface, C ABI reference
- Language SDK specifications written in parallel by SDK Designers
- Error catalogue completed and UCXL trace integration specified
- Alloy interface contracts delivered to `council-verify` for structural verification
- CLI specification drafted by CLI Designers
- POSIX compatibility matrix completed
- Deliverable: `ucxl://council-api:gpu-syscall-designer@DistOS:api/^^/specs/gpu-syscalls.md` and all companion specs

### Phase 4: Integration (Days 10–12)

- Resolve any outstanding API requirements from subsystem councils surfaced during their Phase 3 spec work
- DP-A03 and DP-A05 resolved with full DR records
- API versioning policy ratified by `council-synth`
- Developer Experience Reviewers conduct walkthroughs of all three example applications
- Deliver final interface contracts to `council-verify` for re-verification after any Phase 3 changes
- Deliverable: ratified versioning policy and three example applications

### Phase 5: Documentation (Days 12–14)

- Developer Experience Reviewers produce the developer-facing API reference document
- SDK Designers produce getting-started guides for each language
- All specs integrated into the master DistOS specification document via `council-docs`
- Final UCXL navigability check: every API call traces back to the council decision that introduced it
- Deliverable: `ucxl://council-api:developer-experience-reviewer@DistOS:api/^^/docs/api-reference.md`