Developer's Guide to OpenAPI Schema Generation in Waldur
This document provides an in-depth explanation of our approach to generating a high-quality OpenAPI 3 schema for the Waldur API using drf-spectacular. A well-defined schema is critical for API documentation, client generation, automated testing, and providing a clear contract for our API consumers.
We heavily customize drf-spectacular's default behavior to produce a schema that is not only accurate but also rich with metadata, developer-friendly, and reflective of Waldur's specific architecture and conventions.
1. Architectural Overview
drf-spectacular generates a schema by introspecting your Django REST Framework project. Our customizations hook into this process at four key stages, each handled by a different component:
Component | File | Responsibility | When to Use |
---|---|---|---|
Endpoint Enumerator | openapi_generators.py | Discovering Endpoints. Controls which API endpoints and methods are included in the schema. | When you need to globally filter out views or methods based on a project-specific convention (e.g., a disabled_actions property on a viewset). |
Schema Inspector (AutoSchema) | openapi_inspector.py | Analyzing Individual Endpoints. The main workhorse. It inspects a single view/method to determine its parameters, request/response bodies, description, operation ID, and other details. | For the majority of customizations related to a specific endpoint's representation, like adding custom parameters, modifying descriptions, or adding vendor extensions. |
Extensions | openapi_extensions.py | Handling Custom Components. Provides explicit schema definitions for custom classes (Authentication, Serializer Fields, Serializers) that drf-spectacular cannot introspect automatically. | When you have a reusable custom class (e.g., GenericRelatedField) that needs a consistent representation across the entire schema. |
Post-processing Hooks | schema_hooks.py | Modifying the Final Schema. Functions that run on the fully generated schema just before it's rendered. They are used for global search-and-replace operations, refactoring, and complex structural changes. | For broad, cross-cutting changes like adding a header to all list endpoints, refactoring common parameters into components, or implementing complex polymorphic schemas. |
The generation process flows like this: Enumerator → Inspector (for each endpoint) → Extensions (as needed by Inspector) → Schema Hooks → Final OpenAPI YAML/JSON
2. The Core Inspector: WaldurOpenApiInspector
This class, located in openapi_inspector.py, is our custom subclass of AutoSchema and contains the most significant logic for tailoring the schema endpoint-by-endpoint.
Key Methods and Use-Cases
resolve_serializer(...)
- Purpose: To ensure that when a serializer is instantiated during schema generation, it's aware of this context.
- Mechanism: It calls the parent method and then sets a flag _is_generating_schema = True on the mock request object within the serializer's context.
- Design Rationale: Several of our custom serializers change their behavior based on this flag. For example, RestrictedSerializerMixin might include all its fields, even optional ones, if this flag is set. This gives API consumers a complete picture of all possible data they could receive.
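The flag handoff can be sketched without any DRF dependency. In this minimal illustration, MockRequest, mark_schema_generation, and visible_fields are hypothetical stand-ins rather than the actual Waldur classes; only the _is_generating_schema attribute name comes from the description above.

```python
class MockRequest:
    """Hypothetical stand-in for the mock request used during schema generation."""

    def __init__(self, query_fields=None):
        # Field names a client asked for via ?field=...; empty means "all".
        self.query_fields = query_fields or []


def mark_schema_generation(context):
    """Set the schema-generation flag on the request held in a serializer context."""
    request = context.get("request")
    if request is not None:
        request._is_generating_schema = True
    return context


def visible_fields(all_fields, context):
    """Return the fields a serializer would expose (assumed mixin behavior)."""
    request = context.get("request")
    # During schema generation every field is exposed, regardless of the query.
    if getattr(request, "_is_generating_schema", False):
        return list(all_fields)
    requested = getattr(request, "query_fields", [])
    if not requested:
        return list(all_fields)
    return [name for name in all_fields if name in requested]
```

Keeping the flag on the request rather than on the serializer means nested serializers that share the same context see it as well.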
get_operation(...)
- Purpose: To enrich the generated "operation" object with Waldur-specific metadata and logic.
- Edge Cases Handled:
  - HEAD method for Lists: We map the HEAD HTTP method to a "count" operation for list views. The inspector provides a custom description and a simple 200 response. Crucially, it returns None for detail views (/api/users/{uuid}/), effectively hiding this nonsensical operation.
  - Custom Permissions Metadata: This is a powerful feature for our frontend developers. If a view defines a _permissions attribute for an action, prefixed with the action name (e.g., create_permissions), the inspector extracts this data and injects it into the schema under a custom x-permissions vendor extension. This allows the frontend to understand the permissions required for an action without hardcoding them.

        # Example Output
        "/api/projects/":
          post:
            summary: "Create a new project"
            x-permissions:
              - permission: "project.create"
                scopes: ["customer"]
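The permissions lookup can be illustrated with plain dictionaries. This is a minimal sketch, assuming the per-action attribute already holds a list of entries shaped like the example output; inject_permissions and the sample view are hypothetical, not the inspector's actual code.

```python
from types import SimpleNamespace


def inject_permissions(operation, view, action):
    """Copy `<action>_permissions` from the view into an `x-permissions` extension.

    Assumption: the attribute value is already a list of
    {"permission": ..., "scopes": [...]} entries.
    """
    permissions = getattr(view, f"{action}_permissions", None)
    if permissions:
        # Return a new dict so the caller's operation object is left untouched.
        operation = {**operation, "x-permissions": list(permissions)}
    return operation


# Hypothetical view carrying permissions for the "create" action only.
view = SimpleNamespace(
    create_permissions=[{"permission": "project.create", "scopes": ["customer"]}]
)
operation = inject_permissions({"summary": "Create a new project"}, view, "create")
```

Actions without a matching attribute are returned unchanged, so the extension only appears where permissions are declared.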
get_description()
- Purpose: To pull the docstring from the correct viewset action (create, retrieve, my_action) rather than from the view class itself.
- Convention: Developers must write clear, concise docstrings on viewset action methods. These docstrings are what users will see in the API documentation.
get_operation_id()
- Purpose: To generate clean, predictable, and code-generator-friendly operation IDs.
- Convention: The default behavior is modified to produce IDs like projects_list, projects_create, projects_retrieve. A special case for non-create POST actions (e.g., custom actions) uses a shorter format to avoid redundancy. This consistency is vital for generated API clients.
get_override_parameters()
- Purpose: To dynamically add query parameters based on the response serializer.
- Use-Case: Our RestrictedSerializerMixin allows users to request a subset of fields via the field query parameter (e.g., ?field=name&field=uuid). This method introspects the response serializer, gets all its possible field names, and automatically generates the OpenApiParameter for field with a complete enum of available values. This provides excellent auto-complete and validation in tools like Swagger UI.
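To make the resulting parameter concrete, here is a minimal sketch that builds the equivalent plain OpenAPI parameter object instead of an OpenApiParameter instance. The helper name build_field_parameter and the choice to sort the enum are assumptions for illustration.

```python
def build_field_parameter(field_names):
    """Build an OpenAPI query parameter listing every selectable field.

    The parameter is an array so that ?field=name&field=uuid is valid.
    """
    return {
        "name": "field",
        "in": "query",
        "required": False,
        "schema": {
            "type": "array",
            # Sorting is an illustrative choice that keeps the output stable.
            "items": {"type": "string", "enum": sorted(field_names)},
        },
    }


parameter = build_field_parameter(["uuid", "name", "created"])
```

A stable enum order keeps schema diffs between releases free of noise from field reordering.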
_postprocess_serializer_schema(...)
- Purpose: To modify a serializer's schema after it has been generated.
- Use-Case: Our serializers can have an optional_fields override. This method respects that override by removing those fields from the required array in the final schema. This is a clean way to tweak serializer requirements for the API without complex serializer inheritance.
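The effect on a generated schema can be sketched as a small pure function. The name drop_optional_from_required is hypothetical, and the sketch assumes optional_fields is a plain list of field names.

```python
def drop_optional_from_required(schema, optional_fields):
    """Return a copy of an object schema with optional fields no longer required."""
    optional = set(optional_fields)
    required = [name for name in schema.get("required", []) if name not in optional]
    result = dict(schema)
    if required:
        result["required"] = required
    else:
        # OpenAPI 3.0 does not allow an empty `required` array, so drop the key.
        result.pop("required", None)
    return result


schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "description": {"type": "string"}},
    "required": ["name", "description"],
}
relaxed = drop_optional_from_required(schema, ["description"])
```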
3. Specialized Handlers: Extensions
Located in openapi_extensions.py, these classes provide a modular way to handle custom components.
WaldurTokenScheme, WaldurSessionScheme, OIDCAuthenticationScheme: These extensions map our custom DRF authentication classes to standard OpenAPI security schemes. This is the correct way to document API authentication.
GenericRelatedFieldExtension:
- Problem: drf-spectacular doesn't know how to represent our custom GenericRelatedField.
- Solution: This extension tells the generator to simply represent it as a string (which, in our case, is a URL). This avoids schema generation errors and provides a simple, accurate representation.
OpenStackNestedSecurityGroupSerializerExtension:
- Problem: A specific nested serializer is overly complex, and for the API schema, we only want to show a simplified version of it.
- Solution: This extension bypasses introspection of the serializer entirely and provides a fixed, hardcoded schema ({"type": "object", "properties": {"url": ...}}). This is an excellent technique for simplifying complex nested objects in the API documentation.
4. Endpoint Discovery: WaldurEndpointEnumerator
Located in openapi_generators.py, this class controls which endpoints are included in the schema.
- Purpose: The default enumerator might include all possible HTTP methods that a view could support. Our WaldurEndpointEnumerator narrows this to the methods each viewset actually exposes.
- Mechanism: It respects the disabled_actions list property on our viewsets. If an action (e.g., 'destroy') is in disabled_actions, the corresponding method (DELETE) will be excluded from the schema for that endpoint.
- Convention: To disable an API endpoint, add its action name to the disabled_actions list on the ViewSet. The API documentation will automatically update to reflect this.
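The filtering rule itself is small enough to sketch in isolation. The function below is a hypothetical illustration, assuming the enumerator has access to a method-to-action mapping of the kind DRF viewsets are routed with (e.g., {"get": "retrieve", "delete": "destroy"}); it is not the enumerator's actual implementation.

```python
def allowed_methods(action_map, disabled_actions):
    """Return the upper-cased HTTP methods whose action is not disabled.

    action_map maps lower-case HTTP methods to viewset action names.
    """
    disabled = set(disabled_actions)
    return sorted(
        method.upper()
        for method, action in action_map.items()
        if action not in disabled
    )


detail_actions = {"get": "retrieve", "put": "update", "delete": "destroy"}
methods = allowed_methods(detail_actions, disabled_actions=["destroy"])
```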
5. Global Transformations: Schema Hooks
Located in schema_hooks.py, these functions perform powerful, sweeping modifications to the entire generated schema. They are the last step in the process.
- Design Principle: Use hooks for cross-cutting concerns that affect many endpoints, or for complex transformations that are difficult to achieve within the inspector.
Key Hooks and Their Purpose
refactor_pagination_parameters:
- Best Practice: This hook implements the DRY (Don't Repeat Yourself) principle. It finds all instances of page and page_size parameters, moves their definition to the global #/components/parameters/ section, and replaces the inline definitions with $ref pointers. This reduces schema size and improves consistency.
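A minimal sketch of this refactoring, written against the standard drf-spectacular post-processing hook signature (result, generator, request, public). Using the parameter name as the component key is an assumption; the real hook may name components differently.

```python
PAGINATION_PARAMS = ("page", "page_size")
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def refactor_pagination_parameters(result, generator=None, request=None, public=True):
    """Move inline page/page_size query parameters into shared components."""
    shared = result.setdefault("components", {}).setdefault("parameters", {})
    for path_item in result.get("paths", {}).values():
        for method, operation in path_item.items():
            if method not in HTTP_METHODS or not isinstance(operation, dict):
                continue
            params = operation.get("parameters", [])
            for index, param in enumerate(params):
                # Entries that are already $ref pointers have no "name" and are skipped.
                name = param.get("name")
                if name in PAGINATION_PARAMS and param.get("in") == "query":
                    # The first definition encountered becomes the shared one.
                    shared.setdefault(name, param)
                    params[index] = {"$ref": f"#/components/parameters/{name}"}
    return result
```

Because already-referenced entries are skipped, running the hook twice leaves the schema unchanged.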
add_result_count_header:
- Purpose: To document that all our paginated list endpoints return the x-result-count header.
- Mechanism: It identifies list endpoints (by checking if operationId ends in _list), defines a reusable header in #/components/headers/, and adds a reference to it in the 2xx responses of those endpoints.
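The mechanism can be sketched as follows, again using the standard hook signature. The component key ResultCount and the header description are assumptions made for this example.

```python
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}
HEADER_REF = {"$ref": "#/components/headers/ResultCount"}


def add_result_count_header(result, generator=None, request=None, public=True):
    """Attach a shared x-result-count header to 2xx responses of list operations."""
    headers = result.setdefault("components", {}).setdefault("headers", {})
    headers["ResultCount"] = {
        "description": "Total number of items matching the query.",
        "schema": {"type": "integer"},
    }
    for path_item in result.get("paths", {}).values():
        for method, operation in path_item.items():
            if method not in HTTP_METHODS or not isinstance(operation, dict):
                continue
            if not operation.get("operationId", "").endswith("_list"):
                continue
            for status, response in operation.get("responses", {}).items():
                # Skip non-2xx codes and responses that are themselves references.
                if str(status).startswith("2") and "$ref" not in response:
                    response.setdefault("headers", {})["x-result-count"] = dict(HEADER_REF)
    return result
```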
make_fields_optional:
- Problem: Endpoints using RestrictedSerializerMixin can return a variable subset of fields. How do we represent this?
- Solution: This hook finds any operation that has a field query parameter. For those operations, it recursively traverses their response schemas and removes the required property from all objects. This correctly signals to API consumers that any field might be absent if not explicitly requested.
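The recursive traversal is the core of this hook. The sketch below shows only that step on an inline schema; strip_required is a hypothetical helper, and it does not follow $ref pointers, which a real hook would also need to handle.

```python
def strip_required(node):
    """Return a copy of a schema tree with every `required` array removed.

    Only list-valued `required` keys are dropped, so a property that happens
    to be named "required" (a dict under `properties`) is preserved.
    """
    if isinstance(node, dict):
        return {
            key: strip_required(value)
            for key, value in node.items()
            if not (key == "required" and isinstance(value, list))
        }
    if isinstance(node, list):
        return [strip_required(item) for item in node]
    return node


schema = {
    "type": "object",
    "required": ["name", "owner"],
    "properties": {
        "name": {"type": "string"},
        "owner": {
            "type": "object",
            "required": ["uuid"],
            "properties": {"uuid": {"type": "string"}},
        },
    },
}
relaxed = strip_required(schema)
```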
transform_paginated_arrays:
- Purpose: To simplify the schema structure for paginated responses.
- Mechanism: drf-spectacular often creates named components like PaginatedUserList. This hook finds all such components, inlines their array definition wherever they are referenced, and then removes the original component definition. The result is a slightly more verbose but flatter and often easier-to-understand schema for the end-user.
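A minimal sketch of the inlining step. It assumes the components to flatten can be recognized by a Paginated name prefix together with type: array, which is a guess at the selection rule rather than the hook's documented behavior.

```python
import copy


def transform_paginated_arrays(result, generator=None, request=None, public=True):
    """Inline Paginated* array components into paths and drop the components."""
    schemas = result.get("components", {}).get("schemas", {})
    paginated = {
        f"#/components/schemas/{name}": schema
        for name, schema in schemas.items()
        if name.startswith("Paginated") and schema.get("type") == "array"
    }

    def inline(node):
        if isinstance(node, dict):
            ref = node.get("$ref")
            if isinstance(ref, str) and ref in paginated:
                # Each reference gets its own copy of the array definition.
                return copy.deepcopy(paginated[ref])
            return {key: inline(value) for key, value in node.items()}
        if isinstance(node, list):
            return [inline(item) for item in node]
        return node

    if "paths" in result:
        result["paths"] = inline(result["paths"])
    for ref in paginated:
        del schemas[ref.rsplit("/", 1)[-1]]
    return result
```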
add_polymorphic_attributes_schema:
- This is the most advanced and powerful hook in our arsenal.
- Problem: The attributes field on the "Create Order" endpoint is polymorphic. Its structure depends entirely on the offering_type of the marketplace offering.
- Solution: We use OpenAPI's oneOf keyword to represent this polymorphism.
- Mechanism: The hook acts as a pre-processing step. It dynamically:
  - Iterates through all registered marketplace plugins (waldur_mastermind.marketplace.plugins).
  - For each plugin, it finds the serializer responsible for validating the attributes field.
  - It uses a temporary AutoSchema instance to generate a schema for that specific serializer's fields.
  - It adds this generated schema to #/components/schemas/ with a unique name (e.g., OpenStackInstanceCreateOrderAttributes).
  - Finally, it modifies the OrderCreateRequest schema to replace the attributes field with a oneOf that references all the dynamically generated schemas, plus a generic fallback.
- Architectural Significance: This demonstrates how hooks can be used to generate schema fragments dynamically by introspecting parts of the application (in this case, the plugin system) that are outside the immediate scope of a DRF view.
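The last two steps of the mechanism, registering the generated schemas and wiring them into a oneOf, can be sketched with plain dictionaries. The function name, the attribute_schemas argument (standing in for the per-plugin schemas produced earlier), and the shape of the generic fallback are assumptions for illustration.

```python
def add_polymorphic_attributes(result, attribute_schemas, target="OrderCreateRequest"):
    """Register per-plugin attribute schemas and reference them via oneOf.

    attribute_schemas maps a component name to an already generated schema.
    """
    schemas = result.setdefault("components", {}).setdefault("schemas", {})
    one_of = []
    for name, schema in attribute_schemas.items():
        schemas[name] = schema
        one_of.append({"$ref": f"#/components/schemas/{name}"})
    # Generic fallback for offering types without a dedicated schema (assumed shape).
    one_of.append({"type": "object", "additionalProperties": True})
    schemas[target]["properties"]["attributes"] = {"oneOf": one_of}
    return result


schema = {"components": {"schemas": {"OrderCreateRequest": {
    "type": "object",
    "properties": {"offering": {"type": "string"}, "attributes": {"type": "object"}},
}}}}
generated = {
    "OpenStackInstanceCreateOrderAttributes": {
        "type": "object",
        "properties": {"name": {"type": "string"}},
    }
}
updated = add_polymorphic_attributes(schema, generated)
```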
- Other Hooks: postprocess_drop_description, postprocess_fix_enum, remove_waldur_cookie_auth, adjust_request_body_content_types are utility hooks for cleaning up and standardizing the final output.
6. Best Practices and Conventions
- Docstrings are the Source of Truth: Write clear docstrings on viewset action methods. They become the official API descriptions.
- Use the Right Tool for the Job:
  - View-specific logic? Use the WaldurOpenApiInspector.
  - Reusable custom class? Create an Extension.
  - Global rule for filtering endpoints? Modify the WaldurEndpointEnumerator.
  - Schema-wide refactoring or complex polymorphism? Write a postprocessing_hook.
- Leverage View Attributes for Metadata: We use view attributes like create_permissions and disabled_actions to control schema generation. This co-locates API behavior and its documentation, making the code easier to maintain.
- Embrace Vendor Extensions (x-): For custom metadata that doesn't fit the OpenAPI standard (like our x-permissions), vendor extensions are the correct and standard way to include it.
- Strive for DRY Schemas: Use hooks like refactor_pagination_parameters to create reusable components (parameters, headers, schemas). This keeps the schema clean and consistent.
- Handle Polymorphism with Hooks: For complex conditional schemas (oneOf, anyOf), post-processing hooks are the most flexible and powerful tool available, as demonstrated by add_polymorphic_attributes_schema.
- Simplify for the Consumer: Use extensions (OpenStackNestedSecurityGroupSerializerExtension) and hooks (transform_paginated_arrays) to simplify complex or deeply nested objects where the full detail is unnecessary for the API consumer. The goal is a schema that is not just accurate, but also usable.
7. The OpenAPI Schema in the Broader Workflow
The OpenAPI schema is not merely a documentation artifact; it is a critical, machine-readable contract that drives a significant portion of our development, testing, and release workflows. Our CI/CD pipelines are built around the schema as the single source of truth for the API's structure.
The entire automated process is defined in the GitLab CI configurations for the waldur-mastermind and waldur-docs repositories.
1. Automated Generation
The process begins in the waldur-mastermind pipeline in a job named Generate OpenAPI schema.
- Triggers: This job runs automatically in two scenarios:
  - On a schedule for the develop branch: This ensures we always have an up-to-date schema reflecting the latest development state.
  - When a version tag is pushed (e.g., 1.2.3): This generates a stable, versioned schema for a specific release.
- Output: The job produces a versioned waldur-openapi-schema.yaml file, which is stored as a CI artifact. This artifact becomes the input for all subsequent steps.
2. Automated SDK and Tooling Generation
The generated schema artifact immediately triggers a series of parallel jobs, each responsible for generating a specific client SDK or tool. This "schema-first" approach ensures that our client libraries are always perfectly in sync with the API they are meant to consume.
- Generate TypeScript SDK: For Waldur HomePort and other web frontends.
- Generate Python SDK: For scripting, integrations, and internal tools.
- Generate Go SDK: For command-line tools and backend services.
- Generate Terraform SDK: Creates a Terraform provider for infrastructure-as-code management of Waldur resources.
- Generate Ansible modules: Creates Ansible collections for configuration management and automation.
3. Continuous Delivery of SDKs
For development builds (from the develop branch), the newly generated SDKs are automatically committed and pushed to the main or develop branch of their respective GitHub repositories. This provides a continuous delivery pipeline for our API clients, allowing developers to immediately access and test the latest API changes through their preferred language.
4. Release and Versioning Workflow
For tagged releases, the workflow is more extensive:
- API Diff Generation: A job named Generate OpenAPI schema diff is triggered. It fetches the schema of the previous release from the waldur-docs repository and compares it against the newly generated schema using oasdiff. It produces a human-readable Markdown file (openapi-diff.md) detailing exactly what has changed (endpoints added, fields removed, etc.).
- Documentation Deployment: The new versioned schema (waldur-openapi-schema-1.2.3.yaml) and the diff file are automatically committed to the waldur-docs repository. The documentation site is then rebuilt, archiving the new schema and making the API changes visible in the release notes.
- Changelog Integration: The main CHANGELOG.md in the waldur-docs repository is automatically updated with links to the new schema file and the API diff page. This provides unparalleled clarity for integrators, showing them precisely what changed in a new release.
- SDK Release: The tagged version of each SDK is released, often involving bumping the version in configuration files (pyproject.toml, package.json) and pushing a corresponding version tag to the SDK's repository.
This automated, schema-driven workflow provides immense benefits:
- Consistency: All clients and documentation are generated from the same source, eliminating discrepancies.
- Speed: Developers get up-to-date SDKs without manual intervention, accelerating the development cycle.
- Reliability: The risk of human error in writing client code or documenting changes is significantly reduced.
- Clarity: Release notes are precise and automatically generated, giving integrators clear instructions on what to expect.