USPTO PAIR vs Patent Center Data Structures

USPTO PAIR versus Patent Center data structures is the field-level contrast between the retired Patent Application Information Retrieval (PAIR) SOAP/XML feed and the RESTful JSON returned by the USPTO Open Data Portal, which now serves Patent Center data. A docketing pipeline must resolve that contrast exactly, because every mismatched field, timezone, or event code translates directly into a mis-computed statutory deadline.

The retirement of legacy PAIR forced automation off predictable, deeply nested XML and onto a JSON API with different serialization rules, explicit null semantics, ISO 8601 UTC timestamps, and a re-cut event taxonomy. This page isolates the exact structural deltas and gives you a single validated ingestion contract so the migration does not silently corrupt your calendar. It is the practitioner-level companion to the parent USPTO Data Schema Mapping layer, which owns the canonical internal schema these payloads must map into.

Technical Specification: PEDS/PAIR XML vs the Open Data Portal JSON

Legacy Private and Public PAIR data was exposed programmatically through the Patent Examination Data System (PEDS), returning SOAP/XML with deeply nested <TransactionHistory> and <ApplicationStatus> blocks and lenient string coercion. Patent Center’s data is now served through the USPTO Open Data Portal (ODP) API, documented at the USPTO Developer Hub and data.uspto.gov/apis, which returns flat JSON: an events (transaction history) array with alphanumeric eventCode/statusCode values, documentCodes, and cursor pagination (nextPageToken) instead of offset paging.
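Cursor pagination changes the shape of the polling loop. Instead of incrementing an offset, the client follows `nextPageToken` until the API stops returning one. The following is a minimal sketch under a few assumptions. Each decoded page is taken to be a dict holding an `events` array and a `nextPageToken` that is absent or null on the last page. The `fetch_page` callable and `iter_events` name are hypothetical stand-ins for your own HTTP client, not part of any USPTO SDK.

```python
from __future__ import annotations

from typing import Any, Callable, Iterator, Optional

# Hypothetical signature: fetch_page(token) returns one decoded JSON page;
# token is None for the first request.
FetchPage = Callable[[Optional[str]], dict[str, Any]]


def iter_events(fetch_page: FetchPage, max_pages: int = 1000) -> Iterator[dict[str, Any]]:
    """Yield every event across cursor-paginated pages."""
    token: Optional[str] = None
    seen: set[str] = set()
    for _ in range(max_pages):
        page = fetch_page(token)
        # Treat a missing or null events array as an empty page.
        yield from page.get("events") or []
        token = page.get("nextPageToken")
        if not token:
            return  # no cursor: last page reached
        if token in seen:
            # A repeated cursor would loop forever; fail loudly instead.
            raise RuntimeError(f"cursor repeated, refusing to loop: {token!r}")
        seen.add(token)
    raise RuntimeError("max_pages exhausted before the cursor ran out")
```

The `max_pages` cap and the repeated-cursor check are defensive choices. Together they turn a misbehaving feed into an error instead of an endless poll.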

Four structural deltas break naive parsers:

| Concern | Legacy PAIR / PEDS (XML) | Patent Center / ODP (JSON) |
|---|---|---|
| Serialization | Nested elements, mixed content | Flat arrays + objects |
| Missing data | Element absent or empty string | Explicit `null` |
| Dates | `MM/DD/YYYY`, implicit local time | ISO 8601 UTC `YYYY-MM-DDTHH:MM:SSZ` |
| Event identity | Numeric `EventCode` (e.g. `1002`) | Alphanumeric `eventCode`/`statusCode`, often split into finer steps |

The most dangerous delta is date semantics for deadline math. Under PAIR, a <MailingDate> implicitly carried the weekend/holiday extension logic of 37 CFR § 1.7. Patent Center decouples mailDate from receiptDate and returns both as UTC instants, so your pipeline — not the payload — now owns the business-day roll-forward and must align it to the official USPTO federal-holiday calendar rather than generic OPM observances. Enforce ISO 8601 UTC at the ingestion boundary and reject any MM/DD/YYYY string outright.

Minimal Reproducible Implementation

The following focused ingestion contract validates a single Patent Center ODP event: it fails closed on missing eventCode/statusCode, rejects any timestamp that is not explicit UTC, and normalizes to a timezone-aware datetime using the standard-library zoneinfo module (never pytz). This is the smallest piece you can drop in front of a deadline engine to stop schema drift at the door.

```python
from __future__ import annotations

from datetime import date, datetime
from zoneinfo import ZoneInfo

from pydantic import BaseModel, Field, StrictStr, field_validator

# Patent Center serves UTC instants; docketing UI renders in USPTO local time.
USPTO_TZ = ZoneInfo("America/New_York")  # 37 CFR 1.6 filing-time reference


class PatentCenterEvent(BaseModel):
    """One transaction-history entry from the USPTO Open Data Portal.

    Source: https://data.uspto.gov/apis (Patent Center / ODP event schema).
    """

    # Required and non-empty: a missing or "" code fails validation (fail closed).
    eventCode: StrictStr = Field(min_length=1)   # alphanumeric, e.g. "ALLOWED", "MCTNF"
    statusCode: StrictStr = Field(min_length=1)  # prosecution status, e.g. "PATENTED"
    mailDate: datetime | None = None
    receiptDate: datetime | None = None

    @field_validator("mailDate", "receiptDate", mode="before")
    @classmethod
    def require_utc(cls, v: object) -> datetime | None:
        # Reject legacy MM/DD/YYYY and any timestamp without an explicit
        # UTC designator; silent coercion is how deadlines drift by a day.
        if v is None:
            return None
        if not isinstance(v, str) or not v.endswith("Z"):
            raise ValueError(f"Non-UTC or malformed timestamp: {v!r}")
        # fromisoformat raises ValueError on garbage, which pydantic reports
        # as a validation error. Replacing "Z" keeps this working before 3.11.
        return datetime.fromisoformat(v.replace("Z", "+00:00"))

    def mailed_local_day(self) -> date | None:
        """UTC instant -> USPTO local calendar day (deadline anchor)."""
        if self.mailDate is None:
            return None
        return self.mailDate.astimezone(USPTO_TZ).date()


# Version-pinned code map translating ODP codes back to the legacy docketing
# triggers your rule engine already understands (ODP -> trigger only; the
# paired YAML file also records the legacy numeric code).
EVENT_CODE_MAP: dict[str, str] = {
    "ALLOWED": "NOTICE_OF_ALLOWANCE",  # legacy EventCode 1002
    "MCTNF":   "NON_FINAL_REJECTION",  # response clock starts
    "MCTFR":   "FINAL_REJECTION",
    "ABN":     "ABANDONED",            # terminal: closes open deadlines
}
```

The paired mapping table belongs in version control as a data file that cites the office source it was derived from:

```yaml
# uspto_event_map.yaml
# Source: USPTO Open Data Portal event/status code reference
# https://data.uspto.gov/apis  (review each USPTO API changelog quarterly)
ALLOWED:                    # ODP eventCode
  legacy_event_code: "1002"
  docketing_trigger: NOTICE_OF_ALLOWANCE
  priority: action_required
ABN:
  legacy_event_code: "398"
  docketing_trigger: ABANDONED
  priority: terminal        # overrides all pending actions
```

Because ODP splits monolithic legacy events into granular steps (ALLOWED, ISSUE_FEE_DUE, PUBLICATION_READY), the rule engine must sort the events array by date and apply the highest-priority trigger, with terminal states (PATENTED, ABANDONED, EXPIRED) always overriding pending ones. Route the raw payload through disciplined Schema Validation & Error Categorization before a code is trusted enough to move a date.
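The precedence rule can be sketched in a few lines. This sketch assumes events have already been mapped to a `docketing_trigger` and a `priority` label drawn from the YAML file. The `informational` level, the numeric ranks, and the `NormalizedEvent` and `controlling_event` names are illustrative assumptions, not part of the ODP schema.

```python
from __future__ import annotations

from datetime import date
from typing import NamedTuple

# Hypothetical ranks mirroring the "priority" field in uspto_event_map.yaml;
# "informational" is an assumed extra level. Higher rank wins.
PRIORITY_RANK = {"informational": 0, "action_required": 1, "terminal": 2}


class NormalizedEvent(NamedTuple):
    event_date: date
    trigger: str    # docketing_trigger, e.g. "NON_FINAL_REJECTION"
    priority: str   # key into PRIORITY_RANK


def controlling_event(events: list[NormalizedEvent]) -> NormalizedEvent | None:
    """Pick the event that should drive docketing for one application.

    Because "terminal" is the top rank, any terminal state overrides pending
    ones. An unknown priority label raises KeyError (fail closed).
    """
    if not events:
        return None
    # Chronological order first (stable, so same-day events keep feed order).
    ordered = sorted(events, key=lambda e: e.event_date)
    # Highest rank wins; among equal ranks the latest date wins. max() returns
    # the first of any exact ties, i.e. the earliest in feed order.
    return max(ordered, key=lambda e: (PRIORITY_RANK[e.priority], e.event_date))
```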

Known Gotchas & Compliance Traps

Silent timezone shift on mail date. A UTC mailDate of 2026-03-08T02:00:00Z is still March 7 in America/New_York. Running deadline arithmetic on the UTC instant and converting to a local calendar day only afterward, or skipping the conversion entirely, produces off-by-one due dates around midnight and across DST boundaries. Mitigation: normalize to UTC on ingest, convert that instant to the USPTO local calendar day first, and run all offset math on the resulting calendar date. Add regression tests for the March and November DST weekends.
receiptDate earlier than mailDate. Scanner backdating and ODP sync lag can produce a receiptDate that precedes mailDate. Docketing off the wrong one shifts the response clock. Mitigation: flag the record PENDING_VERIFICATION, suppress automatic deadline generation, and escalate to a paralegal rather than guessing.
Unmapped event codes fail open. When the ODP taxonomy adds or renames a code your EVENT_CODE_MAP has not caught up to, a permissive parser drops it and no deadline is generated — the most dangerous silent failure in docketing. Mitigation: fail closed on any unknown eventCode/statusCode, quarantine the payload, and alert operations; treat the mapping file as version-pinned config reviewed against each USPTO API changelog.
Rate limits and partial payloads mid-poll. The ODP API enforces quotas and returns structured errors ({"error": {"code": 429, ...}}). Dropping a throttled page leaves a matter with a stale event history. Mitigation: apply the retry discipline from Implementing Exponential Backoff for Patent APIs, and open a circuit breaker that halts docketing writes after repeated 5xx responses while still serving read-only cached state.
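The two date gotchas can both be guarded at ingest. The following is a minimal sketch, assuming timezone-aware datetimes produced by a validator like the one above. The `local_mail_day`, `date_consistency`, and `DocketState` names are hypothetical. The regression cases use the 2026 US DST transitions (Sunday 8 March and Sunday 1 November).

```python
from __future__ import annotations

from datetime import date, datetime, timezone
from enum import Enum
from zoneinfo import ZoneInfo

USPTO_TZ = ZoneInfo("America/New_York")


class DocketState(str, Enum):
    READY = "READY"
    PENDING_VERIFICATION = "PENDING_VERIFICATION"


def local_mail_day(mail_instant: datetime) -> date:
    """Anchor step: UTC instant -> USPTO (Eastern) calendar day, before any math."""
    if mail_instant.tzinfo is None:
        raise ValueError("naive datetime: refusing to guess its timezone")
    return mail_instant.astimezone(USPTO_TZ).date()


def date_consistency(mail: datetime | None, receipt: datetime | None) -> DocketState:
    """Hold a record whose receipt instant precedes its mail instant.

    Both values must be timezone-aware; null handling is a separate policy.
    """
    if mail is not None and receipt is not None and receipt < mail:
        return DocketState.PENDING_VERIFICATION
    return DocketState.READY


# DST regression cases for 2026 (DST starts Sun 8 Mar, ends Sun 1 Nov).
DST_CASES = [
    (datetime(2026, 3, 8, 2, 0, tzinfo=timezone.utc), date(2026, 3, 7)),     # 21:00 EST
    (datetime(2026, 3, 8, 7, 30, tzinfo=timezone.utc), date(2026, 3, 8)),    # 03:30 EDT
    (datetime(2026, 11, 1, 4, 30, tzinfo=timezone.utc), date(2026, 11, 1)),  # 00:30 EDT
    (datetime(2026, 11, 2, 4, 30, tzinfo=timezone.utc), date(2026, 11, 1)),  # 23:30 EST
]

for instant, expected in DST_CASES:
    assert local_mail_day(instant) == expected, (instant, expected)
```

Deadline offsets then operate on the returned `date`, never on the UTC instant. A `PENDING_VERIFICATION` result should suppress deadline generation and route the record to a paralegal.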
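The circuit-breaker half of the rate-limit mitigation can be sketched as a small state object. Several details are illustrative assumptions: the threshold, the cool-down, the single-probe half-open behavior, and the `DocketWriteBreaker` name. In this design, 429 responses go to the backoff path and do not count toward tripping the breaker.

```python
from __future__ import annotations

import time
from typing import Callable


class DocketWriteBreaker:
    """Open after N consecutive 5xx responses; stay open for a cool-down.

    While open, callers skip docketing writes and serve cached read-only state.
    Defaults are illustrative, not USPTO guidance.
    """

    def __init__(self, threshold: int = 5, cooldown_s: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.threshold = threshold
        self.cooldown_s = cooldown_s
        self._clock = clock
        self._failures = 0
        self._opened_at: float | None = None

    def record(self, status_code: int) -> None:
        if 500 <= status_code <= 599:
            self._failures += 1
            if self._failures >= self.threshold and self._opened_at is None:
                self._opened_at = self._clock()
        elif status_code < 400:
            # Any success closes the breaker and clears the failure streak.
            self._failures = 0
            self._opened_at = None
        # 4xx (including 429) is left to the retry/backoff logic.

    def writes_allowed(self) -> bool:
        if self._opened_at is None:
            return True
        if self._clock() - self._opened_at >= self.cooldown_s:
            # Half-open: allow one probe; a single further 5xx re-opens.
            self._opened_at = None
            self._failures = self.threshold - 1
            return True
        return False
```

Injecting `clock` keeps the breaker testable without sleeping. The docketing writer checks `writes_allowed()` before every write, not once per poll.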

Integration Point

This mapping sits at the ingestion edge of the docketing pipeline. Upstream, the raw JSON arrives from USPTO Patent Center Web Scraping or a direct ODP API pull; this page is the normalization contract that turns those bytes into validated events. Downstream, the normalized docketing_trigger values feed the deadline engines, and specialized cases such as Handling USPTO Maintenance Fee Notification Parsing consume the same validated event stream.

Every raw payload and every mapping decision must land in an append-only, hash-chained store so a computed deadline is reproducible from its exact inputs — the discipline defined by the Security & Access Control Boundaries module, whose audit trail is what defends the calculation against a malpractice claim. When the ODP feed is unavailable or drifts, degrade to cached last-known-good state through the Building a Fallback Routing System for Patent Dockets pattern rather than emitting a bad date. All of this rolls up into the USPTO Data Schema Mapping canonical schema.
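The hash-chaining idea can be shown in a minimal sketch. It assumes records are already JSON-serializable, with timestamps stored as ISO 8601 strings. The `AuditChain` class is hypothetical and in-memory only. A real store would persist entries to append-only storage, which this sketch does not attempt.

```python
from __future__ import annotations

import hashlib
import json
from typing import Any

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


class AuditChain:
    """In-memory sketch of an append-only, hash-chained audit log.

    Each entry stores canonical JSON plus SHA-256(previous_hash + body), so
    altering any stored entry makes verify() fail unless every later hash
    is recomputed as well.
    """

    def __init__(self) -> None:
        self._entries: list[tuple[str, str]] = []  # (canonical body, hash)

    @staticmethod
    def _digest(prev_hash: str, body: str) -> str:
        return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()

    def append(self, record: dict[str, Any]) -> str:
        # sort_keys + compact separators give one canonical string per record.
        body = json.dumps(record, sort_keys=True, separators=(",", ":"))
        prev = self._entries[-1][1] if self._entries else GENESIS
        digest = self._digest(prev, body)
        self._entries.append((body, digest))
        return digest

    def verify(self) -> bool:
        prev = GENESIS
        for body, digest in self._entries:
            if self._digest(prev, body) != digest:
                return False
            prev = digest
        return True
```

Appending the raw payload, the mapping decision, and the computed deadline as separate entries lets a reviewer replay the calculation from its exact inputs.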

Frequently Asked Questions

Is legacy PAIR still available as a data source?

Public and Private PAIR were retired and their programmatic feed (PEDS) is being superseded by the USPTO Open Data Portal API. Treat any retained PAIR/PEDS cache as a read-only tertiary fallback for last-known state, never as the primary source, and mark records sourced from it with a stale_data flag.

Why did my parser start returning null where it used to see a value?

Legacy XML often omitted an element or emitted an empty string for unavailable data, which lenient parsers coerced to a default. Patent Center returns an explicit null. Model those fields as datetime | None and decide per field whether null is acceptable or should route the record to manual review.
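A per-field null policy can be written as data so the decision is reviewable. The following is a minimal sketch. The policy assignments are assumptions to tune to your own rules, as are the `FIELD_POLICY` table and the `fields_needing_review` name. An absent key is treated the same as an explicit null, which is the conservative choice for feeds that still omit fields.

```python
from __future__ import annotations

from enum import Enum
from typing import Any


class NullPolicy(Enum):
    ALLOW = "allow"            # null is a legitimate "not yet" value
    MANUAL_REVIEW = "review"   # null blocks automatic docketing


# Hypothetical per-field policy; not an ODP-defined table.
FIELD_POLICY: dict[str, NullPolicy] = {
    "mailDate": NullPolicy.MANUAL_REVIEW,
    "receiptDate": NullPolicy.ALLOW,
}


def fields_needing_review(event: dict[str, Any]) -> list[str]:
    """Return review-policy fields that are null or absent in the payload."""
    return [
        name
        for name, policy in FIELD_POLICY.items()
        if policy is NullPolicy.MANUAL_REVIEW and event.get(name) is None
    ]
```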

What happens if the mail date falls on a weekend or federal holiday?

Patent Center no longer folds the extension into the date, so your engine applies 37 CFR 1.7: a due date on a Saturday, Sunday, or DC federal holiday rolls forward to the next business day, checked against the USPTO federal-holiday calendar rather than OPM or state schedules. Log the raw timestamp, the applied rule ID, and the final date.
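The roll-forward can be sketched in a few lines. This sketch assumes the caller supplies the holiday set loaded from the USPTO federal-holiday calendar, and no calendar is bundled here. The `roll_forward` name is hypothetical.

```python
from __future__ import annotations

from datetime import date, timedelta


def roll_forward(due: date, holidays: frozenset[date]) -> date:
    """Move a due date off Saturdays, Sundays, and listed holidays (37 CFR 1.7).

    `holidays` comes from the caller's USPTO federal-holiday calendar.
    date.weekday() returns 5 for Saturday and 6 for Sunday.
    """
    rolled = due
    while rolled.weekday() >= 5 or rolled in holidays:
        rolled += timedelta(days=1)
    return rolled
```

The caller logs the raw timestamp, the rule ID, and both the input and returned dates. That way, a shifted deadline is explainable from the audit trail.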

How do I keep new ODP event codes from silently dropping deadlines?

Fail closed. Any eventCode or statusCode not present in your version-pinned mapping file must quarantine the payload and alert operations instead of being ignored. Review the mapping against every USPTO API changelog on a fixed cadence.
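The following is a minimal sketch of a fail-closed lookup. It assumes events arrive as decoded dicts and that the mapping file has been loaded into a plain dict. The exception class, logger name, and `map_payload` function are hypothetical. Alerting is reduced to an error log here, and the same pattern applies to `statusCode`.

```python
from __future__ import annotations

import logging
from typing import Any, Mapping

log = logging.getLogger("docketing.ingest")


class UnmappedEventCode(Exception):
    """Raised so an unknown code quarantines the payload instead of vanishing."""


def map_payload(events: list[dict[str, Any]], code_map: Mapping[str, str]) -> list[str]:
    """Translate every eventCode to a docketing trigger, all or nothing.

    A single unmapped (or missing) code rejects the whole payload, so a
    partially mapped history never reaches the deadline engine.
    """
    unknown = [e.get("eventCode") for e in events if e.get("eventCode") not in code_map]
    if unknown:
        log.error("Quarantining payload; unmapped eventCode values: %r", unknown)
        raise UnmappedEventCode(unknown)
    return [code_map[e["eventCode"]] for e in events]
```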