CustomsHive

Belgian customs declaration processing tool. Accepts invoice PDFs, packing list PDFs, and/or Excel files; extracts structured goods data via AI; allows review and correction; then generates compliant IDMS/NCTS XML declarations for submission (with AES export support planned).

Stack

  • ASP.NET Core 10 Razor Pages — EF Core + SQL Server, background processing via System.Threading.Channels
  • AI extraction — Azure Document Intelligence (OCR/layout) + Azure OpenAI (classification + structured extraction); transit extraction path with fallback behavior
  • Auth — Microsoft Entra ID (OIDC); role-based (Beheerder, Hoofdgebruiker, Gebruiker)
  • XML generation — custom generators per message type (no third-party library)
  • Reference data — Tarbel/UN-LOCODE/code lists in application databases and services

Core domain concepts

Dossier

Central entity. Each dossier represents one shipment/declaration file. Key fields:

  • Ucr — unique customs reference
  • Lrn — local reference number (transit)
  • ContainerNumber, SupplierCode
  • Regime — "IM" (import), "EX" (export), "T1"/"T2" (transit)
  • Status — Draft → Queued → Processing → Review → Approved → Submitted (or Failed)
  • RawExtraction / CorrectedData — AI output + user corrections stored as JSON
  • InvoicePdfPath, PackingPdfPath, XlsxPath — uploaded source documents
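
The status lifecycle above can be read as a small state machine. The following is a minimal Python sketch for illustration only (the application itself is ASP.NET Core). The `ALLOWED` table and the `advance` helper are hypothetical names, and the sketch assumes that every non-terminal status may also move to Failed, which this README does not spell out.

```python
from enum import Enum


class Status(Enum):
    DRAFT = "Draft"
    QUEUED = "Queued"
    PROCESSING = "Processing"
    REVIEW = "Review"
    APPROVED = "Approved"
    SUBMITTED = "Submitted"
    FAILED = "Failed"


# Happy path in the order listed in this README.
HAPPY_PATH = [
    Status.DRAFT, Status.QUEUED, Status.PROCESSING,
    Status.REVIEW, Status.APPROVED, Status.SUBMITTED,
]

# Assumption: each non-terminal status can move to its successor or to Failed.
ALLOWED = {
    current: {nxt, Status.FAILED}
    for current, nxt in zip(HAPPY_PATH, HAPPY_PATH[1:])
}


def advance(current: Status, target: Status) -> Status:
    """Return target if the transition is allowed, else raise ValueError."""
    if target not in ALLOWED.get(current, set()):
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    return target
```

Keeping the transitions in one table makes it easy to reject jumps such as Draft → Submitted in a single place instead of scattering checks across pages.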

Client

Importer/client profile. Pre-filled into declarations when a dossier is created. Key fields:

  • Code — short identifier (e.g. "SKCH")
  • Name, IdentificationNumber (EORI)
  • Address: StreetAndNumber (max 70), Postcode (max 17), City (max 35), Country (ISO2, max 2)
  • Authorisation references: Fr1 (BTW importeur / FR1), AuthC503, Ref4007, Et14000
  • DefaultProcedure — default procedure code (Standaard Regeling)
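
The address length limits listed above lend themselves to a simple pre-save check. The sketch below is a hypothetical Python illustration, not the application's validation code. The names `ClientAddress`, `MAX_LENGTHS`, and `validate_address` are assumptions, as is the rule that the country must be exactly two ASCII letters.

```python
from dataclasses import dataclass

# Maximum lengths as listed in the Client field overview.
MAX_LENGTHS = {
    "street_and_number": 70,
    "postcode": 17,
    "city": 35,
    "country": 2,
}


@dataclass
class ClientAddress:
    street_and_number: str
    postcode: str
    city: str
    country: str  # ISO 3166-1 alpha-2, e.g. "BE"


def validate_address(addr: ClientAddress) -> list[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    for field, limit in MAX_LENGTHS.items():
        if len(getattr(addr, field)) > limit:
            problems.append(f"{field} exceeds {limit} characters")
    # Assumption: the country code must be exactly two ASCII letters.
    c = addr.country
    if not (len(c) == 2 and c.isascii() and c.isalpha()):
        problems.append("country must be a 2-letter ISO code")
    return problems
```

For example, `validate_address(ClientAddress("Kaai 1", "2000", "Antwerpen", "BE"))` returns an empty list.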

CustomsDeclaration

Per-dossier declaration metadata captured at submission time:

  • Reference documents: BillOfLadingRef (N337), OriginCertRef / OriginCertDate (N935)
  • Transport: ContainerNumber, ContainerIndicator, DestinationCountry, CountryOfDispatch
  • SupervisingCustomsOfficeRef (8-char office code)
  • NatureOfTransaction, UseH2B (H1B = standard import, H2B = customs warehouse / procedure 71)
  • Static company fields (sender GLN, company EORI, authorisations) read from AppSettings

Extraction

Each AI extraction attempt stored as ExtractionRecord with:

  • Source — PDF pages converted to base64 images → sent to vision model
  • RawJson — raw AI response (may contain // comment lines, stripped before parse)
  • ParsedData — structured goods lines (quantity, description, HS code, value, weight, country of origin, etc.)
  • ConfidenceScore
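
Stripping `//` comment lines before parsing, as mentioned for RawJson, can be sketched as follows. This is a minimal Python illustration under the assumption that comments always occupy a whole line. The function name `parse_raw_json` and the sample payload are hypothetical.

```python
import json


def parse_raw_json(raw: str):
    """Drop whole-line // comments, then parse the remainder as JSON."""
    kept = [
        line for line in raw.splitlines()
        if not line.lstrip().startswith("//")
    ]
    return json.loads("\n".join(kept))


# Hypothetical model response with one comment line.
raw = """{
  // model commentary: weights are in kg
  "lines": [
    {"description": "Steel bolts", "quantity": 100, "source": "https://example.com"}
  ]
}"""
data = parse_raw_json(raw)
```

Only lines that begin with `//` are removed, so a `//` inside a string value (such as a URL) is left untouched. JSON strings cannot contain raw newlines, so a string value can never start a new line with `//`.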

Supported XML message types

| Regime | XML message | Standard | Notes |
|---|---|---|---|
| IM (import) | IE415B — H1B | IDMS | Standard import, procedure 40, Exporter element |
| IM (import) | IE415B — H2B | IDMS | Customs warehouse, procedure 71, Warehouse + Seller |
| EX (export) | CC515 / IE515B | AES | Planned; generation flow not yet implemented |
| T1/T2 (transit) | CC015C | NCTS | Transit declaration |

Implemented XML generators (IE415BXmlGenerator, CC015CXmlGenerator) read corrected data + declaration metadata and produce schema-valid XML per the Belgian IDMS/NCTS XSDs.
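
The mapping from regime to message type in the table above could be expressed as a small dispatch function. This Python sketch is illustrative only. The function `select_message` and the returned labels are hypothetical and do not reflect how IE415BXmlGenerator or CC015CXmlGenerator are actually selected.

```python
def select_message(regime: str, use_h2b: bool = False) -> str:
    """Map a dossier regime to the XML message named in the table above."""
    if regime == "IM":
        # H2B = customs warehouse (procedure 71), H1B = standard import.
        return "IE415B-H2B" if use_h2b else "IE415B-H1B"
    if regime in ("T1", "T2"):
        return "CC015C"
    if regime == "EX":
        # Export (CC515 / IE515B) is listed as planned, not implemented.
        raise NotImplementedError("AES export generation is not implemented yet")
    raise ValueError(f"unknown regime: {regime!r}")
```

Raising `NotImplementedError` for "EX" keeps the planned export flow visibly distinct from an unknown regime.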

Key workflows

1. New dossier

/NewDossier — upload invoice PDF + packing list PDF (and/or XLSX), select regime, select or create client inline, set UCR/LRN/container.

2. AI extraction

Background worker converts PDF pages to JPEG images (base64), sends to configured vision model with a structured prompt, parses JSON response into goods lines. Multiple extraction attempts can be compared.

3. Review & correction

  • IM: /Dossiers/ReviewImport — review extracted invoice lines, set HS codes, values, quantities
  • EX: /Dossiers/ReviewExport
  • T1/T2: /Dossiers/ReviewTransit — review transit goods, raw JSON panel for debugging
  • Goods breakdown pages: /Dossiers/GoodsBreakdown, /Dossiers/GoodsBreakdownImport, /Dossiers/GoodsBreakdownTransit
  • Transport costs: /Dossiers/TransportCosts

4. Declaration

/Dossiers/Declare — Beheerder/Hoofdgebruiker fills in declaration-specific fields (B/L ref, origin cert, customs office, container indicator, etc.) and generates XML for implemented flows (IM, T1/T2).

5. Clients

/Clients — CRUD for client profiles. Address + authorisation references populated here are auto-loaded into new dossiers.

AI configuration

Two Azure services are used together for PDF extraction:

| Service | Config prefix | Role |
|---|---|---|
| Azure OpenAI | AI:AzureOpenAI:* | Document classification + structured JSON extraction |
| Azure Document Intelligence | DocumentIntelligence:* | OCR / layout analysis (reads PDF natively) |

Both services share the same app-registration credentials (Azure:TenantId, Azure:ClientId, Azure:ClientSecret), so neither needs a service API key. See docs/configuration.md for the full key reference.

Extraction pipeline

flowchart TD
    subgraph Input
        PDF(["Invoice / Packing List PDF"])
        XLSX(["Excel XLSX"])
    end

    PDF --> ADI["Azure Document Intelligence\nprebuilt-layout OCR"]
    XLSX --> XP["XlsxProcessor\n(direct parse)"]

    ADI -->|OCR text| CL["Azure OpenAI\nclassify_document prompt"]
    CL -->|invoice / packing_list| EX["Azure OpenAI\nextract_invoice / extract_packing_list"]
    EX -->|structured JSON| DB[(ExtractionRecord)]
    XP -->|structured JSON| DB

    subgraph Transit
        TPDF(["Transit TAD PDF"])
        TPDF --> TADI["Azure Document Intelligence\nprebuilt-layout OCR"]
        TADI -->|OCR text| TEX["Azure OpenAI\nextract_transit prompt"]
        TEX -->|structured JSON| DB
    end

XLSX files are parsed directly without AI — no OCR or LLM call needed.

Transit PDFs skip classification and go straight to extraction. A forced-vision fallback exists (PDF → JPEG → Azure OpenAI vision) for edge cases where ADI OCR quality is insufficient.
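
The routing described above can be summarised as a function that returns the ordered steps for one uploaded file. This is a hedged Python sketch: the step names and the `plan_steps` function are hypothetical, and it assumes the forced-vision fallback applies to transit PDFs and replaces the OCR step entirely.

```python
def plan_steps(kind: str, force_vision: bool = False) -> list[str]:
    """Return the ordered processing steps for one uploaded file."""
    if kind == "xlsx":
        # Parsed directly: no OCR and no LLM call.
        return ["xlsx_parse"]
    if kind == "transit_pdf":
        if force_vision:
            # Assumption: the fallback skips ADI and sends page images
            # straight to the vision model.
            return ["pdf_to_jpeg", "openai_vision_extract_transit"]
        # Transit skips classification and goes straight to extraction.
        return ["adi_layout_ocr", "openai_extract_transit"]
    if kind == "pdf":
        # Invoice / packing list: OCR, classify, then extract.
        return ["adi_layout_ocr", "openai_classify_document", "openai_extract"]
    raise ValueError(f"unsupported input kind: {kind!r}")
```

For instance, `plan_steps("pdf")` yields three steps, while `plan_steps("xlsx")` yields a single direct-parse step.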

Prompts are stored in the Prompts table (DB) and editable via /Admin/Prompts.

Reference data

  • Tarbel — Belgian tariff/nomenclature data; TarbelService resolves HS codes, descriptions, applicable VAT
  • Country codes — CountryCodeService resolves country names/codes from the Tarbel GeographicalArea table
  • Locode — UN/LOCODE lookup via LocodeService
  • Code lists — CodeListService for customs code validation

SMF XSD validation

SMF wrapper validation can run against bundled XSD files before submit/return. In Docker images built from this repo, XSD files are available at /app/schemas/smf and runtime defaults include:

  • Descartes__Smf__ValidateXsd=true
  • Descartes__Smf__XsdFolder=/app/schemas/smf
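
In ASP.NET Core, a double underscore in an environment variable name stands for the `:` separator of a hierarchical configuration key, so the defaults above correspond to `Descartes:Smf:ValidateXsd` and `Descartes:Smf:XsdFolder`. The Python sketch below only illustrates that naming rule. The helper names are hypothetical, and the real mapping is done by the framework's configuration provider.

```python
def env_to_config_key(name: str) -> str:
    """Translate an environment variable name to a hierarchical config key."""
    return name.replace("__", ":")


def parse_env_lines(lines):
    """Parse NAME=value lines (as in a .env file) into a config dictionary."""
    config = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        name, _, value = line.partition("=")
        config[env_to_config_key(name.strip())] = value.strip()
    return config


settings = parse_env_lines([
    "# SMF validation defaults",
    "Descartes__Smf__ValidateXsd=true",
    "Descartes__Smf__XsdFolder=/app/schemas/smf",
])
```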

Quick start (Docker)

cp .env.example .env
# fill in .env
docker compose up -d

Docker images are published to ghcr.io/rousseauxy/customstuf. A Git tag of the form docker-x.y.z on GitHub produces the image tag x.y.z.
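
The tag convention can be illustrated with a short Python sketch. The function name is hypothetical, and the sketch assumes that x, y, and z are plain decimal numbers, which this README does not state explicitly.

```python
import re

# Assumption: x.y.z consists of three dot-separated decimal numbers.
_DOCKER_TAG = re.compile(r"docker-(\d+\.\d+\.\d+)")


def image_tag_from_git_tag(git_tag: str):
    """Return the image tag for a docker-x.y.z Git tag, or None otherwise."""
    match = _DOCKER_TAG.fullmatch(git_tag)
    return match.group(1) if match else None
```

Under these assumptions, `image_tag_from_git_tag("docker-1.4.2")` returns `"1.4.2"`, while a `v*` tag (used for the IIS deployment) returns `None`.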

See docs/configuration.md for all environment variables.

IIS deployment

Deployed via tag-triggered GitHub Actions + Azure DevOps self-hosted agent. Push a v* tag → build artifact → ADO pipeline → versioned folder on IIS server.

See docs/deployment.md for full setup instructions.

Docs

  • docs/configuration.md — appsettings, secrets, auth/config keys
  • docs/deployment.md — CI/CD pipeline setup (GitHub Actions + ADO + IIS)
  • docs/customs.md — implemented declaration flows, regimes, XML message formats