CustomsHive
Belgian customs declaration processing tool. Accepts invoice PDFs, packing list PDFs, and/or Excel files; extracts structured goods data via AI; allows review and correction; then generates compliant IDMS/NCTS XML declarations for submission (with AES export support planned).
Stack
- ASP.NET Core 10 Razor Pages — EF Core + SQL Server, background processing via System.Threading.Channels
- AI extraction — Azure Document Intelligence (OCR/layout) + Azure OpenAI (classification + structured extraction); transit extraction path with fallback behavior
- Auth — Microsoft Entra ID (OIDC); role-based (Beheerder, Hoofdgebruiker, Gebruiker)
- XML generation — custom generators per message type (no third-party library)
- Reference data — Tarbel/UN-LOCODE/code lists in application databases and services
Core domain concepts
Dossier
Central entity. Each dossier represents one shipment/declaration file. Key fields:
- Ucr — unique customs reference
- Lrn — local reference number (transit)
- ContainerNumber, SupplierCode
- Regime — "IM" (import), "EX" (export), "T1"/"T2" (transit)
- Status — Draft → Queued → Processing → Review → Approved → Submitted (or Failed)
- RawExtraction / CorrectedData — AI output + user corrections stored as JSON
- InvoicePdfPath, PackingPdfPath, XlsxPath — uploaded source documents
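The Status flow above can be read as a small state machine. The following is a minimal sketch in Python, for illustration only (the application itself is ASP.NET Core). It assumes that Failed is reachable from any non-terminal state and that Submitted and Failed are terminal; the source does not state either rule, and `can_transition` is a hypothetical helper.

```python
from enum import Enum


class Status(Enum):
    DRAFT = "Draft"
    QUEUED = "Queued"
    PROCESSING = "Processing"
    REVIEW = "Review"
    APPROVED = "Approved"
    SUBMITTED = "Submitted"
    FAILED = "Failed"


# Happy path in the order given in the field list above.
HAPPY_PATH = [
    Status.DRAFT,
    Status.QUEUED,
    Status.PROCESSING,
    Status.REVIEW,
    Status.APPROVED,
    Status.SUBMITTED,
]


def can_transition(current: Status, target: Status) -> bool:
    """Return True if moving from current to target follows the documented flow."""
    # Assumption: Submitted and Failed are terminal.
    if current in (Status.SUBMITTED, Status.FAILED):
        return False
    # Assumption: any non-terminal state may fail.
    if target is Status.FAILED:
        return True
    # Otherwise only the next step on the happy path is allowed.
    next_step = HAPPY_PATH[HAPPY_PATH.index(current) + 1]
    return target is next_step
```

For example, `can_transition(Status.DRAFT, Status.QUEUED)` is allowed, while skipping straight from Draft to Review is not.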
Client
Importer/client profile. Pre-filled into declarations when a dossier is created. Key fields:
- Code — short identifier (e.g. "SKCH")
- Name, IdentificationNumber (EORI)
- Address: StreetAndNumber (max 70), Postcode (max 17), City (max 35), Country (ISO2, max 2)
- Authorisation references: Fr1 (BTW importeur / FR1), AuthC503, Ref4007, Et14000
- DefaultProcedure — default procedure code (Standaard Regeling)
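The address length limits listed above lend themselves to a simple pre-save check. This is a hedged sketch in Python, for illustration only; `validate_address` and the error wording are hypothetical, and the uppercase requirement on the country code is an assumption on top of the "ISO2, max 2" rule.

```python
# Maximum lengths taken from the Client address fields listed above.
ADDRESS_LIMITS = {
    "StreetAndNumber": 70,
    "Postcode": 17,
    "City": 35,
    "Country": 2,
}


def validate_address(address: dict[str, str]) -> list[str]:
    """Return a list of human-readable problems; an empty list means the address fits."""
    errors = []
    for field, limit in ADDRESS_LIMITS.items():
        value = address.get(field, "")
        if len(value) > limit:
            errors.append(f"{field} exceeds {limit} characters")
    country = address.get("Country", "")
    # Assumption: ISO2 codes are stored as two uppercase letters.
    if country and not (len(country) == 2 and country.isalpha() and country.isupper()):
        errors.append("Country must be a two-letter uppercase ISO code")
    return errors
```

Missing fields are treated as empty here; whether a field is mandatory is a separate question that this sketch does not answer.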
CustomsDeclaration
Per-dossier declaration metadata captured at submission time:
- Reference documents: BillOfLadingRef (N337), OriginCertRef / OriginCertDate (N935)
- Transport: ContainerNumber, ContainerIndicator, DestinationCountry, CountryOfDispatch
- SupervisingCustomsOfficeRef (8-char office code)
- NatureOfTransaction, UseH2B (H1B = standard import, H2B = customs warehouse / procedure 71)
- Static company fields (sender GLN, company EORI, authorisations) read from AppSettings
Extraction
Each AI extraction attempt is stored as an ExtractionRecord with:
- Source — PDF pages converted to base64 images → sent to vision model
- RawJson — raw AI response (may contain // comment lines, stripped before parse)
- ParsedData — structured goods lines (quantity, description, HS code, value, weight, country of origin, etc.)
- ConfidenceScore
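Stripping comment lines from RawJson before parsing could look roughly like the sketch below (Python, for illustration only). It assumes that only whole-line `//` comments occur; trailing comments are deliberately left alone because `//` also appears inside string values such as URLs. `parse_raw_json` is a hypothetical name.

```python
import json


def parse_raw_json(raw: str):
    """Drop lines that consist solely of a // comment, then parse the rest as JSON."""
    kept = [
        line
        for line in raw.splitlines()
        # Only whole-line comments are removed (assumption, see lead-in).
        if not line.lstrip().startswith("//")
    ]
    return json.loads("\n".join(kept))
```

A response that still fails to parse after this step raises `json.JSONDecodeError`, which a caller would likely want to record on the extraction attempt.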
Supported XML message types
| Regime | XML message | Standard | Notes |
|---|---|---|---|
| IM (import) | IE415B — H1B | IDMS | Standard import, procedure 40, Exporter element |
| IM (import) | IE415B — H2B | IDMS | Customs warehouse, procedure 71, Warehouse + Seller |
| EX (export) | CC515 / IE515B | AES | Planned; generation flow not yet implemented |
| T1/T2 (transit) | CC015C | NCTS | Transit declaration |
Implemented XML generators (IE415BXmlGenerator, CC015CXmlGenerator) read corrected data + declaration metadata and produce schema-valid XML per the Belgian IDMS/NCTS XSDs.
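The mapping from regime to message type in the table above can be summarised as a small selection function. This is a sketch in Python, for illustration only; `select_message` and its return shape are hypothetical and do not describe the actual generator classes.

```python
from typing import Optional


def select_message(regime: str, use_h2b: bool = False) -> tuple[str, Optional[str]]:
    """Return (message type, variant) for a dossier regime, following the table above."""
    if regime == "IM":
        # UseH2B switches between standard import (H1B) and customs warehouse (H2B).
        return ("IE415B", "H2B" if use_h2b else "H1B")
    if regime in ("T1", "T2"):
        return ("CC015C", None)
    if regime == "EX":
        # The table marks AES export as planned, so no message is produced yet.
        raise NotImplementedError("AES export (CC515 / IE515B) is planned but not implemented")
    raise ValueError(f"Unknown regime: {regime!r}")
```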
Key workflows
1. New dossier
/NewDossier — upload invoice PDF + packing list PDF (and/or XLSX), select regime, select or create client inline, set UCR/LRN/container.
2. AI extraction
Background worker converts PDF pages to JPEG images (base64), sends to configured vision model with a structured prompt, parses JSON response into goods lines. Multiple extraction attempts can be compared.
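The base64 step of this workflow can be sketched as follows (Python, for illustration only). The sketch starts from JPEG bytes that have already been rendered from PDF pages and wraps them as data URLs; whether the worker uses data URLs or another envelope is an assumption, and `to_data_url` and `encode_pages` are hypothetical names.

```python
import base64


def to_data_url(jpeg_bytes: bytes) -> str:
    """Encode one rendered page as a base64 data URL."""
    encoded = base64.b64encode(jpeg_bytes).decode("ascii")
    return f"data:image/jpeg;base64,{encoded}"


def encode_pages(pages: list[bytes]) -> list[str]:
    """Encode every page image in document order."""
    return [to_data_url(page) for page in pages]
```

Base64 grows the payload by about one third, which is worth keeping in mind for dossiers with many pages.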
3. Review & correction
- IM: /Dossiers/ReviewImport — review extracted invoice lines, set HS codes, values, quantities
- EX: /Dossiers/ReviewExport
- T1/T2: /Dossiers/ReviewTransit — review transit goods, raw JSON panel for debugging
- Goods breakdown pages: /Dossiers/GoodsBreakdown, /Dossiers/GoodsBreakdownImport, /Dossiers/GoodsBreakdownTransit
- Transport costs: /Dossiers/TransportCosts
4. Declaration
/Dossiers/Declare — Beheerder/Hoofdgebruiker fills in declaration-specific fields (B/L ref, origin cert, customs office, container indicator, etc.) and generates XML for implemented flows (IM, T1/T2).
5. Clients
/Clients — CRUD for client profiles. Address + authorisation references populated here are auto-loaded into new dossiers.
AI configuration
Two Azure services are used together for PDF extraction:
| Service | Config prefix | Role |
|---|---|---|
| Azure OpenAI | AI:AzureOpenAI:* | Document classification + structured JSON extraction |
| Azure Document Intelligence | DocumentIntelligence:* | OCR / layout analysis (reads PDF natively) |
Both services authenticate with the same app-registration credentials (Azure:TenantId, Azure:ClientId, Azure:ClientSecret), so no per-service API keys are needed. See docs/configuration.md for the full key reference.
Extraction pipeline
flowchart TD
subgraph Input
PDF(["Invoice / Packing List PDF"])
XLSX(["Excel XLSX"])
end
PDF --> ADI["Azure Document Intelligence\nprebuilt-layout OCR"]
XLSX --> XP["XlsxProcessor\n(direct parse)"]
ADI -->|OCR text| CL["Azure OpenAI\nclassify_document prompt"]
CL -->|invoice / packing_list| EX["Azure OpenAI\nextract_invoice / extract_packing_list"]
EX -->|structured JSON| DB[(ExtractionRecord)]
XP -->|structured JSON| DB
subgraph Transit
TPDF(["Transit TAD PDF"])
TPDF --> TADI["Azure Document Intelligence\nprebuilt-layout OCR"]
TADI -->|OCR text| TEX["Azure OpenAI\nextract_transit prompt"]
TEX -->|structured JSON| DB
end
XLSX files are parsed directly without AI — no OCR or LLM call needed.
Transit PDFs skip classification and go straight to extraction. A forced-vision fallback exists (PDF → JPEG → Azure OpenAI vision) for edge cases where ADI OCR quality is insufficient.
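The routing described by the flowchart and the two notes above can be condensed into one function. This is a sketch in Python, for illustration only; `plan_steps` and the step names are hypothetical labels, and applying the forced-vision fallback only to transit PDFs is an assumption about its scope.

```python
def plan_steps(filename: str, regime: str, force_vision: bool = False) -> list[str]:
    """Return the ordered processing steps for one uploaded file."""
    extension = filename.lower().rsplit(".", 1)[-1]
    if extension == "xlsx":
        # XLSX is parsed directly: no OCR and no LLM call.
        return ["xlsx_direct_parse"]
    if extension != "pdf":
        raise ValueError(f"Unsupported file type: {filename!r}")

    is_transit = regime in ("T1", "T2")
    if is_transit and force_vision:
        # Fallback for poor OCR quality: PDF -> JPEG -> vision model.
        return ["pdf_to_jpeg", "vision_extract_transit"]
    if is_transit:
        # Transit PDFs skip classification.
        return ["adi_layout_ocr", "extract_transit"]
    return ["adi_layout_ocr", "classify_document", "extract_invoice_or_packing_list"]
```

Every path ends with the structured JSON being stored as an ExtractionRecord, which is why that step is not listed separately.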
Prompts are stored in the Prompts table (DB) and editable via /Admin/Prompts.
Reference data
- Tarbel — Belgian tariff/nomenclature data; TarbelService resolves HS codes, descriptions, applicable VAT
- Country codes — CountryCodeService resolves country names/codes from the Tarbel GeographicalArea table
- Locode — UN/LOCODE lookup via LocodeService
- Code lists — CodeListService for customs code validation
SMF XSD validation
SMF wrapper validation can run against bundled XSD files before submit/return. In Docker images built from this repo, XSD files are available at /app/schemas/smf and runtime defaults include:
- Descartes__Smf__ValidateXsd=true
- Descartes__Smf__XsdFolder=/app/schemas/smf
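The two settings above use the ASP.NET Core convention in which a double underscore in an environment variable name stands for the `:` separator of a configuration key, so Descartes__Smf__ValidateXsd corresponds to Descartes:Smf:ValidateXsd. The short Python sketch below, for illustration only, makes that mapping explicit; the helper names are hypothetical.

```python
def env_to_config_key(name: str) -> str:
    """Translate an environment variable name into its colon-separated config key."""
    return name.replace("__", ":")


def config_key_to_env(key: str) -> str:
    """Translate a colon-separated config key into an environment variable name."""
    return key.replace(":", "__")


def parse_env_lines(lines: list[str]) -> dict[str, str]:
    """Turn NAME=value lines into a dict keyed by config key."""
    settings = {}
    for line in lines:
        name, _, value = line.partition("=")
        settings[env_to_config_key(name.strip())] = value.strip()
    return settings
```

The same rule applies to the other keys in this document, for example Azure:TenantId becomes Azure__TenantId when set as an environment variable.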
Quick start (Docker)
Docker images are published to ghcr.io/rousseauxy/customstuf. A GitHub tag of the form docker-x.y.z produces the image tag x.y.z.
See docs/configuration.md for all environment variables.
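The tag convention can be expressed as a small helper. This is a sketch in Python, for illustration only; it assumes x, y and z are plain numbers, and `image_tag_from_git_tag` and `image_reference` are hypothetical names rather than part of the build pipeline.

```python
import re
from typing import Optional

# Assumption: x.y.z consists of three dot-separated numbers.
_DOCKER_TAG = re.compile(r"^docker-(\d+\.\d+\.\d+)$")

IMAGE = "ghcr.io/rousseauxy/customstuf"


def image_tag_from_git_tag(git_tag: str) -> Optional[str]:
    """Return the image tag for a docker-x.y.z Git tag, or None for any other tag."""
    match = _DOCKER_TAG.match(git_tag)
    return match.group(1) if match else None


def image_reference(git_tag: str) -> Optional[str]:
    """Return the full image reference for a docker-x.y.z Git tag."""
    version = image_tag_from_git_tag(git_tag)
    return f"{IMAGE}:{version}" if version else None
```

Tags that do not match the docker- prefix, such as the v* tags used for IIS deployment, return None here.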
IIS deployment
Deployed via tag-triggered GitHub Actions + Azure DevOps self-hosted agent.
Push a v* tag → build artifact → ADO pipeline → versioned folder on IIS server.
See docs/deployment.md for full setup instructions.
Docs
| File | Contents |
|---|---|
| docs/configuration.md | appsettings, secrets, auth/config keys |
| docs/deployment.md | CI/CD pipeline setup (GitHub Actions + ADO + IIS) |
| docs/customs.md | implemented declaration flows, regimes, XML message formats |