
Endpoints

All routes are served over HTTP by the Uvicorn process inside the sinatools container. The base URL defaults to http://localhost:8000 in local development. For a machine-readable specification, see sinatools/openapi.json (OpenAPI v3).

Summary Table

| Method | Path | Description |
|--------|------|-------------|
| POST | /ner | Named-entity recognition via Wojood (mode = nested or flat). |
| GET | /ner | Query-string variant of the NER endpoint. |
| POST | /wsd | Word-sense disambiguation using Salma, with optional gloss enrichment. |
| GET | /wsd | Query-string variant of the WSD endpoint. |
| POST | /morph | Morphological analysis via Alma/Qabas; supports task, flag, include_lemma. |
| GET | /morph | Query-string variant of the morphology endpoint. |
| POST | /dialect/nabra | Retrieve an annotated Nabra sentence with CODA tokens, metadata, and match diagnostics. |
| GET | /dialect/nabra | Query-string variant, with fuzzy search fallbacks. |
| POST | /dialect/nabra/glossary | Map arbitrary text to Nabra glosses (tooltip-style meanings). |
| GET | /dialect/nabra/glossary | Query-string variant of the glossary endpoint. |
| POST | /re | Relation extraction (Hadath) returning event-argument triples and confidence. |
| GET | /re | Query-string variant of the relation endpoint. |
| GET | /health | Liveness check with version metadata. |
| GET | /healthz | Mirror of /health for compatibility. |
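
Each POST route in the table has a query-string GET counterpart. The following is a minimal sketch of building both request forms with the Python standard library. It assumes the POST routes accept a JSON body (this page does not state the body encoding) and uses the local-development base URL; `build_post` and `build_get` are hypothetical helper names, not part of sinatools.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "http://localhost:8000"  # local-development default from this page


def build_post(path: str, params: dict) -> Request:
    """Build a POST request. The JSON body is an assumption, not confirmed here."""
    body = json.dumps(params).encode("utf-8")
    return Request(
        BASE_URL + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_get(path: str, params: dict) -> Request:
    """Build the query-string variant of the same call."""
    return Request(f"{BASE_URL}{path}?{urlencode(params)}", method="GET")


post_req = build_post("/ner", {"text": "القدس", "mode": "flat"})
get_req = build_get("/ner", {"text": "القدس", "mode": "flat"})
# Send either request with urllib.request.urlopen(req) once the service is running.
```

`urlencode` percent-encodes the Arabic text as UTF-8, so the GET URL stays ASCII-safe.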

NER (/ner)

  • Parameters:
      • text (required): input string.
      • mode: nested (default) or flat.
      • include_offsets: boolean; when true, adds start and end character indices to each token.
  • Response: tokens[] with BIO tags.
  • Notes: Wraps sinatools.ner.entity_extractor.extract.
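
Because the response carries BIO tags per token, a client usually needs to merge them into entity spans. Here is a minimal sketch for flat mode only. The per-token keys `token` and `tag` are assumptions about the response shape, and the labels in the sample are illustrative. Nested mode may return several tags per token and is not handled.

```python
def bio_to_entities(tokens):
    """Group flat BIO-tagged tokens into (label, text) pairs."""
    entities = []
    current_label, current_words = None, []
    for item in tokens:
        tag = item["tag"]  # assumed key name
        if tag.startswith("B-"):
            # A new entity starts; close any entity that is still open.
            if current_words:
                entities.append((current_label, " ".join(current_words)))
            current_label, current_words = tag[2:], [item["token"]]
        elif tag.startswith("I-") and current_label == tag[2:]:
            current_words.append(item["token"])
        else:
            # "O", or an I- tag that does not continue the open entity.
            if current_words:
                entities.append((current_label, " ".join(current_words)))
            current_label, current_words = None, []
    if current_words:
        entities.append((current_label, " ".join(current_words)))
    return entities


sample = [
    {"token": "محمد", "tag": "B-PERS"},
    {"token": "علي", "tag": "I-PERS"},
    {"token": "في", "tag": "O"},
    {"token": "القدس", "tag": "B-GPE"},
]
entities = bio_to_entities(sample)
```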

WSD (/wsd)

  • Parameters:
      • text (required): input string.
      • include_gloss (boolean): when true, includes gloss text loaded from a local pickle file.
  • Response: tokens[], each with sense_id, sense_url, and an optional gloss. Offsets are not applicable to this endpoint.

Morphology (/morph)

  • Parameters:
      • text (required): input string.
      • task: full, lemmatization, pos, or root.
      • flag: 1 (top analysis only) or * (all analyses).
      • language: only MSA is currently supported.
      • include_lemma: when set, adds lemma_url and lemma_forms from a local lookup.
  • Response: tokens[] with Qabas identifiers and optional lemma metadata.
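
The allowed values above can be checked on the client before a request is sent. This is a minimal sketch. `morph_params` is a hypothetical helper, and the defaults it uses (task="full", flag="1", include_lemma=False) are assumptions, since this page does not state server-side defaults.

```python
ALLOWED_TASKS = {"full", "lemmatization", "pos", "root"}
ALLOWED_FLAGS = {"1", "*"}


def morph_params(text, task="full", flag="1", include_lemma=False):
    """Validate /morph parameters and return them as a dict."""
    if not text:
        raise ValueError("text is required")
    if task not in ALLOWED_TASKS:
        raise ValueError(f"task must be one of {sorted(ALLOWED_TASKS)}")
    if flag not in ALLOWED_FLAGS:
        raise ValueError("flag must be '1' (top) or '*' (all analyses)")
    return {
        "text": text,
        "task": task,
        "flag": flag,
        "language": "MSA",  # the only language listed as supported
        "include_lemma": include_lemma,
    }


params = morph_params("ذهب الولد", task="pos", flag="*")
```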

Dialect Sentence Lookup (/dialect/nabra)

  • Parameters:
      • sentence_id (int) and/or text (string). If both are supplied, they must refer to the same row.
  • Response: Sentence context plus tokens[] annotated with CODA segmentation, MSA and dialect lemmas, and glosses.
  • Match metadata: match_type (id, exact, substring, or fuzzy) and match_score explain how the lookup result was found.
  • Errors: Returns 400 if the parameters conflict, and 404 with the closest suggestion if no match crosses the similarity threshold.
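
A client can branch on the status code and match metadata described above. The sketch below assumes that the 404 body exposes the closest candidate under a `suggestion` key, which is a hypothetical name. `match_type` and `match_score` come from the description above.

```python
def summarize_lookup(status, body):
    """Turn a /dialect/nabra response into a short verdict string."""
    if status == 400:
        return "rejected: sentence_id and text conflict"
    if status == 404:
        hint = body.get("suggestion")  # hypothetical key name
        return f"no match; closest: {hint}" if hint else "no match"
    match_type = body["match_type"]
    if match_type in ("id", "exact"):
        return f"{match_type} match"
    # substring and fuzzy matches are approximate, so report the score too.
    return f"{match_type} match (score {body['match_score']:.2f})"


verdict = summarize_lookup(200, {"match_type": "fuzzy", "match_score": 0.9})
```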

Dialect Glossary (/dialect/nabra/glossary)

  • Parameters:
      • text (required): sentence to annotate.
      • top_k: maximum number of matches per token (default 3).
      • similarity_threshold: fuzzy match threshold (default 0.65).
      • include_offsets: when true, adds character spans for each token after trimming punctuation.
  • Response: Array of tokens with match diagnostics, corpus references, and (optionally) offsets for tooltip placement.
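
With include_offsets enabled, the character spans can be used to locate each token in the original text for tooltip placement. This is a minimal sketch that assumes each token carries `start`, `end` (end-exclusive), and `gloss` keys. Those key names, the end-exclusive convention, and the sample glosses are illustrative assumptions.

```python
def tooltip_spans(text, tokens):
    """Pair each token's surface text (sliced by offsets) with its gloss."""
    spans = []
    for tok in tokens:
        start, end = tok["start"], tok["end"]  # assumed keys, end-exclusive
        spans.append((text[start:end], tok.get("gloss")))
    return spans


text = "شو، اخبارك"
tokens = [
    {"start": 0, "end": 2, "gloss": "what"},
    {"start": 4, "end": 10, "gloss": "your news"},
]
spans = tooltip_spans(text, tokens)
```

The comma at index 2 falls outside both spans, matching the description that offsets are computed after punctuation is trimmed.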

Relation Extraction (/re)

  • Parameters: text (required).
  • Response: relations[] with predicate, subject, object, confidence.
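
Since each relation carries a confidence value, a client may want to keep only the stronger ones. Below is a minimal sketch using the field names listed above. The 0.5 cutoff and the sample data are arbitrary illustrations, not values recommended by the service.

```python
def confident_triples(relations, min_confidence=0.5):
    """Return (subject, predicate, object) triples, highest confidence first."""
    kept = [r for r in relations if r["confidence"] >= min_confidence]
    kept.sort(key=lambda r: r["confidence"], reverse=True)
    return [(r["subject"], r["predicate"], r["object"]) for r in kept]


relations = [
    {"subject": "A", "predicate": "located_in", "object": "B", "confidence": 0.4},
    {"subject": "C", "predicate": "agent_of", "object": "D", "confidence": 0.7},
    {"subject": "E", "predicate": "happened_at", "object": "F", "confidence": 0.9},
]
triples = confident_triples(relations)
```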

Health (/health, /healthz)

  • Returns service metadata and status; used by Compose and upstream health probes.
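
A health probe can be approximated in a few lines. The sketch below polls the endpoint until it answers, assuming that a healthy service responds with HTTP 200 and a JSON body (neither is specified on this page). `fetch_health` and `wait_until_healthy` are hypothetical helper names, and the `fetch` argument exists so the loop can be exercised without a running service.

```python
import json
import time
from urllib.error import URLError
from urllib.request import urlopen


def fetch_health(url, timeout=2.0):
    """Return (status, parsed JSON body) for a health URL."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.status, json.loads(resp.read().decode("utf-8"))


def wait_until_healthy(url, attempts=5, delay=1.0, fetch=fetch_health):
    """Poll until the service answers 200, or give up after `attempts` tries."""
    for attempt in range(attempts):
        try:
            status, _body = fetch(url)
            if status == 200:
                return True
        except (URLError, OSError, ValueError):
            # Connection refused, HTTP error, or a non-JSON body: retry.
            pass
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

A typical call would be `wait_until_healthy("http://localhost:8000/health")`.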