# Requirements

## Overview
The NLP Lab sidecar is our deployable shell for Arabic NLP capabilities. Today it wraps the SinaTools SDK models (NER, WSD, morphology, relation extraction) plus the offline Nabra dialect corpus utilities, and it will host additional models as they come online.
## Runtime Environment

- Python: 3.11 (configured via `SINA_PYTHON_VERSION`).
- Base image: Derived from our `sinatools/Dockerfile`, with the SinaTools SDK installed from PyPI.
- Process: `uvicorn main:app` (single process, with multiple workers via `UVICORN_WORKERS`).
- CPU / Memory: Align with container defaults; tune `UVICORN_WORKERS` for throughput.
## Required Assets

- SinaTools SDK models: Pulled during the image build; includes Wojood, Salma, Alma, and Hadath (collectively, our current Arabic NLP stack).
- Nabra dataset: CSV files mounted at `/app/Nabra` (`Nabra-dataset.csv`, `Nabra RowText_sentences.csv`). These provide dialect annotations and glosses for the glossary endpoint.
- OpenAPI schema: Generated at `sinatools/openapi.json` for reference and SDK generation.
## Configuration

| Variable | Purpose | Default |
|---|---|---|
| `SINA_PORT` | Uvicorn listen port inside the container | `8000` |
| `SINA_HOST_PORT` | Published host port | `8000` |
| `SINA_WARM` | `0` (lazy) or `all` (preload models on startup) | `0` |
| `SINA_ENABLE_CORS` | Enable permissive CORS during development | `false` |
| `UVICORN_WORKERS` | Worker count for Uvicorn | `2` |
| `SINA_VERSION` | SDK version tag (for metadata only) | `0.1.36` |
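Inside the app, these variables would typically be read once from the environment with the documented defaults as fallbacks. A minimal sketch (variable names and defaults come from the table; the `load_settings` helper and the config layer itself are assumptions, not the sidecar's actual code):

```python
import os

def load_settings() -> dict:
    """Read sidecar settings from the environment, falling back to the
    defaults documented in the Configuration table. Hypothetical helper."""
    return {
        "port": int(os.environ.get("SINA_PORT", "8000")),
        "host_port": int(os.environ.get("SINA_HOST_PORT", "8000")),
        "warm": os.environ.get("SINA_WARM", "0"),  # "0" (lazy) or "all"
        "enable_cors": os.environ.get("SINA_ENABLE_CORS", "false").lower() == "true",
        "workers": int(os.environ.get("UVICORN_WORKERS", "2")),
        "sdk_version": os.environ.get("SINA_VERSION", "0.1.36"),
    }
```

Casting at the boundary (int for ports and workers, a boolean for the CORS flag) keeps the rest of the app free of string-typed config.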
## Local Development

- Mount `sinatools/app` into the container (already configured in `docker-compose.override.yml`).
- Install dev dependencies (`pytest`, `requests`) inside the container if deeper testing is needed.
- Run `pytest /app/tests/test_api.py` after changes.
## External Integrations

- Platform: Other services call the REST endpoints; there is no direct DB or message-queue dependency.
- Monitoring: TBD (hook into the existing Prometheus scrape once we expose metrics).
- Feature toggles: API parameters handle optional metadata (`include_gloss`, `include_lemma`). No runtime config yet.
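Callers opt into the optional metadata via query parameters. A hedged sketch of building such a request URL with only the standard library — the `/glossary` path, the `q` parameter, and the base URL are hypothetical; only the `include_gloss` / `include_lemma` parameter names come from this document:

```python
from urllib.parse import urlencode

def glossary_url(term: str, *,
                 include_gloss: bool = False,
                 include_lemma: bool = False,
                 base: str = "http://localhost:8000") -> str:
    """Build a request URL for the (hypothetical) glossary endpoint,
    adding the optional-metadata toggles only when enabled."""
    params = {"q": term}  # "q" is an assumed parameter name
    if include_gloss:
        params["include_gloss"] = "true"
    if include_lemma:
        params["include_lemma"] = "true"
    return f"{base}/glossary?{urlencode(params)}"
```

Omitting the toggles when they are off keeps request URLs minimal and lets the server apply its own defaults.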