30 Commits

Author | SHA1 | Message | Date
m.dabbagh | b57792eb41 | feat: add pptx_extractor and html_extractor | 2026-01-31 18:23:04 +03:30
m.dabbagh | b53f8c47d3 | add title to Document model and remove display_name from DocumentMetadata | 2026-01-28 22:13:55 +03:30
m.dabbagh | 6259220629 | disable swagger auth | 2026-01-28 22:10:24 +03:30
m.dabbagh | 2753b913fb | enable swagger auth | 2026-01-28 10:46:37 +03:30
m.dabbagh | a1fbd12874 | fix: paragraph_chunker was adding "None" when section.title was None | 2026-01-27 21:08:32 +03:30
m.dabbagh | 80dd901e42 | fix: remove file extension from DocumentMetadata.display_name | 2026-01-25 11:33:50 +03:30
m.dabbagh | 9e1e49bc59 | add document title and section title to the beginning of each chunk in paragraph chunker | 2026-01-25 11:32:35 +03:30
m.dabbagh | cda128e438 | one paragraph per chunk in paragraph chunking method | 2026-01-25 11:03:54 +03:30
m.dabbagh | 8ecbd88498 | make DocumentSection.title optional | 2026-01-24 20:25:34 +03:30
m.dabbagh | 3aad734140 | comment out swagger authentication | 2026-01-24 17:06:25 +03:30
m.dabbagh | c6302bc792 | add api-key header and swagger authentication | 2026-01-24 17:05:29 +03:30
m.dabbagh | 2ccb38179d | use docling in extractors | 2026-01-24 13:43:07 +03:30
m.dabbagh | ad163eb665 | change api defaults | 2026-01-20 23:36:02 +03:30
m.dabbagh | 91f8035043 | add s3 storage | 2026-01-20 12:46:47 +03:30
m.dabbagh | 0c09c79a2e | refactor api routes | 2026-01-19 22:03:36 +03:30
m.dabbagh | 6086ddf818 | add /chunk route | 2026-01-19 21:54:23 +03:30
m.dabbagh | 2c4a59f84b | add extract endpoint | 2026-01-19 16:05:55 +03:30
m.dabbagh | 0084ae6bc0 | fix | 2026-01-19 15:42:46 +03:30
m.dabbagh | e783d92eca | make chunking method an enum and remove some redundant code in core and api | 2026-01-19 15:19:11 +03:30
m.dabbagh | e2e1c86dd4 | fix sorting and merging in zip extractor | 2026-01-19 14:00:17 +03:30
m.dabbagh | 6072bb188c | fix a bug in zip extractor | 2026-01-18 20:57:01 +03:30
m.dabbagh | 32ca394d91 | some fixes on the output text | 2026-01-18 20:05:41 +03:30
m.dabbagh | 90c10c79fa | add text api | 2026-01-18 19:38:53 +03:30
m.dabbagh | 13b887260f | add zip extractor adapter | 2026-01-18 15:44:49 +03:30
m.dabbagh | f06370e0b9 | some fixes in concrete implementations of chunkers | 2026-01-08 16:47:50 +03:30
m.dabbagh | 2c375ce6bd | make the domain general and open to adding a crawling system | 2026-01-08 04:57:35 +03:30
m.dabbagh | 359026fa98 | add SourceFile, DocumentSection models and markdown parser | 2026-01-08 03:46:35 +03:30
m.dabbagh | 10a619494b | fix potential race condition in DocumentProcessorService._chunk_document by making the context stateless | 2026-01-07 21:57:22 +03:30
m.dabbagh | fd39184c0c | some fixes on architecture; make bootstrap wrap only the hexagonal plus the outgoing adapters | 2026-01-07 21:02:38 +03:30
m.dabbagh | 70f5b1478c | init | 2026-01-07 19:15:46 +03:30