Dabbagh m.dabbagh
  • Joined on 2025-10-05
m.dabbagh pushed to main at m.dabbagh/text_processor 2026-01-31 14:53:13 +00:00
b57792eb41 feat: add pptx_extractor and html_extractor
m.dabbagh pushed to main at m.dabbagh/text_processor 2026-01-29 00:58:50 +00:00
b53f8c47d3 add title to Document model and remove display_name from DocumentMetadata
6259220629 disable swagger auth
Compare 2 commits »
m.dabbagh pushed to main at m.dabbagh/text_processor 2026-01-28 07:16:46 +00:00
2753b913fb enable swagger auth
a1fbd12874 fix: paragraph_chunker was adding "None" when the section.title was None
Compare 2 commits »
m.dabbagh pushed to main at m.dabbagh/text_processor 2026-01-25 08:04:04 +00:00
80dd901e42 fix: remove file extension from DocumentMetadata.display_name
9e1e49bc59 add document title and section title to the beginning of each chunk in paragraph chunker
cda128e438 one paragraph per chunk in paragraph chunking method
8ecbd88498 make DocumentSection.title optional
3aad734140 comment out swagger authentication
Compare 8 commits »
m.dabbagh pushed to main at m.dabbagh/text_processor 2026-01-20 09:17:09 +00:00
91f8035043 add s3 storage
0c09c79a2e refactor api routes
6086ddf818 add /chunk route
Compare 3 commits »
m.dabbagh pushed to main at m.dabbagh/text_processor 2026-01-19 12:36:18 +00:00
2c4a59f84b add extract endpoint
e783d92eca make chunking method enum and remove some redundant code in core and api
e2e1c86dd4 fix sorting and merging in zip extractor
Compare 4 commits »
m.dabbagh created branch main in m.dabbagh/text_processor 2026-01-18 17:37:01 +00:00
m.dabbagh pushed to main at m.dabbagh/text_processor 2026-01-18 17:37:01 +00:00
6072bb188c fix a bug in zip extractor
32ca394d91 some fixes on the output text
90c10c79fa add text api
13b887260f add zip extractor adapter
f06370e0b9 some fixes in concrete implementations of chunkers
m.dabbagh created repository m.dabbagh/text_processor 2026-01-18 17:31:36 +00:00