My earlier post about converting documents to Markdown with Microsoft MarkItDown got a lot of boosts on Mastodon.
-
My earlier post about converting documents to Markdown with Microsoft MarkItDown got a lot of boosts on Mastodon. On Reddit, people said Docling is better, so I tried it and I agree. Docling does a much better job preserving structure and tags, and it is definitely worth checking out! https://docling-project.github.io/docling/
-
@chikim You're saying it actually translates existing accessibility tags into the output format's equivalent? That is very cool if so.
-
@twynn Yes, it preserves headings, for example, whereas MarkItDown dropped the headings from both PDF and DOCX. Also, I haven't tried it, but Docling has an automatic image caption feature with vllm.
-
@chikim When you have a chance, can you see if it really does preserve tags? STR:
1. Download <https://zoomcorp.com/media/documents/E_H1essential_v1.2_Supplementary.pdf>.
2. Under step 1, one of the images should read: "Illustration showing a frame around the area of the rectangular MENU button, which is near the middle on the right side of the unit."
-
@twynn I'm not sure about alt text. I meant headings. I doubt it would keep the image alt description. I don't even know if Markdown has an alt tag feature for images.
@chikim I think the HTML output format would, though I also don't know about Markdown. The problem with headings is that I'm not sure if it's getting those from the structure tags or using heuristics.
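For what it's worth, Markdown itself does have image alt text: the inline syntax is `![alt text](target)`. Whether a converter fills it in is a separate question. Below is a minimal sketch for checking converted output, assuming a hypothetical helper name `image_alts` and a made-up sample string (this is not real Docling output):

```python
import re

# Matches Markdown inline images of the form ![alt text](target).
IMAGE_RE = re.compile(r"!\[([^\]]*)\]\(([^)]*)\)")


def image_alts(markdown: str) -> list[str]:
    """Return the alt text of every inline image in a Markdown string."""
    return [match.group(1) for match in IMAGE_RE.finditer(markdown)]


# Hypothetical converter output, written by hand for illustration only.
sample = (
    "## Step 1\n\n"
    "![Illustration showing a frame around the MENU button](menu.png)\n\n"
    "![](untagged.png)\n"
)

alts = image_alts(sample)
print(alts)
```

An empty string in the result means the image was emitted without a description, which is the case the STR above is meant to catch.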
-