@twynn I'm not sure about alt text. I meant headings. I doubt it would keep the image alt descriptions. I don't even know if Markdown has an alt tag feature for images.
chikim@mastodon.social
Posts
-
@twynn Yes, like it preserves headings, whereas MarkItDown dropped the headings from both PDF and docx. Also, I haven't tried it, but Docling has an automatic image caption feature with vllm.
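One rough way to check what a converter actually kept is to scan its Markdown output for headings and image alt text. Markdown does have a slot for alt text: `![alt text](path)`. The sketch below is stdlib-only Python. `summarize_markdown` is a made-up helper name, not part of MarkItDown or Docling, and the regexes only cover `#`-style headings and inline images, so a `#` line inside a code block would be miscounted as a heading.

```python
import re

# ATX heading: up to three spaces of indent, one to six '#', whitespace, then the title.
HEADING_RE = re.compile(r"^ {0,3}(#{1,6})[ \t]+(.+?)[ \t]*$", re.MULTILINE)
# Inline image: ![alt text](target), optionally followed by a quoted title.
IMAGE_RE = re.compile(r"!\[([^\]]*)\]\(([^)\s]+)[^)]*\)")


def summarize_markdown(text):
    """Return the headings and images found in a Markdown string.

    Hypothetical helper for illustration only; it is a regex scan,
    not a full Markdown parser.
    """
    headings = [(len(m.group(1)), m.group(2)) for m in HEADING_RE.finditer(text)]
    images = [(m.group(1), m.group(2)) for m in IMAGE_RE.finditer(text)]
    return {"headings": headings, "images": images}


sample = """# Report

Some text.

## Results

![A bar chart of sales](chart.png)
![](photo.jpg)
"""

summary = summarize_markdown(sample)
for level, title in summary["headings"]:
    print(f"h{level}: {title}")
for alt, target in summary["images"]:
    # An empty alt string means the image has no description.
    print(f"image {target}: alt={alt!r}")
```

Running the same function on the output of two converters for the same file gives a quick side-by-side of which one dropped headings or left images without a description.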
-
My earlier post about converting documents to Markdown with Microsoft MarkItDown got a lot of boosts on Mastodon. On Reddit, people said Docling is better, so I tried it and I agree. Docling does a much better job preserving structure and tags, and it is definitely worth checking out! https://docling-project.github.io/docling/
-
@x0 @marvellousmachine @ondrosik A lot of people also mentioned that Docling is better! It might be worth checking out.
-
@x0 @marvellousmachine @ondrosik Not sure if Pandoc has support for OCR, Outlook messages, speech transcription, LLM support for MCP server, etc. Total speculation, but I suspect they created it specifically to digest all kinds of documents for LLM training.
-
Just discovered Microsoft has a tool to convert documents (pdf, docx, pptx, xlsx, html, Outlook messages...) to Markdown as well as transcribe audio and YouTube links! https://github.com/microsoft/markitdown