My earlier post about converting documents to Markdown with Microsoft MarkItDown got a lot of boosts on Mastodon.

    chikim@mastodon.social
    wrote last edited by
    #1

    My earlier post about converting documents to Markdown with Microsoft MarkItDown got a lot of boosts on Mastodon. On Reddit, people said Docling is better, so I tried it and I agree. Docling does a much better job preserving structure and tags, and it is definitely worth checking out! https://docling-project.github.io/docling/

      twynn@mas.to
      wrote last edited by
      #2

      @chikim You're saying it actually translates existing accessibility tags into the output format's equivalent? That is very cool if so.

        chikim@mastodon.social
        wrote last edited by
        #3

        @twynn Yes, for example it preserves headings, whereas MarkItDown dropped the headings from both PDF and DOCX. Also, I haven't tried it, but Docling has an automatic image caption feature with vllm.
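
A minimal sketch of the heading comparison described above, assuming Docling is installed (`pip install docling`) and using its documented `DocumentConverter` entry point. The helper names `convert_with_docling` and `list_headings` are hypothetical, and `list_headings` only looks for `#`-style headings in the Markdown output, so it is a rough check rather than a full Markdown parser.

```python
import re


def convert_with_docling(source: str) -> str:
    """Convert a document (local path or URL) to Markdown with Docling."""
    # Imported lazily so the heading helper below also works without Docling.
    from docling.document_converter import DocumentConverter

    result = DocumentConverter().convert(source)
    return result.document.export_to_markdown()


# ATX heading: up to 3 leading spaces, 1-6 '#' characters, whitespace, text.
# Assumption: headings inside fenced code blocks are not filtered out.
_HEADING = re.compile(r"^ {0,3}(#{1,6})[ \t]+(.*\S)[ \t]*$", re.MULTILINE)


def list_headings(markdown: str) -> list[tuple[int, str]]:
    """Return (level, text) for every '#'-style heading in the Markdown."""
    return [(len(m.group(1)), m.group(2)) for m in _HEADING.finditer(markdown)]
```

Calling `list_headings(convert_with_docling("manual.pdf"))` (file name hypothetical) and running `list_headings` on another converter's output for the same file gives a quick way to see whether the headings survived.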

          twynn@mas.to
          wrote last edited by
          #4

          @chikim When you have a chance, can you see if it really does preserve tags? STR:
          1. Download <https://zoomcorp.com/media/documents/E_H1essential_v1.2_Supplementary.pdf>.
          2. Under step 1, one of the images should read: "Illustration showing a frame around the area of the rectangular MENU button, which is near the middle on the right side of the unit."

            chikim@mastodon.social
            wrote last edited by
            #5

            @twynn I'm not sure about alt text. I meant headings. I doubt it would keep the image alt description. I don't even know if Markdown has an alt text feature for images.
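
For reference, Markdown image syntax does carry a text description: `![description](url)`, where the bracketed part plays the role of alt text. Whether Docling fills it from a PDF's alt description is a separate question that this thread does not settle. Below is a small sketch for checking converter output; the helper name and the sample string are hypothetical, and the regex covers only the simple inline form, not reference-style images or nested brackets.

```python
import re

# Inline image: ![alt text](destination "optional title")
_IMAGE = re.compile(r'!\[([^\]]*)\]\(\s*([^)\s]+)(?:\s+"[^"]*")?\s*\)')


def image_alt_texts(markdown: str) -> list[tuple[str, str]]:
    """Return (alt_text, destination) for each inline Markdown image."""
    return [(m.group(1), m.group(2)) for m in _IMAGE.finditer(markdown)]


# Hypothetical converter output: one image with alt text, one without.
sample = "![Illustration of the MENU button](images/menu.png)\n\n![](images/blank.png)\n"
for alt, dest in image_alt_texts(sample):
    print(f"{dest}: {alt or '(no alt text)'}")
```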

              twynn@mas.to
              wrote last edited by
              #6

              @chikim I think the HTML output format would, though I also don't know about Markdown. The problem with headings is that I'm not sure if it's getting those from the structure tags or using heuristics.
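
One way to test the HTML guess is to export a document to HTML and list each `<img>` with its `alt` attribute. The sketch below uses only Python's standard `html.parser`; the class name, function name, and sample markup are hypothetical, and it says nothing about whether a converter's headings come from structure tags or heuristics.

```python
from html.parser import HTMLParser


class ImgAltCollector(HTMLParser):
    """Collect (src, alt) pairs for every <img> element."""

    def __init__(self) -> None:
        super().__init__()
        self.images = []  # list of (src, alt) tuples

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            attr = dict(attrs)
            # alt is None when the attribute is missing or has no value.
            self.images.append((attr.get("src", ""), attr.get("alt")))


def collect_img_alts(html: str):
    """Parse an HTML string and return the (src, alt) pairs it contains."""
    collector = ImgAltCollector()
    collector.feed(html)
    collector.close()
    return collector.images


# Hypothetical HTML output: one image with alt text, one without.
sample_html = (
    '<h2>Step 1</h2>'
    '<img src="a.png" alt="Illustration of the MENU button">'
    '<img src="b.png"/>'
)
for src, alt in collect_img_alts(sample_html):
    print(src, "->", alt if alt is not None else "(no alt attribute)")
```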
