Just discovered Microsoft has a tool to convert documents (PDF, docx, pptx, xlsx, HTML, Outlook messages...) to Markdown as well as transcribe audio and YouTube links!

10 Posts 4 Posters 0 Views
    chikim@mastodon.social
    wrote last edited by
    #1

    Just discovered Microsoft has a tool to convert documents (PDF, docx, pptx, xlsx, HTML, Outlook messages...) to Markdown as well as transcribe audio and YouTube links! https://github.com/microsoft/markitdown

      jcast432@mastodon.social
      wrote last edited by
      #2

      @DavidGoldfield @chikim Oh my goodness! I ask, and then I receive! Just a few minutes ago, I wanted to convert a Word document directly into Markdown. Thank you!

        marvellousmachine@dragonscave.space
        wrote last edited by
        #3

        @chikim Thanks for the info. @ondrosik This might well be useful for us. I've already forwarded it to Peter.

          x0@dragonscave.space
          wrote last edited by
          #4

          @jcast432 @DavidGoldfield @chikim For something that might be more presentable, try https://pandoc.org/app; it has a CLI too. Pandoc is the well-known standard tool for document conversion.

            x0@dragonscave.space
            wrote last edited by
            #5

            @marvellousmachine @chikim @ondrosik Any particular reason pandoc doesn't cut it here? Is it the transcription? Or does pandoc not read some of those formats like PDF?

              marvellousmachine@dragonscave.space
              wrote last edited by
              #6

              @x0 @chikim @ondrosik Honestly, I keep forgetting about it. 🙂

                chikim@mastodon.social
                wrote last edited by
                #7

                @x0 @marvellousmachine @ondrosik Not sure if Pandoc has support for OCR, Outlook messages, speech transcription, LLM integration, an MCP server, etc. Total speculation, but I suspect they created it specifically to digest all kinds of documents for LLM training.

                  chikim@mastodon.social
                  wrote last edited by
                  #8

                  @x0 @marvellousmachine @ondrosik A lot of people also mentioned that docling is better! It might be worth checking out.

                    x0@dragonscave.space
                    wrote last edited by
                    #9

                    @chikim @marvellousmachine @ondrosik It actually says so in the README: MS did create it for training AI which speaks Markdown, and yeah, those features are definitely not part of pandoc.

                      x0@dragonscave.space
                      wrote last edited by
                      #10

                      @chikim @marvellousmachine @ondrosik Now what I'd love to see is something that can take, say, a PDF mathematical paper and convert it into Markdown and LaTeX. Such tools exist (marker-pdf and nougat-ocr), but IDK how workable they are with structured data like tables, and I haven't paired them with cloud AI just yet because I haven't got the tokens and usage is a bit difficult.
