Is anyone good with #Rstats and #regex ?

    guyjantic@infosec.exchange wrote (#1):

    Is anyone good with #Rstats and #regex ? I'm having issues.

    strings <- c("150 hertz", "70 hz", NA, "between 87 and 100 hz ocillations", "15hz", "triangle 110 hertz", "144Hz, Sine waveform", "It is a hysterical idling. More vibraton than sound.", NA)

    I want to replace each string with the digits (well, the first set) found in it, if any. I try this:

    sub("(^.*)(\\d{2,5})(.*$)", "\\2", strings)

    I get this as a result:

    [1] "50"
    [2] "70"
    [3] NA
    [4] "00"
    [5] "15"
    [6] "10"
    [7] "44"
    [8] "It is a hysterical idling. More vibraton than sound."
    [9] NA

    I expect to get all digits (the first set in each string) if they are from 2 to 5 digits long. Instead, I only get 2 digits.

    Using similar regex in #geany just to prepare this little example I got the expected behavior. I've updated and restarted R. I've used sub and gsub. Same result. If I specify \d{3,5} I get three digits. If I say \d{1,3} I get one digit. I always get the number of digits specified in the first value in the curly brackets.

    Maybe R is just vomiting or something. But if you know of an issue with R and regex that results in this, please let me know.

      rastrau@swiss.social wrote (#2):

      @guyjantic I don’t know if that’s an option, but stringr::str_extract() could be interesting to achieve this (if you don’t mind the dependency)? https://stringr.tidyverse.org/reference/str_extract.html #rstats
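
A minimal sketch of the extraction approach suggested here, shown first in base R and then with stringr if it happens to be installed. The variable names `m`, `hit` and `first_digits` are illustrative and not from the thread. Unlike sub(), extraction returns NA for strings that contain no match instead of leaving them unchanged.

```r
strings <- c("150 hertz", "70 hz", NA, "between 87 and 100 hz ocillations",
             "15hz", "triangle 110 hertz", "144Hz, Sine waveform",
             "It is a hysterical idling. More vibraton than sound.", NA)

# Base R: locate the first run of 2 to 5 digits in each string.
m <- regexpr("\\d{2,5}", strings)

# Start with NA everywhere, then fill in only the elements that matched.
first_digits <- rep(NA_character_, length(strings))
hit <- !is.na(m) & m > 0
first_digits[hit] <- regmatches(strings, m)
print(first_digits)

# With stringr available, the same extraction is a single call.
if (requireNamespace("stringr", quietly = TRUE)) {
  print(stringr::str_extract(strings, "\\d{2,5}"))
}
```

Because the pattern contains no leading `.*`, nothing competes with the digit group for characters, so the whole first run of digits is returned.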

        thatdnaguy@genomic.social wrote (#3):

        @guyjantic Could the .*<number stuff>.* structure be confusing it?

        The . matches digits too, so the leading .* can consume part of the number.

        Maybe something like ^\\s*(\\d{2,5}).*$
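
To see how the characters are divided between the groups, here is a small sketch that prints each capture group of the original pattern for one string. It assumes `perl = TRUE` (available in regexec() from R 3.3.0) so that the backtracking rules are the well-documented PCRE ones; the name `parts` is illustrative.

```r
x <- "150 hertz"

# regexec() reports the position of the full match and of every capture group;
# regmatches() turns those positions into the matching substrings.
parts <- regmatches(x, regexec("(^.*)(\\d{2,5})(.*$)", x, perl = TRUE))[[1]]
names(parts) <- c("match", "group1", "group2", "group3")
print(parts)
# group1 is "1", group2 is "50", group3 is " hertz"
```

The greedy first group takes as much as it can and gives back only the two digits that `\\d{2,5}` needs as a minimum, which is why the result always has as many digits as the lower bound in the braces.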

          guyjantic@infosec.exchange wrote (#4):

          @rastrau I don't mind tidyverse dependencies at all. I have tried stringr::str_replace() and it gave me exactly the results of sub() and gsub(), but I haven't tried str_extract() yet. I'll give it a shot. Thanks.

            jorismeys@mstdn.social wrote (#5):

            @guyjantic
            You need to make that non-greedy. Might be as easy as

            sub("(^.*?)(\\d{2,5})(.*?$)", "\\2", strings)

            This makes the matches before and after "lazy", meaning they match as few characters as possible.

            Edit: I haven't tested it, since I'm on my phone right now.
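
A quick side-by-side of the greedy and lazy versions, as a sketch. It assumes `perl = TRUE`, so that the `*?` quantifier follows PCRE rules; the thread does not say which engine the suggestion was meant for. The names `greedy` and `lazy` are illustrative.

```r
strings <- c("150 hertz", "70 hz", NA, "between 87 and 100 hz ocillations",
             "15hz", "triangle 110 hertz", "144Hz, Sine waveform",
             "It is a hysterical idling. More vibraton than sound.", NA)

# Greedy prefix: .* grabs everything it can, leaving only two digits.
greedy <- sub("(^.*)(\\d{2,5})(.*$)", "\\2", strings, perl = TRUE)

# Lazy prefix: .*? stops at the first place where 2 to 5 digits can follow.
lazy <- sub("(^.*?)(\\d{2,5})(.*?$)", "\\2", strings, perl = TRUE)

print(data.frame(input = strings, greedy = greedy, lazy = lazy))
```

In the lazy version the fourth string yields "87" rather than "00", because the prefix stops at the first run of digits instead of the last one.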

              rastrau@swiss.social wrote (#6):

              @guyjantic Since the regex seems to swallow the first digit, I suspect “.” in the first group is too generous? Maybe \D (non-digit) would be better? But I’m not a regexpert (alas).
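
One way the \D idea could look, as a sketch using the default regex engine. The exact pattern is an assumption, since the post does not spell one out, and `first_run` is an illustrative name.

```r
strings <- c("150 hertz", "70 hz", NA, "between 87 and 100 hz ocillations",
             "15hz", "triangle 110 hertz", "144Hz, Sine waveform",
             "It is a hysterical idling. More vibraton than sound.", NA)

# \\D* can only consume non-digits, so it cannot eat into the number.
first_run <- sub("^\\D*(\\d{2,5}).*$", "\\1", strings)
print(first_run)
```

Note that this pattern only matches when the first digit in the string starts a run of at least two digits; a string such as "5 or 150 hz" would be returned unchanged.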

                guyjantic@infosec.exchange wrote (#7):

                @rastrau Hey, that works! Thanks a ton!

                  rastrau@swiss.social wrote (#8):

                  @guyjantic 🥳 Yay! You’re most welcome.

                    guyjantic@infosec.exchange wrote (#9):

                    @rastrau I suspect you're more of a regexpert than I am, and your explanation seems plausible.

                      nxskok@cupoftea.social wrote (#10):

                      @guyjantic If you are happy with just the first numerical thing, readr::parse_number() works really well.
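
A hedged sketch of the parse_number() route. It only runs the readr call if the package is installed, and it suppresses the parsing warning that readr may raise for strings without a number. Keep in mind that this returns numeric values and does not enforce the 2 to 5 digit limit from the original pattern. The name `nums` is illustrative.

```r
strings <- c("150 hertz", "70 hz", NA, "between 87 and 100 hz ocillations",
             "15hz", "triangle 110 hertz", "144Hz, Sine waveform",
             "It is a hysterical idling. More vibraton than sound.", NA)

nums <- if (requireNamespace("readr", quietly = TRUE)) {
  # parse_number() keeps the first number it finds and drops the other text.
  suppressWarnings(readr::parse_number(strings))
} else {
  NULL  # readr is not installed, so there is nothing to show
}
print(nums)
```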

                        jmkinen@mementomori.social wrote (#11):

                        @guyjantic Is "87" what you want from the fourth string?

                        If so, making it non-greedy seems to work:
                        sub(".*?(\\d{2,5}).*", "\\1", strings)
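
Since sub() leaves non-matching strings untouched (the eighth string comes back as the full sentence), a possible follow-up is to blank those out and convert the rest to integers. This is a sketch under the assumption that NA is the desired value for strings without a 2 to 5 digit run; `has_digits` and `hz` are illustrative names, and `perl = TRUE` is used so the lazy quantifier follows PCRE rules.

```r
strings <- c("150 hertz", "70 hz", NA, "between 87 and 100 hz ocillations",
             "15hz", "triangle 110 hertz", "144Hz, Sine waveform",
             "It is a hysterical idling. More vibraton than sound.", NA)

# grepl() returns FALSE for NA input, so NA strings fall into the NA branch.
has_digits <- grepl("\\d{2,5}", strings)

# Keep the extracted digits only where a match exists, otherwise use NA.
hz <- ifelse(has_digits,
             sub(".*?(\\d{2,5}).*", "\\1", strings, perl = TRUE),
             NA_character_)
hz <- as.integer(hz)
print(hz)
```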
