Is anyone good with #Rstats and #regex ?

Tags: rstats, regex, geany
11 posts, 6 posters
  • guyjantic@infosec.exchange wrote:

    Is anyone good with #Rstats and #regex ? I'm having issues.

    strings <- c("150 hertz", "70 hz", NA, "between 87 and 100 hz ocillations", "15hz", "triangle 110 hertz", "144Hz, Sine waveform", "It is a hysterical idling. More vibraton than sound.", NA)

    I want to replace each string with the digits (well, the first set) found in it, if any. I try this:

    sub("(^.*)(\\d{2,5})(.*$)", "\\2", strings)

    I get this as a result:

    [1] "50"
    [2] "70"
    [3] NA
    [4] "00"
    [5] "15"
    [6] "10"
    [7] "44"
    [8] "It is a hysterical idling. More vibraton than sound."
    [9] NA

    I expect to get all digits (the first set in each string) if they are from 2 to 5 digits long. Instead, I only get 2 digits.

    Using similar regex in #geany just to prepare this little example I got the expected behavior. I've updated and restarted R. I've used sub and gsub. Same result. If I specify \d{3,5} I get three digits. If I say \d{1,3} I get one digit. I always get the number of digits specified in the first value in the curly brackets.

    Maybe R is just vomiting or something. But if you know of an issue with R and regex that results in this, please let me know.

    rastrau@swiss.social wrote (#2):

    @guyjantic I don’t know if that’s an option, but stringr::str_extract() could be interesting to achieve this (if you don’t mind the dependency)? https://stringr.tidyverse.org/reference/str_extract.html #rstats
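For readers who want to see what extraction (rather than replacement) looks like, here is a minimal base R sketch of the same idea. The call the post points to would be stringr::str_extract(strings, "\\d{2,5}"); the helper below, extract_first_digits(), is a hypothetical name introduced only for illustration and is not part of any package.

```r
strings <- c("150 hertz", "70 hz", NA, "between 87 and 100 hz ocillations",
             "15hz", "triangle 110 hertz", "144Hz, Sine waveform",
             "It is a hysterical idling. More vibraton than sound.", NA)

# Hypothetical helper: return the first run of 2 to 5 digits, or NA if there is none.
extract_first_digits <- function(x, pattern = "[0-9]{2,5}") {
  m <- regexpr(pattern, x)               # start of first match; -1 if none; NA for NA input
  out <- rep(NA_character_, length(x))   # default result is NA
  hit <- !is.na(m) & m > 0               # elements that actually matched
  out[hit] <- regmatches(x, m)           # regmatches() returns only the matched pieces
  out
}

first_digits <- extract_first_digits(strings)
print(first_digits)
# "150" "70" NA "87" "15" "110" "144" NA NA
```

Because the pattern is searched for directly, there is no leading .* that could compete with the digits, and strings without a match come back as NA instead of unchanged.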

      thatdnaguy@genomic.social wrote (#3):

      @guyjantic Is it getting confused by .*<number stuff>.* ?

      The . also matches digits.

      Maybe something like ^[\s ]*(\d{2,5}).*$
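A small sketch of what seems to be happening with the original pattern: wrapping each capture group in brackets shows that the leading .* takes as many characters as it can, digits included, and leaves \d{2,5} only the minimum it needs. perl = TRUE is an assumption made here so the result follows PCRE's documented behaviour; the output in the first post suggests R's default engine does the same for this pattern.

```r
# Show what each of the three groups captured.
pattern <- "(^.*)(\\d{2,5})(.*$)"
shown <- sub(pattern, "[\\1][\\2][\\3]", "between 87 and 100 hz", perl = TRUE)
print(shown)
# "[between 87 and 1][00][ hz]"

# With a minimum of 1, the greedy first group leaves a single digit.
shown_min1 <- sub("(^.*)(\\d{1,3})(.*$)", "[\\1][\\2][\\3]", "150 hertz", perl = TRUE)
print(shown_min1)
# "[15][0][ hertz]"
```

This matches the observation that the result always has as many digits as the first number in the curly brackets: the greedy .* backtracks only as far as it must.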

        guyjantic@infosec.exchange wrote (#4):

        @rastrau I don't mind tidyverse dependencies at all. I have tried stringr::str_replace() and it gave me exactly the results of sub() and gsub(), but I haven't tried str_extract() yet. I'll give it a shot. Thanks.

          jorismeys@mstdn.social wrote (#5):

          @guyjantic
          You need to make that non-greedy (lazy). Might be as easy as

          sub("(^.*?)(\\d{2,5})(.*?$)", "\\2", strings)

          This makes the matches before and after the digits "lazy", meaning they match as few characters as possible.

          Edit: I haven't tested it, as I'm on my phone right now.
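A hedged sketch of the lazy approach, run over the vector from the first post. It uses perl = TRUE (an assumption, chosen so the lazy quantifier follows PCRE semantics) and a single capture group. Since sub() returns non-matching strings unchanged, the last step, which is an addition not suggested in the thread, turns those into NA.

```r
strings <- c("150 hertz", "70 hz", NA, "between 87 and 100 hz ocillations",
             "15hz", "triangle 110 hertz", "144Hz, Sine waveform",
             "It is a hysterical idling. More vibraton than sound.", NA)

# Lazy .*? gives up characters to the digit group instead of taking them.
lazy <- sub("^.*?(\\d{2,5}).*$", "\\1", strings, perl = TRUE)
print(lazy)
# "150" "70" NA "87" "15" "110" "144" "It is a hysterical idling. More vibraton than sound." NA

# Optional cleanup: strings with no 2-5 digit run become NA.
has_digits <- grepl("\\d{2,5}", strings, perl = TRUE)
lazy_clean <- ifelse(has_digits, lazy, NA)
print(lazy_clean)
# "150" "70" NA "87" "15" "110" "144" NA NA
```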

            rastrau@swiss.social wrote (#6):

            @guyjantic Since the regex seems to swallow the first digit, I suspect “.” in the first group is too generous? Maybe \D (non-digit) would be better? But I’m not a regexpert (alas).
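The \D idea can be sketched as below, again with perl = TRUE as an assumption. The second part uses a made-up input, "3 phase, 150 hz" (not from the thread), to show one trade-off: \D* cannot step over a lone digit, so the pattern fails when a shorter number comes first, while the lazy .*? version still finds the 2 to 5 digit run.

```r
strings <- c("150 hertz", "70 hz", NA, "between 87 and 100 hz ocillations",
             "15hz", "triangle 110 hertz", "144Hz, Sine waveform",
             "It is a hysterical idling. More vibraton than sound.", NA)

# \D* can only consume non-digits, so it cannot swallow part of the number.
non_digit <- sub("^\\D*(\\d{2,5}).*$", "\\1", strings, perl = TRUE)
print(non_digit[c(1, 2, 4, 5, 6, 7)])
# "150" "70" "87" "15" "110" "144"

# Hypothetical input with a single digit before the number of interest.
tricky <- "3 phase, 150 hz"
with_non_digit <- sub("^\\D*(\\d{2,5}).*$", "\\1", tricky, perl = TRUE)
with_lazy      <- sub("^.*?(\\d{2,5}).*$", "\\1", tricky, perl = TRUE)
print(with_non_digit)  # no match, so the string comes back unchanged
print(with_lazy)       # "150"
```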

              guyjantic@infosec.exchange wrote (#7):

              @rastrau Hey, that works! Thanks a ton!

                rastrau@swiss.social wrote (#8):

                @guyjantic 🥳 Yay! You’re most welcome.

                  guyjantic@infosec.exchange wrote (#9):

                  @rastrau I suspect you're more of a regexpert than I am, and your explanation seems plausible.

                    nxskok@cupoftea.social wrote (#10):

                    @guyjantic If you are happy with just the first number in each string, readr::parse_number() works really well.

                      jmkinen@mementomori.social wrote (#11):

                      @guyjantic Is "87" what you want from the fourth string?

                      If so, making the leading .* non-greedy seems to work:
                      sub(".*?(\\d{2,5}).*", "\\1", strings)
