Is anyone good with #Rstats and #regex?

rastrau@swiss.social

@guyjantic I don’t know if that’s an option, but stringr::str_extract() could be interesting to achieve this (if you don’t mind the dependency)? https://stringr.tidyverse.org/reference/str_extract.html #rstats
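
A minimal sketch of the str_extract() idea. It assumes the goal is to pull the first run of 2 to 5 digits out of each string. The original strings aren't shown in this thread, so the `strings` vector below is hypothetical:

```r
library(stringr)

# Hypothetical example data (the thread's real strings aren't shown)
strings <- c("abc 12345 def", "id 42 end", "no digits here")

# str_extract() returns the first match of the pattern in each string,
# or NA when there is no match, so no capture group or replacement is needed
extracted <- str_extract(strings, "\\d{2,5}")
extracted  # "12345" "42" NA
```

With sub(), a string that doesn't match comes back unchanged. str_extract() returns NA instead, which makes inputs without a number easy to spot.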

thatdnaguy@genomic.social

@guyjantic Is it getting confused by “.*<number stuff>.*”?

The “.” matches numbers too.

Maybe something like ^[\\s ]*(\\d{2,5}).*$
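
This small sketch shows why a leading “.*” causes trouble, and what the anchored suggestion does. The input strings are hypothetical stand-ins because the original ones aren't shown. It uses perl = TRUE so that backtracking and \s inside a bracket expression behave in the familiar Perl-style way; that choice is an assumption, not part of the original posts:

```r
x <- "abc 12345 def"  # hypothetical input

# Greedy: the leading .* grabs as much as it can, digits included,
# and gives back only the two digits that \\d{2,5} needs at minimum
greedy <- sub(".*(\\d{2,5}).*", "\\1", x, perl = TRUE)
greedy  # "45"

# The anchored pattern only allows whitespace before the number,
# so a string with leading text doesn't match and sub() returns it unchanged
anchored <- sub("^[\\s ]*(\\d{2,5}).*$", "\\1", c("  123 abc", "abc 123"), perl = TRUE)
anchored  # "123" "abc 123"
```

In the greedy case the leading .* ends up holding "abc 123", so only the last two digits reach the capture group.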

guyjantic@infosec.exchange

@rastrau I don't mind tidyverse dependencies at all. I have tried stringr::str_replace() and it gave me exactly the results of sub() and gsub(), but I haven't tried str_extract() yet. I'll give it a shot. Thanks.

jorismeys@mstdn.social

@guyjantic
You need to make that non-greedy. Might be as easy as

sub("(^.*?)(\\d{2,5})(.*?$)", "\\2", strings)

This makes the matches before and after the number "lazy", meaning they match as few characters as possible.

Edit: I haven't tested it, since I'm on my phone right now.
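
Here is a sketch of the lazy version with a single capture group, run on hypothetical inputs (the thread's data isn't shown). perl = TRUE is an assumption made here for predictable backtracking behaviour:

```r
x <- c("abc 12345 def", "no digits here")  # hypothetical inputs

# Lazy: .*? consumes as few characters as possible, so the capture
# group starts at the first run of digits and \\d{2,5} keeps all of it
lazy <- sub("^.*?(\\d{2,5}).*$", "\\1", x, perl = TRUE)
lazy  # "12345" "no digits here"
```

Keeping only the number group and replacing with "\\1" is equivalent to the three-group pattern with "\\2". sub() leaves strings without a match untouched, so inputs that may lack a number need a separate check.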

rastrau@swiss.social

@guyjantic Since the regex seems to swallow the first digit, I suspect “.” in the first group is too generous? Maybe \D (non-digit) would be better? But I’m not a regexpert (alas).
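
A sketch of the \D idea. It assumes the original pattern had a catch-all first group in front of the digits, but that regex isn't shown in the thread. The pattern and strings below are hypothetical, and perl = TRUE is an assumption kept for consistency:

```r
x <- c("abc 12345 def", "a1 b 234")  # hypothetical inputs

# \\D* can only consume non-digits, so it can't eat the start of the number
nondigit <- sub("^\\D*(\\d{2,5}).*$", "\\1", x, perl = TRUE)
nondigit  # "12345" "a1 b 234"
```

Because the pattern is anchored with ^\D*, a single stray digit before the target number (as in the second hypothetical string) prevents any match. In that case sub() returns the string unchanged. A lazy “.*?” prefix would skip past the stray digit and return "234".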

guyjantic@infosec.exchange

@rastrau Hey, that works! Thanks a ton!

rastrau@swiss.social

@guyjantic 🥳 Yay! You’re most welcome.

guyjantic@infosec.exchange

@rastrau I suspect you're more of a regexpert than I am, and your explanation seems plausible.

nxskok@cupoftea.social

@guyjantic if you are happy with just the first number, readr::parse_number() works really well.
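
A minimal sketch of the parse_number() route, using hypothetical strings because the originals aren't shown:

```r
library(readr)

# Hypothetical example data
x <- c("abc 12345 def", "id 42 end")

# parse_number() drops non-numeric characters before and after the first
# number and returns it as a double rather than as a string
nums <- parse_number(x)
nums  # 12345 42
```

This returns numbers, not strings, and it isn't limited to 2 to 5 digits. By default it also treats commas as grouping marks, so "1,234" becomes 1234.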

jmkinen@mementomori.social

@guyjantic Is "87" what you want from the fourth string?

If so, making it non-greedy seems to work:
sub(".*?(\\d{2,5}).*", "\\1", strings)
