1

I'm trying to extract the src URL/path without the quotes, but only when it points to an image:

  1. src="/path/image.png" // should capture => /path/image.png
  2. src="/path/image.bmp" // should capture => /path/image.bmp
  3. src="/path/image.jpg" // should capture => /path/image.jpg
  4. src="https://www.site1.com" // should NOT capture

So far I have /src="(.*)"/g, but that obviously captures every src, image or not. I have been looking at lookbehind and lookahead, but I just can't put it together.

3
  • 1
    This seems like a job for an HTML parser combined with an HTTP client library that can make HEAD requests to URLs to see what Content-Type they have. Trying to do this with regex feels very fragile. Commented Nov 15, 2022 at 15:48
  • @Quentin: If they require login, you might have a problem - but OP might too. Commented Nov 15, 2022 at 15:50
  • Actually src="https://www.site1.com" might return an image if you request that link in browser. Commented Nov 15, 2022 at 15:54

4 Answers

4

You can use a capture group, and you should prevent crossing the " using a negated character class.

If you want to match either href or src

\b(?:href|src)="([^\s"]*\.(?:png|jpg|bmp))"

Explanation

  • \b A word boundary to prevent a partial word match
  • (?:href|src)=" Match either href=" or src="
  • ( Capture group 1
    • [^\s"]* Match zero or more characters other than a whitespace char or "
    • \.(?:png|jpg|bmp) Match one of .png .jpg .bmp
  • ) Close group 1
  • " Match literally

Regex demo

const regex = /\b(?:href|src)="([^\s"]*\.(?:png|jpg|bmp))"/;
[
  'src="/path/image.png" test "',
  'src="/path/image.bmp"',
  'src="/path/image.jpg"',
  'src="https://www.site1.com"',
  'href="image.png"'
].forEach(s => {
  const m = s.match(regex);
  if (m) {
    console.log(m[1]);
  }
})
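
To pull every image path out of one longer string instead of testing one attribute at a time, the same pattern can be given the g flag and passed to String.prototype.matchAll. This is only a sketch: the sample markup is made up, and the jpe?g and gif extensions are assumptions added for illustration.

```javascript
// Same pattern as above, with the g flag so matchAll can iterate over every match.
// jpe?g and gif are extra extensions added for illustration.
const imageAttr = /\b(?:href|src)="([^\s"]*\.(?:png|jpe?g|bmp|gif))"/g;

// Hypothetical sample markup, not from the original question.
const html = [
  '<img src="/path/image.png">',
  '<img src="/path/photo.jpeg">',
  '<iframe src="https://www.site1.com"></iframe>',
  '<a href="image.gif">link</a>'
].join('\n');

// Collect capture group 1 (the path without quotes) from each match.
const paths = Array.from(html.matchAll(imageAttr), m => m[1]);
console.log(paths); // [ '/path/image.png', '/path/photo.jpeg', 'image.gif' ]
```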


1 Comment

OP: For additional image extensions, consider the list of mime types that start with image/ in this list: developer.mozilla.org/en-US/docs/Web/HTTP/Basics_of_HTTP/… The biggest missing ones are .jpeg (as opposed to .jpg) and .gif
2

Try /src="(.*(?:jpg|bmp|png))"/g

You'll need to enter in the list of extensions you consider valid images
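
One caveat with .* is that it is greedy, so it can run past the closing quote when more than one attribute sits on the same line. A small sketch of the difference, using a made-up input line (the bounded variant is an illustration, not part of the answer above):

```javascript
// Hypothetical input with two attributes on one line.
const line = 'src="https://www.site1.com" href="image.png"';

// Greedy .* crosses the first closing quote and keeps going to the last extension.
const greedy = line.match(/src="(.*(?:jpg|bmp|png))"/);
console.log(greedy[1]); // https://www.site1.com" href="image.png

// A negated character class stops at the first quote, so src does not match here.
const bounded = line.match(/src="([^"]*\.(?:jpg|bmp|png))"/);
console.log(bounded); // null
```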

6 Comments

This doesn't cover all the image formats that exist in the Universe and only works if the image format is jpg, bmp or png.
This works too, but I found another image under href="image.png", so now I have to modify the regex. As I said in the previous comment, I was also trying to start from .png|.bmp|.jpg and go back to the first quote, capturing it that way. That would capture it whether it's src=, href= ...
you could capture that with [src|href] where src currently is
I think the latest answer by The Fourth Bird is probably the best answer now.
@Eterm The character class [jpg|bmp|png] matches the same characters as [bgjmnp|], i.e. you cannot have string alternatives inside a class, it's only going to match the unique characters inside it. You probably meant to write something like (?:jpg|bmp|png) which is an example of actual alternation.
1

If you want it to be a bit more foolproof, you can use lookbehinds and lookaheads. Expand the extension list png|bmp|jpg to test for more extensions.

/(?<=src=").*(png|bmp|jpg)(?=")/g

regex101
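
A possible variation on the lookaround idea, sketched under two assumptions: the lookbehind accepts either src or href, and .* is swapped for [^"]* so the match cannot cross a quote. The input string is hypothetical. Lookbehind needs a JavaScript engine with ES2018 regex support.

```javascript
// Lookbehind accepts either attribute; [^"]* keeps the match inside one pair of quotes.
const lookaround = /(?<=\b(?:src|href)=")[^"]*\.(?:png|bmp|jpg)(?=")/g;

// Hypothetical input.
const sample = 'src="/path/image.png" src="https://www.site1.com" href="image.jpg"';

// With lookarounds the whole match is already the bare path, so no capture group is needed.
const found = sample.match(lookaround);
console.log(found); // [ '/path/image.png', 'image.jpg' ]
```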

1 Comment

This works, but I was also trying to start from .png|.bmp|.jpg and go back to the first quote, capturing it that way. I found another image in href="image.png", so I'd like to cover all cases no matter what comes in front: href, src ... If you understand me...
-1

Try this src="(.*image.*)"

2 Comments

This only works if they're literally all named "image" which I doubt was OP's intention, otherwise there'd be little need to capture them at all
Well I am answering the question he asked, not what you or I doubt or interpret as his intention. Besides, it's meant to illustrate what he needs to alter in order to select whatever it is he intends to select. It can be done using image format, or image name, or image-whatever-the-hell he wants, but it is up to him to follow the principle.
