Write a regular expression that extracts the src URLs of all img tags in an HTML source string.
function getImgSrcs(htmlString) {
  // Match an <img ...> tag, then a src attribute preceded by whitespace
  // (so data-src is skipped), optional spaces around "=", and a value
  // wrapped in single or double quotes.
  const regex = /<img\b[^>]*?\ssrc\s*=\s*(["'])(.*?)\1/gi;
  const srcs = [];
  let match;
  while ((match = regex.exec(htmlString)) !== null) {
    srcs.push(match[2]); // group 1 is the quote character, group 2 is the URL
  }
  return srcs;
}

// Example usage:
const html = `
<div>
  <img src="image1.jpg" alt="Image 1">
  <img src='image2.png' alt="Image 2">
  <img src = "image3.gif" alt="Image 3">
  <img src= "image4.webp" alt="Image 4">
  <img data-src="data-image.svg">
  <p>Some text here</p>
  <img src="image5.jpg" alt="Image 5">
</div>`;
const imageSrcs = getImgSrcs(html);
console.log(imageSrcs); // Output: ['image1.jpg', 'image2.png', 'image3.gif', 'image4.webp', 'image5.jpg']

Explanation:
- `/<img\b[^>]*?\ssrc\s*=\s*(["'])(.*?)\1/gi`: This is the core regular expression. Let's break it down:
  - `<img\b`: Matches the literal string `<img` followed by a word boundary, so longer tag names that merely start with `img` are not matched.
  - `[^>]*?`: Lazily matches any characters that are not a closing angle bracket (`>`). This skips over attributes that come before `src` while keeping the match inside the same tag.
  - `\ssrc`: Matches `src` only when it is preceded by whitespace, which excludes attributes such as `data-src`.
  - `\s*=\s*`: Matches the equals sign with optional whitespace on either side.
  - `(["'])`: The first capturing group. It records whether the value opens with a double or a single quote.
  - `(.*?)\1`: The second capturing group lazily matches the actual `src` value up to the closing quote; `\1` requires the same quote character that opened the value.
  - `/gi`: The `g` flag makes the regex global, so it finds all matches in the string, not just the first one. The `i` flag makes it case-insensitive, so `<IMG SRC="...">` also matches.
- `match[2]`: The `match` array returned by `regex.exec()` contains the entire matched string at index 0, the quote character at index 1, and the `src` value at index 2.
- Handles variations in the `src` attribute: The regex accepts either quote style and any spacing around the equals sign (`=`), such as `src = "image3.gif"` or `src= "image4.webp"`.
- Example: The example demonstrates the function's usage and output. It includes both quote styles, different spacing variations, and an `img` tag with only a `data-src` attribute, which is ignored.
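The `exec` loop can also be written more compactly with `String.prototype.matchAll`. The following is a minimal sketch, not a replacement for the function above: the name `getImgSrcsWithMatchAll` is hypothetical, and the pattern assumes that every `src` value is wrapped in single or double quotes and that `src` is preceded by whitespace.

```javascript
// Hypothetical helper name; assumes every src value is quoted.
function getImgSrcsWithMatchAll(htmlString) {
  // (["']) captures the opening quote; \1 requires the same closing quote.
  // \ssrc requires whitespace before src, so data-src does not match.
  const regex = /<img\b[^>]*?\ssrc\s*=\s*(["'])(.*?)\1/gi;
  // matchAll requires the g flag and yields one match array per matching tag.
  return Array.from(htmlString.matchAll(regex), (match) => match[2]);
}

const sample = `<img src="a.jpg"><img data-src="lazy.png"><img alt='x' src = 'b.png'>`;
console.log(getImgSrcsWithMatchAll(sample)); // [ 'a.jpg', 'b.png' ]
```

Because `matchAll` returns an iterator, there is no `lastIndex` state to manage by hand, which is one reason to prefer it when the regex is not reused elsewhere.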
This approach works well for extracting src attributes from reasonably well-formed HTML strings, and it is often sufficient for front-end development needs. A regex does not understand HTML structure, however: unquoted attribute values, a `>` character inside an attribute value, and img tags inside comments can all produce wrong or missing results. For those cases, a dedicated HTML parser is generally recommended.
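As an illustration of the parser-based alternative, here is a hedged sketch that uses the browser's standard `DOMParser` API. The function name `getImgSrcsFromDom` is hypothetical, and the code assumes a browser environment: `DOMParser` is not a Node.js global, so the usage line is guarded.

```javascript
// Browser-only sketch: DOMParser is a standard Web API, not a Node.js global.
// getImgSrcsFromDom is a hypothetical name used for illustration.
function getImgSrcsFromDom(htmlString) {
  const doc = new DOMParser().parseFromString(htmlString, "text/html");
  // The img[src] selector skips images that only carry data-src or similar.
  return Array.from(doc.querySelectorAll("img[src]"), (img) =>
    // getAttribute returns the value as written, without resolving it to an absolute URL.
    img.getAttribute("src")
  );
}

// Only run the example where DOMParser exists (for instance, in a browser console).
if (typeof DOMParser !== "undefined") {
  console.log(getImgSrcsFromDom('<img src="a.jpg"><img data-src="lazy.png">'));
}
```

The parser handles quoting, spacing, and attribute order itself, so none of those cases need to be encoded in a pattern.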