Write a regular expression that extracts the src URL of every img tag in an HTML source string
function getImgSrcs(htmlString) {
  // \b stops <img from matching longer tag names; \ssrc requires whitespace
  // before "src", so data-src is skipped; (["']) ... \1 accepts either quote style.
  const regex = /<img\b[^>]*?\ssrc\s*=\s*(["'])(.*?)\1/gi;
  let match;
  const srcs = [];
  while ((match = regex.exec(htmlString)) !== null) {
    srcs.push(match[2]); // group 1 is the quote character, group 2 is the URL
  }
  return srcs;
}
// Example usage:
const html = `
<div>
  <img src="image1.jpg" alt="Image 1">
  <img src='image2.png' alt="Image 2">
  <img src = "image3.gif" alt="Image 3">
  <img src= "image4.webp" alt="Image 4">
  <img data-src="data-image.svg">
  <p>Some text here</p>
  <img src="image5.jpg" alt="Image 5">
</div>`;
const imageSrcs = getImgSrcs(html);
console.log(imageSrcs); // Output: ['image1.jpg', 'image2.png', 'image3.gif', 'image4.webp', 'image5.jpg']
Explanation and Improvements:
- /<img\b[^>]*?\ssrc\s*=\s*(["'])(.*?)\1/gi: This is the core regular expression. Let's break it down:
  <img\b: Matches the literal string <img followed by a word boundary, so a longer tag name that merely starts with "img" is not matched.
  [^>]*?: Lazily matches any characters that are not a closing angle bracket (>). This skips over attributes that come before src while staying inside the same tag.
  \ssrc: Matches src only when it is preceded by whitespace. This is what keeps data-src from being picked up.
  \s*=\s*: Matches the equals sign with optional whitespace on either side.
  (["']): The first capturing group. It matches the opening quote, either double or single.
  (.*?): The second capturing group. It lazily matches the src value itself.
  \1: A backreference to the first group, so the closing quote must be the same character as the opening quote.
  /gi: The g flag makes the regex global, so it finds all matches in the string, not just the first one. The i flag makes it case-insensitive, so <IMG SRC="..."> is matched as well.
- match[2]: The match object returned by regex.exec() contains the entire matched string at index 0, the quote character at index 1, and the src value at index 2.
- Handles variations in the src attribute: The regex accepts single or double quotes and any spacing around the equals sign (=), such as src = "image3.gif" or src= "image4.webp".
- Clearer Example: The example demonstrates the function's usage and output. It includes the quoting and spacing variations above and an img tag that has only a data-src attribute, which is correctly ignored.
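The pattern above expects the src value to be quoted, but HTML also allows unquoted attribute values. As a rough sketch of one way to cover that case, the variant below uses named capture groups and String.prototype.matchAll. The function name getImgSrcsLoose and the group names dq, sq, and bare are made up for illustration and are not part of the original answer.

```javascript
// Hypothetical variant: accepts double-quoted, single-quoted, and unquoted src values.
function getImgSrcsLoose(htmlString) {
  const regex =
    /<img\b[^>]*?\ssrc\s*=\s*(?:"(?<dq>[^"]*)"|'(?<sq>[^']*)'|(?<bare>[^\s"'>]+))/gi;
  const srcs = [];
  for (const match of htmlString.matchAll(regex)) {
    const { dq, sq, bare } = match.groups;
    // Exactly one of the three groups is defined for each match.
    srcs.push(dq ?? sq ?? bare);
  }
  return srcs;
}

const sample = `<img src=a.png><img SRC="b.png"><img class=x src = 'c.png'><img data-src="d.png">`;
console.log(getImgSrcsLoose(sample)); // ['a.png', 'b.png', 'c.png']
```

The nullish coalescing operator (??) is used instead of || so that an empty quoted value such as src="" is still returned as an empty string rather than falling through to the next group.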
The regex approach works well for simple, predictable markup and is often sufficient for front-end work. For complex or untrusted HTML, a dedicated HTML parser is generally recommended, because a regular expression cannot account for comments, nested quoting, and other context that a parser understands.
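To make the parser caveat concrete, here is a small sketch of two inputs that trip up a regex of this kind: an img tag inside an HTML comment, and a > character inside an earlier attribute value. The function names are hypothetical. The parser-based version assumes a browser environment where DOMParser is available; it is only defined here, not called, so the snippet also runs outside a browser.

```javascript
// Regex-based extraction (hypothetical name), using the same kind of pattern as above.
function extractImgSrcsRegex(html) {
  const regex = /<img\b[^>]*?\ssrc\s*=\s*(["'])(.*?)\1/gi;
  return Array.from(html.matchAll(regex), (m) => m[2]);
}

// Parser-based extraction; assumes a browser, where DOMParser is a global.
function extractImgSrcsDom(html) {
  const doc = new DOMParser().parseFromString(html, 'text/html');
  return Array.from(doc.querySelectorAll('img[src]'), (img) => img.getAttribute('src'));
}

const tricky = '<!-- <img src="old.png"> --><img alt="a > b" src="real.png">';
const regexResult = extractImgSrcsRegex(tricky);
console.log(regexResult); // ['old.png']
```

The regex reports the commented-out image and misses the real one, because [^>]*? stops at the > inside the alt text. An HTML parser treats the comment as a comment and the quoted > as part of the attribute value, so in a browser extractImgSrcsDom(tricky) would be expected to return only real.png.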