Replacing characters: Remove HTML tags

Given a string, remove all HTML tags from it. An HTML tag starts with the symbol "<" and ends with the symbol ">".

Output the string without HTML tags.

Sample Input 1:

<h1>Simple header</h1>

Sample Output 1:

Simple header

Sample Input 2:

<h2>Header with <b>bold</b> text</h2>

Sample Output 2:

Header with bold text

import java.util.Scanner;

public class Main {

    public static void main(String[] args) {
        Scanner scanner = new Scanner(System.in);
        String stringWithTags = scanner.nextLine();

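        // "<.*?>" lazily matches each tag, from "<" up to the nearest ">";
        // every match is replaced with an empty string
        System.out.println(stringWithTags.replaceAll("<.*?>", ""));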
    }
}

.*? is a lazy quantifier

.* is a greedy quantifier

Lazy means it matches as few times as possible, expanding as needed.

Greedy means it matches as many times as possible, giving back as needed.

In this case, if you use "<.*>" on a string in which blabla is wrapped in a pair of tags, for example

<b>blabla</b>

it matches the entire string, while "<.*?>" matches a single tag only.
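
The difference can be checked with a small sketch. The class name QuantifierDemo and the helper methods stripGreedy and stripLazy are hypothetical names introduced here for illustration only; the input is Sample Input 2 from the task above.

```java
public class QuantifierDemo {

    // Greedy: ".*" grabs as much as it can, so one match runs
    // from the first "<" to the last ">" in the line.
    static String stripGreedy(String s) {
        return s.replaceAll("<.*>", "");
    }

    // Lazy: ".*?" stops at the nearest ">", so each tag is matched separately.
    static String stripLazy(String s) {
        return s.replaceAll("<.*?>", "");
    }

    public static void main(String[] args) {
        String input = "<h2>Header with <b>bold</b> text</h2>";

        // The greedy pattern removes the text between the tags as well.
        System.out.println("greedy: [" + stripGreedy(input) + "]");
        // prints: greedy: []

        // The lazy pattern removes only the tags themselves.
        System.out.println("lazy:   [" + stripLazy(input) + "]");
        // prints: lazy:   [Header with bold text]
    }
}
```

Because the sample string starts with "<" and ends with ">", the greedy version matches the whole line and leaves nothing, which is why the solution uses the lazy form.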

posted @ 2020-08-22 12:25  longlong6296