Detect HTML links
When tackling the Detect HTML Links challenge on HackerRank, the main goal is to extract both the href attribute and the link text from every <a> tag in the given HTML. A common and effective approach is to use regular expressions. Since HTML attributes can appear in any order and use either single or double quotes, a robust pattern along the lines of r'<a\s[^>]*?href=[\'"]([^\'"]*)[\'"][^>]*>(.*?)</a>' captures both the URL and the anchor text simultaneously. Using re.findall() then gathers all matches efficiently in one pass.
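As a concrete illustration, here is a minimal sketch of that regex approach; the sample HTML string and the exact pattern details are illustrative, not taken from the challenge itself:

```python
# Minimal sketch of the regex approach described above; the sample
# HTML and the precise pattern are illustrative assumptions.
import re

html = '<p>Visit <a class="ext" href="https://example.com">Example</a> today.</p>'

# Group 1 captures the href value (single- or double-quoted);
# group 2 captures the anchor text up to the closing </a>.
pattern = r'<a\s[^>]*?href=[\'"]([^\'"]*)[\'"][^>]*>(.*?)</a>'

for url, text in re.findall(pattern, html, re.IGNORECASE | re.DOTALL):
    print(f'{url},{text.strip()}')
```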
It’s important to consider edge cases when using regex. Tags may contain extra spaces or newlines around attributes (for example, <a  href = "...">text</a>), and link text may include HTML entities or even nested tags. To handle entities, you can run Python’s html.unescape() on the extracted text to convert them into readable characters. This ensures the output matches the expected format without errors.
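A quick sketch of that entity-handling step (the raw string below is made up for illustration):

```python
# Hedged example of normalizing extracted anchor text; the sample
# string is illustrative, not from the challenge input.
import html

raw = 'Tom &amp; Jerry &ndash; &quot;Classics&quot;'
print(html.unescape(raw))  # -> Tom & Jerry – "Classics"
```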
For more complex or unpredictable HTML, an HTML parser such as Python’s BeautifulSoup is often more reliable. It automatically handles nested tags, spacing, and malformed HTML, all cases where regex can silently fail. For example, using soup.find_all('a', href=True) lets you iterate through every link and extract both the URL and the visible text safely. While regex is sufficient for the HackerRank test cases, an HTML parser is generally recommended in real-world applications because it sidesteps these edge-case issues entirely.
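A minimal parser-based sketch along those lines, assuming BeautifulSoup 4 is installed (pip install beautifulsoup4); the sample markup is illustrative:

```python
# Sketch of the parser-based approach with BeautifulSoup 4.
from bs4 import BeautifulSoup

doc = '<div><a href="/home">Home</a> <a href="https://example.com">Visit <b>Example</b></a></div>'
soup = BeautifulSoup(doc, 'html.parser')

for link in soup.find_all('a', href=True):
    # get_text() flattens nested tags such as <b> inside the anchor.
    print(f"{link['href']},{link.get_text(' ', strip=True)}")
```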
Excellent work focusing on the detection and analysis of HTML links; this capability is a critical component of web data validation, SEO automation, and content integrity management. From a technical perspective, link detection can be implemented with HTML parsers such as BeautifulSoup or lxml in Python, which allow precise extraction of <a> tags and their href attributes while preserving the DOM hierarchy. For performance, integrating non-blocking I/O frameworks like asyncio or multithreaded crawlers can significantly speed up link scanning across large-scale web architectures.
In more advanced deployments, link detection systems can leverage regular expression pattern matching combined with canonical URL normalization to eliminate duplicates, handle dynamic parameters, and distinguish between relative and absolute URLs. Incorporating HTTP status code verification (using libraries like aiohttp or requests) further enables automated broken-link detection and redirection tracking, crucial for maintaining link health and improving SEO performance metrics.
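As a rough sketch of that non-blocking status-verification idea, here is an aiohttp-based checker (pip install aiohttp); the URL list, the timeout, and the choice of HEAD requests are assumptions for illustration:

```python
# Non-blocking broken-link check; URLs and timeout are illustrative.
import asyncio
import aiohttp

async def check(session, url):
    try:
        # HEAD keeps the probe lightweight; following redirects resolves
        # 3xx chains to their final status code.
        async with session.head(url, allow_redirects=True) as resp:
            return url, resp.status
    except aiohttp.ClientError as exc:
        return url, f'error: {type(exc).__name__}'

async def main(urls):
    timeout = aiohttp.ClientTimeout(total=10)
    async with aiohttp.ClientSession(timeout=timeout) as session:
        return await asyncio.gather(*(check(session, u) for u in urls))

urls = ['https://example.com', 'https://example.com/missing-page']
for url, status in asyncio.run(main(urls)):
    print(url, status)
```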
Machine learning models can also be trained to categorize links—e.g., classifying internal vs. external domains, identifying affiliate or promotional URLs, or detecting anomalies that indicate phishing attempts. This is especially valuable for platforms that manage large content volumes, such as real estate or e-commerce portals.
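A full ML pipeline is beyond a forum snippet, but the internal-vs-external split mentioned above can be done as a rule-based baseline with the standard library; the site root below is a made-up placeholder:

```python
# Rule-based baseline (not the ML approach) for classifying links
# as internal or external; SITE_ROOT is a hypothetical placeholder.
from urllib.parse import urljoin, urlparse

SITE_ROOT = 'https://example.com'

def classify(href):
    # Resolve relative URLs against the site root, then compare hostnames.
    host = urlparse(urljoin(SITE_ROOT, href)).netloc
    return 'internal' if host == urlparse(SITE_ROOT).netloc else 'external'

for href in ['/about', 'https://example.com/pricing', 'https://other.org/deal']:
    print(f'{href} -> {classify(href)}')
```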
Nice challenge! Parsing anchor tags to extract hrefs and visible text is a core skill, especially useful when teams outsource SEO work, since bulk-auditing anchor text and links helps spot broken URLs and thin anchors fast.
This discussion has some really useful approaches to detect HTML links using regex. Thanks to everyone for sharing their solutions and explanations!
For Python 3:
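A minimal Python 3 sketch of the kind of solution this comment likely introduced; the input format (a line count followed by that many lines of HTML) is an assumption based on the challenge description:

```python
# Hedged sketch of a full Python 3 solution; the input format
# (count, then N lines of HTML) is assumed from the challenge.
import re
import sys

lines = sys.stdin.read().splitlines()
n = int(lines[0])
document = '\n'.join(lines[1:n + 1])

# Capture the quoted href value and the visible text immediately
# after the opening tag (stopping at the next nested tag).
pattern = r'<a\s[^>]*?href=[\'"]([^\'"]*)[\'"][^>]*>\s*([^<]*)'

for url, text in re.findall(pattern, document, re.IGNORECASE):
    print(f'{url.strip()},{text.strip()}')
```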