This is a solid approach for extracting domain names from HTML content. I like how it uses regex to find URLs, cleans them by removing http(s) and www prefixes, and stores them in a set to avoid duplicates. Sorting at the end makes the output neat and easy to read.
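import re

n = int(input())
htmls = "\n".join(input() for _ in range(n))

# Match http(s) URLs up to the first whitespace, quote, '>' or '?'.
urls = re.findall(r"https?://[^\s\"'>?]+", htmls)
domains = set()
for url in urls:
    # Drop the scheme, then keep only the host part before the first "/".
    domain = re.sub(r"^https?://", "", url)
    domain = domain.split("/")[0]
    if "." in domain:
        # Strip a leading "www." or "www2."-style prefix; the dot is escaped
        # so that a host such as "wwwxyz.com" is left intact.
        domain = re.sub(r"^www\d*\.", "", domain)
        domains.add(domain)
print(";".join(sorted(domains)))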
You can also try testing this on a real website like https://estateagentsilford.co.uk/ to see how it handles actual live URLs, including subdomains and different top-level domains. It’s a practical way to check your regex and parsing logic beyond sample inputs.
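One way to run that kind of check without typing input by hand is to wrap the same steps in a function and feed it a pasted chunk of markup. The sketch below is only an illustration: the helper name `extract_domains`, the two compiled patterns, and the sample HTML are all assumptions made up for this example, and nothing is fetched from the live site.

```python
import re

# Hypothetical helper names; the patterns mirror the steps in the snippet above.
URL_RE = re.compile(r"https?://[^\s\"'<>?#]+")
PREFIX_RE = re.compile(r"^www\d*\.")  # escaped dot: strips "www." / "www2." only


def extract_domains(html):
    """Return the sorted, de-duplicated host names found in http(s) URLs."""
    domains = set()
    for url in URL_RE.findall(html):
        # Drop the scheme and keep only the part before the first "/".
        host = re.sub(r"^https?://", "", url).split("/")[0]
        if "." in host:
            domains.add(PREFIX_RE.sub("", host))
    return sorted(domains)


# Made-up sample markup, not copied from any real page.
sample = """
<a href="https://estateagentsilford.co.uk/">home</a>
<a href="http://www.example.com/about?x=1">about</a>
<a href="https://www2.example.org/path/page.html">page</a>
<a href="https://blog.example.com">blog</a>
<a href="http://www.example.com/contact">contact</a>
"""

print(";".join(extract_domains(sample)))
# prints: blog.example.com;estateagentsilford.co.uk;example.com;example.org
```

Note that the subdomain `blog.example.com` is kept as its own entry, while `www.` and `www2.` are treated as prefixes and removed, so the two `www.example.com` links collapse into a single `example.com`.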
Detect the Domain Name