Detect the Domain Name
This is a solid approach for extracting domain names from HTML content. I like how it uses regex to find URLs, cleans them by removing http(s) and www prefixes, and stores them in a set to avoid duplicates. Sorting at the end makes the output neat and easy to read.
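import re

n = int(input())
htmls = "\n".join(input() for _ in range(n))

# Find candidate URLs; the dot before the TLD is escaped so it matches a literal "."
urls = re.findall(r"https?://[^\s\"'>?]+\.[a-z]{2,3}", htmls)
domains = set()
for url in urls:
    # Strip the scheme, then keep only the host part before the first "/"
    domain = re.sub(r"^https?://?", "", url)
    domain = domain.split("/")[0]
    if "." in domain:
        # Drop a leading www. / www2. style prefix
        domain = re.sub(r"^(www\d*\.)", "", domain)
        domains.add(domain)
print(";".join(sorted(domains)))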
You can also try testing this on a real website like https://estateagentsilford.co.uk/ to see how it handles actual live URLs, including subdomains and different top-level domains. It’s a practical way to check your regex and parsing logic beyond sample inputs.
Detecting a domain name typically involves parsing the URL string to extract the host, using regex or built-in libraries in languages like Python or JavaScript. According to Wikipedia, a domain name identifies a realm of administrative autonomy within the Internet. In coding challenges, tools like Python's urlparse or a regex can simplify this process. I was practicing with real-world data like food service websites; checking out the Dairy Queen menu helped simulate parsing dynamic URLs. It’s a fun way to blend real examples into coding logic while sharpening string manipulation skills.
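For anyone curious what the urlparse route might look like, here is a minimal sketch. It is not the challenge's reference solution: the function name extract_domains, the sample HTML string, and the choice to strip only a leading www./ww2.-style prefix are all assumptions made for illustration.

```python
import re
from urllib.parse import urlparse


def extract_domains(html: str) -> list:
    """Return the sorted, de-duplicated hosts of http(s) URLs found in html."""
    domains = set()
    # Candidate URLs: scheme, then everything up to whitespace, a quote, or an angle bracket.
    for url in re.findall(r"https?://[^\s\"'<>]+", html):
        try:
            host = urlparse(url).hostname  # lowercased host without port, or None
        except ValueError:
            continue  # malformed URL (e.g. a broken IPv6 literal); skip it
        if not host or "." not in host:
            continue
        # Assumption: only a leading "www." or "ww<digit>." prefix is removed.
        host = re.sub(r"^ww[w\d]\.", "", host)
        domains.add(host)
    return sorted(domains)


# Hypothetical sample input, not taken from the challenge.
sample = (
    '<a href="http://www.example.com/page?x=1">x</a> '
    '<img src="https://cdn.example.org/a.png"> '
    '<a href="https://ww2.shop.example.co.uk/">y</a>'
)
print(";".join(extract_domains(sample)))
```

Letting urlparse pick out the host means the regex only has to locate URLs, not understand their structure, so ports, paths, and query strings do not need special handling.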
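import re
import sys

html = sys.stdin.read()
# Dots are escaped and the "*" quantifiers restored (they appear to have been lost in the post's formatting)
pattern = r'https?://(?:www\.|ww2\.)?([a-zA-Z0-9-]+\.?[a-zA-Z0-9-]+\.[a-zA-Z]*\.?[a-zA-Z]*)'
matches = re.findall(pattern, html)
# De-duplicate and sort before joining with ";"
output = sorted(set(matches))
print(';'.join(output))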