Build a Stack Exchange Scraper

  • + 0 comments

    My solution (Python):

    # Enter your code here. Read input from STDIN. Print output to STDOUT
    import re
    import sys
    
    # txt1 = input() 
    txt1 = sys.stdin.read()
    matches1 = re.findall(
        r'<a href="\/questions\/(\d+)\/.+?\>+?([^<]+)(?:<\/a><\/h3>)',
        txt1,
        re.S
    )
    matches2 = re.findall(
        r'asked.?<(?:[^>])+>([^<]+)<',
        txt1,
        re.S
    )
    for i, j in zip(matches1, matches2):
        print(f"{i[0]};{i[1]};{j}")