Pythonic data cleaning of address ranges





.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty{ height:90px;width:728px;box-sizing:border-box;
}







0















I need to know if the following task is possible via Python.



I have a list of 4,300 addresses, each that covers a range of addresses, like this:




  1. 8519-21 S Broadway

  2. 8700-6 Riverview Blvd

  3. 822-4 Angelica St

  4. 8442-6 N Broadway


I want to isolate the first address and the last in the range. So, using 1 as an example, I would need the output to give me 8519 S Broadway and 8521 S Broadway, each in its own row.



In some instances, like 1 and 3, I only need the first two digits of the first number to complete the second address. I tried in Excel using text to columns (-), then CONCATENATE and LEFT(cell,2) but then I don't get the address correct for instances like 2 and 4, where I need the first three digits of the address. Is there any way to write a python script to solve this?










share|improve this question























  • How is your data stored?

    – 0liveradam8
    Nov 25 '18 at 3:17











  • It's in a .csv right now (all in one column, each address range in its own row) so it could be in a list, a dict, a dataframe; whatever is needed.

    – jayohday
    Nov 25 '18 at 3:21


















0















I need to know if the following task is possible via Python.



I have a list of 4,300 addresses, each that covers a range of addresses, like this:




  1. 8519-21 S Broadway

  2. 8700-6 Riverview Blvd

  3. 822-4 Angelica St

  4. 8442-6 N Broadway


I want to isolate the first address and the last in the range. So, using 1 as an example, I would need the output to give me 8519 S Broadway and 8521 S Broadway, each in its own row.



In some instances, like 1 and 3, I only need the first two digits of the first number to complete the second address. I tried in Excel using text to columns (-), then CONCATENATE and LEFT(cell,2) but then I don't get the address correct for instances like 2 and 4, where I need the first three digits of the address. Is there any way to write a python script to solve this?










share|improve this question























  • How is your data stored?

    – 0liveradam8
    Nov 25 '18 at 3:17











  • It's in a .csv right now (all in one column, each address range in its own row) so it could be in a list, a dict, a dataframe; whatever is needed.

    – jayohday
    Nov 25 '18 at 3:21














0












0








0








I need to know if the following task is possible via Python.



I have a list of 4,300 addresses, each that covers a range of addresses, like this:




  1. 8519-21 S Broadway

  2. 8700-6 Riverview Blvd

  3. 822-4 Angelica St

  4. 8442-6 N Broadway


I want to isolate the first address and the last in the range. So, using 1 as an example, I would need the output to give me 8519 S Broadway and 8521 S Broadway, each in its own row.



In some instances, like 1 and 3, I only need the first two digits of the first number to complete the second address. I tried in Excel using text to columns (-), then CONCATENATE and LEFT(cell,2) but then I don't get the address correct for instances like 2 and 4, where I need the first three digits of the address. Is there any way to write a python script to solve this?










share|improve this question














I need to know if the following task is possible via Python.



I have a list of 4,300 addresses, each that covers a range of addresses, like this:




  1. 8519-21 S Broadway

  2. 8700-6 Riverview Blvd

  3. 822-4 Angelica St

  4. 8442-6 N Broadway


I want to isolate the first address and the last in the range. So, using 1 as an example, I would need the output to give me 8519 S Broadway and 8521 S Broadway, each in its own row.



In some instances, like 1 and 3, I only need the first two digits of the first number to complete the second address. I tried in Excel using text to columns (-), then CONCATENATE and LEFT(cell,2) but then I don't get the address correct for instances like 2 and 4, where I need the first three digits of the address. Is there any way to write a python script to solve this?







python data-cleaning






share|improve this question













share|improve this question











share|improve this question




share|improve this question










asked Nov 25 '18 at 3:13









jayohdayjayohday

309




309













  • How is your data stored?

    – 0liveradam8
    Nov 25 '18 at 3:17











  • It's in a .csv right now (all in one column, each address range in its own row) so it could be in a list, a dict, a dataframe; whatever is needed.

    – jayohday
    Nov 25 '18 at 3:21



















  • How is your data stored?

    – 0liveradam8
    Nov 25 '18 at 3:17











  • It's in a .csv right now (all in one column, each address range in its own row) so it could be in a list, a dict, a dataframe; whatever is needed.

    – jayohday
    Nov 25 '18 at 3:21

















How is your data stored?

– 0liveradam8
Nov 25 '18 at 3:17





How is your data stored?

– 0liveradam8
Nov 25 '18 at 3:17













It's in a .csv right now (all in one column, each address range in its own row) so it could be in a list, a dict, a dataframe; whatever is needed.

– jayohday
Nov 25 '18 at 3:21





It's in a .csv right now (all in one column, each address range in its own row) so it could be in a list, a dict, a dataframe; whatever is needed.

– jayohday
Nov 25 '18 at 3:21












3 Answers
3






active

oldest

votes


















0














Try this code:



addresses=["8519-21 S Broadway","8700-6 Riverview Blvd"]
output=
for i in addresses:
output.append(i[0:i.find('-')]+i[i.find(' '):])
numEnd=i[i.find('-')+1:i.find(' ')]
numStart=i[0:i.find('-')-len(numEnd)]+numEnd+i[i.find(' '):]
output.append(numStart)
print(output)


Where addresses is a list of the address ranges in the format you've given, and output is the data you want.






share|improve this answer































    0














    Let's say your input .csv file looks like this:



    Address
    8519-21 S Broadway
    8700-6 Riverview Blvd
    822-4 Angelica St
    8442-6 N Broadway


    and you want it to look something like this:



    First Address,Last Address
    8519 S Broadway,8521 S Broadway
    8700 Riverview Blvd,8706 Riverview Blvd
    822 Angelica St,824 Angelica St
    8442 N Broadway,8446 N Broadway


    You can accomplish this by using the csv module:



    from csv import writer

    # open both input and output csv
    with open("addresses.csv") as csv_in, open("output.csv", "w", newline="") as csv_out:
    csv_writer = writer(csv_out)

    # Skip 'Address' header
    next(csv_in)

    # Write new headers
    csv_writer.writerow(["First Address", "Last Address"])

    # Go through each line in csv
    for line in csv_in:

    # Split ranges from rest of line
    ranges, *rest = line.split()

    # Split ranges themselves
    start, end = ranges.split("-")

    # Get difference between length of numbers
    diff = len(start) - len(end)

    # Create new end address
    end = start[:diff] + end

    # Write new lines to output csv
    csv_writer.writerow([" ".join([start, *rest]), " ".join([end, *rest])])


    Which will output your results into output.csv.






    share|improve this answer

































      0














      Thanks for the help and suggestions, all. This is how I ended up tackling it:



      import re
      import csv
      from csv import DictReader

      dash_add =

      def get_address_list(file):
      with open(file) as f:
      reader = csv.DictReader(f)
      for row in reader:
      adds = row['Siteaddr']
      if "-" in adds:
      dash_add.append(adds)

      z =

      def get_formatted_address(address):
      numeric = address.split(" ")[0]
      text = address.replace(numeric, "").strip()
      start = numeric.split("-")[0]
      end = numeric.split("-")[1]
      end = start[:-len(end)] + end
      return [start + " " + text, end + " " + text]

      get_address_list('./data/map_data_112318.csv')
      [z.extend(get_formatted_address(i)) for i in dash_add]

      with open('formatted_addresses.csv', 'w') as csvfile:
      writer = csv.writer(csvfile)
      writer.writerow(['Address'])
      for v in z:
      writer.writerows([[v]])





      share|improve this answer
























        Your Answer






        StackExchange.ifUsing("editor", function () {
        StackExchange.using("externalEditor", function () {
        StackExchange.using("snippets", function () {
        StackExchange.snippets.init();
        });
        });
        }, "code-snippets");

        StackExchange.ready(function() {
        var channelOptions = {
        tags: "".split(" "),
        id: "1"
        };
        initTagRenderer("".split(" "), "".split(" "), channelOptions);

        StackExchange.using("externalEditor", function() {
        // Have to fire editor after snippets, if snippets enabled
        if (StackExchange.settings.snippets.snippetsEnabled) {
        StackExchange.using("snippets", function() {
        createEditor();
        });
        }
        else {
        createEditor();
        }
        });

        function createEditor() {
        StackExchange.prepareEditor({
        heartbeatType: 'answer',
        autoActivateHeartbeat: false,
        convertImagesToLinks: true,
        noModals: true,
        showLowRepImageUploadWarning: true,
        reputationToPostImages: 10,
        bindNavPrevention: true,
        postfix: "",
        imageUploader: {
        brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
        contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
        allowUrls: true
        },
        onDemand: true,
        discardSelector: ".discard-answer"
        ,immediatelyShowMarkdownHelp:true
        });


        }
        });














        draft saved

        draft discarded


















        StackExchange.ready(
        function () {
        StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53464328%2fpythonic-data-cleaning-of-address-ranges%23new-answer', 'question_page');
        }
        );

        Post as a guest















        Required, but never shown

























        3 Answers
        3






        active

        oldest

        votes








        3 Answers
        3






        active

        oldest

        votes









        active

        oldest

        votes






        active

        oldest

        votes









        0














        Try this code:



        addresses=["8519-21 S Broadway","8700-6 Riverview Blvd"]
        output=
        for i in addresses:
        output.append(i[0:i.find('-')]+i[i.find(' '):])
        numEnd=i[i.find('-')+1:i.find(' ')]
        numStart=i[0:i.find('-')-len(numEnd)]+numEnd+i[i.find(' '):]
        output.append(numStart)
        print(output)


        Where addresses is a list of the address ranges in the format you've given, and output is the data you want.






        share|improve this answer




























          0














          Try this code:



          addresses=["8519-21 S Broadway","8700-6 Riverview Blvd"]
          output=
          for i in addresses:
          output.append(i[0:i.find('-')]+i[i.find(' '):])
          numEnd=i[i.find('-')+1:i.find(' ')]
          numStart=i[0:i.find('-')-len(numEnd)]+numEnd+i[i.find(' '):]
          output.append(numStart)
          print(output)


          Where addresses is a list of the address ranges in the format you've given, and output is the data you want.






          share|improve this answer


























            0












            0








            0







            Try this code:



            addresses=["8519-21 S Broadway","8700-6 Riverview Blvd"]
            output=
            for i in addresses:
            output.append(i[0:i.find('-')]+i[i.find(' '):])
            numEnd=i[i.find('-')+1:i.find(' ')]
            numStart=i[0:i.find('-')-len(numEnd)]+numEnd+i[i.find(' '):]
            output.append(numStart)
            print(output)


            Where addresses is a list of the address ranges in the format you've given, and output is the data you want.






            share|improve this answer













            Try this code:



            addresses=["8519-21 S Broadway","8700-6 Riverview Blvd"]
            output=
            for i in addresses:
            output.append(i[0:i.find('-')]+i[i.find(' '):])
            numEnd=i[i.find('-')+1:i.find(' ')]
            numStart=i[0:i.find('-')-len(numEnd)]+numEnd+i[i.find(' '):]
            output.append(numStart)
            print(output)


            Where addresses is a list of the address ranges in the format you've given, and output is the data you want.







            share|improve this answer












            share|improve this answer



            share|improve this answer










            answered Nov 25 '18 at 3:28









            0liveradam80liveradam8

            606115




            606115

























                0














                Let's say your input .csv file looks like this:



                Address
                8519-21 S Broadway
                8700-6 Riverview Blvd
                822-4 Angelica St
                8442-6 N Broadway


                and you want it to look something like this:



                First Address,Last Address
                8519 S Broadway,8521 S Broadway
                8700 Riverview Blvd,8706 Riverview Blvd
                822 Angelica St,824 Angelica St
                8442 N Broadway,8446 N Broadway


                You can accomplish this by using the csv module:



                from csv import writer

                # open both input and output csv
                with open("addresses.csv") as csv_in, open("output.csv", "w", newline="") as csv_out:
                csv_writer = writer(csv_out)

                # Skip 'Address' header
                next(csv_in)

                # Write new headers
                csv_writer.writerow(["First Address", "Last Address"])

                # Go through each line in csv
                for line in csv_in:

                # Split ranges from rest of line
                ranges, *rest = line.split()

                # Split ranges themselves
                start, end = ranges.split("-")

                # Get difference between length of numbers
                diff = len(start) - len(end)

                # Create new end address
                end = start[:diff] + end

                # Write new lines to output csv
                csv_writer.writerow([" ".join([start, *rest]), " ".join([end, *rest])])


                Which will output your results into output.csv.






                share|improve this answer






























                  0














                  Let's say your input .csv file looks like this:



                  Address
                  8519-21 S Broadway
                  8700-6 Riverview Blvd
                  822-4 Angelica St
                  8442-6 N Broadway


                  and you want it to look something like this:



                  First Address,Last Address
                  8519 S Broadway,8521 S Broadway
                  8700 Riverview Blvd,8706 Riverview Blvd
                  822 Angelica St,824 Angelica St
                  8442 N Broadway,8446 N Broadway


                  You can accomplish this by using the csv module:



                  from csv import writer

                  # open both input and output csv
                  with open("addresses.csv") as csv_in, open("output.csv", "w", newline="") as csv_out:
                  csv_writer = writer(csv_out)

                  # Skip 'Address' header
                  next(csv_in)

                  # Write new headers
                  csv_writer.writerow(["First Address", "Last Address"])

                  # Go through each line in csv
                  for line in csv_in:

                  # Split ranges from rest of line
                  ranges, *rest = line.split()

                  # Split ranges themselves
                  start, end = ranges.split("-")

                  # Get difference between length of numbers
                  diff = len(start) - len(end)

                  # Create new end address
                  end = start[:diff] + end

                  # Write new lines to output csv
                  csv_writer.writerow([" ".join([start, *rest]), " ".join([end, *rest])])


                  Which will output your results into output.csv.






                  share|improve this answer




























                    0












                    0








                    0







                    Let's say your input .csv file looks like this:



                    Address
                    8519-21 S Broadway
                    8700-6 Riverview Blvd
                    822-4 Angelica St
                    8442-6 N Broadway


                    and you want it to look something like this:



                    First Address,Last Address
                    8519 S Broadway,8521 S Broadway
                    8700 Riverview Blvd,8706 Riverview Blvd
                    822 Angelica St,824 Angelica St
                    8442 N Broadway,8446 N Broadway


                    You can accomplish this by using the csv module:



                    from csv import writer

                    # open both input and output csv
                    with open("addresses.csv") as csv_in, open("output.csv", "w", newline="") as csv_out:
                    csv_writer = writer(csv_out)

                    # Skip 'Address' header
                    next(csv_in)

                    # Write new headers
                    csv_writer.writerow(["First Address", "Last Address"])

                    # Go through each line in csv
                    for line in csv_in:

                    # Split ranges from rest of line
                    ranges, *rest = line.split()

                    # Split ranges themselves
                    start, end = ranges.split("-")

                    # Get difference between length of numbers
                    diff = len(start) - len(end)

                    # Create new end address
                    end = start[:diff] + end

                    # Write new lines to output csv
                    csv_writer.writerow([" ".join([start, *rest]), " ".join([end, *rest])])


                    Which will output your results into output.csv.






                    share|improve this answer















                    Let's say your input .csv file looks like this:



                    Address
                    8519-21 S Broadway
                    8700-6 Riverview Blvd
                    822-4 Angelica St
                    8442-6 N Broadway


                    and you want it to look something like this:



                    First Address,Last Address
                    8519 S Broadway,8521 S Broadway
                    8700 Riverview Blvd,8706 Riverview Blvd
                    822 Angelica St,824 Angelica St
                    8442 N Broadway,8446 N Broadway


                    You can accomplish this by using the csv module:



                    from csv import writer

                    # open both input and output csv
                    with open("addresses.csv") as csv_in, open("output.csv", "w", newline="") as csv_out:
                    csv_writer = writer(csv_out)

                    # Skip 'Address' header
                    next(csv_in)

                    # Write new headers
                    csv_writer.writerow(["First Address", "Last Address"])

                    # Go through each line in csv
                    for line in csv_in:

                    # Split ranges from rest of line
                    ranges, *rest = line.split()

                    # Split ranges themselves
                    start, end = ranges.split("-")

                    # Get difference between length of numbers
                    diff = len(start) - len(end)

                    # Create new end address
                    end = start[:diff] + end

                    # Write new lines to output csv
                    csv_writer.writerow([" ".join([start, *rest]), " ".join([end, *rest])])


                    Which will output your results into output.csv.







                    share|improve this answer














                    share|improve this answer



                    share|improve this answer








                    edited Nov 25 '18 at 4:39

























                    answered Nov 25 '18 at 4:33









                    RoadRunnerRoadRunner

                    11.5k31441




                    11.5k31441























                        0














                        Thanks for the help and suggestions, all. This is how I ended up tackling it:



                        import re
                        import csv
                        from csv import DictReader

                        dash_add =

                        def get_address_list(file):
                        with open(file) as f:
                        reader = csv.DictReader(f)
                        for row in reader:
                        adds = row['Siteaddr']
                        if "-" in adds:
                        dash_add.append(adds)

                        z =

                        def get_formatted_address(address):
                        numeric = address.split(" ")[0]
                        text = address.replace(numeric, "").strip()
                        start = numeric.split("-")[0]
                        end = numeric.split("-")[1]
                        end = start[:-len(end)] + end
                        return [start + " " + text, end + " " + text]

                        get_address_list('./data/map_data_112318.csv')
                        [z.extend(get_formatted_address(i)) for i in dash_add]

                        with open('formatted_addresses.csv', 'w') as csvfile:
                        writer = csv.writer(csvfile)
                        writer.writerow(['Address'])
                        for v in z:
                        writer.writerows([[v]])





                        share|improve this answer




























                          0














                          Thanks for the help and suggestions, all. This is how I ended up tackling it:



                          import re
                          import csv
                          from csv import DictReader

                          dash_add =

                          def get_address_list(file):
                          with open(file) as f:
                          reader = csv.DictReader(f)
                          for row in reader:
                          adds = row['Siteaddr']
                          if "-" in adds:
                          dash_add.append(adds)

                          z =

                          def get_formatted_address(address):
                          numeric = address.split(" ")[0]
                          text = address.replace(numeric, "").strip()
                          start = numeric.split("-")[0]
                          end = numeric.split("-")[1]
                          end = start[:-len(end)] + end
                          return [start + " " + text, end + " " + text]

                          get_address_list('./data/map_data_112318.csv')
                          [z.extend(get_formatted_address(i)) for i in dash_add]

                          with open('formatted_addresses.csv', 'w') as csvfile:
                          writer = csv.writer(csvfile)
                          writer.writerow(['Address'])
                          for v in z:
                          writer.writerows([[v]])





                          share|improve this answer


























                            0












                            0








                            0







                            Thanks for the help and suggestions, all. This is how I ended up tackling it:



                            import re
                            import csv
                            from csv import DictReader

                            dash_add =

                            def get_address_list(file):
                            with open(file) as f:
                            reader = csv.DictReader(f)
                            for row in reader:
                            adds = row['Siteaddr']
                            if "-" in adds:
                            dash_add.append(adds)

                            z =

                            def get_formatted_address(address):
                            numeric = address.split(" ")[0]
                            text = address.replace(numeric, "").strip()
                            start = numeric.split("-")[0]
                            end = numeric.split("-")[1]
                            end = start[:-len(end)] + end
                            return [start + " " + text, end + " " + text]

                            get_address_list('./data/map_data_112318.csv')
                            [z.extend(get_formatted_address(i)) for i in dash_add]

                            with open('formatted_addresses.csv', 'w') as csvfile:
                            writer = csv.writer(csvfile)
                            writer.writerow(['Address'])
                            for v in z:
                            writer.writerows([[v]])





                            share|improve this answer













                            Thanks for the help and suggestions, all. This is how I ended up tackling it:



                            import re
                            import csv
                            from csv import DictReader

                            dash_add =

                            def get_address_list(file):
                            with open(file) as f:
                            reader = csv.DictReader(f)
                            for row in reader:
                            adds = row['Siteaddr']
                            if "-" in adds:
                            dash_add.append(adds)

                            z =

                            def get_formatted_address(address):
                            numeric = address.split(" ")[0]
                            text = address.replace(numeric, "").strip()
                            start = numeric.split("-")[0]
                            end = numeric.split("-")[1]
                            end = start[:-len(end)] + end
                            return [start + " " + text, end + " " + text]

                            get_address_list('./data/map_data_112318.csv')
                            [z.extend(get_formatted_address(i)) for i in dash_add]

                            with open('formatted_addresses.csv', 'w') as csvfile:
                            writer = csv.writer(csvfile)
                            writer.writerow(['Address'])
                            for v in z:
                            writer.writerows([[v]])






                            share|improve this answer












                            share|improve this answer



                            share|improve this answer










                            answered Dec 4 '18 at 21:45









                            jayohdayjayohday

                            309




                            309






























                                draft saved

                                draft discarded




















































                                Thanks for contributing an answer to Stack Overflow!


                                • Please be sure to answer the question. Provide details and share your research!

                                But avoid



                                • Asking for help, clarification, or responding to other answers.

                                • Making statements based on opinion; back them up with references or personal experience.


                                To learn more, see our tips on writing great answers.




                                draft saved


                                draft discarded














                                StackExchange.ready(
                                function () {
                                StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53464328%2fpythonic-data-cleaning-of-address-ranges%23new-answer', 'question_page');
                                }
                                );

                                Post as a guest















                                Required, but never shown





















































                                Required, but never shown














                                Required, but never shown












                                Required, but never shown







                                Required, but never shown

































                                Required, but never shown














                                Required, but never shown












                                Required, but never shown







                                Required, but never shown







                                這個網誌中的熱門文章

                                Academy of Television Arts & Sciences

                                L'Équipe

                                1995 France bombings