Pythonic data cleaning of address ranges

.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty{ height:90px;width:728px;box-sizing:border-box;
}

I need to know if the following task is possible via Python.

I have a list of 4,300 addresses, each that covers a range of addresses, like this:

8519-21 S Broadway

8700-6 Riverview Blvd

822-4 Angelica St

8442-6 N Broadway

I want to isolate the first address and the last in the range. So, using 1 as an example, I would need the output to give me 8519 S Broadway and 8521 S Broadway, each in its own row.

In some instances, like 1 and 3, I only need the first two digits of the first number to complete the second address. I tried in Excel using text to columns (-), then CONCATENATE and LEFT(cell,2) but then I don't get the address correct for instances like 2 and 4, where I need the first three digits of the address. Is there any way to write a python script to solve this?

asked Nov 25 '18 at 3:13

jayohday

309

How is your data stored?

– 0liveradam8
Nov 25 '18 at 3:17

It's in a .csv right now (all in one column, each address range in its own row) so it could be in a list, a dict, a dataframe; whatever is needed.

– jayohday
Nov 25 '18 at 3:21

add a comment |

I need to know if the following task is possible via Python.

I have a list of 4,300 addresses, each that covers a range of addresses, like this:

8519-21 S Broadway

8700-6 Riverview Blvd

822-4 Angelica St

8442-6 N Broadway

I want to isolate the first address and the last in the range. So, using 1 as an example, I would need the output to give me 8519 S Broadway and 8521 S Broadway, each in its own row.

asked Nov 25 '18 at 3:13

jayohday

309

How is your data stored?

– 0liveradam8
Nov 25 '18 at 3:17

It's in a .csv right now (all in one column, each address range in its own row) so it could be in a list, a dict, a dataframe; whatever is needed.

– jayohday
Nov 25 '18 at 3:21

add a comment |

I need to know if the following task is possible via Python.

I have a list of 4,300 addresses, each that covers a range of addresses, like this:

8519-21 S Broadway

8700-6 Riverview Blvd

822-4 Angelica St

8442-6 N Broadway

I want to isolate the first address and the last in the range. So, using 1 as an example, I would need the output to give me 8519 S Broadway and 8521 S Broadway, each in its own row.

asked Nov 25 '18 at 3:13

jayohday

309

I need to know if the following task is possible via Python.

I have a list of 4,300 addresses, each that covers a range of addresses, like this:

8519-21 S Broadway

8700-6 Riverview Blvd

822-4 Angelica St

8442-6 N Broadway

I want to isolate the first address and the last in the range. So, using 1 as an example, I would need the output to give me 8519 S Broadway and 8521 S Broadway, each in its own row.

python data-cleaning

asked Nov 25 '18 at 3:13

jayohday

309

asked Nov 25 '18 at 3:13

jayohday

309

asked Nov 25 '18 at 3:13

jayohday

309

asked Nov 25 '18 at 3:13

jayohday

309

asked Nov 25 '18 at 3:13

jayohday

309

How is your data stored?

– 0liveradam8
Nov 25 '18 at 3:17

It's in a .csv right now (all in one column, each address range in its own row) so it could be in a list, a dict, a dataframe; whatever is needed.

– jayohday
Nov 25 '18 at 3:21

add a comment |

How is your data stored?

– 0liveradam8
Nov 25 '18 at 3:17

It's in a .csv right now (all in one column, each address range in its own row) so it could be in a list, a dict, a dataframe; whatever is needed.

– jayohday
Nov 25 '18 at 3:21

How is your data stored?

– 0liveradam8
Nov 25 '18 at 3:17

It's in a .csv right now (all in one column, each address range in its own row) so it could be in a list, a dict, a dataframe; whatever is needed.

– jayohday
Nov 25 '18 at 3:21

add a comment |

3 Answers
3

active

oldest

votes

Try this code:

addresses=["8519-21 S Broadway","8700-6 Riverview Blvd"]

output=

for i in addresses:

    output.append(i[0:i.find('-')]+i[i.find(' '):])

    numEnd=i[i.find('-')+1:i.find(' ')]

    numStart=i[0:i.find('-')-len(numEnd)]+numEnd+i[i.find(' '):]

    output.append(numStart)

print(output)

Where addresses is a list of the address ranges in the format you've given, and output is the data you want.

answered Nov 25 '18 at 3:28

0liveradam8

606115

add a comment |

Let's say your input .csv file looks like this:

Address

8519-21 S Broadway

8700-6 Riverview Blvd

822-4 Angelica St

8442-6 N Broadway

and you want it to look something like this:

First Address,Last Address

8519 S Broadway,8521 S Broadway

8700 Riverview Blvd,8706 Riverview Blvd

822 Angelica St,824 Angelica St

8442 N Broadway,8446 N Broadway

You can accomplish this by using the csv module:

from csv import writer



# open both input and output csv

with open("addresses.csv") as csv_in, open("output.csv", "w", newline="") as csv_out:

    csv_writer = writer(csv_out)



    # Skip 'Address' header

    next(csv_in)



    # Write new headers

    csv_writer.writerow(["First Address", "Last Address"])



    # Go through each line in csv

    for line in csv_in:



        # Split ranges from rest of line

        ranges, *rest = line.split()



        # Split ranges themselves

        start, end = ranges.split("-")



        # Get difference between length of numbers

        diff = len(start) - len(end)



        # Create new end address

        end = start[:diff] + end



        # Write new lines to output csv

        csv_writer.writerow([" ".join([start, *rest]), " ".join([end, *rest])])

Which will output your results into output.csv.

edited Nov 25 '18 at 4:39

answered Nov 25 '18 at 4:33

RoadRunner

11.5k31441

add a comment |

Thanks for the help and suggestions, all. This is how I ended up tackling it:

import re

import csv

from csv import DictReader



dash_add = 



def get_address_list(file):

    with open(file) as f:

        reader = csv.DictReader(f)

        for row in reader:

            adds = row['Siteaddr']

            if "-" in adds:

                dash_add.append(adds)



z = 



def get_formatted_address(address):

        numeric = address.split(" ")[0]

        text = address.replace(numeric, "").strip()

        start = numeric.split("-")[0]

        end = numeric.split("-")[1]

        end = start[:-len(end)] + end

        return [start + " " + text,  end + " " + text]



get_address_list('./data/map_data_112318.csv')

[z.extend(get_formatted_address(i)) for i in dash_add]



with open('formatted_addresses.csv', 'w') as csvfile:

    writer = csv.writer(csvfile)

    writer.writerow(['Address'])

    for v in z:

        writer.writerows([[v]])

answered Dec 4 '18 at 21:45

jayohday

309

add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53464328%2fpythonic-data-cleaning-of-address-ranges%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

3 Answers
3

active

oldest

votes

3 Answers
3

active

oldest

votes

Try this code:

addresses=["8519-21 S Broadway","8700-6 Riverview Blvd"]

output=

for i in addresses:

    output.append(i[0:i.find('-')]+i[i.find(' '):])

    numEnd=i[i.find('-')+1:i.find(' ')]

    numStart=i[0:i.find('-')-len(numEnd)]+numEnd+i[i.find(' '):]

    output.append(numStart)

print(output)

Where addresses is a list of the address ranges in the format you've given, and output is the data you want.

answered Nov 25 '18 at 3:28

0liveradam8

606115

add a comment |

Try this code:

addresses=["8519-21 S Broadway","8700-6 Riverview Blvd"]

output=

for i in addresses:

    output.append(i[0:i.find('-')]+i[i.find(' '):])

    numEnd=i[i.find('-')+1:i.find(' ')]

    numStart=i[0:i.find('-')-len(numEnd)]+numEnd+i[i.find(' '):]

    output.append(numStart)

print(output)

Where addresses is a list of the address ranges in the format you've given, and output is the data you want.

answered Nov 25 '18 at 3:28

0liveradam8

606115

add a comment |

Try this code:

addresses=["8519-21 S Broadway","8700-6 Riverview Blvd"]

output=

for i in addresses:

    output.append(i[0:i.find('-')]+i[i.find(' '):])

    numEnd=i[i.find('-')+1:i.find(' ')]

    numStart=i[0:i.find('-')-len(numEnd)]+numEnd+i[i.find(' '):]

    output.append(numStart)

print(output)

Where addresses is a list of the address ranges in the format you've given, and output is the data you want.

answered Nov 25 '18 at 3:28

0liveradam8

606115

Try this code:

addresses=["8519-21 S Broadway","8700-6 Riverview Blvd"]

output=

for i in addresses:

    output.append(i[0:i.find('-')]+i[i.find(' '):])

    numEnd=i[i.find('-')+1:i.find(' ')]

    numStart=i[0:i.find('-')-len(numEnd)]+numEnd+i[i.find(' '):]

    output.append(numStart)

print(output)

Where addresses is a list of the address ranges in the format you've given, and output is the data you want.

answered Nov 25 '18 at 3:28

0liveradam8

606115

answered Nov 25 '18 at 3:28

0liveradam8

606115

answered Nov 25 '18 at 3:28

0liveradam8

606115

answered Nov 25 '18 at 3:28

0liveradam8

606115

add a comment |

Let's say your input .csv file looks like this:

Address

8519-21 S Broadway

8700-6 Riverview Blvd

822-4 Angelica St

8442-6 N Broadway

and you want it to look something like this:

First Address,Last Address

8519 S Broadway,8521 S Broadway

8700 Riverview Blvd,8706 Riverview Blvd

822 Angelica St,824 Angelica St

8442 N Broadway,8446 N Broadway

You can accomplish this by using the csv module:

from csv import writer



# open both input and output csv

with open("addresses.csv") as csv_in, open("output.csv", "w", newline="") as csv_out:

    csv_writer = writer(csv_out)



    # Skip 'Address' header

    next(csv_in)



    # Write new headers

    csv_writer.writerow(["First Address", "Last Address"])



    # Go through each line in csv

    for line in csv_in:



        # Split ranges from rest of line

        ranges, *rest = line.split()



        # Split ranges themselves

        start, end = ranges.split("-")



        # Get difference between length of numbers

        diff = len(start) - len(end)



        # Create new end address

        end = start[:diff] + end



        # Write new lines to output csv

        csv_writer.writerow([" ".join([start, *rest]), " ".join([end, *rest])])

Which will output your results into output.csv.

edited Nov 25 '18 at 4:39

answered Nov 25 '18 at 4:33

RoadRunner

11.5k31441

add a comment |

Let's say your input .csv file looks like this:

Address

8519-21 S Broadway

8700-6 Riverview Blvd

822-4 Angelica St

8442-6 N Broadway

and you want it to look something like this:

First Address,Last Address

8519 S Broadway,8521 S Broadway

8700 Riverview Blvd,8706 Riverview Blvd

822 Angelica St,824 Angelica St

8442 N Broadway,8446 N Broadway

You can accomplish this by using the csv module:

from csv import writer



# open both input and output csv

with open("addresses.csv") as csv_in, open("output.csv", "w", newline="") as csv_out:

    csv_writer = writer(csv_out)



    # Skip 'Address' header

    next(csv_in)



    # Write new headers

    csv_writer.writerow(["First Address", "Last Address"])



    # Go through each line in csv

    for line in csv_in:



        # Split ranges from rest of line

        ranges, *rest = line.split()



        # Split ranges themselves

        start, end = ranges.split("-")



        # Get difference between length of numbers

        diff = len(start) - len(end)



        # Create new end address

        end = start[:diff] + end



        # Write new lines to output csv

        csv_writer.writerow([" ".join([start, *rest]), " ".join([end, *rest])])

Which will output your results into output.csv.

edited Nov 25 '18 at 4:39

answered Nov 25 '18 at 4:33

RoadRunner

11.5k31441

add a comment |

Let's say your input .csv file looks like this:

Address

8519-21 S Broadway

8700-6 Riverview Blvd

822-4 Angelica St

8442-6 N Broadway

and you want it to look something like this:

First Address,Last Address

8519 S Broadway,8521 S Broadway

8700 Riverview Blvd,8706 Riverview Blvd

822 Angelica St,824 Angelica St

8442 N Broadway,8446 N Broadway

You can accomplish this by using the csv module:

from csv import writer



# open both input and output csv

with open("addresses.csv") as csv_in, open("output.csv", "w", newline="") as csv_out:

    csv_writer = writer(csv_out)



    # Skip 'Address' header

    next(csv_in)



    # Write new headers

    csv_writer.writerow(["First Address", "Last Address"])



    # Go through each line in csv

    for line in csv_in:



        # Split ranges from rest of line

        ranges, *rest = line.split()



        # Split ranges themselves

        start, end = ranges.split("-")



        # Get difference between length of numbers

        diff = len(start) - len(end)



        # Create new end address

        end = start[:diff] + end



        # Write new lines to output csv

        csv_writer.writerow([" ".join([start, *rest]), " ".join([end, *rest])])

Which will output your results into output.csv.

edited Nov 25 '18 at 4:39

answered Nov 25 '18 at 4:33

RoadRunner

11.5k31441

Let's say your input .csv file looks like this:

Address

8519-21 S Broadway

8700-6 Riverview Blvd

822-4 Angelica St

8442-6 N Broadway

and you want it to look something like this:

First Address,Last Address

8519 S Broadway,8521 S Broadway

8700 Riverview Blvd,8706 Riverview Blvd

822 Angelica St,824 Angelica St

8442 N Broadway,8446 N Broadway

You can accomplish this by using the csv module:

from csv import writer



# open both input and output csv

with open("addresses.csv") as csv_in, open("output.csv", "w", newline="") as csv_out:

    csv_writer = writer(csv_out)



    # Skip 'Address' header

    next(csv_in)



    # Write new headers

    csv_writer.writerow(["First Address", "Last Address"])



    # Go through each line in csv

    for line in csv_in:



        # Split ranges from rest of line

        ranges, *rest = line.split()



        # Split ranges themselves

        start, end = ranges.split("-")



        # Get difference between length of numbers

        diff = len(start) - len(end)



        # Create new end address

        end = start[:diff] + end



        # Write new lines to output csv

        csv_writer.writerow([" ".join([start, *rest]), " ".join([end, *rest])])

Which will output your results into output.csv.

edited Nov 25 '18 at 4:39

answered Nov 25 '18 at 4:33

RoadRunner

11.5k31441

edited Nov 25 '18 at 4:39

answered Nov 25 '18 at 4:33

RoadRunner

11.5k31441

answered Nov 25 '18 at 4:33

RoadRunner

11.5k31441

answered Nov 25 '18 at 4:33

RoadRunner

11.5k31441

add a comment |

Thanks for the help and suggestions, all. This is how I ended up tackling it:

import re

import csv

from csv import DictReader



dash_add = 



def get_address_list(file):

    with open(file) as f:

        reader = csv.DictReader(f)

        for row in reader:

            adds = row['Siteaddr']

            if "-" in adds:

                dash_add.append(adds)



z = 



def get_formatted_address(address):

        numeric = address.split(" ")[0]

        text = address.replace(numeric, "").strip()

        start = numeric.split("-")[0]

        end = numeric.split("-")[1]

        end = start[:-len(end)] + end

        return [start + " " + text,  end + " " + text]



get_address_list('./data/map_data_112318.csv')

[z.extend(get_formatted_address(i)) for i in dash_add]



with open('formatted_addresses.csv', 'w') as csvfile:

    writer = csv.writer(csvfile)

    writer.writerow(['Address'])

    for v in z:

        writer.writerows([[v]])

answered Dec 4 '18 at 21:45

jayohday

309

add a comment |

Thanks for the help and suggestions, all. This is how I ended up tackling it:

import re

import csv

from csv import DictReader



dash_add = 



def get_address_list(file):

    with open(file) as f:

        reader = csv.DictReader(f)

        for row in reader:

            adds = row['Siteaddr']

            if "-" in adds:

                dash_add.append(adds)



z = 



def get_formatted_address(address):

        numeric = address.split(" ")[0]

        text = address.replace(numeric, "").strip()

        start = numeric.split("-")[0]

        end = numeric.split("-")[1]

        end = start[:-len(end)] + end

        return [start + " " + text,  end + " " + text]



get_address_list('./data/map_data_112318.csv')

[z.extend(get_formatted_address(i)) for i in dash_add]



with open('formatted_addresses.csv', 'w') as csvfile:

    writer = csv.writer(csvfile)

    writer.writerow(['Address'])

    for v in z:

        writer.writerows([[v]])

answered Dec 4 '18 at 21:45

jayohday

309

add a comment |

Thanks for the help and suggestions, all. This is how I ended up tackling it:

import re

import csv

from csv import DictReader



dash_add = 



def get_address_list(file):

    with open(file) as f:

        reader = csv.DictReader(f)

        for row in reader:

            adds = row['Siteaddr']

            if "-" in adds:

                dash_add.append(adds)



z = 



def get_formatted_address(address):

        numeric = address.split(" ")[0]

        text = address.replace(numeric, "").strip()

        start = numeric.split("-")[0]

        end = numeric.split("-")[1]

        end = start[:-len(end)] + end

        return [start + " " + text,  end + " " + text]



get_address_list('./data/map_data_112318.csv')

[z.extend(get_formatted_address(i)) for i in dash_add]



with open('formatted_addresses.csv', 'w') as csvfile:

    writer = csv.writer(csvfile)

    writer.writerow(['Address'])

    for v in z:

        writer.writerows([[v]])

answered Dec 4 '18 at 21:45

jayohday

309

Thanks for the help and suggestions, all. This is how I ended up tackling it:

import re

import csv

from csv import DictReader



dash_add = 



def get_address_list(file):

    with open(file) as f:

        reader = csv.DictReader(f)

        for row in reader:

            adds = row['Siteaddr']

            if "-" in adds:

                dash_add.append(adds)



z = 



def get_formatted_address(address):

        numeric = address.split(" ")[0]

        text = address.replace(numeric, "").strip()

        start = numeric.split("-")[0]

        end = numeric.split("-")[1]

        end = start[:-len(end)] + end

        return [start + " " + text,  end + " " + text]



get_address_list('./data/map_data_112318.csv')

[z.extend(get_formatted_address(i)) for i in dash_add]



with open('formatted_addresses.csv', 'w') as csvfile:

    writer = csv.writer(csvfile)

    writer.writerow(['Address'])

    for v in z:

        writer.writerows([[v]])

answered Dec 4 '18 at 21:45

jayohday

309

answered Dec 4 '18 at 21:45

jayohday

309

answered Dec 4 '18 at 21:45

jayohday

309

answered Dec 4 '18 at 21:45

jayohday

309

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Stack Overflow!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Wsrtjtyk