Pythonic data cleaning of address ranges
.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty{ height:90px;width:728px;box-sizing:border-box;
}
I need to know if the following task is possible via Python.
I have a list of 4,300 addresses, each that covers a range of addresses, like this:
- 8519-21 S Broadway
- 8700-6 Riverview Blvd
- 822-4 Angelica St
- 8442-6 N Broadway
I want to isolate the first address and the last in the range. So, using 1 as an example, I would need the output to give me 8519 S Broadway and 8521 S Broadway, each in its own row.
In some instances, like 1 and 3, I only need the first two digits of the first number to complete the second address. I tried in Excel using text to columns (-), then CONCATENATE and LEFT(cell,2) but then I don't get the address correct for instances like 2 and 4, where I need the first three digits of the address. Is there any way to write a python script to solve this?
python data-cleaning
add a comment |
I need to know if the following task is possible via Python.
I have a list of 4,300 addresses, each that covers a range of addresses, like this:
- 8519-21 S Broadway
- 8700-6 Riverview Blvd
- 822-4 Angelica St
- 8442-6 N Broadway
I want to isolate the first address and the last in the range. So, using 1 as an example, I would need the output to give me 8519 S Broadway and 8521 S Broadway, each in its own row.
In some instances, like 1 and 3, I only need the first two digits of the first number to complete the second address. I tried in Excel using text to columns (-), then CONCATENATE and LEFT(cell,2) but then I don't get the address correct for instances like 2 and 4, where I need the first three digits of the address. Is there any way to write a python script to solve this?
python data-cleaning
How is your data stored?
– 0liveradam8
Nov 25 '18 at 3:17
It's in a .csv right now (all in one column, each address range in its own row) so it could be in a list, a dict, a dataframe; whatever is needed.
– jayohday
Nov 25 '18 at 3:21
add a comment |
I need to know if the following task is possible via Python.
I have a list of 4,300 addresses, each that covers a range of addresses, like this:
- 8519-21 S Broadway
- 8700-6 Riverview Blvd
- 822-4 Angelica St
- 8442-6 N Broadway
I want to isolate the first address and the last in the range. So, using 1 as an example, I would need the output to give me 8519 S Broadway and 8521 S Broadway, each in its own row.
In some instances, like 1 and 3, I only need the first two digits of the first number to complete the second address. I tried in Excel using text to columns (-), then CONCATENATE and LEFT(cell,2) but then I don't get the address correct for instances like 2 and 4, where I need the first three digits of the address. Is there any way to write a python script to solve this?
python data-cleaning
I need to know if the following task is possible via Python.
I have a list of 4,300 addresses, each that covers a range of addresses, like this:
- 8519-21 S Broadway
- 8700-6 Riverview Blvd
- 822-4 Angelica St
- 8442-6 N Broadway
I want to isolate the first address and the last in the range. So, using 1 as an example, I would need the output to give me 8519 S Broadway and 8521 S Broadway, each in its own row.
In some instances, like 1 and 3, I only need the first two digits of the first number to complete the second address. I tried in Excel using text to columns (-), then CONCATENATE and LEFT(cell,2) but then I don't get the address correct for instances like 2 and 4, where I need the first three digits of the address. Is there any way to write a python script to solve this?
python data-cleaning
python data-cleaning
asked Nov 25 '18 at 3:13
jayohdayjayohday
309
309
How is your data stored?
– 0liveradam8
Nov 25 '18 at 3:17
It's in a .csv right now (all in one column, each address range in its own row) so it could be in a list, a dict, a dataframe; whatever is needed.
– jayohday
Nov 25 '18 at 3:21
add a comment |
How is your data stored?
– 0liveradam8
Nov 25 '18 at 3:17
It's in a .csv right now (all in one column, each address range in its own row) so it could be in a list, a dict, a dataframe; whatever is needed.
– jayohday
Nov 25 '18 at 3:21
How is your data stored?
– 0liveradam8
Nov 25 '18 at 3:17
How is your data stored?
– 0liveradam8
Nov 25 '18 at 3:17
It's in a .csv right now (all in one column, each address range in its own row) so it could be in a list, a dict, a dataframe; whatever is needed.
– jayohday
Nov 25 '18 at 3:21
It's in a .csv right now (all in one column, each address range in its own row) so it could be in a list, a dict, a dataframe; whatever is needed.
– jayohday
Nov 25 '18 at 3:21
add a comment |
3 Answers
3
active
oldest
votes
Try this code:
addresses=["8519-21 S Broadway","8700-6 Riverview Blvd"]
output=
for i in addresses:
output.append(i[0:i.find('-')]+i[i.find(' '):])
numEnd=i[i.find('-')+1:i.find(' ')]
numStart=i[0:i.find('-')-len(numEnd)]+numEnd+i[i.find(' '):]
output.append(numStart)
print(output)
Where addresses is a list of the address ranges in the format you've given, and output is the data you want.
add a comment |
Let's say your input .csv file looks like this:
Address
8519-21 S Broadway
8700-6 Riverview Blvd
822-4 Angelica St
8442-6 N Broadway
and you want it to look something like this:
First Address,Last Address
8519 S Broadway,8521 S Broadway
8700 Riverview Blvd,8706 Riverview Blvd
822 Angelica St,824 Angelica St
8442 N Broadway,8446 N Broadway
You can accomplish this by using the csv module:
from csv import writer
# open both input and output csv
with open("addresses.csv") as csv_in, open("output.csv", "w", newline="") as csv_out:
csv_writer = writer(csv_out)
# Skip 'Address' header
next(csv_in)
# Write new headers
csv_writer.writerow(["First Address", "Last Address"])
# Go through each line in csv
for line in csv_in:
# Split ranges from rest of line
ranges, *rest = line.split()
# Split ranges themselves
start, end = ranges.split("-")
# Get difference between length of numbers
diff = len(start) - len(end)
# Create new end address
end = start[:diff] + end
# Write new lines to output csv
csv_writer.writerow([" ".join([start, *rest]), " ".join([end, *rest])])
Which will output your results into output.csv.
add a comment |
Thanks for the help and suggestions, all. This is how I ended up tackling it:
import re
import csv
from csv import DictReader
dash_add =
def get_address_list(file):
with open(file) as f:
reader = csv.DictReader(f)
for row in reader:
adds = row['Siteaddr']
if "-" in adds:
dash_add.append(adds)
z =
def get_formatted_address(address):
numeric = address.split(" ")[0]
text = address.replace(numeric, "").strip()
start = numeric.split("-")[0]
end = numeric.split("-")[1]
end = start[:-len(end)] + end
return [start + " " + text, end + " " + text]
get_address_list('./data/map_data_112318.csv')
[z.extend(get_formatted_address(i)) for i in dash_add]
with open('formatted_addresses.csv', 'w') as csvfile:
writer = csv.writer(csvfile)
writer.writerow(['Address'])
for v in z:
writer.writerows([[v]])
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53464328%2fpythonic-data-cleaning-of-address-ranges%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
3 Answers
3
active
oldest
votes
3 Answers
3
active
oldest
votes
active
oldest
votes
active
oldest
votes
Try this code:
addresses=["8519-21 S Broadway","8700-6 Riverview Blvd"]
output=
for i in addresses:
output.append(i[0:i.find('-')]+i[i.find(' '):])
numEnd=i[i.find('-')+1:i.find(' ')]
numStart=i[0:i.find('-')-len(numEnd)]+numEnd+i[i.find(' '):]
output.append(numStart)
print(output)
Where addresses is a list of the address ranges in the format you've given, and output is the data you want.
add a comment |
Try this code:
addresses=["8519-21 S Broadway","8700-6 Riverview Blvd"]
output=
for i in addresses:
output.append(i[0:i.find('-')]+i[i.find(' '):])
numEnd=i[i.find('-')+1:i.find(' ')]
numStart=i[0:i.find('-')-len(numEnd)]+numEnd+i[i.find(' '):]
output.append(numStart)
print(output)
Where addresses is a list of the address ranges in the format you've given, and output is the data you want.
add a comment |
Try this code:
addresses=["8519-21 S Broadway","8700-6 Riverview Blvd"]
output=
for i in addresses:
output.append(i[0:i.find('-')]+i[i.find(' '):])
numEnd=i[i.find('-')+1:i.find(' ')]
numStart=i[0:i.find('-')-len(numEnd)]+numEnd+i[i.find(' '):]
output.append(numStart)
print(output)
Where addresses is a list of the address ranges in the format you've given, and output is the data you want.
Try this code:
addresses=["8519-21 S Broadway","8700-6 Riverview Blvd"]
output=
for i in addresses:
output.append(i[0:i.find('-')]+i[i.find(' '):])
numEnd=i[i.find('-')+1:i.find(' ')]
numStart=i[0:i.find('-')-len(numEnd)]+numEnd+i[i.find(' '):]
output.append(numStart)
print(output)
Where addresses is a list of the address ranges in the format you've given, and output is the data you want.
answered Nov 25 '18 at 3:28
0liveradam80liveradam8
606115
606115
add a comment |
add a comment |
Let's say your input .csv file looks like this:
Address
8519-21 S Broadway
8700-6 Riverview Blvd
822-4 Angelica St
8442-6 N Broadway
and you want it to look something like this:
First Address,Last Address
8519 S Broadway,8521 S Broadway
8700 Riverview Blvd,8706 Riverview Blvd
822 Angelica St,824 Angelica St
8442 N Broadway,8446 N Broadway
You can accomplish this by using the csv module:
from csv import writer
# open both input and output csv
with open("addresses.csv") as csv_in, open("output.csv", "w", newline="") as csv_out:
csv_writer = writer(csv_out)
# Skip 'Address' header
next(csv_in)
# Write new headers
csv_writer.writerow(["First Address", "Last Address"])
# Go through each line in csv
for line in csv_in:
# Split ranges from rest of line
ranges, *rest = line.split()
# Split ranges themselves
start, end = ranges.split("-")
# Get difference between length of numbers
diff = len(start) - len(end)
# Create new end address
end = start[:diff] + end
# Write new lines to output csv
csv_writer.writerow([" ".join([start, *rest]), " ".join([end, *rest])])
Which will output your results into output.csv.
add a comment |
Let's say your input .csv file looks like this:
Address
8519-21 S Broadway
8700-6 Riverview Blvd
822-4 Angelica St
8442-6 N Broadway
and you want it to look something like this:
First Address,Last Address
8519 S Broadway,8521 S Broadway
8700 Riverview Blvd,8706 Riverview Blvd
822 Angelica St,824 Angelica St
8442 N Broadway,8446 N Broadway
You can accomplish this by using the csv module:
from csv import writer
# open both input and output csv
with open("addresses.csv") as csv_in, open("output.csv", "w", newline="") as csv_out:
csv_writer = writer(csv_out)
# Skip 'Address' header
next(csv_in)
# Write new headers
csv_writer.writerow(["First Address", "Last Address"])
# Go through each line in csv
for line in csv_in:
# Split ranges from rest of line
ranges, *rest = line.split()
# Split ranges themselves
start, end = ranges.split("-")
# Get difference between length of numbers
diff = len(start) - len(end)
# Create new end address
end = start[:diff] + end
# Write new lines to output csv
csv_writer.writerow([" ".join([start, *rest]), " ".join([end, *rest])])
Which will output your results into output.csv.
add a comment |
Let's say your input .csv file looks like this:
Address
8519-21 S Broadway
8700-6 Riverview Blvd
822-4 Angelica St
8442-6 N Broadway
and you want it to look something like this:
First Address,Last Address
8519 S Broadway,8521 S Broadway
8700 Riverview Blvd,8706 Riverview Blvd
822 Angelica St,824 Angelica St
8442 N Broadway,8446 N Broadway
You can accomplish this by using the csv module:
from csv import writer
# open both input and output csv
with open("addresses.csv") as csv_in, open("output.csv", "w", newline="") as csv_out:
csv_writer = writer(csv_out)
# Skip 'Address' header
next(csv_in)
# Write new headers
csv_writer.writerow(["First Address", "Last Address"])
# Go through each line in csv
for line in csv_in:
# Split ranges from rest of line
ranges, *rest = line.split()
# Split ranges themselves
start, end = ranges.split("-")
# Get difference between length of numbers
diff = len(start) - len(end)
# Create new end address
end = start[:diff] + end
# Write new lines to output csv
csv_writer.writerow([" ".join([start, *rest]), " ".join([end, *rest])])
Which will output your results into output.csv.
Let's say your input .csv file looks like this:
Address
8519-21 S Broadway
8700-6 Riverview Blvd
822-4 Angelica St
8442-6 N Broadway
and you want it to look something like this:
First Address,Last Address
8519 S Broadway,8521 S Broadway
8700 Riverview Blvd,8706 Riverview Blvd
822 Angelica St,824 Angelica St
8442 N Broadway,8446 N Broadway
You can accomplish this by using the csv module:
from csv import writer
# open both input and output csv
with open("addresses.csv") as csv_in, open("output.csv", "w", newline="") as csv_out:
csv_writer = writer(csv_out)
# Skip 'Address' header
next(csv_in)
# Write new headers
csv_writer.writerow(["First Address", "Last Address"])
# Go through each line in csv
for line in csv_in:
# Split ranges from rest of line
ranges, *rest = line.split()
# Split ranges themselves
start, end = ranges.split("-")
# Get difference between length of numbers
diff = len(start) - len(end)
# Create new end address
end = start[:diff] + end
# Write new lines to output csv
csv_writer.writerow([" ".join([start, *rest]), " ".join([end, *rest])])
Which will output your results into output.csv.
edited Nov 25 '18 at 4:39
answered Nov 25 '18 at 4:33
RoadRunnerRoadRunner
11.5k31441
11.5k31441
add a comment |
add a comment |
Thanks for the help and suggestions, all. This is how I ended up tackling it:
import re
import csv
from csv import DictReader
dash_add =
def get_address_list(file):
with open(file) as f:
reader = csv.DictReader(f)
for row in reader:
adds = row['Siteaddr']
if "-" in adds:
dash_add.append(adds)
z =
def get_formatted_address(address):
numeric = address.split(" ")[0]
text = address.replace(numeric, "").strip()
start = numeric.split("-")[0]
end = numeric.split("-")[1]
end = start[:-len(end)] + end
return [start + " " + text, end + " " + text]
get_address_list('./data/map_data_112318.csv')
[z.extend(get_formatted_address(i)) for i in dash_add]
with open('formatted_addresses.csv', 'w') as csvfile:
writer = csv.writer(csvfile)
writer.writerow(['Address'])
for v in z:
writer.writerows([[v]])
add a comment |
Thanks for the help and suggestions, all. This is how I ended up tackling it:
import re
import csv
from csv import DictReader
dash_add =
def get_address_list(file):
with open(file) as f:
reader = csv.DictReader(f)
for row in reader:
adds = row['Siteaddr']
if "-" in adds:
dash_add.append(adds)
z =
def get_formatted_address(address):
numeric = address.split(" ")[0]
text = address.replace(numeric, "").strip()
start = numeric.split("-")[0]
end = numeric.split("-")[1]
end = start[:-len(end)] + end
return [start + " " + text, end + " " + text]
get_address_list('./data/map_data_112318.csv')
[z.extend(get_formatted_address(i)) for i in dash_add]
with open('formatted_addresses.csv', 'w') as csvfile:
writer = csv.writer(csvfile)
writer.writerow(['Address'])
for v in z:
writer.writerows([[v]])
add a comment |
Thanks for the help and suggestions, all. This is how I ended up tackling it:
import re
import csv
from csv import DictReader
dash_add =
def get_address_list(file):
with open(file) as f:
reader = csv.DictReader(f)
for row in reader:
adds = row['Siteaddr']
if "-" in adds:
dash_add.append(adds)
z =
def get_formatted_address(address):
numeric = address.split(" ")[0]
text = address.replace(numeric, "").strip()
start = numeric.split("-")[0]
end = numeric.split("-")[1]
end = start[:-len(end)] + end
return [start + " " + text, end + " " + text]
get_address_list('./data/map_data_112318.csv')
[z.extend(get_formatted_address(i)) for i in dash_add]
with open('formatted_addresses.csv', 'w') as csvfile:
writer = csv.writer(csvfile)
writer.writerow(['Address'])
for v in z:
writer.writerows([[v]])
Thanks for the help and suggestions, all. This is how I ended up tackling it:
import re
import csv
from csv import DictReader
dash_add =
def get_address_list(file):
with open(file) as f:
reader = csv.DictReader(f)
for row in reader:
adds = row['Siteaddr']
if "-" in adds:
dash_add.append(adds)
z =
def get_formatted_address(address):
numeric = address.split(" ")[0]
text = address.replace(numeric, "").strip()
start = numeric.split("-")[0]
end = numeric.split("-")[1]
end = start[:-len(end)] + end
return [start + " " + text, end + " " + text]
get_address_list('./data/map_data_112318.csv')
[z.extend(get_formatted_address(i)) for i in dash_add]
with open('formatted_addresses.csv', 'w') as csvfile:
writer = csv.writer(csvfile)
writer.writerow(['Address'])
for v in z:
writer.writerows([[v]])
answered Dec 4 '18 at 21:45
jayohdayjayohday
309
309
add a comment |
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53464328%2fpythonic-data-cleaning-of-address-ranges%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
How is your data stored?
– 0liveradam8
Nov 25 '18 at 3:17
It's in a .csv right now (all in one column, each address range in its own row) so it could be in a list, a dict, a dataframe; whatever is needed.
– jayohday
Nov 25 '18 at 3:21