Write in a column of a csv formatted file after n lines?
I'm new to Python.
I'm having problems working with a CSV-formatted file.
The file has 12 lines of header, and the data starts after that.
I have to read some data from the columns (that part is fine). After some processing, I have to add a column to the same file, with a value in each row, but without any index in the first column, and the new column has to start from the 13th line, not from the first.
I've tried the pandas library, but it doesn't work:
df = pd.read_csv("./1540476113.gt.tie")
df["package"] = pd.Series(packages)
df.to_csv("./1540476113.gt.tie", sep="\t")
Here package is the name of the new column (I also know its index), packages is the list of strings to write, and sep is the column separator.
This code runs, but it starts adding from the first line (I don't know how to set an offset), writes the row index into the first column (unwanted), and puts a quote character before each element.
Sample input data:
# TIE output version: 1.0 (text format)
# generated by: . -a ndping_1.0 -r /home/giuseppe/Scrivania/gruppo30/1540476113/traffic.pcap
# Working Mode: off-line
# Session Type: biflow
# 1 plugins enabled: ndping
# begin trace interval: 1540476116.42434
# begin TIE Table
# id src_ip dst_ip proto sport dport dwpkts uppkts dwbytes upbytes t_start t_last app_id sub_id app_details confidence
17 192.168.20.105 216.58.205.42 6 50854 443 8 9 1507 1728 1540476136.698920 1540476136.879543 501 0 Google 100
26 192.168.20.105 151.101.66.202 6 40107 443 15 18 5874 1882 1540476194.196948 1540476204.641949 501 0 SSL_with_certificate 100
27 192.168.20.105 31.13.90.2 6 48133 443 10 15 4991 1598 1540476194.218949 1540476196.358946 501 0 Facebook 100
Sample output data:
# TIE output version: 1.0 (text format)
# generated by: . -a ndping_1.0 -r /home/giuseppe/Scrivania/gruppo30/1540476113/traffic.pcap
# Working Mode: off-line
# Session Type: biflow
# 1 plugins enabled: ndping
# begin trace interval: 1540476116.42434
# begin TIE Table
# id src_ip dst_ip proto sport dport dwpkts uppkts dwbytes upbytes t_start t_last app_id sub_id app_details confidence package
17 192.168.20.105 216.58.205.42 6 50854 443 8 9 1507 1728 1540476136.698920 1540476136.879543 501 0 Google 100 N/C
26 192.168.20.105 151.101.66.202 6 40107 443 15 18 5874 1882 1540476194.196948 1540476204.641949 501 0 SSL_with_certificate 100 com.joelapenna.foursquared
27 192.168.20.105 31.13.90.2 6 48133 443 10 15 4991 1598 1540476194.218949 1540476196.358946 501 0 Facebook 100 com.joelapenna.foursquared
38 192.168.20.105 13.32.71.69 6 52108 443 9 12 5297 2062 1540476195.492946 1540476308.604998 501 0 SSL_with_certificate 100 com.joelapenna.foursquared
0 34.246.212.92 192.168.20.105 6 443 37981 3 2 187 98 1540476116.042434 1540476189.868844 0 0 Other TCP 0 N/C
29 192.168.20.105 13.32.123.222 6 36481 443 11 15 6638 1914 1540476194.376945 1540476308.572998 501 0 SSL_with_certificate 100 com.joelapenna.foursquared
31 192.168.20.105 8.8.8.8 17 1219 53 1 1 253 68 1540476194.898945 1540476194.931198 501 0 DNS 100
I don't care about the alignment; the delimiter between the columns is a tab ('\t').
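For reference, a plain-Python sketch of the transformation described above, without pandas: keep the header block as-is, extend its last line (the column names) with the new label, and append one value per data row. The function name is made up for illustration, and `packages` is assumed to hold one string per data row, in file order.

```python
# Append a tab-separated "package" column to a TIE file, leaving the
# header block untouched. Sketch only; assumes len(packages) matches
# the number of data rows.
def append_package_column(path, packages, header_lines=12):
    with open(path) as f:
        lines = f.read().splitlines()

    header = lines[:header_lines]
    data = lines[header_lines:]

    # The last header line carries the column names ("# id src_ip ...").
    header[-1] += "\tpackage"
    rows = [row + "\t" + pkg for row, pkg in zip(data, packages)]

    with open(path, "w") as f:
        f.write("\n".join(header + rows) + "\n")
```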
python pandas csv dataframe file-io
Please show sample input data and desired output data. See the Minimal, Complete, and Verifiable example guidelines. – Mark Tolonen, Nov 24 '18 at 19:01
Done, updated the question. – Giuseppe Ferrara, Nov 25 '18 at 14:50
asked Nov 24 '18 at 17:26 by Giuseppe Ferrara, edited Nov 25 '18 at 8:36
1 Answer
You can skip to the data by passing some arguments to read_csv:
df = pd.read_csv("./1540476113.gt.tie", header=None, skiprows=12)
df["package"] = pd.Series(packages)
df.to_csv("./1540476113.gt.tie", sep="\t")
Then explicitly name your columns:
df.columns = col_names
If the 13th row is a header row with the column names that you want, do not pass the header=None argument.
Check out more in the pandas read_csv documentation.
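A sketch that combines the answer with the issues raised in the question (the unwanted row index and the tab separator): the header block is copied aside, its last line is extended with the new column label, and the data is written back with index=False so no index column appears. The 12-line header count and the `packages` list come from the question; the function name and everything else here are assumptions.

```python
import pandas as pd

def add_package_column(path, packages, n_header=12):
    """Append a tab-separated 'package' column, keeping the header block."""
    with open(path) as f:
        header_lines = [next(f) for _ in range(n_header)]
    # Extend the column-name line (the last header line) with the new label.
    header_lines[-1] = header_lines[-1].rstrip("\n") + "\tpackage\n"

    # Read only the data rows; the header block is re-written separately.
    df = pd.read_csv(path, sep="\t", header=None, skiprows=n_header)
    df["package"] = pd.Series(packages)  # one value per data row

    with open(path, "w") as f:
        f.writelines(header_lines)
        # index=False drops the unwanted row index; header=False avoids
        # writing the numeric column labels above the data.
        df.to_csv(f, sep="\t", index=False, header=False)
```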
I've tried this: df = pd.read_csv("./1540476113.gt.tie", header=None, skiprows=11); df["package"] = pd.Series(packages); df.columns = ["package"]; df.to_csv("./1540476113.gt.tie", sep="\t") because in the file the 12th row is still part of the header, but I have to insert the column label. However, it doesn't work and gives me ValueError: Length mismatch: Expected axis has 2 elements, new values have 1 elements. Deleting df.columns = [col_names] skips the header in the new file and deletes it! – Giuseppe Ferrara, Nov 25 '18 at 8:09
You are writing df.columns = ['package'] instead of df.columns = ['id', ..., 'package']. – foobarna, Nov 25 '18 at 8:21
I've tried to pass the array of all the column names, but the error is ValueError: Length mismatch: Expected 2 elements, new values have 23 elements. The 23 is wrong, but only because some columns have two '\t' delimiters; I can correct that, but it's still not two columns! – Giuseppe Ferrara, Nov 25 '18 at 8:40
@GiuseppeFerrara Try reading in the data without skipping rows and with header=None, then slice just the data you need with df.iloc[data_start_row:, [col_indices]]. After that, make sure that the length of the list of column names matches the length of the column index list. – Austin Mackillop, Nov 25 '18 at 20:59
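The approach from the last comment could be sketched like this. Reading the whole file in one pass needs enough pre-allocated column slots, because the comment lines have fewer tab-separated fields than the data rows; the miniature sample, the header length, and the column names below are all placeholders.

```python
import io
import pandas as pd

# A miniature stand-in for the TIE file: one comment line, one column-name
# line, then 3-column data.
sample = "# comment line\n# id\tsrc\tdst\n17\t192.168.20.105\t216.58.205.42\n"

# Pre-allocating enough names lets read_csv accept the short comment rows
# (missing fields are padded with NaN).
df = pd.read_csv(io.StringIO(sample), sep="\t", header=None, names=list(range(3)))

# Slice off the header block, keep only the wanted columns, and name them;
# the name list length must match the column index list length.
data = df.iloc[2:, [0, 1, 2]].copy()
data.columns = ["id", "src_ip", "dst_ip"]
```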
answered Nov 24 '18 at 18:13 by Austin Mackillop