Cast int96 timestamp from parquet to golang
I have this 12-byte array (a Parquet int96 value) that I need to convert to a timestamp.
[128 76 69 116 64 7 0 0 48 131 37 0]
How do I cast it to a timestamp?
I understand the first 8 bytes should be cast to an int64 of milliseconds that represents an epoch datetime.
go parquet
edited Nov 2 at 11:37
dlsniper
asked Nov 1 at 14:53
ZAky
2 Answers
The first 8 bytes are the time in nanoseconds, not milliseconds. They are not measured from the epoch either, but from midnight. The date part is stored separately in the last 4 bytes as a Julian day number.
Here is the result of an experiment I did earlier that may help. I stored '2000-01-01 12:34:56' as an int96 and dumped it with parquet-tools:
$ parquet-tools dump hdfs://path/to/parquet/file | tail -n 1
value 1: R:0 D:1 V:117253024523396126668760320
Since 117253024523396126668760320 = 0x60FD4B3229000059682500 (the leading zero byte does not show up in the hex form), the 12 bytes are 00 60 FD 4B 32 29 00 00 | 59 68 25 00, where | shows the boundary between the time and the date parts.
00 60 FD 4B 32 29 00 00 is the time part. We need to reverse the bytes because int96 timestamps are stored in little-endian byte order, thus we get 0x000029324BFD6000 = 45296 * 10^9 nanoseconds = 45296 seconds = 12 hours + 34 minutes + 56 seconds.
59 68 25 00 is the date part. If we reverse the bytes, we get 0x00256859 = 2451545 as the Julian day number, which corresponds to 2000-01-01.
answered Nov 1 at 15:37
Zoltan
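@Zoltan you definitely deserve the vote, although you didn't supply a Golang solution.
Thanks to you and to https://github.com/carlosjhr64/jd
I wrote a function func int96ToJulian(parquetDate []byte) time.Time
playground
    import (
        "encoding/binary"
        "time"
    )

    // int96ToJulian converts a 12-byte Parquet int96 timestamp to a time.Time in UTC.
    // parquetDate must hold at least 12 bytes.
    func int96ToJulian(parquetDate []byte) time.Time {
        // First 8 bytes: nanoseconds since midnight (little-endian).
        nano := binary.LittleEndian.Uint64(parquetDate[:8])
        // Last 4 bytes: Julian day number (little-endian).
        dt := binary.LittleEndian.Uint32(parquetDate[8:])

        // Convert the Julian day number to a Gregorian year (i), month (j) and day (k).
        l := dt + 68569
        n := 4 * l / 146097
        l = l - (146097*n+3)/4
        i := 4000 * (l + 1) / 1461001
        l = l - 1461*i/4 + 31
        j := 80 * l / 2447
        k := l - 2447*j/80
        l = j / 11
        j = j + 2 - 12*l
        i = 100*(n-49) + i + l

        // Midnight of that date plus the time-of-day offset.
        tm := time.Date(int(i), time.Month(j), int(k), 0, 0, 0, 0, time.UTC)
        return tm.Add(time.Duration(nano))
    }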
edited Nov 5 at 3:44
answered Nov 3 at 17:48
ZAky
I do not know any Go (which is why I did not provide any code in my answer), but your use of binary.BigEndian above suggests that there is a binary.LittleEndian as well (which I confirmed with a quick Google search). If you used that, you wouldn't have to reverse the bytes manually, since that's exactly what endianness means.
– Zoltan
Nov 4 at 8:09