DatetimeIndex reverse-lookup memo as I’m not quite sure

Introduction

I almost always feel confused about pandas.DatetimeIndex when I set DatetimeIndex in pandas.

More, I tend to drown in the sea of superabundant information, so I decided to make personal notes here.

Click the table of contents to jump to the relevant item!

For more details, please check official pandas website.

Data used for case studies and versions of pandas and Python

This post uses the following DataFrame as variable name “df”.

First of all, let’s make sure the data type of index is DatetimeIndex.

print(df.index)
DatetimeIndex(['2020-01-02 03:04:05', 

'2021-06-07 08:09:10','2022-11-12 13:14:15'],

 dtype='datetime64[ns]', name='YMDHMS', freq=None)

Good good.

Then versions of pandas and Python are 0.24.2 and 3.7.2 respectively.

* ‘Extract “day of the week” as a string; day_name ()’ was curried out in pandas 1.0.5 (added on June 26, 2020)

しときます。

Well then, prepare in advance;

import pandas as pd
import datetime

Case studies

Confirming DataFrame configuration; index, columns, values and name

print(df.index) #confirm index 

DatetimeIndex(['2020-01-02 03:04:05', '2021-06-07 08:09:10',
'2022-11-12 13:14:15'],
dtype='datetime64[ns]', name='YMDHMS', freq=None)

print(df.columns) #confirm columns

Index(['a', 'b', 'c'], dtype='object')df.values 

print(df.index.name) #confirm index name

'YMDHMS'

print(df.values) #confirm values

array([['a0', 'b0', 'c0'],
['a1', 'b1', 'c1'],
['a2', 'b2', 'c2']], dtype=object)

print(df.columns.name) # confirm column names
None

指定した行番号のindex名を表示, index

Print the index name of the specified line number; index

print(df.index[0]) #row 1

Timestamp('2020-01-02 03:04:05')

print(df.index[1]) #row 2

Timestamp('2021-06-07 08:09:10')

print(df.index[0:2]) #row 1-2

DatetimeIndex(['2020-01-02 03:04:05', '2021-06-07 08:09:10'], dtype='datetime64[ns]', name='YMDHMS', freq=None)

Print column name of specified column number; columns

print(df.columns[0]) #column 1

'a'

print(df.columns[1]) #column 2

'b'

print(df.columns[0:2]) #columns 1-2

Index(['a', 'b'], dtype='object')

Extract “year”; year

print(df.index.year) #years in the DatetimeIndex

Int64Index([2020, 2021, 2022], dtype='int64', name='YMDHMS')
print(df.index[0].year) #year at row 1

2020

Extract “month”; month

print(df.index.month) #months in the DatetimeIndex

Int64Index([1, 6, 11], dtype='int64', name='YMDHMS')
print(df.index[0].month) #month at row 1

1

Extract “day”; day

print(df.index.day) #days in the DatetimeIndex

Int64Index([2, 7, 12], dtype='int64', name='YMDHMS')
print(df.index[0].day) #day at row 1
2

Extract “date”; date

print(df.index.date) #date in the DatetimeIndex

array([datetime.date(2020, 1, 2), datetime.date(2021, 6, 7), datetime.date(2022, 11, 12)], dtype=object)
display(df.index[0].date)  #date at row 1. Oops, this doesn't work.

<built-in method date of Timestamp object at 0x115d0fa48>


display(df.index.date[0]) #The first item of the date in the DatatimeIndex. Hum!

datetime.date(2020, 1, 2)

Extract “week number of year”; week

print(df.index.week) #week number of DatetimeIndex

Int64Index([1, 23, 45], dtype='int64', name='YMDHMS')
display(df.index[0].week) #week number of row 1

Extract “weekday”; weekday

Monday;0, Tuesday;1, Wednesday;2, Thursday;3, Friday;4, Saturday;5, SUnday;6

print(df.index.weekday) #weekday in the DatetimeIndex(Thursday, Monday and Saturday in this case)

Int64Index([3, 0, 5], dtype='int64', name='YMDHMS')
print(df.index[0].weekday) #weekday in row 1. Oops, this doesn't work.

<built-in method weekday of Timestamp object at 0x115d0fa48>
print(df.index.weekday[0]) #the first item of weekdays in the DatetimeIndex. (3;Thursday in this case)

Extract “day of the week” as a string; day_name ()

*This was curried out in pandas 1.0.5 (added on June 26, 2020)

Use “day_name ()” to extract “day of the week” as a string when pandas version is 1.0.0 or more, since “weekday_name” was abolished after pandas 1.0.0.

print(df.index.day_name())

Index(['Thursday', 'Monday', 'Saturday'], dtype='object', name='YMDHMS')

print(df.index[0].day_name())

Thursday

Extract “hour”; hour

print(df.index.hour) #hours in the DatetimeIndex

Int64Index([3, 8, 13], dtype='int64', name='YMDHMS')
print(df.index[0].hour) #hour in row 1

Extract “minute”; minute

print(df.index.minute) #minutes in the DatetimeIndex

Int64Index([4, 9, 14], dtype='int64', name='YMDHMS')
print(df.index[0].minute) #minute in row 1

Extract “second”; second

print(df.index.second) #seconds in the DatetimeIndex

Int64Index([5, 10, 15], dtype='int64', name='YMDHMS')
print(df.index[0].second) #second in row 1

Extract “hour”, “minute” and “second”; time

display(df.index[0].time) #Attempt to get "hour", "minute" and "second" in row 1, but this doesn't work.

<built-in method time of Timestamp object at 0x115d0fa48>


print(df.index.time[0]) #This way works.

03:04:05

Extract a row by specifying the time and datetime with datetime.time type; loc

print(df.loc[datetime.time(3,4,5)]) # Specify the datetime-type-time of (03:04:05) with datetime.time

                      a   b   c
YMDHMS                         
2020-01-02 03:04:05  a0  b0  c0
print(df.loc[datetime.datetime(2020, 1, 2, 3, 4, 5 )]) # Specify "2020-01-02 03:04:05" with datetime.datetime

a    a0
b    b0
c    c0
Name: 2020-01-02 03:04:05, dtype: object

Note if it is written in a way like the below;

print(df.loc[datetime.date(2020, 1, 2)])

or

print(df.loc[datetime.datetime(2020, 1, 2, 3, 4)])

those does not include all element of datetime, it gets an error…

In these cases, converting datetime.date into a string goes well even if not including all element of datetime.

print(df.loc[str(datetime.date(2020, 1, 2))])# converting datetime.date into a string goes well.
                      a   b   c
YMDHMS                         
2020-01-02 03:04:05  a0  b0  c0

But in the case of datetime.datetime, an error occurs when converting partial datetime into a string.

print(df.loc[str(datetime.datetime(2020, 1, 2, 3, 4))]) #this doesn't work

Extract a row by specifying the time and datetime with strings; loc

#Specify the date and time with strings
print(df.loc['2020']) 
print(df.loc['2020-01'])
print(df.loc['2020-01-02'])
print(df.loc['2020-01-02 03'])
print(df.loc['2020-01-02 03:04'])

#without loc goes well....
print(df['2020']) 
print(df['2020-01'])
print(df['2020-01-02'])
print(df['2020-01-02 03'])
print(df['2020-01-02 03:04'])

#All of the above gives the following results. (Since only one row meets these specifications, results are the same)
                      a   b   c
YMDHMS                         
2020-01-02 03:04:05  a0  b0  c0


print(df.loc['2020-01-02 03:04:05']) #specify whole datetime
#or
print(df['2020-01-02 03:04:05'])

a    a0
b    b0
c    c0
Name: 2020-01-02 03:04:05, dtype: object

With time alone, it is not accepted.

print(df.loc['03:04:05']) #This gets an error.

So, when extracting just by specifying the time…

df.loc[datetime.date(3, 4, 5)] 

This way works.

Extract value by specifying index and column

print(df['2020']['a']) #index and column including 2020


YMDHMS
2020-01-02 03:04:05    a0
Name: a, dtype: object
print(df.loc['2020']['a']) #loc, as well

YMDHMS
2020-01-02 03:04:05    a0
Name: a, dtype: object
print(df.loc[datetime.datetime(2020, 1, 2, 3, 4,5)]['a']) #with datetime.datetime

a0
display(df.iloc[0][0]) #iloc when specifying the position by row number and column number

a0
display(df.iloc[0][0:2]) #with slice

a    a0
b    b0
Name: 2020-01-02 03:04:05, dtype: object
display(df.at[datetime.datetime(2020,1,2, 3, 4, 5), 'a']) #with at

a0
display(df[['b', 'c']]) #multiple columns work

                      b   c
YMDHMS                     
2020-01-02 03:04:05  b0  c0
2021-06-07 08:09:10  b1  c1
2022-11-12 13:14:15  b2  c2
display(df.iat[0, 1]) #with iat, by row number and column number

b0
display(df.at[df.index[0], 'a']) #row number and column name

a0
display(df.at[df['2020'], 'a']) #This gives an error...

Confirm and set the time zone; tzinfo(), tz_localize(), tz_convert()

print(df.index.tzinfo) #confirm the time zone

None #not set yet

#set the time zone
df.index = df.index.tz_localize('Asia/Tokyo')
df.index = df.index.tz_convert('Asia/Tokyo')
print(df.index.tzinfo)

Asia/Tokyo #done!

print(df) #You can see that +09:00 has been added.

                            a   b   c
YMDHMS                               
2020-01-02 03:04:05+09:00  a0  b0  c0
2021-06-07 08:09:10+09:00  a1  b1  c1
2022-11-12 13:14:15+09:00  a2  b2  c2

Convert python datetime.datetime object to pandas datetime (Timestamp); to_datetime ()

mae = datetime.datetime(2020,1,2,3,4,5)
print(mae)

2020-01-02 03:04:05
print(type(mae))

<class 'datetime.datetime'>


ato = pd.to_datetime(mae)
print(ato)

2020-01-02 03:04:05

print(type(ato))

<class 'pandas._libs.tslibs.timestamps.Timestamp'>

ちょっと広告です
https://business.xserver.ne.jp/

https://www.xdomain.ne.jp/

★LOLIPOP★

.tokyo

MuuMuu Domain!