Reading csv file with multiple delimiters in pandas
data:image/s3,"s3://crabby-images/e48ca/e48ca9d26fb10e5d6d2c34676c7d98d896417ddd" alt="Reading csv file with multiple delimiters in pandas"
Introduction
This is a memorandum about reading a csv file with read_csv of Python pandas with multiple delimiters.
specifying the delimiter using sep (or delimiter) with stuffing these delimiters into “[]”
So I’ll try it right away.
Details
Suppose I have the following csv file (tempo.csv) and I want to read it as separated with some delimiters (the right side of the time has a tab).
s1,s2;s3,datetime f1,f2,f3 a,b;c,2020/07/27 03:00 1.2,3.4,5.6 d,e;f,2021/09/28 13:03 2.3,4.5,6.7 g,h;i,2022/11/29 23:45 3.4,5.6,7.8
Here, let’s use the following seven types of delimiters to separate them.
“,” “;” “/” ” ” (space) “:” “t”(tab) “.”
How to specify the delimiter with sep (or delimiter) is just writing multiple delimiters in [] like this.
sep = “[]”
And specify engine =’python’ together.
# import pandas
import pandas as pd
#specifying the delimiter with sep (or delimiter), put multiple delimiters into "[ ]" .
#and specify engine ='python'
df = pd.read_csv("tempo.csv", sep = "[,;/ :t.]", engine='python')
df
data:image/s3,"s3://crabby-images/7b193/7b1930549250e7462993950a2cda12dd5e6ee4a1" alt="csv_delimiter"
Done!
By the way, if you read a file without specifying anything, the default delimiter will be
“,”
Therefore
# import pandas
import pandas as pd
#default delimiter is ","
df = pd.read_csv("tempo.csv")
df
data:image/s3,"s3://crabby-images/800a9/800a95c4cec84dec56de0b4899031e1be5ae1cd0" alt="csv_default"
It will be like above.
And more;reorder header
You may already know by now… Reading a csv file as divided by multiple delimiters, the column header will be shifted and indexed weirdly….
data:image/s3,"s3://crabby-images/7b193/7b1930549250e7462993950a2cda12dd5e6ee4a1" alt="csv_delimiter"
So, replace the header with a list of column names according to the newly generated columns.
# import pandas
import pandas as pd
#Create a list of colmumn names in advance
cols = ["s1","s2","s3","year","month","day","h","m","f1-1","f1-2","f2-1","f2-2","f3-1","f3-2"]
#Specify the list of column names as "names"
df = pd.read_csv("tempo.csv", sep = "[,;/ :t.]", engine='python', names = cols)
#drop original header
df.drop(df.index[0], inplace = True)
df
data:image/s3,"s3://crabby-images/29463/294637e52502b6261258cd946528e110cfc6a4d7" alt="csv_revised"
All done!
Reference site. Thank you.
https://stackoverflow.com/questions/26551662/import-text-to-pandas-with-multiple-delimiters
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html?highlight=delimiter%20csv
Environment
Python; 3.7.2
pandas; 1.0.5
ちょっと広告です
https://business.xserver.ne.jp/
data:image/s3,"s3://crabby-images/ac781/ac7818ffb13d4cbbbbcca37d763a6f1bcaf8a354" alt=""
https://www.xdomain.ne.jp/
data:image/s3,"s3://crabby-images/740e3/740e3a1791958ef7332e0824f8a23af90fe67347" alt=""
★LOLIPOP★
data:image/s3,"s3://crabby-images/8c414/8c41478137d36539f37657d6d7506ccffad2f84c" alt=""
.tokyo
data:image/s3,"s3://crabby-images/053f8/053f8db793371eaea37f0f96f3daf67a86446e28" alt=""
MuuMuu Domain!
data:image/s3,"s3://crabby-images/ab486/ab486dc431903a66ff73dfc608d61daf1804e888" alt=""