Working with CSV files in Python
Introduction
CSV (Comma Separated Values) is a basic file format for tabular data. Most programs create CSV files. They allow you to handle data from spreadsheets and databases. For example, tabular data can be exported to a CSV file and imported into a spreadsheet to analyze, graph, or publish data mining results.
These files have extensive support from several programming languages. CSV files may be directly accessed and manipulated by text file input and string manipulation language.
Prerequisites
- Have Python IDE installed.
- Have some background information in the python programming language.
Table of contents
- Introduction
- Prerequisites
- Table of contents
- Write a CSV file
- Read a CSV file
- Working with large CSV files in Python
- Convert multiple JSON files to CSV files
- Creating a data frame using CSV files
- Conclusion
Write a CSV file
Python provides a built-in CSV module. This module contains two CSV writing classes:
- Using
CSV.writer
class - Using
CSV.DictWriter
class
Using csv.writer class
csv.writer
writes data to a CSV file. By default, user data is transformed into a delimited string. The CSV file object will not be recognized if quoted fields do not include \n
.
To write to a CSV file, use the writer class. The writer class is a subclass of the csv.Dialect
class. The csv.Dialect
class provides a set of parameters that can be used to customize the CSV file.
writerow()
: This technique writes one row. This technique may create a field row.writerows()
: This technique writes numerous rows at once. This is for row lists.
To illustrate the use of the writerow()
class, let's create a sample CSV file, student_file.csv
, as shown below.
import csv
with open('student_file.csv', 'w', newline='') as file:
writer = csv.writer(file)
writer.writerow(["2021", "Student details"])
writer.writerow([1, "carteblanche kin", "computer science"])
writer.writerow([2, "Marion koech", "data science"])
The code above will output the following file-
2021, Student details
1, carteblanche kin, computer science
2, Marion koech, data science
To illustrate the use of the writerows()
class, let's create a sample CSV file, student_file.csv
as shown below:
import csv
csv_rowlist = [["2021", "Student details"], [1, "carteblanche kin", "computer science"],
[2, "Marion koech", "data science"]]
with open('protagonist.csv', 'w') as file:
writer = csv.writer(file)
writer.writerows(csv_rowlist)
The program's output is the same as in the writerow()
example above.
Using csv.DictWriter class
This class builds a column-to-dictionary writer object. This class supports two CSV writing methods. They are:
-
writeheader()
: A simple CSV file with field names you choose publishes the first row. -
writerows()
:writerows
function writes all rows with just values.
Read a CSV file
The CSV module or pandas library can read CSV files. To be able to read CSV files, one can use one of the below methods:
Using csv.reader():
The CSV file is first opened using the open()
function in r
mode (specify read mode when opening a file) and then read using the reader()
method of the CSV module.
The
with
keyword simplifies exception handling and immediately ends the CSV file.
Using CSV.DictReader() class:
The CSV file is opened using open()
, then read using the CSV module's DictReader
class, which works like a reader but converts CSV data to a dictionary. The file's first line contains dictionary keys.
Using pandas.read_csv() method:
Using pandas library methods to read a CSV file is straightforward. Consider the fonteds.csv
CSV file. This is the file used to illustrate one of the methods.
import pandas
csvFile = pandas.read_csv('fonteds.csv')
print(csvFile)
The output
SCHOOL CEO YEAR
0 KU JACOB MUDAVAI 2010
1 MUST GARETH JASON 2015
2 MMU DICKSON NJOGU 2013
From the code above, import pandas
is used to import the pandas module, csvFile = pandas.read_csv('fonteds.csv')
is used to read the fonteds.csv
file and print(csvFile)
is used to output the read csv file.
Reading from a specific row
Here we will create a CSV file with multiple rows and columns to illustrate how to read from a specific row.
Create a CSV file by entering the below data in your notepad and saving it as student-data.csv
. The file will be used to show how to manipulate the CSV files.
RegNo Name Course year-of-study Department
001 James BCS 2.1 Computing
002 John BFF 1.2 IT
003 Christine BSS 4.2 SPAS
004 Lilian BCOM 3.1 Business
005 Beth BIT 2.2 IT
To read a specific row in the CSV file, we use the read_cv
function from Panda's library. The example below illustrates how to read from a specific row.
# Import pandas
import pandas as pd
# Specify the file location of our CSV file
data = pd.read_csv('File-location/student-data.csv')
# Extract top four data of the specified rows
print (data[0:4]['year-of-study'])
The above code will output the following information:
0 2.1
1 1.2
2 4.2
3 3.1
Name: Name, dtype: float64
Read a precise column
The pandas library's read_csv
method may additionally read specified columns. This is done using the .loc()
multi-axes indexing function. First, let's look into an example program. This example will show the Name
and Course
columns for all rows.
We will use the
student-data.csv
file from the previous example.
# Import the pandas module
import pandas as pd
# Specify the file location of our CSV file
data = pd.read_csv('File-location/student-data.csv')
print (data.loc[:,['Name','Course']])
The output
Name Course
0 James BCS
1 John BFF
2 Christine BSS
3 Lilian BCOM
4 Beth BIT
Manipulate CSV files
Since you can't edit a CSV file while reading from it, you must create a new one and write to it.
We will use the
student-data.csv
file from the previous example.
From the student-data.csv
file above, the data is written using uppercase. To demonstrate how to edit and save CSV files, we will change the uppercase letters in our file to lower case letters.
with open('student-data.csv','r') as f:
with open('lowwer-case.csv','w') as ff:
ff.write(f.readline())
ff.write(f.read().lower())
The above code creates a new CSV file with all letters in it changed to lower case.
Working with large CSV files in Python
When dealing with CSV data, usually read it in using pandas before munging and analyzing it. However, reading huge files straight into pandas may be difficult (or impossible) on a consumer machine due to memory constraints.
While it is simple to load data from CSV files into a database, there may be situations where you don't have access to or don't want to set up a database server. However, if you need to look at data in these big files for a short time, here is one method to accomplish it using Python and pandas.
Here is a method for handling large.csv files. The dataset we will be using is gender_voice_dataset.
Using pandas.read_csv
Large files may be handled by reading them in manageable size pieces, processing them before reading the next part. The chunk size option determines the number of lines. This method returns an iterator. For processing, a section of the file is read at a time.
To read a dataset without chunks, use the code below:
import pandas as pd # import pandas module
import numpy as np # Import numpy module
import time # import time module
s_time = time.time() # This initilizes time module
df = pd.read_csv("voice.csv") # This captures the time taken to read data from our file
e_time = time.time()
print("Read without chunks: ", (e_time-s_time), "seconds") # print command is used to output the line specified while e_time-s_time outputs the time in seconds
df.sample(10) # This specifies the time taken, 10 seconds
Convert multiple JSON files to CSV files
A JSON file contains basic data structures and objects in JavaScript Object Notation (JSON). The most common use case is sending data between an internet app and a server.
A CSV file is created by concatenating, merging or joining several JSON files (at least one column must be the same in each file) and saving the result as a flattened data frame. The following sample will help you understand the task's whole procedure:
Example program:
We will input two JSON files and output a CSV. The used JSON files are:
first.json file
{
"NO":{ // Declaring the regestration number of the student
"001":11,
"002":12,
"003":13,
},
"Name":{// Declaring the name of the student
"001":"Kelvin",
"002":"Dennis",
"003":"John",
},
"Marks":{// Declaring the marks of the student
"001":80,
"002":84,
"003":30,
},
"Grade":{// Declaring the grade of the student
"0011":"A",
"002":"A",
"003":"D",
}
}
second.json file
{
"NO":{ // Declaring the regestration number of the student
"001":14,
"002":15,
"003":16,
},
"Name":{ // Declaring the name of the student
"001":"Mark",
"002":"James",
"003":"Avatar",
},
"Marks":{ // Declaring the marks of the student
"001":55,
"002":90,
"003":65,
},
"Grade":{ // Declaring the grade of the student
"0011":"C",
"002":"A",
"003":"B",
}
}
Follow the steps below to be able to convert
- Load JSON files with a pandas data frame.
- Merge the data frames.
- Create a CSV file from the concatenated data.
The result is shown in the code.
import pandas as pd
df1 = pd.read_json('first.json')#
print(df1)
df2 = pd.read_json('second.json')
print(df2)
df = pd.concat([df1,df2])
print(df)
df.to_csv("CSV.csv",index=False)
result = pd.read_csv("CSV.csv")
print(result)
The output:
Creating a data frame using CSV files
As with an excel file, CSV files include comma-separated values. Pandas is the core Python data science module. When evaluating data, we often deal with large datasets in CSV format. Creating a pandas data frame from CSV files is easy.
Download the example CSV file here.
A dataframe can be created by:
read_csv()
methodread_table()
methodcsv
module
Let's look at an example using the read_table()
method.
import pandas as pd
df = pd.read_table("dataframe.csv", delimiter =", ")
print(df.head())
Conclusion
Thank you for reading till the end. This tutorial taught us how to work with CSV files in Python. We learned how to write and read CSV files, work with large CSV files, convert multiple JSON files to CSV files, and finally, create a data frame using CSV files.
Happy coding!
Peer Review Contributions by: Odhiambo Paul