How to create a DataFrame in Python?

A DataFrame is a two-dimensional collection of data. It is a data structure in which data is stored in tabular form, arranged in rows and columns. We can perform various operations on a DataFrame, such as selecting, adding, and deleting rows and columns, and applying arithmetic operations to them.

In Python, a DataFrame, a pivotal component of the Pandas library, serves as a comprehensive two-dimensional data container. Resembling a table, it organizes data into rows and columns, where each row has an index label and each column has a name. Each column can hold a different data type, which makes DataFrames flexible enough to handle complex datasets.

Pandas DataFrames empower users with an extensive array of functionalities. From the creation of structured data using dictionaries or other data structures to employing robust indexing for seamless data access, Pandas facilitates effortless data manipulation. The library provides an intuitive interface for executing operations such as filtering rows based on conditions, grouping data for aggregation, and performing statistical analyses with ease.

We can load a DataFrame from external storage, such as an SQL database, a CSV file, or an Excel file. We can also build one from Python objects such as lists, dictionaries, and lists of dictionaries.
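As a minimal sketch of loading from external storage, the snippet below reads CSV text with pd.read_csv. An in-memory io.StringIO object stands in for a CSV file so the example runs without any file on disk; the sample rows are illustrative only. With a real file you would pass its path instead, and pd.read_excel and pd.read_sql play the same role for Excel files and SQL databases.

```python
import io

import pandas as pd

# In-memory text standing in for a CSV file on disk (illustrative sample data).
csv_text = io.StringIO("Name,Age\nTom,20\nJoseph,21\n")

# read_csv accepts a file path or any file-like object; the first line is the header.
df = pd.read_csv(csv_text)
print(df)
```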

In this tutorial, we will learn to create the data frame in multiple ways. Let's understand these different ways.

First, we need to install the pandas library in the Python environment, for example with pip install pandas.

Method - 1: Create an empty DataFrame

We can create a basic empty DataFrame by calling the DataFrame constructor with no arguments. Let's understand the following example.

Example -

# Here, we are importing the pandas library as pd  
import pandas as pd  
# Here, we are Calling DataFrame constructor  
df = pd.DataFrame()  
print(df)   # here, we are printing the dataframe

Output:

Empty DataFrame
Columns: []
Index: []
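An empty DataFrame can also be created with its column labels already in place, which is handy when rows will be added later. The sketch below passes only the columns parameter and checks the result with the empty attribute; the column names are illustrative.

```python
import pandas as pd

# An empty DataFrame that already has column labels but no rows.
df = pd.DataFrame(columns=['Name', 'Age'])

print(df.empty)          # True: the frame has no rows
print(list(df.columns))  # ['Name', 'Age']
print(len(df))           # 0
```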

Method - 2: Create a dataframe using List

We can create a DataFrame from a single list or from a list of lists. Let's understand the following example.

Example -

# Here, we are importing the pandas library as pd   
import pandas as pd    
# Here, we are declaring the string values in the list   
lst = ['Java', 'Python', 'C', 'C++',  
         'JavaScript', 'Swift', 'Go']    
# Here, we are calling DataFrame constructor on list  
dframe = pd.DataFrame(lst)  
print(dframe)     # here, we are printing the dataframe

Output:

            0
0        Java
1      Python
2           C
3         C++
4  JavaScript
5       Swift
6          Go

Explanation:

Import Pandas: import pandas as pd imports the Pandas library under the short alias pd.
Create List: lst is a list of strings naming programming languages.
DataFrame Construction: pd.DataFrame(lst) builds a DataFrame from the list lst. When a single flat list is passed, Pandas creates a DataFrame with one column, labelled 0, and a default index that runs from 0 to n-1.
Printing DataFrame: print(dframe) prints the resulting DataFrame.
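The example above uses a single list. For a list of lists, each inner list becomes one row. A minimal sketch, with illustrative data, that also names the columns through the columns parameter:

```python
import pandas as pd

# Each inner list becomes one row of the DataFrame.
rows = [['Java', 1995], ['Python', 1991], ['Go', 2009]]

# columns supplies the column labels; without it they would default to 0 and 1.
dframe = pd.DataFrame(rows, columns=['Language', 'Year'])
print(dframe)
print(dframe.shape)  # (3, 2)
```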

Method - 3: Create Dataframe from dict of ndarray/lists

A dictionary of ndarrays or lists can be used to create a DataFrame; all the arrays must have the same length. By default, the index is range(n), where n is the length of the arrays. Let's understand the following example.

Example -

# Here, we are importing the pandas library as pd
import pandas as pd  
# Here, we are assigning the data of lists.  
data = {'Name': ['Tom', 'Joseph', 'Krish', 'John'], 'Age': [20, 21, 19, 18]}  
# Here, we are creating the DataFrame  
df = pd.DataFrame(data)  
# Here, we are printing the output.  
print(df)

Output:

     Name  Age
0     Tom   20
1  Joseph   21
2   Krish   19
3    John   18

Explanation:

Import Pandas: import pandas as pd imports the Pandas library under the alias pd.
Create Dictionary: data is a dictionary whose keys are the column names ('Name' and 'Age') and whose values are lists holding the corresponding data.
DataFrame Construction: pd.DataFrame(data) builds a DataFrame from the dictionary. The keys become the column labels, and each list becomes a column.
Printing DataFrame: print(df) prints the resulting DataFrame.
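The example above uses plain lists as the dictionary values. The sketch below uses NumPy arrays instead and shows the two rules stated for this method: the default index is range(n), and arrays of different lengths are rejected with a ValueError. The data mirrors the example; the mismatched arrays are illustrative.

```python
import numpy as np
import pandas as pd

# Values given as NumPy arrays instead of lists; both arrays have length 4.
data = {'Name': np.array(['Tom', 'Joseph', 'Krish', 'John']),
        'Age': np.array([20, 21, 19, 18])}
df = pd.DataFrame(data)
print(list(df.index))  # [0, 1, 2, 3]

# Arrays of different lengths cannot form a rectangular table.
try:
    pd.DataFrame({'Name': np.array(['Tom', 'Joseph']), 'Age': np.array([20])})
except ValueError as err:
    print('ValueError:', err)
```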

Method - 4: Create an indexed DataFrame using arrays

The index parameter lets us supply our own row labels instead of the default 0, 1, 2, and so on. Let's understand the following example.

Example -

# Here, we are implementing the DataFrame using arrays.  
import pandas as pd     # Here, we are importing the pandas library as pd
# Here, we are assigning the data of lists.  
data = {'Name':['Renault', 'Duster', 'Maruti', 'Honda City'], 'Ratings':[9.0, 8.0, 5.0, 3.0]}  
# Here, we are creating the pandas DataFrame.  
df = pd.DataFrame(data, index =['position1', 'position2', 'position3', 'position4'])  
# Here, we are printing the data  
print(df)  

Output:

                 Name  Ratings
position1     Renault      9.0
position2      Duster      8.0
position3      Maruti      5.0
position4  Honda City      3.0

Explanation:

Import Pandas: import pandas as pd imports the Pandas library under the alias pd.
Create Dictionary: data is a dictionary whose keys are the column names ('Name' and 'Ratings') and whose values are lists holding the corresponding data.
DataFrame Construction: pd.DataFrame(data, index=['position1', 'position2', 'position3', 'position4']) builds a DataFrame from the dictionary. The list passed to index supplies the row labels in place of the default 0 to 3.
Printing DataFrame: print(df) prints the resulting DataFrame.
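Custom row labels pay off when reading data back, because .loc selects rows by label rather than by position. A short sketch reusing the example data, which also shows set_index as another way to turn an existing column into the index:

```python
import pandas as pd

data = {'Name': ['Renault', 'Duster', 'Maruti', 'Honda City'],
        'Ratings': [9.0, 8.0, 5.0, 3.0]}
df = pd.DataFrame(data, index=['position1', 'position2', 'position3', 'position4'])

# .loc selects by label: the row labelled 'position2' and the 'Name' column.
print(df.loc['position2', 'Name'])  # Duster

# set_index makes an existing column the row index instead.
by_name = df.set_index('Name')
print(by_name.loc['Maruti', 'Ratings'])  # 5.0
```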

Method - 5: Create Dataframe from list of dicts

We can pass a list of dictionaries as input data to create a Pandas DataFrame. By default, the dictionary keys are taken as the column names. Let's understand the following example.

Example -

# Here, we are implementing an example to create  
# Pandas DataFrame by using the lists of dicts.  
import pandas as pd      # Here, we are importing the pandas library as pd
# Here, we are assigning the values to lists.  
data = [{'A': 10, 'B': 20, 'C':30}, {'x':100, 'y': 200, 'z': 300}]  
# Here, we are creating the DataFrame.  
df = pd.DataFrame(data)  
# Here, we are printing the data of the dataframe  
print(df)  

Output:

      A     B     C      x      y      z
0  10.0  20.0  30.0    NaN    NaN    NaN
1   NaN   NaN   NaN  100.0  200.0  300.0

Explanation:

Import Pandas: import pandas as pd imports the Pandas library under the alias pd.
Create List of Dictionaries: data is a list in which each element is a dictionary representing one row of the DataFrame. The keys of the dictionaries become the column names.
DataFrame Construction: pd.DataFrame(data) builds a DataFrame from the list of dictionaries. The union of all keys becomes the columns, and the values become the cell data. A key missing from a dictionary produces NaN in that row, which is why the integers are displayed as floats.
Printing DataFrame: print(df) prints the resulting DataFrame.

The next example creates a DataFrame from a list of dictionaries and specifies both the row index and the column labels.

Example - 2:

# Here, we are importing the pandas library as pd
import pandas as pd  
# Here, we are assigning the values to the lists.  
data = [{'x': 1, 'y': 2}, {'A': 15, 'B': 17, 'C': 19}]  
# dframe1: row labels 'first' and 'second'; column labels match the keys 'x' and 'y'
dframe1 = pd.DataFrame(data, index =['first', 'second'], columns =['x', 'y'])    
# dframe2: same row labels, but the column label 'y1' is not a key in data
dframe2 = pd.DataFrame(data, index =['first', 'second'], columns =['x', 'y1'])  
# Here, we are printing the first data frame, i.e., dframe1
print(dframe1, "\n")  
# Here, we are printing the second data frame, i.e., dframe2
print(dframe2)  

Output:

          x    y
first   1.0  2.0
second  NaN  NaN

          x  y1
first   1.0 NaN
second  NaN NaN

Explanation:

The pandas library is used to create two DataFrames, dframe1 and dframe2, from the same list of dictionaries named data. Each dictionary represents one row, with the keys as column names and the values as the cell data. dframe1 is created with the row labels 'first' and 'second' and the column labels 'x' and 'y'. Because columns is given, only those keys are used, so the second row, whose dictionary has only the keys 'A', 'B', and 'C', is entirely NaN. dframe2 uses the same data and row labels but the column labels 'x' and 'y1'; since no dictionary has a 'y1' key, that column contains only NaN. The code then prints both DataFrames, showing how the columns parameter controls which keys appear in the result.
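The columns parameter does more than rename: for a list of dictionaries it selects which keys to keep and fixes their order. A minimal sketch with illustrative data, where the key 'B' is dropped and 'C' is placed first:

```python
import pandas as pd

data = [{'A': 10, 'B': 20, 'C': 30}, {'A': 1, 'B': 2, 'C': 3}]

# Only the listed keys are kept, in the listed order; 'B' is left out.
df = pd.DataFrame(data, columns=['C', 'A'])
print(df)
print(list(df.columns))  # ['C', 'A']
```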

Example - 3

# The example is to create  
# Pandas DataFrame by passing lists of  
# Dictionaries and row indices.  
import pandas as pd      # Here, we are importing the pandas library as pd
# assign values to lists  
data = [{'x': 2, 'z':3}, {'x': 10, 'y': 20, 'z': 30}]  
# Create a pandas DataFrame by passing the  
# list of dictionaries and the row index.  
dframe = pd.DataFrame(data, index =['first', 'second'])  
# Print the dataframe 
print(dframe)  

Output:

         x   z     y
first    2   3   NaN
second  10  30  20.0

Explanation:

In this Python code, a Pandas DataFrame is built from a list of dictionaries while specifying the row labels. The code first imports the pandas library under the alias pd. It then defines a list of dictionaries named data, where each dictionary represents one row of the DataFrame; the keys are the column names and the values are the cell data.

The DataFrame dframe is then created with the pd.DataFrame() constructor, passing data and setting the row labels explicitly to 'first' and 'second'. The resulting table has the columns 'x', 'y', and 'z'. The first dictionary has no 'y' key, so that cell is shown as NaN, and the 'y' column is stored as floats because NaN is a floating-point value.
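The order of the columns in this example follows the order in which keys first appear across the dictionaries, assuming pandas 0.25 or later (older versions sorted the keys). Since 'y' only appears in the second dictionary, it ends up last. A short sketch checking this, and passing columns explicitly when a fixed order is wanted:

```python
import pandas as pd

data = [{'x': 2, 'z': 3}, {'x': 10, 'y': 20, 'z': 30}]
dframe = pd.DataFrame(data, index=['first', 'second'])

# Keys are collected in order of first appearance: x, z, then y.
print(list(dframe.columns))  # ['x', 'z', 'y']

# Passing columns explicitly fixes the order.
ordered = pd.DataFrame(data, index=['first', 'second'], columns=['x', 'y', 'z'])
print(list(ordered.columns))  # ['x', 'y', 'z']
```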

Method - 6: Create Dataframe using the zip() function

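The zip() function pairs up corresponding elements of two lists into tuples, and the resulting list of tuples can be passed to the DataFrame constructor, with each tuple becoming a row. Let's understand the following example.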

Example -

# The example is to create  
# pandas dataframe from lists using zip.  
import pandas as pd       # Here, we are importing the pandas library as pd 
# List1  
Name = ['tom', 'krish', 'arun', 'juli']  
# List2  
Marks = [95, 63, 54, 47]  
# Here, we are pairing up the two lists element by element using zip()
list_tuples = list(zip(Name, Marks))  
# Here, we are printing the list of tuples
print(list_tuples)  
# Converting the list of tuples into  
# a pandas DataFrame.  
dframe = pd.DataFrame(list_tuples, columns=['Name', 'Marks'])  
# Print data.  
print(dframe)  

Output:

[('tom', 95), ('krish', 63), ('arun', 54), ('juli', 47)]
    Name  Marks
0    tom     95
1  krish     63
2   arun     54
3   juli     47

Explanation:

This Python code shows how to create a Pandas DataFrame from two lists, Name and Marks, using the pandas library and the built-in zip() function. After importing pandas, the code defines the Name and Marks lists, which hold the data for the two columns of the DataFrame. zip() combines corresponding elements from the two lists into tuples, and list() collects them into a new list named list_tuples.

The code then prints the list of tuples to show the paired data. Next, a Pandas DataFrame named dframe is created with the pd.DataFrame() constructor, which turns each tuple into a row. The column labels 'Name' and 'Marks' are assigned explicitly through the columns parameter.
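zip() builds the table row by row. The same table can also be built column by column by mapping each column name to its list, as in Method 3. A short sketch comparing the two routes on the example data:

```python
import pandas as pd

Name = ['tom', 'krish', 'arun', 'juli']
Marks = [95, 63, 54, 47]

# Route 1: zip the lists into row tuples, then name the columns.
via_zip = pd.DataFrame(list(zip(Name, Marks)), columns=['Name', 'Marks'])

# Route 2: map each column name to its list directly.
via_dict = pd.DataFrame({'Name': Name, 'Marks': Marks})

# Both routes yield the same column names and values.
print(via_zip.to_dict('list') == via_dict.to_dict('list'))  # True
```

The dictionary route skips building the intermediate list of tuples. The zip route is the natural fit when the data already arrives as records.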

Method - 7: Create Dataframe from Dicts of series

A dictionary of Series can also be passed to create a DataFrame. The resulting index is the union of the indexes of all the Series passed in. Let's understand the following example.

Example -

# Pandas Dataframe from Dicts of series.  
import pandas as pd         # Here, we are importing the pandas library as pd
# Initialize data to Dicts of series.  
d = {'Electronics' : pd.Series([97, 56, 87, 45], index =['John', 'Abhinay', 'Peter', 'Andrew']),  
   'Civil' : pd.Series([97, 88, 44, 96], index =['John', 'Abhinay', 'Peter', 'Andrew'])}  
# creates Dataframe.  
dframe = pd.DataFrame(d)  
# print the data.  
print(dframe)  

Output:

         Electronics  Civil
John              97     97
Abhinay           56     88
Peter             87     44
Andrew            45     96

Explanation:

In this Python code, a Pandas DataFrame is created from a dictionary of Series using the pandas library. The two subjects, 'Electronics' and 'Civil', become the columns, and each Series carries the students' names as its index, so the names become the row labels of dframe. The resulting table is printed to the console, showing a compact way to organize labelled data with Pandas.
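In the example above both Series share the same index, so the union adds no new labels. A minimal sketch with two partially overlapping indexes (the names and scores are illustrative only) shows the union at work: every label from either Series becomes a row, and a missing score becomes NaN.

```python
import pandas as pd

# The two Series share only the label 'John'.
d = {'Electronics': pd.Series([97, 56], index=['Abhinay', 'John']),
     'Civil': pd.Series([88, 44], index=['John', 'Peter'])}
dframe = pd.DataFrame(d)
print(dframe)

# The row labels are the union of both indexes.
print(list(dframe.index))  # ['Abhinay', 'John', 'Peter']
```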

In this tutorial, we have discussed the different ways to create the DataFrames.
