Dealing with Sequences of Data in Python

List

  • Python's built-in data structure: a container that holds a dynamically changing sequence of items, which may be of different data types (e.g., int, float, object)

  • Suited for dealing with a small amount of data

  • 'Lists are mutable, so they are naturally suitable for dealing with a dynamic sequence of data. Oftentimes, when I need to ‘remember’ some values while iterating through a for loop, I will create a list and just append the value to the list. In addition, a list allows a mixture of data types, which is useful when I have no clue about the upcoming data types.'
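The pattern described in the quote, collecting values in a list while looping, can be sketched as follows. The `readings` values and the filtering rule are made up for illustration and are not from the source.

```python
# Hypothetical readings of mixed types; the values are made up for illustration.
readings = [3, 4.5, "n/a", 7, None, 2.5]

numeric = []  # the list used to "remember" values across iterations
for value in readings:
    # Keep only ints and floats (bool is excluded because it subclasses int).
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        numeric.append(value)

print(numeric)       # [3, 4.5, 7, 2.5]
print(sum(numeric))  # 17.0
```

Because the list is mutable, it grows in place as the loop runs, and it accepts the mixed input without any type declaration.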


Pandas

  • Works well with tabular data (like Excel spreadsheets) and time series data

  • Consumes more memory than NumPy arrays

  • Tends to perform better than NumPy when the number of rows is 500K or more

  • Indexing of Pandas Series is very slow compared with NumPy arrays

  • 'When it comes to tabular data with row index and column index, my go-to choice is pandas.DataFrame, as it allows flexible access to values using integer position or index.'



# Import the pandas library
import pandas as pd

# Create a nested list: one [name, marks, gender] row per student
students = [['Aman', 95.5, 'Male'], ['Sunny', 65.7, 'Female'],
            ['Monty', 85.1, 'Male'], ['toni', 75.4, 'Male']]

# Build a pandas DataFrame with named columns
df = pd.DataFrame(students, columns=['Name', 'Marks', 'Gender'])
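The "integer position or index" access mentioned in the quote corresponds to `iloc` (position-based) and `loc` (label-based) in pandas. Here is a minimal sketch on the same student data; using the Name column as the row index is an assumption made for this example.

```python
import pandas as pd

students = [['Aman', 95.5, 'Male'], ['Sunny', 65.7, 'Female'],
            ['Monty', 85.1, 'Male'], ['toni', 75.4, 'Male']]
df = pd.DataFrame(students, columns=['Name', 'Marks', 'Gender'])

# Assumption for illustration: use Name as the row index for label lookups.
by_name = df.set_index('Name')

first_mark = df.iloc[0, 1]                  # integer position: row 0, column 1
sunny_mark = by_name.loc['Sunny', 'Marks']  # labels: row 'Sunny', column 'Marks'

print(first_mark)  # 95.5
print(sunny_mark)  # 65.7
```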






NumPy

  • A NumPy array is a grid of values (all of the same type) indexed by a tuple of nonnegative integers

  • Works well with numerical data and signals; suited for fast scientific computing

  • Memory efficient

  • Tends to perform better than Pandas when the number of rows is 50K or less

  • Indexing of NumPy arrays is very fast
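The "same type" and "memory efficient" points can be checked with a short sketch. The list-size figure below is a rough estimate that relies on `sys.getsizeof`, which is specific to CPython, so treat the comparison as indicative rather than exact.

```python
import sys
import numpy as np

# Mixing ints and a float upcasts every element to a single dtype.
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)  # float64

# 1000 eight-byte integers stored contiguously.
arr = np.arange(1000, dtype=np.int64)
print(arr.itemsize)  # 8
print(arr.nbytes)    # 8000

# Rough size of the equivalent list: the list object plus each int object.
values = list(range(1000))
list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)
print(list_bytes > arr.nbytes)  # True
```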



# Import the NumPy package
import numpy as np

# Create a 2-D (3 x 3) NumPy array using np.array()
org_array = np.array([[23, 46, 85],
                      [43, 56, 99],
                      [11, 34, 55]])
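To make "indexed by a tuple of integers" concrete, the sketch below rebuilds the same array and reads values from it. The particular elements picked are arbitrary.

```python
import numpy as np

org_array = np.array([[23, 46, 85],
                      [43, 56, 99],
                      [11, 34, 55]])

print(org_array.ndim)   # 2
print(org_array.shape)  # (3, 3)

# Row 1, column 2; writing [1, 2] is the same as indexing with the tuple (1, 2).
print(org_array[1, 2])    # 99
print(org_array[(1, 2)])  # 99

# A slice selects a whole axis: here, the first column.
first_column = org_array[:, 0]
print(first_column)  # [23 43 11]
```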








(Source: Jiahui Wang. Python List, NumPy, and Pandas. 2019.)


