Dealing with Sequences of Data in Python

List

  • A built-in Python data structure that serves as a container for a dynamically changing sequence of values, which may be of different data types (e.g., int, float, object)

  • Suited for dealing with a small amount of data

  • 'Lists are mutable, so they are naturally suitable for dealing with a dynamic sequence of data. Oftentimes, when I need to ‘remember’ some values while iterating through a for loop, I will create a list and just append the value to the list. In addition, a list allows a mixture of data types, which is useful when I have no clue about the upcoming data types.'
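The "remember values while looping" pattern described in the quote can be sketched as follows. This is only a minimal illustration: the `readings` data and the variable names are made up for the example and do not come from the original source.

```python
# Hypothetical input with mixed data types; the values are invented for illustration.
readings = [3, 4.5, "n/a", 7, None, 2.25]

numeric_values = []  # start empty and grow the list as the loop runs
for value in readings:
    # Keep ints and floats and skip everything else.
    # bool is excluded explicitly because it is a subclass of int.
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        numeric_values.append(value)

print(numeric_values)       # [3, 4.5, 7, 2.25]
print(sum(numeric_values))  # 16.75
```

Because the list is mutable, no size has to be declared up front, and the mixed input can be stored as-is before it is filtered.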


Pandas

  • Works well with tabular data (like Excel spreadsheets) and time series data

  • Consumes more memory than a NumPy array

  • Performs better when the number of rows is 500K or more

  • Indexing a pandas Series is slow compared with indexing a NumPy array

  • 'When it comes to tabular data with row index and column index, my go-to choice is pandas.DataFrame, as it allows flexible access to values using integer position or index.'



# Import the pandas library
import pandas as pd

# Create a nested list: one inner list per student (name, marks, gender)
students = [['Aman', 95.5, "Male"], ['Sunny', 65.7, "Female"],
            ['Monty', 85.1, "Male"], ['toni', 75.4, "Male"]]

# Create a pandas DataFrame with named columns
df = pd.DataFrame(students, columns=['Name', 'Marks', 'Gender'])
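The "flexible access to values using integer position or index" mentioned above might look like the sketch below. It rebuilds the same records so that it runs on its own. Using `Name` as the row index is an assumption made for illustration, not something the original example does.

```python
import pandas as pd

# Same illustrative records as the DataFrame example above.
records = [['Aman', 95.5, 'Male'], ['Sunny', 65.7, 'Female'],
           ['Monty', 85.1, 'Male'], ['toni', 75.4, 'Male']]
# Assumption for this sketch: use the 'Name' column as the row index.
df = pd.DataFrame(records, columns=['Name', 'Marks', 'Gender']).set_index('Name')

by_position = df.iloc[1, 0]          # second row, first column, by integer position
by_label = df.loc['Sunny', 'Marks']  # the same cell, by row label and column label
high_marks = df[df['Marks'] > 80]    # a boolean mask keeps only the matching rows

print(by_position, by_label)   # 65.7 65.7
print(list(high_marks.index))  # ['Aman', 'Monty']
```

`iloc` addresses cells by position, `loc` by label, so the same value is reachable either way.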






NumPy

  • A NumPy array is a grid of values (all of the same type) that is indexed by a tuple of nonnegative integers

  • Works well with numerical data and signals; suited for fast scientific computing

  • Memory efficient

  • Performs better when the number of rows is 50K or less

  • Indexing a NumPy array is very fast
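The memory points above can be checked roughly with the sketch below. It is not a benchmark: the size `n` and the variable names are chosen for illustration, exact byte counts depend on the Python and pandas versions, and the list comparison assumes a typical 64-bit CPython build.

```python
import sys
import numpy as np
import pandas as pd

n = 1000  # arbitrary size chosen for illustration
py_list = list(range(n))
np_array = np.arange(n, dtype=np.int64)
series = pd.Series(np_array)

# A list stores pointers to separate int objects, so count both parts.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)
# An array stores the raw values in one contiguous buffer.
array_bytes = np_array.nbytes  # 8 bytes per int64 element
# A Series holds the same values plus an index.
series_bytes = series.memory_usage(index=True)

print(array_bytes)                  # 8000
print(list_bytes > array_bytes)     # True on typical 64-bit CPython builds
print(series_bytes >= array_bytes)  # True, since the index adds overhead
```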



# Import the NumPy package
import numpy as np

# Create a 2-D NumPy array (3 rows x 3 columns) using np.array()
org_array = np.array([[23, 46, 85],
                      [43, 56, 99],
                      [11, 34, 55]])
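As a small sketch of what "indexed by a tuple of integers" and "values of the same type" mean in practice, the array above can be inspected as follows. The `mixed` array is a made-up addition to show how NumPy settles on a single dtype.

```python
import numpy as np

org_array = np.array([[23, 46, 85],
                      [43, 56, 99],
                      [11, 34, 55]])

print(org_array.ndim, org_array.shape)  # 2 (3, 3)
print(org_array[1, 2])                  # 99, i.e. row 1, column 2 (zero-based)
print(org_array[:, 0].tolist())         # [23, 43, 11], the first column
print(org_array.sum(axis=0).tolist())   # [77, 136, 239], the column sums

# Mixing an int and a float upcasts every element to one common dtype.
mixed = np.array([1, 2.5])
print(mixed.dtype)                      # float64
```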








(Source: Jiahui Wang. Python List, NumPy, and Pandas. 2019.)


