before we start this portion of the lesson:

check if you have pip installed, since we are going to be installing some libraries today! if you aren't sure whether you have pip, check by running this command:

pip

if your terminal says "command not found" (or something similar) on linux, run this:

python3 -m ensurepip --default-pip
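
another way to check, from inside python itself: the sketch below asks python whether it can find each library. the helper name is_installed is made up for this example, and the list of names is just the libraries this lesson uses.

```python
import importlib.util


def is_installed(module_name):
    """Return True if Python can find a module with this name."""
    return importlib.util.find_spec(module_name) is not None


# report on the tools this lesson needs
for name in ["pip", "pandas", "matplotlib"]:
    status = "installed" if is_installed(name) else "missing"
    print(name, "->", status)
```

anything reported as missing can be installed with pip before moving on.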

Overview:

Pandas is a powerful tool in Python that is used for data analysis and manipulation. In this lesson, we will explore how to use Pandas to work with datasets, analyze them, and visualize the results.

Learning Objectives:

By the end of this lesson, students should be able to:

  • Understand what Pandas is and why it is useful for data analysis
  • Load data into Pandas and create tables to store it
  • Use different functions in Pandas to manipulate data, such as filtering, sorting, and grouping
  • Visualize data using graphs and charts

Question

Who here has used numpy?

(should be all of you, because all of you have used it in this class before.)

what is pandas?

this:

  • Pandas is a Python library used for data analysis and manipulation.
  • it can handle different types of data, including CSV files and databases.
  • it also allows you to create tables (called DataFrames) to store and work with your data.
  • it has functions for filtering, sorting, and grouping data to make it easier to work with.
  • it also has tools for visualizing data with graphs and charts (its plotting is built on matplotlib).
  • it is widely used in the industry for data analysis and is a valuable skill to learn.
  • companies that use Pandas include JPMorgan Chase, Google, NASA, the New York Times, and many others.
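
to make the "tables" idea concrete, here is a minimal sketch that builds a small table from a python dictionary. the names and ages are made up for illustration; they are not from the lesson's example.csv.

```python
import pandas as pd

# made-up sample data, only for illustration
people = {
    "Name": ["Ava", "Ben", "Cleo"],
    "Age": [28, 35, 52],
}

# each dictionary key becomes a column; each list holds that column's values
df = pd.DataFrame(people)
print(df)

# a single column is a Series; the whole table is a DataFrame
ages = df["Age"]
print(type(ages).__name__, type(df).__name__)
print("rows, columns:", df.shape)
```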

Question #2 & 3:

  • which companies use pandas?
  • what is pandas?

but why is pandas useful?

  • it provides tools for handling and manipulating tabular data, which is a common format for storing and analyzing data.
  • it can handle different types of data, including CSV files and databases.
  • it allows you to perform tasks such as filtering, sorting, and grouping data, making it easier to analyze and work with.
  • it has functions for handling missing data and can fill in or remove missing values, which is important for accurate data analysis.
  • it also has tools for creating visualizations such as graphs and charts, making it easier to communicate insights from the data.
  • it is fast and efficient for datasets that fit in your computer's memory, which is important for time-critical data analysis.
  • it is widely used in the industry and has a large community of users and developers, making it easy to find support and resources.
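
the missing-data point is easier to see with a tiny example. this is a sketch with made-up data (two of the four ages are missing); filling with the mean is just one possible choice, not the only correct one.

```python
import pandas as pd

# made-up data with two missing ages (None becomes NaN in a numeric column)
df = pd.DataFrame({
    "Name": ["Ava", "Ben", "Cleo", "Dev"],
    "Age": [28, None, 52, None],
})

# count the missing values in each column
missing_per_column = df.isna().sum()
print(missing_per_column)

# option 1: drop every row that has a missing value
dropped = df.dropna()

# option 2: fill missing ages with the mean of the known ages
filled = df.fillna({"Age": df["Age"].mean()})

print(dropped)
print(filled)
```

dropping rows is simpler but throws data away; filling keeps every row but invents values, so which one is better depends on the analysis.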

Question #4:

  • why is pandas useful?

how do i flipping use it? it's so hard, my puny brain can't understand it

it is actually really simple

here is pandas loading a CSV file and doing some simple analysis:

import pandas as pd

df = pd.read_csv('files/example.csv')

print(df.head())

print("Average age:", df['Age'].mean())

females = df[df['Gender'] == 'Female']
print(females)

sorted_data = df.sort_values(by='Salary', ascending=False)
print(sorted_data)
---------------------------------------------------------------------------
ParserError                               Traceback (most recent call last)
/home/id/vscode/test-fastpages/_notebooks/2023-04-24-pandas.ipynb Cell 10 in <cell line: 5>()
      1 # dummy code, just like you.
      3 import pandas as pd
----> 5 df = pd.read_csv('files/example.csv')
      7 print(df.head())
      9 print("Average age:", df['Age'].mean())

File ~/anaconda3/lib/python3.9/site-packages/pandas/util/_decorators.py:311, in deprecate_nonkeyword_arguments.<locals>.decorate.<locals>.wrapper(*args, **kwargs)
    305 if len(args) > num_allow_args:
    306     warnings.warn(
    307         msg.format(arguments=arguments),
    308         FutureWarning,
    309         stacklevel=stacklevel,
    310     )
--> 311 return func(*args, **kwargs)

File ~/anaconda3/lib/python3.9/site-packages/pandas/io/parsers/readers.py:680, in read_csv(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, encoding_errors, dialect, error_bad_lines, warn_bad_lines, on_bad_lines, delim_whitespace, low_memory, memory_map, float_precision, storage_options)
    665 kwds_defaults = _refine_defaults_read(
    666     dialect,
    667     delimiter,
   (...)
    676     defaults={"delimiter": ","},
    677 )
    678 kwds.update(kwds_defaults)
--> 680 return _read(filepath_or_buffer, kwds)

File ~/anaconda3/lib/python3.9/site-packages/pandas/io/parsers/readers.py:581, in _read(filepath_or_buffer, kwds)
    578     return parser
    580 with parser:
--> 581     return parser.read(nrows)

File ~/anaconda3/lib/python3.9/site-packages/pandas/io/parsers/readers.py:1254, in TextFileReader.read(self, nrows)
   1252 nrows = validate_integer("nrows", nrows)
   1253 try:
-> 1254     index, columns, col_dict = self._engine.read(nrows)
   1255 except Exception:
   1256     self.close()

File ~/anaconda3/lib/python3.9/site-packages/pandas/io/parsers/c_parser_wrapper.py:225, in CParserWrapper.read(self, nrows)
    223 try:
    224     if self.low_memory:
--> 225         chunks = self._reader.read_low_memory(nrows)
    226         # destructive to chunks
    227         data = _concatenate_chunks(chunks)

File ~/anaconda3/lib/python3.9/site-packages/pandas/_libs/parsers.pyx:805, in pandas._libs.parsers.TextReader.read_low_memory()

File ~/anaconda3/lib/python3.9/site-packages/pandas/_libs/parsers.pyx:861, in pandas._libs.parsers.TextReader._read_rows()

File ~/anaconda3/lib/python3.9/site-packages/pandas/_libs/parsers.pyx:847, in pandas._libs.parsers.TextReader._tokenize_rows()

File ~/anaconda3/lib/python3.9/site-packages/pandas/_libs/parsers.pyx:1960, in pandas._libs.parsers.raise_parser_error()

ParserError: Error tokenizing data. C error: Expected 1 fields in line 28, saw 367

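uh oh!!! an error 😢

the ParserError above means pandas is installed but could not read files/example.csv as a table. most likely the file is not a plain CSV (for example, a web page was saved instead of the raw data), so download it again with the command in the ps below.

if you see "No module named 'pandas'" instead, pandas is not installed. enter these into your terminal:

pip install wheel
pip install pandas

on stack overflow, it said pandas is distributed through pip as a wheel, which is why wheel gets installed too. with a recent version of pip, "pip install pandas" alone is usually enough.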

link to full forum if curious: https://stackoverflow.com/questions/33481974/importerror-no-module-named-pandas

ps: do this so the examples can find the data file on your laptop (it saves example.csv into a folder named files):

mkdir -p files
wget -P files https://raw.githubusercontent.com/KKcbal/amongus/master/_notebooks/files/example.csv
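
if the download doesn't work, the same steps can still be tried without any file. the sketch below reads CSV text straight from a string. the rows are made up; only the column names (Name, Age, Gender, Salary) are taken from the lesson's code, and the real example.csv may hold different people and values.

```python
import io

import pandas as pd

# made-up rows that reuse the column names from the lesson's code
csv_text = """Name,Age,Gender,Salary
Ava,28,Female,52000
Ben,35,Male,61000
Cleo,52,Female,75000
Dev,41,Male,58000
"""

# read_csv accepts a file-like object as well as a path
df = pd.read_csv(io.StringIO(csv_text))

print(df.head())
average_age = df["Age"].mean()
print("Average age:", average_age)

# keep only the rows where Gender is Female
females = df[df["Gender"] == "Female"]
print(females)

# sort by Salary, highest first
sorted_data = df.sort_values(by="Salary", ascending=False)
print(sorted_data)
```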

example code on how to load a csv and count the people in each age group

import pandas as pd

# read the CSV file
df = pd.read_csv('files/example.csv')

# print the first five rows
print(df.head())

# define a function to assign each age to an age group
def assign_age_group(age):
    if age < 30:
        return '<30'
    elif age < 40:
        return '30-40'
    elif age < 50:
        return '40-50'
    else:
        return '>50'

# apply the function to the Age column to create a new column with age groups
df['Age Group'] = df['Age'].apply(assign_age_group)

# group by age group and count the number of people in each group
age_counts = df.groupby('Age Group')['Name'].count()

# print the age group counts
print(age_counts)

how to manipulate the data in pandas.

import pandas as pd

# load the csv file (same path as the other examples)
df = pd.read_csv('files/example.csv')

# print the first five rows
print(df.head())

# filter the data to include only people aged 30 or older
df_filtered = df[df['Age'] >= 30]

# sort the data by age in descending order
df_sorted = df.sort_values('Age', ascending=False)

# group the data by gender and calculate the mean age for each group
age_by_gender = df.groupby('Gender')['Age'].mean()

# print the filtered data
print(df_filtered)

# print the sorted data
print(df_sorted)

# print the mean age by gender
print(age_by_gender)
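
grouping can also produce several statistics at once. this is a sketch with made-up data, using the agg method with named results; the output column names (people, mean_age, oldest) are invented for this example.

```python
import pandas as pd

# made-up data for illustration
df = pd.DataFrame({
    "Name": ["Ava", "Ben", "Cleo", "Dev"],
    "Age": [28, 35, 52, 41],
    "Gender": ["Female", "Male", "Female", "Male"],
})

# one row per gender, with a count, a mean age, and a maximum age
summary = df.groupby("Gender").agg(
    people=("Name", "count"),
    mean_age=("Age", "mean"),
    oldest=("Age", "max"),
)
print(summary)
```

each keyword is the name of an output column, and each tuple says which input column to use and which function to apply to it.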

how do i put it into a chart 😩

here is how:

import pandas as pd
import matplotlib.pyplot as plt

# read the CSV file
df = pd.read_csv('files/example.csv')

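# create a bar chart of the number of people in each age group
age_groups = ['<30', '30-40', '40-50', '>50']
# right=False makes each bin include its lower edge, so age 30 lands in '30-40';
# float('inf') keeps the last bin open-ended no matter what the oldest age is
age_bins = [0, 30, 40, 50, float('inf')]
age_counts = pd.cut(df['Age'], bins=age_bins, labels=age_groups, right=False).value_counts().sort_index()
plt.bar(age_counts.index.astype(str), age_counts.values)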
plt.title('Number of people in each age group')
plt.xlabel('Age group')
plt.ylabel('Number of people')
plt.show()

# create a pie chart of the gender distribution
gender_counts = df['Gender'].value_counts()
plt.pie(gender_counts.values, labels=gender_counts.index, autopct='%1.1f%%')
plt.title('Gender distribution')
plt.show()

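# create a scatter plot of age vs. income
# (assumes the CSV has an 'Income' column; the first example used 'Salary', so use whichever name your file has)
plt.scatter(df['Age'], df['Income'])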
plt.title('Age vs. Income')
plt.xlabel('Age')
plt.ylabel('Income')
plt.show()

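uh oh!!!! another error!? if it says "No module named 'matplotlib'", install this library:

pip install matplotlib

then run the chart code again. here is a shorter version that makes only the bar chart:

import pandas as pd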
import matplotlib.pyplot as plt
import numpy as np

# read the CSV file
df = pd.read_csv('files/example.csv')

# define age groups
age_groups = ['<30', '30-40', '40-50', '>50']

# create a new column with the age group for each person
df['Age Group'] = pd.cut(df['Age'], bins=[0, 30, 40, 50, np.inf], labels=age_groups, right=False)

# group by age group and count the number of people in each group
# (observed=False keeps age groups that have no people, with a count of 0)
age_counts = df.groupby('Age Group', observed=False)['Name'].count()

# create a bar chart of the age counts
age_counts.plot(kind='bar')

# set the title and axis labels
plt.title('Number of People in Each Age Group')
plt.xlabel('Age Group')
plt.ylabel('Number of People')

# show the chart
plt.show()

magic!!!!!!
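
if plt.show() doesn't open a window for you (that can happen when python runs somewhere without a display, like some WSL setups), one option is to save the chart to an image file instead. a minimal sketch, with made-up counts and a hypothetical file name age_groups.png:

```python
import os

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: draws to files, no window needed
import matplotlib.pyplot as plt
import pandas as pd

# made-up counts for illustration
age_counts = pd.Series([3, 5, 4, 2], index=["<30", "30-40", "40-50", ">50"])

fig, ax = plt.subplots()
age_counts.plot(kind="bar", ax=ax)
ax.set_title("Number of People in Each Age Group")
ax.set_xlabel("Age Group")
ax.set_ylabel("Number of People")

# write the chart to an image file instead of calling plt.show()
output_path = "age_groups.png"  # hypothetical file name
fig.savefig(output_path)
plt.close(fig)

print("saved chart to", os.path.abspath(output_path))
```

the image can then be opened with any image viewer or dragged into the notebook.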

Hacks

  1. make your own data using your brain, google, or chatgpt; it should look different from mine.
  2. modify my code or write your own.
  3. output your data as something other than a bar graph.
  4. write an 850+ word essay on how pandas, python or irl, affected your life. If AI score below 85%, then -1 grading point (CRINGE DONT USE)
  5. answer the questions below, the more explained the better.

Questions

  1. What are the two primary data structures in pandas and how do they differ?
  2. How do you read a CSV file into a pandas DataFrame?
  3. How do you select a single column from a pandas DataFrame?
  4. How do you filter rows in a pandas DataFrame based on a condition?
  5. How do you group rows in a pandas DataFrame by a particular column?
  6. How do you aggregate data in a pandas DataFrame using functions like sum and mean?
  7. How do you handle missing values in a pandas DataFrame?
  8. How do you merge two pandas DataFrames together?
  9. How do you export a pandas DataFrame to a CSV file?
  10. What is the difference between a Series and a DataFrame in Pandas?

note

all hacks are due saturday night. the earlier you turn them in, the higher your score will be. if you miss the due date, you will get a 0. there will be no tolerance.

no questions answered

  • Tonight: 2.9
  • Friday Night: 2.8
  • Saturday Night: 2.7
  • Sunday Night: 0.0

questions answered

  • Tonight: 3.0
  • Friday Night: 2.9
  • Saturday Night: 2.8
  • Sunday Night: 0.0
