data analysis hacks

How can Numpy and Pandas be used to preprocess data for predictive analysis?

ANS: pandas lets us load data into DataFrames, where we can sort, filter, and clean it (for example by dropping or filling missing values). NumPy then provides fast array math, such as scaling or normalizing columns, so the cleaned data is ready to feed into a predictive model.
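Here is a small sketch of that split between the two libraries. The column names and values are made up for illustration; pandas fills in the gaps and NumPy does the scaling.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data with one missing age and one missing income
raw = pd.DataFrame({
    "age": [22, 35, None, 58],
    "income": [28000.0, 52000.0, 61000.0, None],
})

# pandas: fill each missing value with its column's mean
clean = raw.fillna(raw.mean())

# NumPy: convert to an array and standardize each column
# (subtract the column mean, divide by the column standard deviation)
values = clean.to_numpy()
scaled = (values - values.mean(axis=0)) / values.std(axis=0)

print(scaled.round(2))
```

After this step every column has mean 0 and standard deviation 1, which many models handle better than raw values on very different scales.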

What machine learning algorithms can be used for predictive analysis, and how do they differ?

ANS: I looked some up and found two examples: linear regression and k-means clustering. Linear regression is a supervised method: it fits a line of best fit to labeled data and uses that line to predict a number, so it works well for numeric information like age. K-means clustering is unsupervised: it does not fit a line or predict a value, but instead sorts unlabeled data points into k groups, each centered on the average of its members.
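A minimal sketch of the difference, using only NumPy. The data and the starting centers are made up for illustration. np.polyfit fits the line, and the loop is a bare-bones one-dimensional k-means, not a production implementation.

```python
import numpy as np

# --- Linear regression: learn y from x using labelled examples ---
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 * x + 1.0                      # made-up data lying exactly on y = 2x + 1
slope, intercept = np.polyfit(x, y, deg=1)
prediction = slope * 6.0 + intercept   # predict y for an unseen x

# --- k-means: group unlabelled points around k centers ---
points = np.array([1.0, 1.2, 0.8, 9.0, 9.3, 8.7])
centers = np.array([0.0, 10.0])        # hypothetical starting guesses, k = 2
for _ in range(10):
    # assign every point to its nearest center
    labels = np.argmin(np.abs(points[:, None] - centers[None, :]), axis=1)
    # move each center to the mean of the points assigned to it
    centers = np.array([points[labels == k].mean() for k in range(2)])

print(round(prediction, 2))
print(labels, centers.round(2))
```

The regression half needs known y values to learn from, while the clustering half only ever sees the points themselves.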

Can you discuss some real-world applications of predictive analysis in different industries?

ANS: One example is predicting how many COVID-19 cases a country will gain per day, to determine the best course of action to take. Another is predicting weather or climate patterns in order to make well-informed decisions.

Can you explain the role of feature engineering in predictive analysis, and how it can improve model accuracy?

ANS: Feature engineering means creating or transforming the input columns (features) that a model learns from, for example combining two columns into a ratio, pulling the day of the week out of a date, or scaling numbers to a common range. It improves accuracy because a model can only find patterns in the inputs it is given, so more informative features lead to better predictions.
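As a rough illustration, here are two engineered features built with pandas. The table and its column names are hypothetical.

```python
import pandas as pd

# Hypothetical store sales records
sales = pd.DataFrame({
    "date": pd.to_datetime(["2023-01-06", "2023-01-07", "2023-01-09"]),
    "revenue": [200.0, 450.0, 120.0],
    "items_sold": [10, 15, 8],
})

# New feature: average price per item, derived from two existing columns
sales["price_per_item"] = sales["revenue"] / sales["items_sold"]

# New feature: whether the sale fell on a weekend (Monday=0 ... Sunday=6)
sales["is_weekend"] = sales["date"].dt.dayofweek >= 5

print(sales)
```

Neither new column adds outside information; both just restate the raw data in a form a model can use more directly.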

How can machine learning models be deployed in real-time applications for predictive analysis?

ANS: We can host a trained model on a cloud service such as AWS and put it behind an API, so an application can send it new data and get predictions back in real time. For something simpler, we can run the model from a Python script in a shell.

Can you discuss some limitations of Numpy and Pandas, and when it might be necessary to use other data analysis tools?

ANS: Pandas and NumPy are both Python libraries, so we can't directly benefit from tools built for other languages. They also keep the whole dataset in memory, so data too large to fit in RAM calls for other tools, such as a database or a distributed framework. Still, they are simple to use and work well for basic machine learning algorithms.

How can predictive analysis be used to improve decision-making and optimize business processes?

ANS: With predictive analysis, we can predict things like which items to stock in a store, which items are trending, or which items sell the least. We use previous data to make decisions that would be hard to work out without algorithms.

pandas hacks

Questions

  1. What are the two primary data structures in pandas and how do they differ?

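ANS: The two primary data structures in pandas are the Series and the DataFrame. A Series is a one-dimensional labeled array (a single column), while a DataFrame is a two-dimensional table made up of columns. CSV files and SQLite databases are data sources rather than data structures: pandas can read either one into a DataFrame, where the data can then be sorted.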
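A quick sketch of the two structures pandas itself provides, Series and DataFrame. The names and ages are made up.

```python
import pandas as pd

# A Series: one labelled column of values
ages = pd.Series([15, 16, 17], name="age")

# A DataFrame: a table made of several columns that share one index
students = pd.DataFrame({
    "name": ["Ana", "Ben", "Cy"],   # hypothetical names
    "age": [15, 16, 17],
})

print(ages.ndim, students.ndim)       # number of dimensions of each
print(type(students["age"]))          # a single column of a DataFrame is a Series
```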

  1. How do you read a CSV file into a pandas DataFrame?

ANS: Call pd.read_csv() with the path to the file, for example df = pd.read_csv('files/games.csv'). It returns a DataFrame, which you can then print, sort, or plot.
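A small sketch of reading a CSV. To keep it self-contained it reads from an in-memory string instead of a real file, and the two rows are invented.

```python
import io
import pandas as pd

# Stand-in for a file on disk; pd.read_csv also accepts a path such as 'files/games.csv'
csv_text = io.StringIO("title,price_final\nGame A,9.99\nGame B,0.0\n")

df = pd.read_csv(csv_text)

print(df.head())    # first rows
print(df.dtypes)    # column types that pandas inferred
```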

  1. How do you select a single column from a pandas DataFrame?

ANS: Put the column name in square brackets after the DataFrame, for example df['title']. This returns that column as a Series, which you can then print.

  1. How do you filter rows in a pandas DataFrame based on a condition?

ANS: Put a condition inside square brackets after the DataFrame, for example df[df['price_final'] < 10]. The condition produces a True/False value for every row, and only the True rows are kept. (groupby() is different: it groups rows rather than filtering them.)
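Column selection and row filtering can be sketched together like this. The column names are borrowed from the games dataset used later, but the rows are made up.

```python
import pandas as pd

df = pd.DataFrame({
    "title": ["Game A", "Game B", "Game C"],   # hypothetical data
    "price_final": [9.99, 0.0, 29.99],
    "positive_ratio": [91, 74, 88],
})

# Select one column: returns a Series
titles = df["title"]

# Filter rows: the comparison builds a True/False mask, and df[mask] keeps the True rows
cheap = df[df["price_final"] < 10]

# Combine conditions with & (and) or | (or), wrapping each condition in parentheses
cheap_and_liked = df[(df["price_final"] < 10) & (df["positive_ratio"] >= 80)]

print(cheap_and_liked)
```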

  1. How do you group rows in a pandas DataFrame by a particular column?

ANS: Use groupby() with the column name, for example df.groupby('age') to group the users in a CSV file by their age.

  1. How do you aggregate data in a pandas DataFrame using functions like sum and mean?

ANS: After using groupby() to form the groups, call a function like sum() or mean() on the result, for example df.groupby('age').sum(). You can also use agg() with parentheses, as in df.agg('sum'), which sums each column of the dataset.
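A short sketch of grouping and aggregating in one go, on invented user data.

```python
import pandas as pd

df = pd.DataFrame({
    "age": [15, 15, 16, 16, 16],          # hypothetical users
    "score": [80, 90, 70, 75, 95],
})

# Sum and mean of 'score' within each age group
per_age = df.groupby("age")["score"].agg(["sum", "mean"])

# Aggregating without groupby works on whole columns
totals = df.agg("sum")

print(per_age)
print(totals)
```

Passing a list to agg() gives one output column per function, which is handy when you want several summaries at once.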

  1. How do you handle missing values in a pandas DataFrame?

ANS: There are a couple of things you can do. First use .isnull() to find which values are missing (pandas stores them as NaN). Then either remove those rows with .dropna() or replace the missing values with .fillna(), for example with 0 or the column average. Most pandas calculations such as sum() and mean() skip NaN values by default.
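A minimal sketch of finding and handling missing values, with made-up data. Which option is right depends on the dataset, so treat the mean fill as just one choice.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "name": ["Ana", "Ben", "Cy"],            # hypothetical data
    "score": [80.0, np.nan, 95.0],
})

missing_per_column = df.isnull().sum()       # count missing values in each column

dropped = df.dropna()                        # option 1: remove rows with any missing value
filled = df.fillna({"score": df["score"].mean()})   # option 2: fill with the column mean

print(missing_per_column)
print(filled)
```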

  1. How do you merge two pandas DataFrames together?

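ANS: We can use pd.concat() to combine two DataFrames. By default it stacks the rows of one under the other, and if the columns don't match, the extra columns are added to the result with NaN where data is missing. With axis=1 it places the DataFrames side by side, so a column a and a column b end up next to each other. To join on a shared key column, as in a database, use pd.merge() instead.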
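The two ways of combining DataFrames can be compared side by side. The tables here are hypothetical.

```python
import pandas as pd

a = pd.DataFrame({"id": [1, 2], "name": ["Ana", "Ben"]})     # hypothetical tables
b = pd.DataFrame({"id": [2, 3], "score": [90, 75]})

# concat stacks the rows; columns missing from one table are filled with NaN
stacked = pd.concat([a, b], ignore_index=True)

# merge joins on a key column, like a database join; the default keeps only matching keys
joined = pd.merge(a, b, on="id")

print(stacked)
print(joined)
```

Here stacked has four rows with gaps, while joined has a single row for the one id that appears in both tables.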

  1. How do you export a pandas DataFrame to a CSV file?

ANS: The easiest way is .to_csv(), for example df.to_csv('output.csv', index=False), which writes the DataFrame to a CSV file. Passing index=False leaves out the row index.
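A small sketch of exporting. To avoid creating a file, it writes to an in-memory buffer; the data is invented.

```python
import io
import pandas as pd

df = pd.DataFrame({"title": ["Game A", "Game B"], "price_final": [9.99, 0.0]})

# Writing to an in-memory buffer here; pass a path such as 'out.csv' to write a file
buffer = io.StringIO()
df.to_csv(buffer, index=False)    # index=False leaves out the row-number column

csv_text = buffer.getvalue()
print(csv_text)
```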

  1. What is the difference between a Series and a DataFrame in Pandas?

ANS: A Series is a one-dimensional data structure with an index, capable of storing any type of data, like a single column. A DataFrame is two-dimensional: it lines up several Series as the columns of a table that share the same index, which makes the data easier to view and work with.

PANDAS CODE HACKS

import pandas as pd
import matplotlib.pyplot as plt

# read the games CSV and sort by number of user reviews, most-reviewed first
df = pd.read_csv('files/games.csv').sort_values(by='user_reviews', ascending=False)

# read a small hand-made dataset
da = pd.read_csv('files/dataforapcsp.csv')

# keep the 75 most-reviewed games
top_games = df.head(75)

# scatter plot of positive review ratio against final price
top_games.plot(kind='scatter', x='positive_ratio', y='price_final')

# set the title
plt.title("positive ratio of steam game reviews and prices")

# show the chart
plt.show()
print(top_games['title'])

print(da) # tried making my own dataset, wasn't that successful
19041    Counter-Strike: Global Offensive
19489                 PUBG: BATTLEGROUNDS
8746                               Dota 2
5347                   Grand Theft Auto V
19473     Tom Clancy's Rainbow Six® Siege
                       ...               
28943              SCP: Secret Laboratory
2505                       Risk of Rain 2
17538                  Deep Rock Galactic
3003                         BeamNG.drive
3596                         DOOM Eternal
Name: title, Length: 75, dtype: object
   number1  number2  Unnamed: 2
0        1        2         NaN