{
"cells": [
{
"cell_type": "markdown",
"id": "global-nursery",
"metadata": {},
"source": [
"# Lab 4 (TP4): data project\n"
]
},
{
"cell_type": "markdown",
"id": "conditional-lobby",
"metadata": {},
"source": [
"In this lab, we load the dataset used for our SAE 2.04 project.\n"
]
},
{
"cell_type": "markdown",
"id": "6da5789a",
"metadata": {},
"source": [
"## Research question:\n"
]
},
{
"cell_type": "markdown",
"id": "f0c31a3f",
"metadata": {},
"source": [
"### <span style=\"color: #FF0000\">**What makes one car sell for more than another?**</span>\n"
]
},
{
"cell_type": "markdown",
"id": "f64fb802",
"metadata": {},
"source": [
"## I/ Loading and exploring the data\n"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "c0f0ed8f",
"metadata": {},
"outputs": [],
"source": [
"# Load the data with the pandas library:\n",
"import pandas as pd\n",
"import numpy as np\n",
"df = pd.read_csv(\"carDetailsV4.csv\", encoding=\"latin-1\")"
]
},
{
"cell_type": "markdown",
"id": "every-islam",
"metadata": {},
"source": [
"We display our pandas **DataFrame**.\n"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "65ea7cfb",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Make</th>\n",
" <th>Model</th>\n",
" <th>Price</th>\n",
" <th>Year</th>\n",
" <th>Kilometer</th>\n",
" <th>Fuel Type</th>\n",
" <th>Transmission</th>\n",
" <th>Location</th>\n",
" <th>Color</th>\n",
" <th>Owner</th>\n",
" <th>Seller Type</th>\n",
" <th>Engine</th>\n",
" <th>Max Power</th>\n",
" <th>Max Torque</th>\n",
" <th>Drivetrain</th>\n",
" <th>Length</th>\n",
" <th>Width</th>\n",
" <th>Height</th>\n",
" <th>Seating Capacity</th>\n",
" <th>Fuel Tank Capacity</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Honda</td>\n",
" <td>Amaze 1.2 VX i-VTEC</td>\n",
" <td>505000</td>\n",
" <td>2017</td>\n",
" <td>87150</td>\n",
" <td>Petrol</td>\n",
" <td>Manual</td>\n",
" <td>Pune</td>\n",
" <td>Grey</td>\n",
" <td>First</td>\n",
" <td>Corporate</td>\n",
" <td>1198 cc</td>\n",
" <td>87 bhp @ 6000 rpm</td>\n",
" <td>109 Nm @ 4500 rpm</td>\n",
" <td>FWD</td>\n",
" <td>3990.0</td>\n",
" <td>1680.0</td>\n",
" <td>1505.0</td>\n",
" <td>5.0</td>\n",
" <td>35.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Maruti Suzuki</td>\n",
" <td>Swift DZire VDI</td>\n",
" <td>450000</td>\n",
" <td>2014</td>\n",
" <td>75000</td>\n",
" <td>Diesel</td>\n",
" <td>Manual</td>\n",
" <td>Ludhiana</td>\n",
" <td>White</td>\n",
" <td>Second</td>\n",
" <td>Individual</td>\n",
" <td>1248 cc</td>\n",
" <td>74 bhp @ 4000 rpm</td>\n",
" <td>190 Nm @ 2000 rpm</td>\n",
" <td>FWD</td>\n",
" <td>3995.0</td>\n",
" <td>1695.0</td>\n",
" <td>1555.0</td>\n",
" <td>5.0</td>\n",
" <td>42.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Hyundai</td>\n",
" <td>i10 Magna 1.2 Kappa2</td>\n",
" <td>220000</td>\n",
" <td>2011</td>\n",
" <td>67000</td>\n",
" <td>Petrol</td>\n",
" <td>Manual</td>\n",
" <td>Lucknow</td>\n",
" <td>Maroon</td>\n",
" <td>First</td>\n",
" <td>Individual</td>\n",
" <td>1197 cc</td>\n",
" <td>79 bhp @ 6000 rpm</td>\n",
" <td>112.7619 Nm @ 4000 rpm</td>\n",
" <td>FWD</td>\n",
" <td>3585.0</td>\n",
" <td>1595.0</td>\n",
" <td>1550.0</td>\n",
" <td>5.0</td>\n",
" <td>35.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Toyota</td>\n",
" <td>Glanza G</td>\n",
" <td>799000</td>\n",
" <td>2019</td>\n",
" <td>37500</td>\n",
" <td>Petrol</td>\n",
" <td>Manual</td>\n",
" <td>Mangalore</td>\n",
" <td>Red</td>\n",
" <td>First</td>\n",
" <td>Individual</td>\n",
" <td>1197 cc</td>\n",
" <td>82 bhp @ 6000 rpm</td>\n",
" <td>113 Nm @ 4200 rpm</td>\n",
" <td>FWD</td>\n",
" <td>3995.0</td>\n",
" <td>1745.0</td>\n",
" <td>1510.0</td>\n",
" <td>5.0</td>\n",
" <td>37.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Toyota</td>\n",
" <td>Innova 2.4 VX 7 STR [2016-2020]</td>\n",
" <td>1950000</td>\n",
" <td>2018</td>\n",
" <td>69000</td>\n",
" <td>Diesel</td>\n",
" <td>Manual</td>\n",
" <td>Mumbai</td>\n",
" <td>Grey</td>\n",
" <td>First</td>\n",
" <td>Individual</td>\n",
" <td>2393 cc</td>\n",
" <td>148 bhp @ 3400 rpm</td>\n",
" <td>343 Nm @ 1400 rpm</td>\n",
" <td>RWD</td>\n",
" <td>4735.0</td>\n",
" <td>1830.0</td>\n",
" <td>1795.0</td>\n",
" <td>7.0</td>\n",
" <td>55.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2054</th>\n",
" <td>Mahindra</td>\n",
" <td>XUV500 W8 [2015-2017]</td>\n",
" <td>850000</td>\n",
" <td>2016</td>\n",
" <td>90300</td>\n",
" <td>Diesel</td>\n",
" <td>Manual</td>\n",
" <td>Surat</td>\n",
" <td>White</td>\n",
" <td>First</td>\n",
" <td>Individual</td>\n",
" <td>2179 cc</td>\n",
" <td>138 bhp @ 3750 rpm</td>\n",
" <td>330 Nm @ 1600 rpm</td>\n",
" <td>FWD</td>\n",
" <td>4585.0</td>\n",
" <td>1890.0</td>\n",
" <td>1785.0</td>\n",
" <td>7.0</td>\n",
" <td>70.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2055</th>\n",
" <td>Hyundai</td>\n",
" <td>Eon D-Lite +</td>\n",
" <td>275000</td>\n",
" <td>2014</td>\n",
" <td>83000</td>\n",
" <td>Petrol</td>\n",
" <td>Manual</td>\n",
" <td>Ahmedabad</td>\n",
" <td>White</td>\n",
" <td>Second</td>\n",
" <td>Individual</td>\n",
" <td>814 cc</td>\n",
" <td>55 bhp @ 5500 rpm</td>\n",
" <td>75 Nm @ 4000 rpm</td>\n",
" <td>FWD</td>\n",
" <td>3495.0</td>\n",
" <td>1550.0</td>\n",
" <td>1500.0</td>\n",
" <td>5.0</td>\n",
" <td>32.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2056</th>\n",
" <td>Ford</td>\n",
" <td>Figo Duratec Petrol ZXI 1.2</td>\n",
" <td>240000</td>\n",
" <td>2013</td>\n",
" <td>73000</td>\n",
" <td>Petrol</td>\n",
" <td>Manual</td>\n",
" <td>Thane</td>\n",
" <td>Silver</td>\n",
" <td>First</td>\n",
" <td>Individual</td>\n",
" <td>1196 cc</td>\n",
" <td>70 bhp @ 6250 rpm</td>\n",
" <td>102 Nm @ 4000 rpm</td>\n",
" <td>FWD</td>\n",
" <td>3795.0</td>\n",
" <td>1680.0</td>\n",
" <td>1427.0</td>\n",
" <td>5.0</td>\n",
" <td>45.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2057</th>\n",
" <td>BMW</td>\n",
" <td>5-Series 520d Luxury Line [2017-2019]</td>\n",
" <td>4290000</td>\n",
" <td>2018</td>\n",
" <td>60474</td>\n",
" <td>Diesel</td>\n",
" <td>Automatic</td>\n",
" <td>Coimbatore</td>\n",
" <td>White</td>\n",
" <td>First</td>\n",
" <td>Individual</td>\n",
" <td>1995 cc</td>\n",
" <td>188 bhp @ 4000 rpm</td>\n",
" <td>400 Nm @ 1750 rpm</td>\n",
" <td>RWD</td>\n",
" <td>4936.0</td>\n",
" <td>1868.0</td>\n",
" <td>1479.0</td>\n",
" <td>5.0</td>\n",
" <td>65.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2058</th>\n",
" <td>Mahindra</td>\n",
" <td>Bolero Power Plus ZLX [2016-2019]</td>\n",
" <td>670000</td>\n",
" <td>2017</td>\n",
" <td>72000</td>\n",
" <td>Diesel</td>\n",
" <td>Manual</td>\n",
" <td>Guwahati</td>\n",
" <td>White</td>\n",
" <td>First</td>\n",
" <td>Individual</td>\n",
" <td>1493 cc</td>\n",
" <td>70 bhp @ 3600 rpm</td>\n",
" <td>195 Nm @ 1400 rpm</td>\n",
" <td>RWD</td>\n",
" <td>3995.0</td>\n",
" <td>1745.0</td>\n",
" <td>1880.0</td>\n",
" <td>7.0</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>2059 rows × 20 columns</p>\n",
"</div>"
],
"text/plain": [
" Make Model Price Year \\\n",
"0 Honda Amaze 1.2 VX i-VTEC 505000 2017 \n",
"1 Maruti Suzuki Swift DZire VDI 450000 2014 \n",
"2 Hyundai i10 Magna 1.2 Kappa2 220000 2011 \n",
"3 Toyota Glanza G 799000 2019 \n",
"4 Toyota Innova 2.4 VX 7 STR [2016-2020] 1950000 2018 \n",
"... ... ... ... ... \n",
"2054 Mahindra XUV500 W8 [2015-2017] 850000 2016 \n",
"2055 Hyundai Eon D-Lite + 275000 2014 \n",
"2056 Ford Figo Duratec Petrol ZXI 1.2 240000 2013 \n",
"2057 BMW 5-Series 520d Luxury Line [2017-2019] 4290000 2018 \n",
"2058 Mahindra Bolero Power Plus ZLX [2016-2019] 670000 2017 \n",
"\n",
" Kilometer Fuel Type Transmission Location Color Owner \\\n",
"0 87150 Petrol Manual Pune Grey First \n",
"1 75000 Diesel Manual Ludhiana White Second \n",
"2 67000 Petrol Manual Lucknow Maroon First \n",
"3 37500 Petrol Manual Mangalore Red First \n",
"4 69000 Diesel Manual Mumbai Grey First \n",
"... ... ... ... ... ... ... \n",
"2054 90300 Diesel Manual Surat White First \n",
"2055 83000 Petrol Manual Ahmedabad White Second \n",
"2056 73000 Petrol Manual Thane Silver First \n",
"2057 60474 Diesel Automatic Coimbatore White First \n",
"2058 72000 Diesel Manual Guwahati White First \n",
"\n",
" Seller Type Engine Max Power Max Torque \\\n",
"0 Corporate 1198 cc 87 bhp @ 6000 rpm 109 Nm @ 4500 rpm \n",
"1 Individual 1248 cc 74 bhp @ 4000 rpm 190 Nm @ 2000 rpm \n",
"2 Individual 1197 cc 79 bhp @ 6000 rpm 112.7619 Nm @ 4000 rpm \n",
"3 Individual 1197 cc 82 bhp @ 6000 rpm 113 Nm @ 4200 rpm \n",
"4 Individual 2393 cc 148 bhp @ 3400 rpm 343 Nm @ 1400 rpm \n",
"... ... ... ... ... \n",
"2054 Individual 2179 cc 138 bhp @ 3750 rpm 330 Nm @ 1600 rpm \n",
"2055 Individual 814 cc 55 bhp @ 5500 rpm 75 Nm @ 4000 rpm \n",
"2056 Individual 1196 cc 70 bhp @ 6250 rpm 102 Nm @ 4000 rpm \n",
"2057 Individual 1995 cc 188 bhp @ 4000 rpm 400 Nm @ 1750 rpm \n",
"2058 Individual 1493 cc 70 bhp @ 3600 rpm 195 Nm @ 1400 rpm \n",
"\n",
" Drivetrain Length Width Height Seating Capacity Fuel Tank Capacity \n",
"0 FWD 3990.0 1680.0 1505.0 5.0 35.0 \n",
"1 FWD 3995.0 1695.0 1555.0 5.0 42.0 \n",
"2 FWD 3585.0 1595.0 1550.0 5.0 35.0 \n",
"3 FWD 3995.0 1745.0 1510.0 5.0 37.0 \n",
"4 RWD 4735.0 1830.0 1795.0 7.0 55.0 \n",
"... ... ... ... ... ... ... \n",
"2054 FWD 4585.0 1890.0 1785.0 7.0 70.0 \n",
"2055 FWD 3495.0 1550.0 1500.0 5.0 32.0 \n",
"2056 FWD 3795.0 1680.0 1427.0 5.0 45.0 \n",
"2057 RWD 4936.0 1868.0 1479.0 5.0 65.0 \n",
"2058 RWD 3995.0 1745.0 1880.0 7.0 NaN \n",
"\n",
"[2059 rows x 20 columns]"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# display(df) renders the contents of the DataFrame df in Jupyter's rich format.\n",
"# Writing a variable name on the last line of a cell is a shortcut for display.\n",
"df  # df.head(5) would show only the first 5 rows"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "d846d8e4",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0 1198 cc\n",
"1 1248 cc\n",
"2 1197 cc\n",
"3 1197 cc\n",
"4 2393 cc\n",
" ... \n",
"2054 2179 cc\n",
"2055 814 cc\n",
"2056 1196 cc\n",
"2057 1995 cc\n",
"2058 1493 cc\n",
"Name: Engine, Length: 2059, dtype: object\n"
]
}
],
"source": [
"print(df[\"Engine\"])"
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "2aea6e9f",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Data type per column:\n",
" Make object\n",
"Model object\n",
"Price int64\n",
"Year int64\n",
"Kilometer int64\n",
"Fuel Type object\n",
"Transmission object\n",
"Location object\n",
"Color object\n",
"Owner object\n",
"Seller Type object\n",
"Engine object\n",
"Max Power object\n",
"Max Torque object\n",
"Drivetrain object\n",
"Length float64\n",
"Width float64\n",
"Height float64\n",
"Seating Capacity float64\n",
"Fuel Tank Capacity float64\n",
"dtype: object\n",
"\n",
"\n",
"Number of rows: 2059\n",
"Number of columns: 20\n",
"\n",
"\n",
"The most important columns for us are: make, model, price, mileage and power\n"
]
}
],
"source": [
"print(\"Data type per column:\\n\", df.dtypes)\n",
"print(\"\\n\")\n",
"print(\"Number of rows:\", len(df))\n",
"print(\"Number of columns:\", len(df.columns))\n",
"print(\"\\n\")\n",
"print(\"The most important columns for us are: make, model, price, mileage and power\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b7406055",
"metadata": {},
"outputs": [],
"source": [
"import matplotlib.pyplot as plt  # needed for plt.hist and plt.show below\n",
"\n",
"# Histogram of the sale prices\n",
"prix = df[\"Price\"].to_list()\n",
"plt.hist(prix, bins=30)\n",
"plt.show()\n",
"\n",
"print(\"Average vehicle price:\", int(df[\"Price\"].mean().round()), \"₹\")\n",
"\n",
"# Number and share of cars for each drivetrain type\n",
"counts = df[\"Drivetrain\"].value_counts()\n",
"for kind in [\"FWD\", \"RWD\", \"AWD\"]:\n",
"    n = counts.get(kind, 0)\n",
"    print(f\"{kind}: {n} cars, i.e. {n / len(df) * 100:.1f} % of the dataset\")"
]
},
{
"cell_type": "markdown",
"id": "5cc5c8ff",
"metadata": {},
"source": [
"## III/ Cleaning and presenting the data\n"
]
},
{
"cell_type": "markdown",
"id": "3eedcf6a",
"metadata": {},
"source": [
"### Dropping the irrelevant columns"
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "c068815f",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Make</th>\n",
" <th>Model</th>\n",
" <th>Price</th>\n",
" <th>Year</th>\n",
" <th>Kilometer</th>\n",
" <th>Fuel Type</th>\n",
" <th>Transmission</th>\n",
" <th>Owner</th>\n",
" <th>Seller Type</th>\n",
" <th>Engine</th>\n",
" <th>Max Power</th>\n",
" <th>Max Torque</th>\n",
" <th>Drivetrain</th>\n",
" <th>Seating Capacity</th>\n",
" <th>Fuel Tank Capacity</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Honda</td>\n",
" <td>Amaze 1.2 VX i-VTEC</td>\n",
" <td>505000</td>\n",
" <td>2017</td>\n",
" <td>87150</td>\n",
" <td>Petrol</td>\n",
" <td>Manual</td>\n",
" <td>First</td>\n",
" <td>Corporate</td>\n",
" <td>1198 cc</td>\n",
" <td>87 bhp @ 6000 rpm</td>\n",
" <td>109 Nm @ 4500 rpm</td>\n",
" <td>FWD</td>\n",
" <td>5.0</td>\n",
" <td>35.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Maruti Suzuki</td>\n",
" <td>Swift DZire VDI</td>\n",
" <td>450000</td>\n",
" <td>2014</td>\n",
" <td>75000</td>\n",
" <td>Diesel</td>\n",
" <td>Manual</td>\n",
" <td>Second</td>\n",
" <td>Individual</td>\n",
" <td>1248 cc</td>\n",
" <td>74 bhp @ 4000 rpm</td>\n",
" <td>190 Nm @ 2000 rpm</td>\n",
" <td>FWD</td>\n",
" <td>5.0</td>\n",
" <td>42.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Hyundai</td>\n",
" <td>i10 Magna 1.2 Kappa2</td>\n",
" <td>220000</td>\n",
" <td>2011</td>\n",
" <td>67000</td>\n",
" <td>Petrol</td>\n",
" <td>Manual</td>\n",
" <td>First</td>\n",
" <td>Individual</td>\n",
" <td>1197 cc</td>\n",
" <td>79 bhp @ 6000 rpm</td>\n",
" <td>112.7619 Nm @ 4000 rpm</td>\n",
" <td>FWD</td>\n",
" <td>5.0</td>\n",
" <td>35.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Toyota</td>\n",
" <td>Glanza G</td>\n",
" <td>799000</td>\n",
" <td>2019</td>\n",
" <td>37500</td>\n",
" <td>Petrol</td>\n",
" <td>Manual</td>\n",
" <td>First</td>\n",
" <td>Individual</td>\n",
" <td>1197 cc</td>\n",
" <td>82 bhp @ 6000 rpm</td>\n",
" <td>113 Nm @ 4200 rpm</td>\n",
" <td>FWD</td>\n",
" <td>5.0</td>\n",
" <td>37.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Toyota</td>\n",
" <td>Innova 2.4 VX 7 STR [2016-2020]</td>\n",
" <td>1950000</td>\n",
" <td>2018</td>\n",
" <td>69000</td>\n",
" <td>Diesel</td>\n",
" <td>Manual</td>\n",
" <td>First</td>\n",
" <td>Individual</td>\n",
" <td>2393 cc</td>\n",
" <td>148 bhp @ 3400 rpm</td>\n",
" <td>343 Nm @ 1400 rpm</td>\n",
" <td>RWD</td>\n",
" <td>7.0</td>\n",
" <td>55.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2054</th>\n",
" <td>Mahindra</td>\n",
" <td>XUV500 W8 [2015-2017]</td>\n",
" <td>850000</td>\n",
" <td>2016</td>\n",
" <td>90300</td>\n",
" <td>Diesel</td>\n",
" <td>Manual</td>\n",
" <td>First</td>\n",
" <td>Individual</td>\n",
" <td>2179 cc</td>\n",
" <td>138 bhp @ 3750 rpm</td>\n",
" <td>330 Nm @ 1600 rpm</td>\n",
" <td>FWD</td>\n",
" <td>7.0</td>\n",
" <td>70.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2055</th>\n",
" <td>Hyundai</td>\n",
" <td>Eon D-Lite +</td>\n",
" <td>275000</td>\n",
" <td>2014</td>\n",
" <td>83000</td>\n",
" <td>Petrol</td>\n",
" <td>Manual</td>\n",
" <td>Second</td>\n",
" <td>Individual</td>\n",
" <td>814 cc</td>\n",
" <td>55 bhp @ 5500 rpm</td>\n",
" <td>75 Nm @ 4000 rpm</td>\n",
" <td>FWD</td>\n",
" <td>5.0</td>\n",
" <td>32.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2056</th>\n",
" <td>Ford</td>\n",
" <td>Figo Duratec Petrol ZXI 1.2</td>\n",
" <td>240000</td>\n",
" <td>2013</td>\n",
" <td>73000</td>\n",
" <td>Petrol</td>\n",
" <td>Manual</td>\n",
" <td>First</td>\n",
" <td>Individual</td>\n",
" <td>1196 cc</td>\n",
" <td>70 bhp @ 6250 rpm</td>\n",
" <td>102 Nm @ 4000 rpm</td>\n",
" <td>FWD</td>\n",
" <td>5.0</td>\n",
" <td>45.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2057</th>\n",
" <td>BMW</td>\n",
" <td>5-Series 520d Luxury Line [2017-2019]</td>\n",
" <td>4290000</td>\n",
" <td>2018</td>\n",
" <td>60474</td>\n",
" <td>Diesel</td>\n",
" <td>Automatic</td>\n",
" <td>First</td>\n",
" <td>Individual</td>\n",
" <td>1995 cc</td>\n",
" <td>188 bhp @ 4000 rpm</td>\n",
" <td>400 Nm @ 1750 rpm</td>\n",
" <td>RWD</td>\n",
" <td>5.0</td>\n",
" <td>65.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2058</th>\n",
" <td>Mahindra</td>\n",
" <td>Bolero Power Plus ZLX [2016-2019]</td>\n",
" <td>670000</td>\n",
" <td>2017</td>\n",
" <td>72000</td>\n",
" <td>Diesel</td>\n",
" <td>Manual</td>\n",
" <td>First</td>\n",
" <td>Individual</td>\n",
" <td>1493 cc</td>\n",
" <td>70 bhp @ 3600 rpm</td>\n",
" <td>195 Nm @ 1400 rpm</td>\n",
" <td>RWD</td>\n",
" <td>7.0</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>2059 rows × 15 columns</p>\n",
"</div>"
],
"text/plain": [
" Make Model Price Year \\\n",
"0 Honda Amaze 1.2 VX i-VTEC 505000 2017 \n",
"1 Maruti Suzuki Swift DZire VDI 450000 2014 \n",
"2 Hyundai i10 Magna 1.2 Kappa2 220000 2011 \n",
"3 Toyota Glanza G 799000 2019 \n",
"4 Toyota Innova 2.4 VX 7 STR [2016-2020] 1950000 2018 \n",
"... ... ... ... ... \n",
"2054 Mahindra XUV500 W8 [2015-2017] 850000 2016 \n",
"2055 Hyundai Eon D-Lite + 275000 2014 \n",
"2056 Ford Figo Duratec Petrol ZXI 1.2 240000 2013 \n",
"2057 BMW 5-Series 520d Luxury Line [2017-2019] 4290000 2018 \n",
"2058 Mahindra Bolero Power Plus ZLX [2016-2019] 670000 2017 \n",
"\n",
" Kilometer Fuel Type Transmission Owner Seller Type Engine \\\n",
"0 87150 Petrol Manual First Corporate 1198 cc \n",
"1 75000 Diesel Manual Second Individual 1248 cc \n",
"2 67000 Petrol Manual First Individual 1197 cc \n",
"3 37500 Petrol Manual First Individual 1197 cc \n",
"4 69000 Diesel Manual First Individual 2393 cc \n",
"... ... ... ... ... ... ... \n",
"2054 90300 Diesel Manual First Individual 2179 cc \n",
"2055 83000 Petrol Manual Second Individual 814 cc \n",
"2056 73000 Petrol Manual First Individual 1196 cc \n",
"2057 60474 Diesel Automatic First Individual 1995 cc \n",
"2058 72000 Diesel Manual First Individual 1493 cc \n",
"\n",
" Max Power Max Torque Drivetrain Seating Capacity \\\n",
"0 87 bhp @ 6000 rpm 109 Nm @ 4500 rpm FWD 5.0 \n",
"1 74 bhp @ 4000 rpm 190 Nm @ 2000 rpm FWD 5.0 \n",
"2 79 bhp @ 6000 rpm 112.7619 Nm @ 4000 rpm FWD 5.0 \n",
"3 82 bhp @ 6000 rpm 113 Nm @ 4200 rpm FWD 5.0 \n",
"4 148 bhp @ 3400 rpm 343 Nm @ 1400 rpm RWD 7.0 \n",
"... ... ... ... ... \n",
"2054 138 bhp @ 3750 rpm 330 Nm @ 1600 rpm FWD 7.0 \n",
"2055 55 bhp @ 5500 rpm 75 Nm @ 4000 rpm FWD 5.0 \n",
"2056 70 bhp @ 6250 rpm 102 Nm @ 4000 rpm FWD 5.0 \n",
"2057 188 bhp @ 4000 rpm 400 Nm @ 1750 rpm RWD 5.0 \n",
"2058 70 bhp @ 3600 rpm 195 Nm @ 1400 rpm RWD 7.0 \n",
"\n",
" Fuel Tank Capacity \n",
"0 35.0 \n",
"1 42.0 \n",
"2 35.0 \n",
"3 37.0 \n",
"4 55.0 \n",
"... ... \n",
"2054 70.0 \n",
"2055 32.0 \n",
"2056 45.0 \n",
"2057 65.0 \n",
"2058 NaN \n",
"\n",
"[2059 rows x 15 columns]"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Drop the columns judged irrelevant for explaining the price\n",
"df = df.drop(columns=[\"Color\", \"Location\", \"Length\", \"Width\", \"Height\"])\n",
"\n",
"df"
]
},
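{
"cell_type": "code",
"execution_count": 20,
"id": "8fe83d4b",
"metadata": {},
"outputs": [],
"source": [
"# Reload the raw CSV. Note: df1 is another name for the same object as df (not a copy),\n",
"# so the changes made through df1 below are also visible through df.\n",
"df = pd.read_csv(\"carDetailsV4.csv\", encoding=\"latin-1\")\n",
"df1 = df\n",
"# Strip the trailing \" cc\" unit. Missing values become the string \"nan\" and then \"\".\n",
"df1[\"Engine\"] = df1[\"Engine\"].astype(str).apply(lambda x: x[:-3])"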
]
},
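{
"cell_type": "markdown",
"id": "sketch-engine-md",
"metadata": {},
"source": [
"A possible alternative to the slicing above, given as a minimal sketch rather than the method used in this lab: extract the digits from `Engine` with a regular expression and convert them with `pd.to_numeric`, so that missing values stay `NaN` instead of becoming empty strings. The `sample` frame and the `Engine_cc` column name are assumptions made for illustration; the non-missing values are copied from rows displayed earlier.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "sketch-engine-code",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# Hypothetical sample mimicking the raw \"Engine\" column, with one missing value\n",
"sample = pd.DataFrame({\"Engine\": [\"1198 cc\", \"1248 cc\", np.nan, \"814 cc\"]})\n",
"\n",
"# Work on a real copy so that the original frame is left untouched\n",
"clean = sample.copy()\n",
"\n",
"# Keep only the leading digits, then convert to a numeric dtype (NaN is preserved)\n",
"clean[\"Engine_cc\"] = pd.to_numeric(clean[\"Engine\"].str.extract(r\"(\\d+)\", expand=False))\n",
"\n",
"print(clean[\"Engine_cc\"].isna().sum(), \"missing engine value(s)\")\n",
"print(clean[\"Engine_cc\"].dtype)"
]
},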
{
"cell_type": "code",
"execution_count": 21,
"id": "69d69464",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0 1198\n",
"1 1248\n",
"2 1197\n",
"3 1197\n",
"4 2393\n",
" ... \n",
"2054 2179\n",
"2055 814\n",
"2056 1196\n",
"2057 1995\n",
"2058 1493\n",
"Name: Engine, Length: 2059, dtype: object"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df1[\"Engine\"]"
]
},
{
"cell_type": "code",
"execution_count": 22,
"id": "6704d8d5",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" Make Model Price Year Kilometer \\\n",
"33 Honda CR-V 2.4 AT 860000 2013 67000 \n",
"69 Audi A4 2.0 TDI (143 bhp) 1250000 2012 50000 \n",
"94 Mercedes-Benz GLC 220 d Sport 3900000 2018 83400 \n",
"108 Honda Brio S MT 229000 2013 38175 \n",
"127 Tata Nexon EV XZ Plus 1375000 2021 16000 \n",
"... ... ... ... ... ... \n",
"1906 MG ZS EV Exclusive [2020-2021] 2100000 2020 38500 \n",
"1928 Porsche Cayenne 3.2 V6 Petrol 3600000 2014 43000 \n",
"1980 Maruti Suzuki Wagon R VXi 1.0 [2019-2019] 420000 2018 50000 \n",
"2009 Audi A4 2.0 TDI Sline 775000 2012 89000 \n",
"2025 Honda Accord 2.4 iVtec AT 195000 2008 57885 \n",
"\n",
" Fuel Type Transmission Location Color Owner Seller Type \\\n",
"33 Petrol Automatic Mumbai Brown First Individual \n",
"69 Diesel Automatic Mumbai White First Individual \n",
"94 Diesel Automatic Hyderabad White Second Individual \n",
"108 Petrol Manual Kolkata Blue First Individual \n",
"127 Electric Automatic Mumbai White First Individual \n",
"... ... ... ... ... ... ... \n",
"1906 Electric Automatic Delhi Blue First Individual \n",
"1928 Petrol Automatic Mumbai White Second Individual \n",
"1980 Petrol Manual Bhubaneswar White UnRegistered Car Individual \n",
"2009 Diesel Automatic Mohali Black Second Individual \n",
"2025 Petrol Automatic Delhi Beige First Individual \n",
"\n",
" Engine Max Power Max Torque Drivetrain Length Width Height \\\n",
"33 NaN NaN NaN NaN NaN NaN \n",
"69 NaN NaN NaN NaN NaN NaN \n",
"94 NaN NaN NaN NaN NaN NaN \n",
"108 NaN NaN NaN NaN NaN NaN \n",
"127 NaN NaN FWD 3993.0 1811.0 1606.0 \n",
"... ... ... ... ... ... ... ... \n",
"1906 NaN NaN FWD 4314.0 1809.0 1620.0 \n",
"1928 NaN NaN NaN NaN NaN NaN \n",
"1980 NaN NaN NaN 3655.0 1620.0 1675.0 \n",
"2009 NaN NaN NaN NaN NaN NaN \n",
"2025 NaN NaN NaN NaN NaN NaN \n",
"\n",
" Seating Capacity Fuel Tank Capacity \n",
"33 NaN NaN \n",
"69 NaN NaN \n",
"94 NaN NaN \n",
"108 NaN NaN \n",
"127 5.0 NaN \n",
"... ... ... \n",
"1906 5.0 NaN \n",
"1928 NaN NaN \n",
"1980 5.0 32.0 \n",
"2009 NaN NaN \n",
"2025 NaN NaN \n",
"\n",
"[80 rows x 20 columns]\n"
]
}
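],
"source": [
"# Rows whose engine size was missing in the CSV: after the slicing above, they hold an empty string\n",
"print(df[df['Engine']==''])"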
]
},
{
"cell_type": "markdown",
"id": "24c19ef3",
"metadata": {},
"source": [
"#### Does the data include features relevant to the research question?\n"
]
},
{
"cell_type": "markdown",
"id": "fcfa439d",
"metadata": {},
"source": [
"Yes. Many features in our dataset can influence the price, such as the reputation of the make, the mileage, the power of the vehicle, its fuel type and its gearbox type.\n"
]
},
{
"cell_type": "markdown",
"id": "f40870ca",
"metadata": {},
"source": [
"#### Have you computed statistics on the dataset for the key columns?\n"
]
},
{
"cell_type": "markdown",
"id": "3f10377d",
"metadata": {},
"source": [
"Not yet; we will do so.\n"
]
},
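{
"cell_type": "markdown",
"id": "sketch-stats-md",
"metadata": {},
"source": [
"One way such statistics could be computed, as a minimal sketch under stated assumptions: `describe()` summarises each numeric column, and the median complements the mean. The `sample` frame is an assumption, built from the first five rows displayed earlier; in the notebook the same calls would apply to `df`.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "sketch-stats-code",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Five rows copied from the DataFrame displayed earlier (stand-in for the full dataset)\n",
"sample = pd.DataFrame({\n",
"    \"Price\": [505000, 450000, 220000, 799000, 1950000],\n",
"    \"Year\": [2017, 2014, 2011, 2019, 2018],\n",
"    \"Kilometer\": [87150, 75000, 67000, 37500, 69000],\n",
"})\n",
"\n",
"# Count, mean, standard deviation, min, quartiles and max for each key column\n",
"print(sample.describe())\n",
"\n",
"# The median is less sensitive than the mean to a few very expensive cars\n",
"print(\"Median price:\", sample[\"Price\"].median())"
]
},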
{
"cell_type": "markdown",
"id": "3845c3c4",
"metadata": {},
"source": [
"## V/ Choosing the explanatory variables and the variable to explain: running a regression and commenting on the results\n"
]
},
{
"cell_type": "markdown",
"id": "8fbf061b",
"metadata": {},
"source": [
"The explanatory variables are:\n",
"\n",
"- the make and the model\n",
"- the year\n",
"- the mileage\n",
"- the fuel type\n",
"- the transmission type (gearbox and drivetrain)\n",
"- the city where the car is available\n",
"- the power\n",
"- the size\n",
"- the fuel tank capacity\n",
"\n",
"The variable to explain is the **price**.\n"
]
},
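{
"cell_type": "markdown",
"id": "sketch-regression-md",
"metadata": {},
"source": [
"As a minimal sketch of the regression step, and not necessarily the method this lab will use, ordinary least squares can be fitted with `np.linalg.lstsq`. Only two numeric explanatory variables (`Year`, `Kilometer`) are used here, and the `sample` frame is an assumption: ten rows copied from the table displayed earlier, standing in for the full dataset. Categorical variables such as the make would first need a numeric encoding (for example with `pd.get_dummies`).\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "sketch-regression-code",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# Ten rows copied from the DataFrame displayed earlier (stand-in for the full dataset)\n",
"sample = pd.DataFrame({\n",
"    \"Price\": [505000, 450000, 220000, 799000, 1950000, 850000, 275000, 240000, 4290000, 670000],\n",
"    \"Year\": [2017, 2014, 2011, 2019, 2018, 2016, 2014, 2013, 2018, 2017],\n",
"    \"Kilometer\": [87150, 75000, 67000, 37500, 69000, 90300, 83000, 73000, 60474, 72000],\n",
"})\n",
"\n",
"# Explanatory variables (X) with an intercept column, and the variable to explain (y)\n",
"X = np.column_stack([np.ones(len(sample)), sample[\"Year\"], sample[\"Kilometer\"]])\n",
"y = sample[\"Price\"].to_numpy(dtype=float)\n",
"\n",
"# Ordinary least squares: coefficients minimising the squared error between X @ coef and y\n",
"coef, _, _, _ = np.linalg.lstsq(X, y, rcond=None)\n",
"\n",
"# Coefficient of determination R^2: share of the price variance explained by the fit\n",
"pred = X @ coef\n",
"r2 = 1 - ((y - pred) ** 2).sum() / ((y - y.mean()) ** 2).sum()\n",
"\n",
"print(\"intercept, year, kilometer:\", coef)\n",
"print(\"R^2:\", r2)"
]
},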
{
"cell_type": "markdown",
"id": "ad3c25e3",
"metadata": {},
"source": [
"### Showing the percentage of each make in a pie chart"
]
},
{
"cell_type": "code",
"execution_count": 24,
"id": "bc06a3c1",
"metadata": {},
"outputs": [],
"source": [
"marque = df[\"Make\"]"
]
},
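{
"cell_type": "markdown",
"id": "sketch-pie-md",
"metadata": {},
"source": [
"The pie chart itself could be drawn along these lines. This is a sketch under assumptions: `makes` is a hypothetical stand-in for `df[\"Make\"]`, filled with the makes of the ten rows displayed earlier, and the title text is illustrative.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "sketch-pie-code",
"metadata": {},
"outputs": [],
"source": [
"import matplotlib.pyplot as plt\n",
"import pandas as pd\n",
"\n",
"# Makes of the ten rows displayed earlier (stand-in for df[\"Make\"])\n",
"makes = pd.Series(\n",
"    [\"Honda\", \"Maruti Suzuki\", \"Hyundai\", \"Toyota\", \"Toyota\",\n",
"     \"Mahindra\", \"Hyundai\", \"Ford\", \"BMW\", \"Mahindra\"],\n",
"    name=\"Make\",\n",
")\n",
"\n",
"# Share of each make, in percent\n",
"shares = makes.value_counts(normalize=True) * 100\n",
"\n",
"# Pie chart with the percentage written on each slice\n",
"plt.pie(shares, labels=shares.index, autopct=\"%1.1f%%\")\n",
"plt.title(\"Share of each make\")\n",
"plt.show()"
]
},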
{
"cell_type": "markdown",
"id": "3ec4b075",
"metadata": {},
"source": [
"### Which data types are present (symbolic, numeric, etc.)?"
]
},
{
"cell_type": "markdown",
"id": "cae0b428",
"metadata": {},
"source": [
"The data contains numeric values with units (e.g. the power of the vehicle), character strings (e.g. the name of the make) and plain numeric values (e.g. the price)."
]
},
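{
"cell_type": "markdown",
"id": "sketch-units-md",
"metadata": {},
"source": [
"Values stored with their units, such as `Max Power`, have to be split into numbers before they can be used in a calculation. A minimal sketch, assuming the `\"<power> bhp @ <speed> rpm\"` layout seen in the rows displayed earlier; the `sample` frame and the column names `power_bhp` and `power_rpm` are illustrative assumptions.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "sketch-units-code",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Two values copied from the displayed rows, plus one missing value\n",
"sample = pd.DataFrame({\"Max Power\": [\"87 bhp @ 6000 rpm\", \"74 bhp @ 4000 rpm\", None]})\n",
"\n",
"# Capture the power (bhp) and the engine speed (rpm) as two named groups\n",
"pattern = r\"(?P<power_bhp>[\\d.]+)\\s*bhp\\s*@\\s*(?P<power_rpm>\\d+)\\s*rpm\"\n",
"parts = sample[\"Max Power\"].str.extract(pattern)\n",
"\n",
"# Convert the captured text to numbers; rows that do not match stay NaN\n",
"parts = parts.apply(pd.to_numeric)\n",
"print(parts)"
]
},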
{
"cell_type": "markdown",
"id": "4fcb98c8",
"metadata": {},
"source": [
"### Is it possible to keep only the relevant columns?\n"
]
},
{
"cell_type": "markdown",
"id": "a3bf3a6f",
"metadata": {},
"source": [
"Yes, we can drop the columns that seem irrelevant to us because they have little or no effect on the price."
]
},
{
"cell_type": "markdown",
"id": "d2c0c065",
"metadata": {},
"source": [
"### Which columns seem uninteresting and can be excluded?"
]
},
{
"cell_type": "markdown",
"id": "4cc721a2",
"metadata": {},
"source": [
"We can exclude the color, length, width, height and location columns because this data is not very relevant and does not affect the price much: all vehicles of the same model share these values, except for the color and the location."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.2"
}
},
"nbformat": 4,
"nbformat_minor": 5
}