{ "cells": [ { "cell_type": "markdown", "id": "global-nursery", "metadata": {}, "source": [ "# TP4: data project\n" ] }, { "cell_type": "markdown", "id": "conditional-lobby", "metadata": {}, "source": [ "In this lab, we load the database from our SAE 2.04 project.\n" ] }, { "cell_type": "markdown", "id": "6da5789a", "metadata": {}, "source": [ "## Research question:\n" ] }, { "cell_type": "markdown", "id": "f0c31a3f", "metadata": {}, "source": [ "# **What makes one car sell for more than another?**\n" ] }, { "cell_type": "markdown", "id": "f64fb802", "metadata": {}, "source": [ "## I/ Load and explore the data\n" ] }, { "cell_type": "code", "execution_count": 176, "id": "c0f0ed8f", "metadata": {}, "outputs": [], "source": [ "# Load the data with the pandas library\n", "import pandas as pd\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "df = pd.read_csv(\"carDetailsV4.csv\", encoding=\"latin-1\")" ] }, { "cell_type": "markdown", "id": "every-islam", "metadata": {}, "source": [ "We display our pandas **DataFrame**.\n" ] }, { "cell_type": "code", "execution_count": 177, "id": "65ea7cfb", "metadata": {}, "outputs": [ { "data": { "text/plain": [ " Make Model Price Year \\\n", "0 Honda Amaze 1.2 VX i-VTEC 505000 2017 \n", "1 Maruti Suzuki Swift DZire VDI 450000 2014 \n", "2 Hyundai i10 Magna 1.2 Kappa2 220000 2011 \n", "3 Toyota Glanza G 799000 2019 \n", "4 Toyota Innova 2.4 VX 7 STR [2016-2020] 1950000 2018 \n", "... ... ... ... ... \n", "2054 Mahindra XUV500 W8 [2015-2017] 850000 2016 \n", "2055 Hyundai Eon D-Lite + 275000 2014 \n", "2056 Ford Figo Duratec Petrol ZXI 1.2 240000 2013 \n", "2057 BMW 5-Series 520d Luxury Line [2017-2019] 4290000 2018 \n", "2058 Mahindra Bolero Power Plus ZLX [2016-2019] 670000 2017 \n", "\n", " Kilometer Fuel Type Transmission Location Color Owner \\\n", "0 87150 Petrol Manual Pune Grey First \n", "1 75000 Diesel Manual Ludhiana White Second \n", "2 67000 Petrol Manual Lucknow Maroon First \n", "3 37500 Petrol Manual Mangalore Red First \n", "4 69000 Diesel Manual Mumbai Grey First \n", "... ... ... ... ... ... ... \n", "2054 90300 Diesel Manual Surat White First \n", "2055 83000 Petrol Manual Ahmedabad White Second \n", "2056 73000 Petrol Manual Thane Silver First \n", "2057 60474 Diesel Automatic Coimbatore White First \n", "2058 72000 Diesel Manual Guwahati White First \n", "\n", " Seller Type Engine Max Power Max Torque \\\n", "0 Corporate 1198 cc 87 bhp @ 6000 rpm 109 Nm @ 4500 rpm \n", "1 Individual 1248 cc 74 bhp @ 4000 rpm 190 Nm @ 2000 rpm \n", "2 Individual 1197 cc 79 bhp @ 6000 rpm 112.7619 Nm @ 4000 rpm \n", "3 Individual 1197 cc 82 bhp @ 6000 rpm 113 Nm @ 4200 rpm \n", "4 Individual 2393 cc 148 bhp @ 3400 rpm 343 Nm @ 1400 rpm \n", "... ... ... ... ... \n", "2054 Individual 2179 cc 138 bhp @ 3750 rpm 330 Nm @ 1600 rpm \n", "2055 Individual 814 cc 55 bhp @ 5500 rpm 75 Nm @ 4000 rpm \n", "2056 Individual 1196 cc 70 bhp @ 6250 rpm 102 Nm @ 4000 rpm \n", "2057 Individual 1995 cc 188 bhp @ 4000 rpm 400 Nm @ 1750 rpm \n", "2058 Individual 1493 cc 70 bhp @ 3600 rpm 195 Nm @ 1400 rpm \n", "\n", " Drivetrain Length Width Height Seating Capacity Fuel Tank Capacity \n", "0 FWD 3990.0 1680.0 1505.0 5.0 35.0 \n", "1 FWD 3995.0 1695.0 1555.0 5.0 42.0 \n", "2 FWD 3585.0 1595.0 1550.0 5.0 35.0 \n", "3 FWD 3995.0 1745.0 1510.0 5.0 37.0 \n", "4 RWD 4735.0 1830.0 1795.0 7.0 55.0 \n", "... ... ... ... ... ... ... \n", "2054 FWD 4585.0 1890.0 1785.0 7.0 70.0 \n", "2055 FWD 3495.0 1550.0 1500.0 5.0 32.0 \n", "2056 FWD 3795.0 1680.0 1427.0 5.0 45.0 \n", "2057 RWD 4936.0 1868.0 1479.0 5.0 65.0 \n", "2058 RWD 3995.0 1745.0 1880.0 7.0 NaN \n", "\n", "[2059 rows x 20 columns]" ] }, "execution_count": 177, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# display(df) renders the content of the DataFrame df in a Jupyter-specific format.\n", "# Writing a variable name on the last line of a cell is a shortcut for display().\n", "df  # df.head(5) would show only the first 5 rows" ] }, { "cell_type": "code", "execution_count": 178, "id": "d846d8e4", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0 1198 cc\n", "1 1248 cc\n", "2 1197 cc\n", "3 1197 cc\n", "4 2393 cc\n", " ... \n", "2054 2179 cc\n", "2055 814 cc\n", "2056 1196 cc\n", "2057 1995 cc\n", "2058 1493 cc\n", "Name: Engine, Length: 2059, dtype: object\n" ] } ], "source": [ "print(df[\"Engine\"])" ] }, { "cell_type": "code", "execution_count": 179, "id": "2aea6e9f", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Data types per column:\n", " Make object\n", "Model object\n", "Price int64\n", "Year int64\n", "Kilometer int64\n", "Fuel Type object\n", "Transmission object\n", "Location object\n", "Color object\n", "Owner object\n", "Seller Type object\n", "Engine object\n", "Max Power object\n", "Max Torque object\n", "Drivetrain object\n", "Length float64\n", "Width float64\n", "Height float64\n", "Seating Capacity float64\n", "Fuel Tank Capacity float64\n", "dtype: object\n", "\n", "\n", "Number of rows: 2059\n", "Number of columns: 20\n", "\n", "\n", "The most important columns for us are: make, model, price, mileage and power.\n" ] } ], "source": [ "print(\"Data types per column:\\n\", df.dtypes)\n", "print(\"\\n\")\n", "print(\"Number of rows:\", len(df))\n", "print(\"Number of columns:\", len(df.columns))\n", "print(\"\\n\")\n", "print(\"The most important columns for us are: make, model, price, mileage and power.\")" ] }, { "cell_type": "code", "execution_count": 180, "id": "b7406055", "metadata": {}, "outputs": [ { "data": { "image/png":
"iVBORw0KGgoAAAANSUhEUgAAAX0AAAEFCAYAAAAPCDf9AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjUuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8qNh9FAAAACXBIWXMAAAsTAAALEwEAmpwYAAARzklEQVR4nO3df4xlZ13H8feHbltElC3tpNbdla260RQiUjelSkII649SCNtEIEuMLLhmo1ZFMZFFExsxJCUaK/gDs6GVxZBCLWhXKOKmLUETWpgilP4AOpYf3U2hI/0BWBUXv/5xn623w8zOnbmzd+74vF/JzZzzPM8953tPZj/3zHPPPZuqQpLUhyetdwGSpMkx9CWpI4a+JHXE0Jekjhj6ktSRTetdwMmcc845tX379vUuQ5I2lNtvv/3fqmpmsb6pDv3t27czOzu73mVI0oaS5ItL9Tm9I0kdMfQlqSOGviR1xNCXpI4Y+pLUEUNfkjpi6EtSRwx9SeqIoS9JHZnqb+SOa/uBD4w07gtXvvgUVyJJ08EzfUnqiKEvSR0x9CWpI4a+JHXE0Jekjhj6ktQRQ1+SOmLoS1JHDH1J6siyoZ/kmiQPJrlzqO0Pk3wmyR1J/jbJ5qG+NySZS/LZJD8z1H5Ja5tLcmDNX4kkaVmjnOm/A7hkQdsR4FlV9SPA54A3ACS5ANgDPLM95y+SnJbkNODPgRcBFwCvbGMlSRO0bOhX1UeAhxa0/WNVHW+rtwJb2/Ju4N1V9V9V9XlgDrioPeaq6r6q+ibw7jZWkjRBazGn/wvAB9vyFuD+ob6jrW2p9m+TZH+S2SSz8/Pza1CeJOmEsUI/ye8Cx4F3rU05UFUHq2pnVe2cmZlZq81Kkhjj1spJXg28BNhVVdWajwHbhoZtbW2cpF2SNCGrOtNPcgnw28BLq+qxoa7DwJ4kZyY5H9gBfAz4OLAjyflJzmDwYe/h8UqXJK3Usmf6Sa4FXgCck+QocAWDq3XOBI4kAbi1qn6pqu5Kch1wN4Npn8ur6lttO78KfAg4Dbimqu46Ba9HknQSy4Z+Vb1ykearTzL+TcCbFmm/EbhxRdVJktaU38iVpI4Y+pLUEUNfkjpi6EtSRwx9SeqIoS9JHTH0Jakjhr4kdcTQl6SOGPqS1BFDX5I6YuhLUkcMfUnqiKEvSR0x9CWpI4a+JHXE0Jekjhj6ktQRQ1+SOmLoS1JHDH1J6oihL0kdMfQlqSPLhn6Sa5I8mOTOobanJzmS5N7286zWniRvTTKX5I4kFw49Z28bf2+Svafm5UiSTmaUM/13AJcsaDsA3FRVO4Cb2jrAi4Ad7bEfeBsM3iSAK4DnAhcBV5x4o5AkTc6yoV9VHwEeWtC8GzjUlg8Blw21v7MGbgU2JzkP+BngSFU9VFUPA0f49jcSSdIptto5/XOr6oG2/GXg3La8Bbh/aNzR1rZU+7dJsj/JbJLZ+fn5VZYnSVrM2B/kVlUBtQa1nNjewaraWVU7Z2Zm1mqzkiRWH/pfadM2tJ8PtvZjwLahcVtb21LtkqQJWm3oHwZOXIGzF7hhqP1V7Sqei4FH2zTQh4CfTnJW+wD3p1ubJGmCNi03IMm1wAuAc5IcZXAVzpXAdUn2AV8EXtGG3whcCswBjwGvAaiqh5L8AfDxNu6NVbXww2FJ0im2bOhX1SuX6Nq1yNgCLl9iO9cA16yoOknSmvIbuZLUEUNfkjpi6EtSRwx9SeqIoS9JHTH0Jakjhr4kdcTQl6SOGPqS1BFDX5I6YuhLUkcMfUnqiKEvSR0x9CWpI4a+JHXE0Jekjhj6ktQRQ1+SOmLoS1JHDH1J6oihL0kdMfQlqSOGviR1ZKzQT/KbSe5KcmeSa5M8Ocn5SW5LMpfkPUnOaGPPbOtzrX/7mrwCSdLIVh36SbYAvw7srKpnAacBe4A3A1dV1Q8CDwP72lP2AQ+39qvaOEnSBI07vbMJ+I4km4CnAA8ALwSub/2HgMva8u62TuvflSRj7l+
StAKrDv2qOgb8EfAlBmH/KHA78EhVHW/DjgJb2vIW4P723ONt/NkLt5tkf5LZJLPz8/OrLU+StIhxpnfOYnD2fj7wvcB3ApeMW1BVHayqnVW1c2ZmZtzNSZKGjDO985PA56tqvqr+G3gf8Dxgc5vuAdgKHGvLx4BtAK3/acBXx9i/JGmFxgn9LwEXJ3lKm5vfBdwN3AK8rI3ZC9zQlg+3dVr/zVVVY+xfkrRC48zp38bgA9lPAJ9u2zoIvB54XZI5BnP2V7enXA2c3dpfBxwYo25J0ipsWn7I0qrqCuCKBc33ARctMvY/gZePsz9J0nj8Rq4kdcTQl6SOGPqS1BFDX5I6YuhLUkcMfUnqiKEvSR0x9CWpI4a+JHXE0Jekjhj6ktQRQ1+SOmLoS1JHDH1J6oihL0kdMfQlqSOGviR1xNCXpI4Y+pLUEUNfkjpi6EtSRwx9SeqIoS9JHTH0JakjY4V+ks1Jrk/ymST3JPnxJE9PciTJve3nWW1skrw1yVySO5JcuDYvQZI0qnHP9N8C/ENV/TDwbOAe4ABwU1XtAG5q6wAvAna0x37gbWPuW5K0QqsO/SRPA54PXA1QVd+sqkeA3cChNuwQcFlb3g28swZuBTYnOW+1+5ckrdw4Z/rnA/PAXyX5lyRvT/KdwLlV9UAb82Xg3La8Bbh/6PlHW9sTJNmfZDbJ7Pz8/BjlSZIWGif0NwEXAm+rqucA/87/TeUAUFUF1Eo2WlUHq2pnVe2cmZkZozxJ0kLjhP5R4GhV3dbWr2fwJvCVE9M27eeDrf8YsG3o+VtbmyRpQlYd+lX1ZeD+JD/UmnYBdwOHgb2tbS9wQ1s+DLyqXcVzMfDo0DSQJGkCNo35/F8D3pXkDOA+4DUM3kiuS7IP+CLwijb2RuBSYA54rI2VJE3QWKFfVZ8Edi7StWuRsQVcPs7+JEnj8Ru5ktQRQ1+SOmLoS1JHDH1J6oihL0kdMfQlqSOGviR1xNCXpI4Y+pLUEUNfkjpi6EtSRwx9SeqIoS9JHTH0Jakjhr4kdcTQl6SOGPqS1BFDX5I6YuhLUkcMfUnqiKEvSR0x9CWpI4a+JHVk7NBPclqSf0ny/rZ+fpLbkswleU+SM1r7mW19rvVvH3ffkqSVWYsz/dcC9wytvxm4qqp+EHgY2Nfa9wEPt/ar2jhJ0gSNFfpJtgIvBt7e1gO8ELi+DTkEXNaWd7d1Wv+uNl6SNCHjnun/CfDbwP+09bOBR6rqeFs/Cmxpy1uA+wFa/6Nt/BMk2Z9kNsns/Pz8mOVJkoatOvSTvAR4sKpuX8N6qKqDVbWzqnbOzMys5aYlqXubxnju84CXJrkUeDLw3cBbgM1JNrWz+a3AsTb+GLANOJpkE/A04Ktj7F+StEKrPtOvqjdU1daq2g7sAW6uqp8DbgFe1obtBW5oy4fbOq3/5qqq1e5fkrRyp+I6/dcDr0syx2DO/urWfjVwdmt/HXDgFOxbknQS40zvPK6qPgx8uC3fB1y0yJj/BF6+FvuTJK2O38iVpI4Y+pLUkTWZ3tnoth/4wEjjvnDli09xJZJ0anmmL0kdMfQlqSOGviR1xNCXpI4Y+pLUEUNfkjpi6EtSRwx9SeqIoS9JHTH0Jakjhr4kdcTQl6SOGPqS1BFDX5I6YuhLUkcMfUnqiKEvSR0x9CWpI4a+JHXE0Jekjqw69JNsS3JLkruT3JXkta396UmOJLm3/TyrtSfJW5PMJbkjyYVr9SIkSaMZ50z/OPBbVXUBcDFweZILgAPATVW1A7iprQO8CNjRHvuBt42xb0nSKqw69Kvqgar6RFv+OnAPsAXYDRxqww4Bl7Xl3cA7a+BWYHOS81a7f0nSyq3JnH6S7cBzgNuAc6vqgdb1ZeDctrwFuH/oaUdb28Jt7U8ym2R2fn5+LcqTJDVjh36SpwLvBX6jqr423FdVBdRKtldVB6tqZ1XtnJmZGbc8SdKQsUI/yekMAv9dVfW+1vyVE9M27eeDrf0YsG3o6Vt
bmyRpQsa5eifA1cA9VfXHQ12Hgb1teS9ww1D7q9pVPBcDjw5NA0mSJmDTGM99HvDzwKeTfLK1/Q5wJXBdkn3AF4FXtL4bgUuBOeAx4DVj7FuStAqrDv2q+mcgS3TvWmR8AZevdn/TYPuBD4w07gtXvvgUVyJJq+M3ciWpI4a+JHXE0Jekjhj6ktQRQ1+SOmLoS1JHDH1J6oihL0kdMfQlqSOGviR1xNCXpI6Mc8M1LcF79EiaVp7pS1JHDH1J6oihL0kdMfQlqSOGviR1xNCXpI54yeY68tJOSZPmmb4kdcQz/Q1g1L8IwL8KJJ2cZ/qS1BFDX5I6YuhLUkcmPqef5BLgLcBpwNur6spJ1/D/mVcESTqZiYZ+ktOAPwd+CjgKfDzJ4aq6e5J1yDcHqVeTPtO/CJirqvsAkrwb2A0Y+lNqJVcOrYf1fFNa6zdO34g1CZMO/S3A/UPrR4HnDg9Ish/Y31a/keSzq9zXOcC/rfK562Ej1Ts1tebNIw1b13pHrPGEZWtd4fZOtan5XRjBRqoVxqv3GUt1TN11+lV1EDg47naSzFbVzjUoaSI2Ur0bqVbYWPVupFphY9W7kWqFU1fvpK/eOQZsG1rf2tokSRMw6dD/OLAjyflJzgD2AIcnXIMkdWui0ztVdTzJrwIfYnDJ5jVVddcp2t3YU0QTtpHq3Ui1wsaqdyPVChur3o1UK5yielNVp2K7kqQp5DdyJakjhr4kdWTDh36SS5J8NslckgOL9J+Z5D2t/7Yk29ehzBO1LFfrq5PMJ/lke/zietQ5VM81SR5McucS/Uny1vZ67khy4aRrHKpluVpfkOTRoWP7e5OucaiWbUluSXJ3kruSvHaRMdN0bEepdyqOb5InJ/lYkk+1Wn9/kTHTlAmj1Lu2uVBVG/bB4MPgfwW+HzgD+BRwwYIxvwL8ZVveA7xnimt9NfBn631ch+p5PnAhcOcS/ZcCHwQCXAzcNsW1vgB4/3of01bLecCFbfm7gM8t8rswTcd2lHqn4vi24/XUtnw6cBtw8YIxU5EJK6h3TXNho5/pP35bh6r6JnDitg7DdgOH2vL1wK4kmWCNJ4xS61Spqo8AD51kyG7gnTVwK7A5yXmTqe6JRqh1alTVA1X1ibb8deAeBt9WHzZNx3aUeqdCO17faKunt8fCq1WmJRNGrXdNbfTQX+y2Dgt/GR8fU1XHgUeBsydS3RJ1NIvVCvCz7c/565NsW6R/moz6mqbFj7c/oz+Y5JnrXQxAm1p4DoMzvGFTeWxPUi9MyfFNclqSTwIPAkeqaslju86ZAIxUL6xhLmz00P//5u+B7VX1I8AR/u9sROP7BPCMqno28KfA361vOZDkqcB7gd+oqq+tdz3LWabeqTm+VfWtqvpRBt/4vyjJs9arllGMUO+a5sJGD/1Rbuvw+Jgkm4CnAV+dSHVL1NF8W61V9dWq+q+2+nbgxyZU22ptmNtqVNXXTvwZXVU3AqcnOWe96klyOoMAfVdVvW+RIVN1bJerd9qOb6vjEeAW4JIFXdOSCU+wVL1rnQsbPfRHua3DYWBvW34ZcHO1T0cmbNlaF8zZvpTB3Ok0Owy8ql1pcjHwaFU9sN5FLSbJ95yYt01yEYPf/XX5h97quBq4p6r+eIlhU3NsR6l3Wo5vkpkkm9vydzD4vzs+s2DYtGTCSPWudS5M3V02V6KWuK1DkjcCs1V1mMEv618nmWPwQd+eKa7115O8FDjean31etR6QpJrGVyVcU6So8AVDD5ooqr+EriRwVUmc8BjwGvWp9KRan0Z8MtJjgP/AexZr3/owPOAnwc+3eZyAX4H+D6YvmPLaPVOy/E9DziUwX/Y9CTguqp6/zRmQjNKvWuaC96GQZI6stGndyRJK2DoS1JHDH1J6oihL0kdMfQlaUpkmRsHLhh71dBN2D6X5JGR9uHVO5I0HZI8H/gGg/sujfxN4iS/Bjynqn5hubGe6UvSlFjsxoFJfiDJPyS5Pck/JfnhRZ7
6SuDaUfaxob+cJUkdOAj8UlXdm+S5wF8ALzzRmeQZwPnAzaNszNCXpCnVbnL3E8DfDN39+cwFw/YA11fVt0bZpqEvSdPrScAj7S6cS9kDXL6SDUqSplC7hfXnk7wcHv9vNJ99or/N758FfHTUbRr6kjQl2o0DPwr8UJKjSfYBPwfsS/Ip4C6e+D/u7QHevZKb23nJpiR1xDN9SeqIoS9JHTH0Jakjhr4kdcTQl6SOGPqS1BFDX5I68r9w7h0NWQ6N5QAAAABJRU5ErkJggg==", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Average vehicle price: 1702992 ₹\n", "Number of cars: FWD = 1330 | RWD = 321 | AWD = 272\n", "As percentages: 64.59 % FWD, 15.59 % RWD and 13.21 % AWD\n" ] } ], "source": [ "prix = df[\"Price\"].to_list()\n", "plt.hist(prix, bins=30)\n", "plt.show()\n", "\n", "print(\"Average vehicle price:\", int(df[\"Price\"].mean().round()), \"₹\")\n", "\n", "# Count the cars of each drivetrain; rows with any other value or a missing value are not counted\n", "n_fwd = (df[\"Drivetrain\"] == \"FWD\").sum()\n", "n_rwd = (df[\"Drivetrain\"] == \"RWD\").sum()\n", "n_awd = (df[\"Drivetrain\"] == \"AWD\").sum()\n", "print(\"Number of cars: FWD =\", n_fwd, \"| RWD =\", n_rwd, \"| AWD =\", n_awd)\n", "print(\"As percentages:\", round(n_fwd / len(df) * 100, 2), \"% FWD,\", round(n_rwd / len(df) * 100, 2), \"% RWD and\", round(n_awd / len(df) * 100, 2), \"% AWD\")" ] }, { "cell_type": "markdown", "id": "5cc5c8ff", "metadata": {}, "source": [ "## III/ Data cleaning and presentation\n" ] }, { "cell_type": "markdown", "id": "3eedcf6a", "metadata": {}, "source": [ "### Remove irrelevant columns" ] }, { "cell_type": "code", "execution_count": null, "id": "c068815f", "metadata": {}, "outputs": [], "source": [ "# Reload the raw data and drop every row that contains a missing value\n", "df = pd.read_csv(\"carDetailsV4.csv\", encoding=\"latin-1\")\n", "df = df.dropna(axis=0)\n", "\n", "# Remove the columns judged irrelevant; this must come after the reload, otherwise read_csv restores them\n", "df = df.drop(columns=[\"Color\", \"Location\", \"Length\", \"Width\", \"Height\"])\n", "\n", "# Display a few rows of the cleaned DataFrame\n", "display(df[30:33])\n", "\n", "# Remove the \" cc\" unit from Engine (e.g. \"1198 cc\" becomes \"1198\"); the values are still strings\n", "df[\"Engine\"] = df[\"Engine\"].str.replace(\" cc\", \"\", regex=False)\n", "\n", "# df1 is another name for the same DataFrame (not a copy), so df1[\"Engine\"] and df[\"Engine\"] are identical\n", "df1 = df" ] }, { "cell_type": "code", "execution_count": 182, "id": "69d69464", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 1198\n", "1 1248\n", "2 1197\n", "3 1197\n", "4 2393\n", " ... \n", "2053 1197\n", "2054 2179\n", "2055 814\n", "2056 1196\n", "2057 1995\n", "Name: Engine, Length: 1874, dtype: object" ] }, "execution_count": 182, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df1[\"Engine\"]" ] }, { "cell_type": "code", "execution_count": 183, "id": "6ff99b5d", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 1198\n", "1 1248\n", "2 1197\n", "3 1197\n", "4 2393\n", " ... \n", "2053 1197\n", "2054 2179\n", "2055 814\n", "2056 1196\n", "2057 1995\n", "Name: Engine, Length: 1874, dtype: object" ] }, "execution_count": 183, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[\"Engine\"]" ] }, { "cell_type": "code", "execution_count": null, "id": "ee792795", "metadata": {}, "outputs": [ { "ename": "", "evalue": "", "output_type": "error", "traceback": [ "\u001b[1;31mThe kernel could not start because the module 'pygments.formatters' is missing. Consider installing this module."
] } ], "source": [ "df" ] }, { "cell_type": "code", "execution_count": 184, "id": "6704d8d5", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0\n" ] } ], "source": [ "# Count the Engine values left empty by the cleaning\n", "print((df[\"Engine\"] == \"\").sum())" ] }, { "cell_type": "markdown", "id": "24c19ef3", "metadata": {}, "source": [ "#### Does the data contain features relevant to the research question?\n" ] }, { "cell_type": "markdown", "id": "fcfa439d", "metadata": {}, "source": [ "Yes. Many features in our dataset can influence the price, such as the reputation of the make, the mileage, the power of the vehicle, its fuel type and its gearbox type.\n" ] }, { "cell_type": "markdown", "id": "f40870ca", "metadata": {}, "source": [ "#### Have you computed statistics on the key columns of the dataset?\n" ] }, { "cell_type": "markdown", "id": "3f10377d", "metadata": {}, "source": [ "We will do this.\n" ] }, { "cell_type": "markdown", "id": "3845c3c4", "metadata": {}, "source": [ "## V/ Choose the explanatory variables and the variable to explain: run a regression and comment on the results\n" ] }, { "cell_type": "markdown", "id": "8fbf061b", "metadata": {}, "source": [ "The explanatory variables are:\n", "\n", "- the make and the model\n", "- the year\n", "- the mileage\n", "- the fuel type\n", "- the transmission type (gearbox and drivetrain)\n", "- the city where the car is available\n", "- the power\n", "- the size\n", "- the fuel tank capacity\n", "\n", "The variable to explain is the **price**.\n" ] }, { "cell_type": "markdown", "id": "ad3c25e3", "metadata": {}, "source": [ "### Display the share of each make in a pie chart" ] }, { "cell_type": "code", "execution_count": null, "id": "bc06a3c1", "metadata": {}, "outputs": [], "source": [ "marque = df[\"Make\"]\n", "marque.value_counts().plot.pie(autopct=\"%1.1f%%\", figsize=(10, 10));  # share of each make, in %" ] }, { "cell_type": "markdown", "id": "3ec4b075", "metadata": {}, "source": [ "### What types of data are present (symbolic, numeric, etc.)?" ] }, { "cell_type": "markdown", "id": "cae0b428", "metadata": {}, "source": [ "The dataset contains three kinds of data: numeric values stored as text together with their unit (e.g. `Engine` = \"1198 cc\" or `Max Power` = \"87 bhp @ 6000 rpm\", which pandas reads as `object`), character strings (e.g. the make name in `Make`), and plain numeric values (e.g. `Price`, read as `int64`)." ] }, { "cell_type": "markdown", "id": "4fcb98c8", "metadata": {}, "source": [ "### Is it possible to keep only the relevant columns?\n" ] }, { "cell_type": "markdown", "id": "a3bf3a6f", "metadata": {}, "source": [ "Yes. We can drop the columns that seem irrelevant because they have little or no effect on the price." ] }, { "cell_type": "markdown", "id": "d2c0c065", "metadata": {}, "source": [ "### Which columns seem uninteresting and can be excluded?" ] }, { "cell_type": "markdown", "id": "4cc721a2", "metadata": {}, "source": [ "We can exclude the color, length, width, height and location columns, because they do not seem to affect the price much. Length, width and height are the same for every car of a given model, so they add nothing beyond the `Model` column. Color and location do vary from one car to another, but they are not very relevant to the price either." ] }, { "cell_type": "markdown", "id": "6cd3a984", "metadata": {}, "source": [ "## What is regression?"
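] }, { "cell_type": "markdown", "id": "a1f3c9d2", "metadata": {}, "source": [ "As an illustration only, assume that `Year`, `Kilometer` and `Engine` (once converted to numbers) are the explanatory variables. A multiple linear regression for our research question would then write the price of car $i$ as a weighted sum of its characteristics plus an error term $\\varepsilon_i$:\n", "\n", "$$\\text{Price}_i = \\beta_0 + \\beta_1\\,\\text{Year}_i + \\beta_2\\,\\text{Kilometer}_i + \\beta_3\\,\\text{Engine}_i + \\varepsilon_i$$\n", "\n", "Ordinary least squares chooses the coefficients that minimise the sum of squared errors. Put the explanatory columns, plus a column of ones for $\\beta_0$, in a matrix $X$ and the prices in a vector $y$. When $X^\\top X$ is invertible, the estimate is:\n", "\n", "$$\\hat{\\beta} = \\arg\\min_{\\beta} \\lVert y - X\\beta \\rVert^2 = (X^\\top X)^{-1} X^\\top y$$\n", "\n", "Each $\\hat{\\beta}_j$ then reads as the estimated change in price for a one-unit increase in the corresponding variable, with the other variables held fixed."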
] }, { "cell_type": "markdown", "id": "60a1ce8d", "metadata": {}, "source": [ "In everyday language, a regression is a return to a lesser degree or state.\n", "\n", "In statistics, a regression rests on the idea that a dependent variable is determined by one or more independent variables. It estimates how the variable to explain (here, the price) varies with the explanatory variables.\n", "\n", "# Regression example:" ] }, { "cell_type": "code", "execution_count": 186, "id": "1b0173e3", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Sample size: (50,)\n" ] } ], "source": [ "rng = np.random.RandomState(42)  # fixed seed, so the same data is generated on every run\n", "\n", "# Build an example set of random points\n", "x = 10 * rng.rand(50)  # 50 values drawn uniformly in [0, 10)\n", "print('Sample size:', x.shape)\n", "\n", "y = 2 * x - 1 + rng.randn(50)  # linear relation between x and y, plus Gaussian noise\n", "\n", "# Plot y as a function of x as a scatter plot\n", "plt.scatter(x, y)\n", "\n", "# Fit the straight line y = a*x + b by least squares and draw it over the points\n", "a, b = np.polyfit(x, y, 1)\n", "xs = np.sort(x)\n", "plt.plot(xs, a * xs + b, color=\"red\");" ] }, { "cell_type": "markdown", "id": "15f5561e", "metadata": {}, "source": [ "## Comparing two makes: Audi vs BMW" ] }, { "cell_type": "code", "execution_count": null, "id": "846e7e8f", "metadata": {}, "outputs": [], "source": [ "# Compare the average price, year, mileage and engine displacement of the Audi and BMW cars in df\n", "vehicule1 = df[df[\"Make\"] == \"Audi\"]\n", "vehicule2 = df[df[\"Make\"] == \"BMW\"]\n", "\n", "print(\"Average price of an Audi:\", int(vehicule1[\"Price\"].mean().round()), \"₹\")\n", "print(\"Average price of a BMW:\", int(vehicule2[\"Price\"].mean().round()), \"₹\")\n", "if vehicule1[\"Price\"].mean() > vehicule2[\"Price\"].mean():\n", "    print(\"On average, Audis are more expensive than BMWs\\n\")\n", "else:\n", "    print(\"On average, BMWs are more expensive than Audis\\n\")\n", "\n", "print(\"Average year of an Audi:\", int(vehicule1[\"Year\"].mean().round()))\n", "print(\"Average year of a BMW:\", int(vehicule2[\"Year\"].mean().round()))\n", "if vehicule1[\"Year\"].mean() > vehicule2[\"Year\"].mean():\n", "    print(\"On average, Audis are more recent than BMWs\\n\")\n", "else:\n", "    print(\"On average, BMWs are more recent than Audis\\n\")\n", "\n", "print(\"Average mileage of an Audi:\", int(vehicule1[\"Kilometer\"].mean().round()), \"km\")\n", "print(\"Average mileage of a BMW:\", int(vehicule2[\"Kilometer\"].mean().round()), \"km\")\n", "if vehicule1[\"Kilometer\"].mean() > vehicule2[\"Kilometer\"].mean():\n", "    print(\"On average, Audis have a higher mileage than BMWs\\n\")\n", "else:\n", "    print(\"On average, BMWs have a higher mileage than Audis\\n\")\n", "\n", "# After cleaning, Engine holds the displacement in cc as strings (e.g. \"1995\"), so convert it to numbers before averaging.\n", "# This measures engine size, not power (power is in the \"Max Power\" column). Unparseable values become NaN and are skipped by mean().\n", "engine1 = pd.to_numeric(vehicule1[\"Engine\"], errors=\"coerce\")\n", "engine2 = pd.to_numeric(vehicule2[\"Engine\"], errors=\"coerce\")\n", "print(\"Average engine displacement of an Audi:\", int(engine1.mean().round()), \"cc\")\n", "print(\"Average engine displacement of a BMW:\", int(engine2.mean().round()), \"cc\")\n", "if engine1.mean() > engine2.mean():\n", "    print(\"On average, Audis have larger engines than BMWs\")\n", "else:\n", "    print(\"On average, BMWs have larger engines than Audis\")" ] }, { "cell_type": "markdown", "id": "104b1ad6", "metadata": {}, "source": [ "**This comparison suggests that the price is linked to the year and the mileage. In our data, BMWs are on average more expensive than Audis (4007202 ₹ vs 2808924 ₹). They are also more recent (2018 vs 2016) and have a lower mileage (41790 km vs 53718 km), which may explain their higher price. Engine size, compared at the end of the cell above, is another possible explanation.**" ] }, { "cell_type": "code", "execution_count": null, "id": "dc0a7b57", "metadata": {}, "outputs": [], "source": [ "\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.2" } }, "nbformat": 4, "nbformat_minor": 5 }