{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# TP3: Linear regression" ] }, { "cell_type": "code", "execution_count": 107, "metadata": {}, "outputs": [], "source": [ "import numpy as np  # load the libraries\n", "\n", "import matplotlib.pyplot as plt" ] }, { "cell_type": "code", "execution_count": 108, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "sample shape: (50,)\n" ] }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# a simple example where both x and y are real-valued\n", "\n", "rng = np.random.RandomState(42)  # fixed seed so the same data is generated on every run\n", "\n", "# build a sample of random points\n", "x = 10 * rng.rand(50)  # array of 50 values drawn uniformly in [0, 10)\n", "print('sample shape:', x.shape)\n", "\n", "y = 2*x - 1 + rng.randn(50)  # linear relation between x and y plus Gaussian noise\n", "\n", "# display the data y = f(x) as a scatter plot\n", "plt.scatter(x, y);\n", "# observations (x_i, y_i), i = 1 ... 50\n", "# model to fit: y = a*x + b" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Is there a relation between $x$ and $y$: can we find $f$ such that $y=f(x)$?\n", "\n", "To answer this question, we assume that $f$ is an affine function of the form:\n", "\n", "$$f(x)=a*x+b$$\n", "\n", "where $a$ and $b$ are unknown real numbers to be determined.\n", "\n", "We know $(x_i,y_i)$ for $i=1 \\ldots 50$\n", "\n", "and we would like $y_i=a*x_i + b$ for $i=1 \\ldots 50$.\n", "\n", "This is an overdetermined linear system (more equations than unknowns): because of the noise it generally has no exact solution, so we look for the best approximate one.\n", "\n", "To summarize: the goal is to find the line \"closest\" to the set (cloud) of points. 
A good criterion for \"closest\" is to minimize the mean squared error:\n", "\n", "$$\\frac{1}{n} \\sum_{i=1}^{n}(y_i - a*x_i -b)^2$$\n", "\n", "Setting the partial derivatives with respect to $a$ and $b$ to zero gives the optimal values:\n", "\n", "$\\hat{a}=\\frac{\\sigma_{xy}}{\\sigma_{x}^2}$ and $\\hat{b}=\\bar{y_n}-\\bar{x_n}*\\frac{\\sigma_{xy}}{\\sigma_{x}^2}$ with:\n", "\n", "\n", "- $\\bar{y_n}= \\frac{1}{n} \\sum_{i=1}^{n} y_i$\n", "\n", "- $\\bar{x_n}= \\frac{1}{n} \\sum_{i=1}^{n} x_i$\n", "\n", "- $\\sigma_{y}^2= \\frac{1}{n} \\sum_{i=1}^{n} (y_i-\\bar{y_n})^2$\n", "\n", "- $\\sigma_{x}^2= \\frac{1}{n} \\sum_{i=1}^{n} (x_i-\\bar{x_n})^2$\n", "\n", "- $\\sigma_{xy}= \\frac{1}{n} \\sum_{i=1}^{n} (x_i-\\bar{x_n})(y_i-\\bar{y_n})$\n", "\n", "Instead of doing the computation by hand, we will use Python libraries to find the solution. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Example 1: formulation with the sklearn library\n" ] }, { "cell_type": "code", "execution_count": 109, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "input shape: (50, 1)\n" ] }, { "data": { "text/html": [ "
LinearRegression()
" ], "text/plain": [ "LinearRegression()" ] }, "execution_count": 109, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# This linear regression problem can be solved with sklearn\n", "# choose and import the model\n", "from sklearn.linear_model import LinearRegression\n", "\n", "# sklearn expects a 2D input of shape (n_samples, n_features)\n", "X = x[:, np.newaxis]\n", "print('input shape:', X.shape)\n", "\n", "models = LinearRegression(fit_intercept=True)\n", "models.fit(X, y)" ] }, { "cell_type": "code", "execution_count": 110, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "----- solution -----\n", "fitted value of a: 1.9776566003853122\n", "fitted value of b: -0.9033107255311235\n" ] } ], "source": [ "a = models.coef_\n", "print('-'*5, 'solution', '-'*5)\n", "print('fitted value of a:', a[0])  # true value: 2\n", "\n", "b = models.intercept_\n", "print('fitted value of b:', b)  # true value: -1" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now suppose we have a new value $xnew=2.5$ that differs from all the observed $x_i$. Its image $ynew$ is given by the relation: $$ynew=a*xnew+b$$" ] }, { "cell_type": "code", "execution_count": 111, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[4.04083078]\n" ] } ], "source": [ "# prediction for a single point\n", "xnew = np.array([2.50])\n", "ynew = models.predict(xnew.reshape(-1, 1))\n", "print(ynew)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The same method also applies when xnew is an array of values instead of a single scalar\n" ] }, { "cell_type": "code", "execution_count": 112, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[-2.88096733 -0.02435224 2.83226285 5.68887794 8.54549303 11.40210812\n", " 14.25872321 17.1153383 19.97195339 22.82856848]\n" ] } ], "source": [ "# prediction for an array of points\n", "xnew = np.linspace(-1, 12, 10)\n", 
"#s'assurer d'avoir le bon format \n", "xnew=xnew[:, np.newaxis]\n", "\n", "ynew = models.predict(xnew)\n", "print(ynew)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Vérification visuelle\n" ] }, { "cell_type": "code", "execution_count": 113, "metadata": {}, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXAAAAD4CAYAAAD1jb0+AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjUuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8qNh9FAAAACXBIWXMAAAsTAAALEwEAmpwYAAAX80lEQVR4nO3df2zc9X3H8efbjt1ipxTyYx6F+o4FxkTYRoth/TFV3VoqmlVAB0JN7CQV7Tzsdk07mgpmLW0SeUVrxUB0SWQokOIvdIHSEZX0B2KVWKWundOxloZRCr1zYITECU1TPGEnfu+P711w7Dvf2b677/frez0k63v3va/v3hjnlU8+388Pc3dERCR5GqIuQERE5kYBLiKSUApwEZGEUoCLiCSUAlxEJKEW1fLDli1b5ul0upYfKSKSeHv37h1x9+VTz9c0wNPpNENDQ7X8SBGRxDOzbKHz6kIREUkoBbiISEIpwEVEEkoBLiKSUApwEZGEUoCLiFTb0aOwcmV4rCAFuIhItT36KOzbB3v2VPRtFeAiItWyZg0sXgzr14fP160Ln69ZU5G3V4CLiFTLli3Q3g5NTeHzpiZIpWDr1oq8vQJcRKRazjsvDPHxcWhtDY+bN8OKFRV5ewW4iEg17doVhvfmzeHxwQcr9tY1XQtFRKTubNwId9wBbW3Q1QX791fsrRXgIiLVdOmlrz9uawu/KkRdKCIiCaUAFxFJKAW4iEhCKcBFRBJKAS4iklAKcBGRhFKAi4gklAJcRCShFOAiIgmlABcRSSgFuIhIQinARUQSSgEuIpJQCnARkYQqGeBm9lYz+76Z7TOzn5vZhtz5JWb2mJk9mzueWf1yRURmoUq7wcdFOS3w48CN7n4h8A7gE2Z2IXAT8Li7nw88nnsuIhIfVdoNPi5KBri7v+TuP8k9PgY8DZwNXAXszF22E7i6SjWKiMxOlXeDj4tZ9YGbWRp4G/AjoM3dX8q9dAAouM2EmXWb2ZCZDR06dGg+tYqIlKfKu8HHRdkBbmaLgW8An3b330x+zd0d8ELf5+4D7t7h7h3Lly+fV7EiImWp8m7wcVFWgJtZE2F4B+7+cO70y2Z2Vu71s4CD1SlRRGQOqrgbfFyUMwrFgK8CT7v7rZNe2g3kOphYDzxS+fJEROZo40Z45hm48cbwuHFj1BVVXDm70r8bWAv8zMyezJ37O+AWYJeZfQzIAtdVpUIRkbmo4m7wcVEywN39B4AVefl9lS1HRETKpZmYIiIJpQAXEUkoBbiISEIpwEVEEkoBLiKxEAQB6XSahoYG0uk0QRBEXVLslTOMUESkqnp7e9mxYwfhpG7IZrN0d3cD0NnZGWVpsaYWuIhEKgiCU8I7b3R0lL6+voiqSgYFuIjUXG9vL4sWLcLM6OrqmhbeecPDwzWuLFnUhSIiNdXb28v27dvLura9vb3K1SSbWuAiUlMDAwNlXWdm9Pf3V7maZFOAi0hNnThxouQ1ZsYNN9ygG5glKMBFpKYaGxuLvmZmpFIp7rvvPrZt21bDqpJJAS4iNZUfHjhVT08PExMTZDIZtbzLpAAXkYrJT8Y5w4xfNDez6847p12zbds2enp6TrbEGxsb6enpUYt7DqzY8J1q6Ojo8KGhoZp9nojUThAEdHd3Mzo6ymrgfuCjzc1cfvfdalHPk5ntdfeOqefVAheRi
ujr6+PO0VGOATtz5+4cG+PDa9cuuN3g40IBLiIVMTw8zCZgGBjLnRsDfuW+4HaDjwsFuIhURHt7O88Bm4Bm4Fju+JVlyxbcbvBxoQAXkYro7++npaWF64BXgS8Ao8BNCu+q0VR6EamI/I3Kwc9+lk8dOMAbUyne9bnPcc1ll0Vc2cKlFriIlFTuWt2dnZ3sfuklDriTyWS4prcXOqYNnpAKUQtcRGY0eXggaK3uOFELXERm1NfXdzK887RWdzwowEVkRsXW5NZa3dFTgIvIjIqtya21uqOnABeRGW9S5ocHTtbS0qK1umNAAS5S5/I3KbPZLO5+8iZlPsQ7OzsZGBgglUqdXO51YGBANzBjQItZidS5dDpNNpuddj6VSpHJZGpfkEyjxaxEpCDdpEyukgFuZneb2UEze2rSuS+Y2Ytm9mTua1V1yxSRSpna371kyZKC1+kmZfyVM5HnXuArwNemnP8nd/9yxSsSkaopNCmnubmZpqYmxsfHT16nm5TJULIF7u5PAEdqUIuIVFmhSTljY2OcfvrpukmZQPPpA/+kmf0018VyZrGLzKzbzIbMbOjQoUPz+DgRmY1CQwOL9WsfOXKETCajPSkTpqxRKGaWBr7l7hflnrcBI4ADW4Gz3P36Uu+jUSgitTG1qwTCbpHTTjuNw4cPT7teI07iraKjUNz9ZXc/4e4TwJ2A1osUiZFi65cAmpSzgMwpwM3srElPPww8VexaEamho0dh5Up+XWBcN4RdJZqUs3CUHIViZg8A7wWWmdkLwOeB95rZxYRdKBngr6tXooiU7dFHYd8+1i1bxh0jI9Nebm9vp7OzU4G9QJQMcHdfXeD0V6tQi4jM1Zo1sHs3vPYaALe98gpfBB4B8lGtrpKFRzMxRRaCLVugvR2amgBoaG5m/Oyz2fGWt6irZAHTjjwiC8F554Uhvno1tLbCa69xxm238cS110ZdmVSRWuAiCVFyX8pdu8Lw3rw5PD74YDSFSs2oBS6SAGXtS7lxI9xxB7S1QVcX7N8fVblSI1pOViQBtORrfdNysiIJpiVfpRAFuEjMFOrr1r6UUogCXCRGim1vtmrVKk2Bl2kU4CIxUmwNkz179mgKvEyjm5giMdLQ0EChP5NmxsTERAQVSRzoJqZIDGl7M5kPjQMXiYi2N5P5UgtcJCLa3kzmSy1wkRoJgoC+vj6Gh4dpb28vODEHwjW7RwosBSsylQJcpAYKdZeYWcEblurvlnKpC0WkBgp1l7g7ZnbKOfV3y2wowEVqoNiUd3dXf7fMmbpQRGqgWJ+3FqOS+VALXKQG+vv7NRVeKk4BLlIJud3gOXq04MudnZ2aCi8Vpy4UkUrI7QbPnj3htmYFaDd4qTS1wEXmIAgCli1bxv1m/NaM8Xwwr1sHixeHu8SLVJkCXGQW8sHd1dXF4cOH2QQMA2O51483NEAqBVu3Rlil1AsFuEiZgiDg+uuv5/DhwyfPPQdsApqBY4CPjYWbCq9YEVGVUk8U4CJlCIKAdevWMTY2Nu2164BXgS/kjtoNXmpFAS5SQn4afLH1uL8EXADcCrz/nHPC3eFFakCjUERKKDQNfrL8FiVNTU185pZboGPauvsiVaEWuEgJ5ez8vnTpUu655x4NE5SaUoCLlDDT6oBNTU0MDg4yMjKi8JaaU4CLlFBoGjyo1S3RKxngZna3mR00s6cmnVtiZo+Z2bO545nVLVMkOoWmwavVLXFQTgv8XuCKKeduAh539/OBx3PPRRJn6qbCQRAUvK6zs5NMJsPExASZTEbBLbFQMsDd/QngyJTTVwE7c493AldXtiyR6ssPD8xms7g72WyW7u7uoiEuEjdWaEunaReZpYFvuftFuee/dvczco8NeCX/vMD3dgPdAO3t7ZcU2wdQpNbS6bTW6JZEMLO97j5tfOq8b2J6+DdA0b8F3H3A3TvcvWP58uXz/TiROSnUVVJseGA5wwZF4mCuAf6ymZ0FkDserFxJIpXV29vL2rVrp
3WVLFmypOD12lRYkmKuAb4bWJ97vB54pDLliFRWb28v27dvn7b7e35mpXbJkSQrZxjhA8APgQvM7AUz+xhwC3C5mT0LvD/3XCRWgiBgx44dRV8/cuSIdsmRRCvrJmaldHR0+NDQUOkLReYoCAL6+voYHh6moaGBEydOFL1WNyslKYrdxNRiVrJg5IcF5rtHZgpvM1NXiSSeptLLglFq1cDJbrjhBnWVSOIpwCXZJu0GX87wPzOjp6eHbdu21aA4kepSgEuyTdoNvtjwv8bGxpM3Ke+77z6FtywY6gOXZFqzBnbvhtdeC5+vW8ezjY083NjIRyb1fbe0tGhkiSxYaoFLIkydSfnIJZdAezs0NYUXNDXRtGIFb/zHf9SwQKkbGkYosTd1dAmELetvf/zjvGfbNnjDG8KW+AMPwLXXRlipSHVUbS0UkWorNLpkdHSU39x1F7S2wubN4VG7wUudUYBLbOW7TYqtYLlldBSeeQZuvDE8ajd4qTO6iSmxEwQBGzZs4PDhwzNedzCVgra28Elb2+uPReqEAlxipVB/dyFadEpEXSgSM+XMptToEpGQWuASK6VmU2oBKpHXqQUusTLTZgrqNhE5lQJcYqW/v3/aJgsAS5cuVbeJyBQKcImVzs7OaZssDA4OMjIyovAWmUIzMUVEYk4zMUVEFhgFuFTN1AWogiCIuiSRBUXDCKXiCs2kzGazdHd3A6gvW6RC1AKXisrPpCw0DX50dJS+vr4IqhJZmBTgUlGlZlKWs+2ZiJRHAS4VVSqgZ5qoIyKzowCXitJMSpHaUYDL3E3aET5PMylFakcBLnM3aUf4PM2kFKkdzcSU2Zu8I/zx47BoUbgv5ZVXwv33R12dyIKjmZgyZ+XsCE8qBVu3RluoSJ1RgEtRvb29NDQ00NXVRTabxd3JZrOs2bSJJy6/HMbHw82Ex8fDjYVXrIi6ZJG6Mq8AN7OMmf3MzJ40M/WNLCC9vb1s376dQl1s2hFeJB4qMZX+z9x9pALvIzHQ29vLwMAAJ06cmPG6LaOjfOjAgXAj4a4u2L+/RhWKSJ7WQpGT8q3ucmhHeJHozbcP3IHvmdleM+sudIGZdZvZkJkNHTp0aJ4fJ9U0MDBQ1nWakCMSD/MN8D9197cDHwQ+YWbvmXqBuw+4e4e7dyxfvnyeHyfVVKrbBGDx4sWakCMSE/MKcHd/MXc8CHwTuKwSRUk0GhsbZ3ytp6eHY8eOKbxFYmLOAW5mrWb2pvxj4APAU5UqTKpjpk0W8ut1T9XT08Px48fZtm1brcoUkTLM5yZmG/BNM8u/z/3u/p2KVCVVkV+rO7/c69RNFvIBnR+F0tjYSHd3t4JbJKY0lb6OpNNpstnstPOpVIpMJlP7gkSkLJpKL0XX6tYmCyLJpACvI8XW6tYmCyLJpACvI4XW6taYbpHkUoDXkUJrdWtMt0hyKcAXgJmGBk7V2dlJJpNhYmKCTCaj8BZJMK2FknClhgaKyMKlFnhC5VvdXV1dJ8M7b3R0lL6+vogqE5FaUQs8gaa2ugvR0ECRhU8t8AS65eab+fHoKKfPcI2GBoosfArwhAiCgMWLF2Nm/OH+/awEVhW5VkMDReqDAjwBgiBg3bp1DLz6KseAnbnzXwOOAZPHnGhooEj9UB94zAVBwPr165mYmGATcDGQApqAMSAL/D1hq1vBLVJf1AKPsfzNyvxGC88Bm4BmwpZ3M/B54IRa3SJ1SS3wmCm1qfB1wKvAVsKW93XAQ1pJUKQuqQUeI/lNhWfa2uxLwAXArbnj89dcU6PqRCRuFOARmzwNvpwd4YeAg0BDQwPX9PRw00MPVb1GEYkndaFEqJwJOVPpZqWI5KkFHqG+vr6yw1urB4rIVArwGpvcZVJoe7NCenp6tHqgiEyjLpQamm2XiTYVFpGZKMBrqJwuE/Vxi0i51IVSQzOtEKg+bhGZLQV4lRTaJafYCoGpVEp93CIyawrwKsj3dWezWdz95C45q1at0qbCIlIxC
vAqKNTXPTo6yp49e7SpsIhUjLl7zT6so6PDh4aGavZ5UWloaKDQz9XMmJiYiKAiEUkyM9vr7h1Tz6sFXgXF+rq1S46IVJICfB4K3agE6O/vV1+3iFSdxoHPQRAEbNiwgcOHD588l79RCZzs0+7r62N4eJj29nb6+/vV1y0iFaU+8FkqNZsylUqR0frcIlJBVekDN7MrzOwZM/ulmd00n/cqJgggnYaGhvAYBKW+ozp6e3/AksZ9XNzVz6LRHwKrC14302SdSojLzyMOdcShBtWhOiKtw93n9AU0Eu7y9XuEu3v9N3DhTN9zySWX+GwMDrq3tLjD618tLeH5WhkcHPTW1r9y+K2vJnAH/wj3O/zWYbUDp3ylUqkq1hL9zyMudcShBtWhOmpVBzDkhXK40MlyvoB3At+d9Pxm4OaZvme2AZ5Knfofnv+qYkaeYnBw0FtaWjzgSj9Gq4+xyB18jEV+jFYPuPKU8G5pafHBKv6GRP3ziFMdcahBdaiOWtVRLMDn3AduZtcCV7j7x3PP1wJ/4u6fnHJdN9AN0N7efkm5S6hC+E+OQuWZQS2GU6fTabLZLCt4ht1cTYoMrfwfr3IaGc7lSv6V5/l9AJYuXcrtt99e1RuVUf884lRHHGpQHaqjVnVENg7c3QfcvcPdO5YvXz6r7y02bLqaw6kLrdf9HM1sYgvNjHOMVpoZ5/Ns5nmaSKVSDA4OMjIyUvVRJlH8POJaRxxqUB2qI/I6CjXLy/miBl0ote7HyneZMKVfG1b7v/Bhf4U3+9/yZX+FN/vX+Uvv6fn36hRStL6F1a+X9BpUh+qoVR1UoQ98EfA8cC6v38RcOdP3zDbA8z+AVMrdLDxW839AKpUqEN7hVwcf8N/hxw4n/Hdtr3/xmjurV8gMavnziHsdcahBdaiOWtRRLMDnNQ7czFYBtxGOSLnb3Wecahj3ceDF1jCBcB0TTcgRkSgU6wOf10xMd98D7JnPe8RJe3t7wX0qNTlHROJIa6FMojVMRCRJFOCTdHZ2ar1uEUkMrYUiIhJzWg9cRGSBUYCLiCSUAlxEJKEU4CIiCaUAFxFJKAW4iEhCKcBFRBJKAS4iklAKcBGRhFKAi4gkVDIC/OhRWLkyPIqICJCUAH/0Udi3D/YsmJVrRUTmLd4BvmYNLF4M69eHz9etC5+vWRNtXSIiMRDvAN+yJdz9s6kpfN7UBKkUbN0abV0iIjEQ7wA/77wwxMfHobU1PG7eDCtWRF2ZiEjk4h3gALt2heG9eXN4fPDBqCsSEYmFee2JWRMbN8Idd0BbG3R1wf79UVckIhIL8Q/wSy99/XFbW/glIiIJ6EIREZGCFOAiIgmlABcRSSgFuIhIQinARUQSyty9dh9mdgjI1uwDZ2cZMBJ1EXOQ1LpBtUdFtUdjPrWn3H351JM1DfA4M7Mhd++Iuo7ZSmrdoNqjotqjUY3a1YUiIpJQCnARkYRSgL9uIOoC5iipdYNqj4pqj0bFa1cfuIhIQqkFLiKSUApwEZGEqvsAN7MrzOwZM/ulmd0UdT3lMrO3mtn3zWyfmf3czDZEXdNsmVmjmf2XmX0r6lpmw8zOMLOHzOx/zOxpM3tn1DWVw8w+k/tdecrMHjCzN0Zd00zM7G4zO2hmT006t8TMHjOzZ3PHM6OssZAidX8p9/vyUzP7ppmdUYnPqusAN7NG4J+BDwIXAqvN7MJoqyrbceBGd78QeAfwiQTVnrcBeDrqIubgduA77v4HwB+TgP8GMzsb+BTQ4e4XAY3AR6KtqqR7gSumnLsJeNzdzwcezz2Pm3uZXvdjwEXu/kfAL4CbK/FBdR3gwGXAL939eXcfA74OXBVxTWVx95fc/Se5x8cIQ+TsaKsqn5mdA/wFcFfUtcyGmb0ZeA/wVQB3H3P3X0daVPkWAaeZ2SKgBfjfiOuZkbs/ARyZcvoqYGfu8U7g6lrWVI5Cdbv799z9eO7pfwDnVOKz6j3AzwYmb/HzAgkKwTwzSwNvA34Uc
SmzcRvwOWAi4jpm61zgEHBPrvvnLjNrjbqoUtz9ReDLwDDwEnDU3b8XbVVz0ubuL+UeHwCSuMPL9cC3K/FG9R7giWdmi4FvAJ92999EXU85zOxDwEF33xt1LXOwCHg7sN3d3wa8Sjz/GX+KXF/xVYR/Ab0FaDWzrmirmh8Px0Anahy0mfURdn8GlXi/eg/wF4G3Tnp+Tu5cIphZE2F4B+7+cNT1zMK7gSvNLEPYbfXnZjYYbUllewF4wd3z/9p5iDDQ4+79wK/c/ZC7jwMPA++KuKa5eNnMzgLIHQ9GXE/ZzOyjwIeATq/QBJx6D/D/BM43s3PNrJnwps7uiGsqi5kZYT/s0+5+a9T1zIa73+zu57h7mvBn/m/unojWoLsfAPab2QW5U+8D9kVYUrmGgXeYWUvud+d9JODmawG7gfW5x+uBRyKspWxmdgVhl+GV7j5aqfet6wDP3VT4JPBdwl/mXe7+82irKtu7gbWErdcnc1+roi6qTvwNEJjZT4GLgX+ItpzScv9ieAj4CfAzwj/7sZ6WbmYPAD8ELjCzF8zsY8AtwOVm9izhvypuibLGQorU/RXgTcBjuT+rOyryWZpKLyKSTHXdAhcRSTIFuIhIQinARUQSSgEuIpJQCnARkYRSgIuIJJQCXEQkof4f5tjxivfF9mMAAAAASUVORK5CYII=\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "plt.scatter(x, y,color='k');# données apprentissage en noir \n", "plt.scatter(xnew, np.zeros(xnew.shape[0]),color='b');# x_i non observés en bleu\n", "plt.scatter(xnew, ynew,color='r', marker='*');# y_i prédit ave la régression linéaire (x_i,y_i) en rouge " ] }, { "cell_type": "code", "execution_count": 114, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(50,)\n", "Biais ou erreur en chaque point : \n", "\n" ] }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXMAAAEDCAYAAADHmORTAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjUuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8qNh9FAAAACXBIWXMAAAsTAAALEwEAmpwYAAAVaklEQVR4nO3df2zcd33H8de7vtR31+RSWK3YpKXuutCqTUmYTxM/ppmRroTRUehggF3GWCTjapDwa4wSaR2JIpCYgEkrHRaUVKIqZW2qoqyjhB9qsLRUnJu0tE1hYI+0neO4I22TLOcuznt/+Ez84+y7833vvncfPx+SFd/3vt/P5/39xn7lm8/3e5+vubsAAM3tvLgLAABUjzAHgAAQ5gAQAMIcAAJAmANAAAhzAAhAbGFuZneY2TEzeyKi9r5nZi+Y2d45y3eb2YiZHSp8bYyiPwBoJHGeme+WtDnC9r4o6QMLvPe37r6x8HUowj4BoCHEFubuvl/Sb2YuM7PLC2fYQ2b2EzO7soL2fijpRNR1AkAzaLQx8wFJH3X3LkmfkvTViNrdZWaPm9mXzaw1ojYBoGEk4i5gmpmtlPRGSf9qZtOLWwvv3ShpR5HNnnP3t5Zo+hZJRyWdr6l/LP5ugbYAoGk1TJhr6n8JL7j7xrlvuPseSXuW0qi7jxa+nTCzb2rqjB8AgtIwwyzu/pKkETN7jyTZlA3VtmtmHdPtSXqnpEjungGARmJxzZpoZndLerOkiySNSbpV0o8k3S6pQ9IKSd9297KGRMzsJ5KulLRS0v9I2uLuD5nZjyS1STJJhyT1u/vJSHcGAGIWW5gDAKLTMMMsAICli+UC6EUXXeSdnZ1xdA0ATWtoaOh5d28r9l4sYd7Z2alcLhdH1wDQtMzs1wu9xzALAASAMAeAABDmABAAwhwAAkCYA0AAmirMR0+Mqnt3t46ePBp3KQDQUJoqzHfu36nBI4Pa8TCTHgLATLF8nD+bzXol95mndqWUP5OftzyZSGp467Ded9/7dM+771H7yvYoywSAhmJmQ+6eLfZeU5yZD28dVs/6HqUTaUlSOpFW7zW9Gtk2wtk6AKhJzswl6ea9N2vg0QGd33K+Xp58WSbTpE/OW+88O0/PfeI5ztIBBKfpz8wlaezUmPq7+nVgywH1d/Xrusuvm3e2vu6V6+TunKUDWHaa5sy8mOmz9bN+tuj7yURSp7efrrofAGgEdTkzN7MWMztoZnujarOU6bP1fR/Yp3WvXKfzbGp3psfUD2w5oO7d3Xrs6GPc0gggaFHOmrhN0mFJmQjbXNSe9557LOimyzbpV8d/pWQiqfxkXpnWjL429DUNHhlU755eHX7+sHY8vENffftX61UeANRNJMMsZnaxpDsl7ZL0CXe/frH1oxpmmenGe25Ux8oO9XX1qWugq+jF0WkMvwBoRvUYZvmKpE9LKj54PVVEn5nlzCw3Pj4eUbfn7HnvHt329tu0oX2Dnvn4M+pZ36NUS2rWOqlE6re3NAJASKoOczO7XtIxdx9abD13H3D3rLtn29qKPigjMh2rOpRpzWji7IRarEWS1GItmpicUKY1w22LAIITxZj5myS9w8z+VFJSUsbMvuXuN0XQ9pJNXxx9+vmnNXZqTGsuWKMrL7pSoydH4ywLAGoi0lsTzezNkj4Vx5h5VEZPjDI9AICGFMSHhuqF6QEANKOm/tBQlBabzIs7XwA0As7My7DYZF4A0OgI84LpO2Dyk/lZHzxi3BxAMyDMZ5g7mVexj/8Xe9rRQk9AmrucJyUBzWUpv7Ox/Z67e92/urq6vFndvPdmP+9z5/nNe29edFmx5QutB6AxLeV3tpa/55JyvkCucgG0TAtdIK0GF1eBxrSUGyLqcRMFF0AjUOwC6
Y1X3qh3XfGueRdND3340Kx1Uy0pda7uVCqRmrUeF1eBxrSUGyLivokiylkTg1bsAumalWvk7vMumm5o3zBr3YnJCaVXpDUxOcHFVaAJLOWGiLhvouDMvALFLpAudNF07vLj+eMlL64CaBzl3BARxTZRYcwcAJoEY+YAEDjCHAACQJgDQAAIcwAIAGEOAAEgzGuo1BwNh0YPKfP5jLID2bLmegGAhRDmNVTqQRc33X+TTrx8QkOjQ7PW4QEZACpV9X3mZpaUtF9Sq6Y+UXqvu9+62Dah32deao4G+5xV1B5zuACQan+f+YSkt7j7BkkbJW02s9dH0G7TKjVHw8G+g1q7au2sbUymTZdtKjrXC3O4ACil6jAvzMx4svByReGr/h8rbSCl5mjY2LFRq1tXz9rG5XrN77xGa1au4QEZACoWyZi5mbWY2SFJxyTtc/dHiqzTZ2Y5M8uNj49H0W1DKzVHw/H8ca06f5Wu+93rdPkrLlc6kV50rhcAWEykc7OY2YWS7pf0UXd/YqH1Qh8zB4BaqNvcLO7+gqQfS9ocZbsAgMVVHeZm1lY4I5eZpST9iaSnq20XAFC+KB5O0SHpTjNr0dQ/Dt9x970RtAsAKFPVYe7uj0t6XQS1AACWiE+ANoFqPt5f7raHRg/pwi9cqMfHHp/3OurpBaJqj2kPgHMI8yZQzcf7y932pvtv0osTL6rnvp55r6OeXiCq9pj2ADiHx8Y1sFLTAkSxbaVTC5TbfzU11asdoNnw2LgmVWpagCi2Pdh3UJeuvnTWshZrmfXaZBX3H/X+1KIdICSEeQMrNS1AFNtu7NioC1ZcMGvZ3DB3eSTTC1SzP7VoBwgJYd7gqvl4f7nbHs8f19VtV+ued9+jq9uu1pmzZ377etX5q5ROpCObXiCq6QqY9gCYjTFzAGgSjJkDQOAIcwAIAGEOAAEgzAEgAIQ5AASAMAeAABDmABAAwhwAAkCYA0AACHMACEAUzwC9xMx+bGZPmdmTZrYtisIAAOWL4hmgZyR90t0fNbNVkobMbJ+7PxVB2wCAMlR9Zu7uo+7+aOH7E5IOS1pbbbsAgPJFOmZuZp2aerjzI0Xe6zOznJnlxsfHo+wWAJa9yMLczFZKuk/Sx9z9pbnvu/uAu2fdPdvW1hZVtwAARRTmZrZCU0F+l7vviaJNhGX0xKi6d3c39UMkar0Pc9sP4ZjVEsdntijuZjFJ35B02N2/VH1JCNHO/Ts1eGRQOx7eEXcpS1brfZjbfgjHrJY4PrNV/aQhM/tDST+R9DNJZwuLP+vuDy60DU8aWj5Su1LKn8nPW55MJHV6++kYKqpcrfdhofZr1V+zC+Fnaqlq+qQhdx90d3P317r7xsLXgkGO5WV467B61vconUhLktKJtHqv6dXItpGYKytfrfdhbvuplpQ6V3cqlUjVpL9mF8LPVC3wCVDUVMeqDmVaM8pP5pVMJJWfzCvTmlH7yva4SytbrfdhbvsTZyeUXpHWxORE0x6zWgrhZ6oWCHPU3NipMfV39evAlgPq7+pvygtWtd6Hue0fzx9v+mNWSyH8TEWt6jHzpWDMHAAqV9MxcwBA/AhzAAgAYQ4AASDMASAAhDkABIAwB4AAEOYAEADCHAACQJgDQAAIcwAIAGEOAAEgzAEgAIQ5AAQgqmeA3mFmx8zsiSjaAwBUJqoz892SNkfUFgCgQpGEubvvl/SbKNoCAFSubmPmZtZnZjkzy42Pj9erWywToydG1b27myfOYNmqW5i7+4C7Z90929bWVq9usUzs3L9Tg0cGtePhHXGXAsQiEXcBQDVSu1LKn8n/9vXtudt1e+52JRNJnd5+OsbKgPri1kQ0teGtw+pZ36N0Ii1JSifS6r2mVyPbRmKuDKivqG5NvFvSf0i6wsyeNbMtUbQLlNKxqkOZ1ozyk3klE0nlJ/PKtGbUvrI97tKAuopkmMXd3x9FO8BSjJ0aU39Xv/q6+jQwNKDRk6NxlwTUnbl73TvNZrOey
+Xq3i8ANDMzG3L3bLH3GDMHgAAQ5gAQAMIcAAJAmANAAAhzAAgAYQ4AASDMASAAhDkABIAwB4AAEOYAEADCHAACQJgDQAAIcwAIAGEOAAEgzAEgAFE9aWizmf3czH5pZp+Jok2gWqMnRtW9u1tHTx6tav25yxdrt9I+y6kjijbL7avS91Ge6eP42NHHanY8qw5zM2uRdJukt0m6StL7zeyqatsFqrVz/04NHhnUjod3VLX+3OWLtVtpn+XUEUWb5fZV6fsoz/Rx7N3TW7PjWfWThszsDZL+wd3fWnh9iyS5++cX2oYnDaGWUrtSyp/Jz1ueTCR1evvpstcvVzKRlKSK+iym3DoqabPSvqbbrvQYorhSf6eVHs9aP2loraRnZrx+trBsbhF9ZpYzs9z4+HgE3QLFDW8dVs/6HqUTaUlSOpFW7zW9Gtk2UtH6hz58aNbyVCKlztWdSrWk5rVbaZ/l1JFqKfSXmN9ftUrVG8X+4NxxnP47nJZqSUV+POt2AdTdB9w96+7Ztra2enWLZahjVYcyrRnlJ/NKJpLKT+aVac2ofWV7RetvaN8wa/nE5ITSK9KaODsxr91K+yynjomzhf4m5/dX62MUxf7g3HGcmJxQi7VIklqsRRNnJyI/nlGE+XOSLpnx+uLCMiA2Y6fG1N/VrwNbDqi/q7/kBaeF1p+7/Hj++ILtVtpnOXUs1l+1StUbxf7g3HHsvrRbV7ddre5Lu2tyPKMYM09I+oWkTZoK8Z9K6nH3JxfahjFzAKjcYmPmiWobd/czZvYRSQ9JapF0x2JBDgCIXtVhLknu/qCkB6NoCwBQOT4BCgABIMwBIACEOQAEgDAHgAAQ5gAQAMIcAAJAmANAAAhzAAgAYQ4AASDMASAAhDkABIAwB4AAEOYAEADCHAACQJgDQAAIcwAIQFVhbmbvMbMnzeysmRV9lBEAoPaqPTN/QtKNkvZHUAsAYImqemycux+WJDOLphoAwJLUbczczPrMLGdmufHx8Xp1CwDLQskzczP7gaT2Im9td/cHyu3I3QckDUhSNpv1sisEAJRUMszd/dp6FAIAWDpuTQSAAFR7a+K7zOxZSW+Q9G9m9lA0ZQEAKlHt3Sz3S7o/oloAAEvEMAsABIAwB4AAEOYAEADCHAACQJgDQAAIcwAIAGEOAAEgzAEgAIQ5AASAMAeAABDmABAAwhxoMKMnRtW9u1tHTx5tyPaibLdUG7WqPUSEOdBgdu7fqcEjg9rx8I6GbC/Kdku1UavaQ2Tu9X/oTzab9VwuV/d+gUaW2pVS/kx+3vJkIqnT20/H3l6U7ZZqo1a1NzszG3L3bLH3ODMHGsTw1mH1rO9ROpGWJKUTafVe06uRbSMN0V6U7ZZqo1a1h4wwBxpEx6oOZVozyk/mlUwklZ/MK9OaUfvKYo/grX97UbZbqo1a1R4ywhxoIGOnxtTf1a8DWw6ov6u/6gt/UbcXZbul2qhV7aGqaszczL4o6c8kvSzpV5I+5O4vlNqOMXMAqFwtx8z3SVrv7q+V9AtJt1TZHgBgCaoKc3f/vrufKbw8IOni6ksCAFQqyjHzv5b07wu9aWZ9ZpYzs9z4+HiE3QIAEqVWMLMfSCp2CXm7uz9QWGe7pDOS7lqoHXcfkDQgTY2ZL6laAEBRJcPc3a9d7H0z+ytJ10va5HF8AgkAUDrMF2NmmyV9WlK3u/9vNCUBACpV7Zj5P0taJWmfmR0ys3+JoCYAQIWqOjN399+LqhAAwNLxCVAACABhDgABIMwBIACEOQAEgDAHgAAQ5gAQAMIcAAJAmANAAAhzAAgAYQ4AASDMASAAhDkABIAwB4AAEOYAEADCHAACQJgDQACqCnMz22lmjxeeMvR9M3tVVIUBAMpX7Zn5F939te6+UdJeSX9ffUkAgEpVFebu/tKMlxdI8urKAQAsRVXPAJUkM9sl6S8lvSjpjxdZr09SnyS9+tWvrrZbAMAM5r74ybSZ/UBSe5G3trv7AzPWu0VS0
t1vLdVpNpv1XC5Xaa0AsKyZ2ZC7Z4u9V/LM3N2vLbOfuyQ9KKlkmAMAolXt3SzrZry8QdLT1ZUDAFiKasfMv2BmV0g6K+nXkvqrLwkAUKmqwtzd/zyqQgAAS8cnQAEgAIQ5AASAMAeAABDmABAAwhwAAkCYAwje6IlRde/u1tGTR4PqaybCHEDwdu7fqcEjg9rx8I6g+pqp5NwstcDcLADqIbUrpfyZ/LzlyURSp7efbrq+FpubhTNzAMEa3jqsnvU9SifSkqR0Iq3ea3o1sm2kqfsqhjAHEKyOVR3KtGaUn8wrmUgqP5lXpjWj9pXFJoJtnr6KIcwBBG3s1Jj6u/p1YMsB9Xf11/TCZD37mosxcwBoEoyZA0DgCHMACABhDgABIMwBIACEOQAEgDAHgADEcmuimY1r6pmhi7lI0vN1KKdRsf/sP/u/fC20/5e6e1uxDWIJ83KYWW6h+ymXA/af/Wf/2f9KtmGYBQACQJgDQAAaOcwH4i4gZuz/8sb+L28V73/DjpkDAMrXyGfmAIAyEeYAEICGC3Mz22xmPzezX5rZZ+Kup57M7BIz+7GZPWVmT5rZtrhrioOZtZjZQTPbG3ct9WZmF5rZvWb2tJkdNrM3xF1TvZnZxws//0+Y2d1mloy7ployszvM7JiZPTFj2SvNbJ+Z/Wfhz1eUaqehwtzMWiTdJultkq6S9H4zuyrequrqjKRPuvtVkl4v6W+W2f5P2ybpcNxFxOSfJH3P3a+UtEHL7DiY2VpJWyVl3X29pBZJ74u3qprbLWnznGWfkfRDd18n6YeF14tqqDCX9AeSfunuw+7+sqRvS7oh5prqxt1H3f3RwvcnNPWLvDbequrLzC6W9HZJX4+7lnozs9WS/kjSNyTJ3V929xdiLSoeCUkpM0tISkv675jrqSl33y/pN3MW3yDpzsL3d0p6Z6l2Gi3M10p6ZsbrZ7XMwmyamXVKep2kR2Iupd6+IunTks7GXEccLpM0LumbhWGmr5vZBXEXVU/u/pykf5R0RNKopBfd/fvxVhWLNe4+Wvj+qKQ1pTZotDCHJDNbKek+SR9z95firqdezOx6ScfcfSjuWmKSkPT7km5399dJOqUy/nsdksLY8A2a+oftVZIuMLOb4q0qXj51/3jJe8gbLcyfk3TJjNcXF5YtG2a2QlNBfpe774m7njp7k6R3mNl/aWqI7S1m9q14S6qrZyU96+7T/xu7V1PhvpxcK2nE3cfd/f8k7ZH0xphrisOYmXVIUuHPY6U2aLQw/6mkdWZ2mZmdr6kLH9+Nuaa6MTPT1HjpYXf/Utz11Ju73+LuF7t7p6b+7n/k7svmrMzdj0p6xsyuKCzaJOmpGEuKwxFJrzezdOH3YZOW2UXggu9K+mDh+w9KeqDUBomallMhdz9jZh+R9JCmrmLf4e5PxlxWPb1J0gck/czMDhWWfdbdH4yvJNTZRyXdVTiZGZb0oZjrqSt3f8TM7pX0qKbu7jqowD/ab2Z3S3qzpIvM7FlJt0r6gqTvmNkWTU0X/hcl2+Hj/ADQ/BptmAUAsASEOQAEgDAHgAAQ5gAQAMIcAAJAmANAAAhzAAjA/wP9P7BtHtS4swAAAABJRU5ErkJggg==\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "L'erreur globale peut être donnée l'erreur quadratique moyenne : 6.893473346225456e-30\n" ] } ], "source": [ "#on peut aussi afficher la fonction f\n", "plt.scatter(x, y,color='k');\n", "#plt.scatter(xnew, ynew);\n", "plt.plot(xnew, ynew,'r');\n", "#l'erreur est donnée par la somme cumulée des distances \n", "#entre les points en noir et la droite en rouge\n", "\n", "ypred=models.predict(X)\n", "print(ypred.shape)\n", "print('Biais ou erreur en chaque point : \\n')\n", "plt.figure()\n", "plt.plot(x, (y-ypred), 'g*')\n", "plt.show()\n", "print('L\\'erreur globale peut être donnée l\\'erreur quadratique moyenne : ',np.mean((y-ypred)**2))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Nous venons de faire notre premier exemple pour le cas simple $ x$ réel et $y$ réel \n", "\n", "\n", "On peut généraliser ce résultat quelque soit la taille de $x : x\\in R^d$ et pour toute dimension finie $d$\n", "\n", "Exemple : " ] }, { "cell_type": "code", "execution_count": 115, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "from mpl_toolkits.mplot3d import Axes3D\n", "#constituer un exmple de data \n", "x = np.array(10 * rng.rand(100,2))\n", "y=2*np.inner(np.array([-1,1]), x)+ 2*rng.randn(x.shape[0]) \n", "\n", "fig=plt.figure()\n", "ax = fig.add_subplot(111, projection='3d')\n", "\n", "ax.scatter(x[:,0], x[:,1],y,c='b', marker='o');\n", "ax.set_xlabel('valeur de x[:,0]')\n", "ax.set_ylabel('aleur de x[:,1]')\n", "ax.set_zlabel('valeur de y ')\n", "\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": 116, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
LinearRegression()
" ], "text/plain": [ "LinearRegression()" ] }, "execution_count": 116, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model = LinearRegression(fit_intercept=True)\n", "model.fit(x, y)" ] }, { "cell_type": "code", "execution_count": 117, "metadata": {}, "outputs": [], "source": [ "xnew = np.array(10 * rng.rand(1000,2))\n", "ynew = model.predict(xnew)" ] }, { "cell_type": "code", "execution_count": 118, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "fig=plt.figure()\n", "ax = fig.add_subplot(111, projection='3d')\n", "\n", "ax.scatter(x[:,0], x[:,1],y,c='b', marker='o');\n", "ax.scatter(x[:,0], x[:,1],-y,c='b', marker='o');\n", "ax.set_xlabel('valeur de x[:,0]')\n", "ax.set_ylabel('aleur de x[:,1]')\n", "ax.set_zlabel('valeur de y ')\n", "\n", "ax.scatter(xnew[:,0], xnew[:,1],ynew,c='r', marker='*');\n", "\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exercice TD2 : \n", "Reprendre l'exercice de la régression linéaire dans le TD2.\n", "\n", "1- Faire un programme python qui répond aux questions.\n", "\n", "2- Comparer les résultats théoriques vu en TD et les résultats par le code. \n", "\n", "3- Prédire le taux de scolarisation pour des PIBs non observés.\n", "\n", "4- Suivre les mêmes étapes que l'exemple 1 au dessus pour visualiser et inerpréter le biais. \n", "\n", "5- Commenter la qualité du modèles : bon ou mauvais ? Pourquoi ? 
" ] }, { "cell_type": "code", "execution_count": 147, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "moyenne PIB : 5473.375 moyenne scolarisation : 65.75\n", "variance PIB : 2128.73747427319 variance scolarisation : 10.720890821195784\n", "(8,)\n", "(8,)\n", "----- la solution -----\n", "la valeur trouvé est [0.00485763]\n", "la valeur trouvée de b est : 39.16236270608056\n", "la covarience : 2643809.964285714\n", "â = 1241.9614894919741 b^ = -6797655.217548134\n" ] } ], "source": [ "pib = np.array([8802,5872,4775,5680,7964,5680,3072,1942])\n", "scol = np.array([83,69,63,62,81,62,56,50])\n", "print(\"moyenne PIB : \", pib.mean(), \"moyenne scolarisation :\", scol.mean())\n", "print(\"variance PIB : \", pib.std(), \"variance scolarisation :\", scol.std())\n", "\n", "print(pib.shape)\n", "print(scol.shape)\n", "models = LinearRegression(fit_intercept=True)\n", "models.fit(pib[:, np.newaxis], scol)\n", "\n", "a=models.coef_\n", "print('-'*5,'la solution', '-'*5)\n", "print('la valeur trouvé est', a)\n", "\n", "b=models.intercept_\n", "print('la valeur trouvée de b est :', b)\n", "\n", "cov = np.sum(pib * scol) -len(pib) * (pib.mean() * scol.mean()) / (len(pib) - 1)\n", "print('la covarience :', cov)\n", "\n", "a_chap = cov /pib.std()\n", "b_chap = scol.mean() - a_chap * pib.mean()\n", "\n", "print(\"â =\", a_chap, \"b^ =\", b_chap)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Un test de la régression logistique sur des données réelles\n", "## C'est un cas particulier où $y \\in \\{0, \\ldots, k\\}$" ] }, { "cell_type": "code", "execution_count": 148, "metadata": {}, "outputs": [], "source": [ "#importer les bibliothèques \n", "#pour l'affichage (si déjà fait pour np, plt) \n", "%matplotlib inline \n", "\n", "#charger des datasest de sklearn\n", "from sklearn import datasets" ] }, { "cell_type": "code", "execution_count": 149, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ 
"\n", "\n", "(150, 4)\n" ] } ], "source": [ "#charger la base iris \n", "iris = datasets.load_iris()\n", "#vérifier le type de la variable iris \n", "print(type(iris))\n", "#vérifier le type de données \n", "print(type(iris.data))\n", "#vérifier les dimensions \n", "print(iris.data.shape)\n", "\n", "#Sur wikipédia chercher la signification de ces données " ] }, { "cell_type": "code", "execution_count": 122, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[0 1 2]\n", "(150, 2)\n", "[0 1]\n" ] } ], "source": [ "X = iris.data[:, :2] # Utiliser les deux premières colonnes afin d'avoir un problème de classification binaire.\n", "\n", "print(np.unique(iris.target))\n", "#on va garder deux classes seulement pour un test simple\n", "y = (iris.target != 0) * 1 # re-étiquetage des fleurs\n", "print(X.shape)\n", "print(np.unique(y))" ] }, { "cell_type": "code", "execution_count": 123, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "#visuali des données\n", "plt.figure(figsize=(10, 6))\n", "plt.scatter(X[y == 0][:, 0], X[y == 0][:, 1], color='b', label='classe 0')\n", "plt.scatter(X[y == 1][:, 0], X[y == 1][:, 1], color='r', label='classe 1')\n", "plt.legend();" ] }, { "cell_type": "code", "execution_count": 124, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
LogisticRegression(C=1e+20)
" ], "text/plain": [ "LogisticRegression(C=1e+20)" ] }, "execution_count": 124, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#charger le modèle pour y binaire\n", "from sklearn.linear_model import LogisticRegression\n", " \n", "model = LogisticRegression(C=1e20) # Régression logistique\n", "# Entrainement du modèle avec toutes les données \n", "model.fit(X, y)" ] }, { "cell_type": "code", "execution_count": 125, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([1, 1, 0, 0])" ] }, "execution_count": 125, "metadata": {}, "output_type": "execute_result" } ], "source": [ "Xnew = np.array([\n", " [5.5, 2.5],\n", " [7, 3],\n", " [3,2],\n", " [5,3]\n", "])\n", "\n", "model.predict(Xnew)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Analyse des résultats : \n", "\n", " * La première observation [5.5, 2.5] pour $y=1$\n", " * La deuxième observation [7, 3] pour $y=1$\n", " * La troisième observation [3,2] pour $y=0$\n", " * La quatrième observation [5,3] pour $y=0$" ] }, { "cell_type": "code", "execution_count": 126, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(4,)\n" ] }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "#vérification visuelle\n", "\n", "#visualisation des données\n", "plt.figure(figsize=(10, 6))\n", "plt.scatter(X[y == 0][:, 0], X[y == 0][:, 1], color='b', label='y= 0')\n", "plt.scatter(X[y == 1][:, 0], X[y == 1][:, 1], color='r', label='y= 1')\n", "\n", "s = np.random.rand(*Xnew[:, 0].shape) * 800 + 500\n", "print(s.shape)\n", "Color='kygm' #noir jaune vert magneta\n", "for i in range(Xnew.shape[0]):\n", " plt.scatter(Xnew[i, 0], Xnew[i, 1],s[i], color=Color[i],marker=r'$\\clubsuit$',)\n", "plt.legend();" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Rappel de votre cours de BD\n", "## Pandas : un moyen efficace pour lire et manipuler des données" ] }, { "cell_type": "code", "execution_count": 127, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'1.4.3'" ] }, "execution_count": 127, "metadata": {}, "output_type": "execute_result" } ], "source": [ " # comme n'importe quelle librarire, il faut commencer par la charger à l'aide de la commande import\n", "import pandas\n", "# maintenat que c'est fait on peut utiliser son contenu\n", "# par exemple :vérifier la version installée sur votre machine \n", "pandas.__version__" ] }, { "cell_type": "code", "execution_count": 128, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'1.4.3'" ] }, "execution_count": 128, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# et si on lui donne un nom pour faciliter les appels\n", "import pandas as pd\n", "pd.__version__" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## L’objet DataFrame\n", "\n", "La DataFrame est un objet bi-dimentionnel avec des colonnes de types potentiellement différents. \n", "On peut voir la DataFrame comme une feuille Exce ou une table SQL." ] }, { "cell_type": "code", "execution_count": 129, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " age distance métro magasins proches prix au m2\n", "0 32.0 84.87882 10 37.9\n", "1 19.5 306.59470 9 42.2\n", "2 13.3 561.98450 5 47.3\n", "3 13.3 561.98450 5 54.8\n", "4 5.0 390.56840 5 43.1" ] }, "execution_count": 129, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Lecture d'un fichier de données et le récupérer sous forme de dataframe sous le nom df \n", "df = pd.read_csv(\"Prix_Appartements.csv\") # à partir d'un csv \n", "df.head(5)" ] }, { "cell_type": "code", "execution_count": 130, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "la taille : (414, 4)\n", "Avec : 414 lignes\n", "Avec : 4 colonnes\n" ] } ], "source": [ " ## On peut afficher les dimensions (nombre de lignes et de colonnes) ## avec l'attibut shape (comme avec numpy)\n", "print('la taille :',df.shape) ## (nb lignes, nb colonnes) print('*'*40)\n", "print('Avec :',df.shape[0],' lignes') ## (nb lignes, nb colonnes) print('*'*40)\n", "print('Avec :',df.shape[1],' colonnes') ## (nb lignes, nb colonnes) print('*'*40)" ] }, { "cell_type": "code", "execution_count": 131, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " age distance métro magasins proches prix au m2\n", "0 32.0 84.87882 10 37.9\n", "1 19.5 306.59470 9 42.2\n", "2 13.3 561.98450 5 47.3\n", "3 13.3 561.98450 5 54.8\n", "4 5.0 390.56840 5 43.1\n", "5 7.1 2175.03000 3 32.1" ] }, "execution_count": 131, "metadata": {}, "output_type": "execute_result" } ], "source": [ " ## La commande df.head(n) permet d'afficher uniquement les n premiers éléments # car la taille de la dataframe est grande avec 4622 lignes\n", "df.head(6) # les 6 premières lignes de 0 à 5 = 6-1" ] }, { "cell_type": "code", "execution_count": 132, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " age distance métro magasins proches prix au m2\n", "411 18.8 390.96960 7 40.6\n", "412 8.1 104.81010 5 52.5\n", "413 6.5 90.45606 9 63.9" ] }, "execution_count": 132, "metadata": {}, "output_type": "execute_result" } ], "source": [ "## De même df.tail(n) affiche les n=3 derniers éléments\n", "df.tail(3)" ] }, { "cell_type": "code", "execution_count": 133, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " age distance métro magasins proches prix au m2\n", "count 414.000000 414.000000 414.000000 414.000000\n", "mean 17.712560 1083.885689 4.094203 37.980193\n", "std 11.392485 1262.109595 2.945562 13.606488\n", "min 0.000000 23.382840 0.000000 7.600000\n", "25% 9.025000 289.324800 1.000000 27.700000\n", "50% 16.100000 492.231300 4.000000 38.450000\n", "75% 28.150000 1454.279000 6.000000 46.600000\n", "max 43.800000 6488.021000 10.000000 117.500000" ] }, "execution_count": 133, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# La commande describe() est très utile. Elle permet d'obtenir, en une seule commande,\n", "# des statistiques des colonnes (UNIQUEMENT pour les colonnes de type numérique)\n", "\n", "df.describe()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exercice : Use case\n", "\n", "## dataset dans le fichier Prix_Appartements.csv\n", "\n", "\n", "1- Avec la commande pd.read_csv ouvrir le fichier csv dans une dataframe df_1.\n", "\n", "2- Afficher les 7 premières lignes. \n", "\n", "3- Afficher les noms des colonnes. \n", "\n", "4- Créer une nouvelle colonne dans la dataframe ne contenant que l'âge et le prix.\n", "\n", "5- Affichier les lignes d'index impair.\n", "\n", "6- En utilisant la colonne prix, calculer le prix moyen $p_m$, la médiane $k_m$ et l'écart-type $\\sigma_m$. \n", "\n", "7- Afficher les lignes dont le prix est supérieur au prix moyen $p_m$.\n", "\n", "8- Choisir deux colonnes qui représentront les variables $X$ et $Y$. Par exemple la distance au métro le plus proche et le prix au mètre carré. Pouvez les récupérer dans deux tableaux numpy. \n", "\n", "9- Appliquer une régression linéaire pour vérifier la corrélation entre $X$ et $Y$ à l'aide des graphiques comme dans l'exircice du TD2.\n", "\n", "10- Quelle variable (colonne) est la plus corrélée avec le prix ? 
\n", "\n", "11- Pour éviter à ce que les grandes valeurs dominent les petites, on peut normaliser en divisant chaque colonne par le maximum en valeur absolue. Reprendre la question 10 avec des valeurs normalisées. \n", "\n", "11- Vérifier la corrélation entre le prix et deux voire plusieurs colonnes. Conclure. " ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.2" } }, "nbformat": 4, "nbformat_minor": 4 }