"# We'll start by splitting the data into training and testing, going with a 75% train, 25% test split, a 50/50 split, and a 25% train 75% test split.\n",
"# We'll start by splitting the data into training and testing, going with a 75% train, 25% test split, a 50/50 split, and a 25% train 75% test split.\n",
...
@@ -329,8 +329,8 @@
...
@@ -329,8 +329,8 @@
"metadata": {
"metadata": {
"collapsed": false,
"collapsed": false,
"ExecuteTime": {
"ExecuteTime": {
"end_time": "2023-05-26T14:23:14.523589Z",
"end_time": "2023-05-26T18:52:56.018122Z",
"start_time": "2023-05-26T14:23:14.474897Z"
"start_time": "2023-05-26T18:52:56.013665Z"
}
}
}
}
},
},
...
@@ -345,7 +345,7 @@
...
@@ -345,7 +345,7 @@
},
},
{
{
"cell_type": "code",
"cell_type": "code",
"execution_count": 995,
"execution_count": 1171,
"outputs": [],
"outputs": [],
"source": [
"source": [
"# First the Gaussian Bayes\n",
"# First the Gaussian Bayes\n",
...
@@ -378,8 +378,8 @@
...
@@ -378,8 +378,8 @@
"metadata": {
"metadata": {
"collapsed": false,
"collapsed": false,
"ExecuteTime": {
"ExecuteTime": {
"end_time": "2023-05-26T14:23:17.659417Z",
"end_time": "2023-05-26T18:52:59.778488Z",
"start_time": "2023-05-26T14:23:14.482378Z"
"start_time": "2023-05-26T18:52:56.021278Z"
}
}
}
}
},
},
...
@@ -394,7 +394,7 @@
...
@@ -394,7 +394,7 @@
},
},
{
{
"cell_type": "code",
"cell_type": "code",
"execution_count": 996,
"execution_count": 1172,
"outputs": [
"outputs": [
{
{
"name": "stdout",
"name": "stdout",
...
@@ -423,14 +423,14 @@
...
@@ -423,14 +423,14 @@
"metadata": {
"metadata": {
"collapsed": false,
"collapsed": false,
"ExecuteTime": {
"ExecuteTime": {
"end_time": "2023-05-26T14:23:17.680394Z",
"end_time": "2023-05-26T18:52:59.790487Z",
"start_time": "2023-05-26T14:23:17.658835Z"
"start_time": "2023-05-26T18:52:59.780428Z"
}
}
}
}
},
},
{
{
"cell_type": "code",
"cell_type": "code",
"execution_count": 997,
"execution_count": 1173,
"outputs": [
"outputs": [
{
{
"name": "stdout",
"name": "stdout",
...
@@ -459,14 +459,14 @@
...
@@ -459,14 +459,14 @@
"metadata": {
"metadata": {
"collapsed": false,
"collapsed": false,
"ExecuteTime": {
"ExecuteTime": {
"end_time": "2023-05-26T14:23:17.735405Z",
"end_time": "2023-05-26T18:52:59.863292Z",
"start_time": "2023-05-26T14:23:17.674643Z"
"start_time": "2023-05-26T18:52:59.791725Z"
}
}
}
}
},
},
{
{
"cell_type": "code",
"cell_type": "code",
"execution_count": 998,
"execution_count": 1174,
"outputs": [
"outputs": [
{
{
"name": "stdout",
"name": "stdout",
...
@@ -495,22 +495,22 @@
...
@@ -495,22 +495,22 @@
"metadata": {
"metadata": {
"collapsed": false,
"collapsed": false,
"ExecuteTime": {
"ExecuteTime": {
"end_time": "2023-05-26T14:23:17.953286Z",
"end_time": "2023-05-26T18:53:00.087387Z",
"start_time": "2023-05-26T14:23:17.737205Z"
"start_time": "2023-05-26T18:52:59.865452Z"
}
}
}
}
},
},
{
{
"cell_type": "code",
"cell_type": "code",
"execution_count": 999,
"execution_count": 1175,
"outputs": [
"outputs": [
{
{
"name": "stdout",
"name": "stdout",
"output_type": "stream",
"output_type": "stream",
"text": [
"text": [
"0.8644444444444445\n",
"0.8555555555555555\n",
"0.8487208008898777\n",
"0.8398220244716351\n",
"0.7611275964391692\n"
"0.7655786350148368\n"
]
]
}
}
],
],
...
@@ -531,22 +531,22 @@
...
@@ -531,22 +531,22 @@
"metadata": {
"metadata": {
"collapsed": false,
"collapsed": false,
"ExecuteTime": {
"ExecuteTime": {
"end_time": "2023-05-26T14:23:17.954572Z",
"end_time": "2023-05-26T18:53:00.093607Z",
"start_time": "2023-05-26T14:23:17.953425Z"
"start_time": "2023-05-26T18:53:00.089849Z"
}
}
}
}
},
},
{
{
"cell_type": "code",
"cell_type": "code",
"execution_count": 1000,
"execution_count": 1176,
"outputs": [
"outputs": [
{
{
"name": "stdout",
"name": "stdout",
"output_type": "stream",
"output_type": "stream",
"text": [
"text": [
"0.9755555555555555\n",
"0.98\n",
"0.967741935483871\n",
"0.9621802002224694\n",
"0.9428783382789317\n"
"0.9443620178041543\n"
]
]
}
}
],
],
...
@@ -567,14 +567,14 @@
...
@@ -567,14 +567,14 @@
"metadata": {
"metadata": {
"collapsed": false,
"collapsed": false,
"ExecuteTime": {
"ExecuteTime": {
"end_time": "2023-05-26T14:23:17.991111Z",
"end_time": "2023-05-26T18:53:00.149796Z",
"start_time": "2023-05-26T14:23:17.953635Z"
"start_time": "2023-05-26T18:53:00.097279Z"
}
}
}
}
},
},
{
{
"cell_type": "code",
"cell_type": "code",
"execution_count": 1001,
"execution_count": 1177,
"outputs": [
"outputs": [
{
{
"name": "stdout",
"name": "stdout",
...
@@ -603,8 +603,8 @@
...
@@ -603,8 +603,8 @@
"metadata": {
"metadata": {
"collapsed": false,
"collapsed": false,
"ExecuteTime": {
"ExecuteTime": {
"end_time": "2023-05-26T14:23:17.997469Z",
"end_time": "2023-05-26T18:53:00.157719Z",
"start_time": "2023-05-26T14:23:17.992749Z"
"start_time": "2023-05-26T18:53:00.152519Z"
}
}
}
}
},
},
...
@@ -628,7 +628,7 @@
...
@@ -628,7 +628,7 @@
},
},
{
{
"cell_type": "code",
"cell_type": "code",
"execution_count": 1002,
"execution_count": 1178,
"outputs": [
"outputs": [
{
{
"name": "stdout",
"name": "stdout",
...
@@ -735,14 +735,14 @@
...
@@ -735,14 +735,14 @@
"metadata": {
"metadata": {
"collapsed": false,
"collapsed": false,
"ExecuteTime": {
"ExecuteTime": {
"end_time": "2023-05-26T14:23:18.007419Z",
"end_time": "2023-05-26T18:53:00.213628Z",
"start_time": "2023-05-26T14:23:17.997882Z"
"start_time": "2023-05-26T18:53:00.167510Z"
}
}
}
}
},
},
{
{
"cell_type": "code",
"cell_type": "code",
"execution_count": 1003,
"execution_count": 1179,
"outputs": [
"outputs": [
{
{
"data": {
"data": {
...
@@ -764,8 +764,8 @@
...
@@ -764,8 +764,8 @@
"metadata": {
"metadata": {
"collapsed": false,
"collapsed": false,
"ExecuteTime": {
"ExecuteTime": {
"end_time": "2023-05-26T14:23:18.549635Z",
"end_time": "2023-05-26T18:53:00.492287Z",
"start_time": "2023-05-26T14:23:18.008853Z"
"start_time": "2023-05-26T18:53:00.182195Z"
}
}
}
}
},
},
...
@@ -779,7 +779,9 @@
...
@@ -779,7 +779,9 @@
"\n",
"\n",
"Finally, the best split ratio is rather expectedly the 25% Test and 75% training split; If you wished to simply get the best results for any algorithm, choosing that test training ratio would be best.\n",
"Finally, the best split ratio is rather expectedly the 25% Test and 75% training split; If you wished to simply get the best results for any algorithm, choosing that test training ratio would be best.\n",
"\n",
"\n",
"So in summary, it appears that if you wanted to train a ML model to recognise written text, at the very least if that text is numeric, then using the K Nearest Neighbour algorithm, and training it with a 25% test and 75% training ration, would be the best choice."
"So in summary, it appears that if you wanted to train a ML model to recognise written text, at the very least if that text is numeric, then using the K Nearest Neighbour algorithm, and training it with a 25% test and 75% training ration, would be the best choice.\n",