Commit d3ae4489 authored by Jonathan Poalses

Added GNB, KNeighbour, and SVC ML implementations

parent 316c8c2b
<?xml version="1.0" encoding="UTF-8"?>
<project version="4">
<component name="VcsDirectoryMappings">
<mapping directory="$PROJECT_DIR$" vcs="Git" />
</component>
</project>
\ No newline at end of file
{
"cells": [
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"collapsed": true,
"ExecuteTime": {
"end_time": "2023-05-18T01:19:48.360879Z",
"start_time": "2023-05-18T01:19:48.356253Z"
}
},
"outputs": [
{
"data": {
"text/plain": "'#{1 2 3}'"
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import edn_format as edn\n",
"edn.dumps({1, 2, 3})"
]
},
{
"cell_type": "code",
"execution_count": 18,
"outputs": [
{
"data": {
"text/plain": "[1, True, None]"
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"edn.loads(\"[1 true nil]\")"
],
"metadata": {
"collapsed": false,
"ExecuteTime": {
"end_time": "2023-05-18T01:19:48.372717Z",
"start_time": "2023-05-18T01:19:48.362765Z"
}
}
},
{
"cell_type": "code",
"execution_count": 19,
"outputs": [
{
"data": {
"text/plain": "(\"It isn't all bad I must say, I got to learn words like dinkum. What's interesting about dinkum is it first appeared in australia, but it's sometimes used in england too!\",\n \"I just had a lovely stretch, it's around 6am right now. Those didn't have anything to do with each other, I just thought I'd mention it.\",\n \"I'm currently on episode seventeen of the fifth season.\",\n \"I'll probably write stuff about that in the actual dissertation, so if you're bored and snooping at what I used as the samples then you already know all this.\\n Unless I forgot. In which case, sorry about that.\",\n \"Omg, I'm like, so sorry. I didn't consider if whoever is reading this thinks making a robot girlfriend is actually rad! They may think it's super rad!\",\n 'Hello, this is a normal, boring sentence',\n \"Willow is onto Dawn, they're figuring out she's trying to bring her mother back.\",\n 'Dude, this is super lame, I suck at writing all american like.',\n \"Oh my bogan day, I've only done ten! This is so boring! I'm just plopping in an aussie word like bogan here and there, it sucks!\",\n \"I'd like to note here that this wee bit of borderline racism is just me making it easier to define as scottish.\",\n \"I'm not going to say what nork means here, but I will say that it's australian, and excellent.\",\n 'I feel like putting on a kilt and going down to the loch. The fookin word was reputation!',\n \"I don't know why I thought making up all my own sentences was a good idea. This is going to be a lot of work\",\n 'I put some salt on it, which is an incredibly basic wee bit of seasoning...',\n \"I haven't had haggis in a while, I've been having lots of ramen though.\",\n \"Although I do like to write things. I'm even considering writing a book. By considering I mean planning.\",\n \"Why do they have to do that, when I see people sad it makes me sad, I want to enjoy my viewing experience.\\n Also it's time for a different dialect now I suppose. I think I'll do american first, channel my inner valley girl.\",\n \"Accents are fookin hard! Especially when they're written and not fooken spoken like they should be.\",\n \"This isn't so bad. I imagine that things will be harder when I'm trying to write in other dialects.\",\n 'Oh no! The loser dude that made the Buffy-bot made her a slaying Buffy-bot!',\n \"This is why I put this off for so long, it's mind-numbingly boring.\",\n \"Ok now he's jumping around and shaking a gourd like a heckin loser.\",\n 'Oh wait, I have five more standard sample to make, sorry.',\n \"I'm not sure I should be writing all my thoughts out like this, considering I'm planning to make this available to the public.\\n Perhaps it's a bad idea, but I find that it's easier to write my thoughts directly, rather than thinking about it.\\n Such a bother, all that thinking.\",\n 'I had a frozen pizza earlier, not haggis, sadly. It was really nice, wee ham and pineapple.',\n \"I went there after my surgery, when I was basically a bludger. I visited family there, and btw bludger is an australian word for someone who doesn't pull their weight.\",\n 'Boring work.',\n \"Why in the name of haggis, did you pick dialects you daft lad. It's clearly a wee bit harder than if you chose languages.\",\n 'I like the word yakka, it makes me think of appa from avatar legend of aang.',\n 'And yes, fook is me writing fuck with a scottish accent...',\n 'I like blood pudding too, but I prefer haggis.',\n 'Buffy and Dawns mother just died. Naturally Dawn is now planning to resurrect her, the fool.',\n \"I think I'll do scottish next, it's super rad, I'll channel my inner... I don't heckin know, dwarf from lord of the rings?\",\n \"Fuck I'm so mentally worn out, why did I procrastinate so much, I'm such a bludger.\",\n \"I did this to myself, I'm a dummy, a heckin dummy. I'm not even sure what this heckin is now.\",\n \"Willow seems very suspicious. As in she's the suspicious one, not that she's suspicious of Dawn. Suspicious is a dumb word.\\n Why is the form of suspicious the same for both being the suspected and the suspecting.\",\n \"This is like, so super not rad, she's gonna hurt someone or something!\",\n \"And yes, I say that heckin is an american word. Why? Who cares! I say heckin is heckin cool. And it's heckin american.\\n Do recall that my like, super cool idea is for this to be consistent, not accurate, and in that I slay.\",\n \"I've switched from watching the super rad Buffy to watching some brainless youtube.\",\n \"Don't worry dude, I won't spill the coffee, your super cool secret is safe with me. I think you're rad even if you are kinda creepy for it...\",\n \"This really wasn't my best idea, was it, my koala loving bogan reader.\",\n \"No worries, I'm sure it'll come back to me in no time, and you can be sure I'll heckin tell ya about it!\",\n \"Most likely she'll hurt Buffys super rad awesome.. I heckin forgot the word but it means the image other people have for you...\",\n \"I'm using, like, a lot of heckin exclamation marks, aren't I? I think it's just the super rad mindset of writing in this dialect.\",\n 'Specifically, yakka comes from the Yagara indigenous language.',\n \"And yes, I do mean that's it's both hard yakka to write and hard yakka to read.\",\n \"Unless you're also a robot, then it's actually super cool! And you're super cool too!\",\n 'Why did I think that a guy like me could write in such a super awesome manner, with words like heck.',\n 'I hope in the future someone finds a small shred of enjoyment at my ramblings, besides that and borderline-useless artificial intelligence work, it has no other use',\n \"Dawn just reverse resurrected her mother. It's all very sad right now, with tears and sobbing.\",\n 'Or maybe bogans.',\n 'And then you went and spent all your time on mindless benality, with nary a bowl of haggis in sight!',\n \"And then I procrastinate this, my major project, until literal days before it's due, like a dumbass yobbo!\",\n \"I'm sure it seems strange, to be bothered by thinking when I do it so much. Perhaps, however, thinking so much is the issue.\",\n \"I'm currently watching Buffy the Vampire Slayer, it's really good.\",\n 'Yes that haggis mention was just to fulfill the obligatory scottishness... ',\n 'Oh my god man, Spike made a Buffy-bot, what a heckin loser dude.',\n 'Also words like yakka, which comes from aboriginal and means work.',\n 'Holy heckin heckballs! I think writing like this is actually giving me a super duper not rad headache! What the heck!',\n \"It's entirely my own fookin fault laddie, I'm aware.\",\n \"At least I think being called a spoon is bad, it's in my lexicon of mild insults for people who irk me, like yobbos.\",\n \"G'day mates, it's aussie time.\",\n \"It's excellence does not mean you want to be called a nork, for instance spoons are excellent, but being called one is not good.\",\n \"But I digress, I'm finally at the end of this cruel and unusual punishment of my own making, nork that I am.\",\n 'Holy heckin heck dude, writing in this accent is gonna give me brain damage.',\n \"I've never understood it, the desire to think poorly of people you don't know, so what if someones a yobbo? A yobbo is still a person.\",\n \"I'll not here, however, that I'm not strictly adhering to any actual established dialects.\\n What's important is internal consistency, meaning as long as the dialects I write, and the predicates I make for them are consistent with each other and themselves, it will be an effective test.\",\n 'Seriously, I procrastinated my exam prep to like a week and a half before the date like a bogan. And it was pushed back a dinkum week!',\n \"Hi, I'm like, so excited to be like, writing these super rad samples right now.\",\n \"I feel so sorry to whomever is reading this, if someone is, it's a lot of yakka.\",\n 'A simple lexical and grammatical analysis is hardly fookin sufficient. Most of accents are in the way the words are spoken, not the fookin words themselves!',\n \"Look laddie, I don't know what the fook you mean, your inner fookin dwarf.\",\n 'I am however channeling a scot right now, and it feels fookin natural to complain.',\n 'Again, all uses of fookin and mentioning of haggis is entirely coincidental, this is my own made up accent, that simply shares similarities with scottish...',\n 'Not often enough, if you ask me of my wee opinion...',\n 'The robot Buffy was, like, too heckin weird.',\n \"It's clearly going to end poorly. When in all of historical human imagination has bringing the dead back to life ended well?\\n Even a little bit well? It's clearly a recipe for disaster.\",\n \"Ok, writing the american was far easier, I have lot's of exposure from the laddies and lassies on television, but how often do you see a scot on telly?\",\n 'Giles just made a joke, it was like rad man. Super awesome.',\n 'Just typing, no story being told, or point being made, simply words for the sake of words.',\n 'It sounds way too cheesy, even for a wee scot. Not saying all scots are wee. In fact I imagine it sounds cheesy for any scot, wee or not.',\n \"I haven't even started my cyber assignment yet, it's due in less than a week from now, I'm such a nork.\",\n \"I can't believe I have five more fookin samples to make for scottish... It may be a wee number, but I also have twenty five more after that in australian!\",\n 'Australians also use the word yobbo, usually for someone who is drunk, or tends to get drunk.',\n \"It's hard! Writing in odd made-up half-accurate dialects, forcing me to use words like bogan. If you're wondering bogan is like redneck but aussie.\",\n \"I'm also only starting this right now, on the seventeenth, two days before it's due...\",\n \"Seriously, my head is hurting, this super sucks. I'm gonna take some heckin paracetamol now and then resume...\",\n \"Personally I really like haggis, it's delicious...\",\n \"This is now the actual final sample. I think I'll go with american next. Have fun and goodbye, for now. I'll ramble in america next.\",\n \"I'm a terrible student. Or maybe I have undiagnosed ADHD or something along those lines. Or both...\\n I should probably see a doctor about that at some point, but I'm too busy with university work right now.\",\n \"Is this even like, american anymore? It's more like a drunken baby having a heckin good time. Maybe not a heckin baby, but still...\",\n 'Not like Buffy the Vampire Slayer slay, but like, you go girl, kind of slay...',\n \"Yes I realise I gave up on the writing australian pretty quickly and I'm just slipping an aussie word, like dinkum, in here and there.\",\n \"Would a scotsman ever actually say that they're going to put on a kilt and go down to the loch?\",\n \"Honestly I don't have any real prejudices against a group of people, like bogans or yobbos or anything.\",\n \"What the fook, I'm not sounding scottish enough laddies and lassies!\",\n 'You could have had a grand time if you chose languages, not this fankle. But no, you had to choose dialects, you numpty.',\n \"I know that all those haggis lovers don't talk like this, or talk about haggis this much.\",\n \"I've been to australia before, I learnt the word bogan there. I pet a koala too, and saw a guy playing the didgeridoo.\",\n 'Only two more left, yay! This cruel and unusual punishment, which is super not rad, will soon end!')"
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"with open(\"sample_data.txt\") as f:\n",
"    data = edn.loads(f.read())\n",
"data"
],
"metadata": {
"collapsed": false,
"ExecuteTime": {
"end_time": "2023-05-18T01:19:48.382104Z",
"start_time": "2023-05-18T01:19:48.374098Z"
}
}
},
{
"cell_type": "code",
"execution_count": 20,
"outputs": [
{
"data": {
"text/plain": "('australian',\n 'standard',\n 'standard',\n 'standard',\n 'american',\n 'standard',\n 'standard',\n 'american',\n 'australian',\n 'scottish',\n 'australian',\n 'scottish',\n 'standard',\n 'scottish',\n 'scottish',\n 'standard',\n 'standard',\n 'scottish',\n 'standard',\n 'american',\n 'standard',\n 'american',\n 'standard',\n 'standard',\n 'scottish',\n 'australian',\n 'standard',\n 'scottish',\n 'australian',\n 'scottish',\n 'scottish',\n 'standard',\n 'american',\n 'australian',\n 'american',\n 'standard',\n 'american',\n 'american',\n 'american',\n 'american',\n 'australian',\n 'american',\n 'american',\n 'american',\n 'australian',\n 'australian',\n 'american',\n 'american',\n 'standard',\n 'standard',\n 'australian',\n 'scottish',\n 'australian',\n 'standard',\n 'standard',\n 'scottish',\n 'american',\n 'australian',\n 'american',\n 'scottish',\n 'australian',\n 'australian',\n 'australian',\n 'australian',\n 'american',\n 'australian',\n 'standard',\n 'australian',\n 'american',\n 'australian',\n 'scottish',\n 'scottish',\n 'scottish',\n 'scottish',\n 'scottish',\n 'american',\n 'standard',\n 'scottish',\n 'american',\n 'standard',\n 'scottish',\n 'australian',\n 'scottish',\n 'australian',\n 'australian',\n 'standard',\n 'american',\n 'scottish',\n 'standard',\n 'standard',\n 'american',\n 'american',\n 'australian',\n 'scottish',\n 'australian',\n 'scottish',\n 'scottish',\n 'scottish',\n 'australian',\n 'american')"
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"with open(\"sample_expected.txt\") as f:\n",
"    target = edn.loads(f.read())\n",
"target"
],
"metadata": {
"collapsed": false,
"ExecuteTime": {
"end_time": "2023-05-18T01:19:48.403396Z",
"start_time": "2023-05-18T01:19:48.384945Z"
}
}
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.6"
}
},
"nbformat": 4,
"nbformat_minor": 0
}
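The notebooks load the sentences and their dialect labels from two separate EDN files and pair them purely by position. Below is a minimal sketch of a pre-training sanity check. It assumes both files have already been parsed into Python sequences (for example with `edn.loads` as above). `summarize_labels` is a hypothetical helper, not part of the project:

```python
from collections import Counter
from typing import Sequence


def summarize_labels(texts: Sequence[str], labels: Sequence[str]) -> Counter:
    """Verify that texts and labels pair up one-to-one, then count samples per label.

    Hypothetical helper for illustration; the notebooks do not define it.
    """
    if len(texts) != len(labels):
        # Positional pairing is only meaningful if both sequences have the same length.
        raise ValueError(
            f"{len(texts)} texts but {len(labels)} labels: the data and label files are misaligned"
        )
    return Counter(labels)


if __name__ == "__main__":
    # Toy inputs only; in the notebook these would be `data` and `target`.
    texts = ("g'day mate", "och aye laddie", "hello there")
    labels = ("australian", "scottish", "standard")
    for label, count in sorted(summarize_labels(texts, labels).items()):
        print(label, count)
```

If the per-class counts turn out uneven, one option is to pass `stratify=target` to `train_test_split`. That keeps the class mix of the 20% hold-out close to that of the full set. The notebooks below do not do this.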
{
"cells": [
{
"cell_type": "code",
"execution_count": 265,
"metadata": {
"collapsed": true,
"ExecuteTime": {
"end_time": "2023-05-18T01:49:55.264368Z",
"start_time": "2023-05-18T01:49:55.259004Z"
}
},
"outputs": [],
"source": [
"import edn_format as edn\n",
"from sklearn.model_selection import train_test_split\n",
"from sklearn.feature_extraction.text import CountVectorizer\n",
"vectorizer = CountVectorizer()\n",
"analyze = vectorizer.build_analyzer()"
]
},
{
"cell_type": "code",
"execution_count": 266,
"outputs": [],
"source": [
"with open(\"sample_data.txt\") as f:\n",
"    data = vectorizer.fit_transform(edn.loads(f.read()))\n",
"with open(\"sample_expected.txt\") as f:\n",
"    target = edn.loads(f.read())"
],
"metadata": {
"collapsed": false,
"ExecuteTime": {
"end_time": "2023-05-18T01:49:55.286323Z",
"start_time": "2023-05-18T01:49:55.267563Z"
}
}
},
{
"cell_type": "code",
"execution_count": 267,
"outputs": [],
"source": [
"X_train, X_test, y_train, y_test = train_test_split(data, target, test_size=0.2, random_state=1999)"
],
"metadata": {
"collapsed": false,
"ExecuteTime": {
"end_time": "2023-05-18T01:49:55.292277Z",
"start_time": "2023-05-18T01:49:55.288051Z"
}
}
},
{
"cell_type": "code",
"execution_count": 268,
"outputs": [],
"source": [
"from sklearn.naive_bayes import GaussianNB"
],
"metadata": {
"collapsed": false,
"ExecuteTime": {
"end_time": "2023-05-18T01:49:55.298594Z",
"start_time": "2023-05-18T01:49:55.295927Z"
}
}
},
{
"cell_type": "code",
"execution_count": 269,
"outputs": [
{
"data": {
"text/plain": "GaussianNB()"
},
"execution_count": 269,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"gnb = GaussianNB()\n",
"gnb.fit(X_train.toarray(), y_train)"
],
"metadata": {
"collapsed": false,
"ExecuteTime": {
"end_time": "2023-05-18T01:49:55.315007Z",
"start_time": "2023-05-18T01:49:55.303883Z"
}
}
},
{
"cell_type": "code",
"execution_count": 270,
"outputs": [],
"source": [
"predicted = gnb.predict(X_test.toarray())\n",
"expected = y_test"
],
"metadata": {
"collapsed": false,
"ExecuteTime": {
"end_time": "2023-05-18T01:49:55.330297Z",
"start_time": "2023-05-18T01:49:55.311459Z"
}
}
},
{
"cell_type": "code",
"execution_count": 271,
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"20.00%\n"
]
}
],
"source": [
"wrong = [(p, e) for (p, e) in zip(predicted, expected) if p != e]\n",
"print(f'{(len(expected) - len(wrong)) / len(expected):.2%}')"
],
"metadata": {
"collapsed": false,
"ExecuteTime": {
"end_time": "2023-05-18T01:49:55.357313Z",
"start_time": "2023-05-18T01:49:55.326780Z"
}
}
},
{
"cell_type": "code",
"execution_count": 272,
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"20.00%\n"
]
}
],
"source": [
"print(f'{gnb.score(X_test.toarray(), y_test):.2%}')"
],
"metadata": {
"collapsed": false,
"ExecuteTime": {
"end_time": "2023-05-18T01:49:55.362376Z",
"start_time": "2023-05-18T01:49:55.341241Z"
}
}
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.6"
}
},
"nbformat": 4,
"nbformat_minor": 0
}
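The GaussianNB notebook has two issues worth noting:

- `CountVectorizer` is fitted on all sentences before `train_test_split`, so words that occur only in the test sentences still become feature columns.
- `GaussianNB` treats each word count as a normally distributed value and needs the `.toarray()` conversion to dense input.

Below is a minimal sketch of one alternative. It swaps in `MultinomialNB`, scikit-learn's naive Bayes variant aimed at count features, which accepts sparse matrices. It also wraps the classifier in a pipeline so the vocabulary is learned from the training texts only. This is not the author's method, and the function name is hypothetical. It has not been run against `sample_data.txt`, so no claim is made about its accuracy relative to the 20.00% recorded above:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline


def build_dialect_nb():
    """Bag-of-words counts feeding a multinomial naive Bayes classifier.

    Hypothetical helper. Because the vectorizer lives inside the pipeline,
    calling fit() learns the vocabulary from the training texts only.
    """
    return make_pipeline(CountVectorizer(), MultinomialNB())


if __name__ == "__main__":
    # Toy corpus for illustration; these are not the project's samples.
    train_texts = [
        "och laddie the haggis was fookin grand",
        "wee lassie down by the loch",
        "g'day mate that bogan is a bludger",
        "hard yakka for a dinkum bogan",
    ]
    train_labels = ["scottish", "scottish", "australian", "australian"]

    model = build_dialect_nb().fit(train_texts, train_labels)
    # "bit" and "of" never appeared in training, so the vectorizer ignores them.
    print(model.predict(["a wee bit of haggis", "dinkum bogan yakka"]))
```

With this design, the split would be done on the raw sentences (`train_test_split(texts, target, ...)`), and `model.fit(X_train, y_train)` would handle vectorizing internally.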
{
"cells": [
{
"cell_type": "code",
"execution_count": 257,
"metadata": {
"collapsed": true,
"ExecuteTime": {
"end_time": "2023-05-18T01:49:38.014295Z",
"start_time": "2023-05-18T01:49:38.007387Z"
}
},
"outputs": [],
"source": [
"import edn_format as edn\n",
"from sklearn.model_selection import train_test_split\n",
"from sklearn.feature_extraction.text import CountVectorizer\n",
"vectorizer = CountVectorizer()\n",
"analyze = vectorizer.build_analyzer()"
]
},
{
"cell_type": "code",
"execution_count": 258,
"outputs": [],
"source": [
"with open(\"sample_data.txt\") as f:\n",
"    data = vectorizer.fit_transform(edn.loads(f.read()))\n",
"with open(\"sample_expected.txt\") as f:\n",
"    target = edn.loads(f.read())"
],
"metadata": {
"collapsed": false,
"ExecuteTime": {
"end_time": "2023-05-18T01:49:38.035222Z",
"start_time": "2023-05-18T01:49:38.012819Z"
}
}
},
{
"cell_type": "code",
"execution_count": 259,
"outputs": [],
"source": [
"X_train, X_test, y_train, y_test = train_test_split(data, target, test_size=0.2, random_state=1999)"
],
"metadata": {
"collapsed": false,
"ExecuteTime": {
"end_time": "2023-05-18T01:49:38.048317Z",
"start_time": "2023-05-18T01:49:38.038940Z"
}
}
},
{
"cell_type": "code",
"execution_count": 260,
"outputs": [],
"source": [
"from sklearn.neighbors import KNeighborsClassifier"
],
"metadata": {
"collapsed": false,
"ExecuteTime": {
"end_time": "2023-05-18T01:49:38.053101Z",
"start_time": "2023-05-18T01:49:38.046545Z"
}
}
},
{
"cell_type": "code",
"execution_count": 261,
"outputs": [
{
"data": {
"text/plain": "KNeighborsClassifier()"
},
"execution_count": 261,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"knc = KNeighborsClassifier()\n",
"knc.fit(X_train, y_train)"
],
"metadata": {
"collapsed": false,
"ExecuteTime": {
"end_time": "2023-05-18T01:49:38.074733Z",
"start_time": "2023-05-18T01:49:38.058388Z"
}
}
},
{
"cell_type": "code",
"execution_count": 262,
"outputs": [],
"source": [
"predicted = knc.predict(X_test)\n",
"expected = y_test"
],
"metadata": {
"collapsed": false,
"ExecuteTime": {
"end_time": "2023-05-18T01:49:38.086974Z",
"start_time": "2023-05-18T01:49:38.072132Z"
}
}
},
{
"cell_type": "code",
"execution_count": 263,
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"65.00%\n"
]
}
],
"source": [
"wrong = [(p, e) for (p, e) in zip(predicted, expected) if p != e]\n",
"print(f'{(len(expected) - len(wrong)) / len(expected):.2%}')"
],
"metadata": {
"collapsed": false,
"ExecuteTime": {
"end_time": "2023-05-18T01:49:38.087358Z",
"start_time": "2023-05-18T01:49:38.077370Z"
}
}
},
{
"cell_type": "code",
"execution_count": 264,
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"65.00%\n"
]
}
],
"source": [
"print(f'{knc.score(X_test, y_test):.2%}')"
],
"metadata": {
"collapsed": false,
"ExecuteTime": {
"end_time": "2023-05-18T01:49:38.090517Z",
"start_time": "2023-05-18T01:49:38.085248Z"
}
}
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.6"
}
},
"nbformat": 4,
"nbformat_minor": 0
}
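Both evaluation cells compute the same quantity. Calling `score` on a scikit-learn classifier returns mean accuracy, which is exactly what the `wrong` list computes by hand.

The split arithmetic explains the round-looking results:

- The recorded target tuple lists 100 labels.
- For a float `test_size`, scikit-learn rounds the test-set size up, so `test_size=0.2` should leave 20 test sentences.
- Each test sentence then moves accuracy by 5 percentage points.

On that basis, the recorded 20.00% (GaussianNB) corresponds to 4 of 20 correct and 65.00% (KNeighborsClassifier) to 13 of 20. Below is a stdlib-only sketch of that arithmetic; both function names are hypothetical:

```python
import math
from typing import Hashable, Sequence


def accuracy(predicted: Sequence[Hashable], expected: Sequence[Hashable]) -> float:
    """Fraction of positions where the prediction equals the expected label.

    Same value as the notebooks' `wrong`-list computation and as a
    classifier's `score` method (mean accuracy).
    """
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        raise ValueError("accuracy is undefined for an empty test set")
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct / len(expected)


def holdout_size(n_samples: int, test_size: float) -> int:
    """Test-set size for a float test_size, rounded up as train_test_split does."""
    return math.ceil(test_size * n_samples)


if __name__ == "__main__":
    n_test = holdout_size(100, 0.2)
    print(n_test)
    # Toy labels: 13 correct out of n_test, the ratio behind the 65.00% above.
    expected = ["scottish"] * n_test
    predicted = ["scottish"] * 13 + ["standard"] * (n_test - 13)
    print(f"{accuracy(predicted, expected):.2%}")
```

With only 20 test sentences, a different `random_state` could plausibly shift both figures by several points. Cross-validation (for example `cross_val_score` from `sklearn.model_selection`) may give a steadier comparison between the three classifiers.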
{
"cells": [
{
"cell_type": "code",
"execution_count": 366,
"metadata": {
"collapsed": true,
"ExecuteTime": {
"end_time": "2023-05-18T01:53:55.666452Z",
"start_time": "2023-05-18T01:53:55.643356Z"
}
},
"outputs": [],
"source": [
"import edn_format as edn\n",
"from sklearn.model_selection import train_test_split\n",
"from sklearn.feature_extraction.text import CountVectorizer\n",
"vectorizer = CountVectorizer()\n",
"analyze = vectorizer.build_analyzer()"
]
},
{
"cell_type": "code",
"execution_count": 367,
"outputs": [],
"source": [
"with open(\"sample_data.txt\") as f:\n",
"    data = vectorizer.fit_transform(edn.loads(f.read()))\n",
"with open(\"sample_expected.txt\") as f:\n",
"    target = edn.loads(f.read())"
],
"metadata": {
"collapsed": false,
"ExecuteTime": {
"end_time": "2023-05-18T01:53:55.696031Z",
"start_time": "2023-05-18T01:53:55.660318Z"
}
}
},
{
"cell_type": "code",
"execution_count": 368,
"outputs": [],
"source": [
"X_train, X_test, y_train, y_test = train_test_split(data, target, test_size=0.2, random_state=1999)"
],
"metadata": {
"collapsed": false,
"ExecuteTime": {
"end_time": "2023-05-18T01:53:55.703863Z",
"start_time": "2023-05-18T01:53:55.699895Z"
}
}
},
{
"cell_type": "code",
"execution_count": 369,
"outputs": [],
"source": [
"from sklearn.svm import SVC"
],
"metadata": {
"collapsed": false,
"ExecuteTime": {
"end_time": "2023-05-18T01:53:55.709149Z",
"start_time": "2023-05-18T01:53:55.705840Z"
}
}
},
{
"cell_type": "code",
"execution_count": 370,
"outputs": [
{
"data": {
"text/plain": "SVC()"
},
"execution_count": 370,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"svc = SVC(gamma='scale')\n",
"svc.fit(X_train, y_train)"
],
"metadata": {
"collapsed": false,
"ExecuteTime": {
"end_time": "2023-05-18T01:53:55.729099Z",
"start_time": "2023-05-18T01:53:55.719679Z"
}
}
},
{
"cell_type": "code",
"execution_count": 371,
"outputs": [],
"source": [
"predicted = svc.predict(X_test)\n",
"expected = y_test"
],
"metadata": {
"collapsed": false,
"ExecuteTime": {
"end_time": "2023-05-18T01:53:55.733844Z",
"start_time": "2023-05-18T01:53:55.730968Z"
}
}
},
{
"cell_type": "code",
"execution_count": 372,
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"35.00%\n"
]
}
],
"source": [
"wrong = [(p, e) for p, e in zip(predicted, expected) if p != e]\n",
"print(f'{(len(expected) - len(wrong)) / len(expected):.2%}')"
],
"metadata": {
"collapsed": false,
"ExecuteTime": {
"end_time": "2023-05-18T01:53:55.738217Z",
"start_time": "2023-05-18T01:53:55.735115Z"
}
}
},
{
"cell_type": "code",
"execution_count": 373,
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 35.00%\n"
]
}
],
"source": [
"print(f'{svc.score(X_test, y_test): .2%}')"
],
"metadata": {
"collapsed": false,
"ExecuteTime": {
"end_time": "2023-05-18T01:53:55.744380Z",
"start_time": "2023-05-18T01:53:55.740566Z"
}
}
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.6"
}
},
"nbformat": 4,
"nbformat_minor": 0
}
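In these notebooks `vectorizer.fit_transform` runs on the whole corpus before `train_test_split`, so the vocabulary (the feature columns) also contains words that appear only in test sentences. A common leak-free ordering is to split the raw sentences first, call `fit_transform` on the training sentences, and call only `transform` on the test sentences; tokens never seen in training are then simply ignored. The sketch below illustrates that ordering in pure Python, assuming scikit-learn's default `CountVectorizer` behaviour (lowercasing, the token pattern `(?u)\b\w\w+\b`, and alphabetically sorted feature columns). The helpers `tokenize`, `build_vocabulary`, and `vectorize`, as well as the toy sentences, are hypothetical. Whether this ordering changes the 35% and 65% figures reported above has not been checked.

```python
import re
from collections import Counter

# Assumed to match CountVectorizer's defaults: lowercase the text, then keep
# runs of two or more word characters (so "I" and "a" are dropped).
TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")


def tokenize(text):
    return TOKEN_PATTERN.findall(text.lower())


def build_vocabulary(texts):
    """Map each token seen in `texts` to a column index, sorted alphabetically."""
    tokens = sorted({tok for text in texts for tok in tokenize(text)})
    return {tok: i for i, tok in enumerate(tokens)}


def vectorize(texts, vocabulary):
    """Bag-of-words count rows; tokens missing from `vocabulary` are ignored."""
    rows = []
    for text in texts:
        counts = Counter(tok for tok in tokenize(text) if tok in vocabulary)
        row = [0] * len(vocabulary)
        for tok, n in counts.items():
            row[vocabulary[tok]] = n
        rows.append(row)
    return rows


# Hypothetical toy split: split the raw sentences first ...
train_texts = ["I got to learn words like dinkum", "Going down to the loch"]
test_texts = ["A wee bit of haggis down by the loch"]

# ... then fit the vocabulary on the training sentences only.
vocab = build_vocabulary(train_texts)
X_train = vectorize(train_texts, vocab)
X_test = vectorize(test_texts, vocab)  # "wee", "bit", "of", "haggis", "by" are unseen
```

In scikit-learn terms this corresponds to `vectorizer.fit_transform(train_texts)` followed by `vectorizer.transform(test_texts)`; since `train_test_split` also accepts plain lists, the raw sentences and labels can be split before any vectorizing happens.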