Commit 290bfaed authored by Jonathan Poalses

Improved detect-sample-dialect so that it can take an indeterminate number of keys from detect-sentence-dialect, allowing a sentence to be counted under multiple dialects
parent 0e68f91a
@@ -42,18 +42,30 @@
"ner"]
:quote {:extractUnclosedQuotes "true"}}))
;; Take a sentence and figure out its dialect
(def bad-words #{"why" "cause"})
(def australian-words #{})
(def scottish-words #{})
(def american-words #{})
;; Take a sentence and figure out its dialect
(defn detect-sentence-dialect [sentence]
(if (some bad-words (dl/text (dl/tokens sentence))) :bad :good))
(some bad-words '("why" "not-why"))
;; Take a text sample and separate it into its sentences, then for each sentence find its dialect, and return the most common dialect
;; Take a text sample and separate it into its sentences, then for each sentence find its dialects, and return the most common dialect.
;; A sentence can have an indeterminate number of dialects associated with it, as detect-sentence-dialect can return a collection;
;; when no dialect can be detected it defaults to standard. (E.g. if there's a sample with 3 sentences, where one reads as scottish,
;; one reads as scottish and australian, and the last reads as nothing, it will return a collection containing 2 scottish keys,
;; one australian key, and one standard key, meaning it would be seen as a scottish sample.)
(defn detect-sample-dialect [sample]
(first (last (sort-by val (frequencies (map detect-sentence-dialect (dl/sentences (nlp sample))))))))
(first (last (sort-by val (frequencies (flatten (map detect-sentence-dialect (dl/sentences (nlp sample)))))))))
(def annotated-example
(delay (nlp example)))
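The counting behaviour described in the comment on detect-sample-dialect can be tried without the NLP pipeline. The following is a minimal sketch, not the code from this commit: sentences are assumed to be plain vectors of lower-case token strings rather than annotated sentences, the word lists are invented for illustration (the commit leaves those sets empty), and the plural name detect-sentence-dialects is hypothetical. It also swaps flatten for mapcat and the sort-by/last chain for max-key, which is a different technique from the one in the diff.

```clojure
;; Hypothetical word lists, made up for this sketch only;
;; the commit defines these sets but leaves them empty.
(def scottish-words #{"wee" "aye"})
(def australian-words #{"arvo" "mate"})

(defn detect-sentence-dialects
  "Assumed shape: takes a seq of lower-case token strings and returns a
  vector of dialect keys, or [:standard] when no word list matches."
  [tokens]
  (let [hits (cond-> []
               (some scottish-words tokens)   (conj :scottish)
               (some australian-words tokens) (conj :australian))]
    (if (empty? hits) [:standard] hits)))

(defn detect-sample-dialect
  "Tallies every dialect key across all sentences and returns the most
  frequent one. Assumes a non-empty sample."
  [sentences]
  (->> sentences
       (mapcat detect-sentence-dialects) ; one flat seq of keys
       frequencies                       ; key -> count
       (apply max-key val)               ; map entry with the highest count
       key))

;; The three-sentence example from the comment:
;; scottish / scottish + australian / nothing detected.
(def sample
  [["aye" "it" "is"]
   ["a" "wee" "swim" "this" "arvo"]
   ["nothing" "special" "here"]])

(frequencies (mapcat detect-sentence-dialects sample))
;; => {:scottish 2, :australian 1, :standard 1}

(detect-sample-dialect sample)
;; => :scottish
```

One design note on the swap: flatten only unpacks sequential collections, so a set of dialect keys would pass through it as a single item, whereas mapcat concatenates any seqable result. The trade-off is that mapcat needs every sentence to yield a collection, which is why the sketch always returns a vector, even for the :standard default.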