Major Project Handcrafted
Commit 3c59b3d4, authored May 02, 2023 by Jonathan Poalses
Implemented the word-based dialect detection
parent 8c917db7
Showing 1 changed file with 35 additions and 4 deletions
src/poalses/jonathan/dialect/dialect_nlp.clj
@@ -47,11 +47,11 @@
 ;; Word sets that will show a sentence as being of that dialect
-(def australian-words #{})
+(def australian-words #{"incorrect" "why"})
-(def scottish-words #{})
+(def scottish-words #{"hence"})
-(def american-words #{})
+(def american-words #{"like"})
 ;; Predicate sets to check a sentence and see if it grammatically matches a dialect
...
...
@@ -65,7 +65,29 @@
 ;; Take a sentence and figure out its dialect
 (defn detect-sentence-dialect [sentence]
-  (if (some bad-words (dl/text (dl/tokens sentence))) :bad :good))
+  (let [tokens (dl/tokens sentence)
+        dialects1 (when (some australian-words (dl/text tokens)) [:australian])
+        dialects2 (when (some scottish-words (dl/text tokens)) [:scottish])
+        dialects3 (when (some american-words (dl/text tokens)) [:american])
+        dialects (remove nil? (flatten (conj dialects1 dialects2 dialects3)))]
+    (if (empty? dialects)
+      [:standard]
+      dialects)))
+;; Another failed attempt
+;(defn detect-sentence-dialect [sentence]
+; (let [dialects []
+; tokens (dl/tokens sentence)]
+; (when (some australian-words (dl/text (dl/tokens tokens)))
+; (let [dialects (conj dialects :australian)]
+; (when (some scottish-words (dl/text (dl/tokens tokens)))
+; (let [dialects (conj dialects :scottish)]
+; (when (some american-words (dl/text (dl/tokens tokens)))
+; (let [ dialects (conj dialects :american)]
+; (if (empty? dialects) (conj dialects :standard))
+; dialects))))))))
 ;; Take a text sample and separate it into its sentences, then for each sentence find its dialects, and return the most common dialect
 ;; A sentence can have an indeterminate number of dialects associated with it, as detect-sentence-dialects can return a collection,
...
...
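Review note: the word-set check in `detect-sentence-dialect` repeats the same `when`/`some` pattern once per dialect and then cleans up with `conj`, `flatten` and `remove nil?`. A minimal sketch of one possible alternative follows, pairing each dialect keyword with its word set and filtering in a single pass. The `dialect-words` vector and the `words` helper are hypothetical names introduced here; `words` is only a plain-string stand-in for `(dl/text (dl/tokens sentence))`, and unlike the committed code it lower-cases the input, which is an assumption about the desired matching.

```clojure
(require '[clojure.string :as str])

;; Word sets copied from the commit. Holding them as ordered
;; [dialect word-set] pairs is a hypothetical restructuring.
(def dialect-words
  [[:australian #{"incorrect" "why"}]
   [:scottish   #{"hence"}]
   [:american   #{"like"}]])

;; Hypothetical stand-in for (dl/text (dl/tokens sentence)):
;; lower-cases the sentence and extracts runs of letters and apostrophes.
(defn words [sentence]
  (re-seq #"[a-z']+" (str/lower-case sentence)))

(defn detect-sentence-dialect
  "Returns a vector of every dialect whose word set contains at least one
  token of the sentence, or [:standard] when no set matches."
  [sentence]
  (let [tokens   (words sentence)
        dialects (vec (for [[dialect word-set] dialect-words
                            :when (some word-set tokens)]
                        dialect))]
    (if (empty? dialects)
      [:standard]
      dialects)))
```

With this shape, adding a dialect means adding one pair to `dialect-words` rather than another binding in the `let`.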
@@ -149,6 +171,15 @@
 (def rats (datafy (dl/dependency-graph (nth sentences-one 1))))
+(first (last (sort-by val (frequencies (flatten (map detect-sentence-dialect (dl/sentences (nlp test-sentence-one))))))))
+(first (last (sort-by val (frequencies (flatten (map detect-sentence-dialect (dl/sentences (nlp test-sentence-two))))))))
+(first (last (sort-by val (frequencies (flatten (map detect-sentence-dialect (dl/sentences (nlp test-sentence-three))))))))
+(first (last (sort-by val (frequencies (flatten (map detect-sentence-dialect (dl/sentences (nlp test-sentence-four))))))))
+(first (last (sort-by val (frequencies (flatten (map detect-sentence-dialect (dl/sentences (nlp test-sentence-five))))))))
+(first (last (sort-by val (frequencies (flatten (map detect-sentence-dialect (dl/sentences (nlp test-sentence-six))))))))
 (last (vals rats))
 (.getTarget (first (last (vals rats))))
 (bean (first (last (vals rats))))
...
...
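Review note: the six REPL expressions in the last hunk differ only in the test sentence they pass to `nlp`. Assuming the intent is the one stated in the file's comment (return the most common dialect across a text's sentences), the shared part could be pulled into a helper along these lines. `most-common-dialect` is a hypothetical name, and the sketch takes the per-sentence dialect collections as its argument so that it does not depend on `nlp` or `dl/sentences`.

```clojure
(defn most-common-dialect
  "Given a collection of per-sentence dialect collections, returns the
  dialect keyword that occurs most often, or nil when there are none."
  [dialects-per-sentence]
  (let [counts (frequencies (mapcat identity dialects-per-sentence))]
    (when (seq counts)
      ;; max-key picks the map entry with the highest count.
      (key (apply max-key val counts)))))

;; In the commit's setting the argument would presumably be built as:
;; (map detect-sentence-dialect (dl/sentences (nlp test-sentence-one)))
(most-common-dialect [[:scottish] [:scottish :american] [:standard]])
```

When two dialects are tied for the highest count, which one is returned depends on the iteration order of the frequency map, as it does with the `sort-by`/`last` form in the commit.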