Commit 3a155991 authored by Jonathan Poalses

Added DataLinguist

parent f230caf7
# Major Project Handcrafted
The one made by hand, not the one made using ML
## Attribution
[DataLinguist v0.2.171](https://github.com/simongray/datalinguist)
DataLinguist is a Clojure wrapper for the Natural Language Processing behemoth,
[Stanford CoreNLP](https://github.com/stanfordnlp/CoreNLP). The goal of the project is to support an NLP workflow in a data-oriented style,
integrating relevant Clojure protocols and libraries.
Manning, Christopher D., Mihai Surdeanu, John Bauer, Jenny Finkel, Steven J. Bethard, and David McClosky. 2014.
The Stanford CoreNLP Natural Language Processing Toolkit. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pp. 55-60.
[[pdf](http://nlp.stanford.edu/pubs/StanfordCoreNlp2014.pdf)] [[bib](http://nlp.stanford.edu/pubs/StanfordCoreNlp2014.bib)]
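The "data-oriented style" mentioned in the DataLinguist description relies on Clojure protocols such as `clojure.datafy`. The following is a minimal sketch of that idea only, not DataLinguist's own implementation: `Token` is a hypothetical stand-in for an opaque annotation object, and the map keys are chosen purely for illustration.

```clojure
(require '[clojure.core.protocols :as p]
         '[clojure.datafy :refer [datafy]])

;; Hypothetical stand-in for an opaque Java-style annotation object.
;; This type is an assumption for illustration; it is not part of DataLinguist.
(deftype Token [text lemma])

;; Teach `datafy` how to turn a Token into plain Clojure data.
(extend-protocol p/Datafiable
  Token
  (datafy [t]
    {:text  (.-text t)
     :lemma (.-lemma t)}))

;; Once datafied, the object can be inspected with ordinary map functions.
(def token-data (datafy (Token. "wondering" "wonder")))

(:lemma token-data)
```

Values without a custom `Datafiable` implementation are returned unchanged by `datafy`, so calling it on data that is already plain (maps, vectors, strings) is safe.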
......@@ -12,6 +12,10 @@
[:role "developer"]
[:role "maintainer"]]])
:dependencies [[org.clojure/clojure "1.11.1"]
;; DataLinguist
[dk.simongray/datalinguist "0.2.171"]
[edu.stanford.nlp/stanford-corenlp "4.4.0" :classifier "models"]
[edu.stanford.nlp/stanford-corenlp "4.4.0" :classifier "models-english"]
;; tools.namespace
[org.clojure/tools.namespace "1.3.0"]
;; java.data
......
(ns poalses.jonathan.dialect.dialect-nlp
(:require [clojure.datafy :refer [datafy]]
[clojure.pprint] ; loaded explicitly because show-sentences calls clojure.pprint/pprint
[dk.simongray.datalinguist :as dl]
[dk.simongray.datalinguist.triple :refer [triple->datalog]])
(:import [edu.stanford.nlp.coref CorefCoreAnnotations$CorefChainAnnotation]))
(def example
"Great work, thanks for doing this!
Having had no previous experience with NLP libraries, I was wondering why I couldn’t get your examples to work. Then I realized that I had to download CoreNLP first from https://stanfordnlp.github.io/CoreNLP/ and add stanford-corenlp-4.4.0/* to the classpath. Everything worked fine after that.
Is this what you are supposed to do? It wasn’t mentioned in the readme, so I was wondering if I did something wrong here or if it is more obvious to people who have already worked with CoreNLP.")
(def nlp
(dl/->pipeline {:annotators ["truecase"
"quote"
"entitymentions"
"parse"
"depparse"
"lemma"
"coref"
"openie"
"ner"]
:quote {:extractUnclosedQuotes "true"}}))
(def annotated-example
(delay (nlp example)))
(def sentences
(delay (dl/sentences @annotated-example)))
(defn show-sentences []
(clojure.pprint/pprint (map dl/mentions @sentences)))
(comment
;; Test every annotator in the pipeline
(map dl/true-case @sentences)
(map dl/quotations @sentences)
(map dl/mentions @sentences)
(map dl/constituency-tree @sentences)
(map dl/dependency-graph @sentences)
(map dl/lemma @sentences)
(map dl/mentions @sentences)
(->> (mapcat dl/triples @sentences) (map triple->datalog))
(dl/annotation CorefCoreAnnotations$CorefChainAnnotation @annotated-example)
(show-sentences)
;; Datafy the annotations. Retrieves direct annotations for every sentence.
;; Keep in mind that `dl/recur-datafy` currently doesn't work in this instance
;; and will possibly be removed in a future update:
;; https://github.com/simongray/datalinguist/issues/13
(map datafy @sentences)
#_.)
\ No newline at end of file
......@@ -3,6 +3,7 @@
{:author "Jonathan Poalses"}
(:require [clojure.string :as string]
[clojure.tools.cli :as cli]
[poalses.jonathan.dialect.dialect-nlp :as nlp]
[taoensso.timbre :as log])
(:gen-class))
......@@ -70,6 +71,7 @@
(let [shutdown-trigger (promise)
_bye-testing-hack (future (Thread/sleep 6000) (deliver shutdown-trigger true))]
(log/info "Dialect Detector started up.")
(nlp/show-sentences)
@shutdown-trigger
(log/info "Dialect Detector shutting down..."))
(catch Exception e
......@@ -84,11 +86,53 @@
;;==============================================================================
(comment
(def test-vector [122 "blah blah blah" "more" 777])
test-vector
(count test-vector)
(def test-list '(122 "blah blah blah" "more" 777))
test-list
(count test-list)
(conj test-vector (first test-list))
(conj test-vector (vec test-list))
(take 21 (cycle test-vector))
(into test-vector (vec (take 15 (repeatedly #(rand-int 10000)))))
(def another-vector
(delay [18746 "gjhsdfgh" 5857 58923]))
another-vector
@another-vector
(into test-vector test-list)
(conj test-vector 42)
[test-list]
(vec test-list)
(conj test-list 42)
*e)
;
; HissyBots REPL Service.
; Dialect Detector.
;
; Copyright (c) 2023, Hissycode Ltd. All rights reserved.
; Copyright (c) 2023, Jonathan Poalses.
;