https://github.com/josescuderoh/TopMine-R-wrapper
As a beginner who wanted to try the TopMine algorithm in R, I found that TopMine-R-wrapper is the obvious first choice. However, there are some problems in running the algorithm out of the box.
This is my personal experience of how to run it successfully in 2022. Maybe it can help someone avoid being stuck for a long time.
Search online for the keywords “install MALLET Windows” to get more detail.
1.1.1.1 JAVA
1.1.1.2 MALLET
MALLET is a Java-based package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text.
Unzip to C:/ and rename the mallet-xxx folder to mallet.
I use version 2.0.8.
1.1.1.3 Apache Ant
Apache Ant is a Java library and command-line tool that helps build software.
Unzip to C:/Program Files
I use version 1.10.12.
Computer\Control Panel\System and Security\System\Advanced system settings
System Properties\Environment Variables
1.1.2.1 JAVA
– New add:
variable name: JAVA_HOME
variable value: C:\Program Files\Java\jdk-19
note: JAVA_HOME conventionally points to the JDK root folder; the bin subfolder goes on Path
– Path
> Edit > New add
C:\Program Files\Java\jdk-19\bin
note: replace jdk-xx with your local version
1.1.2.2 Mallet
– New add:
variable name: MALLET_HOME
variable value: C:\mallet
– Path
> Edit > New add
%MALLET_HOME%\bin
– classpath
> Edit > New add
%MALLET_HOME%\class;%MALLET_HOME%\lib;%MALLET_HOME%\lib\mallet-deps.jar
note: if there is no classpath variable, create one.
1.1.2.3 Apache Ant
– New add:
variable name: ANT_HOME
variable value: C:\Program Files\apache-ant
– Path
> Edit > New add
%ANT_HOME%\bin
– classpath
> Edit > New add
%ANT_HOME%\lib
note: the values of classpath are delimited by “;”
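After editing the variables (and opening a new terminal so the changes take effect), a quick check can save some guessing. The following is only a minimal sketch, not part of the wrapper: the helper names `missing_variables` and `missing_tools` are hypothetical, and it assumes the variable names used in the steps above and that `java`, `ant`, and `mallet` should be reachable through Path.

```python
import os
import shutil

# Variable and tool names follow the setup steps above; the helpers are hypothetical.
REQUIRED_VARS = ("JAVA_HOME", "MALLET_HOME", "ANT_HOME")
REQUIRED_TOOLS = ("java", "ant", "mallet")


def missing_variables(env=None):
    """Return the required variable names that are unset or empty in env."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tool names that cannot be found on the current Path."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    print("unset variables:", missing_variables())
    print("not on Path:", missing_tools())
```

If both lists come back empty, the variables are at least visible to new processes; it does not prove that the values themselves are correct.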
Search online for the keywords “install Python Windows” to get more detail.
Download: https://www.python.org/downloads/
I use version 3.10.8.
– Path
> Edit > New add 2 entries
C:\Program Files\Python310\Scripts
C:\Program Files\Python310
note: replace Pythonxxx with your local version
Search online for the keywords “install R Windows” to get more detail.
Some files need to be added to ..\TopMine-R-wrapper\ToPMine\topicalPhrases\bin
The why and how are as follows.
Running it in R throws errors:
java.lang.NoClassDefFoundError: cc/mallet/types/Dirichlet
java.lang.NoClassDefFoundError: cc/mallet/pipe/Pipe
....
java.lang.NoClassDefFoundError: gnu/trove/TObjectIntHashMap
....
java.lang.NoClassDefFoundError: org/tartarus/snowball/ext/englishStemmer
....
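Each of these errors names a class by its path, and the first path segment (cc, gnu, org) is the folder Java expects to find on the class path. As a rough sketch of that reasoning (the function names are hypothetical, and it assumes the missing classes are meant to live directly under the wrapper's bin folder), the required folders can be derived from the error lines like this:

```python
from pathlib import Path

ERROR_PREFIX = "java.lang.NoClassDefFoundError:"


def top_level_packages(error_lines):
    """Collect the first path segment of each class named in an error line."""
    packages = []
    for line in error_lines:
        line = line.strip()
        if not line.startswith(ERROR_PREFIX):
            continue  # skip elided lines such as "...."
        class_path = line[len(ERROR_PREFIX):].strip()
        top = class_path.split("/")[0]
        if top and top not in packages:
            packages.append(top)
    return packages


def missing_folders(bin_dir, packages):
    """Return the package folders that do not exist under bin_dir."""
    bin_dir = Path(bin_dir)
    return [name for name in packages if not (bin_dir / name).is_dir()]
```

Feeding it the errors above yields the three folders that the following steps put into bin.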
compile MALLET with Apache Ant
Type cd c:\mallet, press Enter
Type ant, press Enter
C:\Users\user>cd c:\mallet
c:\mallet>ant
Buildfile: c:\mallet\build.xml
init:
[copy] Copying 1 file to c:\mallet\class
compile:
[javac] c:\mallet\build.xml:62: warning: 'includeantruntime' was not set, defaulting to build.sysclasspath=last; set to false for repeatable builds
BUILD SUCCESSFUL
Total time: 1 second
Copy the cc folder to TopMine-R-wrapper:
from C:\mallet\class\cc
to ..\TopMine-R-wrapper\ToPMine\topicalPhrases\bin
note: in this repo, bin contains the cc folder
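The copy can be done by hand in Explorer; for anyone who prefers to script it, here is a minimal sketch using Python's standard library. The function name `copy_package_folder` is hypothetical, and the paths in the usage note are the ones from this post, which may differ on your machine. `dirs_exist_ok=True` needs Python 3.8 or newer.

```python
import shutil
from pathlib import Path


def copy_package_folder(source_folder, bin_dir):
    """Copy source_folder (e.g. ...\\class\\cc) into bin_dir, keeping its name."""
    source = Path(source_folder)
    target = Path(bin_dir) / source.name
    # dirs_exist_ok lets the copy merge into a folder that is already there.
    shutil.copytree(source, target, dirs_exist_ok=True)
    return target
```

For example, `copy_package_folder(r"C:\mallet\class\cc", bin_path)` where `bin_path` is your own path to ToPMine\topicalPhrases\bin.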
Download: https://trove4j.sourceforge.net/html/overview.html
Select the old release: trove-2.1.0.tar.gz
note: some files or paths have changed in later versions
Uncompress it and copy the gnu folder from inside the jar to TopMine-R-wrapper (a .jar file is a zip archive, so an unzip tool can open it):
from ..\trove-2.1.0\lib\trove-2.1.0.jar\gnu
to ..\TopMine-R-wrapper\ToPMine\topicalPhrases\bin
note: in this repo, bin contains the gnu folder
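Since a .jar file is a zip archive, the gnu folder can also be pulled out with Python's `zipfile` module instead of an unzip tool. This is only a sketch under that assumption; `extract_package` is a hypothetical helper, and the jar and bin paths are whatever they are on your machine.

```python
import zipfile


def extract_package(jar_path, package, bin_dir):
    """Extract only the entries under `package/` from the jar into bin_dir."""
    prefix = package.rstrip("/") + "/"
    with zipfile.ZipFile(jar_path) as jar:
        # Jar entries always use forward slashes, even on Windows.
        members = [name for name in jar.namelist() if name.startswith(prefix)]
        jar.extractall(path=bin_dir, members=members)
    return members
```

Calling it with the path to trove-2.1.0.jar, the package name "gnu", and the wrapper's bin folder leaves everything else in the jar (such as META-INF) untouched.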
Download: http://snowball.tartarus.org/download.html
Compile it, then copy the org folder to TopMine-R-wrapper:
from ..\org
to ..\TopMine-R-wrapper\ToPMine\topicalPhrases\bin
note: in this repo, bin contains the org folder