1. Creating the Lucene Index
Suppose you’re asked to add search to an application. You need to do two things:
- Index your data.
- Make your data searchable.
For simplicity, we will assume that your data is text-only. (Other formats can be handled with Apache Tika, which we will cover later.)
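To make the two steps concrete, here is a minimal sketch of a toy inverted index in plain Java. It only illustrates the idea of "index first, search afterwards"; it is not how Lucene is implemented, and the class name TinyInvertedIndex and its methods are made up for this example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Illustrative only: a toy inverted index, not Lucene's actual data structure.
class TinyInvertedIndex {
    // term -> sorted ids of the documents that contain the term
    private final Map<String, Set<Integer>> postings = new TreeMap<>();
    private final List<String> documents = new ArrayList<>();

    // Step 1: index a piece of text and return the id assigned to it.
    int add(String text) {
        int docId = documents.size();
        documents.add(text);
        // Lower-case the text and split it on anything that is not a word character.
        for (String term : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
            }
        }
        return docId;
    }

    // Step 2: search. Returns the ids of all documents containing the term.
    Set<Integer> search(String term) {
        Set<Integer> hits = postings.get(term.toLowerCase(Locale.ROOT));
        return hits == null ? Collections.<Integer>emptySet() : hits;
    }

    public static void main(String[] args) {
        TinyInvertedIndex index = new TinyInvertedIndex();
        index.add("Lucene is a search library");
        index.add("Eclipse is an IDE");
        System.out.println(index.search("is"));     // prints [0, 1]
        System.out.println(index.search("lucene")); // prints [0]
    }
}
```

A real search library adds much more on top of this (analysis of the text, ranking, storage on disk), which is why we use Lucene instead of writing it ourselves.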
We will write a small Java command-line program called Indexer. If you have never programmed in Java before, you should first install an IDE. This tutorial uses Eclipse, which is free and open source.
Please note that the .jar file we are going to create here is already available in the download as Demo.jar. To get a better understanding of how it is created, we will build it ourselves using Eclipse. Also make sure you have installed a recent version of the JDK. You can check the installed version by issuing the following command:
java -version
This prints the version of the Java runtime found on your path.
Now download Eclipse from the Eclipse website. It is available for all major platforms; choose yours and download it.
Once it is installed, open Eclipse. You will see a welcome screen with links to tutorials and some examples. Take your time to read them and import some of the examples.
Now it’s time to get started. Open the Java perspective and, from the main menu, select File > New > Java Project.
Eclipse will ask for a project name. Enter Demo and click Finish.
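Before adding Lucene, you may want to confirm that the new project compiles and runs, and see which Java runtime Eclipse uses (it can differ from the one on your command line). A minimal sketch, assuming you create a class with the hypothetical name JavaVersionCheck in the project's src folder:

```java
// Hypothetical helper class, used only to check that the project runs.
class JavaVersionCheck {
    // Builds a short description of the Java runtime executing this program.
    static String describeRuntime() {
        return "Java " + System.getProperty("java.version")
                + " (" + System.getProperty("java.vendor") + ")";
    }

    public static void main(String[] args) {
        System.out.println(describeRuntime());
    }
}
```

Right-click the class and select Run As > Java Application; the version appears in the Console view.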
Next, add the Lucene libraries to the project. You do this as follows:
- First download Lucene from http://lucene.apache.org/ and unzip the archive to a directory of your choice.
- The archive contains a number of .jar files, documentation and examples.
- Take your time to read the documentation, in particular the part about the demo.
- We will use the demo as our starting point.
- In your Eclipse project, create a new folder called ‘lib’.
- Add the Lucene core .jar file to the ‘lib’ folder. This can be done by dragging and dropping the file.
- Right-click the file in the ‘lib’ folder and select Build Path > Add to Build Path.
- You have now set up the classpath, but there isn’t any code yet.
- To add the code, look in your Lucene directory for a file named ‘IndexFiles.java’ and drag it into the Eclipse ‘src’ folder. A red cross will appear in front of ‘IndexFiles.java’. This is because the file is expected to be inside the package org.apache.lucene.demo. Right-click the file and select Move to package org.apache.lucene.demo. If everything went well, the red cross disappears, indicating that the class now compiles.
- Finally, to see it working, right-click IndexFiles.java and select Run As > Java Application.
- The Console view will open and you should see information on how to use the program.
- To use it as a Java command-line program, you need to export your project to the file system. Select your project in Eclipse, then from the main menu select File > Export and choose the Runnable JAR file format (listed under Java). Make sure to also select the option that includes the referenced .jar libraries in the generated jar.
- Open a command prompt and cd to the directory where you exported the jar file.
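- You can now run the program. As the usage information in the console showed, the demo has to be told where your text files are; in recent Lucene versions this is done with the -docs option (replace the placeholder with your own directory):
java -jar IndexFiles.jar -docs <path-to-your-text-files>
If everything went well, a new directory has appeared in the current directory, that is, the location where you exported the .jar file. This directory contains the index files.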
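If you want a quick check from code that the run produced something, the following sketch looks for Lucene's commit file, whose name starts with "segments", in a directory. The class name IndexDirCheck is hypothetical, and the default directory name "index" is an assumption based on the demo's default; pass your own path as the first argument if it differs. The check only shows that an index exists, not that its contents are what you expect.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: a rough check that a directory holds a Lucene index.
class IndexDirCheck {
    // Returns true if dir is a directory containing at least one
    // entry whose name starts with "segments".
    static boolean looksLikeLuceneIndex(Path dir) throws IOException {
        if (!Files.isDirectory(dir)) {
            return false;
        }
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir, "segments*")) {
            return files.iterator().hasNext();
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the index directory is called "index" unless given as an argument.
        Path dir = Paths.get(args.length > 0 ? args[0] : "index");
        if (looksLikeLuceneIndex(dir)) {
            System.out.println("Found an index in " + dir.toAbsolutePath());
        } else {
            System.out.println("No index found in " + dir.toAbsolutePath());
        }
    }
}
```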
2. Check the Index
To make sure your index is good, we will use a tool that lets us look inside it: a Java program called Luke. It has been around for some time, so it is important to use a version that is compatible with our version of Lucene. For the latest version, check https://github.com/DmitryKey/luke. It comes as an executable .jar file with a Swing GUI. To open it, either double-click the file or execute the command:
java -jar Lukeall.jar
Once Luke has started, navigate to the directory where your index is stored and open it. You can now inspect your index and do some searching.
To proceed with searching, go to part 2 of this series.