In the name of Allah, and may peace and blessings be upon the Messenger of Allah.
O Allah, we ask You for pardon and well-being in this world and the Hereafter.
Introduction
Around a year ago, Allah blessed me with the opportunity to volunteer as a developer for the “Rasaif” project.
What is Rasaif?
Rasaif Al-Sihah Li Tarajim Al-Fusah is a platform that displays words in their context alongside their English equivalents. This remarkable project is distinguished by its selection of eloquent books from classical authors. It was initiated by the Saudi translator Ahmad Al-Ghamdi, author of Al-Aranjiyah, in collaboration with dedicated volunteers—may Allah accept their efforts. You can watch this video to learn more about the initiative.
Challenges
- Sustaining Volunteer Efforts
  Ensuring the continuity of volunteers working on formatting books using Word.
- Displaying Search Results in Context
  The ability to show search results within their context, along with their equivalent text in the other language.
- Comprehensive and Fast Search Feature
  Providing an instant search functionality that spans all books, similar to Al-Maktabah Al-Shamilah.
- Streamlined Book Upload
  Enabling the upload of a fully formatted book in one step, rather than paragraph by paragraph as before.
- Customizable Search Options
  Allowing search customization by:
  - Book
  - Category
  - Author
  - Language
- Sequential Result Navigation
  Allowing users to navigate to previous and next results within the same book.
- Search Analytics
  Tracking the words searched for by visitors, their search frequency, and whether they were found in the books.
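To make the customizable search options above more concrete, here is a minimal sketch of how they might be turned into an Elasticsearch request body. The field names (`content`, `book`, `category`, `author`, `language`) and the function name `build_search_query` are hypothetical, since the project's actual index mapping is not shown here, and the `term` filters assume those fields are indexed as exact-match keywords.

```python
from typing import Optional


def build_search_query(
    text: str,
    book: Optional[str] = None,
    category: Optional[str] = None,
    author: Optional[str] = None,
    language: Optional[str] = None,
) -> dict:
    """Build an Elasticsearch request body for a filtered full-text search."""
    # Hypothetical field names; the real index mapping may differ.
    candidate_filters = {
        "book": book,
        "category": category,
        "author": author,
        "language": language,
    }
    # Only the options the visitor actually selected become filters.
    filters = [
        {"term": {field: value}}
        for field, value in candidate_filters.items()
        if value is not None
    ]
    return {
        "query": {
            "bool": {
                # Full-text match on the paragraph text.
                "must": [{"match": {"content": text}}],
                # Exact-match restrictions that do not affect scoring.
                "filter": filters,
            }
        },
        # Ask Elasticsearch to return the matching fragments for display.
        "highlight": {"fields": {"content": {}}},
    }
```

Calling `build_search_query("word", author="some-author", language="ar")` would restrict the search to one author and one language while leaving book and category open.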
Tools Used
- Pandoc
  For converting Word files into HTML format.
- Jupyter Notebook
  To process HTML files, refine texts using the Pandas library, and then convert the output into NDJSON (newline-delimited JSON) format for Elasticsearch.
- Elasticsearch
  For advanced text search capabilities.
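As an illustration of the newline-delimited JSON output mentioned above, the sketch below builds a body in the shape the Elasticsearch bulk API accepts: one action line followed by one document line per record. The function names and the example index name are assumptions made for illustration, not the project's actual code.

```python
import json
from typing import Iterable, Iterator


def to_bulk_ndjson(records: Iterable[dict], index_name: str) -> Iterator[str]:
    """Yield the lines of an Elasticsearch bulk request body."""
    for record in records:
        # Action line: names the index that receives the next document.
        yield json.dumps({"index": {"_index": index_name}}, ensure_ascii=False)
        # Source line: the document itself. ensure_ascii=False keeps
        # Arabic text readable instead of escaping it as \uXXXX.
        yield json.dumps(record, ensure_ascii=False)


def render_bulk_body(records: Iterable[dict], index_name: str) -> str:
    """Join the lines into one body; the bulk API expects a final newline."""
    return "\n".join(to_bulk_ndjson(records, index_name)) + "\n"
```

Each paragraph of a book would become one record, so a whole formatted book can be sent to the index in a single request rather than paragraph by paragraph.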
Steps
1- Using Pandoc for file conversion:
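A minimal sketch of this conversion, driven from Python so that it can run inside the same notebook as the later steps. The `-f`, `-t`, and `-o` options are standard Pandoc flags; the function names and file paths are placeholders chosen for illustration.

```python
import shutil
import subprocess


def pandoc_command(docx_path: str, html_path: str) -> list:
    """Build the Pandoc invocation that converts a Word file to HTML."""
    # -f selects the input format, -t the output format, -o the output file.
    return ["pandoc", "-f", "docx", "-t", "html", "-o", html_path, docx_path]


def convert_docx_to_html(docx_path: str, html_path: str) -> None:
    """Run Pandoc, failing loudly if it is missing or the conversion fails."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc is not installed or not on PATH")
    # check=True raises CalledProcessError on a non-zero exit status.
    subprocess.run(pandoc_command(docx_path, html_path), check=True)
```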
2- Importing essential libraries in Jupyter Notebook
3- Declaring Variables
4- Converting HTML to CSV and then loading it into a DataFrame using Pandas.
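One way this step could look, as a sketch that uses only the standard library: it collects the text of each `<p>` element from the Pandoc output and writes one CSV row per paragraph. The column names `paragraph_id` and `text` are assumptions, and real book files may need more than paragraph tags handled.

```python
import csv
import io
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collect the text of every <p> element, in document order."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._buffer = None  # None while the parser is outside a <p> element

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._buffer = []

    def handle_data(self, data):
        # Text inside nested inline tags (e.g. <b>) is kept as well.
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "p" and self._buffer is not None:
            self.paragraphs.append("".join(self._buffer))
            self._buffer = None


def html_to_csv(html_text: str) -> str:
    """Return CSV text with one numbered row per paragraph."""
    parser = ParagraphExtractor()
    parser.feed(html_text)
    parser.close()
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["paragraph_id", "text"])  # hypothetical column names
    for number, text in enumerate(parser.paragraphs, start=1):
        writer.writerow([number, text])
    return out.getvalue()
```

The resulting CSV text can then be loaded into a DataFrame with `pandas.read_csv(io.StringIO(csv_text))`, or written to disk first and read from the file path.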
5- Data cleaning and modification operations.
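The kind of cleaning involved might resemble the sketch below. It is written under stated assumptions rather than taken from the project: it collapses whitespace, drops empty paragraphs, and adds a diacritic-free copy of the text, on the guess that searching Arabic text benefits from ignoring tashkeel. The `text` and `text_plain` keys are hypothetical.

```python
import re

# Arabic short-vowel marks (fathatan through sukun) plus superscript alef.
_DIACRITICS = re.compile("[\u064B-\u0652\u0670]")
_WHITESPACE = re.compile(r"\s+")


def clean_text(text: str) -> str:
    """Collapse runs of whitespace and trim both ends."""
    return _WHITESPACE.sub(" ", text).strip()


def strip_diacritics(text: str) -> str:
    """Remove Arabic diacritics so vocalized and plain spellings compare equal."""
    return _DIACRITICS.sub("", text)


def clean_rows(rows: list) -> list:
    """Clean each row's text and skip rows that end up empty."""
    cleaned = []
    for row in rows:
        text = clean_text(row["text"])
        if not text:
            continue  # paragraph held only whitespace
        cleaned.append({**row, "text": text, "text_plain": strip_diacritics(text)})
    return cleaned
```

With Pandas, the same functions can be applied to a column, for example `df["text"].map(clean_text)`, before the rows are exported.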