Word Embeddings Based on Spectral Analysis: A Novel Approach

Abstract

Recently, deep learning algorithms have attracted enormous attention. However, they are not the optimal solution for many tasks. Spectral analysis transforms, such as the wavelet transform and the Fourier transform, have been applied successfully to many NLP tasks. The main challenge in using spectral analysis is how to construct a meaningful signal from text. In word2vec models, different types of neural networks are used to learn vector representations of words that capture each word's semantic similarities within a specific dataset. Training such word embeddings is computationally very expensive and is constrained by the available resources. This paper reduces the computational complexity of developing word embeddings by using parallel computing and by utilizing an upper ontology to represent the main components of the word vector. Moreover, it shows how to represent a term as a vector so that the similarity or relatedness between different terms can be computed easily. To this end, the research adopts spectral analysis, which also captures the spatial information of the words surrounding the current word.
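
The abstract does not spell out how a term is turned into a signal or which transform is used, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes that a term's signal is its occurrence count in each of `num_bins` equal-width segments of a document, that the transform is an orthonormal Haar discrete wavelet transform (written out by hand so no wavelet library is needed), and that terms are compared by cosine similarity. All function names (`term_signal`, `haar_level`, `haar_dwt`, `spectral_vector`, `cosine`) are hypothetical. The word2vec training and the upper-ontology step are not shown.

```python
import math
from typing import List, Sequence, Tuple


def term_signal(tokens: Sequence[str], term: str, num_bins: int = 8) -> List[float]:
    """Assumed signal: occurrence count of `term` in each of `num_bins` equal-width segments."""
    if num_bins <= 0:
        raise ValueError("num_bins must be positive")
    signal = [0.0] * num_bins
    n = len(tokens)
    for i, tok in enumerate(tokens):
        if tok == term:
            # Map token position i in [0, n) to a segment index in [0, num_bins).
            signal[i * num_bins // n] += 1.0
    return signal


def haar_level(signal: Sequence[float]) -> Tuple[List[float], List[float]]:
    """One level of the orthonormal Haar DWT: (approximation, detail) coefficients."""
    if len(signal) < 2 or len(signal) % 2:
        raise ValueError("signal length must be even and at least 2")
    s = 1.0 / math.sqrt(2.0)
    values = list(signal)
    evens, odds = values[0::2], values[1::2]
    approx = [(a + b) * s for a, b in zip(evens, odds)]  # local averages (low frequency)
    detail = [(a - b) * s for a, b in zip(evens, odds)]  # local differences (high frequency)
    return approx, detail


def haar_dwt(signal: Sequence[float]) -> List[float]:
    """Full multilevel Haar DWT, packed coarsest first: [cA_n, cD_n, ..., cD_1]."""
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("signal length must be a power of two")
    approx = list(signal)
    details: List[List[float]] = []
    while len(approx) > 1:
        approx, detail = haar_level(approx)
        details.insert(0, detail)  # finer levels are produced first, so prepend
    return approx + [c for d in details for c in d]


def spectral_vector(tokens: Sequence[str], term: str,
                    num_bins: int = 8, levels: int = 1) -> List[float]:
    """Coarse approximation coefficients of the term signal after `levels` Haar levels."""
    approx = term_signal(tokens, term, num_bins)
    for _ in range(levels):
        approx, _detail = haar_level(approx)
    return approx


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


if __name__ == "__main__":
    tokens = "w x y z".split()
    # Raw signals: "w" and "x" sit in different segments, so they do not overlap.
    print(cosine(term_signal(tokens, "w", 4), term_signal(tokens, "x", 4)))  # 0.0
    # After one Haar level, adjacent segments merge: close to 1.0.
    print(cosine(spectral_vector(tokens, "w", 4), spectral_vector(tokens, "x", 4)))
```

Because the full orthonormal Haar transform preserves inner products, comparing complete coefficient vectors would give exactly the same cosine as comparing the raw signals. The sketch therefore compares only the coarse approximation coefficients, so that occurrences in neighboring segments count as co-located. Each term's vector depends only on its own signal, so the per-term loop is independent across terms; this is the kind of work a parallel-computing step could distribute, although the abstract does not say how the authors do so.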

Published

2021-05-02

Section

Articles