MORElab - Word2vec models for the Spanish Language


Authors

Aitor Almeida
Aritz Bilbao Jayo

DOI: 10.5281/zenodo.1155474

License:
Creative Commons

Description

Ready-to-use gensim Word2Vec embedding models for the Spanish language. The models were trained with a context window of +/- 5 words, discarding words that occur fewer than 5 times, and produce a 400-dimensional vector for each word. The training text was collected from news, Wikipedia, the Spanish BOE (Boletín Oficial del Estado, the official state gazette), web crawling and open literary sources, and totals 3,257,329,900 words and 18,852,481,207 characters.

If you use these models in your programs or research, please cite them as follows: Aitor Almeida, & Aritz Bilbao. (2018). Spanish Word2Vec Model (Version 1.0) [Data set]. Zenodo. http://doi.org/10.5281/zenodo.1155474
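The two corpus-preparation settings described above (a context window of +/- 5 words and a minimum count of 5) can be illustrated with a short, dependency-free Python sketch. It is not the authors' training code and does not call gensim; the function names build_vocab and context_pairs are hypothetical, and the toy corpus is invented purely for illustration. It shows which words survive the frequency cut-off and which (centre, context) word pairs a skip-gram style model would then learn from.

```python
from collections import Counter

# Hyperparameters as stated in the description above.
WINDOW = 5     # context words taken on each side of the centre word
MIN_COUNT = 5  # words seen fewer than this many times are discarded


def build_vocab(tokens, min_count=MIN_COUNT):
    """Return the set of words that occur at least `min_count` times."""
    counts = Counter(tokens)
    return {word for word, n in counts.items() if n >= min_count}


def context_pairs(tokens, vocab, window=WINDOW):
    """Return (centre, context) pairs using a fixed +/- `window`.

    Out-of-vocabulary words are removed before windowing, so the
    window spans only the words that were kept.
    """
    kept = [t for t in tokens if t in vocab]
    pairs = []
    for i, centre in enumerate(kept):
        lo = max(0, i - window)
        hi = min(len(kept), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((centre, kept[j]))
    return pairs


# Hypothetical toy corpus: "perro" and "ladra" occur only once,
# so they fall below MIN_COUNT and are dropped from the vocabulary.
corpus = ("el gato come pescado " * 5 + "el perro ladra").split()
vocab = build_vocab(corpus)
pairs = context_pairs(corpus, vocab)
```

In gensim these settings correspond to the `window` and `min_count` arguments of `Word2Vec`, with the vector dimensionality given by `vector_size` (named `size` in gensim versions before 4.0). This sketch always uses the full window; word2vec implementations, including gensim's, by default sample a shorter effective window at each position.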

Citation

Aitor Almeida, Aritz Bilbao Jayo. (2018) "Word2vec models for the Spanish Language." Available from: https://github.com/aitoralmeida/spanish_word2vec