data_files = {"train": "drugsComTrain_raw.tsv", "test": "drugsComTest_raw.tsv"}
# \t is the tab character in Python
drug_dataset = load_dataset("csv", data_files=data_files, delimiter="\t")
COMMENT: Requiring an online connection is a deal breaker in some cases, unfortunately, so it would be great if an offline mode were added, similar to how `transformers` loads models offline.
@mandubian's second bullet point suggests that there's a workaround that lets you use your offline (custom?) dataset with `datasets`. Could you please elaborate on what that should look like?
SCORE: 25.505016326904297
TITLE: Discussion using datasets in offline mode
URL: https://github.com/huggingface/datasets/issues/824
COMMENT: The local dataset builders (csv, text, json and pandas) are now part of the `datasets` package since #1726 :) You can now use them offline:
datasets = load_dataset('text', data_files=data_files)
We'll do a new release soon.
SCORE: 24.555538177490234
TITLE: Discussion using datasets in offline mode
URL: https://github.com/huggingface/datasets/issues/824
==================================================
COMMENT: I opened a PR that allows reloading modules that have already been loaded once, even if there's no internet connection. Let me know if you know of other ways to make the offline mode experience better. I'd be happy to add them :) I have already noted the "freeze" modules option, to prevent local module updates. It would be a cool feature.
----------
> @mandubian's second bullet point suggests that there's a workaround that lets you use your offline (custom?) dataset with `datasets`. Could you please elaborate on what that should look like?

Indeed, `load_dataset` allows you to load remote dataset scripts (squad, glue, etc.) but also your own local ones. For example, if you have a dataset script at `./my_dataset/my_dataset.py`, then you can do
load_dataset("./my_dataset")
and the dataset script will generate your dataset once and for all.
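To make the directory convention above concrete, here is a minimal sketch that checks for the script before calling `load_dataset`. The helper names `expected_script_path` and `load_local_script_dataset` are hypothetical, not part of `datasets`, and whether script-based loading works at all depends on the `datasets` version you have installed, so treat the final call as an assumption about your environment.

```python
from pathlib import Path


def expected_script_path(dataset_dir):
    """Return the script path implied by the directory name.

    Follows the convention from the comment above:
    ./my_dataset -> ./my_dataset/my_dataset.py
    """
    directory = Path(dataset_dir)
    return directory / f"{directory.name}.py"


def load_local_script_dataset(dataset_dir):
    """Hypothetical helper: load a dataset from a local script directory."""
    script = expected_script_path(dataset_dir)
    if not script.is_file():
        raise FileNotFoundError(f"Expected a dataset script at {script}")
    # Imported lazily so the path check above works even when
    # `datasets` is not installed.
    from datasets import load_dataset

    # Same call as in the comment above: pass the directory, not the script.
    return load_dataset(str(dataset_dir))
```

Checking the path first gives a clear local error instead of a lookup that might try to reach the network.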
----------
I'm looking into having the `csv`, `json`, `text`, and `pandas` dataset builders included in the `datasets` package, so that they are available offline by default, as opposed to the other datasets, which require the script to be downloaded. cf. #1724
SCORE: 24.14898681640625
TITLE: Discussion using datasets in offline mode
URL: https://github.com/huggingface/datasets/issues/824
==================================================
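As a rough illustration of using the packaged builders with local files, the sketch below picks a builder name from the file extension and forwards it to `load_dataset`. The mapping table and the helpers `builder_for` and `load_local` are assumptions made for this example, not something `datasets` provides; only the `load_dataset(name, data_files=..., delimiter=...)` call shape comes from the snippets in this thread.

```python
from pathlib import Path

# Hypothetical mapping from file suffix to a packaged builder name
# plus any extra keyword arguments that builder needs.
_BUILDERS = {
    ".csv": ("csv", {}),
    ".tsv": ("csv", {"delimiter": "\t"}),  # TSV is CSV with a tab delimiter
    ".json": ("json", {}),
    ".jsonl": ("json", {}),
    ".txt": ("text", {}),
}


def builder_for(data_files):
    """Return (builder_name, kwargs) for a {split: path} mapping."""
    suffixes = {Path(path).suffix.lower() for path in data_files.values()}
    if len(suffixes) != 1:
        raise ValueError(f"Expected one file type, got: {sorted(suffixes)}")
    suffix = suffixes.pop()
    if suffix not in _BUILDERS:
        raise ValueError(f"No packaged builder known for {suffix!r}")
    name, kwargs = _BUILDERS[suffix]
    return name, dict(kwargs)


def load_local(data_files):
    """Load local files with a packaged builder (needs `datasets` installed)."""
    # Imported lazily so builder_for() can be used on its own.
    from datasets import load_dataset

    name, kwargs = builder_for(data_files)
    return load_dataset(name, data_files=data_files, **kwargs)
```

For example, `builder_for({"train": "drugsComTrain_raw.tsv", "test": "drugsComTest_raw.tsv"})` selects the `csv` builder with a tab delimiter.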
COMMENT: Here is my way to load a dataset offline, but it **requires** an online machine:

1. (online machine)

import datasets
data = datasets.load_dataset(...)
data.save_to_disk("/YOUR/DATASET/DIR")

2. Copy the directory from the online machine to the offline machine.

3. (offline machine)

import datasets
data = datasets.load_from_disk("/SAVED/DATA/DIR")

HTH.
SCORE: 22.40665626525879
TITLE: Discussion using datasets in offline mode
URL: https://github.com/huggingface/datasets/issues/824
==================================================
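The save-and-copy workflow from the last comment could be wrapped roughly as follows. This is only a sketch: `export_for_offline` and `load_offline` are hypothetical helper names, and the only library calls assumed are `save_to_disk` and `datasets.load_from_disk`, both taken from that comment. Note that the paths are passed as strings (or `Path` objects), not as bare tokens.

```python
from pathlib import Path


def export_for_offline(dataset, target_dir):
    """Online machine: write an already loaded dataset to target_dir."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    # `dataset` is whatever datasets.load_dataset(...) returned.
    dataset.save_to_disk(str(target))
    return target


def load_offline(saved_dir):
    """Offline machine: reload the directory copied from the online machine."""
    saved = Path(saved_dir)
    if not saved.is_dir():
        raise FileNotFoundError(f"No saved dataset directory at {saved}")
    # Imported lazily so the directory check works without `datasets`.
    from datasets import load_from_disk

    return load_from_disk(str(saved))
```

On the online machine you would call `export_for_offline(datasets.load_dataset(...), "/YOUR/DATASET/DIR")`, copy that directory across, and then call `load_offline("/SAVED/DATA/DIR")` on the offline machine.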