SentenceTransformer based on Parveshiiii/Embedding

This is a sentence-transformers model finetuned from Parveshiiii/Embedding on the trivia-qa-triplet dataset. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: Parveshiiii/Embedding
  • Maximum Sequence Length: 1024 tokens
  • Output Dimensionality: 1024 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset:
    • trivia-qa-triplet

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 1024, 'do_lower_case': False, 'architecture': 'XLMRobertaModel'})
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': False})
)
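The Pooling module above uses mean pooling (`pooling_mode_mean_tokens: True`): token embeddings from the transformer are averaged, with padding positions excluded via the attention mask. A minimal sketch of masked mean pooling, using random arrays in place of real transformer output:

```python
import numpy as np

# Toy stand-ins for transformer output: batch of 2 sequences,
# max length 4, hidden size 1024 (matching word_embedding_dimension).
rng = np.random.default_rng(0)
token_embeddings = rng.normal(size=(2, 4, 1024))
# 1 = real token, 0 = padding; the second sequence has 2 padded positions.
attention_mask = np.array([[1, 1, 1, 1],
                           [1, 1, 0, 0]])

# Masked mean: zero out padding, then divide by the number of real tokens.
mask = attention_mask[:, :, None]                # (2, 4, 1)
summed = (token_embeddings * mask).sum(axis=1)   # (2, 1024)
counts = mask.sum(axis=1)                        # (2, 1)
sentence_embeddings = summed / counts            # (2, 1024)

print(sentence_embeddings.shape)  # (2, 1024)
```

This mirrors what the Pooling layer computes; the real module operates on the transformer's actual hidden states.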

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("Org-Exp/Freakembedding-prebase")
# Run inference
sentences = [
    'The weather is lovely today.',
    "It's so sunny outside!",
    'He drove to the stadium.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 1024)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.7635, 0.5915],
#         [0.7635, 1.0000, 0.6165],
#         [0.5915, 0.6165, 1.0000]])
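Since the model's similarity function is cosine similarity, `model.similarity` is equivalent to L2-normalizing the embeddings and taking pairwise dot products. A small sketch with toy 2-D vectors (illustrative values, not from this model):

```python
import numpy as np

def cosine_similarity_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity: normalize each row, then dot product."""
    a_norm = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_norm = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_norm @ b_norm.T

embeddings = np.array([[1.0, 0.0],
                       [1.0, 1.0],
                       [0.0, 1.0]])
sims = cosine_similarity_matrix(embeddings, embeddings)
print(np.round(sims, 4))
# The diagonal is 1.0 (each vector compared with itself);
# off-diagonal entries fall in [-1, 1].
```

The symmetric matrix with a unit diagonal matches the shape of the `model.similarity` output shown above.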

Training Details

Training Dataset

trivia-qa-triplet

  • Dataset: trivia-qa-triplet
  • Size: 52,856,818 training samples
  • Columns: anchor, positive, and negative
  • Loss: CachedMultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim",
        "mini_batch_size": 512,
        "gather_across_devices": false,
        "directions": [
            "query_to_doc"
        ],
        "partition_mode": "joint",
        "hardness_mode": null,
        "hardness_strength": 0.0
    }
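CachedMultipleNegativesRankingLoss optimizes an InfoNCE-style objective: for each anchor, its paired positive must score higher (under scaled cosine similarity) than every other in-batch document, which acts as a negative. The caching variant computes the same loss in mini-batches so very large batches fit in memory. A minimal sketch of the underlying (non-cached) objective on toy embeddings, assuming the `scale: 20.0` and `cos_sim` settings above:

```python
import numpy as np

def mnr_loss(anchors: np.ndarray, positives: np.ndarray, scale: float = 20.0) -> float:
    """In-batch-negatives ranking loss (InfoNCE with cosine similarity).

    Row i of `positives` is the positive for anchor i; all other rows
    serve as negatives for that anchor.
    """
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = scale * (a @ p.T)  # (batch, batch) scaled cosine similarities
    # Cross-entropy with the diagonal (the true pairs) as the target class.
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))

rng = np.random.default_rng(0)
anchors = rng.normal(size=(4, 8))
positives = anchors + 0.1 * rng.normal(size=(4, 8))  # near-duplicates: easy pairs
print(mnr_loss(anchors, positives))  # small loss, since each positive is closest
```

With `directions: ["query_to_doc"]`, the loss is computed in one direction only, as in the sketch; the `scale` of 20 sharpens the softmax over the cosine scores.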
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 4096
  • max_steps: 3230
  • learning_rate: 2e-05
  • warmup_steps: 100
  • optim: adamw_torch_fused
  • bf16: True
  • gradient_checkpointing: True
  • accelerator_config: {'split_batches': True, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}

All Hyperparameters

  • per_device_train_batch_size: 4096
  • num_train_epochs: 3.0
  • max_steps: 3230
  • learning_rate: 2e-05
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: None
  • warmup_steps: 100
  • optim: adamw_torch_fused
  • optim_args: None
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • optim_target_modules: None
  • gradient_accumulation_steps: 1
  • average_tokens_across_devices: True
  • max_grad_norm: 1.0
  • label_smoothing_factor: 0.0
  • bf16: True
  • fp16: False
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • gradient_checkpointing: True
  • gradient_checkpointing_kwargs: None
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • use_liger_kernel: False
  • liger_kernel_config: None
  • use_cache: False
  • neftune_noise_alpha: None
  • torch_empty_cache_steps: None
  • auto_find_batch_size: False
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • include_num_input_tokens_seen: no
  • log_level: passive
  • log_level_replica: warning
  • disable_tqdm: False
  • project: huggingface
  • trackio_space_id: trackio
  • eval_strategy: no
  • per_device_eval_batch_size: 8
  • prediction_loss_only: True
  • eval_on_start: False
  • eval_do_concat_batches: True
  • eval_use_gather_object: False
  • eval_accumulation_steps: None
  • include_for_metrics: []
  • batch_eval_metrics: False
  • save_only_model: False
  • save_on_each_node: False
  • enable_jit_checkpoint: False
  • push_to_hub: False
  • hub_private_repo: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_always_push: False
  • hub_revision: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • restore_callback_states_from_checkpoint: False
  • full_determinism: False
  • seed: 42
  • data_seed: None
  • use_cpu: False
  • accelerator_config: {'split_batches': True, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • dataloader_drop_last: True
  • dataloader_num_workers: 0
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • dataloader_prefetch_factor: None
  • remove_unused_columns: True
  • label_names: None
  • train_sampling_strategy: random
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • ddp_backend: None
  • ddp_timeout: 1800
  • fsdp: []
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • deepspeed: None
  • debug: []
  • skip_memory_metrics: True
  • do_predict: False
  • resume_from_checkpoint: None
  • warmup_ratio: None
  • local_rank: -1
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch Step Training Loss
0.0031 10 7.6224
0.0062 20 7.6213
0.0093 30 7.6198
0.0124 40 7.6083
0.0155 50 7.5609
0.0186 60 7.4522
0.0217 70 7.3751
0.0248 80 7.3387
0.0279 90 7.3863
0.0310 100 7.2047
0.0341 110 7.2849
0.0372 120 7.2857
0.0402 130 7.3164
0.0433 140 7.2506
0.0464 150 7.4432
0.0495 160 7.2519
0.0526 170 7.3358
0.0557 180 7.2496
0.0588 190 7.3306
0.0619 200 7.2377
0.0650 210 7.2976
0.0681 220 7.2039
0.0712 230 7.1852
0.0743 240 7.2373
0.0774 250 7.2902
0.0805 260 7.2145
0.0836 270 7.2598
0.0867 280 7.3147
0.0898 290 7.1940
0.0929 300 7.2009
0.0960 310 7.2074
0.0991 320 7.3131
0.1022 330 7.2124
0.1053 340 7.1579
0.1084 350 7.1688
0.1115 360 7.2484
0.1146 370 7.2506
0.1176 380 7.1243
0.1207 390 7.2264
0.1238 400 7.3368
0.1269 410 7.3014
0.1300 420 7.2524
0.1331 430 7.0409
0.1362 440 7.1438
0.1393 450 7.2448
0.1424 460 7.2018
0.1455 470 7.2354
0.1486 480 7.2031
0.1517 490 7.2163
0.1548 500 7.1130
0.1579 510 7.1783
0.1610 520 7.1934
0.1641 530 7.1669
0.1672 540 7.1286
0.1703 550 7.1773
0.1734 560 7.2205
0.1765 570 7.0962
0.1796 580 7.3322
0.1827 590 7.1580
0.1858 600 7.0881
0.1889 610 7.1334
0.1920 620 7.0562
0.1950 630 7.2170
0.1981 640 7.1307
0.2012 650 7.1279
0.2043 660 7.0545
0.2074 670 7.2590
0.2105 680 7.1954
0.2136 690 7.0225
0.2167 700 7.1797

Framework Versions

  • Python: 3.11.10
  • Sentence Transformers: 5.3.0
  • Transformers: 5.4.0
  • PyTorch: 2.4.1+cu124
  • Accelerate: 1.13.0
  • Datasets: 4.8.4
  • Tokenizers: 0.22.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

CachedMultipleNegativesRankingLoss

@misc{gao2021scaling,
    title={Scaling Deep Contrastive Learning Batch Size under Memory Limited Setup},
    author={Luyu Gao and Yunyi Zhang and Jiawei Han and Jamie Callan},
    year={2021},
    eprint={2101.06983},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}
Model size: 0.6B params (Safetensors, F32) · Org-Exp/Freakembedding-prebase