
Language translation can be tricky because of the underlying nuances in each language, so more context would probably help. Using multiple steps to evaluate performance at the key level would be a good way to improve confidence.

It might be beneficial to start your dataset at the key (word) level: generate embeddings for each key pair in the source and target languages and stash them, then do the same at the sentence level and, just for fun, the paragraph level. (I believe the sentence level would give you enough context, since a paragraph is just a group of sentences, but it would still be interesting to generate paragraph-level key pairs.)

From there you'd have a set of embeddings for each src:tgt word pair that also captures how the pair fits at the sentence and paragraph levels, with the respective nuances of each language.
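A minimal sketch of what building that multi-level pair dataset could look like. Everything here is hypothetical: `embed` is a toy stand-in for whatever embedding model you'd actually call, and `build_pairs` assumes the source/target units are already aligned one-to-one.

```python
# Sketch: build a multi-level dataset of source/target embedding pairs.
# embed() is a stand-in for a real embedding model (e.g. an API call);
# it's a deterministic hash-based toy so the example is self-contained.
import hashlib

def embed(text: str, dim: int = 8) -> list[float]:
    """Toy embedding: bytes from a hash, scaled into [0, 1)."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [b / 256 for b in digest[:dim]]

def build_pairs(src_units: list[str], tgt_units: list[str], level: str):
    """Pair up aligned source/target units and stash their embeddings."""
    assert len(src_units) == len(tgt_units), "units must be aligned"
    return [
        {
            "level": level,  # "word", "sentence", or "paragraph"
            "src": s,
            "tgt": t,
            "src_emb": embed(s),
            "tgt_emb": embed(t),
        }
        for s, t in zip(src_units, tgt_units)
    ]

# Word level first, then the same machinery at the sentence level.
words = build_pairs(["dog", "house"], ["perro", "casa"], level="word")
sentences = build_pairs(
    ["The dog is in the house."],
    ["El perro está en la casa."],
    level="sentence",
)
```

The same `build_pairs` call would cover the paragraph level too; only the unit of alignment changes.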

Once you have that dataset, you can augment your data with prompts like the ones you're using, but also include contextual references to word pairs and sentence pairs in the prompt, which should steer the LLM down the right path.
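A rough sketch of that prompt augmentation, assuming the word and sentence pairs have already been retrieved from the dataset above. The template wording and language names are made up for illustration.

```python
# Sketch: augment a translation prompt with word- and sentence-pair context.
# word_pairs / sentence_pairs are assumed to come from the pair dataset.
def make_prompt(sentence, word_pairs, sentence_pairs,
                lang_a="English", lang_b="Spanish"):
    word_ctx = "\n".join(f"- {s} -> {t}" for s, t in word_pairs)
    sent_ctx = "\n".join(f"- {s} -> {t}" for s, t in sentence_pairs)
    return (
        f"Translate the following {lang_a} sentence into {lang_b}.\n"
        f"Relevant word pairs:\n{word_ctx}\n"
        f"Example sentence pairs:\n{sent_ctx}\n"
        f"Sentence: {sentence}\n"
        f"Translation:"
    )

prompt = make_prompt(
    "The dog is in the house.",
    word_pairs=[("dog", "perro"), ("house", "casa")],
    sentence_pairs=[("The cat sleeps.", "El gato duerme.")],
)
```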

Edit: not an expert so will heed if someone smarter comes along.



Oh, yes, pairs of words is a good idea. I also have a bilingual dictionary and can generate a prompt for each entry, something like: "here's a word in <lang_a>, write a dictionary definition for it in <lang_b>: <lang_a_word>: <lang_b_definition>".
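That per-entry generation could be sketched like this. The `dictionary` mapping, template wording, and language names are all placeholders, not anyone's actual data.

```python
# Sketch: generate one training prompt per bilingual-dictionary entry.
# `dictionary` is a hypothetical {source_word: target_definition} mapping.
TEMPLATE = (
    "Here's a word in {lang_a}; write a dictionary definition for it "
    "in {lang_b}.\n{word}: {definition}"
)

def entry_prompts(dictionary, lang_a, lang_b):
    return [
        TEMPLATE.format(lang_a=lang_a, lang_b=lang_b, word=w, definition=d)
        for w, d in dictionary.items()
    ]

prompts = entry_prompts(
    {"perro": "a domesticated canine"}, "Spanish", "English"
)
```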



