Chunking by Character Count Script
Snippet
Artificial Intelligence
Intermediate
Snippets are pieces of workflows. You can copy and paste them directly into any Tray workflow.
For more details, please see our Snippet Documentation.
About this Snippet
Chunking by character count is a way to apply chunking to even the most difficult data sets. For retrieval-augmented generation (RAG) pipelines, we recommend trying Markdown-header or sentence-based chunking first. If those don't work well for your data, you can fall back on this method, which works on any text because it depends only on length, for your proof of concept (POC) or while you learn more about how best to chunk your data. Use this script to set up your chunking logic. It lets you adjust both the chunk size AND the overlap, so you can refine your chunking strategy for your use case and data sets.
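To illustrate the underlying idea, here is a minimal Python sketch. It is not the snippet's actual script, which may be implemented differently. It assumes plain-text input and uses a hypothetical `chunk_by_characters` function. Each chunk holds at most `chunk_size` characters, and consecutive chunks share `overlap` characters, so the window advances by `chunk_size - overlap` characters per step.

```python
from typing import List


def chunk_by_characters(text: str, chunk_size: int, overlap: int) -> List[str]:
    """Split text into chunks of at most chunk_size characters.

    Hypothetical helper for illustration: consecutive chunks share
    `overlap` characters, so each new chunk starts
    chunk_size - overlap characters after the previous one.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a chunk reaches the end of the text, so the tail is not
        # repeated in extra chunks made up only of overlap.
        if start + chunk_size >= len(text):
            break
    return chunks


# Example: 10 characters, chunks of 4, with 1 character of overlap.
print(chunk_by_characters("abcdefghij", chunk_size=4, overlap=1))
# ['abcd', 'defg', 'ghij']
```

A larger overlap makes it less likely that a passage is cut with no shared context between neighboring chunks. The cost is more chunks and more duplicated text to embed, which is why it helps to tune both values against your own data.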