Questions tagged [text-processing]
55 questions
-1
votes
3
answers
443
views
Is there a text distance (or string similarity) algorithm which accounts for the distance between characters?
I'm interested in finding a text distance (or string similarity) algorithm which computes a greater distance (or lower similarity) when characters are further apart.
For example, I want the distance ...
-4
votes
3
answers
246
views
How to identify whether or not 2 pieces of text are identical? [closed]
Let's say I were to create a scraper. At some point I'll need to come up with an algorithm for identifying whether or not a piece of newly scraped text matches the one that's already in the DB. How would ...
-3
votes
1
answer
164
views
Applying a file diff to a new file [closed]
Suppose I have files a.txt, b.txt and c.txt:
a.txt:
Hello, I like cake.
b.txt:
Hello, I like turtles.
c.txt:
go away, I don't like you
I suspect the difference between a.txt and b.txt is ...
0
votes
1
answer
176
views
Database of big text documents many-to-many: one big relationship table, a lot of small ones, or a better way to link abstract text data?
I am struggling a bit with a database setup. I found posts with similar problems, but the reasoning behind the answers was not what I was looking for, so I am asking again with my specifics.
I am building ...
0
votes
1
answer
277
views
Convert RTF to HTML when it's saved to the database or when it's rendered?
Users have the ability to enter and save text in a rich text editor which is eventually stored in a database and then rendered on a site.
Is it better to convert the RTF to HTML when it's stored to ...
0
votes
1
answer
488
views
Algorithm for line breaking in monospace text
Is there a de facto standard algorithm for finding good places to put line breaks in a paragraph of text rendered in a monospace font (e.g. to a text console)?
The algorithm should aim to output lines ...
3
votes
1
answer
263
views
Integrating TeX into a Java desktop application
Looking to integrate TeX equations in a TeX-agnostic fashion, suitable for either ConTeXt or LaTeX, into a Java-based desktop Markdown editor. The possibilities are numerous, but I'm not sure what ...
-1
votes
1
answer
59
views
Equal transformations on both indexed content and query content before a search is attempted
In search engine indexing, a body of text is often processed before it is indexed. A common example is stemming, where words are reduced to their root form (plurals are dropped, tense is normalized). ...
2
votes
0
answers
62
views
How to test generated text
I am creating a text generation algorithm for my master's research. I have a dialogue between two people and I would like to simulate one part of the conversation with naturally generated text (not ...
6
votes
2
answers
410
views
How does the Arabic typographic layout system work at a high level?
I have some Arabic content that is justified according to western conventions.
I justified it because it is justified in ancient sources:
However, the way Arabic text justification works is by ...
4
votes
2
answers
649
views
Database structure for word co-occurrence frequencies in a large corpus
I would like to store the frequencies with which words co-occur with each other over a variety of contexts in a large (> 1 billion tokens) text corpus. I need to store the word pair, the type of co-...
0
votes
1
answer
133
views
Is my understanding of modern IDEs autosave feature naive? [duplicate]
A long time ago I learned that text files are not like random-access files, i.e., adding or updating info at the beginning of a text file involves moving all the rest of the file "forward" (or ...
0
votes
1
answer
90
views
How to implement tracking of changes in text documents à la MS-Word/Apple Pages
I want to implement tracking of changes in plain-text documents, in a way similar to how it works in MS Word or Apple Pages. What I am unsure of is the data model and how to store it.
Goal
The ...
2
votes
3
answers
851
views
Find the least words that will use all given letters
With a list of thousands of words and a small list of letters, I am trying to find the fewest words that use all the given letters, assuming my dictionary of words covers all letters.
The ...
0
votes
2
answers
13k
views
Name and code to space between lines/paragraphs
I’m seeking a term, and possibly the code that would help me implement it in Python.
I have been working on a text-based Python journaling application.
When I want to review my ...