
Questions tagged [text-processing]

-1 votes
3 answers
443 views

I'm interested in finding a text distance (or string similarity) algorithm which computes a greater distance (or lower similarity) when characters are further apart. For example, I want the distance ...
Vermillion
-4 votes
3 answers
246 views

Let's say I were to create a scraper. At some point I'll need to come up with an algorithm for identifying whether a piece of newly scraped text matches one that's already in the DB. How would ...
Nicholas E. Harding
-3 votes
1 answer
164 views

Suppose I have files a.txt, b.txt and c.txt. a.txt: "Hello, I like cake." b.txt: "Hello, I like turtles." c.txt: "go away, I don't like you". I suspect the difference between a.txt and b.txt is ...
user32882
  • 267
0 votes
1 answer
176 views

I am struggling a bit with a database setup. I found posts about similar problems, but the reasoning behind the answers was not what I was looking for, so I am asking again with my specifics. I am building ...
Cerealz
  • 41
0 votes
1 answer
277 views

Users have the ability to enter and save text in a rich text editor which is eventually stored in a database and then rendered on a site. Is it better to convert the RTF to HTML when it's stored to ...
Coupcoup
  • 220
0 votes
1 answer
488 views

Is there a de facto standard algorithm for finding good places to put line breaks in a paragraph of text rendered in a monospace font (e.g. to a text console)? The algorithm should aim to output lines ...
Lassi
  • 125
3 votes
1 answer
263 views

Looking to integrate TeX equations in a TeX-agnostic fashion, suitable for either ConTeXt or LaTeX, into a Java-based desktop Markdown editor. The possibilities are numerous, but I'm not sure what ...
Dave Jarvis
-1 votes
1 answer
59 views

In search engine indexing, a body of text is often processed before it is indexed. A common example is stemming, where words are reduced to their root form (plurals are dropped, tense is normalized). ...
Deane
  • 171
2 votes
0 answers
62 views

I am creating a text generation algorithm for my master's research. I have a dialogue between two people and I would like to simulate one part of the conversation with naturally generated text (not ...
Bennie van Eeden
6 votes
2 answers
410 views

I have some Arabic content that is justified according to western conventions. I justified it because it is justified in ancient sources: However, the way Arabic text justification works is by ...
Lance Pollard
4 votes
2 answers
649 views

I would like to store the frequencies with which words co-occur with each other over a variety of contexts in a large (> 1 billion tokens) text corpus. I need to store the word pair, the type of co-...
pgtn
  • 51
0 votes
1 answer
133 views

A long time ago I learned that text files are not like random-access files, i.e., adding or updating info at the beginning of a text file involves moving all the rest of the file "forward" (or ...
Mdot
  • 1
0 votes
1 answer
90 views

I want to implement tracking of changes in plain-text documents, in a way similar to how it works in MS Word or Apple Pages. What I am unsure of is the data model and how to store it. Goal: The ...
Adam Libuša
  • 2,077
2 votes
3 answers
851 views

With a list of thousands of words and a small list of letters, I am trying to find the fewest words that make use of all given letters, assuming my dictionary of words covers all letters. The ...
kontur
  • 131
0 votes
2 answers
13k views

I’m seeking a term, and possibly the code that would help me implement it in Python. I have been working on a text-based Python journaling application. When I want to review my ...
Iam Pyre
