Newest 'text-processing' Questions - Software Engineering Stack Exchange

-1 votes

3 answers

443 views

Is there a text distance (or string similarity) algorithm which accounts for the distance between characters?

I'm interested in finding a text distance (or string similarity) algorithm which computes a greater distance (or lower similarity) when characters are further apart. For example, I want the distance ...

Vermillion

159

asked Sep 22, 2022 at 20:56

-4 votes

3 answers

246 views

How to identify whether or not 2 pieces of text are identical? [closed]

Let's say I were to create a scraper. At some point I'll need to come up with an algorithm for identifying whether or not a piece of newly scraped text matches the one that's already in the DB. How would ...

Nicholas E. Harding

31

asked Jun 29, 2022 at 2:07
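
One common way to approach the comparison described in this question is to store a fingerprint of each text and compare fingerprints instead of full bodies. The sketch below is only an illustration under stated assumptions: the function name `text_fingerprint` is hypothetical, and the normalization steps (Unicode NFKC, whitespace collapsing, case folding) are a guess at what "identical" should mean for scraped text, not something the question specifies.

```python
import hashlib
import unicodedata


def text_fingerprint(text: str) -> str:
    """Return a SHA-256 hex digest of a normalized form of the text.

    Assumption: two texts count as "identical" if they differ only in
    Unicode compatibility form, whitespace runs, or letter case.
    """
    normalized = unicodedata.normalize("NFKC", text)
    # Collapse every run of whitespace to a single space and trim the ends.
    normalized = " ".join(normalized.split())
    # casefold() is a more aggressive, locale-independent lower().
    normalized = normalized.casefold()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


# Usage: store the digest next to each row and compare digests on insert.
stored = text_fingerprint("Hello,   World!\n")
incoming = text_fingerprint("hello, world!")
print(stored == incoming)
```

A digest match only says the normalized texts are byte-for-byte equal; near-duplicates with small edits would need a similarity measure instead.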

-3 votes

1 answer

164 views

Applying a file diff to a new file [closed]

Suppose I have files a.txt, b.txt and c.txt. a.txt: "Hello, I like cake." b.txt: "Hello, I like turtles." c.txt: "go away, I don't like you". I suspect the difference between a.txt and b.txt is ...

user32882

267

asked Jan 19, 2022 at 18:46

0 votes

1 answer

176 views

Database of big text documents many-to-many: one big relationship table, a lot of small ones, or a better way to link abstract text data?

I am struggling a bit with a database setup. I found posts with similar problems, but the reasoning behind the answers was not what I was looking for, hence I ask again with my specifics. I am building ...

Cerealz

41

asked Feb 19, 2021 at 13:59

0 votes

1 answer

277 views

Convert RTF to HTML when it's saved to the database or when it's rendered?

Users have the ability to enter and save text in a rich text editor which is eventually stored in a database and then rendered on a site. Is it better to convert the RTF to HTML when it's stored to ...

Coupcoup

220

asked Dec 8, 2020 at 0:34

0 votes

1 answer

488 views

Algorithm for line breaking in monospace text

Is there a de facto standard algorithm for finding good places to put line breaks in a paragraph of text rendered in a monospace font (e.g. to a text console)? The algorithm should aim to output lines ...

Lassi

125

asked Dec 1, 2020 at 18:18
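
For orientation, the simplest approach to this problem is greedy first-fit: keep adding words to the current line until the next word would exceed the width. The sketch below is a minimal illustration, not a claim about what the de facto standard is; the function name `greedy_wrap` is hypothetical, and it assumes words are separated by whitespace and that a word longer than the width is left on a line of its own. Python's standard library offers `textwrap.wrap` for the same job.

```python
def greedy_wrap(text: str, width: int) -> list[str]:
    """Break text into lines of at most `width` columns, greedily.

    Assumption: every character occupies one column (monospace, no
    wide or combining characters). Overlong words are not split.
    """
    lines = []
    current = ""
    for word in text.split():
        if not current:
            # First word on a line is always accepted, even if too long.
            current = word
        elif len(current) + 1 + len(word) <= width:
            # The word fits after a single separating space.
            current += " " + word
        else:
            # Close the current line and start a new one with this word.
            lines.append(current)
            current = word
    if current:
        lines.append(current)
    return lines


for line in greedy_wrap("the quick brown fox jumps", 10):
    print(line)
```

Greedy wrapping minimizes the number of lines but can leave ragged right edges; approaches that minimize total raggedness across the paragraph trade simplicity for more even output.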

3 votes

1 answer

263 views

Integrating TeX into a Java desktop application

Looking to integrate TeX equations in a TeX-agnostic fashion, suitable for either ConTeXt or LaTeX, into a Java-based desktop Markdown editor. The possibilities are numerous, but I'm not sure what ...

Dave Jarvis

743

asked Aug 14, 2020 at 15:43

-1 votes

1 answer

59 views

Equal transformations on both indexed content and query content before a search is attempted

In search engine indexing, a body of text is often processed before it is indexed. A common example is stemming, where words are reduced to their root form (plurals are dropped, tense is normalized). ...

Deane

171

asked Apr 23, 2020 at 9:44
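
The idea in this question, applying the same transformation to indexed content and to the query, can be sketched as a single normalization function shared by both code paths. Everything below is illustrative: `normalize`, `build_index`, and `search` are hypothetical names, and the "stemmer" is a toy rule that only strips a trailing "s", standing in for a real stemming algorithm.

```python
from collections import defaultdict


def normalize(token: str) -> str:
    """Toy normalizer: case-fold and strip one trailing 's'.

    Assumption: this stands in for a real stemmer; it is not one.
    """
    token = token.casefold()
    if len(token) > 3 and token.endswith("s"):
        token = token[:-1]
    return token


def build_index(docs: dict) -> dict:
    """Map each normalized token to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.split():
            index[normalize(token)].add(doc_id)
    return index


def search(index: dict, query: str) -> set:
    """Return ids of documents containing every normalized query token."""
    # The query goes through the same normalize() used at index time.
    postings = [index.get(normalize(token), set()) for token in query.split()]
    return set.intersection(*postings) if postings else set()


docs = {1: "Red apples", 2: "green pears"}
index = build_index(docs)
print(search(index, "apple"))
```

Because both sides call the same `normalize`, a query for the singular form can match a document that only contains the plural.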

2 votes

0 answers

62 views

How to test generated text

I am creating a text generation algorithm for my master's research. I have a dialogue between two people and I would like to simulate one part of the conversation with naturally generated text (not ...

Bennie van Eeden

137

asked Nov 19, 2019 at 16:13

6 votes

2 answers

410 views

How does the Arabic typographic layout system work at a high level?

I have some Arabic content that is justified according to western conventions. I justified it because it is justified in ancient sources: However, the way Arabic text justification works is by ...

Lance Pollard

2,787

asked Nov 10, 2019 at 14:16

4 votes

2 answers

649 views

Database structure for word co-occurrence frequencies in a large corpus

I would like to store the frequencies with which words co-occur with each other over a variety of contexts in a large (> 1 billion tokens) text corpus. I need to store the word pair, the type of co-...

pgtn

51

asked May 20, 2019 at 13:36

0 votes

1 answer

133 views

Is my understanding of modern IDEs' autosave feature naive? [duplicate]

A long time ago I learned that text files are not like random-access files, i.e., adding or updating info at the beginning of a text file involves moving all the rest of the file "forward" (or ...

Mdot

1

asked May 16, 2019 at 15:39

0 votes

1 answer

90 views

How to implement tracking of changes in text documents à la MS-Word/Apple Pages

I want to implement tracking of changes in plain-text documents, in a way similar to how it works in MS Word or Apple Pages. What I am unsure of is the data model and how to store it. Goal The ...

Adam Libuša

2,077

asked Oct 24, 2018 at 17:28

2 votes

3 answers

851 views

Find the least words that will use all given letters

With a list of thousands of words and a small list of letters I am trying to find the least amount of words to make use of all given letters, assuming my dictionary of words covers all letters. The ...

kontur

131

asked Feb 1, 2018 at 17:41
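
This question is an instance of the set cover problem, which is NP-hard in general, so a common practical starting point is the greedy heuristic: repeatedly pick the word that covers the most letters still uncovered. The sketch below is a hedged illustration only; `greedy_cover` is a hypothetical name, and the result is an approximation that is not guaranteed to use the fewest possible words.

```python
def greedy_cover(words: list, letters: str) -> list:
    """Pick words greedily until every letter in `letters` is covered.

    Assumption: a letter counts as covered if it appears at least once
    in any chosen word. The result may not be the minimum-size cover.
    """
    remaining = set(letters)
    chosen = []
    while remaining:
        # Choose the word covering the most still-uncovered letters.
        best = max(words, key=lambda w: len(remaining & set(w)))
        gained = remaining & set(best)
        if not gained:
            raise ValueError(f"no word covers: {sorted(remaining)}")
        chosen.append(best)
        remaining -= gained
    return chosen


print(greedy_cover(["cat", "dog", "cod", "at"], "catdog"))
```

For small letter sets, an exact answer can instead be found by trying all combinations of increasing size, at the cost of exponential time in the worst case.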

0 votes

2 answers

13k views

Name and code to space between lines/paragraphs

I’m seeking a term and possibly the code behind what would help me implement that term in Python. I have been working on a text-based Python journaling application. When I want to review my ...

Iam Pyre

67

asked Jan 13, 2018 at 19:19
