Route246
Full time employment: Posting here.
Joined: Jun 22, 2023 · Messages: 766
This is heavy propeller-head stuff, very nerdy and technical, but I wanted to show how AI will change S/W engineering, for better or worse, and it is happening now. Most of us are at the end of our productive years, but future generations are going to be severely impacted by this. Our children, grandchildren, and beyond will be integrated into a new compute paradigm.
Here is my input to ChatGPT:
I need some python code to parse input strings for names with Arabic characters and make sure those characters comply with readability. The input strings will usually be book or video titles which must be kept in an orderly and readable list.
I chose Python because it is popular today. You could choose Go, C, C++, etc., and it would be able to do something similar.
It output the following:
Sure! Here’s a Python script that will:
1. Detect Arabic characters in input strings.
2. Normalize them to ensure consistent readability (e.g., fixing mixed character forms, removing diacritics).
3. Sort the list with Arabic-aware collation (optional but often useful).
4. Optionally, remove or flag unreadable combinations.
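Step 4 is listed, but the script that follows never implements it, and the prompt never defines "unreadable". Here is a minimal sketch that assumes two hypothetical criteria. The first is invisible bidirectional control characters, such as RLM or the right-to-left override, which can silently reorder a title. The second is Arabic Presentation Forms code points, which usually mean a string was already shaped for display and will sort and search poorly. The name flag_unreadable is illustrative and does not come from any library.

```python
import re

# Hypothetical "unreadable" criteria (assumptions, not a standard).
# 1. Invisible bidi controls: ALM, LRM, RLM, embeddings/overrides, isolates.
BIDI_CONTROLS = re.compile(r"[\u061C\u200E\u200F\u202A-\u202E\u2066-\u2069]")
# 2. Arabic Presentation Forms-A and -B: pre-shaped glyph code points.
PRESENTATION_FORMS = re.compile(r"[\uFB50-\uFDFF\uFE70-\uFEFC]")


def flag_unreadable(title):
    """Return a list of reasons this title may display or sort badly."""
    reasons = []
    if BIDI_CONTROLS.search(title):
        reasons.append("bidi control characters")
    if PRESENTATION_FORMS.search(title):
        reasons.append("presentation forms (pre-shaped text)")
    return reasons


if __name__ == "__main__":
    samples = ["كتاب اللغة العربية", "Learn\u200F Fast", "\uFEB3\uFEFC\uFEE1"]
    for s in samples:
        print(repr(s), flag_unreadable(s))
```

The sketch flags titles rather than deleting characters. That keeps a person in the loop for titles that may be formatted that way on purpose.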
We can use the arabic_reshaper and python-bidi libraries to improve Arabic text display (especially when printed in LTR terminals or GUIs), and unicodedata for normalization.
Requirements:
Install dependencies via pip if not already installed:
pip install arabic-reshaper python-bidi
Here is the sample python script:
import re
import unicodedata
import arabic_reshaper
from bidi.algorithm import get_display

# Regex for Arabic character range
ARABIC_REGEX = re.compile(r'[\u0600-\u06FF\u0750-\u077F\u08A0-\u08FF]+')

def remove_diacritics(text):
    """Removes Arabic diacritics for readability."""
    arabic_diacritics = re.compile("""
        ّ | # Tashdid
        َ | # Fatha
        ً | # Tanwin Fath
        ُ | # Damma
        ٌ | # Tanwin Damm
        ِ | # Kasra
        ٍ | # Tanwin Kasr
        ْ | # Sukun
        ـ # Tatwil/Kashida
    """, re.VERBOSE)
    return re.sub(arabic_diacritics, '', text)

def normalize_arabic_text(text):
    """Clean and normalize Arabic text."""
    reshaped = arabic_reshaper.reshape(text)
    bidi_text = get_display(reshaped)
    no_diacritics = remove_diacritics(bidi_text)
    return no_diacritics

def contains_arabic(text):
    return bool(ARABIC_REGEX.search(text))

def parse_and_clean_titles(titles):
    cleaned_titles = []
    for title in titles:
        if contains_arabic(title):
            normalized = normalize_arabic_text(title)
            cleaned_titles.append(normalized)
        else:
            cleaned_titles.append(title)
    return sorted(cleaned_titles, key=lambda x: x.lower())

# Sample usage
if __name__ == "__main__":
    input_titles = [
        "كتاب اللغة العربية",  # Arabic
        "Python Programming",
        "الفيديو التعليمي",  # Arabic
        "Learn العربية Fast",  # Mixed
        "كيف تتعلم؟",  # Arabic with punctuation
    ]
    cleaned_list = parse_and_clean_titles(input_titles)
    print("\nCleaned & Sorted Titles:\n")
    for title in cleaned_list:
        print(title)
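Two details in this script deserve a second look. First, it imports unicodedata but never calls it. Second, normalize_arabic_text runs arabic_reshaper.reshape() and get_display() before stripping diacritics. As a result, the list ends up holding display glyphs in visual (reversed) order instead of ordinary text.

Here is a minimal storage-side alternative, assuming the goal is text that stays searchable and sortable. It applies NFKC normalization, which maps presentation-form glyphs back to base letters, and then drops Arabic combining marks and the tatweel. The name normalize_for_storage is hypothetical.

```python
import unicodedata

TATWEEL = "\u0640"  # ARABIC TATWEEL (kashida), a stretching character


def strip_arabic_marks(text):
    """Remove combining marks in the Arabic block (fatha, damma, kasra,
    sukun, shadda, tanwin, etc.) and the tatweel."""
    kept = []
    for ch in text:
        if ch == TATWEEL:
            continue
        if unicodedata.combining(ch) and "\u0600" <= ch <= "\u06FF":
            continue
        kept.append(ch)
    return "".join(kept)


def normalize_for_storage(text):
    """Return logical-order, NFKC-normalized text without Arabic diacritics.
    No reshaping or bidi reordering happens here; that is for display only."""
    text = unicodedata.normalize("NFKC", text)
    return strip_arabic_marks(text)


if __name__ == "__main__":
    print(normalize_for_storage("كِتَابُ اللُّغَةِ"))
    print(normalize_for_storage("Python Programming"))
```

Because NFKC runs first, a title pasted in already-shaped form, such as text copied from a PDF, folds back to the same base letters as one typed normally.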
And here are the notes:
Notes:
• normalize_arabic_text: removes common Arabic diacritics and reshapes for proper rendering.
• get_display(): ensures correct visual order in environments where RTL display is needed.
• Sorting: Sort is basic and case-insensitive, though not language-sensitive. For true Arabic collation, PyICU can be used.
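The get_display() note points at the main design issue: reshaping and bidi reordering are for rendering, not for stored data. If they run before sorting, the list gets sorted on glyph codes in reversed order.

Here is a minimal sketch, assuming titles are stored as logical-order text and converted only at print time. It reuses the same arabic_reshaper.reshape and bidi.algorithm.get_display calls as the script. If those packages are not installed, it falls back to printing the raw string. The helper names sort_titles and to_display are hypothetical.

```python
import unicodedata

try:
    import arabic_reshaper
    from bidi.algorithm import get_display
except ImportError:  # the display libraries are optional for this sketch
    arabic_reshaper = None


def sort_titles(titles):
    """Sort logical-order titles; NFKC + casefold keeps the key consistent."""
    return sorted(titles, key=lambda t: unicodedata.normalize("NFKC", t).casefold())


def to_display(title):
    """Shape and reorder only at output time, for terminals without RTL support."""
    if arabic_reshaper is None:
        return title
    return get_display(arabic_reshaper.reshape(title))


if __name__ == "__main__":
    titles = ["كتاب اللغة العربية", "Python Programming", "الفيديو التعليمي"]
    stored = sort_titles(titles)   # data stays in logical order
    for title in stored:
        print(to_display(title))   # conversion happens only here
```

Keeping the stored list in logical order also means a GUI or web page that handles RTL natively can skip to_display entirely.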
Would you like Arabic-aware sorting (e.g., like dictionary order in Arabic), or should the current method be good enough?
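On that closing question: plain code-point order already roughly follows the Arabic alphabet for base letters. That makes a folding sort key a possible middle ground before reaching for PyICU.

The sketch below assumes three simplifications that are common in search normalization but fall short of real collation. Hamza-carrying alef forms (أ إ آ) fold to bare alef, alef maqsura (ى) folds to ya, and diacritics are dropped. It can also skip a leading definite article "ال", loosely following library filing practice. That check is a crude prefix test and will misfire on some words. The name arabic_sort_key is hypothetical. For dictionary-grade order, an ICU collator for the "ar" locale (through PyICU) is the more robust route.

```python
import unicodedata

# Simplified folding table (assumption: search-style folding, not ICU collation).
FOLD = str.maketrans({
    "\u0623": "\u0627",  # alef with hamza above -> alef
    "\u0625": "\u0627",  # alef with hamza below -> alef
    "\u0622": "\u0627",  # alef with madda above -> alef
    "\u0649": "\u064A",  # alef maqsura -> ya
    "\u0640": None,      # tatweel removed
})
ARTICLE = "\u0627\u0644"  # the definite article "ال"


def arabic_sort_key(title, ignore_article=True):
    text = unicodedata.normalize("NFKC", title)
    # Drop combining marks (e.g. Arabic harakat) so vocalized and plain spellings tie.
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Check for the article BEFORE folding, so words starting with "أل" are not
    # mistaken for "ال". This is still a crude prefix test.
    if ignore_article and text.startswith(ARTICLE) and len(text) > len(ARTICLE):
        text = text[len(ARTICLE):]
    return text.translate(FOLD).casefold()


if __name__ == "__main__":
    titles = ["كتاب اللغة العربية", "الفيديو التعليمي", "أحمد", "Python Programming"]
    for t in sorted(titles, key=arabic_sort_key):
        print(t)
```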
It even asked for further input. I find this conversational large language model extremely promising for the future. I believe AI is the next industrial revolution, and there is nothing that can stop it now. In my opinion, this is not overhyped or a bubble. It has already delivered serious productivity gains and will continue to evolve.