The Technology of Meetings, Lectures, Discussion Panels, Dialogues, Argumentation and Debates

The technology of meetings, lectures, discussion panels, dialogues, argumentation and debates is of interest to our group. This post discusses some topics where artificial intelligence overlaps with meetings support technology. Meetings occur in all organizations and in all sectors: academia, science, industry and government.

Individuals also meet to do civics: to participate in townhall discussions and in the democracies of their neighborhoods or cities. Accordingly, meetings support technology can enhance Web-based civic engagement. It can empower individuals, organizations and communities with respect to the operation and transparency of governments at the city, state and federal levels.

The topics presented include the recording of meetings with modern sensors; multiparty speech recognition; obtaining transcripts from meetings; and the processing of data from arrays of sensors, such as pointclouds and 3D audio, into photographs, video and 3D video as well as binaural, surround sound or ambisonic audio. Software technology topics include advanced features that make work more convenient for meeting participants as well as production teams.
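As a rough illustration of turning pointcloud data into a photograph, the sketch below projects 3D points through a simple pinhole camera model. It is a minimal sketch under stated assumptions, not a description of any particular system: the `VirtualCamera` class, the `project` function and the choice of an unrotated camera looking along the +z axis are all hypothetical simplifications introduced here.

```python
from dataclasses import dataclass


@dataclass
class VirtualCamera:
    """Hypothetical pinhole camera: no rotation, looking along the +z axis."""
    x: float          # camera position in room coordinates (metres)
    y: float
    z: float
    focal_px: float   # focal length expressed in pixels
    width: int        # image width in pixels
    height: int       # image height in pixels


def project(camera, points):
    """Project (x, y, z) points to (u, v, depth) pixels inside the image frame."""
    pixels = []
    for px, py, pz in points:
        # Express the point relative to the camera position.
        dx, dy, dz = px - camera.x, py - camera.y, pz - camera.z
        if dz <= 0:
            continue  # point is behind the camera, so it cannot be seen
        # Perspective divide, then shift the origin to the image centre.
        u = camera.focal_px * dx / dz + camera.width / 2
        v = camera.focal_px * dy / dz + camera.height / 2
        if 0 <= u < camera.width and 0 <= v < camera.height:
            pixels.append((int(u), int(v), dz))
    return pixels


camera = VirtualCamera(0.0, 0.0, 0.0, focal_px=100.0, width=640, height=480)
visible = project(camera, [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0), (0.0, 0.0, -1.0)])
```

Moving the virtual camera amounts to changing its position fields and projecting again. A practical renderer would also need camera rotation, per-point color and occlusion handling, which this sketch leaves out.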

Ten topics are presented:

  1. Obtaining 3D data, pointclouds, from multiple sensors. Obtaining 3D audio from multiple sensors. Obtaining photographs, video, 3D video, binaural audio, surround sound, ambisonics from sensor data.
  2. Natural language understanding, sound source localization, multiperson speech recognition, multiperson nonverbal gesture recognition.
  3. Transcription, topic modeling, keyword generation, enhancing the indexing of video, video segments, video clips.
  4. Modeling meetings, lectures, discussion panels, dialogues, argumentation and debates; detecting events, categorizing events.
  5. Interpreting meetings, interpreting narratives or storyboards from meetings, summarizing meetings, tracking shifts of attention during meetings.
  6. The virtual cinematography or videography utilizing virtual cameras; the capability to position virtual cameras in space, to adjust virtual camera settings, to move virtual cameras around to obtain photographs or videos.
  7. The capability to output, beyond a single video stream, multiple simultaneous multimedia streams from multiple simultaneous virtual cameras, as per multiview video.
  8. The processing of photographs or video cinematography from meetings; utilizing photographs, videos and pointcloud data; machine learning from human photographers and videographers.
  9. The storage of pointcloud video, archiving of the raw preprocessed 3D data; the indexing, search, retrieval of 3D multimedia content.
  10. The summarization of sets of meetings, dashboard summarizations of sets of meetings.
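To make the third topic more concrete, keyword generation for indexing video segments can be sketched with a small TF-IDF ranking over transcript segments. This is one common technique chosen for illustration, not a method prescribed by the group. The function names, the stopword list and the `(start_seconds, text)` segment format are assumptions made for this example.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "we", "is", "in", "for", "on", "that", "it"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def keywords_per_segment(segments, top_k=3):
    """Map each segment start time to its top_k TF-IDF keywords.

    segments: list of (start_seconds, transcript_text) pairs.
    """
    token_lists = [tokenize(text) for _, text in segments]
    # Document frequency: in how many segments does each word occur?
    doc_freq = Counter()
    for tokens in token_lists:
        doc_freq.update(set(tokens))
    n = len(segments)
    index = {}
    for (start, _), tokens in zip(segments, token_lists):
        counts = Counter(tokens)
        # Smoothed IDF: a word present in every segment scores zero.
        scores = {w: c * math.log((1 + n) / (1 + doc_freq[w])) for w, c in counts.items()}
        # Highest score first; ties broken alphabetically for stable output.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        index[start] = ranked[:top_k]
    return index


segments = [
    (0, "The budget review covers the budget for sensors"),
    (60, "Microphone arrays and microphone calibration for the room"),
    (120, "Action items for the budget and the room"),
]
index = keywords_per_segment(segments)
```

The resulting dictionary could serve as a simple search index: a query term is looked up across the keyword lists, and the matching start times point into the recording.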


Abdollahian, Golnaz, Cüneyt M. Taskiran, Zygmunt Pizlo, and Edward J. Delp. “Camera motion-based analysis of user generated video.” Multimedia, IEEE Transactions on 12, no. 1 (2010): 28-41.

Jajoo, Amita, Suman Kumari, and Sapana Borole. “Character-Based Scene Extraction and Movie Summarization Using Character Interactions.”

Ang, Jeremy, Yang Liu, and Elizabeth Shriberg. “Automatic Dialog Act Segmentation and Classification in Multiparty Meetings.” In ICASSP (1), pp. 1061-1064. 2005.

Bares, William H., Joël P. Grégoire, and James C. Lester. “Realtime constraint-based cinematography for complex interactive 3d worlds.” In AAAI/IAAI, pp. 1101-1106. 1998.

Bhatt, Mehul, Jakob Suchan, and Christian Freksa. “ROTUNDE-A Smart Meeting Cinematography Initiative: Tools, Datasets, and Benchmarks for Cognitive Interpretation and Control.” arXiv preprint arXiv:1306.1034 (2013).

Bhatt, Mehul, Jakob Suchan, and Carl Schultz. “Cognitive Interpretation of Everyday Activities: Toward Perceptual Narrative Based Visuo-Spatial Scene Interpretation.” arXiv preprint arXiv:1306.5308 (2013).

Bianchi, Michael. “Automatic video production of lectures using an intelligent and aware environment.” In Proceedings of the 3rd international conference on Mobile and ubiquitous multimedia, pp. 117-123. ACM, 2004.

Buchsbaum, Daphna, Thomas L. Griffiths, Dillon Plunkett, Alison Gopnik, and Dare Baldwin. “Inferring Action Structure and Causal Relationships in Continuous Sequences of Human Action.” Cognitive psychology 76 (2015): 30-77.

Buist, Anne Hendrik, Wessel Kraaij, and Stephan Raaijmakers. “Automatic Summarization of Meeting Data: A Feasibility Study.” In CLIN. 2004.

Cutler, Ross, Yong Rui, Anoop Gupta, Jonathan J. Cadiz, Ivan Tashev, Li-wei He, Alex Colburn, Zhengyou Zhang, Zicheng Liu, and Steve Silverberg. “Distributed meetings: A meeting capture and broadcasting system.” In Proceedings of the tenth ACM international conference on Multimedia, pp. 503-512. ACM, 2002.

de Lima, Edirlei ES, Cesar T. Pozzer, Marcos C. d’Ornellas, Angelo EM Ciarlini, Bruno Feijó, and Antonio L. Furtado. “Virtual cinematography director for interactive storytelling.” In Proceedings of the International Conference on Advances in Computer Entertainment Technology, pp. 263-270. ACM, 2009.

de Mántaras, R. López, and L. Saitta. “Knowledge-based cinematography and its applications.” In ECAI 2004: Proceedings of the 16th European Conference on Artificial Intelligence, vol. 110, p. 256. IOS Press, 2004.

DiMicco, Joan Morris, Katherine J. Hollenbach, and Walter Bender. “Using visualizations to review a group’s interaction dynamics.” In CHI’06 Extended Abstracts on Human Factors in Computing Systems, pp. 706-711. ACM, 2006.

Dubba, Krishna, Mehul Bhatt, Frank Dylla, David C. Hogg, and Anthony G. Cohn. “Interleaved inductive-abductive reasoning for learning complex event models.” In Inductive Logic Programming, pp. 113-129. Springer Berlin Heidelberg, 2012.

Erol, Berna, Jonathan J. Hull, and Dar-Shyang Lee. “Linking multimedia presentations with their symbolic source documents: algorithm and applications.” In Proceedings of the eleventh ACM international conference on Multimedia, pp. 498-507. ACM, 2003.

Erol, Berna, and Ying Li. “An overview of technologies for e-meeting and e-lecture.” In Multimedia and Expo, 2005. ICME 2005. IEEE International Conference on, pp. 6-pp. IEEE, 2005.

Fan, Quanfu, Arnon Amir, Kobus Barnard, Ranjini Swaminathan, and Alon Efrat. “Temporal modeling of slide change in presentation videos.” In Acoustics, Speech and Signal Processing, 2007. ICASSP 2007. IEEE International Conference on, vol. 1, pp. I-989. IEEE, 2007.

Feng, Vanessa Wei. “RST-Style Discourse Parsing and Its Applications in Discourse Analysis.” PhD diss., University of Toronto, 2015.

Gardner, William G. “3D audio and acoustic environment modeling.” Wave Arts, Inc 99 (1999).

Gardner, William G. “Spatial audio reproduction: Towards individualized binaural sound.” In Frontiers of Engineering: Reports on Leading-Edge Engineering from the 2004 NAE Symposium on Frontiers of Engineering, p. 113. National Academies Press, 2005.

Ghosh, Sucheta. “End-to-End Discourse Parsing with Cascaded Structured Prediction.” PhD diss., University of Trento, 2012.

Gigonzac, G., Francois Pitie, and A. Kokaram. “Electronic slide matching and enhancement of a lecture video.” (2007): 9-9.

Goldstein, Michael H., Heidi R. Waterfall, Arnon Lotem, Joseph Y. Halpern, Jennifer A. Schwade, Luca Onnis, and Shimon Edelman. “General cognitive principles for learning structure in time and space.” Trends in cognitive sciences 14, no. 6 (2010): 249-258.

Gross, Ralph, Michael Bett, Hua Yu, Xiaojin Zhu, Yue Pan, Jie Yang, and Alex Waibel. “Towards a multimodal meeting record.” In Multimedia and Expo, 2000. ICME 2000. 2000 IEEE International Conference on, vol. 3, pp. 1593-1596. IEEE, 2000.

Haller, Michael, Daniel Dobler, and Philipp Stampfl. “Augmenting the reality with 3D sound sources.” In ACM SIGGRAPH 2002 conference abstracts and applications, pp. 65-65. ACM, 2002.

Hendrix, Claudia, and Woodrow Barfield. “Presence in virtual environments as a function of visual and auditory cues.” In Virtual Reality Annual International Symposium, 1995. Proceedings., pp. 74-82. IEEE, 1995.

Hendrix, Claudia, and Woodrow Barfield. “The sense of presence within auditory virtual environments.” Presence: Teleoperators and Virtual Environments 5, no. 3 (1996): 290-301.

Hosseinmardi, Homa, Akshay Mysore, Nicholas Farrow, Nikolaus Correll, and Richard Han. “Distributed Spatio-Temporal Gesture Recognition in Sensor Arrays.”

Israel, Quinsulon L. “Semantic Analysis for Improved Multi-document Summarization of Text.” PhD diss., Drexel University, 2014.

Ivanov, Alexei V., Giuseppe Riccardi, Sucheta Ghosh, Sara Tonelli, and Evgeny A. Stepanov. “Acoustic correlates of meaning structure in conversational speech.” In INTERSPEECH, pp. 1129-1132. 2010.

Kao, J. L., S. Y. Chen, and D. J. Duh. “Detecting Handwritten Annotation by Synchronization of Lecture Slides and Videos.” (2013).

Kennedy, Kevin, and Robert E. Mercer. “Planning animation cinematography and shot structure to communicate theme and mood.” In Proceedings of the 2nd international symposium on Smart graphics, pp. 1-8. ACM, 2002.

Lino, Christophe, Mathieu Chollet, Marc Christie, and Remi Ronfard. “Computational model of film editing for interactive storytelling.” In Interactive Storytelling, pp. 305-308. Springer Berlin Heidelberg, 2011.

Lino, Christophe, Marc Christie, Roberto Ranon, and William Bares. “The director’s lens: an intelligent assistant for virtual cinematography.” In Proceedings of the 19th ACM international conference on Multimedia, pp. 323-332. ACM, 2011.

Ma, Yu-Fei, Lie Lu, Hong-Jiang Zhang, and Mingjing Li. “A user attention model for video summarization.” In Proceedings of the tenth ACM international conference on Multimedia, pp. 533-542. ACM, 2002.

Matsuyama, Takashi, Xiaojun Wu, Takeshi Takai, and Shohei Nobuhara. “Real-time 3D shape reconstruction, dynamic 3D mesh deformation, and high fidelity visualization for 3D video.” Computer Vision and Image Understanding 96, no. 3 (2004): 393-434.

McCowan, Iain, Samy Bengio, Daniel Gatica-Perez, Guillaume Lathoud, Florent Monay, Darren Moore, Pierre Wellner, and Hervé Bourlard. “Modeling human interaction in meetings.” In Acoustics, Speech, and Signal Processing, 2003. Proceedings.(ICASSP’03). 2003 IEEE International Conference on, vol. 4, pp. IV-748. IEEE, 2003.

Merabti, Billal, Marc Christie, and Kadi Bouatouch. “A Virtual Director Inspired by Real Directors.” In Workshops at the Twenty-Eighth AAAI Conference on Artificial Intelligence. 2014.

Meyer, Meredith, Philip DeCamp, Bridgette Hard, Dare Baldwin, and Deb Roy. “Assessing behavioral and computational approaches to naturalistic action segmentation.” In Proc. of the 32nd Annual Conference of the Cognitive Science Society. 2010.

Minnen, David, Irfan Essa, and Thad Starner. “Expectation grammars: Leveraging high-level expectations for activity recognition.” In Computer Vision and Pattern Recognition, 2003. Proceedings. 2003 IEEE Computer Society Conference on, vol. 2, pp. II-626. IEEE, 2003.

Mitra, Sushmita, and Tinku Acharya. “Gesture recognition: A survey.” Systems, Man, and Cybernetics, Part C: Applications and Reviews, IEEE Transactions on 37, no. 3 (2007): 311-324.

Moezzi, Saied, Li-Cheng Tai, and Philippe Gerard. “Virtual view generation for 3d digital video.” IEEE multimedia 4, no. 1 (1997): 18-26.

Murray, Gabriel, and Giuseppe Carenini. “Summarizing spoken and written conversations.” In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pp. 773-782. Association for Computational Linguistics, 2008.

Nevatia, Ram, Tao Zhao, and Somboon Hongeng. “Hierarchical Language-based Representation of Events in Video Streams.” In Computer Vision and Pattern Recognition Workshop, 2003. CVPRW’03. Conference on, vol. 4, pp. 39-39. IEEE, 2003.

Ngo, Chong-Wah, Ting-Chuen Pong, and Thomas S. Huang. “Detection of slide transition for topic indexing.” In Multimedia and Expo, 2002. ICME’02. Proceedings. 2002 IEEE International Conference on, vol. 2, pp. 533-536. IEEE, 2002.

Nijholt, Anton, H. J. A. Akker, and Dirk Heylen. “Meetings and meeting modeling in smart surroundings.” (2004): 145-158.

Nijholt, Anton, Rieks op den Akker, and Dirk Heylen. “Meetings and meeting modeling in smart environments.” AI & SOCIETY 20, no. 2 (2006): 202-220.

Nijholt, Anton, Rutger Rienks, Job Zwiers, and Dennis Reidsma. “Online and off-line visualization of meeting information and meeting support.” The Visual Computer 22, no. 12 (2006): 965-976.

Pang, Derek, Sameer Madan, Serene Kosaraju, and Tarun Vir Singh. Automatic virtual camera view generation for lecture videos. Tech. Rep., Stanford University, 2010.

Poel, Mannes, Ronald Poppe, and Anton Nijholt. “Meeting behavior detection in smart environments: Nonverbal cues that help to obtain natural interaction.” In Automatic Face & Gesture Recognition, 2008. FG’08. 8th IEEE International Conference on, pp. 1-6. IEEE, 2008.

Purver, Matthew, John Dowding, John Niekrasz, Patrick Ehlen, Sharareh Noorbaloochi, and Stanley Peters. “Detecting and summarizing action items in multi-party dialogue.” In Proceedings of the 8th SIGdial Workshop on Discourse and Dialogue, pp. 18-25. 2007.

Rivera, Ernesto, and Akinori Nishihara. “Enhancing Lecture Video Viewing: A Smart Visual Timeline.” In Global Learn Asia Pacific, vol. 2010, no. 1, pp. 268-271. 2010.

Ronzhin, A. L., and V. Yu Budkov. “Multimodal Interaction with Intelligent Meeting Room Facilities from Inside and Outside.” In Smart Spaces and Next Generation Wired/Wireless Networking, pp. 77-88. Springer Berlin Heidelberg, 2009.

Ryoo, Michael S., and Jake K. Aggarwal. “Semantic representation and recognition of continued and recursive human activities.” International journal of computer vision 82, no. 1 (2009): 1-24.

Sharma, Prerna, and Naman Sharma. “Hand & Upper Body Based Hybrid Gesture Recognition.”

Smolic, Aljoscha, Karsten Mueller, Philipp Merkle, Christoph Fehn, Peter Kauff, Peter Eisert, and Thomas Wiegand. “3D video and free viewpoint video-technologies, applications and MPEG standards.” In Multimedia and Expo, 2006 IEEE International Conference on, pp. 2161-2164. IEEE, 2006.

Stiefelhagen, Rainer. “Tracking focus of attention in meetings.” In Proceedings of the 4th IEEE International Conference on Multimodal Interfaces, p. 273. IEEE Computer Society, 2002.

Subramanian, Ramanathan, Jacopo Staiano, Kyriaki Kalimeri, Nicu Sebe, and Fabio Pianesi. “Putting the pieces together: multimodal analysis of social attention in meetings.” In Proceedings of the international conference on Multimedia, pp. 659-662. ACM, 2010.

Suchan, Jakob, and Mehul Bhatt. “Toward High-Level Dynamic Camera Control.”

Sundareswaran, Venkataraman, Kenneth Wang, Steven Chen, Reinhold Behringer, Joshua McGee, Clement Tam, and Pavel Zahorik. “3D audio augmented reality: implementation and experiments.” In Proceedings of the 2nd IEEE/ACM International Symposium on Mixed and Augmented Reality, p. 296. IEEE Computer Society, 2003.

Taghizadeh, Mohammad J., Reza Parhizkar, Philip N. Garner, Hervé Bourlard, and Afsaneh Asaei. “Ad hoc microphone array calibration: Euclidean distance matrix completion algorithm and theoretical guarantees.” Signal Processing 107 (2015): 123-140.

Tapaswi, Makarand, Martin Bauml, and Rainer Stiefelhagen. “StoryGraphs: visualizing character interactions as a timeline.” In Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on, pp. 827-834. IEEE, 2014.

Taskiran, Cüneyt M., Zygmunt Pizlo, Arnon Amir, Dulce Ponceleon, and Edward J. Delp. “Automated video program summarization using speech transcripts.” Multimedia, IEEE Transactions on 8, no. 4 (2006): 775-791.

Tian, Ying-li, Lisa Brown, Arun Hampapur, Sharat Pankanti, Andrew Senior, and Ruud Bolle. “Real world real-time automatic recognition of facial expressions.” In Proceedings of IEEE workshop on. 2003.

Vadlapudi, Ravikiran, and Rahul Katragadda. “On automated evaluation of readability of summaries: capturing grammaticality, focus, structure and coherence.” In Proceedings of the NAACL HLT 2010 Student Research Workshop, pp. 7-12. Association for Computational Linguistics, 2010.

Waibel, Alex, Michael Bett, Michael Finke, and Rainer Stiefelhagen. “Meeting browser: Tracking and summarizing meetings.” In Proceedings of the DARPA broadcast news workshop, pp. 281-286. 1998.

Wang, Feng, Chong-Wah Ngo, and Ting-Chuen Pong. “Gesture tracking and recognition for lecture video editing.” In Pattern Recognition, 2004. ICPR 2004. Proceedings of the 17th International Conference on, vol. 3, pp. 934-937. IEEE, 2004.

Wang, Feng, and Bernard Merialdo. “Multi-document video summarization.” In Multimedia and Expo, 2009. ICME 2009. IEEE International Conference on, pp. 1326-1329. IEEE, 2009.

Wang, Feng, Chong-Wah Ngo, and Ting-Chuen Pong. “Lecture Video Enhancement and Editing by Integrating Posture, Gesture and Text.” IEEE Transactions on Multimedia.

Wang, Lu, and Claire Cardie. “Summarizing decisions in spoken meetings.” In Proceedings of the Workshop on Automatic Summarization for Different Genres, Media, and Languages, pp. 16-24. Association for Computational Linguistics, 2011.

Winston, Brian. Technologies of seeing: photography, cinematography and television. BFI, 1996.

Yu, Zhiwen, and Yuichi Nakamura. “Smart meeting systems: A survey of state-of-the-art and open issues.” ACM Computing Surveys (CSUR) 42, no. 2 (2010): 8.
